JP7679871B2 - Fingering practice device, fingering practice method, and training method

Info

Publication number: JP7679871B2
Application number: JP2023505094A
Authority: JP
Inventors: 正博鈴木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2021-03-09
Filing date: 2021-11-01
Publication date: 2025-05-20
Anticipated expiration: 2041-11-01
Also published as: JP7841641B2; JPWO2022190453A1; JP2025111710A; WO2022190453A1; CN116940978A

Description

The present invention relates to a fingering presentation device, a training device, a fingering presentation method, and a training method for presenting the fingering to be used when playing a musical instrument.

Devices for assisting practice in playing musical instruments are known. For example, the information processing device described in Patent Literature 1 calculates a performer's performance skill level and, based on that level, presents musical pieces that the performer is able to play. However, an inexperienced performer cannot easily decide which finger to use for each note when playing an instrument (hereinafter, this choice of fingers is referred to as fingering). To address this, Patent Literature 2 describes a fingering determination method that determines the fingering for each note of a note sequence based on a probabilistic model.
Patent Literature 1: JP 2013-083845 A
Patent Literature 2: JP 2007-241034 A

According to Patent Literature 2, a performer can learn a fingering for playing an instrument that is derived from a probabilistic model. In practice, however, there are countless possible fingering combinations, and a piece of music does not have a single optimal fingering. It is therefore desirable to present more appropriate fingering.

An object of the present invention is to provide a fingering presentation device, a training device, a fingering presentation method, and a training method capable of presenting appropriate fingering for playing a musical instrument.

A fingering presentation device according to a first aspect of the present invention includes a receiving unit that receives time-series data including a note sequence consisting of a plurality of notes, and an estimation unit that uses a trained model to estimate finger information indicating the fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument. The time-series data further includes a performer identifier indicating the performer who plays the note sequence, and the estimation unit estimates the finger information based on the performer identifier. The trained model is a machine learning model that has learned an input-output relationship between input time-series data including a reference note sequence consisting of a plurality of notes and output finger information indicating the fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument. The input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and the output finger information indicates the fingers of the reference performer to be used when playing the notes on the musical instrument.
A fingering presentation device according to a second aspect of the present invention includes a receiving unit that receives time-series data including a note sequence consisting of a plurality of notes, and an estimation unit that uses a trained model to estimate finger information indicating the fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument. The time-series data further includes basic finger information indicating the fingers to be used when playing a first proportion of the notes included in the note sequence on the musical instrument. Based on the basic finger information, the estimation unit estimates finger information indicating the fingers to be used when playing a second proportion of the notes included in the note sequence on the musical instrument, the second proportion being greater than the first proportion. The trained model is a machine learning model that has learned an input-output relationship between input time-series data including a reference note sequence consisting of a plurality of notes and output finger information indicating the fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument. The input time-series data further includes basic finger information indicating the fingers to be used when playing the first proportion of the notes included in the reference note sequence on the musical instrument, and the output finger information further includes the basic finger information included in the input time-series data.
A fingering presentation device according to a third aspect of the present invention includes a receiving unit that receives time-series data including a note sequence consisting of a plurality of notes, and an estimation unit that uses a trained model to estimate finger information indicating the fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument, and note information indicating the notes in the note sequence to which fingering is to be assigned. The estimation unit estimates the note information together with intermediate finger information indicating the fingers to be used when playing each note included in the note sequence on the musical instrument. It then deletes, from the intermediate finger information, the finger information corresponding to the notes to which fingering is to be assigned according to the note information, thereby estimating finger information indicating the fingers to be used when playing, on the musical instrument, those notes of the note sequence other than the notes indicated by the note information.
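The deletion step of the third aspect can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intermediate finger information is a list with one finger number per note and that the note information is a set of note indices; the function name `remove_designated_fingers` and these data layouts are hypothetical and are not taken from the patent.

```python
def remove_designated_fingers(intermediate, designated):
    """Delete finger entries for the notes indicated by the note information.

    intermediate: list of finger numbers, one per note (assumed layout).
    designated:   set of note indices indicated by the note information
                  (assumed layout).
    Returns a list in which deleted entries are replaced by None.
    """
    return [
        None if index in designated else finger
        for index, finger in enumerate(intermediate)
    ]


# Four notes with intermediate fingers 1, 2, 3, 5; notes 1 and 3 are designated.
result = remove_designated_fingers([1, 2, 3, 5], {1, 3})
```

The remaining entries are the finger information for the notes other than those indicated by the note information, as the aspect is worded.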

A training device according to a fourth aspect of the present invention includes a first acquisition unit that acquires input time-series data including a reference note sequence consisting of a plurality of notes; a second acquisition unit that acquires output finger information indicating the fingers to be used when playing at least some of the notes included in the reference note sequence on a musical instrument, or output note information indicating the notes in the reference note sequence to which fingering is to be assigned; and a construction unit that constructs a trained model that has learned the input-output relationship between the input time-series data and the output finger information or the output note information. The input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and the output finger information indicates the fingers of the reference performer to be used when playing on the musical instrument.

A fingering presentation method according to a fifth aspect of the present invention is executed by a computer and includes receiving time-series data including a note sequence consisting of a plurality of notes, and estimating, using a trained model, finger information indicating the fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument. The time-series data further includes a performer identifier indicating the performer who plays the note sequence, and estimating the finger information includes estimating the finger information based on the performer identifier. The trained model is a machine learning model that has learned an input-output relationship between input time-series data including a reference note sequence consisting of a plurality of notes and output finger information indicating the fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument. The input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and the output finger information indicates the fingers of the reference performer to be used when playing the notes on the musical instrument.
A fingering presentation method according to a sixth aspect of the present invention is executed by a computer and includes receiving time-series data including a note sequence consisting of a plurality of notes, and estimating, using a trained model, finger information indicating the fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument. The time-series data further includes basic finger information indicating the fingers to be used when playing a first proportion of the notes included in the note sequence on the musical instrument. Estimating the finger information includes estimating, based on the basic finger information, finger information indicating the fingers to be used when playing a second proportion of the notes included in the note sequence on the musical instrument, the second proportion being greater than the first proportion. The trained model is a machine learning model that has learned an input-output relationship between input time-series data including a reference note sequence consisting of a plurality of notes and output finger information indicating the fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument. The input time-series data further includes basic finger information indicating the fingers to be used when playing the first proportion of the notes included in the reference note sequence on the musical instrument, and the output finger information further includes the basic finger information included in the input time-series data.
A fingering presentation method according to a seventh aspect of the present invention accepts time-series data including a note sequence consisting of a plurality of notes, and uses a trained model to estimate finger information indicating fingers to be used when playing at least some of the notes included in the note sequence on an instrument, and note information indicating notes to which fingering is to be assigned from the note sequence, where estimating the finger information includes estimating intermediate finger information indicating fingers to be used when playing each note included in the note sequence on an instrument, and the note information, and deleting from the intermediate finger information finger information corresponding to the note to which fingering in the note information is to be assigned, thereby estimating finger information indicating fingers to be used when playing notes included in the note sequence on an instrument other than the notes indicated by the note information, and is executed by a computer.

A training method according to an eighth aspect of the present invention is executed by a computer and includes acquiring input time-series data including a reference note sequence consisting of a plurality of notes; acquiring output finger information indicating the fingers to be used when playing at least some of the notes included in the reference note sequence on a musical instrument, or output note information indicating the notes in the reference note sequence to which fingering is to be assigned; and constructing a trained model that has learned the input-output relationship between the input time-series data and the output finger information or the output note information. The input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and the output finger information indicates the fingers of the reference performer to be used when playing on the musical instrument.

According to the present invention, appropriate fingering for playing a musical instrument can be presented.

FIG. 1 is a block diagram showing the configuration of a processing system including a fingering presentation device and a training device according to a first embodiment of the present invention.
FIG. 2 is a diagram showing an example of each training data.
FIG. 3 is a block diagram showing the configuration of the training device and the fingering presentation device.
FIG. 4 shows an example of an auxiliary musical score displayed on the display unit.
FIG. 5 is a flowchart showing an example of the training process performed by the training device of FIG. 3.
FIG. 6 is a flowchart showing an example of the fingering presentation process performed by the fingering presentation device of FIG. 3.
FIG. 7 is a diagram showing another example of the input time-series data.
FIG. 8 is a diagram showing an example of the input time-series data in a modified example.
FIG. 9 is a diagram showing an example of the output finger information in the modified example.
FIG. 10 is a diagram showing an example of the input time-series data in a second embodiment.
FIG. 11 is a diagram showing an example of the output finger information in a third embodiment.
FIG. 12 is a flowchart showing an example of the fingering presentation process in a modified example.
FIG. 13 is a diagram showing an example of the finger information estimated in step S24 of the fingering presentation process.

[1] First embodiment
(1) Configuration of the processing system
Hereinafter, a fingering presentation device, a training device, a fingering presentation method, and a training method according to embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a processing system including a fingering presentation device and a training device according to a first embodiment of the present invention. As shown in FIG. 1, the processing system 100 includes a RAM (random access memory) 110, a ROM (read-only memory) 120, a CPU (central processing unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.

The processing system 100 is realized by a computer such as a personal computer, a tablet terminal, or a smartphone. Alternatively, the processing system 100 may be realized by the cooperative operation of a plurality of computers connected by a communication path such as Ethernet, or by an electronic musical instrument having a performance function, such as an electronic piano.

The RAM 110, the ROM 120, the CPU 130, the storage unit 140, the operation unit 150, and the display unit 160 are connected to a bus 170. The RAM 110, the ROM 120, and the CPU 130 constitute the training device 10 and the fingering presentation device 20. In this embodiment, the training device 10 and the fingering presentation device 20 are implemented on a common processing system 100, but they may be implemented on separate processing systems.

The RAM 110 is, for example, a volatile memory and is used as a work area for the CPU 130. The ROM 120 is, for example, a non-volatile memory and stores a training program and a fingering presentation program. The CPU 130 performs the training process by executing the training program stored in the ROM 120 on the RAM 110. The CPU 130 also performs the fingering presentation process by executing the fingering presentation program stored in the ROM 120 on the RAM 110. The training process and the fingering presentation process are described in detail later.

The training program or the fingering presentation program may be stored in the storage unit 140 instead of the ROM 120. Alternatively, the training program or the fingering presentation program may be provided on a computer-readable storage medium and installed in the ROM 120 or the storage unit 140. Alternatively, when the processing system 100 is connected to a network such as the Internet, a training program or fingering presentation program distributed from a server on that network (including a cloud server) may be installed in the ROM 120 or the storage unit 140.

The storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card, and stores the trained model M and a plurality of training data D. The trained model M or each training data D may be stored on a computer-readable storage medium rather than in the storage unit 140. Alternatively, when the processing system 100 is connected to a network, the trained model M or each training data D may be stored on a server on that network.

(2) Training data
The trained model M is a machine learning model trained to present the fingering to be used when a user of the fingering presentation device 20 (hereinafter referred to as a performer) plays a musical piece on an instrument, and it is constructed using a plurality of training data D. A user of the training device 10 can generate the training data D by operating the operation unit 150. The training data D is created based on, for example, the performance knowledge or performance style of a reference performer. The reference performer has a relatively high level of skill in performing musical pieces. The reference performer may be the performer's instructor or teacher.

Each training data D represents a pair of input time-series data and output finger information. The input time-series data represents a reference note sequence consisting of a plurality of notes. The input time-series data may be image data representing an image of a musical score. The output finger information indicates the fingers the reference performer uses when playing each note of the reference note sequence on the instrument, and it can be used to present the fingering for playing the reference note sequence. The output finger information may be a unique number assigned to each finger. In this example, the numbers "1" to "5" are assigned to the thumb, index finger, middle finger, ring finger, and little finger, respectively.

The optimal fingering for playing a piece of music depends on the performer's physical characteristics and performance style. In this embodiment, therefore, the input time-series data further includes a reference performer identifier indicating the classification (category) of the reference performer who plays the reference note sequence. The reference performer identifier is chosen so that it differs according to at least one of the reference performer's physical characteristics and performance style. The physical characteristics of the reference performer include, for example, hand size (finger length), age, sex, or whether the reference performer is an adult or a child.

FIG. 2 is a diagram showing an example of each training data D. The example in FIG. 2 shows part of the input time-series data and the output finger information for a reference performer playing the piano. As shown in FIG. 2, the input time-series data A includes elements A0 to A16. Element A0 corresponds to the reference performer identifier and is represented by a character string that differs according to at least one of the reference performer's physical characteristics and performance style. Elements A1 to A16 correspond to the reference note sequence. In this example, element A0 is placed at the beginning of the input time-series data A, that is, before the reference note sequence (elements A1 to A16), but it may be placed at any position in the input time-series data A.

In elements A1, A3, A5, ..., A15, "L" denotes the left hand, the number is the number assigned to a key, and "on" and "off" denote pressing and releasing the key, respectively. In elements A2, A4, A6, ..., A16, "wait" denotes waiting, and the number is the length of time. Thus, elements A1 to A4 mean pressing the key numbered "66", holding it for 13 unit times, then releasing the key numbered "66" and waiting for 2 unit times.
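The element semantics described above can be illustrated with a small parser. This is a minimal sketch that assumes a hypothetical textual encoding such as `L 66 on` and `wait 13`; the exact notation used in FIG. 2 is not reproduced here, and the names `Note` and `parse_events` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Note:
    hand: str      # "L" or "R"
    key: int       # number assigned to the key
    start: int     # onset time, in unit times
    duration: int  # how long the key is held, in unit times


def parse_events(elements):
    """Convert key on/off and wait elements into notes with durations."""
    now = 0
    held = {}   # (hand, key) -> onset time of a key that is currently pressed
    notes = []
    for element in elements:
        parts = element.split()
        if parts[0] == "wait":
            now += int(parts[1])
            continue
        hand, key, action = parts[0], int(parts[1]), parts[2]
        if action == "on":
            held[(hand, key)] = now
        elif action == "off":
            start = held.pop((hand, key))
            notes.append(Note(hand, key, start, now - start))
    return notes


# Elements A1 to A4 as described in the text (hypothetical encoding).
notes = parse_events(["L 66 on", "wait 13", "L 66 off", "wait 2"])
```

With this encoding, the four elements collapse into one left-hand note on key 66 that starts at time 0 and lasts 13 unit times.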

The output finger information B includes elements B0 to B16, which correspond to elements A0 to A16 of the input time-series data A, respectively. Element B0 indicates the reference performer identifier and is represented by the same character string as element A0. In elements B1, B3, B5, ..., B15, "L" denotes the left hand, the number is the number assigned to a finger, and "down" and "up" denote pressing the finger down and lifting it up, respectively. In elements B2, B4, B6, ..., B16, "wait" denotes waiting, and the number is the length of time. Thus, elements B1 to B4 mean pressing down the middle finger of the left hand, waiting for 13 unit times, then lifting the middle finger of the left hand and waiting for 2 unit times.
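Because the elements of B correspond one-to-one to the elements of A, a key-to-finger mapping can be read off by walking both sequences in parallel. The following is a minimal sketch under the same hypothetical textual encoding (`L 66 on` in A, `L 3 down` in B); the identifier string `id_x` and the function name `pair_keys_with_fingers` are placeholders, not values from the patent.

```python
def pair_keys_with_fingers(input_elements, output_elements):
    """Pair each key-press element of A with the finger in the matching element of B."""
    if len(input_elements) != len(output_elements):
        raise ValueError("A and B must have the same number of elements")
    pairs = []
    for a, b in zip(input_elements, output_elements):
        a_parts, b_parts = a.split(), b.split()
        # Only key-press elements ("<hand> <key> on") carry a new fingering decision.
        if len(a_parts) == 3 and a_parts[2] == "on":
            hand, key, finger = a_parts[0], int(a_parts[1]), int(b_parts[1])
            pairs.append((hand, key, finger))
    return pairs


# Elements 0 to 4 of A and B; "id_x" stands in for the performer identifier.
A = ["id_x", "L 66 on", "wait 13", "L 66 off", "wait 2"]
B = ["id_x", "L 3 down", "wait 13", "L 3 up", "wait 2"]
pairs = pair_keys_with_fingers(A, B)
```

Here finger number 3 is the middle finger, following the numbering "1" to "5" from thumb to little finger given earlier.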

The training data D in FIG. 2 is generated to indicate fingering for the left hand, but the embodiment is not limited to this. The training data D may be generated to indicate fingering for the right hand, or for both the left and right hands. In elements of the input time-series data A and the output finger information B that indicate fingering for the right hand, a letter such as "R" may be used instead of "L".

(3) Training device and fingering presentation device
FIG. 3 is a block diagram showing the configurations of the training device 10 and the fingering presentation device 20. As shown in FIG. 3, the training device 10 includes, as functional units, a first acquisition unit 11, a second acquisition unit 12, and a construction unit 13. The functional units of the training device 10 are realized by the CPU 130 in FIG. 1 executing the training program. At least some of the functional units of the training device 10 may be realized by hardware such as an electronic circuit.

The first acquisition unit 11 acquires the input time-series data A from each training data D stored in the storage unit 140 or elsewhere. The second acquisition unit 12 acquires the output finger information B from each training data D. For each training data D, the construction unit 13 performs machine learning using the input time-series data A acquired by the first acquisition unit 11 as the input element and the output finger information B acquired by the second acquisition unit 12 as the output element. By repeating this machine learning over the plurality of training data D, the construction unit 13 constructs a trained model M that represents the input-output relationship between the input time-series data A and the output finger information B.
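The data preparation implied by this paragraph can be sketched as follows. This is only an illustration of how input/output pairs might be assembled before a sequence model is trained; the dictionary layout with keys `"A"` and `"B"`, the function names, and the integer vocabulary are assumptions, and the actual training of the model is deliberately left out.

```python
def build_training_pairs(training_data):
    """Split each training datum D into (input time series A, output finger info B)."""
    inputs = [d["A"] for d in training_data]    # role of the first acquisition unit
    outputs = [d["B"] for d in training_data]   # role of the second acquisition unit
    return list(zip(inputs, outputs))


def build_vocabulary(sequences):
    """Map every distinct element to an integer id, as a sequence model would need."""
    vocab = {}
    for sequence in sequences:
        for element in sequence:
            vocab.setdefault(element, len(vocab))
    return vocab


# One hypothetical training datum D; "id_x" stands in for the performer identifier.
D = [{"A": ["id_x", "L 66 on", "wait 13"], "B": ["id_x", "L 3 down", "wait 13"]}]
training_pairs = build_training_pairs(D)
input_vocab = build_vocabulary(a for a, _ in training_pairs)
```

A model that handles time series would then be fitted on `training_pairs`, with A as the source sequence and B as the target sequence.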

In this example, the construction unit 13 constructs the trained model M by training a Transformer, but the embodiment is not limited to this. The construction unit 13 may construct the trained model M by training another type of machine learning model that handles time series. The trained model M constructed by the construction unit 13 is stored, for example, in the storage unit 140. It may instead be stored on a server on a network or the like.

The fingering presentation device 20 includes, as functional units, a receiving unit 21, an estimation unit 22, and a generation unit 23. The functional units of the fingering presentation device 20 are realized by the CPU 130 in FIG. 1 executing the fingering presentation program. At least some of the functional units of the fingering presentation device 20 may be realized by hardware such as an electronic circuit.

In this embodiment, the receiving unit 21 receives time-series data including a note sequence consisting of a plurality of notes. The performer can supply image data representing an image of a musical score to the receiving unit 21 as the time-series data. Alternatively, the performer can generate the time-series data by operating the operation unit 150 and supply it to the receiving unit 21.

In this example, the time-series data has the same structure as the input time-series data A in FIG. 2 and further includes a performer identifier indicating the classification (category) of the performer who plays the note sequence. The performer identifier is chosen so that it differs according to at least one of the performer's physical characteristics and performance style. The physical characteristics of the performer include, for example, hand size, age, sex, or whether the performer is an adult or a child.

The estimation unit 22 estimates finger information using the trained model M stored in the storage unit 140 or the like. The finger information indicates which of the performer's fingers is used to play each note of the note sequence received by the reception unit 21, and is estimated based on the note sequence and the performer identifier. The finger information may be a unique number assigned to each finger. The generation unit 23 generates musical score information based on the note sequence of the time-series data received by the reception unit 21 and the finger information estimated by the estimation unit 22.

The display unit 160 displays an auxiliary musical score based on the musical score information generated by the generation unit 23. Fig. 4 shows an example of the auxiliary musical score displayed on the display unit 160. As shown in Fig. 4, the auxiliary musical score shows the finger information estimated by the estimation unit 22 in correspondence with each note of the note sequence received by the reception unit 21. In the example of Fig. 4, the finger numbers of one hand are shown as the finger information.

To distinguish left-hand finger numbers from right-hand finger numbers, a predetermined character such as "L" may be placed near each left-hand finger number, and another predetermined character such as "R" may be placed near each right-hand finger number. Alternatively, the left-hand finger numbers or their corresponding notes may be given a predetermined color such as red, and the right-hand finger numbers or their corresponding notes may be given another predetermined color such as blue.

(4) Training Processing and Fingering Presentation Processing
Fig. 5 is a flowchart showing an example of the training processing performed by the training device 10 of Fig. 3. The training processing of Fig. 5 is performed by the CPU 130 of Fig. 1 executing a training program. First, the first acquisition unit 11 acquires the input time-series data A from each item of training data D (step S1). The second acquisition unit 12 acquires the output finger information B from each item of training data D (step S2). Steps S1 and S2 may be executed in either order, or simultaneously.

Next, for each item of training data D, the construction unit 13 performs machine learning using the input time-series data A acquired in step S1 as the input element and the output finger information B acquired in step S2 as the output element (step S3). The construction unit 13 then determines whether sufficient machine learning has been performed (step S4). If the machine learning is insufficient, the construction unit 13 returns to step S3. Steps S3 and S4 are repeated, with the parameters being changed, until sufficient machine learning has been performed. The number of repetitions depends on the quality conditions that the trained model M must satisfy.

When sufficient machine learning has been performed, the construction unit 13 saves the input/output relationship between the input time-series data A and the output finger information B learned in step S3 as the trained model M (step S5). This ends the training processing.
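The control flow of steps S1 to S5 can be sketched as follows. This is only an illustration under stated assumptions: `run_training`, `train_step`, `is_sufficient`, and the `max_rounds` safety cap are hypothetical names that do not appear in the embodiment, and the actual learning update and quality check are passed in as callables rather than implemented.

```python
from typing import Callable, Sequence, Tuple

# One item of training data D: (input time-series data A, output finger information B)
TrainingPair = Tuple[Sequence[str], Sequence[str]]


def run_training(
    training_data: Sequence[TrainingPair],
    train_step: Callable[[Sequence[str], Sequence[str]], None],
    is_sufficient: Callable[[], bool],
    max_rounds: int = 1000,
) -> int:
    """Repeat step S3 until the check of step S4 passes; return the number of rounds."""
    inputs = [a for a, _ in training_data]    # step S1: input time-series data A
    targets = [b for _, b in training_data]   # step S2: output finger information B
    rounds = 0
    while rounds < max_rounds:                # max_rounds is an assumed safety cap
        for a, b in zip(inputs, targets):
            train_step(a, b)                  # step S3: one learning update per pair
        rounds += 1
        if is_sufficient():                   # step S4: has enough learning been done?
            break
    return rounds                             # step S5 (saving model M) is left to the caller
```

Passing the update and the stopping test as callables keeps the sketch independent of any particular model type, in line with the statement that the model need not be a Transformer.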

Fig. 6 is a flowchart showing an example of the fingering presentation processing performed by the fingering presentation device 20 of Fig. 3. The fingering presentation processing of Fig. 6 is performed by the CPU 130 of Fig. 1 executing the fingering presentation program. First, the reception unit 21 receives the time-series data (step S11). Next, the estimation unit 22 estimates the finger information from the time-series data received in step S11, using the trained model M saved in step S5 of the training processing (step S12).

The generation unit 23 then generates musical score information based on the note sequence of the time-series data received in step S11 and the finger information estimated in step S12 (step S13). The auxiliary musical score may be displayed on the display unit 160 based on the generated musical score information. This ends the fingering presentation processing.
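Steps S11 to S13 amount to a three-stage pipeline, which the following sketch illustrates. The function names, the representation of notes as integer pitch numbers, and the `toy_estimate` stand-in for the trained model M are all assumptions made for illustration; the embodiment does not prescribe them.

```python
from typing import Callable, Dict, List, Sequence


def present_fingering(
    note_sequence: Sequence[int],
    estimate: Callable[[Sequence[int]], Sequence[int]],
) -> List[Dict[str, int]]:
    """Steps S11 to S13: receive a note sequence, estimate fingers, pair them up."""
    notes = list(note_sequence)               # step S11: received time-series data
    fingers = list(estimate(notes))           # step S12: finger information from model M
    if len(fingers) != len(notes):
        raise ValueError("expected exactly one finger number per note")
    # step S13: musical score information, here a list of note/finger records
    return [{"pitch": p, "finger": f} for p, f in zip(notes, fingers)]


def toy_estimate(notes: Sequence[int]) -> List[int]:
    """Hypothetical stand-in for trained model M: cycles through fingers 1 to 5."""
    return [i % 5 + 1 for i in range(len(notes))]
```

For example, `present_fingering([60, 62, 64], toy_estimate)` pairs the three pitches with fingers 1, 2, and 3; a real estimator would replace `toy_estimate`.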

(5) Effects of the Embodiment
As described above, the fingering presentation device 20 according to this embodiment includes the reception unit 21, which receives time-series data including a note sequence consisting of a plurality of notes, and the estimation unit 22, which uses the trained model M to estimate finger information indicating the finger to be used when playing each note of the note sequence on an instrument. With this configuration, appropriate finger information is estimated from the temporal flow of the notes in the time-series data using the trained model M. This makes it possible to present appropriate fingering for playing an instrument.

The trained model M may be a machine learning model that has learned the input/output relationship between input time-series data A, which includes a reference note sequence consisting of a plurality of notes, and output finger information B, which indicates the finger to be used when playing each note of the reference note sequence on an instrument. In this case, the finger information can be easily estimated from the time-series data.

The time-series data may further include a performer identifier indicating the performer who plays the note sequence, and the estimation unit 22 may estimate the finger information based on the performer identifier. In this case, finger information appropriate to the performer can be estimated.

The performer identifier may be determined so as to correspond to the physical characteristics of the performer. In this case, finger information appropriate to the performer's physical characteristics can be estimated.

The performer identifier may be determined so as to correspond to the performer's playing style. In this case, finger information appropriate to the performer's playing style can be estimated.

The fingering presentation device 20 may further include the generation unit 23, which generates musical score information representing an auxiliary musical score in which the finger information is attached to the corresponding notes of the note sequence. In this case, the performer can easily recognize the finger corresponding to each note of the note sequence by viewing the auxiliary musical score.

The training device 10 according to this embodiment includes the first acquisition unit 11, which acquires input time-series data A including a reference note sequence consisting of a plurality of notes; the second acquisition unit 12, which acquires output finger information B indicating the finger to be used when playing each note of the reference note sequence on an instrument; and the construction unit 13, which constructs a trained model M that has learned the input/output relationship between the input time-series data A and the output finger information B. With this configuration, such a trained model M can be easily constructed.

(6) Other Examples of Training Data
In this embodiment, the input time-series data A includes a reference performer identifier and the time-series data includes a performer identifier, but the embodiment is not limited to this. The input time-series data A only needs to include the reference note sequence and need not include a reference performer identifier. Similarly, the time-series data only needs to include the note sequence and need not include a performer identifier.

In this embodiment, the input time-series data A and the output finger information B are described in a so-called action-based form, which represents key presses, key releases, and the like in the MIDI (Musical Instrument Digital Interface) standard, but the embodiment is not limited to this. The input time-series data A and the output finger information B may be described in other forms. For example, they may be described in a so-called note-based form, which represents the start position of each note, the length of each note, and the like in the MIDI standard. The same applies to the time-series data and the finger information.

Fig. 7 shows other examples of the input time-series data A. The upper part of Fig. 7 shows input time-series data A (Ax) described in the action-based form. The middle part of Fig. 7 shows input time-series data A (Ay) described in the note-based form. The input time-series data Ax and Ay contain the same reference note sequence (the reference note sequence in the musical score shown in the lower part of Fig. 7). The elements "bar" and "beat" in the input time-series data Ax and Ay indicate the metrical structure of the reference note sequence.

As Fig. 7 shows, describing the input time-series data A in the note-based form shortens it, which makes it easier to process longer input time-series data A. The output finger information B corresponding to the input time-series data A can be described by inserting an element indicating the finger number immediately after each element indicating a pitch number ("note_○○") in the input time-series data A.
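The construction of the output finger information B described above, inserting a finger element directly after each pitch element, can be sketched as follows. The spelling `finger_<number>` for the finger element and the function name `attach_fingers` are assumptions made for this illustration; only the "note_" prefix is taken from the text.

```python
from typing import List, Sequence


def attach_fingers(tokens: Sequence[str], fingers: Sequence[int]) -> List[str]:
    """Insert a finger token right after each pitch token ("note_<number>")."""
    remaining = list(fingers)
    out: List[str] = []
    for tok in tokens:
        out.append(tok)
        if tok.startswith("note_"):
            if not remaining:
                raise ValueError("fewer finger numbers than notes")
            # The "finger_<n>" spelling is hypothetical.
            out.append(f"finger_{remaining.pop(0)}")
    if remaining:
        raise ValueError("more finger numbers than notes")
    return out
```

Because every other element is copied through unchanged, the input sequence can always be recovered from the output by dropping the finger elements again.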

Alternatively, the input time-series data A and the output finger information B may be described in a form that represents a musical score. The details of input time-series data A and output finger information B described in this form are explained in the following modification.

(7) Modification
Fig. 8 shows an example of the input time-series data A in the modification. The upper part of Fig. 8 shows input time-series data A (Az) described in a form that represents a musical score. The lower part of Fig. 8 shows the musical score represented by the input time-series data Az. As shown in the upper part of Fig. 8, the input time-series data Az includes a plurality of elements A0 to A24. Some elements have attributes. The attributes of an element are written at the end of the element, after an underscore.

Element A0 indicates the proportion of the notes in the reference note sequence to which fingering is assigned. Element A0 is placed at the beginning of the input time-series data Az, but may be placed at any position in it. The proportion is specified by the attribute of "fingerrate" in element A0. In this example, the attribute "5" means a proportion of 100%. The proportion may instead be expressed as a range, such as 20 to 40% or 40 to 60%, or may be divided into a plurality of ranges.
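The relationship between the "fingerrate" attribute and the proportion can be sketched as below. The description states only two values, attribute "5" for 100% here and attribute "2" for 40% in the third embodiment; the linear step of 20% per attribute value, the valid range of 1 to 5, and both function names are assumptions made for this illustration.

```python
def fingerrate_to_percent(attribute: int, step: int = 20) -> int:
    """Map a "fingerrate" attribute to a percentage, assuming a linear step."""
    if not 1 <= attribute <= 100 // step:
        raise ValueError(f"fingerrate attribute out of range: {attribute}")
    return attribute * step


def notes_to_annotate(total_notes: int, attribute: int) -> int:
    """Number of notes that receive fingering, rounded to the nearest whole note."""
    return round(total_notes * fingerrate_to_percent(attribute) / 100)
```

Under this assumed mapping, a seven-note sequence with attribute "2" gives 2.8, which rounds to three annotated notes, and attribute "5" annotates all seven.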

Element A1 indicates a part. Element A1 is placed immediately after element A0, but may be placed at any position in the input time-series data Az. As element A1, "R" and "L" indicate the right-hand part and the left-hand part, respectively. In this example, the elements corresponding to the right hand are placed after "R". "L" is placed after them, followed by the elements corresponding to the left hand. Alternatively, "R" and the right-hand elements may be placed after the left-hand elements. If the parts are not distinguished, the input time-series data Az does not include element A1.

Elements A2, A15, and A24 indicate bar lines of the musical score. In the example of Fig. 8, therefore, the range delimited by "bar" in element A2 and "bar" in element A15 corresponds to the first measure, and the range delimited by "bar" in element A15 and "bar" in element A24 corresponds to the second measure.

Element A3 indicates the clef of the musical score. The type of clef is specified by the attribute of "clef" in element A3. In the example of Fig. 8, the attribute is "treble", so element A3 specifies a treble clef. If the attribute is "bass", element A3 specifies a bass clef.

Element A4 indicates the time signature of the musical score. The time signature is specified by the attribute of "time" in element A4. In the example of Fig. 8, the attribute is "4/4", so element A4 specifies a time signature of 4/4.

Each note in the reference note sequence is represented by a pair of a pitch and a note value. The pitch is specified by the attribute of "note" in elements A5, A9, A11, A13, A16, A18, and A20. The note value is specified by the attribute of "len" in elements A6, A10, A12, A14, A17, A19, and A21. In this example, "len_1" corresponds to one beat.

The stem direction of each note in the musical score is specified by a second attribute of "len" in elements A6, A10, A12, A14, A17, A19, and A21. If this attribute is "down", the stem extends downward from the note head. If it is "up", the stem extends upward from the note head. When a plurality of notes such as eighth notes or sixteenth notes are joined by a beam, the start, intermediate, and end positions of the beam are specified by a further attribute of "len" in elements A10, A12, and A14, namely "start", "continue", and "stop", respectively.

Each rest in the reference note sequence is specified by "rest" in elements A7 and A22. The note value of the rest is described by the attribute of "len" in elements A8 and A23.

In the example of Fig. 8, elements A5 and A6 represent note N1, and elements A7 and A8 represent rest R1. Elements A9 and A10 represent note N2, elements A11 and A12 represent note N3, and elements A13 and A14 represent note N4. Elements A16 and A17 represent note N5, and elements A18 and A19 represent note N6. Elements A20 and A21 represent note N7, and elements A22 and A23 represent rest R2.
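To make the element layout of Fig. 8 concrete, the following sketch groups a score-format element sequence into measures of notes and rests. The exact token spellings (for example `len_1/2_up_start`), the pitch numbers, and the note values in `FIG8_LIKE_TOKENS` are assumptions made for illustration and are not the contents of Fig. 8; only the element order (A0 to A24) follows the description.

```python
from typing import Dict, List, Optional, Sequence, Union

Event = Dict[str, Union[str, int]]

# Hypothetical tokens arranged like elements A0 to A24; values are illustrative only.
FIG8_LIKE_TOKENS = [
    "fingerrate_5", "R", "bar", "clef_treble", "time_4/4",
    "note_60", "len_1_up", "rest", "len_1",
    "note_62", "len_1/2_up_start", "note_64", "len_1/2_up_continue",
    "note_65", "len_1/2_up_stop",
    "bar",
    "note_67", "len_1_down", "note_65", "len_1_down", "note_64", "len_1_down",
    "rest", "len_1",
    "bar",
]


def parse_score_tokens(tokens: Sequence[str]) -> List[List[Event]]:
    """Group note and rest events into measures delimited by "bar" tokens."""
    measures: List[List[Event]] = []
    current: Optional[List[Event]] = None   # events of the measure being read
    pending: Optional[Event] = None         # note or rest waiting for its "len" token
    for tok in tokens:
        name, _, attr = tok.partition("_")  # attributes follow the first underscore
        if name == "bar":
            if current is not None:
                measures.append(current)
            current = []
        elif name == "note":
            pending = {"kind": "note", "pitch": int(attr)}
        elif name == "rest":
            pending = {"kind": "rest"}
        elif name == "len" and pending is not None and current is not None:
            pending["len"] = attr.split("_")[0]  # keep the value, drop stem/beam attributes
            current.append(pending)
            pending = None
    if current:  # keep a trailing measure that has no closing bar line
        measures.append(current)
    return measures
```

Applied to `FIG8_LIKE_TOKENS`, the parser yields two measures: the first holds N1, R1, N2, N3, and N4, and the second holds N5, N6, N7, and R2, matching the grouping described for Fig. 8.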

Fig. 9 shows an example of the output finger information B in the modification. The upper part of Fig. 9 shows output finger information B (Bz) described in a form that represents a musical score. The output finger information Bz corresponds to the input time-series data Az of Fig. 8. The lower part of Fig. 9 shows the musical score represented by the output finger information Bz.

As shown in the upper part of Fig. 9, the output finger information Bz includes a plurality of elements B0 to B24. It further includes elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f, placed immediately after elements B5, B9, B11, B13, B16, B18, and B20, respectively. Elements B0 to B24 are the same as elements A0 to A24 of the input time-series data Az of Fig. 8, respectively. The first acquisition unit 11 of Fig. 3 can therefore acquire the input time-series data Az by deleting elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f from the output finger information Bz.

Elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f indicate the finger numbers used when playing, on an instrument, the notes corresponding to the immediately preceding elements B5, B9, B11, B13, B16, B18, and B20, respectively. The finger number is specified by the attribute of "finger" in each of these elements. As shown in the lower part of Fig. 9, elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f thus cause the finger numbers "1", "1", "2", "1", "3", "3", and "2", used when playing notes N1 to N7, to be written in the musical score.
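Because Bz is Az plus the finger elements, both the input sequence and the finger numbers can be read back from Bz. The sketch below illustrates this under the assumption that finger elements are spelled `finger_<number>`; the function names are hypothetical.

```python
from typing import List, Sequence


def strip_finger_tokens(output_tokens: Sequence[str]) -> List[str]:
    """Recover the input sequence by deleting every finger element."""
    # The trailing underscore matters: "fingerrate_5" must not be removed.
    return [t for t in output_tokens if not t.startswith("finger_")]


def finger_numbers(output_tokens: Sequence[str]) -> List[int]:
    """Collect the finger numbers in the order in which their notes appear."""
    return [int(t.split("_", 1)[1]) for t in output_tokens if t.startswith("finger_")]
```

For an element sequence laid out like Fig. 9, `finger_numbers` would return the numbers 1, 1, 2, 1, 3, 3, 2 listed above, while `strip_finger_tokens` would return the sequence used as training input.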

(8) Effects of the Modification
In the modification of the first embodiment, the attribute of element A0 can be used to specify freely the proportion of the notes in the reference note sequence to which fingering is assigned. When the proportion is 100%, the estimation unit 22 estimates finger information for all of the notes in the note sequence. In this case, appropriate fingering can be presented for an introductory-level performer playing the instrument.

When the proportion is 100%, the generation unit 23 may also generate a video file that shows the finger movements by animation or the like, based on the finger information estimated by the estimation unit 22. This makes it possible to visualize the finger movements. The video file may be generated before or after step S13 of the fingering presentation processing of Fig. 6, in parallel with step S13, or in place of step S13.

When the proportion is less than 100%, on the other hand, the estimation unit 22 estimates which of the notes in the note sequence are to be assigned fingering, together with the finger information for those notes. In this case, appropriate fingering can be presented for a performer at the elementary or intermediate level, above the introductory level. In this configuration, the output finger information Bz does not include some of the elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f.

When the proportion is less than 100%, the estimation unit 22 may alternatively estimate, from the note sequence, note information indicating the notes to which fingering is to be assigned, without estimating the finger information. The details are described in the third embodiment below.

[2] Second Embodiment
(1) Processing System
The following describes how the processing system 100 of the second embodiment differs from the processing system 100 of the first embodiment. In the training device 10 of Fig. 3, the first acquisition unit 11 and the second acquisition unit 12 acquire the input time-series data A and the output finger information B of the training data D, respectively.

Fig. 10 shows an example of the input time-series data A in the second embodiment. The upper part of Fig. 10 shows input time-series data Az described in a form that represents a musical score. The lower part of Fig. 10 shows the musical score represented by the input time-series data Az.

As shown in the upper part of Fig. 10, the input time-series data Az includes a plurality of elements A0 to A24. Elements A0 to A24 of Fig. 10 are the same as elements A0 to A24 in the modification of the first embodiment (Fig. 8), respectively. The input time-series data Az also includes additional elements placed immediately after some of the note elements A5, A9, A11, A13, A16, A18, and A20. In the example of Fig. 10, the input time-series data Az further includes elements A5f, A11f, A16f, and A20f, placed immediately after elements A5, A11, A16, and A20, respectively.

Elements A5f, A11f, A16f, and A20f are finger information (hereinafter referred to as basic finger information) indicating the finger numbers used when playing, on an instrument, the notes corresponding to the immediately preceding elements A5, A11, A16, and A20, respectively. The finger number is specified by the attribute of "finger" in each of these elements. As shown in the lower part of Fig. 10, elements A5f, A11f, A16f, and A20f thus cause the finger numbers "1", "2", "3", and "2", used when playing notes N1, N3, N5, and N7, to be written in the musical score.

The output finger information Bz of this embodiment is the same as the output finger information Bz in the modification of the first embodiment (Fig. 9). The first acquisition unit 11 can therefore acquire the input time-series data Az by randomly deleting some of the elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f from the output finger information Bz. The user of the training device 10 can specify the proportion of these elements to be deleted by operating the operation unit 150 of Fig. 1.

In this example, the input time-series data Az is acquired by deleting elements B9f, B13f, and B18f from the output finger information Bz. The elements that are not deleted, B5f, B11f, B16f, and B20f, remain as elements A5f, A11f, A16f, and A20f, which are the basic finger information.
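The random deletion used to derive the training input can be sketched as follows. The `finger_<number>` spelling, the function name, the rounding rule for the number of deletions, and the optional random generator argument are assumptions made for this illustration.

```python
import random
from typing import List, Optional, Sequence


def mask_finger_tokens(
    output_tokens: Sequence[str],
    delete_ratio: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Delete a randomly chosen share of the finger elements; keep everything else."""
    if not 0.0 <= delete_ratio <= 1.0:
        raise ValueError("delete_ratio must be between 0 and 1")
    rng = rng or random.Random()
    finger_positions = [i for i, t in enumerate(output_tokens) if t.startswith("finger_")]
    n_delete = round(len(finger_positions) * delete_ratio)  # assumed rounding rule
    to_delete = set(rng.sample(finger_positions, n_delete))
    # The surviving finger elements play the role of the basic finger information.
    return [t for i, t in enumerate(output_tokens) if i not in to_delete]
```

With seven finger elements and a ratio of 3/7, three are removed and four remain, as in the example above, although which three are removed depends on the random draw. Passing a seeded `random.Random` makes the masking reproducible.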

The construction unit 13 of Fig. 3 performs machine learning using the above input time-series data Az as the input element and the output finger information Bz as the output element. By repeating the machine learning over a plurality of items of training data D, a trained model M representing the input/output relationship between the input time-series data Az and the output finger information Bz is constructed.

In the fingering presentation device 20, the reception unit 21 receives the time-series data. The time-series data further includes basic finger information indicating the fingers used when playing some of the notes in the note sequence on an instrument. The estimation unit 22 estimates finger information indicating the fingers used when playing the notes in the note sequence on an instrument, based on the constructed trained model M and the basic finger information. The generation unit 23 generates musical score information based on the note sequence of the time-series data and the finger information.

(2) Effects of the Embodiment
According to this embodiment, even when finger information (basic finger information) is known for only some of the notes in the note sequence of the time-series data and no finger information is given for the remaining notes, the finger information for the remaining notes is filled in. This makes it possible to present appropriate fingering for an introductory-level performer playing the instrument. The generation unit 23 may generate a video file that shows the finger movements by animation or the like, based on the finger information estimated by the estimation unit 22. In this case, the finger movements can be visualized.

(3) Modification
In this embodiment, the estimation unit 22 estimates finger information for all of the notes in the note sequence of the time-series data, but the embodiment is not limited to this. When finger information is given for a first proportion of the notes in the note sequence, the estimation unit 22 may estimate finger information for a second proportion of the notes that is larger than the first proportion. In this case, appropriate fingering can be presented for a performer at the elementary or intermediate level.

In the modification, the output finger information B of the training data D need not include all of the elements B5f, B9f, B11f, B13f, B16f, B18f, and B20f. For example, when the input time-series data Az includes elements A5f, A11f, A16f, and A20f, the output finger information B includes elements B5f, B11f, B16f, and B20f, whereas it may omit some of the elements B9f, B13f, and B18f.

[3] Third Embodiment
(1) Processing System
The following describes how the processing system 100 of the third embodiment differs from the processing system 100 of the first embodiment. In this embodiment, the training data D represents a pair of input time-series data A and output note information. In the training device 10 of Fig. 3, the first acquisition unit 11 and the second acquisition unit 12 acquire the input time-series data A and the output note information of the training data D, respectively. The acquisition of the output note information is executed in place of step S2 of the training processing of Fig. 5.

本実施の形態における入力時系列データＡｚは、第１の実施の形態における変形例（図８）の入力時系列データＡｚと同様である。第１の取得部１１は、後述する図１１の出力音符情報Ｃｚから要素Ｃ９ｆ，Ｃ１１ｆ，Ｃ１６ｆを削除することにより、入力時系列データＡｚを取得することができる。The input time series data Az in this embodiment is the same as the input time series data Az in the modified example of the first embodiment (FIG. 8). The first acquisition unit 11 can acquire the input time series data Az by deleting the elements C9f, C11f, and C16f from the output note information Cz in FIG. 11 described later.

Fig. 11 is a diagram showing an example of the output note information C in the third embodiment. The upper part of Fig. 11 shows output note information C (Cz) written in a format for representing musical scores. The lower part of Fig. 11 shows the musical score represented by the output note information Cz.

As shown in the upper part of Fig. 11, the output note information Cz includes a plurality of elements C0 to C24. The elements C0 to C24 in Fig. 11 are the same as the elements B0 to B24, respectively, of the output fingering information Bz in the modified example of the first embodiment (Fig. 9). In addition, the output note information Cz includes additional elements placed immediately after some of the elements C5, C9, C11, C13, C16, C18, and C20, which correspond to notes.

In this example, the "fingerrate" attribute of the element C0 is "2", and the attribute value "2" denotes a proportion of 40%. Accordingly, the output note information Cz further includes the elements C9f, C11f, and C16f, placed immediately after the elements C9, C11, and C16, respectively, which make up approximately 40% of the elements C5, C9, C11, C13, C16, C18, and C20.
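The arithmetic behind "approximately 40%" can be checked with a short sketch. Only the mapping of the attribute value "2" to 40% is stated in the text. The lookup table name, the function `marker_count`, and the round-to-nearest rule are assumptions made for illustration; which notes are actually flagged is decided by the training data and the trained model, not by this calculation.

```python
# Only the mapping 2 -> 40% is stated in the text; other codes are not assumed.
FINGERRATE_TO_RATIO = {2: 0.40}

NOTE_ELEMENTS = ["C5", "C9", "C11", "C13", "C16", "C18", "C20"]


def marker_count(fingerrate: int, n_notes: int) -> int:
    """Number of notes to flag, rounded to the nearest whole note (assumed rule)."""
    return round(FINGERRATE_TO_RATIO[fingerrate] * n_notes)


count = marker_count(2, len(NOTE_ELEMENTS))
actual_ratio = count / len(NOTE_ELEMENTS)
print(count, f"{actual_ratio:.1%}")
```

For seven notes, 40% is 2.8 notes, which rounds to three flagged notes, that is, 3/7 or about 42.9% of the notes.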

The elements C9f, C11f, and C16f indicate that the notes corresponding to the immediately preceding elements C9, C11, and C16, respectively, are the notes in the reference note sequence to which fingering is to be assigned. As shown in the lower part of Fig. 11, the elements C9f, C11f, and C16f cause the notes N2, N3, and N5, which correspond to the elements C9, C11, and C16, respectively, to be marked identifiably in the musical score.

The construction unit 13 in Fig. 3 performs machine learning using the above input time-series data Az as the input element and the output note information Cz as the output element. By repeating the machine learning over a plurality of sets of training data D, a trained model M representing the input/output relationship between the input time-series data Az and the output note information Cz is constructed.

In the fingering presentation device 20, the reception unit 21 receives time-series data. Based on the trained model M constructed by the training device 10 and the time-series data received by the reception unit 21, the estimation unit 22 estimates note information indicating the notes in the note sequence to which fingering is to be assigned. The note information is estimated in place of step S12 of the fingering presentation process in Fig. 6. The generation unit 23 generates musical score information representing an auxiliary musical score in which the notes indicated by the note information are displayed identifiably.

(2) Effects of the embodiment
According to this embodiment, the notes in the note sequence to which fingering is to be assigned can be presented. This allows a beginner-level or intermediate-level player to recognize the notes that are key points when playing the instrument.

(3) Modification
The estimation unit 22 may use the first trained model M constructed in the first embodiment and the second trained model M constructed in this embodiment to estimate fingering information indicating the fingers to be used when playing some of the notes included in the note sequence on the instrument. Fig. 12 is a flowchart showing an example of the fingering presentation process in the modified example.

First, the reception unit 21 receives time-series data (step S21). Next, the estimation unit 22 uses the first trained model M constructed in the first embodiment to estimate intermediate fingering information from the time-series data received in step S21 (step S22). The intermediate fingering information indicates the finger to be used when playing each note included in the note sequence on the instrument.

The estimation unit 22 also uses the second trained model M constructed in this embodiment to estimate note information from the time-series data received in step S21 (step S23). Either of steps S22 and S23 may be executed first, or the two steps may be executed simultaneously.

Subsequently, based on the intermediate fingering information estimated in step S22, the estimation unit 22 estimates fingering information for the notes included in the note sequence other than the notes indicated by the note information estimated in step S23 (step S24). The generation unit 23 then generates musical score information based on the note sequence of the time-series data received in step S21 and the fingering information estimated in step S24 (step S25). This ends the fingering presentation process.

In this fingering presentation process, the intermediate fingering information estimated in step S22 has, for example, the same structure as the output fingering information Bz in the modified example of the first embodiment (Fig. 9). The note information estimated in step S23 has the same structure as the output note information Cz in Fig. 11. Fig. 13 is a diagram showing an example of the fingering information estimated in step S24 of the fingering presentation process.

The upper part of Fig. 13 shows fingering information F (Fz) written in a format for representing musical scores. The lower part of Fig. 13 shows the auxiliary musical score represented by the fingering information Fz. The fingering information Fz is estimated by deleting, from the intermediate fingering information (see Fig. 9), the elements B9f, B11f, and B16f, which correspond respectively to the elements C9f, C11f, and C16f that indicate, in the note information (see Fig. 11), the notes to which fingering is to be assigned.

Specifically, as shown in the upper part of Fig. 13, the fingering information Fz includes a plurality of elements F1 to F24. The elements F1 to F24 in Fig. 13 are the same as the elements B1 to B24, respectively, of the output fingering information Bz in the modified example of the first embodiment (Fig. 9). The fingering information Fz also includes additional elements placed immediately after some of the elements F5, F9, F11, F13, F16, F18, and F20, which correspond to notes. In this example, the fingering information Fz further includes the elements F5f, F13f, F18f, and F20f, placed immediately after the elements F5, F13, F18, and F20, respectively.

The elements F5f, F13f, F18f, and F20f indicate the numbers of the fingers to be used when playing, on the instrument, the notes corresponding to the immediately preceding elements F5, F13, F18, and F20, respectively. The finger number is specified by the "finger" attribute of each of the elements F5f, F13f, F18f, and F20f. Accordingly, as shown in the lower part of Fig. 13, the elements F5f, F13f, F18f, and F20f cause the finger numbers "1", "1", "3", and "2", to be used when playing the notes N1, N4, N6, and N7, respectively, to be written in the auxiliary musical score.
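The thinning performed in step S24 can be illustrated with a small worked example. This is a sketch under stated assumptions, not the implementation of the embodiment: the dictionary representation and the function name `thin_fingering` are hypothetical, and the finger numbers for elements 9, 11, and 16 are placeholders, since only the fingers for N1, N4, N6, and N7 are given in the text.

```python
# Element number -> note label, as given in the description of Figs. 11 and 13.
NOTE_LABELS = {5: "N1", 9: "N2", 11: "N3", 13: "N4", 16: "N5", 18: "N6", 20: "N7"}


def thin_fingering(intermediate: dict[int, int], marked: set[int]) -> dict[int, int]:
    """Step S24: keep fingerings only for notes not flagged by the note information."""
    return {elem: finger for elem, finger in intermediate.items() if elem not in marked}


# Step S22 result; the fingers for elements 9, 11, and 16 are placeholders.
intermediate = {5: 1, 9: 2, 11: 3, 13: 1, 16: 5, 18: 3, 20: 2}
# Step S23 result: notes flagged by the elements C9f, C11f, and C16f.
marked = {9, 11, 16}

fz = thin_fingering(intermediate, marked)
annotations = {NOTE_LABELS[elem]: finger for elem, finger in fz.items()}
print(annotations)
```

The remaining annotations are the fingers 1, 1, 3, and 2 for the notes N1, N4, N6, and N7, which corresponds to the auxiliary musical score in the lower part of Fig. 13.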

According to the modified example, the fingering information for some notes is thinned out from the fingering information for all notes included in the note sequence of the time-series data. In this case, fingering appropriate for a beginner-level or intermediate-level player can be presented. For example, because the fingering information for the notes that are key points in playing the instrument is thinned out, a beginner-level or intermediate-level player can develop the ability to judge appropriate fingering while practicing the instrument.

[4] Other embodiments
In the above embodiments, the fingering presentation device 20 includes the generation unit 23, but the embodiments are not limited to this. A player can create an auxiliary musical score by transcribing the fingering information estimated by the estimation unit 22 onto a desired musical score. Therefore, the fingering presentation device 20 need not include the generation unit 23.

In the above embodiments, the trained model M is trained, using the training data D, to estimate fingering information for playing the piano, but the embodiments are not limited to this. The trained model M may be trained, using the training data D, to estimate fingering information for playing another instrument such as drums.

In the above embodiments, the case where the user of the fingering presentation device 20 is a player has been described as an example, but the user of the fingering presentation device 20 may be, for example, a staff member of a musical score production company. The machine learning by the training device 10 may also be performed in advance by a staff member of the musical score production company.

Claims

A fingering presentation device comprising:
a reception unit that receives time-series data including a note sequence consisting of a plurality of notes; and
an estimation unit that uses a trained model to estimate fingering information indicating fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument,
wherein the time-series data further includes a performer identifier indicating a performer who plays the note sequence,
the estimation unit estimates the fingering information based on the performer identifier,
the trained model is a machine learning model that has learned an input/output relationship between input time-series data including a reference note sequence and output fingering information indicating fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument,
the input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and
the output fingering information indicates the fingers of the reference performer to be used when playing the notes on the musical instrument.

The fingering presentation device according to claim 1, wherein the estimation unit further uses the trained model to estimate note information indicating notes in the note sequence to which fingering is to be assigned, and
the trained model is a machine learning model that has further learned an input/output relationship between the input time-series data and output note information indicating notes in the reference note sequence to which fingering is to be assigned.

The fingering presentation device according to claim 1 or 2, wherein the performer identifier is determined to correspond to the physical characteristics of the performer.

The fingering presentation device according to any one of claims 1 to 3, wherein the performer identifier is determined to correspond to the performance style of the performer.

A fingering presentation device comprising:
a reception unit that receives time-series data including a note sequence consisting of a plurality of notes; and
an estimation unit that uses a trained model to estimate fingering information indicating fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument,
wherein the time-series data further includes basic fingering information indicating fingers to be used when playing a first proportion of the notes included in the note sequence on the musical instrument,
the estimation unit estimates, based on the basic fingering information, the fingering information indicating fingers to be used when playing, on the musical instrument, a second proportion of the notes included in the note sequence, the second proportion being greater than the first proportion,
the trained model is a machine learning model that has learned an input/output relationship between input time-series data including a reference note sequence and output fingering information indicating fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument,
the input time-series data further includes the basic fingering information indicating fingers to be used when playing a first proportion of the notes included in the reference note sequence on the musical instrument, and
the output fingering information further includes the basic fingering information included in the input time-series data.

A fingering presentation device comprising:
a reception unit that receives time-series data including a note sequence consisting of a plurality of notes; and
an estimation unit that uses a trained model to estimate fingering information indicating fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument, and note information indicating notes in the note sequence to which fingering is to be assigned,
wherein the estimation unit estimates the note information and intermediate fingering information indicating the finger to be used when playing each note included in the note sequence on the musical instrument, and estimates the fingering information, which indicates fingers to be used when playing, on the musical instrument, the notes included in the note sequence other than the notes indicated by the note information, by deleting from the intermediate fingering information the fingering information corresponding to the notes indicated by the note information as notes to which fingering is to be assigned.

The fingering presentation device according to any one of claims 1 to 6, further comprising a generation unit that generates musical score information indicating a first auxiliary musical score to which the fingering information is assigned so as to correspond to at least some of the notes included in the sequence of notes.

The fingering presentation device according to claim 2, further comprising a generation unit that generates musical score information representing a second auxiliary musical score in which the notes indicated by the note information are displayed in an identifiable manner.

A training device comprising:
a first acquisition unit that acquires input time-series data including a reference note sequence consisting of a plurality of notes;
a second acquisition unit that acquires output fingering information indicating fingers to be used when playing at least some of the notes included in the reference note sequence on a musical instrument, or output note information indicating notes in the reference note sequence to which fingering is to be assigned; and
a construction unit that constructs a trained model that has learned an input/output relationship between the input time-series data and the output fingering information or the output note information,
wherein the input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and
the output fingering information indicates the fingers of the reference performer to be used when playing on the musical instrument.

A computer-implemented fingering presentation method comprising:
receiving time-series data including a note sequence consisting of a plurality of notes; and
using a trained model to estimate fingering information indicating fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument,
wherein the time-series data further includes a performer identifier indicating a performer who plays the note sequence,
estimating the fingering information includes estimating the fingering information based on the performer identifier,
the trained model is a machine learning model that has learned an input/output relationship between input time-series data including a reference note sequence and output fingering information indicating fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument,
the input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and
the output fingering information indicates the fingers of the reference performer to be used when playing the notes on the musical instrument.

A computer-implemented fingering presentation method comprising:
receiving time-series data including a note sequence consisting of a plurality of notes; and
using a trained model to estimate fingering information indicating fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument,
wherein the time-series data further includes basic fingering information indicating fingers to be used when playing a first proportion of the notes included in the note sequence on the musical instrument,
estimating the fingering information includes estimating, based on the basic fingering information, fingering information indicating fingers to be used when playing, on the musical instrument, a second proportion of the notes included in the note sequence, the second proportion being greater than the first proportion,
the trained model is a machine learning model that has learned an input/output relationship between input time-series data including a reference note sequence and output fingering information indicating fingers to be used when playing at least some of the notes included in the reference note sequence on the musical instrument,
the input time-series data further includes the basic fingering information indicating fingers to be used when playing a first proportion of the notes included in the reference note sequence on the musical instrument, and
the output fingering information further includes the basic fingering information included in the input time-series data.

A computer-implemented fingering presentation method comprising:
receiving time-series data including a note sequence consisting of a plurality of notes; and
using a trained model to estimate fingering information indicating fingers to be used when playing at least some of the notes included in the note sequence on a musical instrument, and note information indicating notes in the note sequence to which fingering is to be assigned,
wherein estimating the fingering information includes estimating the note information and intermediate fingering information indicating the finger to be used when playing each note included in the note sequence on the musical instrument, and estimating the fingering information, which indicates fingers to be used when playing, on the musical instrument, the notes included in the note sequence other than the notes indicated by the note information, by deleting from the intermediate fingering information the fingering information corresponding to the notes indicated by the note information as notes to which fingering is to be assigned.

A computer-implemented training method comprising:
acquiring input time-series data including a reference note sequence consisting of a plurality of notes;
acquiring output fingering information indicating fingers to be used when playing at least some of the notes included in the reference note sequence on a musical instrument, or output note information indicating notes in the reference note sequence to which fingering is to be assigned; and
constructing a trained model that has learned an input/output relationship between the input time-series data and the output fingering information or the output note information,
wherein the input time-series data further includes a reference performer identifier indicating a reference performer who plays the reference note sequence, and
the output fingering information indicates the fingers of the reference performer to be used when playing on the musical instrument.