JP7084520B2

JP7084520B2 - Simulation equipment, simulation method and simulation program

Info

Publication number: JP7084520B2
Application number: JP2021035735A
Authority: JP
Inventors: 健一郎島田; 浩二伊藤; 知範泉谷; 大地木村
Original assignee: NTT Docomo Business Inc; NTT Communications Corp
Current assignee: NTT Docomo Business Inc
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2022-06-14
Anticipated expiration: 2039-02-08
Also published as: JP2021082367A

Description

The present invention relates to a simulation apparatus, a simulation method, and a simulation program.

In recent years, it has been proposed to use machine learning, such as reinforcement learners, for equipment control in various environments such as factories, plants, buildings, and data centers. In such machine learning, a simulation environment is constructed for these environments by modeling the inputs and outputs of various devices such as controllers and air conditioners, and the control parameters of the reinforcement learner are then searched. The search for control parameters is carried out by skilled workers through trial and error.

Kazuya Sato and two others, "First Control Engineering" (はじめての制御工学), Kodansha Co., Ltd., October 2010
Richard S. Sutton and one other, "Reinforcement Learning" (強化学習), Morikita Publishing Co., Ltd., December 2000
Volodymyr Mnih and six others, "Playing Atari with Deep Reinforcement Learning", [Online], December 2013, NIPS Deep Learning Workshop 2013, [retrieved January 29, 2019], Internet <https://arxiv.org/pdf/1312.5602.pdf>

However, it has conventionally been difficult to construct a highly accurate simulation environment easily. For example, because construction of the simulation environment and the search for control parameters have been performed separately, an operator who wants to change the simulation environment during the parameter search must specify how it is to be changed. Consequently, repeating the construction of the simulation environment and the search for control parameters takes considerable effort and time, which makes it difficult to construct a highly accurate simulation environment easily.

To solve the above problem and achieve the object, the simulation apparatus of the present invention includes: a first reception unit that accepts input of training data; a generation unit that learns using the accepted training data and generates a prediction model; a second reception unit that accepts placement of one or more of evaluation data used for a simulation, the generated prediction model, and a reinforcement learner that performs reinforcement learning in the simulation; an execution unit that executes the simulation using the evaluation data, the prediction model, and the reinforcement learner, based on the state of the accepted placement; and a determination unit that determines, based on the learning result of the reinforcement learner in the simulation, whether to regenerate the prediction model and, when it determines that the prediction model is to be regenerated, instructs the generation unit to regenerate the prediction model.

The simulation method of the present invention is executed by a simulation apparatus and includes: a first reception step of accepting input of training data; a generation step of learning using the accepted training data and generating a prediction model; a second reception step of accepting placement of one or more of evaluation data used for a simulation, the generated prediction model, and a reinforcement learner that performs reinforcement learning in the simulation; an execution step of executing the simulation using the evaluation data, the prediction model, and the reinforcement learner, based on the state of the accepted placement; and a determination step of determining, based on the learning result of the reinforcement learner in the simulation, whether to regenerate the prediction model and, when it is determined that the prediction model is to be regenerated, instructing the generation step to regenerate the prediction model.

The simulation program of the present invention causes a computer to execute: a first reception step of accepting input of training data; a generation step of learning using the accepted training data and generating a prediction model; a second reception step of accepting placement of one or more of evaluation data used for a simulation, the generated prediction model, and a reinforcement learner that performs reinforcement learning in the simulation; an execution step of executing the simulation using the evaluation data, the prediction model, and the reinforcement learner, based on the state of the accepted placement; and a determination step of determining, based on the learning result of the reinforcement learner in the simulation, whether to regenerate the prediction model and, when it is determined that the prediction model is to be regenerated, instructing the generation step to regenerate the prediction model.

According to the present invention, a highly accurate simulation environment can be constructed easily.

FIG. 1 is a block diagram showing an example of the configuration of the simulation apparatus according to the first embodiment.
FIG. 2 is a diagram showing an example of a placement screen.
FIG. 3 is a diagram showing another example of the placement screen.
FIG. 4 is a diagram illustrating an example of learning in the predicted image model.
FIG. 5 is a flowchart showing an example of the simulation process in the first embodiment.
FIG. 6 is a diagram showing an example of a computer that executes the simulation program.

Embodiments of the simulation apparatus, simulation method, and simulation program disclosed in the present application are described in detail below with reference to the drawings. The simulation apparatus, simulation method, and simulation program according to the present application are not limited by these embodiments.

[First Embodiment]
The following describes, in order, the configuration of the simulation apparatus 100 according to the first embodiment and the flow of processing of the simulation apparatus 100, and finally the effects of the first embodiment.

[Configuration of the Simulation Apparatus]
First, the configuration of the simulation apparatus 100 is described with reference to FIG. 1. FIG. 1 is a block diagram showing an example of the configuration of the simulation apparatus according to the first embodiment. The simulation apparatus 100 accepts input of training data from, for example, another information processing apparatus. In the simulation apparatus 100, the generation unit learns using the accepted training data and generates a prediction model. The simulation apparatus 100 accepts placement of one or more of the following: the evaluation data used for the simulation, the generated prediction model, a reinforcement learner that performs reinforcement learning in the simulation, and an imitation learner that performs imitation learning in the simulation. Based on the state of the accepted placement, the simulation apparatus 100 executes a simulation using the evaluation data, the prediction model, the reinforcement learner, and the imitation learner. The simulation apparatus 100 determines, based on the learning result of the reinforcement learner in the simulation, whether to regenerate the prediction model. When it determines that the prediction model is to be regenerated, the simulation apparatus 100 instructs the generation unit to regenerate the prediction model. In this way, the simulation apparatus 100 can easily construct a highly accurate simulation environment.
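The generate, simulate, and judge cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the function names, the list of candidate settings, and the score threshold used as the regeneration criterion are all hypothetical. The text says only that the decision is based on the learning result of the reinforcement learner.

```python
from typing import Callable, List, Tuple


def build_simulation_environment(
    settings: List[dict],
    generate_model: Callable[[dict], object],
    run_simulation: Callable[[object], float],
    accept_score: float,
) -> Tuple[object, float, int]:
    """Regenerate the prediction model until the RL result is acceptable.

    Returns (model, score, attempt). The threshold test is an assumed
    stand-in for the determination unit's criterion.
    """
    if not settings:
        raise ValueError("at least one candidate setting is required")
    best = None
    for attempt, setting in enumerate(settings, start=1):
        model = generate_model(setting)   # role of the generation unit
        score = run_simulation(model)     # role of the execution unit and RL
        if best is None or score > best[1]:
            best = (model, score, attempt)
        if score >= accept_score:         # determination: no regeneration needed
            return model, score, attempt
    # Every setting was tried without reaching the threshold:
    # fall back to the best model seen so far.
    return best
```

A caller would pass its own model-fitting and simulation functions; the loop stops at the first setting whose simulation result meets the threshold.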

As shown in FIG. 1, the simulation apparatus 100 includes a communication unit 110, a display unit 111, an operation unit 112, a storage unit 120, and a control unit 130. In addition to the functional units shown in FIG. 1, the simulation apparatus 100 may include various functional units found in known computers, such as various input devices and audio output devices.

The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is a communication interface that is connected to other information processing apparatuses by wire or wirelessly via a network (not shown) and handles the communication of information with those apparatuses. The communication unit 110 receives, for example, training data and evaluation data from another information processing apparatus, and outputs the received training data and evaluation data to the control unit 130. The communication unit 110 may also receive from another information processing apparatus, for example, the prediction model, the first trained model, and the second trained model described later.

The display unit 111 is a display device for displaying various kinds of information. The display unit 111 is realized by, for example, a liquid crystal display. The display unit 111 displays various screens, such as display screens, input from the control unit 130.

The operation unit 112 is an input device that accepts various operations from the user of the simulation apparatus 100. The operation unit 112 is realized by, for example, a keyboard and a mouse. The operation unit 112 outputs operations entered by the user to the control unit 130 as operation information. The operation unit 112 may instead be realized by a touch panel or the like, and the display device of the display unit 111 and the input device of the operation unit 112 may be integrated.

The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk. The storage unit 120 includes a training data storage unit 121, a prediction model storage unit 122, an evaluation data storage unit 123, a placement information storage unit 124, and a trained model storage unit 125. The storage unit 120 also stores information used for processing in the control unit 130.

The training data storage unit 121 stores, as training data, sensor information output by various sensors, such as temperature and pressure sensors, installed in each part of a plant, for example. The training data is used for machine learning of the prediction model after preprocessing for the prediction target has been applied to it.

The prediction model storage unit 122 stores a prediction model obtained by training on the preprocessed training data by machine learning so that output values of the various parameters of the prediction target are obtained. In the simulation, the various parameters of the prediction target are output from the prediction model storage unit 122 based on the evaluation data. The prediction model is, for example, the result of deep learning with a neural network using the sensor information as features. As the neural network, a CNN (Convolutional Neural Network) can be used, for example. That is, the prediction model stores, for example, the various parameters (weight coefficients) of the neural network as learned parameters. When the prediction model has been acquired from another information processing apparatus via the communication unit 110, the prediction model need not undergo new machine learning in the simulation.

The evaluation data storage unit 123 stores, as evaluation data, sensor information output by various sensors, such as temperature and pressure sensors, installed in each part of a plant, for example. The evaluation data may be obtained by dividing the acquired sensor information into training data and evaluation data.
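As one way to picture the division of acquired sensor information into training data and evaluation data, the following minimal sketch splits time-ordered records at a fixed ratio. The function name, the record layout, and the 80/20 chronological split are assumptions for illustration; the text does not specify how the division is made.

```python
from typing import List, Sequence, Tuple


def split_sensor_records(
    records: Sequence[dict], train_ratio: float = 0.8
) -> Tuple[List[dict], List[dict]]:
    """Split time-ordered sensor records into (training, evaluation) parts.

    The earlier portion becomes training data and the later portion
    becomes evaluation data, so the two sets do not overlap.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    cut = int(len(records) * train_ratio)
    return list(records[:cut]), list(records[cut:])
```

A chronological split is used here because sensor logs are time series; a random split would be an equally valid assumption for data without temporal dependence.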

The placement information storage unit 124 stores placement information representing the placement state of one or more elements accepted from the user, out of the evaluation data, the prediction model, the reinforcement learner, and the imitation learner. The placement information also includes connection information between the placed elements.

The trained model storage unit 125 stores a first trained model, which is the learning result of the reinforcement learner in the simulation, and a second trained model, which is the learning result of the imitation learner. The first and second trained models learn, for example, the valve opening to apply to a prediction model that outputs an output flow rate according to the open/closed state of the valve being controlled. As the first trained model, the trained model storage unit 125 stores, for example, the various parameters of Q-learning. As the second trained model, the trained model storage unit 125 stores, for example, various parameters that imitate the result of controlling the valve by PID (Proportional Integral Differential) control. When the first and second trained models have been acquired from another information processing apparatus via the communication unit 110, they need not undergo new reinforcement learning or imitation learning in the simulation.
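To make "the various parameters of Q-learning" concrete, here is a minimal sketch of the standard tabular Q-learning update, with the valve opening as the action. The state labels, the discrete set of openings, and the learning rate and discount values are assumptions for illustration; the text does not say whether a table or a function approximator is used.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def q_update(
    q: Dict[Tuple[Hashable, Hashable], float],
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    # Greedy value of the next state over all candidate actions.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical setup: actions are valve openings in percent.
q_table = defaultdict(float)
valve_openings = [0, 25, 50, 75, 100]
```

In this sketch the stored "parameters" are simply the entries of `q_table`; a deep Q-network would store network weights instead.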

The control unit 130 is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing a program stored in an internal storage device, using the RAM as a work area. The control unit 130 may instead be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The control unit 130 includes a first reception unit 131, a setting unit 132, a generation unit 133, a second reception unit 134, an execution unit 135, a determination unit 136, and an output control unit 137, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 1 and may be any other configuration that performs the information processing described later.

Based on a user operation, the first reception unit 131 accepts input of training data from another information processing apparatus, for example via the communication unit 110, and stores the accepted training data in the training data storage unit 121. Likewise, based on a user operation, the first reception unit 131 accepts input of evaluation data from another information processing apparatus, for example via the communication unit 110, and stores the accepted evaluation data in the evaluation data storage unit 123. When the first reception unit 131 accepts from the user a request to start settings for a prediction target, it outputs a setting instruction to the setting unit 132. The first reception unit 131 may accept from the user a plurality of prediction targets to be set.

The setting unit 132 receives a setting instruction from the first reception unit 131 or the generation unit 133, and receives a resetting instruction from the determination unit 136. When a setting instruction or resetting instruction is input, the setting unit 132 sets the prediction target. For example, if the prediction target is a valve, the setting unit 132 sets the output flow rate as the output value and the valve opening as the control target. When a resetting instruction is input, the setting unit 132 sets the prediction target under conditions different from the setting conditions already tried. The setting conditions are changed, for example, by trying the entries of a parameter list in order or by changing the prediction model itself. After setting the prediction target, the setting unit 132 applies the preprocessing corresponding to that prediction target to the training data in the training data storage unit 121. Examples of the preprocessing include imputation of missing values, outlier handling, and standardization. When the preprocessing is complete, the setting unit 132 outputs a generation instruction to the generation unit 133.
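The three preprocessing steps named above (filling missing values, handling outliers, standardization) can be sketched for a single sensor channel as follows. The specific choices here, mean imputation, clipping at three standard deviations, and z-score standardization, are assumptions for illustration; the text names the steps but not the methods.

```python
import statistics
from typing import List, Optional


def preprocess(values: List[Optional[float]], clip_sigma: float = 3.0) -> List[float]:
    """Impute missing readings, clip outliers, and standardize one channel."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed readings")
    # 1. Missing values: replace None with the mean of the observed readings.
    mean = statistics.fmean(observed)
    filled = [mean if v is None else v for v in values]
    # 2. Outliers: clip to mean +/- clip_sigma population standard deviations.
    std = statistics.pstdev(filled)
    if std == 0.0:
        return [0.0] * len(filled)  # constant channel carries no information
    low, high = mean - clip_sigma * std, mean + clip_sigma * std
    clipped = [min(max(v, low), high) for v in filled]
    # 3. Standardization: zero mean and unit variance after clipping.
    clipped_mean = statistics.fmean(clipped)
    clipped_std = statistics.pstdev(clipped)
    if clipped_std == 0.0:
        return [0.0] * len(clipped)
    return [(v - clipped_mean) / clipped_std for v in clipped]
```

For example, `preprocess([1.0, None, 3.0])` fills the gap with the mean 2.0 and returns three values centered on zero.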

When a generation instruction is input from the setting unit 132, the generation unit 133 reads the training data from the training data storage unit 121, performs machine learning, and generates a prediction model. The generation unit 133 performs, for example, deep learning with a CNN or the like to generate the prediction model, and stores the generated prediction model in the prediction model storage unit 122. The generation unit 133 then determines whether there is any prediction target for which a prediction model has not yet been generated. If there is, the generation unit 133 outputs a setting instruction to the setting unit 132 so that settings are made for the remaining prediction targets. If there is none, the generation unit 133 outputs a reception instruction to the second reception unit 134. The setting unit 132 and the generation unit 133 may be integrated. The resetting instruction input from the determination unit 136 to the setting unit 132 instructs regeneration of the prediction model; when the setting unit 132 and the generation unit 133 are integrated, it is input to the integrated generation unit.

When a reception instruction is input from the generation unit 133, the second reception unit 134 causes the display unit 111 to display a placement screen and accepts from the user the placement of each element in the simulation environment. As elements to be placed, the second reception unit 134 accepts placement of the generated prediction models, as well as placement of the evaluation data, the reinforcement learner, and the imitation learner. The second reception unit 134 further accepts, for example, settings such as the control targets of the placed reinforcement learner and imitation learner, that is, the connection information between the elements. When the second reception unit 134 accepts completion of the placement from the user, it stores placement information, including the placement of each element and the connection information, in the placement information storage unit 124.

The placement screen is now described with reference to FIGS. 2 and 3. FIG. 2 is a diagram showing an example of the placement screen. The placement screen 10 shown in FIG. 2 has a selection area 11, which displays the elements that can be placed in the simulation environment, and a construction area 12, in which the simulation environment to be simulated is constructed. The elements that can be placed include, for example, data 13 corresponding to the evaluation data, prediction models C1 to CX corresponding to the respective prediction models, function processes D1 and D2, PID control P1, reinforcement learning 14 corresponding to the reinforcement learner, and imitation learning 15 corresponding to the imitation learner. In FIG. 2, the PID parameter 16 is omitted from the selection area 11.

In the construction area 12, each element is placed by, for example, the user dragging and dropping it from the selection area 11. In the example of FIG. 2, data 13, prediction models C1 to C6, function process D1, reinforcement learning 14, imitation learning 15, PID control P1, and PID parameter 16 are placed. The function process D1 is an element that applies some function to its input and outputs the result, for example a function that computes a moving average. The PID control P1 corresponds to PID control, which performs feedback control. The PID parameter 16 represents the parameters of the PID control P1, such as its various gains and times, the target value, and the manipulated variable.
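Two of the elements above lend themselves to a short illustration: a moving-average function process and a PID feedback controller. The sketch below uses the textbook discrete PID form; the class names, the gain values in the usage note, and the fixed time step are assumptions, not details taken from the text.

```python
from collections import deque
from typing import Optional


class MovingAverage:
    """Function process sketch: mean of the last `window` inputs."""

    def __init__(self, window: int) -> None:
        self.buffer = deque(maxlen=window)

    def __call__(self, value: float) -> float:
        self.buffer.append(value)
        return sum(self.buffer) / len(self.buffer)


class PID:
    """Discrete PID controller: gains, target value, and time step."""

    def __init__(self, kp: float, ki: float, kd: float,
                 setpoint: float, dt: float = 1.0) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.dt = dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, measurement: float) -> float:
        """Return the manipulated variable for one measurement."""
        error = self.setpoint - measurement
        self.integral += error * self.dt
        # No derivative term on the first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (
            (error - self.prev_error) / self.dt)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

With `PID(kp=2.0, ki=0.5, kd=0.1, setpoint=10.0)`, a first measurement of 8.0 gives an error of 2.0 and an output of 2.0 * 2.0 + 0.5 * 2.0 = 5.0.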

In the example of FIG. 2, the data 13 is connected to the prediction models C1 to C6. The output of the prediction model C1 is connected to the PID control P1 and the PID parameter 16. The output of the PID parameter 16 is connected to the PID control P1. The output of the PID control P1 is connected to the reinforcement learning 14 and the imitation learning 15. The outputs of the prediction models C2 to C6 and the function process D1 are also connected to the reinforcement learning 14 and the imitation learning 15. In the other direction, the control outputs of the reinforcement learning 14 and the imitation learning 15 are connected to the prediction models C1 to C6 and the function process D1.
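The connections listed for FIG. 2 form a directed graph, which is one natural way to hold the connection information. The sketch below encodes exactly the connections stated above as an adjacency dictionary; the node labels (such as `data13` and `rl14`) are hypothetical names derived from the reference numerals, and the dictionary representation itself is an assumption.

```python
from typing import Dict, List, Set

MODELS = [f"C{i}" for i in range(1, 7)]  # prediction models C1 to C6

# Directed edges: source element -> elements its output is connected to.
connections: Dict[str, List[str]] = {
    "data13": list(MODELS),
    "C1": ["P1", "pid_params16"],
    "pid_params16": ["P1"],
    "P1": ["rl14", "il15"],
    "D1": ["rl14", "il15"],
    # Control outputs of the learners feed back into the models and D1.
    "rl14": MODELS + ["D1"],
    "il15": MODELS + ["D1"],
}
for model in MODELS[1:]:  # C2 to C6 feed both learners directly
    connections[model] = ["rl14", "il15"]


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start` by following one or more edges."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen
```

Because the learners' control outputs return to the prediction models, `reachable(connections, "rl14")` contains `"rl14"` itself, which reflects the closed control loop; `"data13"` is a pure source and is reachable from nothing.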

FIG. 3 is a diagram showing another example of the placement screen. The placement screen 20 shown in FIG. 3 extends the placement screen 10 of FIG. 2 with a placement that generates and inputs a predicted image of the state. Compared with the placement screen 10, in the example of FIG. 3 a predicted image 21 and a predicted image model C7 are added as elements to the selection area 11, and the predicted image 21 and the predicted image model C7 are additionally placed in the construction area 12.

In the example of FIG. 3, the data 13 is connected to the prediction models C1 to C6. The output of the prediction model C1 is connected to the PID control P1, the PID parameter 16, and the predicted image model C7. The output of the PID parameter 16 is connected to the PID control P1. The output of the PID control P1 is connected to the reinforcement learning 14 and the imitation learning 15. The outputs of the prediction models C2, C3, C5, and C6 and the function process D1 are connected to the reinforcement learning 14 and the imitation learning 15. In addition, the outputs of the prediction models C2 and C3 and the function process D1 are connected to the predicted image model C7. The output of the prediction model C4 is connected to the function process D1.

The predicted image model C7 is trained in the generation unit 133 using a GAN (Generative Adversarial Network), with its input treated as latent variables (features). In the simulation, the predicted image model C7 generates the predicted image 21 based on the outputs of the prediction models C1, C2, and C3 and the function process D1. That is, in the construction area 12, the output of the predicted image model C7 is connected to the predicted image 21. The predicted image 21 is image data and is connected to the reinforcement learning 14 and the imitation learning 15. In other words, in the example of FIG. 3, the reinforcement learning 14 and the imitation learning 15 also learn from the predicted image generated from the outputs of the prediction models that are controlled by the learners' own predicted values. The training of the predicted image model C7 is not limited to a GAN; for example, supervised learning may be performed with a CNN, an MLP (Multilayer Perceptron), or the like, using the output results of various sensors and the like as images.

Generation of the predicted image is described here with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of training of the predicted image model. In the training example 30 shown in FIG. 4, when latent variables (features) 32 are input to the generator 31, the generator 31 generates a fake image 33 and outputs it to the discriminator 34. The discriminator 34 outputs, to the sigmoid function 35, the probability that the fake image 33 is a real image. Real images 36 are also input to the discriminator 34, alternating with the fake images 33. The sigmoid function 35 outputs to the determiner 37 whether the fake image 33 is real (1) or fake (0). The determiner 37 determines whether the real (1) or fake (0) output it receives is correct, and trains the generator 31 and the discriminator 34 by error backpropagation.
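The description does not state which loss drives the backpropagation in FIG. 4. As a hedged sketch, the standard GAN formulation is assumed here: binary cross-entropy on the sigmoid output, with label 1 for real images and label 0 for fake ones. The function names are illustrative, and the inputs are the raw discriminator scores (logits) for one real and one fake image.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(p, label):
    """Binary cross-entropy of probability p against a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(real_logit, fake_logit):
    """The discriminator should score real images as 1 and fake images as 0."""
    return bce(sigmoid(real_logit), 1) + bce(sigmoid(fake_logit), 0)


def generator_loss(fake_logit):
    """The generator is rewarded when its fake image is scored as real (1)."""
    return bce(sigmoid(fake_logit), 1)
```

In an actual implementation these losses would be computed over batches by a deep learning framework, and their gradients would update the discriminator and the generator in alternation.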

Next, in the training example 30 shown in FIG. 4, the trained generator 31 is used as the image generation model 31a. That is, the image generation model 31a is a predicted image model obtained by machine learning with image data as the learning data. The image generation model 31a corresponds to the predicted image model C7 in FIG. 3; it takes as input the simulator output value 39 output from the prediction model 38, and generates and outputs the predicted image 40. The generated predicted image 40 can be used to control the prediction model 38, for example through learning by the reinforcement learner. The predicted image 40 may also be displayed so that the output result can be checked.

Returning to the description of FIG. 1: on receiving a simulation start instruction from the user, the execution unit 135 refers to the arrangement information storage unit 124 and executes the simulation based on the arrangement information. That is, the execution unit 135 executes reinforcement learning by the reinforcement learner and imitation learning by the imitation learner, and stores the first trained model and the second trained model in the trained model storage unit 125. When the arrangement information includes a predicted image model, the execution unit 135 outputs predicted image data to the output control unit 137 while the simulation is running. Specifically, the execution unit 135 outputs to the output control unit 137 the predicted image data corresponding to the predicted values of the prediction model under the control output of the reinforcement learner, that is, the predicted values of the prediction model based on the learning result of the reinforcement learner. When, for example, a predetermined number of simulations have completed, the execution unit 135 outputs a determination instruction to the determination unit 136. When the arrangement information includes a predicted image model, the execution unit 135 also outputs the predicted image data as of the completion of the simulation to the output control unit 137. The predetermined number of simulations may be, for example, a preset number or a number specified by the user.

When the arrangement information includes a predicted image model, the determination unit 136 receives from the output control unit 137 the evaluation accepted for the displayed predicted image data. When a determination instruction is input from the execution unit 135, the determination unit 136 determines whether an evaluation of the predicted image data has been accepted from the user. That is, when the arrangement information includes a predicted image model, the determination unit 136 determines whether the user has submitted an evaluation of the predicted image data that the output control unit 137 displayed on the display unit 111. When it determines that no evaluation of the predicted image data has been accepted, or when the arrangement information does not include a predicted image model, the determination unit 136 determines whether to regenerate the prediction model based on the learning result of the reinforcement learner.

For example, the determination unit 136 determines whether to regenerate the prediction model based on an evaluation value of the error between the predicted values of the prediction model based on the learning result of the reinforcement learner in the real environment and the predicted values of the prediction model based on the learning result of the reinforcement learner in the simulation. The root mean square error (RMSE) or the mean squared error (MSE) can be used as the evaluation value of the error.
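A minimal sketch of this error-based check follows. The MSE and RMSE definitions are the standard ones; the `threshold` parameter and the rule "regenerate when the RMSE exceeds the threshold" are assumptions made for illustration, since the description does not state how the evaluation value is turned into a decision.

```python
import math


def mse(real_env_preds, sim_preds):
    """Mean squared error between two equally long sequences."""
    if len(real_env_preds) != len(sim_preds) or not real_env_preds:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(r - s) ** 2 for r, s in zip(real_env_preds, sim_preds)]
    return sum(squared) / len(squared)


def rmse(real_env_preds, sim_preds):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(real_env_preds, sim_preds))


def should_regenerate_by_error(real_env_preds, sim_preds, threshold):
    """Hypothetical rule: regenerate when the RMSE exceeds the threshold."""
    return rmse(real_env_preds, sim_preds) > threshold
```

The RMSE is in the same unit as the predicted quantity, which makes a threshold easier to choose than for the MSE.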

As another example, the determination unit 136 determines whether to regenerate the prediction model based on the degree of correlation between the predicted values of the prediction model based on the learning result of the reinforcement learner in the simulation and the measured values. A correlation coefficient or a loss function, for example, may be used as the degree of correlation. In other words, when the result of reinforcement learning has not improved for a certain period, the determination unit 136 determines, based on the learning result, whether to generate a new prediction model.
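The correlation-based check can be sketched in the same way. The Pearson correlation coefficient is assumed here as the "correlation coefficient", and the `min_corr` cutoff of 0.9 is a hypothetical value; neither is specified in the description.

```python
import math


def pearson(predicted, measured):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


def should_regenerate_by_correlation(predicted, measured, min_corr=0.9):
    """Hypothetical rule: regenerate when the correlation falls below min_corr."""
    return pearson(predicted, measured) < min_corr
```

A high correlation only shows that the predictions track the shape of the measurements; a constant offset would go unnoticed, which is one reason an error metric may be used alongside it.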

On the other hand, when it determines that an evaluation of the predicted image data has been accepted, the determination unit 136 determines whether to regenerate the prediction model based on the accepted evaluation. The predicted image data can be evaluated, for example, by having a skilled worker assess a predicted image that corresponds to a real image of the inside of a furnace captured by thermography.

When it determines that the prediction model is to be regenerated, the determination unit 136 outputs a resetting instruction to the setting unit 132. The resetting instruction directs the setting unit 132 and the generation unit 133 to regenerate the prediction model. When it determines that the prediction model is not to be regenerated, the determination unit 136 ends the simulation process. That is, the determination unit 136 completes the construction of the simulation environment.

When predicted image data is input from the execution unit 135, the output control unit 137 outputs the predicted image data to the display unit 111 for display. The output control unit 137 may, for example, display the predicted image data successively while the simulation is running, or display side by side the predicted image data extracted at predetermined time intervals. The output control unit 137 may also display the predicted image data as of the completion of the simulation. Furthermore, on accepting an evaluation of the displayed predicted image data, the output control unit 137 outputs the accepted evaluation to the determination unit 136.

[Processing procedure of the simulation device]
Next, the operation of the simulation device 100 according to the first embodiment will be described. FIG. 5 is a flowchart showing an example of the simulation process in the first embodiment.

The first reception unit 131 accepts input of learning data, for example from another information processing device (step S1). The first reception unit 131 stores the accepted learning data in the learning data storage unit 121. The first reception unit 131 also accepts input of evaluation data, for example from another information processing device, and stores the accepted evaluation data in the evaluation data storage unit 123. On accepting from the user the start of settings for a prediction target, the first reception unit 131 outputs a setting instruction to the setting unit 132.

When a setting instruction or a resetting instruction is input, the setting unit 132 sets the prediction target (step S2). Having set the prediction target, the setting unit 132 executes the preprocessing corresponding to that prediction target on the learning data in the learning data storage unit 121 (step S3). When the preprocessing is complete, the setting unit 132 outputs a generation instruction to the generation unit 133.

When the generation instruction is input from the setting unit 132, the generation unit 133 reads the learning data from the learning data storage unit 121, performs machine learning, and generates a prediction model (step S4). The generation unit 133 stores the generated prediction model in the prediction model storage unit 122. The generation unit 133 then determines whether there is any prediction target for which a prediction model has not yet been generated (step S5). When it determines that such a prediction target remains (step S5: Yes), the generation unit 133 outputs a setting instruction to the setting unit 132 so that the remaining prediction targets are set, and the process returns to step S2. When it determines that no such prediction target remains (step S5: No), the generation unit 133 outputs a reception instruction to the second reception unit 134.
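The loop formed by steps S2 to S5 can be sketched as follows. The function `generate_models` and its `preprocess` and `train` callbacks are hypothetical stand-ins for the setting unit 132 and the generation unit 133; the description does not define their interfaces.

```python
def generate_models(targets, learning_data, preprocess, train):
    """Generate one prediction model per prediction target (steps S2 to S5)."""
    models = {}
    pending = list(targets)
    while pending:                                     # S5: any target still without a model?
        target = pending.pop(0)                        # S2: set the next prediction target
        prepared = preprocess(target, learning_data)   # S3: target-specific preprocessing
        models[target] = train(target, prepared)       # S4: machine learning, then store
    return models
```

For instance, `generate_models(["temp", "flow"], rows, preprocess, train)` would call `preprocess` and `train` once per target, in order, and return a dictionary keyed by target. The target names here are purely illustrative.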

When the reception instruction is input from the generation unit 133, the second reception unit 134 causes the display unit 111 to display the arrangement screen and accepts from the user the arrangement of each element in the simulation environment. The second reception unit 134 accepts the arrangement of the generated prediction models (step S6). The second reception unit 134 also accepts the arrangement of the evaluation data, the reinforcement learner, and the imitation learner (step S7). Furthermore, the second reception unit 134 accepts settings such as the control targets of the arranged reinforcement learner and imitation learner (step S8). On accepting completion of the arrangement from the user, the second reception unit 134 stores arrangement information, which includes the arrangement of each element and the connection information, in the arrangement information storage unit 124.

On receiving a simulation start instruction from the user, the execution unit 135 refers to the arrangement information storage unit 124, executes the simulation based on the arrangement information, and executes reinforcement learning (step S9). The execution unit 135 stores the first trained model of the reinforcement learner and the second trained model of the imitation learner in the trained model storage unit 125. When the arrangement information includes a predicted image model, the execution unit 135 outputs predicted image data to the output control unit 137 while the simulation is running. When the predicted image data is input from the execution unit 135, the output control unit 137 outputs it to the display unit 111 for display (step S10). On accepting an evaluation of the displayed predicted image data, the output control unit 137 outputs the accepted evaluation to the determination unit 136. When, for example, a predetermined number of simulations have completed, the execution unit 135 outputs a determination instruction to the determination unit 136.

When the determination instruction is input from the execution unit 135, the determination unit 136 determines whether an evaluation of the predicted image data has been accepted from the user (step S11). When it determines that no evaluation of the predicted image data has been accepted (step S11: No), or when the arrangement information does not include a predicted image model, the determination unit 136 sets the learning result of the reinforcement learner as the basis for determining whether to regenerate the prediction model (step S12). On the other hand, when it determines that an evaluation of the predicted image data has been accepted (step S11: Yes), the determination unit 136 sets the accepted evaluation as the basis for determining whether to regenerate the prediction model (step S13).

The determination unit 136 determines whether to regenerate the prediction model (step S14). When it determines that the prediction model is to be regenerated (step S14: Yes), the determination unit 136 outputs a resetting instruction to the setting unit 132, and the process returns to step S2. When it determines that the prediction model is not to be regenerated (step S14: No), the determination unit 136 ends the simulation process. In this way, the simulation device 100 can easily construct a highly accurate simulation environment. The simulation device 100 can also construct the simulation environment in less time than manual construction.
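The branch in steps S11 to S14 can be summarized in a short sketch. The function name, the encoding of the user's evaluation as `True` (acceptable), `False` (not acceptable), or `None` (no evaluation submitted), and the `learning_result_ok` callback are all assumptions made for illustration.

```python
from typing import Callable, Optional


def decide_regeneration(has_image_model: bool,
                        user_evaluation: Optional[bool],
                        learning_result_ok: Callable[[], bool]) -> bool:
    """Return True when the prediction model should be regenerated (S14: Yes)."""
    if has_image_model and user_evaluation is not None:  # S11: Yes
        return not user_evaluation                       # S13: decide from the evaluation
    # S11: No, or no predicted image model in the arrangement
    return not learning_result_ok()                      # S12: decide from the learning result
```

Here `learning_result_ok` could wrap an error or correlation check on the predicted values, and a `True` return corresponds to issuing the resetting instruction and going back to step S2.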

In the first embodiment above, a predicted image model that generates predicted image data was given as an example of a prediction model, but a predicted voice model that generates predicted voice data may be used in place of the predicted image data. The predicted voice model can be arranged in the same manner as the other prediction models, for example on the arrangement screen shown in FIG. 3.

[Effects of the first embodiment]
As described above, the simulation device 100 accepts input of learning data. In the simulation device 100, the generation unit performs learning using the accepted learning data and generates a prediction model. The simulation device 100 accepts the arrangement of one or more of the following: evaluation data used for the simulation, the generated prediction model, a reinforcement learner that performs reinforcement learning in the simulation, and an imitation learner that performs imitation learning in the simulation. Based on the state of the accepted arrangement, the simulation device 100 executes a simulation using the evaluation data, the prediction model, the reinforcement learner, and the imitation learner. The simulation device 100 determines whether to regenerate the prediction model based on the learning result of the reinforcement learner in the simulation. When it determines that the prediction model is to be regenerated, the simulation device 100 instructs the generation unit to regenerate the prediction model. As a result, the simulation device 100 can easily construct a highly accurate simulation environment. Moreover, because the simulation device 100 can automate everything from the construction of the simulation environment to the execution of reinforcement learning, it can construct a highly accurate simulation environment in less time than manual construction. The simulation environment constructed in this way is also called a digital twin environment.

The simulation device 100 also determines whether to regenerate the prediction model based on an evaluation value of the error between the predicted values of the prediction model based on the learning result of the reinforcement learner in the real environment and the predicted values of the prediction model based on the learning result of the reinforcement learner in the simulation. As a result, the simulation device 100 can improve the accuracy of the simulation environment.

The simulation device 100 also determines whether to regenerate the prediction model based on the degree of correlation between the predicted values of the prediction model based on the learning result of the reinforcement learner in the simulation and the measured values. As a result, the simulation device 100 can improve the accuracy of the simulation environment.

The learning data may include image data. In that case, the simulation device 100 additionally generates, as a prediction model, a predicted image model that generates predicted image data based on the image data. The simulation device 100 accepts the arrangement of the predicted image model and executes a simulation that includes the predicted image model. The simulation device 100 further outputs the predicted image data corresponding to the predicted values of the prediction model under the control output of the reinforcement learner in the simulation, that is, the predicted values of the prediction model based on the learning result of the reinforcement learner. As a result, the simulation device 100 can use predicted images to present the state of the simulation in an easily understandable way.

The learning data may include voice data. In that case, the simulation device 100 additionally generates, as a prediction model, a predicted voice model that generates predicted voice data based on the voice data. The simulation device 100 accepts the arrangement of the predicted voice model and executes a simulation that includes the predicted voice model. The simulation device 100 further outputs the predicted voice data corresponding to the predicted values of the prediction model under the control output of the reinforcement learner in the simulation, that is, the predicted values of the prediction model based on the learning result of the reinforcement learner. As a result, the simulation device 100 can use predicted voice to present the state of the simulation in an easily understandable way.

The simulation device 100 further accepts an evaluation of the output predicted image data or predicted voice data and, based on the accepted evaluation, determines whether to regenerate the prediction model. As a result, the simulation device 100 can reflect the experience of skilled workers in the simulation.

In the first embodiment above, machine learning using a CNN was performed to generate the prediction model, but the embodiment is not limited to this. For example, machine learning using an RNN (Recurrent Neural Network), an SVM (Support Vector Machine), or the like may be performed.

Also, in the first embodiment above, the prediction model was generated by machine learning on the learning data and reinforcement learning by the reinforcement learner was performed afterwards, but the embodiment is not limited to this. For example, the simulation device 100 may generate a prediction model by machine learning on the learning data and then determine whether to regenerate the prediction model based on the result of executing a simulation with the generated prediction model (that is, the result obtained without reinforcement learning). As another example, when the simulation device 100 has acquired a prediction model from another information processing device, it may skip machine learning of the prediction model and determine whether to regenerate the prediction model based on the result of executing reinforcement learning in the simulation.

[System configuration, etc.]
Each component of each illustrated device is functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to that illustrated; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU or GPU and a program analyzed and executed by that CPU or GPU, or may be realized as hardware based on wired logic.

Of the processes described in the above embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and in the drawings may be changed arbitrarily unless otherwise specified.

[Program]
A program may also be created in which the processing executed by the simulation device described in the above embodiment is written in a computer-executable language. For example, a simulation program may be created in which the processing executed by the simulation device 100 according to the embodiment is written in a computer-executable language. In this case, the same effects as in the above embodiment can be obtained by having a computer execute the simulation program. Furthermore, the same processing as in the above embodiment may be realized by recording the simulation program on a computer-readable recording medium and having a computer read and execute the simulation program recorded on that medium.

FIG. 6 is a diagram showing an example of a computer that executes the simulation program. As illustrated in FIG. 6, the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070, and these components are connected by a bus 1080.

As illustrated in FIG. 6, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

As illustrated in FIG. 6, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the simulation program described above is stored, for example in the hard disk drive 1090, as a program module in which the instructions to be executed by the computer 1000 are written.

The various data described in the above embodiment are stored as program data, for example in the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as needed, and executes the various processing procedures.

The program module 1093 and the program data 1094 of the simulation program need not be stored in the hard disk drive 1090; they may, for example, be stored on a removable storage medium and read by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and the program data 1094 of the simulation program may be stored in another computer connected via a network (a LAN, a WAN (Wide Area Network), or the like) and read by the CPU 1020 via the network interface 1070.

The above embodiments and their modifications are included in the technology disclosed in the present application and, likewise, in the invention described in the claims and its equivalents.

100 Simulation device
110 Communication unit
111 Display unit
112 Operation unit
120 Storage unit
121 Learning data storage unit
122 Prediction model storage unit
123 Evaluation data storage unit
124 Arrangement information storage unit
125 Learned model storage unit
130 Control unit
131 First reception unit
132 Setting unit
133 Generation unit
134 Second reception unit
135 Execution unit
136 Determination unit
137 Output control unit

Claims

A simulation device comprising:
a first reception unit that accepts input of learning data;
a generation unit that performs learning using the accepted learning data and generates a prediction model;
a second reception unit that accepts an arrangement of evaluation data used for a simulation, the generated prediction model, and a reinforcement learner that performs reinforcement learning in the simulation;
an execution unit that executes the simulation using the evaluation data, the prediction model, and the reinforcement learner based on the state of the accepted arrangement; and
a determination unit that determines, based on a learning result of the reinforcement learner in the simulation, whether or not to regenerate the prediction model and, when it determines to regenerate the prediction model, instructs the generation unit to regenerate the prediction model.
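
For illustration only, the interaction among the generation unit, the execution unit, and the determination unit recited above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `run_simulation_cycle`, the callables `generate`, `execute`, and `needs_regeneration`, the round counter passed to `generate`, and the `max_rounds` limit are all hypothetical and do not appear in the specification.

```python
from typing import Any, Callable, Tuple


def run_simulation_cycle(
    learning_data: Any,
    evaluation_data: Any,
    generate: Callable[[Any, int], Any],       # hypothetical stand-in for the generation unit
    execute: Callable[[Any, Any], Any],        # hypothetical stand-in for the execution unit
    needs_regeneration: Callable[[Any], bool], # hypothetical stand-in for the determination unit
    max_rounds: int = 3,                       # assumed safety limit, not from the source
) -> Tuple[Any, Any, int]:
    """Generate a prediction model, run the simulation, and regenerate
    the model while the determination step asks for it."""
    for round_number in range(1, max_rounds + 1):
        # Learn from the accepted learning data and build a prediction model.
        model = generate(learning_data, round_number)
        # Run the simulation (including reinforcement learning) with the model.
        result = execute(evaluation_data, model)
        # Decide from the learning result whether the model must be rebuilt.
        if not needs_regeneration(result):
            return model, result, round_number
    raise RuntimeError("no acceptable prediction model within max_rounds")
```

Passing the three stages in as callables keeps the sketch independent of any particular model type or reinforcement learning algorithm, neither of which is fixed by the claim.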

The simulation device according to claim 1, wherein
the determination unit determines whether or not to regenerate the prediction model based on an evaluation value of the error between a predicted value of the prediction model based on the learning result of the reinforcement learner in the real environment and a predicted value of the prediction model based on the learning result of the reinforcement learner in the simulation.

The simulation device according to claim 1, wherein
the determination unit determines whether or not to regenerate the prediction model based on the degree of correlation between a predicted value of the prediction model based on the learning result of the reinforcement learner in the simulation and a measured value.
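
As a hedged illustration of the two determination criteria recited in the preceding two claims, the sketch below computes an error evaluation value and a degree of correlation and compares each with a threshold. The choice of root-mean-square error as the evaluation value, the Pearson coefficient as the degree of correlation, the threshold defaults, and all function names are assumptions made for this example; the claims do not specify them.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square error between two equally long series (assumed metric)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed measure of correlation)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # treat a constant series as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def regenerate_by_error(real_pred, sim_pred, max_rmse: float = 1.0) -> bool:
    """Regenerate when real-environment and simulation predictions diverge."""
    return rmse(real_pred, sim_pred) > max_rmse


def regenerate_by_correlation(sim_pred, measured, min_corr: float = 0.9) -> bool:
    """Regenerate when simulation predictions track measured values poorly."""
    return pearson(sim_pred, measured) < min_corr
```

Either function could serve as the determination step; a large error or a weak correlation both signal that the prediction model no longer represents the environment well enough for the reinforcement learner's results to be trusted.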

The simulation device according to any one of claims 1 to 3, wherein
the learning data includes image data,
the generation unit further generates, as the prediction model, a predicted image model that generates predicted image data based on the image data,
the second reception unit accepts an arrangement of the predicted image model,
the execution unit executes the simulation including the predicted image model, and
the simulation device further comprises an output control unit that outputs the predicted image data corresponding to the predicted value of the prediction model based on the learning result of the reinforcement learner in the simulation.

The simulation device according to any one of claims 1 to 4, wherein
the learning data includes voice data,
the generation unit further generates, as the prediction model, a predicted voice model that generates predicted voice data based on the voice data,
the second reception unit accepts an arrangement of the predicted voice model,
the execution unit executes the simulation including the predicted voice model, and
the simulation device further comprises an output control unit that outputs the predicted voice data corresponding to the predicted value of the prediction model based on the learning result of the reinforcement learner in the simulation.

The simulation device according to claim 4 or 5, wherein
the determination unit further accepts an evaluation of the predicted image data or the predicted voice data output by the output control unit, and determines whether or not to regenerate the prediction model based on the accepted evaluation.

A simulation method executed by a simulation device, the method comprising:
a first reception step of accepting input of learning data;
a generation step of performing learning using the accepted learning data and generating a prediction model;
a second reception step of accepting an arrangement of evaluation data used for a simulation, the generated prediction model, and a reinforcement learner that performs reinforcement learning in the simulation;
an execution step of executing the simulation using the evaluation data, the prediction model, and the reinforcement learner based on the state of the accepted arrangement; and
a determination step of determining, based on a learning result of the reinforcement learner in the simulation, whether or not to regenerate the prediction model and, when it is determined to regenerate the prediction model, instructing regeneration of the prediction model by the generation step.

A simulation program that causes a computer to execute:
a first reception step of accepting input of learning data;
a generation step of performing learning using the accepted learning data and generating a prediction model;
a second reception step of accepting an arrangement of evaluation data used for a simulation, the generated prediction model, and a reinforcement learner that performs reinforcement learning in the simulation;
an execution step of executing the simulation using the evaluation data, the prediction model, and the reinforcement learner based on the state of the accepted arrangement; and
a determination step of determining, based on a learning result of the reinforcement learner in the simulation, whether or not to regenerate the prediction model and, when it is determined to regenerate the prediction model, instructing regeneration of the prediction model by the generation step.