JP7726730B2

JP7726730B2 - Computer system and model learning method

Info

Publication number: JP7726730B2
Application number: JP2021168780A
Authority: JP
Inventors: 佳一郎西; 信昭中須
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2025-08-20
Anticipated expiration: 2041-10-14
Also published as: JP2023058949A

Description

The present invention relates to the learning of models for controlling robots that perform tasks including grasping and moving objects.

With advances in AI technology, systems have appeared that use AI to automatically control robots performing tasks that include grasping and moving objects. Such AI is implemented by models generated through machine learning.

Training a model requires a large amount of training data. Acquiring training data from a real robot poses the following problems. First, acquiring the data takes time, which lengthens training. Second, the robot must be operated for long periods and may be forced to perform unreasonable movements, which can cause it to break down.

The technique described in Patent Document 1 is known as an approach to these problems. Patent Document 1 states: "First, a bulk-loaded (randomly piled) state of a plurality of virtual objects is generated using a virtual environment (101). Next, holding positions at which a virtual robot device holds the virtual objects are generated (102). The virtual robot device is caused to execute an operation of picking up the bulk-loaded virtual objects (104). By machine learning on the results of having the virtual robot device execute the pick-up operation, a trained model is generated that can output, from a specific position and orientation of an object, the priorities of a plurality of possible holding positions (200). The trained model is then used to have an actual robot manipulator pick up objects, and relearning is performed to update the priorities of the plurality of holding positions of the trained model by machine learning on the results of the pick-up operation (203)."

Patent Document 1: Japanese Patent Application Laid-Open No. 2021-13996

Because the technique of Patent Document 1 does not take trajectories into account, it cannot be applied to generating models for controlling robots that perform tasks including grasping and moving objects. An object of the present invention is to provide a system and method that generate highly accurate models robust to various trajectories while keeping learning time low.

A representative example of the invention disclosed in the present application is as follows. A computer system learns a model for controlling a robot that performs a task including grasping and moving an object. The computer system includes at least one computer having an arithmetic unit, a storage device connected to the arithmetic unit, and an interface connected to the arithmetic unit. The at least one computer executes a first learning process including: a first process of receiving input of information related to the task and of an initial model; a second process of setting, based on the information related to the task, a virtual environment composed of a virtual workspace, virtual equipment, and a virtual robot, and setting a first trajectory for the virtual environment; and a third process of performing, using the initial model, a simulation of the control processing of the task, which includes movement of an object by the virtual robot along the first trajectory in the virtual environment, and updating the initial model, based on a machine learning algorithm that uses the simulation results, so that the task time becomes shorter and the task success probability becomes higher. In the second process, the at least one computer sets the virtual environment randomly and sets the first trajectory randomly. In the first learning process, the at least one computer repeatedly executes the second process and the third process until a learning termination condition is satisfied, and records the initial model at the time the termination condition is satisfied as a first model.
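The first learning process above is essentially a loop: draw a random environment and trajectory, simulate the task with the current model, update the model, and stop when a termination condition holds. The following is a minimal sketch of that loop, not the patented implementation. The reward formula, the termination condition (success rate over a sliding window), and every name (`first_learning_process`, `EpisodeResult`, `reward`, and the three injected callables) are hypothetical; the source only says the model is updated so that task time is short and success probability is high.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class EpisodeResult:
    task_time: float  # simulated task time (hypothetical unit: seconds)
    success: bool     # whether the simulated grasp-and-move task succeeded


def reward(result: EpisodeResult, time_weight: float = 0.1) -> float:
    """Hypothetical reward: +1 for success, minus a penalty proportional to task time."""
    return (1.0 if result.success else 0.0) - time_weight * result.task_time


def first_learning_process(model: Any,
                           set_random_environment: Callable[[], Tuple[Any, Any]],
                           simulate: Callable[[Any, Any, Any], EpisodeResult],
                           update: Callable[[Any, EpisodeResult, float], Any],
                           max_iterations: int = 1000,
                           target_success: float = 0.95,
                           window: int = 50) -> Any:
    """Repeat the second and third processes until a termination condition is met."""
    history = []
    for _ in range(max_iterations):
        env, trajectory = set_random_environment()     # second process: random env and trajectory
        result = simulate(model, env, trajectory)      # third process: simulate with current model
        model = update(model, result, reward(result))  # third process: update the model
        history.append(result.success)
        recent = history[-window:]
        # Hypothetical termination condition: success rate over the last `window` episodes.
        if len(recent) == window and sum(recent) / window >= target_success:
            break
    return model  # recorded as the first model
```

Injecting the environment setter, simulator, and update rule as callables keeps the loop independent of any particular simulator or learning algorithm; `max_iterations` guarantees the loop ends even if the success target is never reached.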

According to the present invention, highly accurate models that are robust to various trajectories can be generated while keeping learning time low. Problems, configurations, and effects other than those described above will become apparent from the following description of embodiments.

FIG. 1 is a diagram illustrating an example of the configuration of a system according to Embodiment 1.
FIG. 2 is a diagram illustrating an example of the data structure of robot configuration information according to Embodiment 1.
FIG. 3 is a diagram illustrating an example of the data structure of equipment configuration information according to Embodiment 1.
FIG. 4 is a diagram illustrating an example of the data structure of trajectory information according to Embodiment 1.
FIG. 5 is a diagram illustrating an example of the data structure of environment adjustment information according to Embodiment 1.
FIG. 6 is a diagram illustrating an example of the data structure of model management information according to Embodiment 1.
FIG. 7 is a flowchart illustrating an example of a learning process executed by a computer according to Embodiment 1.
FIG. 8 is a flowchart illustrating an example of a learning process executed by a computer according to Embodiment 1.
FIG. 9 is a flowchart illustrating an example of a learning process executed by a computer according to Embodiment 1.
FIG. 10 is a diagram illustrating an example of a screen displayed by a computer according to Embodiment 1.
FIG. 11A is a diagram illustrating a specific example of processing executed by an environment setting unit according to Embodiment 1.
FIG. 11B is a diagram illustrating a specific example of processing executed by an environment setting unit according to Embodiment 1.
FIG. 11C is a diagram illustrating a specific example of processing executed by an environment setting unit according to Embodiment 1.
FIG. 12 is a diagram illustrating an example of a screen displayed by a computer according to Embodiment 1.

Embodiments of the present invention are described below with reference to the drawings. However, the present invention should not be construed as being limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the spirit or gist of the present invention.

In the configurations of the invention described below, identical or similar components or functions are denoted by the same reference numerals, and redundant descriptions are omitted.

Designations such as "first," "second," and "third" in this specification and elsewhere are used to identify components and do not necessarily limit their number or order.

To facilitate understanding of the invention, the position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like. Accordingly, the present invention is not limited to the position, size, shape, range, and the like disclosed in the drawings.

FIG. 1 is a diagram illustrating an example of the configuration of the system of Embodiment 1.

The system consists of a robot 100 and a computer 101, which are connected directly or via a network.

Based on control information output from the computer 101, the robot 100 performs a task that includes grasping an object (workpiece) and moving it from a start point to an end point. The robot 100 includes a working device group 110, a controller 111, and a measuring device 112.

The working device group 110 is a group of devices, such as a hand, links, and drive motors, that grasp and move objects.

The controller 111 controls the working device group 110 based on the control information received from the computer 101. For example, the controller 111 moves the hand by driving, in accordance with the control information, the drive motors that connect the links and function as joints. The controller 111 outputs to the computer 101 operating state information, including the angle, angular velocity, and angular acceleration of each joint and the torque and current value of each drive motor.

The measuring device 112 measures values for determining the state of the object being worked on by the robot 100. The values output from the measuring device 112 are also referred to as work state information. The measuring device 112 is, for example, an acceleration sensor, a force sensor, a camera, a contact sensor, or a current sensor. The robot 100 may include a plurality of measuring devices 112 of different types for different measurement targets. The present invention is not limited to any particular installation position or number of measuring devices 112.

The robot 100 may also combine the operating state information and the work state information into a single piece of information and transmit it.

The computer 101 generates information on a trajectory, which is the movement path of the robot 100 (trajectory information), and generates control information based on the trajectory information, the operating state information, and the work state information. The control information includes, for example, the following values:
(1) The next target angle, angular velocity, and angular acceleration of each control axis
(2) The torque and drive current of each drive motor
(3) The target coordinates, movement speed, and acceleration of the object
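The three groups of values listed above can be modeled as a small record type. This is a minimal sketch, assuming one target per control axis and one command per drive motor; the class names, field names, and the `to_vector` flattening (for instance, to match a neural network's output layer) are hypothetical and not taken from the source, which does not specify units or encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AxisTarget:
    angle: float                 # next target angle of the control axis
    angular_velocity: float
    angular_acceleration: float


@dataclass
class MotorCommand:
    torque: float                # drive-motor torque
    current: float               # drive current


@dataclass
class ObjectTarget:
    coordinates: Tuple[float, float, float]  # target coordinates of the object
    speed: float                              # movement speed
    acceleration: float


@dataclass
class ControlInformation:
    axes: List[AxisTarget]
    motors: List[MotorCommand]
    obj: ObjectTarget

    def to_vector(self) -> List[float]:
        """Flatten into one list, e.g. to match the output layer of a neural network."""
        vec: List[float] = []
        for a in self.axes:
            vec += [a.angle, a.angular_velocity, a.angular_acceleration]
        for m in self.motors:
            vec += [m.torque, m.current]
        vec += list(self.obj.coordinates) + [self.obj.speed, self.obj.acceleration]
        return vec
```

For a two-axis arm with two motors, `to_vector` yields 3 × 2 + 2 × 2 + 5 = 15 values.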

The computer 101 includes an arithmetic unit 120, a storage device 121, a communication device 122, an input device 123, and an output device 124. These hardware elements are connected via an internal bus.

The storage device 121, such as a memory, stores programs executed by the arithmetic unit 120 and the information they use. The storage device 121 stores robot configuration information 140, equipment configuration information 141, trajectory information 142, environment adjustment information 143, and model management information 144. The storage device 121 is also used as a work area.

The robot configuration information 140 is information about the configuration of the robot 100. Its data structure is described with reference to FIG. 2.

The equipment configuration information 141 is information about the equipment in which the robot 100 performs work. Its data structure is described with reference to FIG. 3.

The trajectory information 142 is information about trajectories, i.e., movement paths of the robot 100. Its data structure is described with reference to FIG. 4.

The environment adjustment information 143 is information for adjusting the virtual environment in which the task is simulated. Its data structure is described with reference to FIG. 5.

The model management information 144 is information for managing the models that generate control information. Here, a model is a function, a table, a neural network, or the like; in this embodiment, the model is a neural network. The data structure of the model management information 144 is described with reference to FIG. 6.

The arithmetic unit 120, such as a processor, controls the entire computer 101 and executes the programs stored in the storage device 121. By executing processing in accordance with a program, the arithmetic unit 120 operates as a functional unit (module) that realizes a specific function. In the following description, when a functional unit is the subject of a sentence describing processing, this means that the arithmetic unit 120 is executing the program that realizes that functional unit. In this embodiment, the arithmetic unit 120 functions as a trajectory information generation unit 130, an environment setting unit 131, a learning unit 132, a simulator 133, a model integration unit 134, and a control information generation unit 135.

The trajectory information generation unit 130 determines the trajectory of the hand based on input information such as the task content and the position of the object. It also determines the trajectory of the tool based on the robot configuration information 140, the equipment configuration information 141, and the hand trajectory, and generates trajectory information 142 indicating the determined trajectories. Since methods of determining trajectories are well known, a detailed description is omitted. The environment setting unit 131 sets the virtual environment. The learning unit 132 executes the learning processes. The simulator 133 simulates the motion and the like of the robot 100 in the virtual environment. The model integration unit 134 integrates a plurality of models. The control information generation unit 135 controls the robot 100 performing the task by generating control information based on the trajectory information 142, the operating state information, the work state information, and a model.

Among the functional units of the computer 101, several functional units may be merged into one, or a single functional unit may be divided by function into several. For example, the learning unit 132 may include the functions of the environment setting unit 131.

The communication device 122 communicates with external devices and is, for example, a NIC (Network Interface Card).

The input device 123 is used to input data, commands, and the like into the computer 101, and is, for example, a keyboard, a mouse, or a touch panel.

The output device 124 outputs the computation results and the like of the computer 101, and is, for example, a display, a projector, or a printer.

FIG. 2 is a diagram illustrating an example of the data structure of the robot configuration information 140 of Embodiment 1.

The robot configuration information 140 stores entries each including an ID 201, a classification 202, an item 203, and a content 204.

The ID 201 is a field storing identification information of the entry. The classification 202 is a field storing the classification of an element constituting the robot 100. The item 203 is a field storing a management item of the element. The content 204 is a field storing the content of the management item, such as a file, a numerical value, or a character string.

For a link, the shape of the link is managed. For a joint that connects links, the connected links, the joint type, and the constraints on the joint's motion are managed. The items managed as motion constraints differ depending on the joint type.
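Because the items managed as motion constraints depend on the joint type, a consumer of the robot configuration information typically needs to group entries by element and check which constraint items are present. The sketch below assumes classification values such as "Link1" or "Joint1", a "type" item for joints, and URDF-like joint type names with the constraint items in `CONSTRAINT_ITEMS`; all of these names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Content = Union[str, float]


@dataclass
class RobotConfigEntry:
    entry_id: int         # ID 201
    classification: str   # classification 202, e.g. "Link1" or "Joint1" (hypothetical names)
    item: str             # item 203: the management item
    content: Content      # content 204: file path, number, or string


# Hypothetical: constraint items required per joint type (URDF-like type names).
CONSTRAINT_ITEMS = {
    "revolute": ("lower_limit", "upper_limit", "max_velocity"),
    "prismatic": ("lower_limit", "upper_limit", "max_velocity"),
    "fixed": (),
}


def group_by_element(entries: List[RobotConfigEntry]) -> Dict[str, Dict[str, Content]]:
    """Collect entries into {classification: {item: content}}."""
    grouped: Dict[str, Dict[str, Content]] = {}
    for e in entries:
        grouped.setdefault(e.classification, {})[e.item] = e.content
    return grouped


def missing_constraints(joint_items: Dict[str, Content]) -> List[str]:
    """List constraint items that the joint's type requires but its entries lack."""
    required = CONSTRAINT_ITEMS.get(str(joint_items.get("type")), ())
    return [name for name in required if name not in joint_items]
```

A check like `missing_constraints` could flag incomplete joint definitions before the virtual robot is built.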

FIG. 3 is a diagram illustrating an example of the data structure of the equipment configuration information 141 of Embodiment 1.

The equipment configuration information 141 stores entries each including an ID 301, an equipment name 302, an attachment target 303, a relative position 304, and a relative orientation 305.

The ID 301 is a field storing identification information of the entry. The equipment name 302 is a field storing the name of the equipment. The attachment target 303 is a field storing the name of the object to which the equipment is attached. The relative position 304 is a field storing information (e.g., coordinates) indicating the installation position relative to the attachment target. The relative orientation 305 is a field storing information (e.g., coordinates) indicating the orientation relative to the attachment target.

The entry whose ID 301 is "0" indicates that the positions and orientations of the other equipment are determined with assembly cell A as the reference. The entry whose ID 301 is "1" stores the relative position and relative orientation of RoboA, which is installed with respect to assembly cell A.
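Since each entry is positioned relative to its attachment target, building the virtual environment requires composing relative poses along the attachment chain up to the reference equipment (assembly cell A in the example). Below is a minimal sketch, assuming the relative orientation is stored as a unit quaternion in (w, x, y, z) order and that `attach_to` is `None` for the reference equipment; the source does not fix the orientation encoding, and the names `EquipmentEntry` and `world_pose` are hypothetical. The composition follows the standard rule p = p_parent + R(q_parent) p_rel and q = q_parent ⊗ q_rel.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


@dataclass
class EquipmentEntry:
    entry_id: int
    name: str
    attach_to: Optional[str]  # None for the reference equipment (e.g. assembly cell A)
    rel_pos: Vec3
    rel_rot: Quat


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q (q * v * conj(q))."""
    w, x, y, z = q
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return (rx, ry, rz)


def world_pose(name: str, table: Dict[str, EquipmentEntry]) -> Tuple[Vec3, Quat]:
    """Compose relative poses along the attachment chain up to the reference equipment."""
    e = table[name]
    if e.attach_to is None:
        return e.rel_pos, e.rel_rot
    parent_pos, parent_rot = world_pose(e.attach_to, table)
    offset = rotate(parent_rot, e.rel_pos)
    pos = tuple(p + o for p, o in zip(parent_pos, offset))
    return pos, quat_mul(parent_rot, e.rel_rot)
```

For example, if assembly cell A is rotated 90° about z and RoboA sits 1 unit along the cell's x axis, RoboA ends up at (0, 1, 0) in world coordinates.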

FIG. 4 is a diagram illustrating an example of the data structure of the trajectory information 142 of Embodiment 1.

The trajectory information 142 stores a table 400 indicating the hand trajectory and a table 410 indicating the tool trajectory.

The table 400 stores entries each including a waypoint 401, a position 402, an orientation 403, and a Time 404.

The waypoint 401 is a field storing identification information of a waypoint on the trajectory. The position 402 is a group of fields storing values indicating the coordinates of the hand at the waypoint; each field stores a value in a Cartesian coordinate system. The orientation 403 is a group of fields storing values indicating the orientation of the hand at the waypoint; each field stores a component of a quaternion. The Time 404 is a field storing the time at which the hand, having started moving from the start point, reaches the waypoint.

The table 410 stores entries each including a waypoint 411, a posture 412, and a Time 413.

The waypoint 411 is the same field as the waypoint 401. The posture 412 is a group of fields storing values indicating the posture of each joint at the waypoint; each field stores a joint angle. The Time 413 is the same field as the Time 404.
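A row of table 400 pairs a Cartesian position and a quaternion orientation with an arrival time, so two natural operations are validating the table and querying the hand position at an arbitrary time. This is a minimal sketch, assuming (w, x, y, z) quaternion order and linear interpolation between waypoints; the source specifies neither, and the names `HandWaypoint`, `validate`, and `position_at` are hypothetical. Orientation interpolation (for example, slerp) is deliberately left out.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HandWaypoint:
    waypoint: int                                   # waypoint 401
    position: Tuple[float, float, float]            # position 402 (Cartesian)
    orientation: Tuple[float, float, float, float]  # orientation 403, quaternion (w, x, y, z) assumed
    time: float                                     # Time 404: time since leaving the start point


def validate(table: List[HandWaypoint], tol: float = 1e-6) -> None:
    """Check that Time strictly increases and each orientation is a unit quaternion."""
    for prev, cur in zip(table, table[1:]):
        if cur.time <= prev.time:
            raise ValueError(f"Time must increase at waypoint {cur.waypoint}")
    for wp in table:
        norm = math.sqrt(sum(c * c for c in wp.orientation))
        if abs(norm - 1.0) > tol:
            raise ValueError(f"orientation of waypoint {wp.waypoint} is not a unit quaternion")


def position_at(table: List[HandWaypoint], t: float) -> Tuple[float, ...]:
    """Linearly interpolate the hand position at time t, clamped to the trajectory ends."""
    times = [wp.time for wp in table]
    if t <= times[0]:
        return table[0].position
    if t >= times[-1]:
        return table[-1].position
    i = bisect.bisect_right(times, t)  # times[i - 1] <= t < times[i]
    a, b = table[i - 1], table[i]
    s = (t - a.time) / (b.time - a.time)
    return tuple(pa + s * (pb - pa) for pa, pb in zip(a.position, b.position))
```

With waypoints (0, 0, 0) at t = 0 and (1, 2, 0) at t = 2, `position_at(table, 1.0)` returns (0.5, 1.0, 0.0).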

The trajectory information 142 includes a pair of tables 400 and 410 for each task. If the main body of the robot 100 itself moves, the trajectory information 142 may also include a table for the trajectory of the robot 100 main body.

FIG. 5 is a diagram illustrating an example of the data structure of the environment adjustment information 143 of Embodiment 1.

The environment adjustment information 143 stores entries (adjustment data) each including an ID 501, an adjustment target 502, and an adjustment item 503.

The ID 501 is a field storing identification information of the entry. The adjustment target 502 is a field storing identification information of an object to be adjusted when setting the environment, such as an assembly cell, a workpiece, or an obstacle. The adjustment item 503 is a field storing the items to be adjusted and their adjustment ranges.
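An entry of adjustment data pairs an adjustment target with a set of items and their ranges. The sketch below models one entry and draws a concrete value for each item, assuming each range is a numeric (min, max) pair sampled uniformly. The source does not say how values within a range are chosen, and `AdjustmentData` and its fields are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AdjustmentData:
    entry_id: int                          # ID 501
    target: str                            # adjustment target 502, e.g. "workpiece" or "obstacle"
    items: Dict[str, Tuple[float, float]]  # adjustment item 503: item -> (min, max) range

    def __post_init__(self) -> None:
        for item, (lo, hi) in self.items.items():
            if lo > hi:
                raise ValueError(f"invalid range for {item}: ({lo}, {hi})")

    def sample(self, rng: random.Random) -> Dict[str, float]:
        """Draw one value per item, uniformly within its adjustment range."""
        return {item: rng.uniform(lo, hi) for item, (lo, hi) in self.items.items()}
```

Passing an explicit `random.Random` instance makes each drawn environment reproducible from a seed, which helps when a failing simulation episode needs to be replayed.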

FIG. 6 is a diagram illustrating an example of the data structure of the model management information 144 of Embodiment 1.

The model management information 144 stores entries (model data) each including an ID 601, a model 602, a task 603, a learning time (simulation) 604, a learning time (actual machine) 605, a task time 606, and a success probability 607.

The ID 601 is a field storing identification information of the entry. The model 602 is a field storing the model itself (its data); the path to the model's storage location may be stored instead. The task 603 is a field storing the type of task. In this embodiment, one or more models are generated for each task. The learning time (simulation) 604 is a field storing the processing time of the learning process using simulation. The learning time (actual machine) 605 is a field storing the processing time of the learning process using the actual robot 100. The task time 606 is a field storing the time the robot 100 requires to complete the task. The success probability 607 is a field storing the probability that the robot 100 succeeds in the task.
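Because several models can exist per task, a helper that ranks them by the stored metrics is a natural companion to this table, for instance to sort the candidates shown in a model selection list. The sketch below assumes "best" means highest success probability with shorter task time as the tie-breaker. The source leaves model selection to the user, so this criterion, `ModelRecord`, and `best_model_for` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    model_id: int               # ID 601
    model_path: str             # model 602 (here: the storage path variant)
    task: str                   # task 603
    sim_learning_time: float    # learning time (simulation) 604
    real_learning_time: float   # learning time (actual machine) 605
    task_time: float            # task time 606
    success_probability: float  # success probability 607


def best_model_for(records: List[ModelRecord], task: str) -> Optional[ModelRecord]:
    """Hypothetical ranking: highest success probability, then shortest task time."""
    candidates = [r for r in records if r.task == task]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.success_probability, -r.task_time))
```

Ranking by a (success probability, negative task time) tuple mirrors the two objectives the learning process optimizes, with success taking priority.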

In this embodiment, the computer 101 trains models according to the following procedure.

(Learning process 1) The computer 101 trains a model (primary model) using the initial model and the simulator 133.

(Learning process 2) The computer 101 applies the model generated by learning with the simulator 133 to the actual robot 100, and trains a model (secondary model) by actually operating the robot 100.

(Learning process 3) The computer 101 integrates the primary model and the secondary model, and trains a model (primary model) using the integrated model and the simulator 133.

The computer 101 repeatedly executes (Learning process 2) and (Learning process 3) in response to requests from the user.
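The three learning processes form an outer loop around simulation and real-robot training. The following minimal sketch shows only that control flow. The training, real-robot, and integration steps are injected as placeholder callables because the source does not define how models are integrated, and `train_pipeline` and `user_requests_more` are hypothetical names.

```python
from typing import Any, Callable, List, Tuple


def train_pipeline(initial_model: Any,
                   learn_in_sim: Callable[[Any], Any],
                   learn_on_robot: Callable[[Any], Any],
                   integrate: Callable[[Any, Any], Any],
                   user_requests_more: Callable[[int], bool]) -> Tuple[Any, List[Any]]:
    """Learning process 1, then repeat learning processes 2 and 3 while the user asks."""
    primary = learn_in_sim(initial_model)          # learning process 1 (simulation)
    history = [primary]
    round_no = 0
    while user_requests_more(round_no):
        secondary = learn_on_robot(primary)        # learning process 2 (actual robot)
        merged = integrate(primary, secondary)     # learning process 3: integrate models...
        primary = learn_in_sim(merged)             # ...then retrain in simulation
        history.append(primary)
        round_no += 1
    return primary, history
```

Keeping every primary model in `history` corresponds loosely to recording each result as a separate entry in the model management information, so earlier models remain selectable.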

Each learning process is described below with reference to flowcharts. FIGS. 7, 8, and 9 are flowcharts illustrating an example of the learning processes executed by the computer 101 of Embodiment 1. FIG. 10 is a diagram illustrating an example of a screen displayed by the computer 101 of Embodiment 1. FIGS. 11A, 11B, and 11C are diagrams illustrating specific examples of processing executed by the environment setting unit 131 of Embodiment 1. FIG. 12 is a diagram illustrating an example of a screen displayed by the computer 101 of Embodiment 1.

First, (Learning process 1) is described with reference to FIG. 7.

The learning unit 132 displays a screen 1000 such as that shown in FIG. 10 via the output device 124, and accepts user input regarding the type of task, the equipment and robot 100 to be used, the model to be used, and the like (step S101). The learning unit 132 may also accept input specifying the adjustment data to be used.

The screen 1000 includes a display field 1001, selection fields 1002 and 1004, load buttons 1003 and 1005, a model selection field 1006, and an execute button 1007.

The display field 1001 displays the virtual environment set by the environment setting unit 131. The selection field 1002 is used to select the equipment to be used for the task. When the load button 1003 is operated, values related to the equipment specified in the selection field 1002 are read from the equipment configuration information 141. The selection field 1004 is used to select the task to be executed by the robot 100. When the load button 1005 is operated, the tables 400 and 410 corresponding to the task are read, and entries related to the task are read from the environment adjustment information 143.

The model selection field 1006 is used to select the model used for learning. It displays entries each including an ID 1011, a model 1012, a task time 1013, and a selection 1014; each entry corresponds to one model. An entry may include fields other than those listed.

The ID 1011, model 1012, and task time 1013 are the same fields as the ID 601, model 602, and task time 606. The selection 1014 is a field for selecting the model used for learning. In learning process 1, the initial model is selected; in learning processes 2 and 3, a model other than the initial model is selected.

When the initial model is selected and the execute button 1007 is operated, the following processing is executed. First, the learning unit 132 calls the environment setting unit 131 and instructs it to set a base virtual environment. The environment setting unit 131 sets the base virtual environment using the robot configuration information 140 and the equipment configuration information 141 (step S102), stores information about the base virtual environment in the work area, and notifies the learning unit 132 that the processing is complete.

Specifically, based on the user input, the environment setting unit 131 acquires data on the equipment, the robot 100, the workpiece, and the like related to the task from the robot configuration information 140 and the equipment configuration information 141. Based on the acquired data, the environment setting unit 131 sets a base virtual environment composed of a virtual workspace, virtual equipment, a virtual robot, a virtual workpiece, and the like. The base virtual environment may also include virtual obstacles.

The learning unit 132 acquires the model data of the model specified by the user from the model management information 144, and acquires the environment adjustment information 143 (step S103). If the user has specified adjustment data, the learning unit 132 acquires the specified adjustment data from the environment adjustment information 143.

学習部１３２は、学習処理を開始する（ステップＳ１０４）。 The learning unit 132 starts the learning process (step S104).

学習部１３２は、環境設定部１３１を呼び出し、仮想環境の設定を指示する。環境設定部１３１は、ベース仮想環境及び環境調整情報１４３に基づいて、仮想環境を設定する（ステップＳ１０５）。また、環境設定部１３１は、仮想的な作業空間に、軌道の開始点及び終了点をランダムに設定する（ステップＳ１０６）。環境設定部１３１は、仮想環境、開始点、及び終了点に関する情報をワークエリアに格納し、処理の完了を学習部１３２に通知する。 The learning unit 132 calls the environment setting unit 131 and instructs it to set up a virtual environment. The environment setting unit 131 sets up a virtual environment based on the base virtual environment and the environment adjustment information 143 (step S105). The environment setting unit 131 also randomly sets the start and end points of a trajectory in the virtual work space (step S106). The environment setting unit 131 stores information about the virtual environment and the start and end points in the work area, and notifies the learning unit 132 that processing is complete.

例えば、環境設定部１３１は、環境調整情報１４３に基づいて、ベース仮想環境に含まれる設備及び障害物のサイズ及び位置をランダムに変更し、また、ワークのサイズ及び重さをランダムに変更する。 For example, based on the environment adjustment information 143, the environment setting unit 131 randomly changes the sizes and positions of the equipment and obstacles included in the base virtual environment, as well as the size and weight of the workpiece.

図１１Ａ、図１１Ｂ、及び図１１Ｃに示すように、環境設定部１３１は、仮想環境に含まれる障害物のサイズ及び位置、ワークのサイズ及び重量、軌道の開始点及び終了点をランダムに設定し、異なる軌道を生成する。これによって、効率よく複数の軌道に対応した制御、すなわち、あらゆる軌道に対してロバストな制御を実現するモデルを学習することができる。また、異なる仮想環境上で計測されるセンサ値も変化するため、実機と仮想環境とで生じるセンサ特性誤差を吸収することもできる。 As shown in Figures 11A, 11B, and 11C, the environment setting unit 131 generates different trajectories by randomly setting the size and position of the obstacles in the virtual environment, the size and weight of the workpiece, and the start and end points of the trajectory. This makes it possible to efficiently learn a model whose control handles multiple trajectories, i.e., control that is robust to arbitrary trajectories. In addition, because the sensor values measured also vary across the different virtual environments, discrepancies in sensor characteristics between the actual machine and the virtual environment can be absorbed.
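
The randomization in steps S105 and S106 can be pictured with a minimal sketch. Everything here is an assumption for illustration: the `Box`/`VirtualEnvironment` types, the `ADJUSTMENT` dictionary standing in for environment adjustment information 143, uniform sampling, and the units are hypothetical and not taken from the patent.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Box:
    position: Vec3  # centre of the obstacle (assumed metres)
    size: Vec3      # width, depth, height (assumed metres)

@dataclass
class VirtualEnvironment:
    obstacles: List[Box]
    workpiece_size: Vec3
    workpiece_mass: float  # assumed kg
    start: Vec3            # trajectory start point
    goal: Vec3             # trajectory end point

# Hypothetical stand-in for environment adjustment information 143:
# (min, max) ranges from which each parameter is drawn uniformly.
ADJUSTMENT: Dict[str, tuple] = {
    "obstacle_scale": (0.8, 1.2),
    "obstacle_shift": (-0.05, 0.05),
    "workpiece_scale": (0.9, 1.1),
    "workpiece_mass": (0.1, 1.0),
    "workspace": ((0.0, 1.0), (0.0, 1.0), (0.0, 0.5)),
}

def randomize(base_obstacles: List[Box], base_workpiece: Vec3,
              adj: Dict[str, tuple], rng: random.Random) -> VirtualEnvironment:
    """Return one randomized copy of the base virtual environment (cf. S105-S106)."""
    obstacles = []
    for ob in base_obstacles:
        # Scale the obstacle and shift its position within the allowed ranges.
        scale = rng.uniform(*adj["obstacle_scale"])
        pos = tuple(p + rng.uniform(*adj["obstacle_shift"]) for p in ob.position)
        obstacles.append(Box(pos, tuple(d * scale for d in ob.size)))
    ws = rng.uniform(*adj["workpiece_scale"])
    size = tuple(d * ws for d in base_workpiece)
    mass = rng.uniform(*adj["workpiece_mass"])
    # Start and end points are sampled independently inside the workspace bounds.
    start = tuple(rng.uniform(lo, hi) for lo, hi in adj["workspace"])
    goal = tuple(rng.uniform(lo, hi) for lo, hi in adj["workspace"])
    return VirtualEnvironment(obstacles, size, mass, start, goal)
```

Calling `randomize` once per learning iteration yields a fresh environment and trajectory endpoints each time, which is the property the text relies on for robustness.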

学習部１３２は、軌道情報生成部１３０を呼び出し、軌道情報１４２の生成を指示する。このとき、学習部１３２は、仮想的な設備及びロボット１００に関する情報、並びに、軌道の開始点及び終了点に関する情報を軌道情報生成部１３０に入力する。軌道情報生成部１３０は、学習部１３２が入力した情報に基づいて、軌道情報１４２を生成し（ステップＳ１０７）、ワークエリアに格納する。軌道情報１４２の生成方法は公知の技術であるため詳細な説明は省略する。 The learning unit 132 calls the trajectory information generation unit 130 and instructs it to generate trajectory information 142. At this time, the learning unit 132 inputs information about the virtual equipment and the robot 100, as well as the start and end points of the trajectory, to the trajectory information generation unit 130. The trajectory information generation unit 130 generates trajectory information 142 based on the information input by the learning unit 132 (step S107) and stores it in the work area. Because the method for generating trajectory information 142 is a well-known technique, a detailed explanation is omitted.

学習部１３２は、制御処理を開始する（ステップＳ１０８）。制御処理は、物体の把持及び軌道経路に沿った物体の移動等、一連の作業が終了するまで繰り返し実行される。このとき、学習部１３２は、仮想環境に関する情報及び軌道情報１４２をシミュレータ１３３に出力する。 The learning unit 132 starts the control process (step S108). The control process is repeatedly executed until a series of tasks, such as grasping the object and moving the object along the trajectory path, is completed. At this time, the learning unit 132 outputs information about the virtual environment and trajectory information 142 to the simulator 133.

学習部１３２は、仮想的なロボット１００の動きをシミュレーションするシミュレータ１３３から稼働状態情報及び作業状態情報を取得する（ステップＳ１０９）。なお、軌道の開始点ではステップＳ１０９を省略してもよい。 The learning unit 132 acquires operating status information and work status information from the simulator 133, which simulates the movement of the virtual robot 100 (step S109). Note that step S109 may be omitted at the start point of the trajectory.

学習部１３２は、稼働状態情報、作業状態情報、及び軌道情報１４２をモデルに入力することによって制御情報を生成し、制御情報を仮想的なロボット１００に出力する（ステップＳ１１０）。シミュレータ１３３は、制御情報に基づいて、仮想的なロボット１００の動きをシミュレーションし、稼働状態情報及び作業状態情報を出力する。 The learning unit 132 generates control information by inputting the operating status information, work status information, and trajectory information 142 into a model, and outputs the control information to the virtual robot 100 (step S110). The simulator 133 simulates the movement of the virtual robot 100 based on the control information, and outputs the operating status information and work status information.

作業が完了していない場合、学習部１３２は、ステップＳ１０８に戻り、同様の処理を実行する。作業が完了した場合、学習部１３２は制御処理を終了する（ステップＳ１１１）。 If the work is not complete, the learning unit 132 returns to step S108 and performs the same processing. If the work is complete, the learning unit 132 ends the control processing (step S111).
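
The control process of steps S108 to S111 is essentially an observe/act loop. The sketch below uses a hypothetical one-dimensional `ToySimulator` in place of simulator 133 and an arbitrary callable as the model. It shows only the loop structure, not the patent's actual state or control representations.

```python
from typing import Callable, List, Sequence, Tuple

class ToySimulator:
    """Hypothetical 1-D stand-in for simulator 133: the 'robot' is a point
    whose position changes by the commanded velocity times dt each step."""

    def __init__(self, start: float, goal: float, tol: float = 1e-3):
        self.pos, self.goal, self.tol = start, goal, tol

    def step(self, velocity: float, dt: float = 0.1) -> Tuple[float, bool]:
        self.pos += velocity * dt
        done = abs(self.pos - self.goal) < self.tol  # toy "work status"
        return self.pos, done

Model = Callable[[float, Sequence[float]], float]

def run_episode(model: Model, sim: ToySimulator, trajectory: Sequence[float],
                max_steps: int = 1000) -> Tuple[List[Tuple[float, float]], bool]:
    history: List[Tuple[float, float]] = []
    operating, done = sim.pos, False   # S109 skipped at the start point
    for _ in range(max_steps):         # S108: repeat the control process
        control = model(operating, trajectory)   # S110: model -> control info
        operating, done = sim.step(control)      # simulator returns new state
        history.append((operating, control))     # kept for the update in S112
        if done:                                 # S111: task complete
            break
    return history, done
```

For example, `run_episode(lambda pos, traj: 5.0 * (traj[-1] - pos), ToySimulator(0.0, 1.0), [0.5, 1.0])` drives the point to the final waypoint. The returned `history` is the kind of data step S112 would consume when updating the model.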

学習部１３２は、制御処理において取得された稼働状態情報及び作業状態情報等に基づいてモデルを更新する（ステップＳ１１２）。例えば、勾配降下法等を用いてモデルが更新される。なお、本発明は、学習のアルゴリズムに限定されない。 The learning unit 132 updates the model based on the operating status information, work status information, and other data acquired during the control process (step S112). For example, the model is updated using gradient descent or a similar method. Note that the present invention is not limited to any particular learning algorithm.

学習の終了条件を満たさない場合、学習部１３２は、ステップＳ１０４に戻り、同様の処理を実行する。例えば、精度を示す指標が閾値より大きい場合、又は、学習回数が閾値より大きい場合、学習部１３２は、学習の終了条件を満たすと判定する。学習の終了条件を満たす場合、学習部１３２は、学習処理を終了し（ステップＳ１１３）、一連の学習処理を終了する。このとき、学習部１３２は、モデルの精度評価を行い、モデル管理情報１４４を更新する。具体的には、学習部１３２は、モデルを用いた制御による作業時間及び作業の成功確率をモデルの精度を評価する指標として算出する。学習部１３２は、モデル管理情報１４４にエントリを追加し、追加されたエントリのＩＤ６０１に識別情報を設定する。学習部１３２は、追加されたエントリのモデル６０２に、生成されたモデルを設定し、作業６０３に作業の種別を設定し、学習時間（シミュレーション）６０４に学習に要した時間を設定する。学習部１３２は、追加されたエントリの学習時間（実機）６０５に「０」を設定する。また、学習部１３２は、追加されたエントリの作業時間６０６及び成功確率６０７に、精度評価において算出した指標を設定する。 If the learning termination condition is not met, the learning unit 132 returns to step S104 and repeats the same processing. For example, the learning unit 132 determines that the termination condition is met when an accuracy index exceeds a threshold or when the number of learning iterations exceeds a threshold. When the termination condition is met, the learning unit 132 ends the learning process (step S113), completing the series of learning steps. At this point, the learning unit 132 evaluates the accuracy of the model and updates the model management information 144. Specifically, the learning unit 132 calculates the work time and the task success probability achieved under control by the model as indices of the model's accuracy. The learning unit 132 adds an entry to the model management information 144 and sets identification information in the ID 601 of the added entry. It then sets the generated model in the model 602, the task type in the task 603, and the time required for learning in the learning time (simulation) 604 of the added entry. The learning unit 132 sets the learning time (actual machine) 605 of the added entry to "0", and sets the indices calculated in the accuracy evaluation in the work time 606 and success probability 607 of the added entry.
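
The termination check and the model management entry written after (Learning Process 1) might look like the following sketch. The field names mirror ID 601 through success probability 607. The metric definitions in `evaluate` are assumptions: the patent does not say how work time and success probability are aggregated, so mean duration of successful episodes and fraction of successes are used here for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class ModelEntry:
    id: int              # ID 601
    model: object        # model 602
    task: str            # task 603
    time_sim: float      # learning time (simulation) 604
    time_real: float     # learning time (actual machine) 605
    work_time: float     # work time 606
    success_prob: float  # success probability 607

def should_stop(accuracy: float, iteration: int,
                acc_threshold: float, max_iterations: int) -> bool:
    # Termination condition from the text: accuracy index above a threshold,
    # or number of learning iterations above a threshold.
    return accuracy > acc_threshold or iteration > max_iterations

def evaluate(episodes: Sequence[Tuple[bool, float]]) -> Tuple[float, float]:
    """Assumed metric definitions: success probability = fraction of successful
    evaluation episodes; work time = mean duration of the successful ones."""
    durations = [t for ok, t in episodes if ok]
    success_prob = len(durations) / len(episodes)
    work_time = sum(durations) / len(durations) if durations else math.inf
    return work_time, success_prob

def register_process1(table: List[ModelEntry], model: object, task: str,
                      elapsed_sim: float,
                      episodes: Sequence[Tuple[bool, float]]) -> ModelEntry:
    work_time, success_prob = evaluate(episodes)
    entry = ModelEntry(id=len(table) + 1, model=model, task=task,
                       time_sim=elapsed_sim, time_real=0.0,  # 605 is "0" here
                       work_time=work_time, success_prob=success_prob)
    table.append(entry)
    return entry
```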

仮想環境及び軌道をランダムに変更してモデルを学習することによって、環境及び軌道に対してロバストな制御を実現するモデルを生成できる。また、実際のロボット１００を用いていないため、学習処理を高速化でき、また、ロボット１００の故障を回避できる。 By training the model while randomly varying the virtual environment and trajectory, it is possible to generate a model whose control is robust to variations in the environment and trajectory. Furthermore, because the actual robot 100 is not used, the learning process can be sped up and failure of the robot 100 can be avoided.

なお、計算機１０１は、学習処理の実行中、図１２に示すような画面１２００を表示してもよい。画面１２００は、表示欄１２０１、１２０２、１２０３、１２０４、１２０５及び停止ボタン１２０６を含む。 Note that the computer 101 may display a screen 1200 such as that shown in FIG. 12 while the learning process is being performed. Screen 1200 includes display fields 1201, 1202, 1203, 1204, and 1205, and a stop button 1206.

表示欄１２０１は、仮想環境上のロボットの作業状態を表示する欄である。表示欄１２０２、１２０３は、仮想環境上で作業を行うロボットから取得されるセンサ値（シミュレーション結果）を表示する欄である。表示欄１２０４、１２０５は、学習の経過、すなわち、モデルの精度評価を表示する欄である。表示欄１２０４には、作業時間の時間推移が表示され、表示欄１２０５には、作業の成功確率の推移が表示される。 Display field 1201 shows the working status of the robot in the virtual environment. Display fields 1202 and 1203 show sensor values (simulation results) obtained from the robot working in the virtual environment. Display fields 1204 and 1205 show the learning progress, i.e., the accuracy evaluation of the model: display field 1204 shows how the work time changes over time, and display field 1205 shows how the task success probability changes.

ユーザは、表示欄１２０４、１２０５を参照し、モデルの精度が十分であると判断した場合、停止ボタン１２０６を操作し、学習処理を終了させることができる。 The user can refer to display fields 1204 and 1205, and if they determine that the model is sufficiently accurate, they can operate stop button 1206 to end the learning process.

次に、図８を用いて（学習処理２）について説明する。 Next, (Learning Process 2) is described with reference to Figure 8.

学習部１３２は、入力を受け付ける画面１０００をユーザに提示し、作業の種別、使用する設備及びロボット１００、使用するモデル等についてユーザ入力を受け付ける（ステップＳ２０１）。なお、学習部１３２は、モデル管理情報１４４を参照し、作業６０３に、指定された作業の種別が設定されたモデルを、モデル選択欄１００６に表示する。 The learning unit 132 presents the user with the input screen 1000 and accepts user input regarding the task type, the equipment and robot 100 to be used, the model to be used, and so on (step S201). The learning unit 132 refers to the model management information 144 and displays, in the model selection field 1006, the models whose task 603 matches the specified task type.

学習部１３２は、モデル管理情報１４４を参照して、ユーザが指定したモデルを取得し、また、ロボット構成情報１４０及び設備構成情報１４１からロボット及び設備に関する情報を取得する（ステップＳ２０２）。 The learning unit 132 references the model management information 144 to obtain the model specified by the user, and also obtains information about the robot and equipment from the robot configuration information 140 and equipment configuration information 141 (step S202).

学習部１３２は、軌道情報生成部１３０を呼び出し、軌道情報１４２の生成を指示する。このとき、学習部１３２は、設備及びロボット１００に関する情報を軌道情報生成部１３０に入力する。軌道情報生成部１３０は、学習部１３２が入力した情報に基づいて、軌道情報１４２を生成し（ステップＳ２０３）、ワークエリアに格納する。 The learning unit 132 calls the trajectory information generation unit 130 and instructs it to generate trajectory information 142. At this time, the learning unit 132 inputs information about the equipment and the robot 100 to the trajectory information generation unit 130. The trajectory information generation unit 130 generates trajectory information 142 based on the information input by the learning unit 132 (step S203) and stores it in the work area.

学習部１３２は、学習処理を開始し（ステップＳ２０４）、また、制御処理を開始する（ステップＳ２０５）。制御処理は、物体の把持及び軌道経路に沿った物体の移動等、一連の作業が終了するまで繰り返し実行される。 The learning unit 132 starts the learning process (step S204) and also starts the control process (step S205). The control process is repeatedly executed until a series of tasks, such as grasping the object and moving the object along the trajectory path, is completed.

学習部１３２は、ロボット１００から稼働状態情報及び作業状態情報を取得する（ステップＳ２０６）。なお、軌道の開始点ではステップＳ２０６を省略してもよい。 The learning unit 132 acquires operating status information and work status information from the robot 100 (step S206). Note that step S206 may be omitted at the start point of the trajectory.

学習部１３２は、稼働状態情報、作業状態情報、及び軌道情報１４２をモデルに入力することによって制御情報を生成し、制御情報をロボット１００に出力する（ステップＳ２０７）。 The learning unit 132 generates control information by inputting the operating status information, work status information, and trajectory information 142 into the model, and outputs the control information to the robot 100 (step S207).

作業が完了していない場合、学習部１３２は、ステップＳ２０５に戻り、同様の処理を実行する。作業が完了した場合、学習部１３２は制御処理を終了する（ステップＳ２０８）。 If the work is not complete, the learning unit 132 returns to step S205 and executes the same process. If the work is complete, the learning unit 132 ends the control process (step S208).

学習部１３２は、制御処理において取得された稼働状態情報及び作業状態情報等に基づいてモデルを更新する（ステップＳ２０９）。 The learning unit 132 updates the model based on the operating status information, work status information, etc. acquired during the control process (step S209).

学習の終了条件を満たさない場合、学習部１３２は、ステップＳ２０４に戻り、同様の処理を実行する。学習の終了条件を満たす場合、学習部１３２は、学習処理を終了し（ステップＳ２１０）、一連の学習処理を終了する。このとき、学習部１３２は、モデルの精度評価を行い、モデル管理情報１４４を更新する。具体的には、学習部１３２は、作業時間及び作業の成功確率を算出する。学習部１３２は、モデル管理情報１４４にエントリを追加し、追加されたエントリのＩＤ６０１に識別情報を設定する。学習部１３２は、追加されたエントリのモデル６０２に、生成されたモデルを設定し、作業６０３に作業の種別を設定する。学習部１３２は、ステップＳ２０２において取得したモデルに対応するエントリの学習時間（シミュレーション）６０４の値を、追加されたエントリの学習時間（シミュレーション）６０４に設定する。学習部１３２は、追加されたエントリの学習時間（実機）６０５に学習に要した時間を設定する。また、学習部１３２は、追加されたエントリの作業時間６０６及び成功確率６０７に、精度評価において算出した指標を設定する。 If the learning termination condition is not met, the learning unit 132 returns to step S204 and repeats the same processing. When the termination condition is met, the learning unit 132 ends the learning process (step S210), completing the series of learning steps. At this point, the learning unit 132 evaluates the accuracy of the model and updates the model management information 144. Specifically, the learning unit 132 calculates the work time and the task success probability. The learning unit 132 adds an entry to the model management information 144, sets identification information in the ID 601 of the added entry, sets the generated model in the model 602, and sets the task type in the task 603. It copies the learning time (simulation) 604 of the entry corresponding to the model acquired in step S202 into the learning time (simulation) 604 of the added entry, and sets the time required for this learning in the learning time (actual machine) 605 of the added entry. The learning unit 132 also sets the indices calculated in the accuracy evaluation in the work time 606 and success probability 607 of the added entry.

シミュレータ１３３を用いた学習によって生成されたモデルを、実機を用いて再学習することによって、従来の学習より短時間で学習が終了する。また、実機を用いてモデルを再学習することによって、実際の作業の動きに対応したモデルを生成することができる。すなわち、精度を向上させることができる。 By retraining on the actual machine a model generated through learning with the simulator 133, learning can be completed in a shorter time than with conventional learning. Furthermore, retraining the model on the actual machine yields a model that corresponds to the movements of the actual work; in other words, accuracy can be improved.

次に、図９を用いて（学習処理３）について説明する。 Next, (Learning Process 3) is described with reference to Figure 9.

学習部１３２は、入力を受け付ける画面１０００をユーザに提示し、作業の種別、使用する設備及びロボット１００、使用するモデル等についてユーザ入力を受け付ける（ステップＳ３０１）。学習部１３２は、使用する調整データに関する入力を受け付けてもよい。なお、（学習処理３）では、（学習処理１）にて生成されたモデルと、（学習処理２）にて生成されたモデルとが指定される。 The learning unit 132 presents the user with a screen 1000 for accepting input, and accepts user input regarding the type of work, the equipment and robot 100 to be used, the model to be used, etc. (Step S301). The learning unit 132 may also accept input regarding the adjustment data to be used. Note that in (Learning Process 3), the model generated in (Learning Process 1) and the model generated in (Learning Process 2) are specified.

学習部１３２は、環境設定部１３１を呼び出し、ベース仮想環境の設定を指示する。環境設定部１３１は、ロボット構成情報１４０及び設備構成情報１４１を用いてベース仮想環境を設定する（ステップＳ３０２）。環境設定部１３１は、ベース仮想環境に関する情報をワークエリアに格納し、処理の完了を学習部１３２に通知する。ステップＳ３０２の処理はステップＳ１０２の処理と同一である。 The learning unit 132 calls the environment setting unit 131 and instructs it to set a base virtual environment. The environment setting unit 131 sets the base virtual environment using the robot configuration information 140 and the equipment configuration information 141 (step S302). The environment setting unit 131 stores information about the base virtual environment in a work area and notifies the learning unit 132 that processing is complete. The processing in step S302 is the same as the processing in step S102.

学習部１３２は、モデル管理情報１４４からユーザが指定したモデルのモデルデータと、環境調整情報１４３とを取得する（ステップＳ３０３）。ステップＳ３０３の処理はステップＳ１０３の処理と同一である。 The learning unit 132 acquires the model data of the models specified by the user from the model management information 144, and also acquires the environment adjustment information 143 (step S303). The processing in step S303 is the same as that in step S103.

学習部１３２は、モデル統合部１３４を呼び出し、モデルの統合を指示する。モデル統合部１３４は、（学習処理１）にて生成されたモデルと、（学習処理２）にて生成されたモデルとを統合し（ステップＳ３０４）、処理の完了を学習部１３２に通知する。例えば、モデル統合部１３４は、モデルがニューラルネットワークである場合、（学習処理２）にて生成されたモデルの一部の重みを、（学習処理１）にて生成されたモデルの重みに置き換える。本実施例では、局所解問題及び過学習を回避する目的でモデルを統合している。 The learning unit 132 calls the model integration unit 134 and instructs it to integrate the models. The model integration unit 134 integrates the model generated in (Learning Process 1) and the model generated in (Learning Process 2) (step S304), and notifies the learning unit 132 that processing is complete. For example, when the models are neural networks, the model integration unit 134 replaces some of the weights of the model generated in (Learning Process 2) with the weights of the model generated in (Learning Process 1). In this embodiment, the models are integrated in order to avoid local optima and overfitting.
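
The weight-replacement integration of step S304 can be sketched as follows. The representation is an assumption: each model is a plain dictionary from layer name to a flat list of weights, and which layers to copy (`layers_from_sim`) is left to the caller because the patent does not specify which "part" of the weights is replaced.

```python
import copy
from typing import Dict, Iterable, List

Weights = Dict[str, List[float]]  # hypothetical: layer name -> flat weight list

def integrate(sim_model: Weights, real_model: Weights,
              layers_from_sim: Iterable[str]) -> Weights:
    """Start from the Learning Process 2 (real-machine) model and overwrite
    the selected layers with the Learning Process 1 (simulation) weights."""
    merged = copy.deepcopy(real_model)  # leave the inputs untouched
    for name in layers_from_sim:
        if name not in sim_model or name not in merged:
            raise KeyError(f"unknown layer {name!r}")
        if len(sim_model[name]) != len(merged[name]):
            raise ValueError(f"shape mismatch in layer {name!r}")
        merged[name] = list(sim_model[name])
    return merged
```

Because the text also allows several integration methods in step S304, one simple reading is to call `integrate` with different layer subsets and retrain each resulting model separately.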

ステップＳ３０５からステップＳ３１４の処理は、ステップＳ１０４からステップＳ１１３の処理と同一である。ただし、モデル管理情報１４４の更新方法が異なる。 The processing from step S305 to step S314 is the same as the processing from step S104 to step S113. However, the method for updating the model management information 144 is different.

学習部１３２は、モデル管理情報１４４にエントリを追加し、追加されたエントリのＩＤ６０１に識別情報を設定する。学習部１３２は、追加されたエントリのモデル６０２に、生成されたモデルを設定し、作業６０３に作業の種別を設定する。学習部１３２は、学習時間（シミュレーション）６０４及び学習時間（実機）６０５に、（学習処理２）にて生成されたモデルに対応するエントリの学習時間（シミュレーション）６０４及び学習時間（実機）６０５の値を設定する。学習部１３２は、追加されたエントリの学習時間（シミュレーション）６０４の値に、学習に要した時間を加算する。また、学習部１３２は、追加されたエントリの作業時間６０６及び成功確率６０７に、精度評価において算出した指標を設定する。 The learning unit 132 adds an entry to the model management information 144, sets identification information in the ID 601 of the added entry, sets the generated model in the model 602, and sets the task type in the task 603. It copies the learning time (simulation) 604 and learning time (actual machine) 605 of the entry corresponding to the model generated in (Learning Process 2) into the learning time (simulation) 604 and learning time (actual machine) 605 of the added entry, and then adds the time required for this learning to the learning time (simulation) 604 of the added entry. The learning unit 132 also sets the indices calculated in the accuracy evaluation in the work time 606 and success probability 607 of the added entry.
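
Taken together, the three learning processes make learning times 604 and 605 cumulative along the chain of models. A minimal sketch of that bookkeeping follows, using hypothetical dictionaries with keys `time_sim` (604) and `time_real` (605).

```python
from typing import Dict

Times = Dict[str, float]  # hypothetical: {"time_sim": 604, "time_real": 605}

def times_process1(elapsed_sim: float) -> Times:
    # Learning Process 1: simulation time only; 605 is set to 0.
    return {"time_sim": elapsed_sim, "time_real": 0.0}

def times_process2(first_model: Times, elapsed_real: float) -> Times:
    # Learning Process 2: copy 604 from the first model's entry and
    # record the real-machine learning time in 605.
    return {"time_sim": first_model["time_sim"], "time_real": elapsed_real}

def times_process3(second_model: Times, elapsed_sim: float) -> Times:
    # Learning Process 3: copy both values from the second model's entry,
    # then add this run's simulation time to 604.
    return {"time_sim": second_model["time_sim"] + elapsed_sim,
            "time_real": second_model["time_real"]}
```

With this scheme, each entry's 604 and 605 represent the total simulation and real-machine effort spent to obtain that model, which is what a user would compare in the model management information.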

なお、ステップＳ３０４では統合方法が異なる複数のモデルが生成されてもよい。この場合、各モデルに対してステップＳ３０５からステップＳ３１４の処理が実行される。 Note that in step S304, multiple models using different integration methods may be generated. In this case, steps S305 to S314 are performed for each model.

実機を用いた学習によって生成されたモデルを、シミュレータ１３３を用いて再学習することによって、短時間に、高精度、かつ、環境及び軌道に対してロバストなモデルを生成できる。 By retraining, with the simulator 133, a model generated through learning on the actual machine, a model that is highly accurate and robust to the environment and trajectory can be generated in a short time.

ユーザは、実際にロボット１００を制御する場合、計算機１０１にアクセスし、モデル管理情報１４４を参照する。ユーザは、モデル管理情報１４４に基づいて、使用するモデルを選択する。例えば、作業時間を重要視する場合、ユーザは、作業時間を基準にモデルを選択し、作業の安定性を重要視する場合、ユーザは、成功確率を基準にモデルを選択する。制御情報生成部１３５は、選択されたモデルを用いてロボット１００を制御する。 When actually controlling the robot 100, the user accesses the computer 101 and refers to the model management information 144, and selects the model to use on that basis. For example, if work time is the priority, the user selects a model based on work time; if the stability of the task is the priority, the user selects a model based on success probability. The control information generator 135 controls the robot 100 using the selected model.
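
The selection rule described above, filtering entries by task and then ranking by either work time or success probability, could be expressed as the sketch below. The entry layout (dictionaries with `task`, `work_time`, and `success_prob` keys) and the `prefer` parameter are hypothetical; in the patent the user makes this choice on screen.

```python
from typing import Dict, List

Entry = Dict[str, object]  # hypothetical row of model management information 144

def select_model(entries: List[Entry], task: str, prefer: str = "work_time") -> Entry:
    """Pick the entry for `task` with the shortest work time 606 or the
    highest success probability 607, depending on `prefer`."""
    candidates = [e for e in entries if e["task"] == task]
    if not candidates:
        raise LookupError(f"no model registered for task {task!r}")
    if prefer == "work_time":
        return min(candidates, key=lambda e: e["work_time"])
    if prefer == "success_prob":
        return max(candidates, key=lambda e: e["success_prob"])
    raise ValueError(f"unknown preference {prefer!r}")
```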

なお、複数の作業群を一つの作業として扱ってもよい。例えば、「部品の搬送」、「部品の組立て」からなる製品の製造作業等が考えられる。この場合、作業を構成する要素作業ごとにモデルを選択する。制御情報生成部１３５は、要素作業ごとにモデルを切り替えて、ロボット１００を制御する。 A group of multiple tasks may also be treated as a single task, for example, a product manufacturing task consisting of "transporting parts" and "assembling parts". In this case, a model is selected for each elemental task that makes up the task, and the control information generator 135 switches models for each elemental task to control the robot 100.
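
Switching models per elemental task can be sketched as a loop over the ordered elements. The `execute` callback, which stands in for controlling the robot with one model, and the stop-on-first-failure policy are assumptions for illustration; the patent does not describe failure handling.

```python
from typing import Callable, Dict, List, Optional, Tuple

def run_composite(elements: List[str], models: Dict[str, object],
                  execute: Callable[[object, str], bool]) -> Tuple[bool, Optional[str]]:
    """Run each elemental task in order with its own selected model.
    Returns (True, None) on success, or (False, failed_element)."""
    for element in elements:
        model = models[element]          # model chosen for this elemental task
        if not execute(model, element):  # hypothetical robot-control call
            return False, element        # assumed policy: stop at first failure
    return True, None
```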

なお、一つの計算機１０１がロボット１００を制御していたが、複数の計算機１０１を含む計算機システムが同様の制御を行うようにしてもよい。この場合、複数の計算機１０１に機能部を分散して配置してもよい。 Although a single computer 101 controls the robot 100 in the embodiment, a computer system including multiple computers 101 may perform similar control. In this case, the functional units may be distributed across the multiple computers 101.

なお、本発明は上記した実施例に限定されるものではなく、様々な変形例が含まれる。また、例えば、上記した実施例は本発明を分かりやすく説明するために構成を詳細に説明したものであり、必ずしも説明した全ての構成を備えるものに限定されるものではない。また、各実施例の構成の一部について、他の構成に追加、削除、置換することが可能である。 The present invention is not limited to the above-described embodiments, but includes various modifications. Furthermore, for example, the above-described embodiments provide detailed descriptions of the configurations in order to clearly explain the present invention, and the present invention is not necessarily limited to those that include all of the described configurations. Furthermore, it is possible to add, delete, or replace part of the configuration of each embodiment with another configuration.

また、上記の各構成、機能、処理部、処理手段等は、それらの一部又は全部を、例えば集積回路で設計する等によりハードウェアで実現してもよい。また、本発明は、実施例の機能を実現するソフトウェアのプログラムコードによっても実現できる。この場合、プログラムコードを記録した記憶媒体をコンピュータに提供し、そのコンピュータが備えるプロセッサが記憶媒体に格納されたプログラムコードを読み出す。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施例の機能を実現することになり、そのプログラムコード自体、及びそれを記憶した記憶媒体は本発明を構成することになる。このようなプログラムコードを供給するための記憶媒体としては、例えば、フレキシブルディスク、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、ハードディスク、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、光ディスク、光磁気ディスク、ＣＤ－Ｒ、磁気テープ、不揮発性のメモリカード、ＲＯＭなどが用いられる。 Furthermore, the above-described configurations, functions, processing units, processing means, etc. may be implemented in part or in whole in hardware, for example, by designing them as integrated circuits. The present invention can also be realized by software program code that implements the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor in the computer reads the program code stored on the storage medium. In this case, the program code read from the storage medium itself implements the functions of the above-described embodiments, and the program code itself and the storage medium on which it is stored constitute the present invention. Examples of storage media for providing such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, magnetic tape, non-volatile memory cards, and ROMs.

また、本実施例に記載の機能を実現するプログラムコードは、例えば、アセンブラ、Ｃ／Ｃ＋＋、ｐｅｒｌ、Ｓｈｅｌｌ、ＰＨＰ、Ｐｙｔｈｏｎ、Ｊａｖａ（登録商標）等の広範囲のプログラム又はスクリプト言語で実装できる。 Furthermore, the program code that realizes the functions described in this embodiment can be implemented in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, Shell, PHP, Python, and Java (registered trademark).

さらに、実施例の機能を実現するソフトウェアのプログラムコードを、ネットワークを介して配信することによって、それをコンピュータのハードディスクやメモリ等の記憶手段又はＣＤ－ＲＷ、ＣＤ－Ｒ等の記憶媒体に格納し、コンピュータが備えるプロセッサが当該記憶手段や当該記憶媒体に格納されたプログラムコードを読み出して実行するようにしてもよい。 Furthermore, the software program code that realizes the functions of the embodiments may be distributed via a network and stored on a storage means such as a computer's hard disk or memory, or on a storage medium such as a CD-RW or CD-R, and the processor of the computer may read and execute the program code stored on the storage means or storage medium.

上述の実施例において、制御線や情報線は、説明上必要と考えられるものを示しており、製品上必ずしも全ての制御線や情報線を示しているとは限らない。全ての構成が相互に接続されていてもよい。 In the above examples, the control lines and information lines shown are those considered necessary for explanation, and not all control lines or information lines in the product are necessarily shown. All components may be interconnected.

１００ロボット 100 Robot
１０１計算機 101 Computer
１１０作業装置群 110 Working device group
１１１コントローラ 111 Controller
１１２計測装置 112 Measuring device
１２０演算装置 120 Arithmetic device
１２１記憶装置 121 Storage device
１２２通信装置 122 Communication device
１２３入力装置 123 Input device
１２４出力装置 124 Output device
１３０軌道情報生成部 130 Trajectory information generation unit
１３１環境設定部 131 Environment setting unit
１３２学習部 132 Learning unit
１３３シミュレータ 133 Simulator
１３４モデル統合部 134 Model integration unit
１４０ロボット構成情報 140 Robot configuration information
１４１設備構成情報 141 Equipment configuration information
１４２軌道情報 142 Trajectory information
１４３環境調整情報 143 Environment adjustment information
１４４モデル管理情報 144 Model management information

Claims

1. A computer system for learning a model for controlling a robot that performs a task including grasping and moving an object, comprising:
at least one computer having a computing device, a storage device connected to the computing device, and an interface connected to the computing device;
wherein the at least one computer executes:
a first process for receiving input of information about a task and an initial model;
a second process of setting a virtual environment including a virtual workspace, a virtual facility, and a virtual robot based on the information about the work, and setting a first trajectory for the virtual environment;
a third process of using the initial model to simulate a control process for the task, including movement of an object by the virtual robot along the first trajectory in the virtual environment, and updating the initial model based on a machine learning algorithm using a result of the simulation, so that the task time is short and the task success probability is high;
In the second process, the at least one computer randomly sets the virtual environment and randomly sets the first trajectory;
A computer system characterized in that in the first learning process, the at least one computer repeatedly executes the second process and the third process until the learning termination condition is met, and records the initial model when the learning termination condition is met as the first model.

2. The computer system of claim 1,
storing environment adjustment information for setting the virtual environment;
A computer system characterized in that, in the second processing, the at least one computer randomly sets the virtual environment based on the information related to the task and the environment adjustment information.

3. The computer system according to claim 1,
wherein the at least one computer executes:
a fourth process for receiving input of information about the work and the first model to be used;
a fifth process of setting a second trajectory based on the information about the work;
a sixth process of using the first model to perform a control process for the task including movement of an object by the robot along the second trajectory, and updating the first model based on a machine learning algorithm using a result of the control process so that the task time is short and the task success probability is high;
A computer system characterized in that in the second learning process, the at least one computer repeatedly executes the sixth process until the learning termination condition is met, and records the first model as the second model when the learning termination condition is met.

4. The computer system according to claim 3,
wherein the at least one computer executes:
a seventh process of receiving input of information about the work, the first model to be used, and the second model to be used;
an eighth process of integrating the first model and the second model;
The second process;
a ninth process of performing a simulation of a control process for the task, including movement of an object by the virtual robot along the first trajectory, in the virtual environment using an integrated model obtained by integrating the first model and the second model, and updating the integrated model based on a machine learning algorithm using a result of the simulation, so that the task time is short and the success probability of the task is high;
In the third learning process, the at least one computer repeatedly executes the second process and the ninth process until the learning termination condition is met, and records the integrated model when the learning termination condition is met as the third model.

5. The computer system of claim 4,
in the first learning process, the at least one computer calculates a task time for the task and a success probability for the task in the first model as indices related to accuracy of the first model, and records data associating the processing time for the first learning process with the indices of the first model in model management information;
in the second learning process, the at least one computer calculates a task time for the task and a success probability for the task in the second model as indices related to accuracy of the second model, and records data associating the processing time for the second learning process with the indices of the second model in the model management information;
in the third learning process, the at least one computer calculates a task time for the task and a success probability for the task in the third model as indices related to accuracy of the third model, and records data associating the processing time for the third learning process with the indices of the third model in the model management information;
The computer system is characterized in that the at least one computer presents the model management information to a user.

6. The computer system according to claim 5,
A computer system, wherein the initial model, the first model, the second model, and the third model are neural networks.

7. A method for learning a model for controlling a robot that performs a task including grasping and moving an object, the method being executed by a computer system, comprising:
the computer system includes at least one computer having a computing device, a storage device connected to the computing device, and an interface connected to the computing device;
the model learning method including the at least one computer executing:
a first process for receiving input of information about a task and an initial model;
a second process of setting a virtual environment including a virtual workspace, a virtual facility, and a virtual robot based on the information about the work, and setting a first trajectory for the virtual environment;
a third process of using the initial model to simulate a control process for the task, including movement of an object by the virtual robot along the first trajectory in the virtual environment, and updating the initial model based on a machine learning algorithm using a result of the simulation, so that the task time is short and the task success probability is high;
In the second process, the at least one computer randomly sets the virtual environment and randomly sets the first trajectory;
A model learning method characterized in that, in the first learning process, the at least one computer repeatedly executes the second process and the third process until a learning termination condition is met, and when the learning termination condition is met, the initial model is recorded as a first model.

8. The model learning method according to claim 7, wherein:
the computer system holds environment adjustment information for setting the virtual environment;
A model learning method, wherein in the second process, the at least one computer randomly sets the virtual environment based on the information about the task and the environment adjustment information.

9. The model learning method according to claim 7, wherein:
the at least one computer executes:
a fourth process for receiving input of information about the work and the first model to be used;
a fifth process of setting a second trajectory based on the information about the work;
a sixth process of using the first model to perform a control process for the task including movement of an object by the robot along the second trajectory, and updating the first model based on a machine learning algorithm using a result of the control process so that the task time is short and the task success probability is high;
A model learning method characterized in that, in the second learning process, the at least one computer repeatedly executes the sixth process until a learning termination condition is met, and when the learning termination condition is met, the first model is recorded as a second model.

10. The model learning method according to claim 9, wherein:
The at least one computer
a seventh process of receiving input of information about the work, the first model to be used, and the second model to be used;
an eighth process of integrating the first model and the second model;
The second process;
a ninth process of performing a simulation of a control process for the task, including movement of an object by the virtual robot along the first trajectory, in the virtual environment using an integrated model obtained by integrating the first model and the second model, and updating the integrated model based on a machine learning algorithm using a result of the simulation , so that the task time is short and the success probability of the task is high;
A model learning method characterized in that, in the third learning process, the at least one computer repeatedly executes the second process and the ninth process until a learning termination condition is met, and when the learning termination condition is met, the integrated model is recorded as a third model.
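Claim 10 does not say how the first and second models are integrated. Claim 12 states that the models are neural networks. If both networks also share one architecture (an assumption), a simple option is element-wise weighted averaging of their parameters, sketched here with parameters held as plain lists. `integrate_models` and `alpha` are hypothetical, and the claim wording leaves other integration schemes equally possible.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def integrate_models(first: Params, second: Params, alpha: float = 0.5) -> Params:
    """Eighth process (assumed form): element-wise weighted average of parameters.

    Assumes both models share one architecture; alpha weights the second model.
    """
    if first.keys() != second.keys():
        raise ValueError("models have different parameter names")
    integrated: Params = {}
    for name, a in first.items():
        b = second[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name!r}")
        integrated[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return integrated


merged = integrate_models({"w": [0.0, 2.0]}, {"w": [2.0, 4.0]})
```

Under this assumption, the integrated model takes the place of the initial model in the ninth-process simulation loop. An `alpha` near 1 would therefore start that loop from the robot-tuned parameters.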

The model learning method according to claim 10, wherein:
in the first learning process, the at least one computer calculates a task time for the task and a success probability for the task in the first model as indices related to accuracy of the first model, and records data associating the processing time for the first learning process with the indices of the first model in model management information;
in the second learning process, the at least one computer calculates a task time for the task and a success probability for the task in the second model as indices related to accuracy of the second model, and records data associating the processing time for the second learning process with the indices of the second model in the model management information;
in the third learning process, the at least one computer calculates a task time for the task and a success probability for the task in the third model as indices related to accuracy of the third model, and records data associating the processing time for the third learning process with the indices of the third model in the model management information;
and the at least one computer presents the model management information to a user.
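Claim 11 records, for each learning process, its processing time together with two indices related to accuracy: task time and task success probability. The following is a minimal sketch of such model management information. It assumes the indices are the mean task time and the fraction of successful trials over a set of evaluation runs. `ModelRecord`, `ModelManagementInfo`, `evaluate`, and the table layout are hypothetical.

```python
import time
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Tuple


def evaluate(outcomes: List[Tuple[float, bool]]) -> Tuple[float, float]:
    # Accuracy indices: mean task time and empirical success probability.
    return mean(t for t, _ in outcomes), sum(ok for _, ok in outcomes) / len(outcomes)


@dataclass
class ModelRecord:
    model_name: str
    processing_time_s: float  # time spent in the learning process
    task_time_s: float
    success_prob: float


@dataclass
class ModelManagementInfo:
    records: List[ModelRecord] = field(default_factory=list)

    def record(self, name: str, learn: Callable[[], object],
               outcomes_of: Callable[[object], List[Tuple[float, bool]]]) -> object:
        # Time the learning process, evaluate the resulting model, store both.
        start = time.perf_counter()
        model = learn()
        elapsed = time.perf_counter() - start
        task_time, success = evaluate(outcomes_of(model))
        self.records.append(ModelRecord(name, elapsed, task_time, success))
        return model

    def present(self) -> str:
        # Plain-text table for showing the model management information to a user.
        rows = [f"{'model':<14}{'learn time [s]':>16}{'task time [s]':>15}{'success':>10}"]
        for r in self.records:
            rows.append(f"{r.model_name:<14}{r.processing_time_s:>16.3f}"
                        f"{r.task_time_s:>15.2f}{r.success_prob:>10.1%}")
        return "\n".join(rows)
```

Showing the first, second, and third models in one table lets a user weigh the extra learning time of each process against the resulting change in task time and success probability.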

The model learning method according to claim 11, wherein the initial model, the first model, the second model, and the third model are neural networks.