JP7848769B2

JP7848769B2 - Control device and control method

Info

Publication number: JP7848769B2
Application number: JP2023122148A
Authority: JP
Inventors: 聖公赤池
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2023-07-27
Filing date: 2023-07-27
Publication date: 2026-04-21
Anticipated expiration: 2043-07-27
Also published as: JP2025018448A

Description

This disclosure relates to a control device and a control method, for example, a control device and a control method for a robot arm that picks up and moves a workpiece.

For example, the control device of Patent Document 1 calculates the center of gravity of a workpiece picked up by a robot arm, based on detection information from a force sensor provided on the robot arm, and determines the type of the workpiece by referring to the calculated center of gravity.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2022-67995

The applicant has identified the following problem. Although the control device of Patent Document 1 can identify the type of a workpiece, it cannot derive, for each workpiece, a gripping point at which that workpiece can be picked up well.

This disclosure has been made in view of this problem and provides a control device and a control method that can derive, for each workpiece, a gripping point at which that workpiece can be picked up well.

A control device according to one aspect of this disclosure is a control device for a robot arm that picks up and moves a workpiece, the control device comprising:
a reinforcement learning unit that derives the center of gravity of the workpiece when the workpiece is picked up by the robot arm, based on detection information from a force sensor provided on the robot arm, and that learns the gripping point of the workpiece that minimizes the amount of deviation from the center of gravity of the workpiece, using a deep reinforcement learning model whose inputs are the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the workpiece is picked up by the robot arm and the joint information of the robot arm when the workpiece is picked up by the robot arm, and whose output is the gripping point of the workpiece;
a path generation unit that derives joint information of the robot arm so that the workpiece is picked up at the gripping point learned by the deep reinforcement learning model; and
a control unit that controls the robot arm based on the derived joint information of the robot arm,
wherein the reinforcement learning unit relearns the gripping point of the workpiece with the deep reinforcement learning model, using the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the robot arm is controlled based on the derived joint information and picks up the workpiece, together with the derived joint information of the robot arm.

Preferably, the control device described above includes a determination unit that acquires information on the types of workpieces to be picked up by the robot arm and the order in which they are picked up, together with information on the types of workpieces that have already been learned, and that determines, based on this information, whether the workpiece to be picked up this time has already been learned.
If the workpiece to be picked up this time has already been learned, learning with the deep reinforcement learning model is omitted; if it has not, learning with the deep reinforcement learning model is performed.

In the control device described above, the joint information of the robot arm preferably includes, in addition to angle information for each joint of the robot arm, at least one of angular velocity information, angular acceleration information, and torque information for the motors that drive the joints of the robot arm.
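As a non-limiting illustration, the joint information described above could be held in a small record such as the following. The class name `JointInfo`, its field names, and the units are assumptions made for this sketch and do not appear in the disclosure; the check in `__post_init__` models only the preferred form (angles plus at least one motor quantity).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class JointInfo:
    """Joint information of the robot arm at pickup (hypothetical layout)."""

    angles: Sequence[float]                                   # one angle per joint [rad]
    angular_velocities: Optional[Sequence[float]] = None      # motor angular velocity [rad/s]
    angular_accelerations: Optional[Sequence[float]] = None   # motor angular acceleration [rad/s^2]
    torques: Optional[Sequence[float]] = None                 # motor torque [N*m]

    def __post_init__(self) -> None:
        extras = (self.angular_velocities, self.angular_accelerations, self.torques)
        present = [e for e in extras if e is not None]
        # Preferred form in the text: angles plus at least one motor quantity.
        if not present:
            raise ValueError("expected at least one of velocity, acceleration, or torque")
        for e in present:
            if len(e) != len(self.angles):
                raise ValueError("every per-joint sequence must match the number of joints")

    def as_vector(self) -> List[float]:
        """Flatten to one feature list: angles first, then the extras that are present."""
        vec = list(self.angles)
        for e in (self.angular_velocities, self.angular_accelerations, self.torques):
            if e is not None:
                vec.extend(e)
        return vec
```

A flat vector of this kind is one convenient way to pass the joint information to a learning model, although the disclosure does not prescribe any particular encoding.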

A control method according to one aspect of this disclosure is a control method for a robot arm that picks up and moves a workpiece, the control method comprising:
a step of deriving the center of gravity of the workpiece when the workpiece is picked up by the robot arm, based on detection information from a force sensor provided on the robot arm;
a step of learning the gripping point of the workpiece that minimizes the amount of deviation from the center of gravity of the workpiece, using a deep reinforcement learning model whose inputs are the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the workpiece is picked up by the robot arm and the joint information of the robot arm when the workpiece is picked up by the robot arm, and whose output is the gripping point of the workpiece;
a step of deriving joint information of the robot arm so that the workpiece is picked up at the gripping point learned by the deep reinforcement learning model;
a step of controlling the robot arm based on the derived joint information of the robot arm; and
a step of relearning the gripping point of the workpiece with the deep reinforcement learning model, using the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the robot arm is controlled based on the derived joint information and picks up the workpiece, together with the derived joint information of the robot arm.

Preferably, the control method described above includes a step of acquiring information on the types of workpieces to be picked up by the robot arm and the order in which they are picked up, together with information on the types of workpieces that have already been learned, and determining, based on this information, whether the workpiece to be picked up this time has already been learned.
If the workpiece to be picked up this time has already been learned, learning with the deep reinforcement learning model is omitted; if it has not, learning with the deep reinforcement learning model is performed.

According to this disclosure, a control device and a control method can be realized that can derive, for each workpiece, a gripping point at which that workpiece can be gripped well.

Figure 1 is a block diagram showing the configuration of the robot arm system of Embodiment 1.
Figure 2 is a diagram showing how the robot arm of the robot arm system of Embodiment 1 grips a workpiece.
Figure 3 is a flowchart illustrating the flow of controlling the robot arm of the robot arm system of Embodiment 1.
Figure 4 is a diagram showing a workpiece to be picked up by the robot arm, stored in a storage box.
Figure 5 is a block diagram showing the configuration of the robot arm system of Embodiment 2.
Figure 6 is a flowchart illustrating the flow of controlling the robot arm of the robot arm system of Embodiment 2.

Specific embodiments to which this disclosure is applied are described in detail below with reference to the drawings. However, this disclosure is not limited to the following embodiments. The following description and drawings are simplified as appropriate.

<Embodiment 1>
First, the configuration of the robot arm system of this embodiment is described. Figure 1 is a block diagram showing the configuration of the robot arm system of this embodiment. Figure 2 is a diagram showing how the robot arm of the robot arm system of this embodiment grips a workpiece.

As shown in Figure 1, the robot arm system 1 of this embodiment includes a robot arm 2, a force sensor 3, a camera 4, a control device 5, and a database (DB) 6. The robot arm 2, the force sensor 3, the camera 4, and the DB 6 are communicably connected to the control device 5 by wire or wirelessly.

The robot arm 2 is an articulated robot arm, like a typical robot arm, and includes a hand section 2a and an arm section 2b, as shown in Figure 2. The hand section 2a can be configured as, for example, a two-fingered hand. The arm section 2b can be configured as, for example, a six-axis robot. Each joint of the hand section 2a and the arm section 2b operates on the driving force of a motor 2c.

As shown in Figure 2, the force sensor 3 is arranged between the hand section 2a and the arm section 2b of the robot arm 2. In other words, the hand section 2a is connected to the arm section 2b via the force sensor 3. The camera 4 can be configured as, for example, an RGBD camera.

The camera 4 photographs, for example, the workpiece W before it is gripped by the robot arm 2. The camera 4 is preferably mounted on the robot arm 2, but it may be placed anywhere from which it can photograph the workpiece W before the workpiece W is gripped by the robot arm 2.

As shown in Figure 1, the control device 5 includes a shape recognition unit 5a, a reinforcement learning unit 5b, a path generation unit 5c, and a control unit 5d. The shape recognition unit 5a recognizes the shape of the workpiece W based on image information captured by the camera 4. A general method can be used to recognize the shape of the workpiece W.

As described in detail later, the reinforcement learning unit 5b derives the center of gravity G of the workpiece W when the workpiece W is picked up by the robot arm 2, based on detection information from the force sensor 3, and uses a deep reinforcement learning model to learn the gripping point P of the workpiece W that minimizes the amount of deviation from the center of gravity G of the workpiece W.

The deep reinforcement learning model takes as inputs the amount of deviation between the center of gravity G of the workpiece W and the gripping point P of the workpiece W when the workpiece W is picked up by the robot arm 2, and the joint information of the robot arm 2 when the workpiece W is picked up by the robot arm 2, and outputs the gripping point P of the workpiece W. The center of gravity G, the gripping point P, and the amount of deviation are preferably expressed, for example, as three-dimensional coordinate information with a predetermined position as the origin.
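The input and output layout described above can be sketched as follows. This is only an illustration: the function names, the sign convention (G minus P), and the flat-list encoding are assumptions, and the disclosure does not specify how the model is implemented.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # (x, y, z) relative to a predetermined origin


def deviation(center_of_gravity: Point3, grip_point: Point3) -> Point3:
    """Deviation of the center of gravity G from the gripping point P (assumed G - P)."""
    return (
        center_of_gravity[0] - grip_point[0],
        center_of_gravity[1] - grip_point[1],
        center_of_gravity[2] - grip_point[2],
    )


def build_model_input(
    center_of_gravity: Point3,
    grip_point: Point3,
    joint_values: Sequence[float],
) -> List[float]:
    """Concatenate the 3-D deviation and the joint information into one input vector.

    The model's output would be the next gripping point P, again a Point3.
    """
    return list(deviation(center_of_gravity, grip_point)) + list(joint_values)
```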

As described in detail later, the path generation unit 5c generates a path along which the robot arm 2 approaches either the gripping point P derived from the recognized shape of the workpiece W or the gripping point P learned by the deep reinforcement learning model, and picks up the workpiece W. It then derives, as joint information of the robot arm 2 for realizing the generated path, the angle of each joint and the like.

The path generation unit 5c also generates a path for moving the workpiece W, gripped at the gripping point P learned by the deep reinforcement learning model, to a predetermined location, and derives, as joint information of the robot arm 2 for realizing the generated path, the angle of each joint and the like.

The control unit 5d controls each motor 2c of the robot arm 2 based on the derived angle information and the like for each joint of the robot arm 2. In doing so, the control unit 5d preferably controls each motor 2c based on detection information from an encoder 2d provided on that motor 2c. The DB 6 stores the deep reinforcement learning model and, for each gripping point P of the workpiece W, result information indicating whether gripping of the workpiece W succeeded.

Next, the flow of controlling the robot arm 2 of the robot arm system 1 of this embodiment is described. Figure 3 is a flowchart illustrating the flow of controlling the robot arm of the robot arm system of this embodiment.

Here, Figure 4 shows a workpiece W to be picked up by the robot arm 2, stored in a storage box 11, and the following description assumes that this workpiece W is picked up. In the example of Figure 4, a plate-shaped workpiece W whose shape is eccentric in side view, as shown in Figure 2, is stored inside the storage box 11, leaning against a partition 11a provided in the storage box 11.

In this state, the camera 4 first photographs the workpiece W from above. The shape recognition unit 5a of the control device 5 then recognizes the shape of the workpiece W based on the image information of the workpiece W captured by the camera 4 (S1). At this time, the shape recognition unit 5a preferably recognizes the shape of the workpiece W in top view. In the example of Figure 4, the shape recognition unit 5a recognizes that the workpiece W is a thin plate in top view.

Next, the path generation unit 5c of the control device 5 derives the gripping point P of the workpiece W based on the recognized shape of the workpiece W (S2). For example, if the workpiece W is a thin plate in top view, the path generation unit 5c sets the center of the width of the workpiece W (the left-right direction in Figure 2) as the gripping point P, as shown in Figure 2. However, the path generation unit 5c may, for example, simply set the approximate center of the workpiece W in top view as the gripping point P, based on the recognized shape of the workpiece W.
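Step S2 can be illustrated with a minimal sketch that takes the center of the bounding box of the workpiece in a top-view image as the initial gripping point. The binary-mask representation and the function name `initial_grip_point` are assumptions made for this example; the disclosure leaves the shape recognition method open.

```python
from typing import Sequence, Tuple


def initial_grip_point(mask: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (row, col) of the bounding-box center of the nonzero pixels.

    `mask` is a top-view binary image in which nonzero pixels belong to the workpiece.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        raise ValueError("no workpiece pixels in mask")
    # Center of the extent in each direction, i.e. the middle of the width.
    return ((min(rows) + max(rows)) / 2.0, (min(cols) + max(cols)) / 2.0)
```

For a thin plate seen from above, this places the first grasp at the middle of the plate's width; the later learning steps then shift the gripping point toward the center of gravity.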

Next, the path generation unit 5c of the control device 5 generates a path along which the robot arm 2 approaches the derived gripping point P of the workpiece W and picks up the workpiece W, and derives, as joint information of the robot arm 2 for realizing the generated path, the angle of each joint and the like.

The control unit 5d of the control device 5 then controls each motor 2c of the robot arm 2 based on the derived angle information and the like for each joint of the robot arm 2, and picks up the workpiece W (S3). Here, the workpiece W is preferably picked up by, for example, lifting it vertically by a predetermined height.

Next, the reinforcement learning unit 5b of the control device 5 derives the center of gravity G of the workpiece W when the workpiece W is picked up by the robot arm 2, based on detection information from the force sensor 3, and acquires the amount of deviation between the center of gravity G and the gripping point P of the workpiece W at pickup, together with the joint information of the robot arm 2 at pickup (S4).

Here, a general method can be used to derive the center of gravity G of the workpiece W using the force sensor 3. The amount of deviation between the center of gravity G of the workpiece W and the gripping point P of the workpiece W is preferably the deviation in the horizontal direction.
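One commonly used static estimate is sketched below; it is offered only as an example and is not necessarily the method intended by the disclosure. It assumes that the workpiece is held at rest, that the sensor z axis is vertical, that the weight of the hand section has already been compensated, and that the gripping point is known in the sensor frame. Under these assumptions the torque relation τ = r × F reduces to τx = ry·Fz and τy = −rx·Fz. The function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def horizontal_cog_offset(force: Vec3, torque: Vec3, min_load: float = 1e-6) -> Tuple[float, float]:
    """Horizontal position (x, y) of the center of gravity in the sensor frame.

    Assumes a static hold with gravity along the sensor z axis, so that
    torque = r x force reduces to tx = ry * fz and ty = -rx * fz.
    """
    _, _, fz = force
    tx, ty, _ = torque
    if abs(fz) < min_load:
        raise ValueError("vertical load too small to locate the center of gravity")
    return (-ty / fz, tx / fz)


def horizontal_deviation(force: Vec3, torque: Vec3, grip_xy: Tuple[float, float]) -> Tuple[float, float]:
    """Horizontal deviation of the center of gravity G from the gripping point P."""
    gx, gy = horizontal_cog_offset(force, torque)
    return (gx - grip_xy[0], gy - grip_xy[1])
```

For example, a 2 kg workpiece (Fz of about −19.62 N) producing torques of 0.3924 N·m about x and 0.981 N·m about y has its center of gravity about 0.05 m along x and −0.02 m along y from the sensor origin.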

The joint information of the robot arm 2 preferably includes, in addition to the angle information for each joint when the robot arm 2 grips the workpiece W in order to pick it up, at least one of angular velocity information, angular acceleration information, and torque information for the motors 2c.

Next, the reinforcement learning unit 5b of the control device 5 inputs into the deep reinforcement learning model the amount of deviation between the center of gravity G and the gripping point P of the workpiece W acquired in step S4, together with the joint information of the robot arm 2 when the workpiece W was picked up, and learns the gripping point P of the workpiece W that minimizes the amount of deviation from the center of gravity G of the workpiece W (S5).

At this time, if the gripping point P of the workpiece W is confined to a narrow local range, the robot arm 2 may be unable to grip the workpiece W at that gripping point P because of, for example, the shape of the workpiece W or interference with the storage box 11. The reinforcement learning unit 5b of the control device 5 therefore preferably distributes the gripping points P of the workpiece W over a predetermined range around the center of gravity G of the workpiece W (in the example of Figure 2, a predetermined range in the width direction of the workpiece W around its center of gravity G).
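Distributing gripping points over a predetermined range can be sketched, for instance, as evenly spaced candidate offsets along the width direction around the center of gravity G. The even spacing, the function name, and the parameters are assumptions for this illustration; the disclosure does not state how the points are distributed.

```python
from typing import List


def candidate_grip_offsets(half_range: float, count: int) -> List[float]:
    """Evenly spaced offsets in [-half_range, +half_range] along the width direction.

    Each offset is measured from the center of gravity G; offset 0.0 is G itself.
    """
    if count < 1 or half_range < 0:
        raise ValueError("count must be >= 1 and half_range must be >= 0")
    if count == 1:
        return [0.0]
    step = 2.0 * half_range / (count - 1)
    return [-half_range + i * step for i in range(count)]
```

With several candidates available, a point slightly off the center of gravity can still be chosen when the zero-deviation point cannot be reached.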

Next, the path generation unit 5c of the control device 5 generates a path along which the robot arm 2 approaches the learned gripping point P of the workpiece W and picks up the workpiece W, and derives, as joint information of the robot arm 2 for realizing the generated path, the angle of each joint and the like.

The control unit 5d of the control device 5 then controls each motor 2c of the robot arm 2 based on the derived angle information and the like for each joint of the robot arm 2, and picks up the workpiece W (S6).

Next, the reinforcement learning unit 5b of the control device 5 derives, based on detection information from the force sensor 3, the center of gravity G of the workpiece W when the robot arm 2 grips the workpiece W at the learned gripping point P and picks it up, and acquires the amount of deviation between the center of gravity G and the gripping point P of the workpiece W at pickup, together with the joint information of the robot arm 2 at pickup (S7).

Next, the reinforcement learning unit 5b of the control device 5 inputs into the deep reinforcement learning model the amount of deviation between the center of gravity G and the gripping point P of the workpiece W acquired in step S7, together with the joint information of the robot arm 2 when the workpiece W was picked up, and relearns the gripping point P of the workpiece W that minimizes the amount of deviation from the center of gravity G of the workpiece W (S8).

At this time, if the reinforcement learning unit 5b distributed the gripping points P of the workpiece W over a predetermined range around the center of gravity G in step S5, it is preferable to also input into the deep reinforcement learning model result information indicating whether gripping of the workpiece W succeeded, and to relearn the gripping point P of the workpiece W that minimizes the amount of deviation from the center of gravity G of the workpiece W.
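How the success result might enter the learning can be sketched as follows. Note that this example replaces the deep reinforcement learning model with a plain average-reward table over candidate offsets, purely to keep the illustration short; the reward shape, the penalty value, and all names are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def grasp_reward(deviation_m: float, success: bool, failure_penalty: float = 1.0) -> float:
    """Higher is better: a failed grasp gets a fixed penalty, a successful one -|deviation|."""
    return -abs(deviation_m) if success else -failure_penalty


def best_candidate(trials: Iterable[Tuple[float, float, bool]]) -> float:
    """Pick the candidate offset with the highest mean reward.

    Each trial is (candidate_offset, measured_deviation, grasp_succeeded).
    This table lookup is a stand-in for the learned model, not the model itself.
    """
    rewards: Dict[float, List[float]] = {}
    for offset, deviation_m, success in trials:
        rewards.setdefault(offset, []).append(grasp_reward(deviation_m, success))
    if not rewards:
        raise ValueError("no trials recorded")
    return max(rewards, key=lambda k: sum(rewards[k]) / len(rewards[k]))
```

In this sketch, a candidate directly at the center of gravity that cannot be gripped (for example because of interference with the storage box) loses to the nearest candidate that can be gripped.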

As a result, the workpiece W can be gripped reliably at the gripping point P that minimizes the amount of deviation from the center of gravity G of the workpiece W. Therefore, even if the workpiece W is large, for example, the moment acting on the workpiece W when it is gripped can be reduced, and the workpiece W can be picked up well.
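The reduction in moment can be illustrated with a simple static estimate. Assuming the workpiece W is held at rest, has mass m, and its center of gravity G is offset horizontally from the gripping point P by a distance d, the gravitational moment about the gripping point has the magnitude shown below. The symbols m, g, d, and M and the numerical values are introduced for this illustration only and do not appear in the disclosure.

```latex
M = m\,g\,d
\qquad \text{e.g. } m = 10\ \mathrm{kg},\ g = 9.81\ \mathrm{m/s^2}:
\quad d = 0.05\ \mathrm{m} \Rightarrow M \approx 4.9\ \mathrm{N\,m},
\quad d = 0.005\ \mathrm{m} \Rightarrow M \approx 0.49\ \mathrm{N\,m}
```

Under these assumptions the moment scales linearly with the deviation, so reducing the deviation by a factor of ten reduces the moment the hand must resist by the same factor.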

Next, the path generation unit 5c of the control device 5 generates a path along which the robot arm 2 approaches the relearned gripping point P of the workpiece W, picks up the workpiece W, and moves it to a predetermined location, and derives, as joint information of the robot arm 2 for realizing the generated path, the angle of each joint and the like.

The control unit 5d of the control device 5 then controls each motor 2c of the robot arm 2 based on the derived angle information and the like for each joint of the robot arm 2, picks up the workpiece W, and moves it to the predetermined location (S9).
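Steps S1 to S9 can be summarized as the following control-flow sketch. Every callable passed in is a hypothetical placeholder for the corresponding unit (shape recognition, path generation and control, force sensing, learning), the gripping point is reduced to a single coordinate along the width direction, and the `learn` callable receives the current gripping point as an extra argument for convenience; none of these choices come from the disclosure.

```python
from typing import Callable, List, Tuple


def run_pickup_cycle(
    recognize_shape: Callable[[], dict],
    initial_grip: Callable[[dict], float],
    pick_up: Callable[[float], None],
    measure: Callable[[float], Tuple[float, List[float]]],
    learn: Callable[[float, float, List[float]], float],
    pick_up_and_move: Callable[[float], None],
) -> float:
    """Run steps S1 to S9 once and return the relearned gripping point."""
    shape = recognize_shape()                # S1: recognize the shape from the camera image
    grip = initial_grip(shape)               # S2: gripping point derived from the shape
    pick_up(grip)                            # S3: first pickup
    deviation, joints = measure(grip)        # S4: deviation of G from P and joint information
    grip = learn(grip, deviation, joints)    # S5: learn the gripping point
    pick_up(grip)                            # S6: pickup at the learned gripping point
    deviation, joints = measure(grip)        # S7: measure again
    grip = learn(grip, deviation, joints)    # S8: relearn the gripping point
    pick_up_and_move(grip)                   # S9: pick up and move to the destination
    return grip
```

With a toy `learn` that simply shifts the gripping point by the measured deviation, one cycle moves a grasp that started at the middle of the width onto the center of gravity.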

In this way, the control device 5 and the control method of this embodiment use a deep reinforcement learning model to learn the gripping point P of the workpiece W that minimizes the amount of deviation from the center of gravity G of the workpiece W. The control device 5 and the control method of this embodiment can therefore easily derive, for each workpiece W, a gripping point P at which that workpiece W can be gripped well.

Moreover, there is no need for a teaching process in which an operator inputs the gripping point P for each workpiece W. Compared with the control device of Patent Document 1, the range of workpieces W that can be gripped can be expanded easily, and the workpiece W can be picked up regardless of its shape or packing style.

<Embodiment 2>
Figure 5 is a block diagram showing the configuration of the robot arm system of this embodiment. As shown in Figure 5, the robot arm system 21 of this embodiment is substantially the same as the robot arm system 1 of Embodiment 1.

Redundant description is therefore omitted. As shown in Figure 5, however, the control device 22 includes a determination unit 22a, which determines whether the workpiece W to be picked up this time has already been learned and, based on the result, decides whether to learn the gripping point P of that workpiece W.

The DB 23 stores information on the types of workpieces W to be picked up by the robot arm 2 and the order in which they are picked up, information on the types of learned workpieces W, information on the gripping points P of learned workpieces W, joint information for learned workpieces W, and the like.

Figure 6 is a flowchart illustrating the flow of controlling the robot arm of the robot arm system of this embodiment. As shown in Figure 6, the flow of controlling the robot arm 2 of the robot arm system 21 of this embodiment is substantially the same as the flow of controlling the robot arm 2 of the robot arm system 1 of Embodiment 1.

Redundant description is therefore omitted. Before step S1, however, the determination unit 22a of the control device 22 reads from the DB 23 the information on the types of workpieces W to be picked up by the robot arm 2 and the order in which they are picked up, together with the information on the types of learned workpieces W, and determines, based on this information, whether the workpiece W to be picked up this time has already been learned (S11).

If the workpiece W to be picked up this time has not been learned (NO in S11), steps S1 to S9 are executed, and the robot arm 2 is controlled to pick up and move the workpiece W. In other words, the control device 22 learns the gripping point of the workpiece W to be picked up this time.

On the other hand, if the workpiece W to be picked up this time has already been learned (YES in S11), the control unit 5d of the control device 22 reads the joint information stored for that workpiece W from the DB 23, controls each motor 2c of the robot arm 2 based on the read joint information to attempt to grip the workpiece W, and determines whether gripping of the workpiece W succeeded (S12).

If gripping of the workpiece W fails (NO in S12), steps S1 to S9 are executed, and the robot arm 2 is controlled to pick up and move the workpiece W. A possible cause in this case is, for example, that although the workpiece W has been learned, its packing style has changed and gripping therefore failed.

On the other hand, if gripping of the workpiece W succeeds (YES in S12), each motor 2c of the robot arm 2 is controlled based on the read joint information for the workpiece W, and movement of the workpiece W continues (S13). In other words, the control device 22 omits learning of the gripping point of the workpiece W to be picked up this time.
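The branching of steps S11 to S13 can be sketched as follows. The dictionary standing in for the DB 23, the callables, the return labels, and the `verify_grasp` flag (which models the optional nature of step S12 in this embodiment) are all assumptions for illustration; writing the newly learned joint information back to the store is likewise assumed rather than stated.

```python
from typing import Callable, Dict, List


def handle_workpiece(
    kind: str,
    learned_db: Dict[str, List[float]],
    try_grasp: Callable[[List[float]], bool],
    run_learning_cycle: Callable[[str], List[float]],
    continue_move: Callable[[List[float]], None],
    verify_grasp: bool = True,
) -> str:
    """Return which branch was taken: 'learned', 'relearned', or 'reused'."""
    joints = learned_db.get(kind)                       # S11: already learned?
    if joints is None:
        learned_db[kind] = run_learning_cycle(kind)     # NO in S11: run S1 to S9
        return "learned"
    if verify_grasp and not try_grasp(joints):          # S12: attempt grasp with stored data
        learned_db[kind] = run_learning_cycle(kind)     # NO in S12: run S1 to S9 again
        return "relearned"
    continue_move(joints)                               # S13: continue moving, skip learning
    return "reused"
```

Only the first and second branches invoke the learning cycle, which is where the reduction in computational load for already learned workpieces comes from.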

In this way, the control device 22 and the control method of this embodiment also use a deep reinforcement learning model to learn the gripping point P of the workpiece W that minimizes the amount of deviation from the center of gravity G of the workpiece W. The control device 22 and the control method of this embodiment can therefore easily derive, for each workpiece W, a gripping point P at which that workpiece W can be gripped well.

In particular, the control device 22 and the control method of this embodiment determine whether the workpiece W to be picked up this time has already been learned and, depending on the result, omit learning with the deep reinforcement learning model. It is therefore unnecessary to learn the gripping point P every time a workpiece W is picked up, which reduces the computational load on the control device 22.

In this embodiment, the determination unit 22a of the control device 22 reads from the DB 23 information on the types of workpieces W to be picked up by the robot arm 2 and the order in which they are picked up, together with information on the types of workpieces W that have already been learned, and uses this information to determine whether the workpiece W to be picked up this time has been learned. Alternatively, this determination may be made based on the shape of the workpiece W recognized from image information obtained by photographing the workpiece W to be picked up this time.
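The database-based determination can be pictured with the short Python sketch below. The class `PickPlan` and its fields are hypothetical and only mirror the kinds of information named in the text (workpiece types in pick-up order and the set of learned types); the actual layout of the DB 23 is not given in the source.

```python
from dataclasses import dataclass, field


@dataclass
class PickPlan:
    """Hypothetical view of the database contents used by the determination unit."""
    pick_order: list                                  # workpiece types in pick-up order
    learned_types: set = field(default_factory=set)   # types already learned

    def is_learned(self, pick_index):
        """True if the workpiece picked at position pick_index is a learned type."""
        return self.pick_order[pick_index] in self.learned_types


if __name__ == "__main__":
    plan = PickPlan(["bracket", "gear", "bracket"], {"bracket"})
    for i, kind in enumerate(plan.pick_order):
        print(i, kind, plan.is_learned(i))
```

An image-based variant would replace the lookup by pick-up position with a lookup keyed on the recognized shape.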

Furthermore, although step S12 is executed in this embodiment, step S12 may be omitted.

<Other Embodiments>
In the embodiments described above, the present disclosure was described as a hardware configuration, but the present disclosure is not limited to this. The present disclosure can also be realized by causing a CPU (Central Processing Unit) to execute a program for each process.

Here, the program includes a set of instructions (or software code) that, when loaded into a computer, causes the computer to perform one or more functions. The program may be stored on a non-transitory computer-readable medium or a tangible storage medium. By way of example and not limitation, computer-readable media or tangible storage media include random-access memory (RAM), read-only memory (ROM), flash memory, solid-state drives (SSD) or other memory technologies, CD-ROM, digital versatile discs (DVD), Blu-ray (registered trademark) discs or other optical disc storage, and magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. The program may also be transmitted on a transitory computer-readable medium or a communication medium. By way of example and not limitation, transitory computer-readable media or communication media include electrical, optical, acoustic, or other forms of propagated signals.

The present disclosure is not limited to the embodiments described above and may be modified as appropriate without departing from its spirit.

1 Robot arm system
2 Robot arm, 2a Hand unit, 2b Arm unit, 2c Motor, 2d Encoder
3 Force sensor
4 Camera
5 Control device, 5a Shape recognition unit, 5b Reinforcement learning unit, 5c Path generation unit, 5d Control unit
6 Database
11 Storage box
21 Robot arm system
22 Control device, 22a Determination unit
23 Database
G Center of gravity of workpiece
P Gripping point of workpiece
W Workpiece

Claims

A control device for a robot arm that picks up and moves a workpiece, the control device comprising:
a reinforcement learning unit that learns the gripping point of the workpiece that minimizes the amount of deviation from the center of gravity of the workpiece, using a deep reinforcement learning model that takes as inputs the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the robot arm picks up the workpiece and the joint information of the robot arm when the robot arm picks up the workpiece, and that outputs the gripping point of the workpiece;
a path generation unit that derives joint information of the robot arm so as to pick up the workpiece at the gripping point of the workpiece learned by the deep reinforcement learning model; and
a control unit that controls the robot arm based on the derived joint information of the robot arm,
wherein the reinforcement learning unit relearns the gripping point of the workpiece with the deep reinforcement learning model, using the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the robot arm is controlled based on the derived joint information and picks up the workpiece, together with the derived joint information of the robot arm.

The control device according to claim 1, further comprising a determination unit that acquires information on the type of workpiece to be picked up by the robot arm, the order in which workpieces are picked up, and the types of workpieces that have been learned, and that determines, based on this information, whether the workpiece to be picked up this time has been learned,
wherein, if the workpiece to be picked up this time has been learned, learning by the deep reinforcement learning model is omitted, and if the workpiece to be picked up this time has not been learned, learning by the deep reinforcement learning model is performed.

The control device according to claim 1 or 2, wherein, in addition to the angle information of each joint of the robot arm, the joint information of the robot arm includes at least one of the angular velocity information, angular acceleration information, or torque information of the motor that drives the joint of the robot arm.

A control method for a robot arm that picks up and moves a workpiece, the control method comprising:
a step of deriving the center of gravity of the workpiece when the robot arm picks up the workpiece, based on detection information from a force sensor provided on the robot arm;
a step of learning the gripping point of the workpiece that minimizes the amount of deviation from the center of gravity of the workpiece, using a deep reinforcement learning model that takes as inputs the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the robot arm picks up the workpiece and the joint information of the robot arm when the robot arm picks up the workpiece, and that outputs the gripping point of the workpiece;
a step of deriving joint information of the robot arm so that the workpiece is picked up at the gripping point of the workpiece learned by the deep reinforcement learning model;
a step of controlling the robot arm based on the derived joint information of the robot arm; and
a step of relearning the gripping point of the workpiece with the deep reinforcement learning model, using the amount of deviation between the center of gravity of the workpiece and the gripping point of the workpiece when the robot arm is controlled based on the derived joint information and picks up the workpiece, together with the derived joint information of the robot arm.

The control method according to claim 4, further comprising a step of acquiring information on the type of workpiece to be picked up by the robot arm, the order in which workpieces are picked up, and the types of workpieces that have been learned, and determining, based on this information, whether the workpiece to be picked up this time has been learned,
wherein, if the workpiece to be picked up this time has been learned, learning by the deep reinforcement learning model is omitted, and if the workpiece to be picked up this time has not been learned, learning by the deep reinforcement learning model is performed.