JP7798613B2

JP7798613B2 - Posture determination method

Info

Publication number: JP7798613B2
Application number: JP2022035673A
Authority: JP
Inventors: 真規義平; 忠明長谷川
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2022-03-08
Filing date: 2022-03-08
Publication date: 2026-01-14
Anticipated expiration: 2042-03-08
Also published as: JP2023131029A

Description

The present invention relates to a posture determination method.

In conventional end effector control, wrist postures derived from fingertip contact positions that allow stable grasping are stored in a database, using polygon information of known objects obtained offline. Online, the wrist positions stored in that database are retrieved by conditional search. Devices for manipulating the posture of a robot have also been proposed (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 6476358

However, with the conventional technology, when an end effector is remotely operated to grasp an unknown object, the wrist posture data for known objects stored offline in the database contains no information such as the orientations in which the unknown object can be grasped. As a result, the posture of the end effector and the like could not be determined.

The present invention has been made in view of the above problem, and aims to provide a posture determination method that can determine a posture online.

(1) To achieve the above object, a posture determination method according to one aspect of the present invention is a posture determination method for a movement mechanism that is connected to an end effector capable of grasping and manipulating a target object and having a plurality of operation postures, and that can move the position of the end effector. The method includes: a measurement step of measuring the target object with a sensor; a calculation step of calculating a plurality of points indicating the center of the measured target object; a grasp representative point determination step of determining one of the plurality of points as a grasp representative point corresponding to one of a plurality of selected operation shapes; and a posture determination step of determining the posture of the movement mechanism from the grasp representative point.

(2) In the posture determination method according to one aspect of the present invention, the posture determination step may determine the posture of the movement mechanism based on the postures of the target object and the movement mechanism, and based on a taxonomy relating to grasping of the object to be grasped.

(3) In the posture determination method according to one aspect of the present invention, the posture determination step may determine the grasp representative point using a model trained in advance with a pose scene graph that represents, as nodes and edges, the relationship between the end effector and the object to be grasped, the relationship between the target object and objects around the target object, and the posture of the target object.

(4) In the posture determination method according to one aspect of the present invention, the nodes of the pose scene graph may be based on the coordinates of the target object as seen from the end effector, and the edges of the pose scene graph may be costs based on the relationship between the end effector and the object to be grasped, the relationship between the target object and objects around the target object, and the posture of the target object.

(5) In the posture determination method according to one aspect of the present invention, the posture determination step may determine the grasp representative point using a model trained in advance also with a cost scene graph that represents, as nodes and edges, the relative pose between the end effector and the target object, the target object and the plurality of points indicating its center, and the postures that the movement mechanism can take.

(6) In the posture determination method according to one aspect of the present invention, the cost scene graph may hold, as edge costs, the relative pose of the grasp center with respect to the plurality of points indicating the center of the target object, information based on the difference between the curved surface of the target object around those points and the curved surface of a finger portion of the end effector, and a Quality Measure value.

According to (1) to (6), the posture can be determined online.

FIG. 1 is a diagram for explaining an example of a method for determining a point indicating a grasp center according to an embodiment.
FIG. 2 is a diagram showing examples of center lines when the size of an object to be grasped changes.
FIG. 3 is a diagram showing examples of center lines for three-dimensional objects.
FIG. 4 is a diagram for explaining selection of a grasp representative point.
FIG. 5 is a diagram for explaining determination of a grasp representative point in grasping an object.
FIG. 6 is a diagram showing examples of names in a taxonomy.
FIG. 7 is a conceptual diagram of the contexts of object-object relationships and object-action relationships.
FIG. 8 is a diagram for explaining a scene.
FIG. 9 is a diagram for explaining a graph.
FIG. 10 is a diagram for explaining learning of the direction in which the hand lies and of the object posture according to the embodiment.
FIG. 11 is a diagram for explaining an example of use of a learning result according to the embodiment.
FIG. 12 is a diagram showing a scene of the relationship between a hand and an object and a simplified graph representation thereof.
FIG. 13 is a diagram showing an overview of a wrist posture estimation process.
FIG. 14 is a diagram for explaining a cost scene graph.
FIG. 15 is a diagram showing a configuration example of an end effector that performs work according to the embodiment.
FIG. 16 is a diagram showing a configuration example of a control device according to the embodiment.
FIG. 17 is a flowchart of a processing procedure performed by a wrist posture determination device during learning according to the embodiment.
FIG. 18 is a flowchart of a processing procedure performed by the wrist posture determination device during work according to the embodiment.
FIG. 19 is a diagram for explaining a method for determining a wrist posture when no model is used.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings used in the following description, the scale of each member is changed as appropriate so that each member is large enough to be recognized.
In all the drawings for explaining the embodiments, components having the same function are given the same reference numerals, and repeated explanations are omitted.
In this application, "based on XX" means "based on at least XX" and includes cases based on other elements in addition to XX. "Based on XX" is also not limited to cases where XX is used directly, and includes cases based on the result of calculation or processing applied to XX. "XX" is any element (for example, any information).

<Overview>
In the posture determination method of the embodiment, a wrist posture that can be processed online is determined according to: the geometric structure of the object (mainly skeleton information of an unknown polygon); the grasp center of the taxonomy, the fingertip curvature during grasping, and the Quality Measure around the fingertip contact points (minimum sum of squares of lateral (friction) forces); and a scene graph of the hand, the object, and surrounding environment objects.
In addition, the posture determination method of the embodiment defines a pose graph from the differences in relative pose between the hand and the object to be grasped, between the object to be grasped and surrounding environment objects, and between the object to be grasped and the floor surface. It also defines a graph whose costs are the relative pose of the grasp center with respect to the skeleton, the convolution of the difference between the object's curved surface around the skeleton and the finger's curved surface, and the Quality Measure value. Furthermore, the wrist posture is determined by convolving the edges of these two scene graphs around each node and finding the node that can give the minimum cost.
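The Quality Measure mentioned in the overview is described only as the sum of squares of lateral (friction) forces at the fingertip contacts, with smaller values preferred. As a minimal sketch of one way to evaluate such a quantity, the following assumes each contact is given as a force vector and a unit surface normal; the function names and the contact representation are illustrative assumptions and do not come from the patent.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def lateral_force(force: Vec3, unit_normal: Vec3) -> Vec3:
    """Part of the contact force tangential to the surface.

    Assumes unit_normal has length 1 (hypothetical input convention).
    """
    normal_magnitude = dot(force, unit_normal)
    return tuple(f - normal_magnitude * n for f, n in zip(force, unit_normal))


def quality_measure(contacts: Iterable[Tuple[Vec3, Vec3]]) -> float:
    """Sum of squared lateral force magnitudes over all contacts.

    A lower value means less reliance on friction at the fingertips.
    """
    total = 0.0
    for force, unit_normal in contacts:
        tangential = lateral_force(force, unit_normal)
        total += dot(tangential, tangential)
    return total
```

Under this reading, candidate grasps would be compared by this value and the one with the smallest sum would be preferred.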

<Example of method for determining the point indicating the grasp center>
FIG. 1 is a diagram for explaining an example of a method for determining a point indicating a grasp center according to this embodiment.
First, as in image g100, a black-and-white image g101 of the object to be grasped viewed from above is created so that the object can be represented as a polygon. Voronoi processing then yields line g102, a set of neighboring points indicating where the center of the polygon lies. An axis g103 is then defined either perpendicular to line g102, the set of points indicating the center of the target object, or in a direction orthogonal to the outer surface of the object to be grasped. These are used, for example, to calculate the frictional force when two finger portions pinch the object across its long or short axis.
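The center-line extraction described above can be illustrated with a small sketch. The patent uses Voronoi processing; the code below swaps in a simpler stand-in, keeping the object pixels whose distance to the nearest background pixel is a local maximum (a crude medial-axis approximation). The function name `center_line` and the list-of-lists mask format are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def center_line(mask: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Approximate center-line pixels of a binary top-view image.

    mask[r][c] is 1 for object pixels and 0 for background. Pixels outside
    the image are treated as background by padding with a ring of zeros.
    """
    width = len(mask[0])
    padded = [[0] * (width + 2)]
    padded += [[0] + list(row) + [0] for row in mask]
    padded += [[0] * (width + 2)]

    background = [(r, c) for r, row in enumerate(padded)
                  for c, v in enumerate(row) if not v]

    # Brute-force Euclidean distance from every object pixel to the background.
    dist = {}
    for r, row in enumerate(padded):
        for c, v in enumerate(row):
            if v:
                dist[(r, c)] = min(math.hypot(r - br, c - bc)
                                   for br, bc in background)

    # Keep pixels whose distance is not exceeded by any of the 8 neighbours.
    points = []
    for (r, c), d in sorted(dist.items()):
        neighbours = [dist.get((r + dr, c + dc), 0.0)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)]
        if all(d >= n for n in neighbours):
            points.append((r - 1, c - 1))  # undo the padding offset
    return points
```

For an elongated rectangle this returns the pixels along its middle row, which plays the role of line g102. A production system would more likely use a dedicated skeletonization or Voronoi routine.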

Image g110 is an example image in which the center line generated in this way has been copied into 2D or 3D.
Image g120 is an image (polygon) of a rock, the object to be grasped g121, viewed obliquely from the side. The set of multiple neighboring points g122 is the target grasp center, and point g123 is the grasp center of the end effector. In this embodiment, the multiple neighboring points g122 representing the center of the object are also referred to as the center line.

The rule applied during creation is that, if the center line is short (for example, 5 samples or fewer), the object center frame is selected as the target grasp pose. Above the target grasp pose, only the quaternion (0, 0, 0, 1) is used, and the finger direction is the same as the x-axis of the body frame.
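The short-center-line rule can be sketched as follows. This is only an illustration under stated assumptions: the quaternion is taken to be in (x, y, z, w) order, so that (0, 0, 0, 1) is the identity rotation, and the handling of long center lines (one identity-oriented candidate per point) is a guess rather than something the text specifies. `Pose` and `target_grasp_poses` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]

IDENTITY_QUATERNION: Quat = (0.0, 0.0, 0.0, 1.0)  # assumed (x, y, z, w) order
SHORT_LINE_MAX_SAMPLES = 5  # "for example, 5 samples or fewer" in the text


class Pose(NamedTuple):
    position: Vec3
    quaternion: Quat


def target_grasp_poses(center_line: Sequence[Vec3],
                       object_center: Vec3) -> List[Pose]:
    """Return candidate target grasp poses for a center line.

    Short center lines collapse to the object center frame; otherwise every
    center-line point becomes a candidate (an assumption of this sketch).
    """
    if len(center_line) <= SHORT_LINE_MAX_SAMPLES:
        return [Pose(object_center, IDENTITY_QUATERNION)]
    return [Pose(point, IDENTITY_QUATERNION) for point in center_line]
```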

FIG. 2 is a diagram showing examples of the center line when the size of the object to be grasped changes.
As shown in FIG. 2, the smaller the target object, the shorter the center line. If the target object is a sphere or a circle, the center line reduces to a single point because the multiple points converge at the center. A center line made up of multiple adjacent points also carries information about the size of the object, its longitudinal direction, and its orientation.
In FIG. 2, reference symbol g11 denotes the target grasp center of the target object, and reference symbol g12 denotes the grasp center of the end effector.
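To make concrete how a center line can carry size and longitudinal-direction information, the sketch below computes the polyline length and the end-to-end unit direction of an ordered list of center-line points. The function name and the returned dictionary layout are illustrative assumptions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def center_line_summary(points: Sequence[Tuple[float, ...]]) -> Dict[str, object]:
    """Length and longitudinal direction of an ordered center line.

    A single point (e.g. a sphere or circle) has zero length and no direction.
    """
    if len(points) < 2:
        return {"length": 0.0, "direction": None}

    # Sum of the distances between consecutive points.
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))

    # Unit vector from the first point to the last point.
    end_to_end = tuple(q - p for p, q in zip(points[0], points[-1]))
    norm = math.hypot(*end_to_end)
    direction: Optional[Tuple[float, ...]] = (
        tuple(v / norm for v in end_to_end) if norm > 0.0 else None
    )
    return {"length": length, "direction": direction}
```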

FIG. 3 is a diagram showing examples of center lines for three-dimensional objects. As described above, center line detection converts the image to black and white and then creates the center line. Image g201 is an example of center line detection for a torus-shaped three-dimensional object. Reference symbol g202 denotes the three-dimensional object, line g203 represents a center line candidate, and line g204 represents example center line information obtained by averaging line g203. Image g205 is an example of center line detection for a screw-shaped three-dimensional object. Reference symbol g206 denotes the three-dimensional object, line g207 represents a center line candidate, and line g208 represents example center line information obtained by averaging line g207. Center lines can be generated with well-known techniques even for an object wound with something like a thin string, a doll-like object, an animal, or a non-continuous object.

<Determination of the grasp representative point>
As described above, the center line is made up of multiple points. When grasping, one point (the grasp representative point) must be selected from these points. The selection of the grasp representative point is explained below.
FIG. 4 is a diagram for explaining the selection of the grasp representative point. When a trained model is not used, the point selected from the multiple points is one that is near the center of gravity of the object model, lies in the direction from which the end effector approaches, and is likely to allow a stable grasp without tilting.
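The model-free selection just described can be sketched as a simple scoring rule. The patent gives no formula, so the weighted sum below (distance to the center of gravity plus distance to the approaching hand) and its default weights are assumptions chosen only to illustrate the idea; closeness to the center of gravity is used here as a rough proxy for a grasp that does not tilt.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def select_representative_point(center_line: Sequence[Vec3],
                                center_of_gravity: Vec3,
                                hand_position: Vec3,
                                w_cog: float = 1.0,
                                w_approach: float = 0.5) -> Vec3:
    """Pick the center-line point with the lowest heuristic score.

    w_cog and w_approach are hypothetical tuning weights.
    """
    def score(point: Vec3) -> float:
        return (w_cog * math.dist(point, center_of_gravity)
                + w_approach * math.dist(point, hand_position))

    return min(center_line, key=score)
```

With the default weights, a point at the center of gravity wins unless another point is much closer to the hand; setting `w_cog` to zero reduces the rule to choosing the point nearest the approaching hand.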

However, in practice it can be difficult to determine the grasp representative point from such a center line alone.
FIG. 5 is a diagram for explaining determination of the grasp representative point in grasping an object. FIG. 6 is a diagram showing examples of names in the taxonomy.

Image g221 is an example of grasping the side of a bottle from the side with Medium-wrap in the taxonomy (see Reference 1).
Image g222 is an example in which the bottle is held in a holder or the like and is grasped from the side at the upper part of its side surface with Medium-wrap in the taxonomy.
Image g223 is an example in which the bottle is lying down and its side is pinched from above with Precision-3finger in the taxonomy.
Image g224 is an example of grasping the bottle from above with Tripod in the taxonomy.
Image g225 is an example of supporting the bottle from below with Medium-wrap in the taxonomy.

In images g221 to g225, the regions indicated by symbols g231 to g235 are regions that are easy to grasp. Such a region cannot be determined solely as the region around the center; the state of the target object and the direction from which the end effector approaches must also be considered. Grasping therefore requires a function that understands the context and changes the way of holding accordingly.

Reference 1: Thomas Feix, Javier Romero, et al., "The GRASP Taxonomy of Human Grasp Types," IEEE Transactions on Human-Machine Systems, Vol. 46, No. 1, Feb. 2016, pp. 66-77.

As explained with FIG. 5, the grasping scenes include holding a bottle, holding a bottle that is in a holder, holding a bottle lying on the ground, pinching the lid of a bottle, and supporting the bottom of a bottle. What these have in common is that two contexts must be understood: the relationship between objects, and the relationship between an object and an action.

FIG. 7 is a conceptual diagram of the contexts of object-object relationships and object-action relationships.
Image g251 illustrates the context of a relationship between objects. Image g252 illustrates the context of a relationship between an object and an action.
As in image g251, objects also have relationships with one another, for example when an object is sandwiched by something or placed on a desk or floor. Such a relationship (state) is a meaning, and the meaning is factored into the object-object relationship as a cost.
As in image g252, when an object is pinched, for example, there is the object and there is the meaning of the pinching action. The meaning is factored into the object-action relationship as a cost.

<Scene graph>
Next, the scene graph will be described.
FIG. 8 is a diagram for explaining a scene. This embodiment uses a scene graph to represent the environment surrounding an object. For example, the scene in FIG. 8 shows a bottle, a glass, and a bowl placed on a desk. The bottle has a lid. Expressed as a graph, this scene can be represented as in image g271 of FIG. 9. FIG. 9 is a diagram for explaining a graph. The graph represents the relationships between objects. For example, seen from the bottle the lid is above, and seen from the lid the bottle is below. Likewise, seen from the bottle the glass is to the right, and seen from the glass the bottle is to the left. The bottle, the glass, and the bowl are on the desk.

The relationship between objects in image g271 of FIG. 9 is expressed as node, edge, node, as in image g272 of FIG. 9, and the edge corresponds to a feature quantity that serves as the cost. The word expressing the relationship is converted to a numerical value and used as a label. In this way, a scene graph can express object-object relationships as a graph using nodes and labels.
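The scene of FIG. 8 and its graph form can be sketched with a small data structure. In this illustration each directed edge (viewer, target) stores a numeric label saying where the target lies as seen from the viewer; the `Relation` values and the helper name are assumptions, since the patent does not give the actual numeric encoding.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Relation(IntEnum):
    """Numeric labels for relation words (values are illustrative)."""
    ABOVE = 0
    BELOW = 1
    LEFT = 2
    RIGHT = 3


INVERSE = {
    Relation.ABOVE: Relation.BELOW,
    Relation.BELOW: Relation.ABOVE,
    Relation.LEFT: Relation.RIGHT,
    Relation.RIGHT: Relation.LEFT,
}

SceneGraph = Dict[Tuple[str, str], Relation]


def add_relation(graph: SceneGraph, viewer: str, target: str,
                 relation: Relation) -> None:
    """Store 'seen from viewer, target is <relation>' and the inverse edge."""
    graph[(viewer, target)] = relation
    graph[(target, viewer)] = INVERSE[relation]


def build_example_scene() -> SceneGraph:
    """Bottle (with lid), glass, and bowl on a desk, as in FIG. 8."""
    graph: SceneGraph = {}
    add_relation(graph, "bottle", "lid", Relation.ABOVE)
    add_relation(graph, "bottle", "glass", Relation.RIGHT)
    for item in ("bottle", "glass", "bowl"):
        add_relation(graph, item, "desk", Relation.BELOW)
    return graph
```

Storing both directions explicitly mirrors the paired statements in the text, such as the lid being above the bottle and the bottle being below the lid.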

<Learning the end effector direction and object posture>
FIG. 10 is a diagram for explaining learning of the direction in which the hand lies and of the object posture according to this embodiment. As in image g301, this embodiment learns in which direction the hand lies relative to the object, so that work on an unknown object can be performed with online processing. In addition, when a glass has fallen over, for example, it may be unclear from the object's orientation what "up" means: whether the mouth of the fallen glass is up, or whether the side of the glass is up. For this reason, this embodiment learns the direction and the object posture separately. That is, as relative poses, the direction in which the hand lies as seen from the object and the posture of the object are each learned in advance. Nodes are determined by captured images and sensor values, and edges are generated using a trained model. The graph representation is of the object as seen from the hand.

Image g302 is a conceptual diagram of direction learning, which learns in which direction the hand lies as seen from the target object (mass system). The feature quantities of an edge are, for example, the distance between nodes, the angle between nodes, and the position vector.
Image g303 is a conceptual diagram of object posture learning. The object posture is, for example, the object standing, the object not in contact with the floor or the like, the object fallen over (also called "lying" in the embodiment), or the object fallen over and not in contact with the ground. The feature quantities of an edge in this case are, for example, the object quaternion, the contact points and normals in the object coordinate system, and the gravitational acceleration.
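The direction-learning edge features named above (distance between nodes, angle between nodes, position vector) can be computed as in the following sketch. The text does not say which angle is meant; here it is assumed to be the angle between the object-to-hand vector and the vertical z-axis, and the function name and dictionary keys are likewise illustrative.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def direction_edge_features(object_position: Vec3,
                            hand_position: Vec3) -> Dict[str, object]:
    """Edge features between an object node and a hand node.

    'angle' is the angle in radians from the +z axis to the object-to-hand
    vector (an assumed definition): 0 means the hand is directly above.
    """
    vector = tuple(h - o for h, o in zip(hand_position, object_position))
    distance = math.sqrt(sum(v * v for v in vector))
    if distance == 0.0:
        angle = 0.0  # degenerate case: hand and object coincide
    else:
        cosine = max(-1.0, min(1.0, vector[2] / distance))
        angle = math.acos(cosine)
    return {"distance": distance, "angle": angle, "position_vector": vector}
```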

In this embodiment, the results learned in this way are used as shown in FIG. 11.
FIG. 11 is a diagram for explaining an example of use of the learning result according to this embodiment.
As a result of learning with the scene graph, the trained model can be used as follows. Suppose the relationship between the object and the hand is, for example, "direction: the hand is above or beside the object" and "posture: the object is standing." The grasp information in this case places a capsule, which is a grasp candidate region, on the lid or on the side surface. Alternatively, suppose the relationship is "direction: the hand is above or beside the object" and "posture: the object is lying down." The grasp information in this case places a capsule on the side surface.
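The two usage examples above amount to a mapping from (hand direction, object posture) to the regions where a grasp-candidate capsule is placed. In the embodiment this comes from the trained model; the sketch below replaces the model with a hand-written lookup table purely to make the input and output explicit. The string labels and the function name are assumptions.

```python
from typing import Dict, Tuple

# Stand-in for the trained model's output, covering only the two cases
# described in the text.
CAPSULE_REGIONS_BY_POSTURE: Dict[str, Tuple[str, ...]] = {
    "standing": ("lid", "side"),
    "lying": ("side",),
}

KNOWN_DIRECTIONS = ("above", "beside")


def capsule_regions(hand_direction: str, object_posture: str) -> Tuple[str, ...]:
    """Regions of the object on which a grasp-candidate capsule is placed."""
    if hand_direction not in KNOWN_DIRECTIONS:
        raise ValueError(f"unsupported hand direction: {hand_direction}")
    if object_posture not in CAPSULE_REGIONS_BY_POSTURE:
        raise ValueError(f"unsupported object posture: {object_posture}")
    return CAPSULE_REGIONS_BY_POSTURE[object_posture]
```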

<Scene graph of the hand-object relationship>
Next, expressing the relationship between the hand and the object as a scene graph gives FIG. 12. FIG. 12 is a diagram showing a scene of the relationship between the hand and the object and a simplified graph representation. The simplified graph representation in FIG. 12 shows the link arrows between nodes of FIG. 9 in simplified form. In this embodiment, such a scene graph is called a pose scene graph. In the pose graph, a node is, for example, the coordinates of the object as seen from the hand, and an edge consists of coordinates together with the sum of the relative pose between the hand and the object to be grasped, the relative pose between the object to be grasped and surrounding environment objects, and the relative pose between the object to be grasped and the floor surface.

<Wrist posture estimation process>
Next, an overview of the wrist posture estimation process will be described. FIG. 13 is a diagram showing an overview of the wrist posture estimation process. As shown in FIG. 13, this embodiment uses the pose scene graph described above and a cost scene graph to estimate the wrist posture.

The cost scene graph is described here. FIG. 14 is a diagram for explaining the cost scene graph. Image g351 shows a state in which the hand is above an object (a bowl). Image g352 shows a state in which the hand is at the side of an object (a bottle). Image g353 is the graph representation. In such cases, describing the relationship between the hand and the object also requires other information (hand position, finger angles, hand pose, friction, and so on). As a result, a single node may take multiple states, as shown by g354 in image g353. In this embodiment, a cost is assigned to each of these states. The cost uses the value of the Quality Measure (minimum sum of squares of lateral forces) around the fingertip contact points. In the cost graph, the nodes are the points constituting the center line multiplied by the wrist poses, and an edge consists of coordinates together with the sum of the relative pose of the grasp center with respect to the skeleton, the convolution of the difference between the object's curved surface around the skeleton and the finger's curved surface, and the Quality Measure value. A node is, for example, a position that the hand can hold or grasp. The number of nodes is the number of points constituting the center line multiplied by the number of possible wrist poses.
The reason is that, even if the hand is positioned above the object and the point to be grasped has been fixed, the way the object is grasped changes with the wrist posture.
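The node count stated above follows from pairing every center-line point with every candidate wrist pose. A minimal sketch, assuming points and wrist poses are simply given as sequences (their concrete representation is not specified in the text):

```python
from itertools import product
from typing import List, Sequence, Tuple, TypeVar

P = TypeVar("P")
W = TypeVar("W")


def cost_graph_nodes(center_line_points: Sequence[P],
                     wrist_poses: Sequence[W]) -> List[Tuple[P, W]]:
    """One node per (center-line point, wrist pose) combination."""
    return list(product(center_line_points, wrist_poses))
```

For example, three center-line points and four wrist poses give twelve nodes, each of which would then receive its own edge costs.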

In this way, this embodiment uses the pose graph to create the relative pose between the hand and the object to be grasped, the relative pose between the object to be grasped and the surrounding environment, or the relative pose between the object to be grasped and the floor surface.
Next, this embodiment uses the cost graph to create a scene graph whose edges carry costs based on: the relative pose of the grasp center with respect to the skeleton; the convolution of the difference between the object (or its surface or curved surface) around the skeleton and the finger's curved surface (whether a finger is near the object and, if so, which of the points constituting the center line should be the grasp representative point); and the Quality Measure value.
Furthermore, this embodiment determines the wrist posture by convolving the edge costs and finding the node that can give the minimum cost.
In addition, this embodiment accumulates in a database training data obtained by actually operating the end effector to grasp target objects.
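The minimum-cost node search can be sketched as follows. The text describes the edge costs of the two scene graphs as being convolved around each node; this illustration substitutes a plain sum of the incident edge costs from both graphs, which is an assumption, and the dictionary-based inputs and the function name are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def best_node(pose_edge_costs: Dict[Hashable, Iterable[float]],
              grasp_edge_costs: Dict[Hashable, Iterable[float]]
              ) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Aggregate incident edge costs per node and return the cheapest node.

    Each input maps a node to the costs of the edges around it, one mapping
    for the pose scene graph and one for the cost scene graph.
    """
    totals: Dict[Hashable, float] = {}
    for graph_costs in (pose_edge_costs, grasp_edge_costs):
        for node, edge_costs in graph_costs.items():
            totals[node] = totals.get(node, 0.0) + sum(edge_costs)
    winner = min(totals, key=totals.get)
    return winner, totals
```

If each node is a (center-line point, wrist pose) pair, the winning node directly yields both the grasp representative point and the wrist posture.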

This embodiment then provides a network layer (model) that determines the way of holding online using the pose scene graph and the cost scene graph, takes the accumulated training data as ground-truth data, and includes attention nodes associated with that ground-truth data.

During operation, the wrist posture for grasping can be estimated by inputting the actual current wrist posture into this network (the trained model).

<Configuration example of the end effector>
FIG. 15 is a diagram showing a configuration example of an end effector that performs work according to this embodiment. As shown in FIG. 15, the end effector 1 (hand) includes a finger portion 101, a finger portion 102, a finger portion 103, a finger portion 104, and a base 111. The end effector 1 is connected via a joint to an arm 121, which is a movement mechanism that can move the position of the end effector 1.

The finger portion 101 includes a force sensor 141, for example at its fingertip. The finger portion 102 includes a force sensor 142, for example at its fingertip. The finger portion 103 includes a force sensor 143, for example at its fingertip. The finger portion 104 includes a force sensor 144, for example at its fingertip. The finger portion 101 corresponds, for example, to a human thumb, the finger portion 102 to a human index finger, the finger portion 103 to a human middle finger, and the finger portion 104 to a human ring finger.
The end effector 1 includes at least two finger portions. The number of finger portions may be three or more.

The arm 121 is also a mechanism that can change the wrist posture of the end effector 1.
As shown in Fig. 16, the end effector 1 and the arm 121 are controlled by the control device 2. Fig. 16 is a diagram showing an example of the configuration of the control device according to this embodiment.

An imaging device 5 (sensor), an instruction device 7, the end effector 1, and the arm 121 are connected to the control device 2 by wire or wirelessly.
In addition to the finger portions 101 to 104 (Fig. 15) and the base 111 (Fig. 15), the end effector 1 includes an actuator 161 and a sensor 151.
The arm 121 includes an actuator 171 and a sensor 181.

The instruction device 7 includes, for example, a sensor 71 and a communication unit 72.

The control device 2 includes an acquisition unit 21, a control unit 22, an end effector driving unit 23, an arm driving unit 24, and a wrist posture determination device 3.
The wrist posture determination device 3 includes a center line creation unit 31, a grasp representative determination unit 32, a learning unit 33, a storage unit 34, a determination unit 37, and a taxonomy determination unit 38. The storage unit 34 stores a model 35 and training data 36.

The instruction device 7 is a data glove worn on the worker's hand. The instruction device 7 may also include, for example, an HMD (head-mounted display) that performs gaze detection.
The sensor 71 is, for example, a six-axis sensor, a gyro sensor, or a pressure sensor. The sensor 71 detects at least the positions of the fingers and the wrist and their trajectories.
The communication unit 72 transmits the sensor values detected by the sensor 71 to the control device 2.

The imaging device 5 is an RGB-D camera that can also measure depth. Using the captured image, the imaging device 5 determines the position of the target object and the position of the environment in which the target object is located, such as the floor or the ground.

The actuators 161 are provided, for example, on each of the finger portions 101 to 104 of the end effector 1 and on each of its joints.
The sensor 151 is, for example, a six-axis sensor, a position sensor, or a pressure sensor.

The actuator 171 is provided, for example, at the connection with the end effector 1.
The sensor 181 is, for example, a six-axis sensor, a position sensor, or a pressure sensor.

The acquisition unit 21 acquires the images captured by the imaging device 5, the sensor values detected by the sensor 151, and the sensor values detected by the sensor 181. The acquisition unit 21 also acquires the sensor values transmitted by the instruction device 7.

The control unit 22 generates an arm drive command using the information acquired by the acquisition unit 21 and the information determined by the wrist posture determination device 3. The control unit 22 generates an end effector drive command using the information acquired by the acquisition unit 21.

The end effector driving unit 23 controls the end effector 1 based on the end effector drive command generated by the control unit 22.

The arm driving unit 24 controls the arm 121 based on the arm drive command generated by the control unit 22.

Next, the wrist posture determination device 3 will be described.
The center line creation unit 31 performs image processing on the captured image to create a center line that indicates the center of the target object and consists of at least one point.
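The text does not specify which image processing the center line creation unit 31 applies. As a rough sketch only, the following Python example assumes a binary object mask has already been segmented from the captured image and takes the midpoint of the object pixels in each row as a center line point. The function name `center_line` and the mask values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def center_line(mask: List[List[int]]) -> List[Tuple[int, float]]:
    """Return (row, mid-column) points for every row that contains object pixels."""
    points = []
    for row_index, row in enumerate(mask):
        columns = [col for col, value in enumerate(row) if value]
        if columns:
            # Midpoint between the leftmost and rightmost object pixel in this row.
            points.append((row_index, (columns[0] + columns[-1]) / 2.0))
    return points


# Hypothetical 4x6 mask of an upright object (1 = object pixel, 0 = background).
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
points = center_line(mask)
```

Each returned point is a candidate for the grasp representative point. A real implementation would more likely work on the RGB-D point cloud and a proper skeletonization, neither of which is described here.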

The grasp representative determination unit 32 determines the grasp representative point using the trained model 35.

During learning, the learning unit 33 generates the pose scene graph and the cost scene graph as described above. The learning unit 33 trains the model 35 using the training data 36, and stores the trained model 35 in the storage unit 34.

During work, the determination unit 37 acquires the actual wrist posture from the sensor 181. The determination unit 37 inputs the acquired actual wrist posture into the trained model 35 to determine the wrist posture online.

The taxonomy determination unit 38 estimates the position, posture, and the like of the target object based on the captured image, and determines the taxonomy based on the estimation result.

<Example of processing procedure>
Next, an example of the processing procedure performed by the wrist posture determination device 3 during learning will be described. Fig. 17 is a flowchart of the processing procedure performed by the wrist posture determination device during learning according to this embodiment.

(Step S101) The imaging device 5 captures an image of the target object.

(Step S102) The imaging device 5 determines the position of the target object and the position of the environment.

(Step S103) The learning unit 33 creates a pose scene graph whose nodes use, for example, the coordinates of the object as seen from the hand, and whose edges use the relative pose between the hand and the grasped object, the relative pose between the grasped object and the surrounding environment objects, and the relative pose between the grasped object and the floor surface.

(Step S104) The learning unit 33 creates a cost scene graph whose nodes use the points constituting the center line multiplied by the wrist pose, and whose edges use the relative pose of the grasp center with respect to the skeleton, the convolution of the differences between the curved surface of the object around the skeleton and the curved surfaces of the fingers, and the Quality Measure value.

(Step S105) The learning unit 33 trains the model 35 using the training data 36, the pose scene graph, and the cost scene graph, and stores the trained model 35 in the storage unit 34.
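The pose scene graph of step S103 can be pictured as a small data structure. The sketch below is only an illustration under simplifying assumptions: poses are reduced to a position plus a rotation about the vertical axis, and the names `Pose`, `relative_pose`, and `build_pose_scene_graph`, as well as all numeric values, are hypothetical rather than taken from the embodiment.

```python
import math
from typing import Dict, NamedTuple, Tuple


class Pose(NamedTuple):
    """Simplified pose: position in metres plus rotation about the vertical axis."""
    x: float
    y: float
    z: float
    yaw: float = 0.0


def relative_pose(reference: Pose, target: Pose) -> Pose:
    """Express `target` in the frame of `reference`."""
    dx, dy = target.x - reference.x, target.y - reference.y
    c, s = math.cos(reference.yaw), math.sin(reference.yaw)
    # Rotate the world-frame offset by the inverse of the reference yaw.
    return Pose(c * dx + s * dy, -s * dx + c * dy,
                target.z - reference.z, target.yaw - reference.yaw)


def build_pose_scene_graph(
    hand: Pose, obj: Pose, neighbour: Pose, floor: Pose
) -> Tuple[Dict[str, Pose], Dict[Tuple[str, str], Pose]]:
    """Node: object as seen from the hand. Edges: the three relative poses."""
    object_in_hand = relative_pose(hand, obj)
    nodes = {"object": object_in_hand}
    edges = {
        ("hand", "object"): object_in_hand,
        ("object", "environment"): relative_pose(obj, neighbour),
        ("object", "floor"): relative_pose(obj, floor),
    }
    return nodes, edges


# Hypothetical scene: hand above the object, one neighbouring object, flat floor.
nodes, edges = build_pose_scene_graph(
    hand=Pose(0.0, 0.0, 0.5, math.pi / 2),
    obj=Pose(0.2, 0.0, 0.1),
    neighbour=Pose(0.5, 0.0, 0.1),
    floor=Pose(0.0, 0.0, 0.0),
)
```

A full implementation would use 6-DoF transforms; the planar yaw is used here only to keep the example short.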

Next, an example of the processing procedure performed online by the wrist posture determination device 3 during work will be described. Fig. 18 is a flowchart of the processing procedure performed by the wrist posture determination device during work according to this embodiment.

(Step S201) The imaging device 5 captures an image of the target object.

(Step S202) The taxonomy determination unit 38 estimates the position, posture, and the like of the target object based on the captured image, and determines the taxonomy based on the estimation result.

(Step S203) The center line creation unit 31 performs image processing on the captured image to create a center line that indicates the center of the target object and consists of at least one point. Note that the wrist posture determination device 3 performs the processing of steps S201 to S203 at the start of grasping.

(Step S204) During work, the determination unit 37 acquires the actual wrist posture from the sensor 181.

(Step S205) The grasp representative determination unit 32 determines the grasp representative point for the unknown target object online by inputting the actual current wrist posture into the trained model 35.

(Step S206) The determination unit 37 determines the wrist posture of the hand for the unknown target object so that the object is grasped with the determined grasp representative point as its grasp center. The determination unit 37 determines the wrist posture online, for example, by inputting the pose scene graph, the cost scene graph, and the actual current wrist posture into the trained model 35. Note that the determination unit 37 determines the wrist posture by convolving the edges of the two scene graphs around each node and finding the node with the lowest possible cost.
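The final selection, in which edge costs are gathered around each node and the cheapest node is chosen, can be sketched as follows. The exact folding operation is not stated, so this example assumes it is a plain sum of the edge costs attached to each node. The node labels, cost values, and the helper names `fold_edges` and `best_node` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, float]  # (node the edge is attached to, edge cost)


def fold_edges(*edge_lists: Iterable[Edge]) -> Dict[Hashable, float]:
    """Accumulate the costs of all edges attached to each node (assumed: a sum)."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for edges in edge_lists:
        for node, cost in edges:
            totals[node] += cost
    return dict(totals)


def best_node(totals: Dict[Hashable, float]) -> Hashable:
    """Return the node whose accumulated cost is lowest."""
    return min(totals, key=totals.get)


# Hypothetical candidate wrist postures with edge costs from the two graphs.
pose_graph_edges = [("wrist_top", 0.2), ("wrist_side", 0.5), ("wrist_top", 0.1)]
cost_graph_edges = [("wrist_top", 0.6), ("wrist_side", 0.1)]

totals = fold_edges(pose_graph_edges, cost_graph_edges)
choice = best_node(totals)
```

In the embodiment this selection is carried out with the trained model 35. The explicit minimum above only illustrates the idea of picking the lowest-cost node.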

As described above, in this embodiment, the wrist posture is determined online by inputting the pose scene graph, the cost scene graph, and the actual current wrist posture into the trained model 35. When the model 35 is used, the grasp representative point is determined by the model 35, and the optimal wrist posture is then selected, again using the model 35, from among the postures that the wrist of the end effector can take as it approaches the target object.

In this embodiment, the wrist posture is determined according to the geometric structure of the object (mainly the skeleton information of an unknown polygon), the grasp center of the taxonomy, the fingertip curvature during grasping, the Quality Measure around the fingertip contact points (minimum sum of squared lateral forces), and the scene graph of the hand, the object, and the surrounding environment objects. In addition, the pose scene graph is created based on the relative pose between the hand and the grasped object, the relative pose between the grasped object and the surrounding environment objects, and the relative pose between the grasped object and the floor surface.

In this embodiment, the cost scene graph is created using, as costs, the relative pose of the grasp center (grasp representative point) with respect to the skeleton, the convolution of the differences between the curved surface of the object around the skeleton and the curved surfaces of the fingers, and the Quality Measure value. For the relative pose between the grasped object and the surrounding environment objects, a provisional wrist posture is first set, and the difference in distance between each finger portion and the provisional grasp center (grasp representative point) located between the finger portions is then obtained. The wrist posture determination device 3 calculates the cost for each of the plurality of points on the center line. So that friction arises, the wrist posture determination device 3 also convolves the multiple differences between the curved surface of the object around its center line and the curved surfaces of the finger portions of the end effector 1 into a single numerical value. When the grasp center is near the center line, the balance of the total lateral friction force at the contact points must also be considered, so the wrist posture determination device 3 calculates it as the Quality Measure value and uses it in the cost. Finally, the wrist posture is determined by convolving the edges of the two scene graphs around each node and finding the node with the lowest possible cost.
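The cost terms for one center line point can be illustrated with a small Python sketch. It rests on several assumptions that the text does not confirm: the curvature differences are reduced to a single value by a mean absolute difference (standing in for the unspecified convolution), the Quality Measure is taken as the sum of squared lateral forces over the contact points, and the three terms are combined by a weighted sum. All function names, weights, and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def curvature_mismatch(object_curvatures: Sequence[float],
                       finger_curvatures: Sequence[float]) -> float:
    """Reduce paired surface/finger curvature differences to a single value."""
    if len(object_curvatures) != len(finger_curvatures) or not object_curvatures:
        raise ValueError("curvature lists must be non-empty and of equal length")
    diffs = [abs(o - f) for o, f in zip(object_curvatures, finger_curvatures)]
    return sum(diffs) / len(diffs)


def quality_measure(lateral_forces: Sequence[Tuple[float, float]]) -> float:
    """Sum of squared lateral (tangential) forces over all contact points."""
    return sum(fx * fx + fy * fy for fx, fy in lateral_forces)


def node_cost(pose_offset: float, mismatch: float, quality: float,
              weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three cost terms for one center line point."""
    w_pose, w_curve, w_quality = weights
    return w_pose * pose_offset + w_curve * mismatch + w_quality * quality


# Hypothetical values for a single center line point with two finger contacts.
mismatch = curvature_mismatch([10.0, 12.0], [8.0, 12.0])
quality = quality_measure([(3.0, 4.0), (0.0, 1.0)])
cost = node_cost(pose_offset=0.5, mismatch=mismatch, quality=quality)
```

Repeating this for every point on the center line gives one cost per candidate, from which the lowest-cost point can be selected.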

As a result, this embodiment allows the wrist posture to be determined online. It also allows the most suitable condition to be selected from among multiple constraint conditions, which in turn determines how the wrist of the end effector 1 is moved. This makes it easier for the processing to converge.

Although an example has been described in which the model 35 used to determine the grasp representative point and the wrist posture uses both the pose scene graph and the cost scene graph, the present invention is not limited to this. Depending on the shape of the target object, the posture of the target object, the current posture of the wrist, and so on, the model 35 may use the pose scene graph only.

<Posture determination method without using a model>
In the above example, the wrist posture is determined using the model 35, but the present invention is not limited to this. Fig. 19 is a diagram for explaining a method of determining the wrist posture without using a model.

In Fig. 19, image g400 shows an example in which the end effector 1 and the wrist start above the target object obj and the end effector 1 is then moved toward the target object obj. Image g410 shows an example in which the end effector 1 and the wrist start diagonally above and to the right of the target object obj and the end effector 1 is then moved toward the target object obj.
When the model 35 is not used, the wrist posture determination device 3 selects, according to the taxonomy, one of the plurality of operation shapes (the postures that the wrist of the end effector can take as it approaches the target object), and determines one of the points constituting the center line as the grasp representative point corresponding to that shape. The wrist posture is then tilted within an angle range in which the end effector 1 does not collide with the floor surface on which the target object obj is placed. Alternatively, the range within which the grasp center may be tilted is determined. In Fig. 19, symbols g401 and g411 indicate the grasp center of the target object, and symbols g402 and g412 indicate the grasp center of the end effector 1. The grasp center of the end effector 1 is calculated based on the sensor values detected by the sensor 151 provided in the end effector 1.
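The floor-clearance constraint can be illustrated with a deliberately simple geometric sketch. It assumes a model that is not part of the text: the end effector is treated as a rigid bar of half-width `hand_half_width`, centered on the grasp center and perpendicular to the approach axis, and the tilt is measured from the vertical. Under that assumption the lowest end of the bar sits at `grasp_height - hand_half_width * sin(tilt)`, which gives the permissible tilt range. The function names and dimensions are hypothetical.

```python
import math


def max_tilt(grasp_height: float, hand_half_width: float) -> float:
    """Largest tilt from vertical (rad) at which the bar's lower end stays above the floor."""
    if hand_half_width <= 0.0:
        raise ValueError("hand_half_width must be positive")
    ratio = min(1.0, max(0.0, grasp_height / hand_half_width))
    return math.asin(ratio)


def clamp_tilt(requested: float, grasp_height: float, hand_half_width: float) -> float:
    """Clamp a requested wrist tilt to the collision-free range."""
    limit = max_tilt(grasp_height, hand_half_width)
    return max(-limit, min(limit, requested))


# Hypothetical case: grasp center 3 cm above the floor, hand half-width 6 cm.
limit = max_tilt(grasp_height=0.03, hand_half_width=0.06)
tilt = clamp_tilt(math.radians(45.0), grasp_height=0.03, hand_half_width=0.06)
```

With these numbers the limit is about 30 degrees, so a requested 45 degree tilt is reduced to the limit. A real system would check the full hand and arm geometry against the measured floor position instead of this single bar.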

A program for implementing some or all of the functions of the control device 2 and the wrist posture determination device 3 of the present invention may be recorded on a computer-readable recording medium, and all or part of the processing performed by the control device 2 and the wrist posture determination device 3 may be carried out by loading the program recorded on this medium into a computer system and executing it. The term "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system equipped with a homepage provision environment (or display environment). The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. The "computer-readable recording medium" further includes media that hold the program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves within the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium that has the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may implement only part of the functions described above. It may also be a so-called differential file (differential program), which implements those functions in combination with a program already recorded in the computer system.

Although the mode for carrying out the present invention has been described above using an embodiment, the present invention is in no way limited to that embodiment, and various modifications and substitutions can be made without departing from the spirit of the present invention.

1 ... end effector, 101, 102, 103, 104 ... finger portion, 111 ... base, 121 ... arm, 5 ... imaging device, 7 ... instruction device, 151 ... sensor, 161 ... actuator, 171 ... actuator, 181 ... sensor, 71 ... sensor, 72 ... communication unit,
21 ... acquisition unit, 22 ... control unit, 23 ... end effector driving unit, 24 ... arm driving unit, 3 ... wrist posture determination device, 31 ... center line creation unit, 32 ... grasp representative determination unit, 33 ... learning unit, 34 ... storage unit, 37 ... determination unit, 38 ... taxonomy determination unit, 35 ... model, 36 ... training data

Claims

1. A posture determination method for a moving mechanism that is capable of grasping and manipulating a target object, is connected to an end effector having a plurality of operation postures, and is capable of moving the position of the end effector, the method comprising:
a measurement step of measuring the target object with a sensor;
a calculation step of calculating a plurality of points indicating the center of the measured target object;
a grasp representative point determination step of determining one of the plurality of points as a grasp representative point corresponding to one of the plurality of selected operation shapes; and
a posture determination step of determining a posture of the moving mechanism from the grasp representative point,
wherein the posture determination step determines the grasp representative point using a model that has been trained in advance using a pose scene graph that represents, using nodes and edges, the relationship between the end effector and the target object, the relationship between the target object and objects in the periphery of the target object, and the posture of the target object.

2. The posture determination method according to claim 1, wherein the posture determination step determines a taxonomy related to grasping the target object based on the position of the target object and the posture of the moving mechanism, and determines the posture of the moving mechanism based on the taxonomy.

3. The posture determination method according to claim 1, wherein the nodes of the pose scene graph are based on the coordinates of the target object as seen from the end effector, and
the edges of the pose scene graph are costs based on the relationship between the end effector and the target object, the relationship between the target object and objects in the periphery of the target object, and the posture of the target object.

4. The posture determination method according to claim 1, wherein the posture determination step determines the grasp representative point using a model that has been trained in advance using a cost scene graph that represents, using nodes and edges, the relative pose between the end effector and the target object, the target object and the plurality of points indicating the center of the target object, and possible postures of the moving mechanism.

5. The posture determination method according to claim 4, wherein the cost scene graph has, as costs on its edges, information based on the relative pose of a grasp center with respect to the plurality of points indicating the center of the target object, information based on the difference between the curved surface of the target object around the plurality of points indicating the center of the target object and the curved surface of a finger portion of the end effector, and a Quality Measure value.