JP7809033B2

JP7809033B2 - Remote operation assistance system, remote operation assistance method, and program

Info

Publication number: JP7809033B2
Application number: JP2022136628A
Authority: JP
Inventors: 五十志奈良村; 了水谷; 智美福井
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2022-08-30
Filing date: 2022-08-30
Publication date: 2026-01-30
Anticipated expiration: 2042-08-30
Also published as: JP2024033189A

Description

The present invention relates to a remote operation assistance system, a remote operation assistance method, and a program.

Control devices that cause a robot to grasp an object have been proposed. One proposed technique tentatively determines physically contactable grasp points, determines the grasp force taking the balance of forces into account, evaluates the quality of that grasp force as the volume of the envelope of the grasp forces, and then searches for grasp points of higher quality (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 6476358

The technique described in Patent Document 1 and similar techniques register the shape and grasp points of each object in advance, estimate the position and orientation of the object from its shape, and then generate a grasping trajectory from the relationship between the robot's current pose and the grasp points. Such techniques can therefore handle only objects whose shape and weight have been registered in the system in advance.

The present invention was made in view of the above problem, and aims to provide a remote operation assistance system, a remote operation assistance method, and a program that can work even with objects that have not been registered in the system in advance.

(1) To achieve the above object, a remote operation assistance system according to one aspect of the present invention remotely operates at least an end effector and comprises: a motion acquisition unit that acquires information on the motion of an operator who operates at least the end effector; an intention understanding unit that uses the information acquired by the motion acquisition unit to estimate a target object to be manipulated by the end effector and a task, which is the way the target object is to be manipulated; an environmental situation determination unit that acquires environmental information on the environment in which the end effector is operated; and an operation amount determination unit that acquires the information from the intention understanding unit and the information from the environmental situation determination unit and determines an operation amount of the end effector from the acquired information, wherein the operation amount determination unit comprises an approximation result reflection unit that approximates the shape of the target object with a geometric primitive, compares the approximation result with the registered information of operation strategies pre-registered for the end effector, and reflects it.

(2) In the remote operation assistance system according to (1) of one aspect of the present invention, the environmental information is shape information on the target object that includes an instance ID, and the approximation result reflection unit performs shape fitting of the geometric primitive for each instance ID (identifier). In this embodiment, the shape of the target object is voxel information or point cloud information.

(3) In the remote operation assistance system according to (1) or (2) of one aspect of the present invention, the geometric primitive is any one of a single primitive, multiple primitives, or a region of interest.

(4) In the remote operation assistance system according to (3) of one aspect of the present invention, when the geometric primitive is the region of interest, the approximation result reflection unit: deforms the average shape of the target object into the geometric primitive and calculates a mapping function between regions; obtains a grasp probability map based on the motion and taxonomy estimated by the intention understanding unit and the class ID separated by the environmental situation determination unit; maps the grasp probability map onto the region of the geometric primitive based on the calculated mapping function of the correspondence between regions; and solves an optimization problem to find, starting from the current wrist pose and subject to geometric constraints, the grasp point that best combines a short path with a high grasp probability.

(5) In the remote operation assistance system according to any one of (1) to (4) of one aspect of the present invention, the environmental situation determination unit comprises: an RGB sensor that acquires RGB data; a depth sensor that acquires a depth image; a mask generation unit that applies an instance segmentation technique to the color image output by the RGB sensor to generate a mask image based on the instance IDs of the objects in the image; an addition unit that uses the depth image and the mask image to generate a depth image for each instance ID; a self-position estimation unit that performs image processing on the color image output by the RGB sensor to detect the position of the environmental situation determination unit; a three-dimensional reconstruction unit that uses the pose information of the environmental situation determination unit output by the self-position estimation unit and the depth image for each instance ID of each class output by the addition unit to obtain, by three-dimensional reconstruction, a point cloud for each instance ID of each class; and a data integration unit that acquires the point cloud of each instance ID for each keyframe, acquires and manages the current data at time t and the time-series data of past frames t-1, t-2, ..., and integrates these data.

(6) To achieve the above object, a remote operation assistance method according to one aspect of the present invention is a method in a remote operation assistance system that remotely operates at least an end effector, the method comprising: a motion acquisition step in which a motion acquisition unit acquires information on the motion of an operator who operates at least the end effector; an intention understanding step in which an intention understanding unit uses the information acquired in the motion acquisition step to estimate a target object to be manipulated by the end effector and a task, which is the way the target object is to be manipulated; an environmental situation determination step in which an environmental situation determination unit acquires environmental information on the environment in which the end effector is operated; and an operation amount determination step in which an operation amount determination unit acquires the information obtained in the intention understanding step and the information obtained in the environmental situation determination step and determines an operation amount of the end effector from the acquired information, wherein, in the operation amount determination step, the operation amount determination unit approximates the shape of the target object with a geometric primitive, compares the approximation result with the registered information of operation strategies pre-registered for the end effector, and reflects it.

(7) To achieve the above object, a program according to one aspect of the present invention causes a computer in a remote operation assistance system that remotely operates at least an end effector to: acquire motion information on the motion of an operator who operates at least the end effector; use the motion information to estimate a target object to be manipulated by the end effector and a task, which is the way the target object is to be manipulated; acquire environmental information on the environment in which the end effector is operated; acquire the information on the estimated target object and task together with the environmental information; determine an operation amount of the end effector from the acquired information; and, in determining the operation amount of the end effector, approximate the shape of the target object with a geometric primitive, compare the approximation result with the registered information of operation strategies pre-registered for the end effector, and reflect it.

According to (1) to (7), work can be performed even on objects that have not been registered in the system in advance.

FIG. 1 is a diagram showing a configuration example of a remote operation assistance system according to an embodiment.
FIG. 2 is a diagram showing a configuration example of an environmental situation determination unit according to the embodiment.
FIG. 3 is a diagram showing an overview of remote operation according to the embodiment.
FIG. 4 is a diagram for explaining examples of geometric primitives.
FIG. 5 is a flowchart of an example processing procedure performed by the remote operation assistance system according to the embodiment.
FIG. 6 is a flowchart of the processing procedure for the single-primitive and multiple-primitive cases according to the embodiment.
FIG. 7 is a diagram for explaining an outline of the configuration and processing of the remote operation assistance system according to the embodiment.
FIG. 8 is a diagram for explaining a comparative example.

Embodiments of the present invention are described below with reference to the drawings. In the drawings used in the following description, the scale of each member is changed as appropriate so that each member is large enough to be recognized.
Throughout the drawings used to describe the embodiments, elements having the same function are given the same reference numerals, and repeated description is omitted.
In this application, "based on XX" means "based on at least XX" and includes cases based on other elements in addition to XX. Furthermore, "based on XX" is not limited to cases where XX is used directly, and also includes cases based on the result of calculating with or processing XX. "XX" is any element (for example, any information).

[Configuration example of remote operation assistance system]
FIG. 1 is a diagram showing a configuration example of the remote operation assistance system according to this embodiment. As shown in FIG. 1, the remote operation assistance system 1 includes environmental situation determination units 2 (2-1, 2-2, 2-3, ...), an HMD 3, an instruction detection unit 4, a motion acquisition unit 5, an intention understanding unit 6, an operation amount determination unit 7, a robot 15, and a control unit 16.
The HMD 3 includes, for example, a gaze detection unit 31.
The operation amount determination unit 7 includes, for example, a model 8, a data integration unit 9, an approximation result reflection unit 17, a motion estimation unit 11, a grasp DB 12, and a database 14.
The approximation result reflection unit 17 includes a shape approximation unit 10 and a grasp planning unit 13.
The robot 15 includes, for example, an end effector 151, an arm 152, and a sensor 153.

The environmental situation determination unit 2 includes an RGB sensor and a depth sensor, as described later. The environmental situation determination unit 2 acquires environmental information on the environment in which the end effector 151 is operated, and outputs shape information on the target object. In this embodiment, the shape of the target object is, for example, voxel information or point cloud information. A plurality of environmental situation determination units 2 are installed, for example, in the operating environment. A configuration example of the environmental situation determination unit 2 is described later.

The HMD 3 is, for example, a head-mounted display device. The HMD 3 may be, for example, a glasses type, and may be for one eye or for both eyes. The HMD 3 also includes a display device, and may acquire and display the image captured by the RGB sensor 211 (FIG. 2) of the environmental situation determination unit 2, or may display a virtual image generated from the acquired image data.

The gaze detection unit 31, for example, captures an image of the operator's eyes and detects the gaze by image processing of the captured image.

The instruction detection unit 4 detects the joint angles of the operator's fingers and the pose of the operator's wrist. The instruction detection unit 4 is, for example, a data glove or a joint angle detection sensor.

The motion acquisition unit 5 acquires information on the motion of the operator who operates at least the end effector 151. The motion acquisition unit 5 acquires, for example, the detection results of the gaze detection unit 31 and the detection results of the instruction detection unit 4.

The intention understanding unit 6 uses the information acquired by the motion acquisition unit 5 to estimate the target object to be manipulated by the end effector 151 and the task (operation intention), which is the way the target object is to be manipulated.

The operation amount determination unit 7 acquires the information output by the intention understanding unit 6 and the information output by the environmental situation determination units 2, and determines the operation amount of the end effector 151 from the acquired information.

The model 8 is a trained model that the motion estimation unit 11 uses when estimating motion.

The data integration unit 9 generates integrated three-dimensional shape information using the voxel information contained in the shape information on the target object output by the plurality of environmental situation determination units 2. Alternatively, the data integration unit 9 may generate the three-dimensional information by integrating the point cloud information contained in that shape information. Because the integrated information carries instance IDs, the data integration unit 9 outputs integrated three-dimensional shape information for each instance ID.
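As a rough illustration of per-instance integration, the following sketch merges voxelized observations from several sensors, keyed by instance ID. The function names, the 1 cm voxel size, and the assumption that every sensor already reports points in a shared world frame with consistent instance IDs are illustrative choices and are not taken from the embodiment.

```python
import math
from collections import defaultdict

VOXEL_SIZE = 0.01  # metres; hypothetical resolution


def voxelize(points, voxel_size=VOXEL_SIZE):
    """Quantize (x, y, z) points to a set of integer voxel indices."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def integrate(sensor_outputs, voxel_size=VOXEL_SIZE):
    """Union the voxels seen by every sensor, separately per instance ID.

    sensor_outputs: list of dicts {instance_id: [(x, y, z), ...]}, one per sensor.
    Returns {instance_id: set of voxel indices}.
    """
    merged = defaultdict(set)
    for output in sensor_outputs:
        for instance_id, points in output.items():
            merged[instance_id] |= voxelize(points, voxel_size)
    return dict(merged)


# Two sensors observe instance 1; only the second one sees instance 2.
sensor_a = {1: [(0.001, 0.002, 0.003), (0.011, 0.002, 0.003)]}
sensor_b = {1: [(0.012, 0.001, 0.004)], 2: [(0.505, 0.505, 0.105)]}
integrated = integrate([sensor_a, sensor_b])
```

Using a set of voxel indices means that overlapping observations from different sensors collapse into one occupied cell, which is one simple way to avoid double counting.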

The approximation result reflection unit 17 compares the approximation result with the registered information of the operation strategies pre-registered for the end effector 151 and reflects it.

The shape approximation unit 10 acquires the integrated three-dimensional shape information output by the data integration unit 9. The shape approximation unit 10 generates geometric primitive information by performing shape fitting of geometric primitives for each instance ID. As described later, the shape approximation unit 10 may approximate the shape with a single primitive, with multiple primitives, or with a region of interest. With multiple primitives, geometric information is used to classify the shapes. When approximating with multiple primitives, the shape approximation unit 10 fits each primitive and then obtains the affordance of each region by referring to the database 14. The shape approximation unit 10 also classifies each object into an object class (for example, plastic bottle, wrench, or box).
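The per-instance fitting step can be sketched as follows. The embodiment does not specify a fitting algorithm, so an axis-aligned enclosing box is used here purely as a stand-in primitive; `BoxPrimitive`, `fit_box`, and `fit_primitives` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BoxPrimitive:
    center: tuple  # (x, y, z)
    size: tuple    # edge lengths along x, y, z


def fit_box(points):
    """Fit an axis-aligned box that encloses all points."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return BoxPrimitive(center, size)


def fit_primitives(points_by_instance):
    """Run the fitting step independently for each instance ID."""
    return {iid: fit_box(pts) for iid, pts in points_by_instance.items() if pts}


clouds = {
    7: [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.2, 0.04, 0.02), (0.0, 0.04, 0.02)],
    8: [(1.0, 1.0, 0.0), (1.0, 1.0, 0.5)],
}
primitives = fit_primitives(clouds)
```

A multiple-primitive variant would first split each instance's points into regions and call the same fitting function once per region.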

The motion estimation unit 11 acquires operation intention information from the intention understanding unit 6, joint angle measurements from the robot 15, and, from the shape approximation unit 10, the instance IDs (identification information) and geometric primitives of the objects that are visible (that is, that the environmental situation determination units 2 can detect). Referring to the model 8, the motion estimation unit 11 estimates a motion, which is a set of operation intentions, from the operation intention, the robot state, and the operations possible for the target object. For example, the motion of passing a wrench from one hand to the other consists of the operation intentions right-hand grasp, two-hand support, and left-hand grasp. For example, the motion estimation unit 11 estimates what motion the robot 15 should perform next. The motion estimation unit 11 outputs the estimated motion information for the hand of the robot 15. The motion information also includes taxonomy information on the task (see, for example, Reference 1). To reduce the amount of data, the model 8 stores, for example, a representative plastic bottle model rather than models of many different plastic bottles. The motion estimation unit 11 can therefore estimate the motion using the class of the target object and the information stored in the model 8.
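The idea of a motion as an ordered set of operation intentions can be illustrated with a small lookup. In the embodiment this role is played by the trained model 8; the table, the intention labels, and the function below are hypothetical stand-ins used only to make the wrench example concrete.

```python
# Hypothetical table: object class -> motion name -> ordered operation intentions.
MOTIONS = {
    "wrench": {
        "switch_hands": ["right_hand_grasp", "two_hand_support", "left_hand_grasp"],
    },
}


def estimate_motion(object_class, current_intention):
    """Return (motion name, remaining intentions) for the first registered
    motion of this class that contains the current operation intention."""
    for name, steps in MOTIONS.get(object_class, {}).items():
        if current_intention in steps:
            idx = steps.index(current_intention)
            return name, steps[idx + 1:]
    return None, []


motion, upcoming = estimate_motion("wrench", "two_hand_support")
```

The remaining intentions returned here correspond to the future operation intentions that a grasp planner could take into account.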

Reference 1: Thomas Feix, Javier Romero, et al., "The GRASP Taxonomy of Human Grasp Types," IEEE Transactions on Human-Machine Systems, vol. 46, no. 1, Feb. 2016, pp. 66-77.

The grasp DB 12 is a database that stores shape models with probabilities, held for each class ID (identifier). The shape is the average shape of the class. The probability is the grasp probability for each operation intention and is held, for example, per polygon. Deforming this shape to fit the geometric primitive shape yields the grasp probability on the observed instance shape; this may instead be approximated with a neural network. The grasp DB 12 stores the typical weight (per unit volume) of each object class. The grasp DB 12 also stores, for each taxonomy and each class, a grasp plan for the average shape.
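One possible in-memory layout for such a database is sketched below. All field names, the class key, the probability values, and the density figure are invented for illustration; only the kinds of information stored (average shape, per-polygon grasp probability per operation intention, weight per unit volume, and a plan per taxonomy) follow the description above.

```python
from dataclasses import dataclass, field


@dataclass
class GraspEntry:
    mean_shape: list          # polygons of the class-average shape (vertex index triples)
    grasp_prob: dict          # {operation intention: [probability per polygon]}
    density_kg_per_m3: float  # typical weight per unit volume for the class
    plans: dict = field(default_factory=dict)  # {taxonomy: plan for the average shape}


# Hypothetical contents, keyed by class ID.
GRASP_DB = {
    "plastic_bottle": GraspEntry(
        mean_shape=[(0, 1, 2), (1, 2, 3), (2, 3, 4)],
        grasp_prob={"right_hand_grasp": [0.1, 0.7, 0.2]},
        density_kg_per_m3=500.0,
        plans={"medium_wrap": "close fingers until contact"},
    ),
}


def best_polygon(class_id, intention):
    """Index of the polygon with the highest grasp probability."""
    probs = GRASP_DB[class_id].grasp_prob[intention]
    return max(range(len(probs)), key=probs.__getitem__)


def estimate_weight(class_id, primitive_volume_m3):
    """Weight estimate from the class density and the fitted primitive's volume."""
    return GRASP_DB[class_id].density_kg_per_m3 * primitive_volume_m3


polygon_index = best_polygon("plastic_bottle", "right_hand_grasp")
weight_kg = estimate_weight("plastic_bottle", 0.001)
```

Storing weight per unit volume rather than per object lets the weight of an unregistered instance be estimated from the volume of its fitted primitive.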

The grasp planning unit 13 acquires the geometric primitive shape information for each instance ID from the shape approximation unit 10 and the estimated motion information from the motion estimation unit 11. The grasp planning unit 13 determines the grasp point (the position to grasp) from the geometric primitive of each instance ID contained in the acquired information, the future operation intentions and taxonomy obtained from the motion, and the grasp probability for each operation intention. The grasp plan, for example, keeps closing the fingers until contact is made once the grasp point has been determined. The grasp planning unit 13 outputs hand trajectory sequence information and finger trajectory sequence information based on the grasp point. The processing performed by the grasp planning unit 13 is described later.
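A minimal sketch of grasp point selection is given below, assuming the candidates and their grasp probabilities have already been mapped onto the fitted primitive. The scoring rule (log probability minus a weighted straight-line distance from the wrist) and the reach limit that stands in for geometric constraints are assumptions made for illustration, not the optimization used in the embodiment.

```python
import math


def plan_grasp(candidates, wrist_pos, max_reach=0.8, path_weight=1.0):
    """Pick the candidate with the best trade-off between grasp probability
    and straight-line distance from the current wrist position.

    candidates: list of ((x, y, z), probability) pairs on the fitted primitive.
    """
    best_point, best_score = None, -math.inf
    for point, prob in candidates:
        dist = math.dist(wrist_pos, point)
        if dist > max_reach or prob <= 0.0:
            continue  # violates the simplified geometric constraint
        score = math.log(prob) - path_weight * dist
        if score > best_score:
            best_point, best_score = point, score
    return best_point


candidates = [
    ((0.30, 0.00, 0.10), 0.60),
    ((0.35, 0.05, 0.10), 0.90),
    ((1.50, 0.00, 0.10), 0.95),  # highest probability but out of reach
]
grasp_point = plan_grasp(candidates, wrist_pos=(0.0, 0.0, 0.1))
```

Raising `path_weight` makes the planner prefer nearer points even when their grasp probability is lower.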

The database 14 stores, for example, the affordances of primitives. For multiple primitives, the database 14 stores the affordance of each primitive. The database 14 stores each instance ID in association with the class ID to which it belongs. The database 14 stores an average shape model for each class. For region-of-interest primitives, the database 14 stores information on easy-to-grasp positions within the region in association with the average shape.

The robot 15 is, for example, a single-arm robot, a dual-arm robot, or a multi-arm robot.
The end effector 151 has at least three fingers. The number of fingers of the end effector 151 may be any number that can realize the taxonomy.
The arm 152 has the end effector 151 connected to its tip.

The control unit 16 generates joint angle command values for the end effector 151 and the arm 152 based on the hand trajectory sequence information and finger trajectory sequence information output by the grasp planning unit 13 and the joint angle measurements output by the sensor 153 of the robot 15, and thereby controls the motion of the robot 15.

[Configuration example of environmental situation determination unit]
Next, a configuration example of the environmental situation determination unit 2 is described.
FIG. 2 is a diagram showing a configuration example of the environmental situation determination unit according to this embodiment. As shown in FIG. 2, the environmental situation determination unit 2 includes, for example, an environmental sensor 21, a self-position estimation unit 22, a mask generation unit 23, an addition unit 24, a three-dimensional reconstruction unit 25, and a data integration unit 26.
The environmental sensor 21 includes, for example, an RGB sensor 211 and a depth sensor 212.

The RGB sensor 211 is an imaging device such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) imaging device, and outputs an RGB (red, green, blue) color image. The RGB sensor 211 also includes, for example, a fisheye lens.

The depth sensor 212 acquires a depth image containing depth information and outputs the acquired depth image. The depth image also contains luminance information as seen from the depth sensor 212.

The addition unit 24 uses the depth image and the mask image to generate a depth image for each instance ID, and outputs the generated depth images. The output images contain the depth for each instance ID. Because this makes it possible to extract where the region of each instance ID is, the distance for each instance ID can be extracted.
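Combining the depth image with the mask image can be sketched as follows. The sketch assumes that the mask stores one integer instance ID per pixel with 0 meaning background, and that both images are same-sized rectangular lists of rows; these conventions and the function names are assumptions for illustration.

```python
def split_depth_by_instance(depth, mask, background_id=0):
    """Return {instance_id: depth image}; pixels outside the instance are None."""
    result = {}
    for r, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for c, (d, iid) in enumerate(zip(depth_row, mask_row)):
            if iid == background_id:
                continue
            if iid not in result:
                # Start from an empty image of the same size as the input.
                result[iid] = [[None] * len(depth_row) for _ in depth]
            result[iid][r][c] = d
    return result


def mean_distance(instance_depth):
    """Average depth over the pixels that belong to the instance."""
    values = [d for row in instance_depth for d in row if d is not None]
    return sum(values) / len(values)


depth = [[1.0, 1.0, 2.0],
         [1.0, 2.0, 2.0]]
mask = [[1, 1, 2],
        [0, 2, 2]]
per_instance = split_depth_by_instance(depth, mask)
```

Each per-instance image keeps the original pixel layout, so a later reconstruction step can still use the pixel coordinates.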

The mask generation unit 23 applies, for example, an instance segmentation technique to the color image output by the RGB sensor 211 to generate and output a mask image based on the instance IDs (identification information) of the objects in the image. Instance segmentation is, for example, the problem of estimating the foreground region mask of each object instance appearing in an image or RGB-D image while distinguishing the object instances from one another. A foreground region mask represents "foreground object or background" with, for example, a binary value of 1 or 0.

The self-position estimation unit 22 performs image processing on the color image output by the RGB sensor 211 to detect the position of the environmental situation determination unit 2, and outputs pose information of the imaging device.

The three-dimensional reconstruction unit 25 uses the pose information of the imaging device output by the self-position estimation unit 22 and the depth image for each instance ID of each class output by the addition unit 24 to obtain, by three-dimensional reconstruction, a point cloud for each instance ID of each class. In other words, the three-dimensional reconstruction unit 25 constructs the shape of the target object, separated by instance ID within each class, as, for example, voxels or a point cloud. A point cloud is three-dimensional data that holds basic X, Y, Z position information together with information such as color.
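The reconstruction of a point cloud from a per-instance depth image and a camera pose can be sketched with a simple back-projection. The sketch assumes an ideal pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`); images from a fisheye lens would first need to be undistorted, which is not shown. All parameter names are illustrative.

```python
def backproject(depth, fx, fy, cx, cy, rotation, translation):
    """Convert a per-instance depth image into world-frame 3D points.

    depth: 2D list in metres, None where the pixel is not part of the instance.
    rotation: 3x3 camera-to-world rotation (list of rows).
    translation: camera position in the world frame.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue
            # Pinhole model: pixel (u, v) at depth z in the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
instance_depth = [[None, 2.0],
                  [None, None]]
cloud = backproject(instance_depth, fx=100.0, fy=100.0, cx=0.0, cy=0.0,
                    rotation=identity, translation=(0.0, 0.0, 1.0))
```

Running this once per instance ID yields one point cloud per instance, all expressed in the same world frame.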

The data integration unit 26 acquires the point cloud of each instance ID for each keyframe. The data integration unit 26 also acquires and manages the current data at time t and the time-series data of past frames such as t-1 and t-2, and integrates these data. The data integration unit 26 outputs to the operation amount determination unit 7 the shape information on the target object that combines the current frame with the past frames. For example, when two objects are placed on a table, an instance ID is associated with each of the two classes (first object, second object).

［遠隔操作の概要］
図３は、本実施形態に係る遠隔操作の概要を示す図である。なお、図３に示したロボット１５は、双椀と頭部とボディを備える例を示したが、ロボット１５の構成や形状等はこれに限らない。
図３のように、操作者Ｕｓは、例えばＨＭＤ（ヘッドマウントディスプレイ）３と指示検出部４ａ、４ｂを装着している。ロボット１５またはロボット１５の周囲には環境センサ２１ａが設置され、作業環境にも環境センサ２１ｂが設置されている。なお、環境センサ２１は、ロボット１５に取り付けられていてもよい。また、ロボット１５は、エンドエフェクタ１５１（１５１ａ、１５１ｂ）とアーム１５２を備える。操作者Ｕｓは、ＨＭＤ３に表示された画像を見ながら指示検出部４ａ、４ｂを装着している手や指を動かすことで、ロボット１５を遠隔操作する。 [Remote control overview]
Fig. 3 is a diagram showing an overview of remote control according to this embodiment. Note that, although the robot 15 shown in Fig. 3 is an example having a pair of arms, a head, and a body, the configuration and shape of the robot 15 are not limited to this.
As shown in Fig. 3, the operator Us wears, for example, an HMD (head-mounted display) 3 and instruction detection units 4a and 4b. An environmental sensor 21a is installed on the robot 15 or around the robot 15, and an environmental sensor 21b is also installed in the work environment. The environmental sensor 21 may be attached to the robot 15. The robot 15 also includes an end effector 151 (151a, 151b) and an arm 152. The operator Us remotely controls the robot 15 by moving the hand or fingers wearing the instruction detection units 4a and 4b while viewing the image displayed on the HMD 3.

［幾何プリミティブの例］
ここで、幾何プリミティブの例を、ロボット１５にスパナを把持させて、ネジを締めさせる作業を例に説明する。
図４は、幾何プリミティブの例を説明するための図である。ロボット１５にスパナを把持させて、ネジを締めさせる作業の場合、操作者の注目対象はスパナである。 [Example of geometric primitives]
Here, an example of a geometric primitive will be described using an example of a task in which the robot 15 is made to grip a wrench and tighten a screw.
Fig. 4 is a diagram for explaining an example of a geometric primitive. In a task in which the robot 15 is made to hold a wrench and tighten a screw, the object of the operator's attention is the wrench.

画像ｇ１００の例では、対象物体を１つ（ｇ１０１）の枠とする単一プリミティブである。 In the example of image g100, the target object is represented by a single primitive, that is, one frame (g101).

画像ｇ１１０の例では、対象物体を２つ（ｇ１１１、ｇ１１２）の枠とする複数プリミティブである。 In the example of image g110, the target object is represented by multiple primitives, here two frames (g111, g112).

画像ｇ１２０では、ねじ締め作業の場合、注目領域が枠ｇ１１２内である。
このように、幾何プリミティブにおける形状フィッティングは、１つの枠で行ってもよく、２つ以上の枠で行ってもよく、注目領域を用いてもよい。 In image g120, for the screw-tightening task, the region of interest lies within frame g112.
Thus, shape fitting with geometric primitives may use a single frame, two or more frames, or a region of interest.

このように、本実施形態では、対象物体を例えば平面において幾何基本要素で近似するようにした。なお、本実施形態において、幾何基本要素は、ユークリッド幾何学における三角形、正方向、長方形、台形、多角形、円形、楕円形などの基本的な形状である。 In this way, in this embodiment, the target object is approximated, for example, on a plane, using geometric primitives. Note that in this embodiment, geometric primitives are basic shapes in Euclidean geometry, such as triangles, squares, rectangles, trapezoids, polygons, circles, and ellipses.
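A minimal sketch of approximating a planar outline with a basic shape follows. It fits an axis-aligned rectangle and a centroid-centred circle and keeps whichever enclosing shape has the smaller area. This selection rule is a simple heuristic chosen for illustration, not the fitting method of this embodiment, and the names `fit_rectangle`, `fit_circle`, and `fit_primitive` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point2D = Tuple[float, float]


def fit_rectangle(points: List[Point2D]) -> Dict:
    """Axis-aligned bounding rectangle of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "type": "rectangle",
        "min": (min(xs), min(ys)),
        "max": (max(xs), max(ys)),
        "area": (max(xs) - min(xs)) * (max(ys) - min(ys)),
    }


def fit_circle(points: List[Point2D]) -> Dict:
    """Circle centred on the centroid that encloses every point."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radius = max(math.hypot(x - cx, y - cy) for x, y in points)
    return {
        "type": "circle",
        "center": (cx, cy),
        "radius": radius,
        "area": math.pi * radius * radius,
    }


def fit_primitive(points: List[Point2D]) -> Dict:
    """Pick the enclosing primitive with the smaller area (tighter fit)."""
    return min((fit_rectangle(points), fit_circle(points)), key=lambda p: p["area"])
```

For a long, thin outline such as a wrench handle the rectangle is chosen, while a roughly round outline is better enclosed by the circle.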

［処理手順例］
次に、遠隔操作補助システム１が行う処理手順例を説明する。図５は、本実施形態に係る遠隔操作補助システム１が行う処理手順例のフローチャートである。 [Example of processing procedure]
Next, a description will be given of an example of a processing procedure performed by the remote operation assistance system 1. Fig. 5 is a flowchart of an example of a processing procedure performed by the remote operation assistance system 1 according to this embodiment.

（ステップＳ１）環境状況判断部２は、エンドエフェクタ１５１が操作される環境の環境情報を取得する。環境状況判断部２は、取得した環境情報である対象物体に関する形状情報を操作量決定部７に出力する。 (Step S1) The environmental situation determination unit 2 acquires environmental information about the environment in which the end effector 151 is operated. The environmental situation determination unit 2 outputs the acquired environmental information, which is shape information about the target object, to the operation amount determination unit 7.

（ステップＳ２）データ統合部９は、複数の環境状況判断部２から取得した対象物体に関する形状情報に含まれるボクセル情報を用いて、インスタンスＩＤ毎に三次元形状の統合を行う。形状近似部１０は、インスタンスＩＤ毎の統合された三次元形状の統合情報に基づいて幾何プリミティブ情報を生成する。把持計画部１３は、形状近似部１０からインスタンスＩＤ毎の幾何プリミティブ情報を取得する。 (Step S2) The data integration unit 9 integrates three-dimensional shapes for each instance ID using the voxel information contained in the shape information about the target object acquired from the multiple environmental situation determination units 2. The shape approximation unit 10 generates geometric primitive information from the integrated three-dimensional shape of each instance ID. The grasp planning unit 13 acquires the geometric primitive information for each instance ID from the shape approximation unit 10.

（ステップＳ３）操作量決定部７は、エンドエフェクタ１５１を操作する操作者の動作に関する情報をＨＭＤ３と指示検出部４から取得する。 (Step S3) The operation amount determination unit 7 acquires information about the movement of the operator operating the end effector 151 from the HMD 3 and the instruction detection unit 4.

（ステップＳ４）意図理解部６は、動作取得部５が出力する情報を用いて、エンドエフェクタ１５１によって操作する対象である対象物体と、対象物体の操作方法であるタスクを推定する。 (Step S4) The intention understanding unit 6 uses the information output by the motion acquisition unit 5 to estimate the target object to be operated by the end effector 151 and the task, that is, the method of operating the target object.

（ステップＳ５）動作推定部１１は、操作意図、ロボット状態、対象物体に紐づく可能な操作から、操作意図の集合である動作を推定する。把持計画部１３は、動作推定部１１から動作情報を取得する。 (Step S5) The motion estimation unit 11 estimates a motion, which is a collection of operational intentions, from the operational intentions, the robot state, and possible operations linked to the target object. The grasp planning unit 13 acquires motion information from the motion estimation unit 11.

（ステップＳ６）把持計画部１３は、取得されたインスタンスＩＤ毎の幾何プリミティブ情報と推定された動作情報を用いて、インスタンスＩＤが所属するクラスＩＤをデータベース１４から探索する。例えば、スパナクラスのスパナＡ、スパナＢというインスタンスがある場合、把持計画部１３は、これらのクラスＩＤをデータベース１４から探索する。 (Step S6) The grasp planning unit 13 uses the acquired geometric primitive information for each instance ID and the estimated motion information to search the database 14 for the class ID to which each instance ID belongs. For example, if the wrench class has instances wrench A and wrench B, the grasp planning unit 13 searches the database 14 for the class IDs of these instances.

（ステップＳ７）把持計画部１３は、クラスＩＤのクラスに紐付けられている平均形状モデルをデータベース１４から取得する。例えばクラスがスパナの場合、平均形状モデルは、スパナの平均的な形状のモデルである。なお、このようなクラス毎の平均形状モデルは、データベース１４に格納されている。 (Step S7) The grasp planning unit 13 obtains the average shape model associated with the class of the class ID from the database 14. For example, if the class is a wrench, the average shape model is a model of the average shape of a wrench. Note that such average shape models for each class are stored in the database 14.

（ステップＳ８）把持計画部１３は、平均形状を幾何プリミティブに変形して領域間のマッピング関数を得る。把持計画部１３は、例えば、対象物体に関する形状情報に幾何プリミティブをフィッティングさせることで、平均形状とプリミティブ形状とのマッピング関数を得ることができる。なお、対象物体は剛体に限らず、変形が定義できる物体であれば非剛体であってもよい。また、把持計画部１３は、物体のクラスの一般的な（単位体積当たりの）重量をデータベース１４から取得する。 (Step S8) The grasp planning unit 13 transforms the average shape into a geometric primitive to obtain a mapping function between regions. The grasp planning unit 13 can obtain a mapping function between the average shape and the primitive shape, for example, by fitting the geometric primitive to shape information about the target object. Note that the target object is not limited to a rigid body; it may be a non-rigid object as long as its deformation can be defined. The grasp planning unit 13 also obtains the general weight (per unit volume) of the object class from the database 14.
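One simple way to picture a mapping function between the region of the average shape and the region of the fitted primitive is a per-axis scaling between their bounding boxes, sketched below. This is only an assumption for illustration (the embodiment may use a richer deformation), and `make_region_mapping` is a hypothetical name.

```python
from typing import Callable, Sequence, Tuple


def make_region_mapping(
    src_min: Sequence[float], src_max: Sequence[float],
    dst_min: Sequence[float], dst_max: Sequence[float],
) -> Callable[[Sequence[float]], Tuple[float, ...]]:
    """Build a function that maps a point in the source box to the destination box."""
    scales = []
    for lo_s, hi_s, lo_d, hi_d in zip(src_min, src_max, dst_min, dst_max):
        extent = hi_s - lo_s
        if extent == 0:
            raise ValueError("source region has zero extent along one axis")
        scales.append((hi_d - lo_d) / extent)

    def mapping(point: Sequence[float]) -> Tuple[float, ...]:
        # Shift to the source origin, scale per axis, shift to the destination origin.
        return tuple(
            lo_d + (c - lo_s) * s
            for c, lo_s, lo_d, s in zip(point, src_min, dst_min, scales)
        )

    return mapping
```

With such a function, any location defined on the average shape of the class, such as a preferred grasp region, can be transferred onto the primitive fitted to the observed object.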

（ステップＳ９）把持計画部１３は、動作と、タクソノミーと、クラスＩＤに基づいて、把持確率マップを、把持ＤＢ１２から取得する。 (Step S9) The grasping planning unit 13 obtains a grasping probability map from the grasping DB 12 based on the motion, taxonomy, and class ID.

（ステップＳ１０）把持計画部１３は、得られた領域間の対応のマッピング関数に基づいて、把持確率マップを幾何プリミティブの領域にマッピングする。これにより、オンラインで得られた幾何プリミティブ形状に応じた確率ヒートマップ的な情報を得ることができる。 (Step S10) The grasp planning unit 13 maps the grasp probability map to the geometric primitive region based on the obtained mapping function of correspondence between regions. This makes it possible to obtain probability heat map-like information corresponding to the geometric primitive shape obtained online.
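The transfer of a grasp probability map onto the primitive region can be sketched as a resampling of a grid. The example assumes the probability map is stored as a 2D grid over the average shape and that the primitive region is covered by a grid of a different size; nearest-neighbour lookup stands in for the mapping function, and `transfer_probability_map` is a hypothetical name.

```python
from typing import List


def transfer_probability_map(
    prob_map: List[List[float]], out_rows: int, out_cols: int
) -> List[List[float]]:
    """Resample a grasp probability grid onto a grid covering the primitive region."""
    in_rows, in_cols = len(prob_map), len(prob_map[0])
    out = []
    for r in range(out_rows):
        # Centre of the output cell, expressed in source-grid coordinates.
        src_r = min(in_rows - 1, int((r + 0.5) * in_rows / out_rows))
        row = []
        for c in range(out_cols):
            src_c = min(in_cols - 1, int((c + 0.5) * in_cols / out_cols))
            row.append(prob_map[src_r][src_c])
        out.append(row)
    return out
```

The values are copied rather than renormalised, so the result behaves like a heat map of relative grasp preference over the primitive rather than a distribution that sums to one.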

（ステップＳ１１）把持計画部１３は、現在の手首姿勢から、幾何学的制約を満たしながら、経路の短さ、最短経路が最も確率が高くなる把持点を、最適化問題を解いて求める。 (Step S11) Starting from the current wrist posture, the grasp planning unit 13 solves an optimization problem to find the grasp point that satisfies the geometric constraints and offers the best combination of a short path and a high grasp probability.
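Step S11 can be read as a trade-off between grasp probability and path length under geometric constraints. The sketch below scores each candidate as probability minus a weighted straight-line distance from the wrist and searches the candidates exhaustively. The scoring formula, the `path_weight` parameter, and the `is_feasible` callback are assumptions for illustration only; the specification does not give the objective function or the solver.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def select_grasp_point(
    wrist: Sequence[float],
    candidates: Iterable[Tuple[Point, float]],
    is_feasible: Callable[[Point], bool],
    path_weight: float = 0.5,
) -> Optional[Point]:
    """Return the feasible candidate with the best probability/path-length score."""
    best, best_score = None, -math.inf
    for point, probability in candidates:
        if not is_feasible(point):
            continue  # geometric constraint violated
        # Higher grasp probability is better; a longer approach path is penalised.
        score = probability - path_weight * math.dist(wrist, point)
        if score > best_score:
            best, best_score = point, score
    return best
```

An exhaustive search is adequate when the candidates come from a coarse probability grid; a continuous solver could replace it without changing the interface.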

（ステップＳ１２）把持計画部１３は、把持点が選択された後、その把持点に近づくための軌道を算出してロボット１５に出力する。なお、把持計画部１３は、重量を物体クラスの一般的な（単位体積当たりの）重量から変化分を適応的に制御する。 (Step S12) After a gripping point is selected, the grasp planning unit 13 calculates a trajectory for approaching that gripping point and outputs it to the robot 15. For the weight, the grasp planning unit 13 starts from the general weight (per unit volume) of the object class and adaptively controls for the deviation from it.

なお、上述した実施形態では、作業の一例として把持を例に説明したが、作業内容はこれに限らない。 Note that in the above-described embodiment, grasping was used as an example of a task, but the task content is not limited to this.

［単一プリミティブ、複数プリミティブの場合の処理について］
ここで、単一プリミティブ、複数プリミティブの場合の処理についてさらに説明する。図６は、本実施形態に係る単一プリミティブ、複数プリミティブの場合の処理手順のフローチャートである。 [Processing for single and multiple primitives]
Here, the processing for the single primitive and multiple primitives will be further explained. Fig. 6 is a flowchart showing the processing procedure for the single primitive and multiple primitives according to this embodiment.

（ステップＳ１０１）環境状況判断部２は、エンドエフェクタ１５１が操作される環境の環境情報を取得する。環境状況判断部２は、取得した環境情報である対象物体に関する形状情報を操作量決定部７に出力する。 (Step S101) The environmental situation determination unit 2 acquires environmental information about the environment in which the end effector 151 is operated. The environmental situation determination unit 2 outputs the acquired environmental information, which is shape information about the target object, to the operation amount determination unit 7.

（ステップＳ１０２）データ統合部９は、複数の環境状況判断部２から取得した対象物体に関する形状情報に含まれるボクセル情報を用いて、インスタンスＩＤ毎に三次元形状の統合を行う。形状近似部１０は、インスタンスＩＤ毎の統合された三次元形状の統合情報に基づいて幾何プリミティブ情報を生成する。把持計画部１３は、形状近似部１０からインスタンスＩＤ毎の幾何プリミティブ情報を取得する。 (Step S102) The data integration unit 9 integrates three-dimensional shapes for each instance ID using the voxel information contained in the shape information about the target object acquired from the multiple environmental situation determination units 2. The shape approximation unit 10 generates geometric primitive information from the integrated three-dimensional shape of each instance ID. The grasp planning unit 13 acquires the geometric primitive information for each instance ID from the shape approximation unit 10.

（ステップＳ１０３）操作量決定部７は、エンドエフェクタ１５１を操作する操作者の動作に関する情報をＨＭＤ３と指示検出部４から取得する。 (Step S103) The operation amount determination unit 7 acquires information about the movement of the operator operating the end effector 151 from the HMD 3 and the instruction detection unit 4.

（ステップＳ１０４）意図理解部６は、動作取得部５が出力する情報を用いて、エンドエフェクタによって操作する対象である対象物体と、対象物体の操作方法であるタスクを推定する。 (Step S104) The intention understanding unit 6 uses the information output by the motion acquisition unit 5 to estimate the target object to be operated by the end effector and the task, that is, the method of operating the target object.

（ステップＳ１０５）動作推定部１１は、操作意図、ロボット状態、対象物体に紐づく可能な操作から、操作意図の集合である動作を推定する。把持計画部１３は、動作推定部１１から動作情報を取得する。 (Step S105) The motion estimation unit 11 estimates a motion, which is a collection of operational intentions, from the operational intention, robot state, and possible operations linked to the target object. The grasp planning unit 13 acquires motion information from the motion estimation unit 11.

（ステップＳ１０６）把持計画部１３は、取得された幾何プリミティブ情報と推定された動作情報を用いて、インスタンスＩＤが所属するクラスＩＤをデータベース１４から探索する。例えば、スパナクラスのスパナＡ、スパナＢというインスタンスがある場合は、これらのクラスＩＤをデータベース１４から探索する。 (Step S106) The grasp planning unit 13 uses the acquired geometric primitive information and the estimated motion information to search the database 14 for the class ID to which each instance ID belongs. For example, if the wrench class has instances wrench A and wrench B, the database 14 is searched for the class IDs of these instances.

（ステップＳ１０７）複数プリミティブの場合、把持計画部１３は、作業者の操作入力とクラスＩＤと各プリミティブが持つ選択可能なタクソノミーを決定する。なお、単一プリミティブの場合、把持計画部１３は、作業者の操作入力からタクソノミーを決定する。 (Step S107) In the case of multiple primitives, the grasp planning unit 13 determines the taxonomy from the operator's operation input, the class ID, and the selectable taxonomies that each primitive has. In the case of a single primitive, the grasp planning unit 13 determines the taxonomy from the operator's operation input.

（ステップＳ１０８）把持計画部１３は、クラスＩＤに対応した密度と、推定した体積から、物体の初期重量を決定する。 (Step S108) The grasp planning unit 13 determines the initial weight of the object based on the density corresponding to the class ID and the estimated volume.
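The initial weight in step S108 is the class density multiplied by the volume estimated from the fitted primitives. The sketch below shows that arithmetic for boxes, cylinders, and spheres. The density table, the dictionary layout of a primitive, and the function names are assumptions for this example; 7850 kg/m^3 is roughly the density of steel and is used here only as a plausible value for a wrench class.

```python
import math
from typing import Dict, Iterable

# Hypothetical per-class density table (kg per cubic metre).
CLASS_DENSITY_KG_PER_M3 = {"wrench": 7850.0}


def primitive_volume(primitive: Dict) -> float:
    """Volume in cubic metres of a box, cylinder, or sphere primitive."""
    kind = primitive["type"]
    if kind == "box":
        width, depth, height = primitive["size"]
        return width * depth * height
    if kind == "cylinder":
        return math.pi * primitive["radius"] ** 2 * primitive["height"]
    if kind == "sphere":
        return 4.0 / 3.0 * math.pi * primitive["radius"] ** 3
    raise ValueError(f"unsupported primitive type: {kind}")


def initial_weight(
    class_id: str,
    primitives: Iterable[Dict],
    densities: Dict[str, float] = CLASS_DENSITY_KG_PER_M3,
) -> float:
    """Initial estimate in kilograms: class density times total primitive volume."""
    volume = sum(primitive_volume(p) for p in primitives)
    return densities[class_id] * volume
```

For a handle approximated by a 0.2 m x 0.02 m x 0.005 m box this gives about 0.157 kg, which would then serve only as a starting value for the adaptive control of the deviation from the class-typical weight.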

（ステップＳ１０９）把持計画部１３は、現在の手首姿勢から、幾何学的制約を満たし且つ最短経路で対象物の物体表面に到達する把持点を、最適化問題を解いて選択する。 (Step S109) The grasp planning unit 13 solves an optimization problem to select a grasp point from the current wrist posture that satisfies the geometric constraints and reaches the object surface via the shortest path.

（ステップＳ１１０）把持計画部１３は、把持点が選択された後、その把持点に近づくための軌道を算出してロボット１５に出力する。なお、把持計画部１３は、重量を物体クラスの一般的な（単位体積当たりの）重量から変化分を適応的に制御する。 (Step S110) After a gripping point is selected, the grasp planning unit 13 calculates a trajectory for approaching that gripping point and outputs it to the robot 15. For the weight, the grasp planning unit 13 starts from the general weight (per unit volume) of the object class and adaptively controls for the deviation from it.

［処理、構成の概要］
ここで、遠隔操作補助システム１の構成の概要、処理の概要を説明する。
図７は、本実施形態に係る遠隔操作補助システムの構成の概要、処理の概要を説明するための図である。
図７のように、マスク生成部２３は、ＲＧＢデータにおける画像中の物体のインスタンスＩＤに基づくマスク画像を生成して出力する。なお、ここでの前提条件は、画像に写っている物体をクラス分類できることである。
三次元再構成部２５は、マスク画像と深度画像に基づいて、クラス毎の各インスタンスＩＤのポイントクラウドを三次元再構成によって求める。三次元再構成部２５は、対象物体に関する形状情報であるボクセルまたは点群（ポイントクラウド）をクラスタリングして出力する。三次元再構成部２５は、ＲＧＢセンサ２１１と深度センサ２１２を用いてアクティブセンシングを行い、センシングした情報を三次元点統合する。
形状近似部１０は、クラスタリングされた対象物体に関する形状情報を用いてプリミティブ・フィッティングを行って物体形状を幾何基本要素（例えば直方体、円柱、球等）で近似する。
把持計画部１３は、事前登録されている把持戦略をスケールさせることで操作意図に応じた物体把持を実現する。 [Processing and configuration overview]
Here, an outline of the configuration and processing of the remote operation assistance system 1 will be described.
FIG. 7 is a diagram for explaining an outline of the configuration and processing of the remote operation assistance system according to this embodiment.
As shown in Fig. 7, the mask generation unit 23 generates and outputs a mask image based on the instance ID of each object in the image in the RGB data. Note that a prerequisite here is that the objects in the image can be classified into classes.
The three-dimensional reconstruction unit 25 obtains a point cloud for each instance ID for each class by three-dimensional reconstruction based on the mask image and the depth image. The three-dimensional reconstruction unit 25 clusters and outputs voxels or a point group (point cloud), which is shape information related to the target object. The three-dimensional reconstruction unit 25 performs active sensing using the RGB sensor 211 and the depth sensor 212, and integrates the sensed information into three-dimensional points.
The shape approximation unit 10 performs primitive fitting using shape information on the clustered target object to approximate the object shape with geometric basic elements (for example, a rectangular parallelepiped, a cylinder, a sphere, etc.).
The grasp planning unit 13 realizes object grasping according to the operation intention by scaling a pre-registered grasp strategy.

［比較例］
ここで、比較例を、図８を用いて説明する。図８は、比較例を説明するための図である。
図８のように、従来技術では、例えばデータベースに予め情報が登録されている物体しか扱えなかった。例えば、物体がスパナの場合でも、形状、重量が異なる場合は、登録されている必要があった。そして、従来技術では、対象物体に三次元形状と重量が既知である必要があった。また、渋滞技術では、対象物の把持点や把持姿勢も既知である必要があった。 [Comparative Example]
A comparative example will now be described with reference to Fig. 8. Fig. 8 is a diagram for explaining the comparative example.
As shown in Fig. 8, conventional technology could only handle objects whose information was registered in advance in a database. For example, even if the object was a wrench, each wrench with a different shape or weight had to be registered. Conventional technology also required that the three-dimensional shape and weight of the target object be known, and that the gripping point and gripping posture of the target object be known.

これに対して、本実施形態では、前提条件として、物体クラス（例えばスパナ）と操作意図に応じた把持戦略のみを事前に与えるようにした。そして、本実施形態では、物体クラスと操作意図に応じた把持戦略を事前に与える物体形状を幾何基本要素（例えば直方体）で近似し、事前登録した把持戦略をスケールさせることで操作意図に応じた物体把持を実現するようにした。また本実施形態では、重量を物体クラスの一般的な（単位体積当たりの）重量から変化分を適応的に制御するようにした。 In contrast, in this embodiment, only the object class (e.g., a wrench) and a grasping strategy corresponding to the operation intention are provided in advance as prerequisites. Then, in this embodiment, the object shape for which a grasping strategy corresponding to the object class and operation intention is provided in advance is approximated with a geometric primitive (e.g., a rectangular parallelepiped), and the pre-registered grasping strategy is scaled to achieve object grasping corresponding to the operation intention. Furthermore, in this embodiment, the weight is adaptively controlled by the amount of change from the general weight (per unit volume) of the object class.

これにより、本実施形態によれば、事前に登録する情報が少なくなるため、様々なインスタンスに対して把持計画を行える。 As a result, according to this embodiment, less information needs to be registered in advance, allowing for grasp planning for a variety of instances.

なお、本発明における操作量決定部７の機能の全てまたは一部を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより操作量決定部７が行う処理の全てまたは一部を行ってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータシステム」は、ホームページ提供環境（あるいは表示環境）を備えたＷＷＷシステムも含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。 In addition, a program for realizing all or part of the functions of the operation amount determination unit 7 in this invention may be recorded on a computer-readable recording medium, and the program recorded on this recording medium may be loaded into a computer system and executed to perform all or part of the processing performed by the operation amount determination unit 7. Note that the term "computer system" as used here includes an OS and hardware such as peripheral devices. It also includes a WWW system equipped with a homepage provision environment (or display environment). Furthermore, "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems. "Computer-readable recording medium" also includes media that retain a program for a certain period of time, such as volatile memory (RAM) within a computer system that acts as a server or client when a program is transmitted via a network such as the Internet or a communication line such as a telephone line.

また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。また、上記プログラムは、前述した機能の一部を実現するためのものであってもよい。さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であってもよい。 The above program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves within the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium that has the function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line. The above program may also be one that realizes part of the above-mentioned functions. Furthermore, it may be a so-called differential file (differential program) that can realize the above-mentioned functions in combination with a program already recorded on the computer system.

以上、本発明を実施するための形態について実施形態を用いて説明したが、本発明はこうした実施形態に何等限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々の変形および置換を加えることができる。 The above describes the form for carrying out the present invention using an embodiment, but the present invention is in no way limited to such an embodiment, and various modifications and substitutions can be made without departing from the spirit of the present invention.

１…遠隔操作補助システム、２，２－１，２－２，２－３，・・・…環境状況判断部、３…ＨＭＤ、４…指示検出部、５…動作取得部、６…意図理解部、７…操作量決定部、８…モデル、９…データ統合部、１０…形状近似部、１１…動作推定部、１２…把持ＤＢ、１３…把持計画部、１４…データベース、１５…ロボット、１６…制御部、１７…近似結果反映部、３１…視線検出部、１５１…エンドエフェクタ、１５２…アーム、１５３…センサ、２１…環境センサ、２２…自己位置推定部、２３…マスク生成部、２４…加算部、２５…三次元再構成部、２６…データ統合部、２１１…ＲＧＢセンサ、２１２…深度センサ 1... Remote operation assistance system, 2, 2-1, 2-2, 2-3... Environmental situation judgment unit, 3... HMD, 4... Instruction detection unit, 5... Motion acquisition unit, 6... Intention understanding unit, 7... Operation amount determination unit, 8... Model, 9... Data integration unit, 10... Shape approximation unit, 11... Motion estimation unit, 12... Grasp DB, 13... Grasp planning unit, 14... Database, 15... Robot, 16... Control unit, 17... Approximation result reflection unit, 31... Gaze detection unit, 151... End effector, 152... Arm, 153... Sensor, 21... Environmental sensor, 22... Self-position estimation unit, 23... Mask generation unit, 24... Addition unit, 25... 3D reconstruction unit, 26... Data integration unit, 211... RGB sensor, 212... Depth sensor

Claims

A teleoperation assistance system for remotely operating at least an end effector,
a motion acquisition unit that acquires motion information relating to a motion of an operator operating at least the end effector;
an intention understanding unit that uses the motion information to estimate a target object to be operated by the end effector and a task that is a method of operating the target object intended by the operator;
an environmental situation determination unit that acquires environmental information of an environment in which the target object exists and the end effector is operated with respect to the target object;
a motion estimation unit that estimates a motion based on the task, a state of the robot equipped with the end effector, and an object shape of the target object generated based on the environmental information;
an operation amount determination unit that determines an operation amount of the end effector based on the estimated motion and the acquired environmental information,
The operation amount determination unit
a shape approximation unit that approximates the object shape of the target object to geometric primitives;
a grasp planning unit that generates a grasp plan by comparing the geometric basic elements with registered information of an operation strategy as a strategy for an operation to grasp an object pre-registered for the end effector,
The grasping planning unit
fitting the object shape of the target object to the geometric primitives to calculate a mapping function between the area of the object shape of the target object and the area of the geometric primitives;
determining a taxonomy capable of performing said task;
a grasping probability map indicating a grasping probability for each operation intention is obtained based on the motion information, the determined taxonomy, and a class ID to which an instance ID assigned to the object shape of the target object belongs; the grasping probability map is mapped to the region of the geometric basic element based on a mapping function of correspondence between the calculated regions; and the grasping plan is generated based on a current wrist posture.
Remote control assistance system.

a shape approximation unit that approximates the object shape of the target object to a geometric primitive for each instance ID;
The remote operation assistance system according to claim 1.

The geometric primitives are one or more in number.
The remote operation assistance system according to claim 1.

the geometric basic elements include a shape of a region of interest that is an area of the target object that is easy to grasp;
The remote operation assistance system according to claim 1 .

The environmental situation determination unit comprises:
an RGB sensor for acquiring RGB data;
a depth sensor for acquiring a depth image;
a mask generation unit that uses an instance segmentation technique to generate a mask image based on an instance ID of an object in the image using the color image output by the RGB sensor;
an addition unit that generates and outputs a depth image for each instance ID using the depth image and the mask image;
a self-position estimation unit that performs image processing on a color image output by the RGB sensor to detect a position and orientation of the RGB sensor, and generates and outputs orientation information based on the position and orientation;
a three-dimensional reconstruction unit that obtains a point cloud for each instance ID for each class by three-dimensional reconstruction using the orientation information of the RGB sensor output by the self-position estimation unit and the depth image for each instance ID output by the addition unit; and
a data integration unit that acquires a point cloud of each instance ID for each key frame, acquires and manages current data at time t and time series data of past frames t-1, t-2, ..., and integrates these data.
The remote operation assistance system according to claim 1.

a computer of a teleoperation assistance system that teleoperates at least the end effector,
acquiring motion information relating to the motion of an operator who operates at least the end effector;
Using the motion information, a target object to be operated by the end effector and a task that is a method of operating the target object intended by the operator are estimated;
acquiring environmental information of an environment in which the target object exists and in which the end effector is operated relative to the target object;
Estimating a motion based on the task, a state of the robot equipped with the end effector, and an object shape of the target object generated by the environmental information;
determining an operation amount of the end effector based on the estimated motion and the acquired environmental information;
The computer
approximating the object shape of the target object to geometric primitives;
generating a grasp plan by comparing the geometric primitives with registered information of an operation strategy as a strategy for grasping an object pre-registered for the end effector;
The computer
fitting the object shape of the target object to the geometric primitives to calculate a mapping function between the area of the object shape of the target object and the area of the geometric primitives;
determining a taxonomy capable of performing said task;
a grasping probability map indicating a grasping probability for each operation intention is obtained based on the motion information, the determined taxonomy, and a class ID to which an instance ID assigned to the object shape of the target object belongs; the grasping probability map is mapped to the region of the geometric basic element based on a mapping function of correspondence between the calculated regions; and the grasping plan is generated based on a current wrist posture.
Remote control assistance method.

A computer of a teleoperation assistance system that teleoperates at least the end effector,
acquiring motion information relating to at least the motion of an operator operating the end effector;
using the motion information, a target object to be operated by the end effector and a task that is a method of operating the target object intended by the operator are estimated;
acquiring environmental information of an environment in which the target object exists and in which the end effector is operated relative to the target object;
A motion is estimated based on the task, a state of the robot equipped with the end effector, and an object shape of the target object generated by the environmental information.
determining an operation amount of the end effector based on the estimated motion and the acquired environmental information;
The computer,
approximating the object shape of the target object to geometric primitives;
generating a grasp plan by comparing the geometric primitives with registered information of an operation strategy as a strategy for grasping an object previously registered for the end effector;
The computer,
fitting the object shape of the target object to the geometric primitives to calculate a mapping function between a region of the object shape of the target object and a region of the geometric primitives;
determining a taxonomy capable of performing said task;
a grasping probability map indicating a grasping probability for each operation intention is obtained based on the motion information, the determined taxonomy, and a class ID to which an instance ID assigned to the object shape of the target object belongs; the grasping probability map is mapped to the region of the geometric basic element based on a mapping function of correspondence between the calculated regions; and the grasping plan is generated based on a current wrist posture.
Program.