JP7800525B2 - Information processing device, information processing method, and program

Info

Publication number: JP7800525B2
Application number: JP2023183564A
Authority: JP
Inventors: 政志 ▲濱▼屋; 敦史橋本; 翔平田中; 圭佑白井; カミロベルトランエルナンデスクリスティアン
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2023-10-25
Filing date: 2023-10-25
Publication date: 2026-01-16
Anticipated expiration: 2043-10-25
Also published as: WO2025089031A1; JP2026034836A; JP2025073015A

Description

本発明は、情報処理装置、情報処理方法及びプログラムに関する。 The present invention relates to an information processing device, an information processing method, and a program.

自然言語によるロボット装置との対話は、非専門家がロボット装置に複雑で多様なタスクを解かせる手段として有望である。近年では、大規模言語モデル（ＬＬＭ：Large Language Model）を用いて、言語指示から作業空間中でのロボット装置の動作系列を出力する研究が注目を集めている。 Interacting with robotic devices using natural language is a promising means for non-experts to enable robotic devices to solve complex and diverse tasks. In recent years, research has been attracting attention on using large language models (LLMs) to output sequences of robotic device actions in a workspace from linguistic instructions.

例えば、非特許文献１には、言語指示からロボット装置の動作系列を生成するシステムが提案されている。非特許文献１で提案されるシステムは、自然言語による指示（言語指示）の入力を受け付ける。当該システムは、ＬＬＭ（Ｓａｙモジュール）を用いて、次に実行する妥当性の高いロボット装置の動作を言語指示から推測する。また、当該システムは、価値関数（Ｃａｎモジュール）を用いて、実行可能性の高い動作を観測データから推測する。そして、当該システムは、２つの推測結果を統合し、統合の結果に応じて、ロボット装置に与える動作系列（実行する妥当性が高く、実行可能な動作系列）を決定する。 For example, Non-Patent Document 1 proposes a system that generates an action sequence for a robot device from linguistic instructions. The system accepts instructions in natural language (linguistic instructions) as input. Using an LLM (the Say module), it infers from the linguistic instruction which robot actions are most appropriate to execute next. Using a value function (the Can module), it infers from observation data which actions are most likely to be executable. The system then integrates the two inference results and, based on the integrated result, determines the action sequence to give to the robot device (an action sequence that is both appropriate to execute and executable).
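The integration of the two inference results can be sketched as follows. This is a minimal illustration, assuming the two estimates are combined by multiplication and the highest-scoring action is chosen; the function name, action names, and scores are hypothetical and do not come from this document.

```python
from typing import Dict


def select_next_action(say_scores: Dict[str, float],
                       can_scores: Dict[str, float]) -> str:
    """Pick the action whose combined Say x Can score is highest."""
    combined = {
        # Actions with no feasibility estimate are treated as infeasible (0.0).
        action: say_scores[action] * can_scores.get(action, 0.0)
        for action in say_scores
    }
    return max(combined, key=combined.get)


# Hypothetical scores for illustration only.
say = {"pick up sponge": 0.6, "go to table": 0.3, "pick up apple": 0.1}
can = {"pick up sponge": 0.2, "go to table": 0.9, "pick up apple": 0.8}
print(select_next_action(say, can))
```

Note that the output of such a system is the selected action itself, which is the point the inventors raise next: the result is an action sequence rather than a description of states.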

Michael Ahn et al., “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2204.01691＞ Michael Ahn et al., “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”, [online], [Retrieved October 24, 2023], Internet <URL: https://arxiv.org/abs/2204.01691>

本件発明者らは、上記従来のシステムには、次のような問題点があることを見出した。すなわち、従来のシステムでは、生成される動作系列は、試行環境に特化しており、人間にとって解釈可能なものであるとは限らない。動作系列をそのまま出力してしまうため、ロボット装置の制御に関して、得られる出力の説明性が低いという問題点があった。 The inventors of this invention have discovered that the above-mentioned conventional systems have the following problems. In other words, the action sequences generated by conventional systems are specific to the trial environment and are not necessarily interpretable by humans. Because the action sequences are output as is, there is a problem in that the interpretability of the output obtained in terms of controlling the robot device is low.

本発明は、一側面では、このような事情を鑑みてなされたものであり、その目的は、ロボット装置の制御に関して、説明性の高い出力を得る技術を提供することである。 In one aspect, the present invention was made in consideration of these circumstances, and its purpose is to provide technology that can obtain highly interpretable output regarding the control of a robot device.

本発明は、上述した課題を解決するために、以下の構成を採用する。なお、以下の発明の構成は適宜組み合わせ可能である。 The present invention employs the following features to solve the above-mentioned problems. Note that the following features of the invention can be combined as appropriate.

すなわち、本発明の一側面に係る情報処理装置は、ロボット装置の動作する環境の観測データ、及び前記ロボット装置に与えるタスクの目標に関する指示情報を取得するステップ、推論モジュールを用いて、取得された前記観測データ及び前記指示情報から前記タスクの問題記述を生成するステップ、並びに生成された前記問題記述を出力するステップを実行するように構成される制御部を備える。前記問題記述は、前記環境に存在する物体の初期状態及び目標状態の記述を含む。 That is, an information processing device according to one aspect of the present invention includes a control unit configured to execute the steps of acquiring observation data of an environment in which a robot device operates and instruction information regarding a task goal to be given to the robot device, generating a problem description of the task from the acquired observation data and instruction information using an inference module, and outputting the generated problem description. The problem description includes descriptions of the initial state and goal state of objects present in the environment.

問題記述は、ドメイン記述と共に、プランナが行動計画を生成する（すなわち、動作系列を得る）のに用いられる。この問題記述は、タスクを達成するために、環境に存在する物体の初期状態及び目標状態を記述しており、高い説明性を有する（人間にとって解釈可能である）。したがって、当該構成によれば、ロボット装置に与える動作系列（指令）を得るための説明性の高い出力を得ることができる。 The problem description is used by a planner, together with a domain description, to generate an action plan (i.e., to obtain an action sequence). The problem description describes the initial and goal states of the objects in the environment for accomplishing the task, and is therefore highly interpretable (i.e., understandable by humans). This configuration therefore yields a highly interpretable output from which the action sequence (commands) given to the robot device can be obtained.

上記一側面に係る情報処理装置において、生成される前記問題記述は、所定のフォーマットに従うものであってよい。当該構成によれば、生成される問題記述が所定のフォーマットに従っていることで、プランナが問題記述から行動計画を生成しやすくすることができる。 In the information processing device according to the above aspect, the generated problem description may conform to a predetermined format. With this configuration, the generated problem description conforms to the predetermined format, making it easier for the planner to generate an action plan from the problem description.

上記一側面に係る情報処理装置において、生成される前記問題記述は、前記環境に存在する前記物体の記述を更に含んでよい。当該構成によれば、物体の記述により、環境に存在する物体が特定しやすくなることで、プランナが問題記述から行動計画を生成しやすくすることができる。 In the information processing device according to the above aspect, the generated problem description may further include a description of the objects present in the environment. According to this configuration, the object description makes it easier to identify objects present in the environment, making it easier for the planner to generate an action plan from the problem description.

上記一側面に係る情報処理装置において、前記観測データは、センサのセンシングデータにより構成されてよく、前記指示情報は、前記目標を自然言語で指示する言語情報により構成されてよい。当該構成によれば、環境をセンサで観測し、言語指示でタスクの目標を与える場面で、ロボット装置の制御に関して、説明性の高い出力を得ることができる。 In the information processing device according to the above aspect, the observation data may consist of sensing data from a sensor, and the instruction information may consist of linguistic information that specifies the goal in natural language. With this configuration, a highly interpretable output regarding the control of the robot device can be obtained in situations where the environment is observed by a sensor and the task goal is given as a linguistic instruction.

上記一側面に係る情報処理装置において、前記推論モジュールは、コンテキスト内学習（in-context learning）の訓練済みモデルを含むように構成されてよい。前記制御部は、生成された前記問題記述をプランナに与えて、前記ロボット装置の行動計画を生成する処理において、当該プランナがエラーメッセージを出力した場合、出力された前記エラーメッセージを取得するステップ、並びに前記推論モジュールを用いて、前記問題記述及び前記エラーメッセージから新たな問題記述を生成するステップを更に実行するように構成されてよい。当該構成によれば、適切な問題記述が得られなかった場合に、問題記述を自動的に修正することができる。なお、訓練済みモデルは、訓練された機械学習モデルである。 In the information processing device according to the above aspect, the inference module may be configured to include a trained model for in-context learning. The control unit may be further configured to execute, when a planner outputs an error message while generating an action plan for the robot device from the generated problem description, the steps of acquiring the output error message and using the inference module to generate a new problem description from the problem description and the error message. With this configuration, the problem description can be corrected automatically when an appropriate problem description was not obtained. Note that a trained model is a machine learning model that has been trained.
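The correction loop described above can be sketched as follows. This is only an illustration under assumed interfaces: the `Planner` and `Reviser` signatures, the retry limit, and the toy stand-ins are hypothetical and are not specified in this document.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: neither signature comes from the patent text.
PlanResult = Tuple[Optional[List[str]], Optional[str]]  # (plan, error message)
Planner = Callable[[str], PlanResult]
Reviser = Callable[[str, str], str]  # (problem description, error) -> new description


def plan_with_repair(problem: str, planner: Planner, revise: Reviser,
                     max_attempts: int = 3) -> Tuple[str, Optional[List[str]]]:
    """Call the planner; on error, regenerate the problem description and retry."""
    for _ in range(max_attempts):
        plan, error = planner(problem)
        if error is None:
            return problem, plan
        # Feed both the previous description and the error back to the reviser.
        problem = revise(problem, error)
    return problem, None  # no valid plan within the attempt budget


# Toy stand-ins so the sketch runs end to end.
def toy_planner(problem: str) -> PlanResult:
    if "(:goal" not in problem:
        return None, "missing :goal section"
    return ["pick a", "stack a b"], None


def toy_reviser(problem: str, error: str) -> str:
    # A real system would prompt the trained model with both strings.
    return problem + " (:goal (on a b))"


final_problem, plan = plan_with_repair("(:init (clear a))", toy_planner, toy_reviser)
print(plan)
```

Bounding the number of attempts is a design choice of this sketch; the text itself does not state a stopping condition.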

上記一側面に係る情報処理装置の前記制御部は、前記取得するステップでは、前記環境に関する環境情報を更に取得するように構成されてよい。取得された前記観測データ及び前記指示情報から前記問題記述を生成することは、取得された前記観測データ、前記指示情報及び前記環境情報から前記問題記述を生成することにより構成されてよい。当該構成によれば、観測データ及び指示情報に加えて、環境情報を説明変数として更に用いることで、タスクを遂行する環境がより特定可能となるため、問題記述を生成する精度の向上を期待することができる。 The control unit of the information processing device according to the above aspect may be configured to further acquire environmental information related to the environment in the acquiring step. Generating the problem description from the acquired observation data and instruction information may be configured by generating the problem description from the acquired observation data, instruction information, and environmental information. With this configuration, by further using environmental information as an explanatory variable in addition to the observation data and instruction information, it becomes possible to more accurately identify the environment in which a task is to be performed, which is expected to improve the accuracy of generating the problem description.

上記一側面に係る情報処理装置において、生成される前記問題記述は、前記環境に存在する前記物体の記述を更に含んでよい。前記推論モジュールは、物体推定器を含んでよい。前記推論モジュールを用いて、前記問題記述を生成することは、前記物体推定器を用いて、前記環境に存在する前記物体の記述を取得された前記観測データから生成することを含んでよい。当該構成によれば、問題記述における物体の記述部分を適切に生成することができる。 In the information processing device according to the above aspect, the generated problem description may further include a description of the object present in the environment. The inference module may include an object estimator. Generating the problem description using the inference module may include using the object estimator to generate a description of the object present in the environment from the acquired observation data. This configuration makes it possible to appropriately generate the object description portion of the problem description.

上記一側面に係る情報処理装置において、前記物体推定器は、コンテキスト内学習の訓練済みモデルを備えてよい。当該構成によれば、物体推定器がコンテキスト内学習の訓練済みモデルを備えていることで、環境に存在する物体を汎用的に推定することができ、これによって、様々なタスクに対する問題記述の生成への対応を期待することができる。 In the information processing device according to the above aspect, the object estimator may include a trained model for in-context learning. With this configuration, the object estimator includes a trained model for in-context learning, enabling general-purpose estimation of objects present in the environment, which is expected to enable the generation of problem descriptions for a variety of tasks.

上記一側面に係る情報処理装置の前記制御部は、前記取得するステップでは、前記環境に存在する前記物体の属性情報を更に取得するように構成されてよい。取得された前記観測データから前記物体の記述を生成することは、取得された前記観測データ及び前記属性情報から前記物体の記述を生成することにより構成されてよい。当該構成によれば、属性情報により環境に存在する物体をより特定可能となるため、問題記述における物体の記述部分を生成する精度の向上を期待することができる。 The control unit of the information processing device according to the above aspect may be configured to further acquire attribute information of the objects present in the environment in the acquiring step. Generating a description of the object from the acquired observation data may be configured by generating a description of the object from the acquired observation data and the attribute information. With this configuration, attribute information makes it possible to more accurately identify objects present in the environment, which can be expected to improve the accuracy of generating the description portion of the object in the problem description.

上記一側面に係る情報処理装置において、前記推論モジュールは、初期状態推定器を含んでよい。前記推論モジュールを用いて、前記問題記述を生成することは、前記初期状態推定器を用いて、前記環境に存在する前記物体の前記初期状態の記述を生成することを含んでよい。当該構成によれば、問題記述における初期状態の記述部分を適切に生成することができる。 In the information processing device according to the above aspect, the inference module may include an initial state estimator. Generating the problem description using the inference module may include generating a description of the initial state of the object present in the environment using the initial state estimator. This configuration makes it possible to appropriately generate the description portion of the initial state in the problem description.

上記一側面に係る情報処理装置において、前記初期状態推定器は、コンテキスト内学習の訓練済みモデルを備えてよい。当該構成によれば、初期状態推定器がコンテキスト内学習の訓練済みモデルを備えていることで、環境に存在する物体の初期状態を汎用的に推論することができ、これによって、様々なタスクに対する問題記述の生成への対応を期待することができる。 In the information processing device according to the above aspect, the initial state estimator may include a trained model for in-context learning. With this configuration, the initial state estimator includes a trained model for in-context learning, enabling general-purpose inference of the initial states of objects present in the environment, which is expected to enable the generation of problem descriptions for a variety of tasks.

上記一側面に係る情報処理装置において、前記推論モジュールは、目標推定器を含んでよい。前記推論モジュールを用いて、前記問題記述を生成することは、前記目標推定器を用いて、前記環境に存在する前記物体の前記目標状態の記述を生成することを含んでよい。当該構成によれば、問題記述における目標状態の記述部分を適切に生成することができる。 In the information processing device according to the above aspect, the inference module may include a goal estimator. Generating the problem description using the inference module may include generating a description of the goal state of the object present in the environment using the goal estimator. This configuration makes it possible to appropriately generate the description portion of the goal state in the problem description.

上記一側面に係る情報処理装置において、前記目標推定器は、コンテキスト内学習の訓練済みモデルを備えてよい。当該構成によれば、目標推定器がコンテキスト内学習の訓練済みモデルを備えていることで、物体の目標状態を汎用的に推論することができ、これによって、様々なタスクに対する問題記述の生成への対応を期待することができる。 In the information processing device according to the above aspect, the goal estimator may include a trained model for in-context learning. With this configuration, the goal estimator includes a trained model for in-context learning, enabling general inference of the goal state of an object, which is expected to enable the generation of problem descriptions for a variety of tasks.
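The modular configuration described in the preceding paragraphs (an object estimator, an initial state estimator, and a goal estimator inside the inference module) can be sketched as a composition of three callables. The class name, the signatures, and the choice to pass the estimated objects to the other two estimators are assumptions made for illustration; in the text, each estimator may wrap a trained model for in-context learning, for which simple toy functions stand in here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InferenceModule:
    # Hypothetical signatures; each callable stands in for one estimator.
    object_estimator: Callable[[str], List[str]]
    initial_state_estimator: Callable[[str, List[str]], List[str]]
    goal_estimator: Callable[[str, List[str]], List[str]]

    def generate(self, observation: str, instruction: str) -> Dict[str, List[str]]:
        """Build the three parts of a problem description in sequence."""
        objects = self.object_estimator(observation)
        init = self.initial_state_estimator(observation, objects)
        goal = self.goal_estimator(instruction, objects)
        return {"objects": objects, "init": init, "goal": goal}


# Toy estimators operating on a textual observation (illustrative only).
module = InferenceModule(
    object_estimator=lambda obs: [part.split()[0] for part in obs.split("; ")],
    initial_state_estimator=lambda obs, objs: [f"(ontable {o})" for o in objs],
    goal_estimator=lambda ins, objs: (
        [f"(on {objs[0]} {objs[1]})"] if "stack" in ins else []
    ),
)

result = module.generate("block_a on table; block_b on table",
                         "stack block_a on block_b")
print(result)
```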

なお、本発明の形態は、上記情報処理装置に限られなくてよい。上記各側面に係る情報処理装置の別の態様として、本発明の一側面は、上記各構成の全部又はその一部を実現する情報処理方法であってもよいし、プログラムであってもよいし、このようなプログラムを記憶した、コンピュータ等の機械が読み取り可能な記憶媒体であってもよい。コンピュータ等の機械が読み取り可能な記憶媒体とは、プログラム等の情報を、電気的、磁気的、光学的、機械的、又は化学的作用によって蓄積する媒体である。 Note that the present invention is not limited to the information processing device described above. As another form of the information processing device according to each of the above aspects, one aspect of the present invention may be an information processing method that realizes all or part of each of the above configurations, or may be a program, or may be a storage medium that stores such a program and is readable by a machine such as a computer. A storage medium that is readable by a machine such as a computer is a medium that stores information such as a program through electrical, magnetic, optical, mechanical, or chemical action.

例えば、本発明の一側面に係る情報処理方法は、コンピュータが、ロボット装置の動作する環境の観測データ、及び前記ロボット装置に与えるタスクの目標に関する指示情報を取得するステップ、推論モジュールを用いて、取得された前記観測データ及び前記指示情報から前記タスクの問題記述を生成するステップ、並びに生成された前記問題記述を出力するステップを実行する情報処理方法であってよい。 For example, an information processing method according to one aspect of the present invention may be an information processing method in which a computer executes the steps of acquiring observation data of an environment in which a robotic device operates and instruction information regarding a task goal to be given to the robotic device, using an inference module to generate a problem description for the task from the acquired observation data and instruction information, and outputting the generated problem description.

また、例えば、本発明の一側面に係るプログラムは、ロボット装置の動作する環境の観測データ、及び前記ロボット装置に与えるタスクの目標に関する指示情報を取得するステップ、推論モジュールを用いて、取得された前記観測データ及び前記指示情報から前記タスクの問題記述を生成するステップ、並びに生成された前記問題記述を出力するステップをコンピュータに実行させるためのプログラムであってよい。 Also, for example, a program according to one aspect of the present invention may be a program for causing a computer to execute the steps of acquiring observation data of the environment in which a robotic device operates and instruction information regarding the goal of a task to be given to the robotic device, using an inference module to generate a problem description for the task from the acquired observation data and instruction information, and outputting the generated problem description.

本発明によれば、ロボット装置の制御に関して、説明性の高い出力を得ることができる。 According to the present invention, a highly interpretable output can be obtained regarding the control of a robot device.

図１は、本発明が適用される場面の一例を模式的に示す。 FIG. 1 schematically illustrates an example of a scenario in which the present invention is applied.
図２は、実施の形態に係る推論モジュールの入出力の一例を模式的に示す。 FIG. 2 schematically illustrates an example of the input and output of an inference module according to an embodiment.
図３は、実施の形態に係る問題記述の修正過程の一例を模式的に示す。 FIG. 3 schematically illustrates an example of a process for correcting a problem description according to an embodiment.
図４は、実施の形態に係る推論モジュールの構成の一例を模式的に示す。 FIG. 4 schematically illustrates an example of the configuration of an inference module according to an embodiment.
図５Ａは、実施の形態に係る物体推定器の一例を模式的に示す。 FIG. 5A schematically illustrates an example of an object estimator according to an embodiment.
図５Ｂは、実施の形態に係る初期状態推定器の一例を模式的に示す。 FIG. 5B schematically illustrates an example of an initial state estimator according to an embodiment.
図５Ｃは、実施の形態に係る目標推定器の一例を模式的に示す。 FIG. 5C schematically illustrates an example of a goal estimator according to an embodiment.
図６は、実施の形態に係る情報処理装置のハードウェア構成の一例を模式的に示す。 FIG. 6 schematically illustrates an example of the hardware configuration of an information processing device according to an embodiment.
図７は、実施の形態に係る情報処理装置のソフトウェア構成の一例を模式的に示す。 FIG. 7 schematically illustrates an example of the software configuration of the information processing device according to the embodiment.
図８は、実施の形態に係る情報処理装置の処理手順の一例を示すフローチャートである。 FIG. 8 is a flowchart illustrating an example of a processing procedure of the information processing device according to the embodiment.
図９は、他の形態に係る推論モジュールの構成の一例を模式的に示す。 FIG. 9 schematically illustrates an example of the configuration of an inference module according to another embodiment.
図１０は、用意した各ドメインにおけるドメイン記述に含まれる各記述を示す。 FIG. 10 shows the descriptions included in the domain description for each prepared domain.
図１１Ａは、ドメイン（Cooking）で与えた観測データ及び指示情報の例を示す。 FIG. 11A shows an example of observation data and instruction information given in the domain (Cooking).
図１１Ｂは、ドメイン（Blocksworld）で与えた観測データ及び指示情報の例を示す。 FIG. 11B shows an example of observation data and instruction information given in the domain (Blocksworld).
図１１Ｃは、ドメイン（Hanoi）で与えた観測データ及び指示情報の例を示す。 FIG. 11C shows an example of observation data and instruction information given in the domain (Hanoi).
図１２は、第１実験例の結果を示す。 FIG. 12 shows the results of the first experimental example.
図１３は、第２実験例の結果を示す。 FIG. 13 shows the results of the second experimental example.
図１４は、第３実験例の結果を示す。 FIG. 14 shows the results of the third experimental example.

以下、本発明の一側面に係る実施の形態（以下、「本実施形態」とも表記する）を、図面に基づいて説明する。ただし、以下で説明する本実施形態は、あらゆる点において本発明の例示に過ぎない。本発明の範囲を逸脱することなく種々の改良又は変形が行われてよい。本発明の実施にあたって、実施の形態に応じた具体的構成が適宜採用されてもよい。なお、本実施形態において登場するデータを自然言語により説明しているが、より具体的には、コンピュータが認識可能な疑似言語、コマンド、パラメータ、マシン語等で指定される。 An embodiment of one aspect of the present invention (hereinafter also referred to as "this embodiment") will be described below with reference to the drawings. However, the embodiment described below is merely an example of the present invention in all respects. Various improvements and modifications may be made without departing from the scope of the present invention. In implementing the present invention, specific configurations according to the embodiment may be adopted as appropriate. Note that while the data appearing in this embodiment is described in natural language, more specifically, it is specified using pseudo-language, commands, parameters, machine language, etc. that can be understood by a computer.

§１適用例 §1 Application Example
図１は、本発明が適用される場面の一例を模式的に示す。本実施形態に係る情報処理装置１は、プランナが行動計画を生成するのに用いる問題記述５を生成するように構成された１台以上のコンピュータである。具体的に、情報処理装置１は、ロボット装置Ｒの動作する環境の観測データ２０、及びロボット装置Ｒに与えるタスクの目標に関する指示情報２１を取得する。情報処理装置１は、推論モジュール３を用いて、取得された観測データ２０及び指示情報２１からタスクの問題記述５を生成する。問題記述５は、環境に存在する１つ以上の物体の初期状態及び目標状態の記述（５１、５２）を含む。情報処理装置１は、生成された問題記述５を出力する。 FIG. 1 schematically illustrates an example of a scenario in which the present invention is applied. An information processing device 1 according to this embodiment is one or more computers configured to generate a problem description 5 used by a planner to generate an action plan. Specifically, the information processing device 1 acquires observation data 20 of the environment in which the robot device R operates and instruction information 21 regarding the goal of a task to be given to the robot device R. The information processing device 1 uses an inference module 3 to generate a problem description 5 for the task from the acquired observation data 20 and instruction information 21. The problem description 5 includes descriptions (51, 52) of the initial states and goal states of one or more objects present in the environment. The information processing device 1 outputs the generated problem description 5.

問題記述５は、タスクの初期状態から目標状態に到達する行動計画をプランナが生成するため、環境に存在する１つ以上の物体それぞれの初期状態及び目標状態の記述（５１、５２）を含んでいる。初期状態は、タスクを遂行する前の状態である。初期状態は、開始状態と称してもよい。目標状態は、タスクを遂行し、目的を達成した後の状態である。初期状態の記述５１は、環境に存在する１つ以上の物体それぞれの初期状態を説明する。一方、目標状態の記述５２は、タスクの達成により到達する、１つ以上の物体それぞれの目標状態を説明する。すなわち、各記述（５１、５２）は、タスクの遂行前後の各状態を示し、人間にとって解釈可能である。一例では、各記述（５１、５２）は、ロボット装置Ｒの行動ではなく、物体の状態を示すため、ロボット装置Ｒの特別な理解がなくても（例えば、プログラムコードを解読しなくても）、人間にとって解釈しやすいものである。そのため、この問題記述５は、高い説明性を有している。したがって、本実施形態によれば、ロボット装置Ｒに与える動作系列（制御指令）を得るための説明性の高い出力を得ることができる。 The problem description 5 includes descriptions (51, 52) of the initial state and the goal state of each of the one or more objects in the environment so that a planner can generate an action plan that reaches the goal state from the initial state of the task. The initial state is the state before the task is performed; it may also be called the start state. The goal state is the state after the task has been performed and its objective achieved. The description 51 of the initial state describes the initial state of each of the one or more objects in the environment. The description 52 of the goal state, in turn, describes the goal state of each of the one or more objects that is reached by accomplishing the task. In other words, the descriptions (51, 52) indicate the states before and after the task is performed and are interpretable by humans. In one example, because the descriptions (51, 52) indicate the states of objects rather than the actions of the robot device R, they are easy for humans to interpret even without specialized understanding of the robot device R (for example, without deciphering program code). The problem description 5 is therefore highly interpretable. Consequently, according to this embodiment, a highly interpretable output can be obtained from which the action sequence (control commands) given to the robot device R is derived.
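As a minimal sketch of the structure just described, the problem description 5 can be pictured as a container holding the description 51 of the initial state and the description 52 of the goal state. The class name, field names, and the predicate-style strings are assumptions for illustration and are not prescribed by the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemDescription:
    """Hypothetical container mirroring problem description 5."""
    initial_state: List[str]                            # description 51
    goal_state: List[str]                               # description 52
    objects: List[str] = field(default_factory=list)    # optional object description

    def summary(self) -> str:
        """Render the states as plain text a person can read directly."""
        return (f"objects: {', '.join(self.objects)}\n"
                f"initial: {'; '.join(self.initial_state)}\n"
                f"goal: {'; '.join(self.goal_state)}")


problem = ProblemDescription(
    initial_state=["tomato is on the table", "tomato is whole"],
    goal_state=["tomato is sliced", "tomato is in the bowl"],
    objects=["tomato", "bowl", "table"],
)
print(problem.summary())
```

Every entry states what is true of an object before or after the task, with no robot commands involved, which is what makes this kind of output readable without knowledge of the robot device.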

［入出力データ］ [Input/output data]
本実施形態では、観測データ２０及び指示情報２１が入力データとして推論モジュール３に与えられ、推論モジュール３の演算処理を実行した結果、問題記述５が出力データとして得られる。 In this embodiment, the observation data 20 and the instruction information 21 are given to the inference module 3 as input data, and the problem description 5 is obtained as output data as a result of executing the computation of the inference module 3.

観測データ２０は、タスクを遂行する前における物体の初期状態を表すものであれば何でもよい。観測データ２０の種類は、特に限られなくてもよく、実施の形態に応じて適宜選択されてよい。観測データ２０は、１つ以上のモダリティのデータにより構成されてよい。環境は、任意の方法で観測されてよい。環境は、ロボット装置Ｒがタスクを遂行する状況に関するあらゆる事象を含んでよい。環境は、実環境及び仮想環境の少なくともいずれにより構成されてよい。例えば、ＡＲ（Augmented Reality）、ＭＲ（Mixed Reality）等のように、環境は、実環境及び仮想環境の両方により構成されてもよい。環境は、ＶＲ（Virtual Reality）等を含んでもよい。環境に存在する物体の数は任意でよい。観測データ２０は、タスクの遂行を開始する前の任意のタイミングに取得されてよい。複数の作業が与えられる場合に、初期状態は、複数の作業を遂行する前の状態であってもよいし、複数の作業に含まれる一作業を遂行する前の状態（複数の作業を遂行する間の中間の状態を含む）であってもよい。すなわち、「初期」は、必ずしも「最初」を示していなくてもよい。タスクは、初期状態から目標状態に遷移するための任意の作業（仕事）であってよい。このタスクを遂行する前の状態が初期状態であってよく、タスクを適正に遂行した後の状態が目標状態であってよい。一タスクの間隔は、実施の形態に応じて適宜決定されてよい。観測データ２０には物体の初期状態が表れるため、推論モジュール３は、問題記述５における初期状態の記述５１を観測データ２０から推論することができる。 The observation data 20 may be any data that represents the initial state of the objects before the task is performed. The type of the observation data 20 is not particularly limited and may be selected as appropriate for the embodiment. The observation data 20 may consist of data of one or more modalities. The environment may be observed by any method. The environment may include any phenomenon related to the situation in which the robot device R performs the task. The environment may consist of at least one of a real environment and a virtual environment. For example, as in AR (Augmented Reality) or MR (Mixed Reality), the environment may consist of both a real environment and a virtual environment. The environment may also include VR (Virtual Reality) or the like. Any number of objects may be present in the environment. The observation data 20 may be acquired at any time before performance of the task starts. When multiple pieces of work are given, the initial state may be the state before all of them are performed, or the state before one of them is performed (including an intermediate state partway through the multiple pieces of work). That is, "initial" does not necessarily mean "first." A task may be any piece of work (job) for transitioning from an initial state to a goal state. The state before the task is performed may be the initial state, and the state after the task has been properly performed may be the goal state. The span of a single task may be determined as appropriate for the embodiment. Because the initial state of the objects appears in the observation data 20, the inference module 3 can infer the description 51 of the initial state in the problem description 5 from the observation data 20.

指示情報２１は、タスクの目標を特定可能であれば何でもよい。指示情報２１の種類は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。指示情報２１は、１つ以上のモダリティのデータにより構成されてよい。目標は、適宜与えられてよい。指示情報２１は、タスクの遂行を開始する前の任意のタイミングに取得されてよい。指示情報２１は、観測データ２０よりも前に取得されてもよいし、後に取得されてもよい。また、指示情報２１は、観測データ２０と少なくとも部分的に並列に取得されてもよい。指示情報２１にはタスクの目標が表れるため、推論モジュール３は、問題記述５における目標状態の記述５２を指示情報２１から推論することができる。 The instruction information 21 may be any information that can identify the goal of the task. The type of instruction information 21 is not particularly limited and may be selected appropriately depending on the embodiment. The instruction information 21 may be composed of data of one or more modalities. The goal may be given as appropriate. The instruction information 21 may be acquired at any timing before the start of performance of the task. The instruction information 21 may be acquired before or after the observation data 20. The instruction information 21 may also be acquired at least partially in parallel with the observation data 20. Because the goal of the task is expressed in the instruction information 21, the inference module 3 can infer the description 52 of the goal state in the problem description 5 from the instruction information 21.

問題記述５は、初期状態及び目標状態の各記述（５１、５２）を含むことで、ロボット装置Ｒのスキルが与えられると、プランナが、そのロボット装置Ｒの実行可能性（実行可能か否か）を含め、ロボット装置Ｒに対する行動計画を生成可能にするものである。そのようなものである限り、問題記述５の構成は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。問題記述５は、ロボット装置に依存しないため、任意のロボット装置の行動計画の生成に使用されてよい。プランナは、問題記述５から行動計画を生成するように適宜構成されてよい。 By including the descriptions (51, 52) of the initial state and the goal state, the problem description 5 enables a planner, once given the skills of the robot device R, to generate an action plan for the robot device R, including a determination of whether the plan is executable by the robot device R. As long as this holds, the structure of the problem description 5 is not particularly limited and may be selected as appropriate for the embodiment. Because the problem description 5 does not depend on a particular robot device, it may be used to generate an action plan for any robot device. The planner may be configured as appropriate to generate an action plan from the problem description 5.

図１に示されるとおり、一例では、ロボット装置Ｒのスキルはドメイン記述２３により与えられてよい。また、一例では、プランナは、シンボリックプランナ６１及びモーションプランナ６５を備えてよい。シンボリックプランナ６１は、抽象的な行動の系列である動作系列６３を問題記述５及びドメイン記述２３から生成するように構成されてよい。抽象的な行動は、ロボット装置Ｒの１つ以上の動作を含む任意の動作の集まりであって、希望（例えば、言葉等）で表現可能な動作の集まりで定義されてよい。抽象的な行動は、例えば、物体を掴む、運ぶ、位置決めする等の意味のある（すなわち、人間が理解可能な）動作の集まりで定義されてよい。シンボリックプランナ６１には、例えば、参考文献１（“The Fast Downward Planning System”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://planning.wiki/ref/planners/fd＞）のFast Downward、参考文献２（Silvia Richter et al., “The LAMA planner: guiding cost-based anytime planning with landmarks”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://dl.acm.org/doi/10.5555/1946417.1946420＞）のLAMA planner等の公知のシンボリックプランナが用いられてよい。 As shown in FIG. 1, in one example, the skills of the robot device R may be given by a domain description 23. Also, in one example, the planner may include a symbolic planner 61 and a motion planner 65. The symbolic planner 61 may be configured to generate an action sequence 63, which is a sequence of abstract actions, from the problem description 5 and the domain description 23. An abstract action may be defined as any collection of motions that includes one or more motions of the robot device R and that can be expressed as desired (e.g., in words). For example, an abstract action may be defined as a meaningful (i.e., human-understandable) collection of motions, such as grasping, carrying, or positioning an object. A known symbolic planner may be used as the symbolic planner 61, such as Fast Downward of Reference 1 ("The Fast Downward Planning System", [online], [searched October 24, 2023], Internet <URL: https://planning.wiki/ref/planners/fd>) or the LAMA planner of Reference 2 (Silvia Richter et al., "The LAMA planner: guiding cost-based anytime planning with landmarks", [online], [searched October 24, 2023], Internet <URL: https://dl.acm.org/doi/10.5555/1946417.1946420>).
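To illustrate what a symbolic planner does with a problem description and a domain description, the following is a minimal sketch using brute-force breadth-first search over sets of facts. It is a stand-in written for this illustration, not Fast Downward or LAMA, and the two ground actions and fact strings are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
# Domain: action name -> (preconditions, add effects, delete effects).
Domain = Dict[str, Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]]


def bfs_plan(init: State, goal: State, domain: Domain) -> Optional[List[str]]:
    """Return a shortest sequence of action names reaching the goal, or None."""
    queue = deque([(init, [])])
    seen = {init}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return plan
        for name, (pre, add, delete) in domain.items():
            if pre <= state:  # action is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [name]))
    return None


# Hypothetical two-block domain with ground actions only.
domain: Domain = {
    "pickup a": (frozenset({"ontable a", "clear a", "handempty"}),
                 frozenset({"holding a"}),
                 frozenset({"ontable a", "clear a", "handempty"})),
    "stack a b": (frozenset({"holding a", "clear b"}),
                  frozenset({"on a b", "clear a", "handempty"}),
                  frozenset({"holding a", "clear b"})),
}
init = frozenset({"ontable a", "ontable b", "clear a", "clear b", "handempty"})
goal = frozenset({"on a b"})
print(bfs_plan(init, goal, domain))
```

The initial and goal facts play the role of the descriptions (51, 52), and the action table plays the role of the domain description 23. Practical planners use heuristics rather than exhaustive search.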

モーションプランナ６５は、シンボリックプランナ６１により動作系列６３が与えられると、動作系列６３に含まれる各行動をロボット装置Ｒに実行させるための制御指令６７の系列を生成するように構成されてよい。また、モーションプランナ６５は、動作系列６３に含まれる各行動をロボット装置Ｒが実行可能か否かを判定するように構成されてよい。モーションプランナ６５には、例えば、参考文献３（James J. Kuffner, Jr., Steven M. LaValle, “RRT-Connect: An Efficient Approach to Single-Query Path Planning”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://www.cs.cmu.edu/afs/cs/academic/class/15494-s12/readings/kuffner_icra2000.pdf＞）のRRT-connect、参考文献４（Nathan Ratliff et al., “CHOMP: Gradient Optimization Techniques for Efficient Motion Planning”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://www.ri.cmu.edu/pub_files/2009/5/icra09-chomp.pdf＞）のCHOMP（Covariant Hamiltonian Optimization for Motion Planning）、参考文献５（Mrinal Kalakrishnan et al., “STOMP: Stochastic Trajectory Optimization for Motion Planning”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：http://ros.fei.edu.br/roswiki/attachments/Papers(2f)ICRA2011_Kalakrishnan/kalakrishnan_icra2011.pdf＞）のSTOMP等の公知のモーションプランナが用いられてよい。生成された制御指令６７は、ロボット装置Ｒに適宜与えられてよい。ロボット装置Ｒは、与えられた制御指令６７に従って、任意のタイミングでタスクの遂行を開始してよい。なお、プランナの構成は、図１の例に限られなくてよく、実施の形態に応じて適宜変更されてよい。他の一例では、プランナは、問題記述５及びドメイン記述２３から直接的に制御指令６７を推論することで、行動計画を生成するように構成されてもよい。 When given the action sequence 63 by the symbolic planner 61, the motion planner 65 may be configured to generate a sequence of control commands 67 for causing the robot device R to execute each action included in the action sequence 63. The motion planner 65 may also be configured to determine whether the robot device R can execute each action included in the action sequence 63. A known motion planner may be used as the motion planner 65, for example RRT-Connect of Reference 3 (James J. Kuffner, Jr., Steven M. LaValle, “RRT-Connect: An Efficient Approach to Single-Query Path Planning”, [online], [searched October 24, 2023], Internet <URL: https://www.cs.cmu.edu/afs/cs/academic/class/15494-s12/readings/kuffner_icra2000.pdf>), CHOMP (Covariant Hamiltonian Optimization for Motion Planning) of Reference 4 (Nathan Ratliff et al., “CHOMP: Gradient Optimization Techniques for Efficient Motion Planning”, [online], [searched October 24, 2023], Internet <URL: https://www.ri.cmu.edu/pub_files/2009/5/icra09-chomp.pdf>), or STOMP of Reference 5 (Mrinal Kalakrishnan et al., “STOMP: Stochastic Trajectory Optimization for Motion Planning”, [online], [searched October 24, 2023], Internet <URL: http://ros.fei.edu.br/roswiki/attachments/Papers(2f)ICRA2011_Kalakrishnan/kalakrishnan_icra2011.pdf>). The generated control commands 67 may be given to the robot device R as appropriate. The robot device R may start performing the task at any time in accordance with the given control commands 67. Note that the configuration of the planner is not limited to the example of FIG. 1 and may be changed as appropriate for the embodiment. In another example, the planner may be configured to generate the action plan by inferring the control commands 67 directly from the problem description 5 and the domain description 23.
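The two roles of this stage (judging whether each abstract action is executable, and expanding it into control commands) can be sketched at the interface level as follows. A fixed lookup table stands in for a real motion planner such as RRT-Connect, CHOMP, or STOMP, which would compute trajectories; the skill table, command names, and function name are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical skill table: abstract action verb -> low-level command names.
SKILLS: Dict[str, List[str]] = {
    "pickup": ["move_above", "open_gripper", "descend", "close_gripper", "ascend"],
    "stack": ["move_above", "descend", "open_gripper", "ascend"],
}


def to_control_commands(action_sequence: List[str]) -> Optional[List[str]]:
    """Expand each abstract action; return None if any action is not executable."""
    commands: List[str] = []
    for action in action_sequence:
        verb, *args = action.split()
        if verb not in SKILLS:
            return None  # feasibility check failed for this action
        commands.extend(f"{step}({', '.join(args)})" for step in SKILLS[verb])
    return commands


print(to_control_commands(["pickup a", "stack a b"]))
```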

初期状態及び目標状態の各記述（５１、５２）を含み、行動計画の生成に利用可能であれば、問題記述５の形式は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。一例では、生成される問題記述５は、所定のフォーマットに従うように構成されてよい。所定のフォーマットは、例えば、ＰＤＤＬ（Planning Domain Definition Language）、ＰＤＤＬＳｔｒｅａｍ（参考文献６：“pddlstream”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://github.com/caelan/pddlstream＞）等のプランナのための言語（プランニング記述言語）により与えられてよい。本実施形態の一例では、生成される問題記述５が所定のフォーマットに従っていることで、プランナ（例えば、シンボリックプランナ６１）が問題記述５から行動計画を生成しやすくすることができる。 As long as the problem description 5 includes descriptions of the initial state and the goal state (51, 52) and can be used to generate an action plan, the format of the problem description 5 is not particularly limited and may be selected appropriately depending on the embodiment. In one example, the generated problem description 5 may be configured to follow a predetermined format. The predetermined format may be provided by a language for planners (planning description languages) such as PDDL (Planning Domain Definition Language) or PDDLStream (Reference 6: "pddlstream", [online], [searched October 24, 2023], Internet <URL: https://github.com/caelan/pddlstream>). In one example of this embodiment, the generated problem description 5 follows a predetermined format, making it easier for a planner (e.g., symbolic planner 61) to generate an action plan from the problem description 5.

他の一例では、生成される問題記述５は、所定のフォーマットに従っていなくてもよい。この場合、生成された問題記述５は、変換プログラム等の中間処理により、所定のフォーマットに従うように変換されてよい。或いは、コンテキスト内学習の訓練済みモデルにより構成されるプランナ等、任意の形式の入力を受け付け可能なプランナにより、生成された問題記述５は、所定のフォーマットに従っていないまま、行動計画の生成に利用されてもよい。 In another example, the generated problem description 5 need not conform to a predetermined format. In this case, the generated problem description 5 may be converted to conform to the predetermined format by intermediate processing such as a conversion program. Alternatively, the generated problem description 5 may be used to generate an action plan as is, without conforming to the predetermined format, by a planner that can accept input in any format, such as a planner composed of a trained model capable of in-context learning.

なお、推論モジュール３の入出力データは、実施の形態に応じて適宜変更されてよい。推論モジュール３は、観測データ２０及び指示情報２１以外の任意のデータの入力を更に受け付けてもよい。また、推論モジュール３は、問題記述５以外の任意のデータを更に出力してもよい。問題記述５は、初期状態及び目標状態の各記述（５１、５２）以外の任意の情報を更に含んでもよい。 The input and output data of the inference module 3 may be modified as appropriate depending on the embodiment. The inference module 3 may further accept input of any data other than the observation data 20 and the instruction information 21. The inference module 3 may also output any data other than the problem description 5. The problem description 5 may further include any information other than the descriptions of the initial state and the goal state (51, 52).

（入出力データの一例）
図２は、本実施形態に係る推論モジュール３の入出力の一例を模式的に示す。図２の例では、入力データは、観測データ２０、指示情報２１及び環境情報２２を含んでいる。問題記述５（出力データ）は、物体、初期状態及び目標状態の各記述（５０、５１、５２）を含んでいる。 (Example of input/output data)
FIG. 2 schematically illustrates an example of the input and output of the inference module 3 according to this embodiment. In the example of FIG. 2, the input data includes observation data 20, instruction information 21, and environment information 22. The problem description 5 (output data) includes descriptions (50, 51, 52) of the objects, the initial state, and the goal state.

一例では、環境は、１つ以上のセンサＳにより観測されてよく、これに応じて、観測データ２０は、１つ以上のセンサＳのセンシングデータにより構成されてよい。センサＳの種類は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。センサＳは、例えば、カメラ、深度センサ、赤外線センサ、光学センサ、レーダ、LiDAR（Light Detection And Ranging）、マイクロフォン、位置センサ、その他の測定センサ等を含んでよい。位置センサは、例えば、ＧＰＳ（Global Positioning System）センサ、ＧＮＳＳ（Global Navigation Satellite System）センサ等であってよい。センシングデータは、例えば、画像データ、深度データ、赤外線データ、光学センサの測定データ（マーカの検出結果等）、レーダデータ、LiDARデータ、音データ、位置データ、その他測定データ等を含んでよい。また、センサＳは、ロボット装置Ｒの状態を測定するための１つ以上の測定センサを含んでよい。測定センサは、例えば、エンコーダ、モーションキャプチャ、触覚センサ、力覚センサ、慣性計測ユニット（Inertial Measurement Unit）等を含んでよい。これに応じて、センシングデータは、ロボット装置Ｒの測定データ（例えば、関節角度、手先位置、手先における触覚データ、手先における力覚データ、姿勢の計測データ等）を含んでよい。センサＳは、ロボット装置Ｒの外部に配置されてもよいし、ロボット装置Ｒに配置されてもよい。 In one example, the environment may be observed by one or more sensors S, and the observation data 20 may accordingly be composed of sensing data from one or more sensors S. The type of sensor S is not particularly limited and may be selected appropriately depending on the embodiment. The sensor S may include, for example, a camera, depth sensor, infrared sensor, optical sensor, radar, LiDAR (Light Detection and Ranging), microphone, position sensor, other measurement sensor, etc. The position sensor may be, for example, a GPS (Global Positioning System) sensor, a GNSS (Global Navigation Satellite System) sensor, etc. The sensing data may include, for example, image data, depth data, infrared data, measurement data from an optical sensor (such as marker detection results), radar data, LiDAR data, sound data, position data, other measurement data, etc. The sensor S may also include one or more measurement sensors for measuring the state of the robot device R. The measurement sensor may include, for example, an encoder, motion capture, a tactile sensor, a force sensor, an inertial measurement unit, etc. Accordingly, the sensing data may include measurement data of the robot device R (e.g., joint angles, hand positions, tactile data at the hand, force data at the hand, posture measurement data, etc.). 
The sensor S may be located outside the robot device R or within the robot device R.

また、一例では、指示情報２１は、目標を自然言語で指示する言語情報により構成されてよい。自然言語の指示は、任意の方法で獲得されてよい。自然言語による指示は、例えば、テキスト入力、音声入力、画像入力、その他入力等の方法により獲得されてよい。指示情報２１（言語情報）のデータ形式は、実施の形態に応じて適宜選択されてよい。指示情報２１は、例えば、テキストデータ、音声データ、画像データ、その他形式のデータ等により構成されてよい。獲得された自然言語の指示データはそのまま指示情報２１として用いられてもよいし、データ形式を変換された上で、指示情報２１として用いられてよい。後者の一例として、例えば、音声からテキスト、テキストから音声等のように、指示データは変換モデルにより変換されてよく、変換された指示データが指示情報２１として用いられてよい。変換は、音声解析等の任意の解析処理を含んでよい。変換モデルは、推論モジュール３に含まれてもよいし、推論モジュール３とは別個に用意されてもよい。変換モデルは、訓練済みモデル（訓練された機械学習モデル）及びルールベースモデルの少なくともいずれかにより構成されてよい。自然言語の指示は、オペレータ等により人手で与えられてもよいし、コンピュータ処理により自動的に与えられてもよい。指示情報２１は、タスクに応じて適宜与えられてよい。一例では、指示情報２１は、タスクが与えられる毎等のように、その都度与えられてもよい。他の一例では、例えば、ハノイの塔等のように、目標が予め特定されている場合、指示情報２１は、所与であってもよい。本実施形態の一例では、環境をセンサＳで観測し、言語指示でタスクの目標を与える場面で、ロボット装置Ｒの制御に関して、説明性の高い出力を得ることができる。 In one example, the instruction information 21 may be composed of linguistic information that indicates a goal in natural language. The natural language instruction may be acquired by any method, for example, by text input, voice input, image input, or another input method. The data format of the instruction information 21 (linguistic information) may be selected appropriately depending on the embodiment. The instruction information 21 may be composed of, for example, text data, voice data, image data, or data in another format. The acquired natural language instruction data may be used as the instruction information 21 as is, or may be used as the instruction information 21 after its data format has been converted. As an example of the latter, the instruction data may be converted by a conversion model, such as from voice to text or from text to voice, and the converted instruction data may be used as the instruction information 21. The conversion may include any analysis process, such as voice analysis. The conversion model may be included in the inference module 3 or may be prepared separately from the inference module 3. The conversion model may be composed of at least one of a trained model (a trained machine learning model) and a rule-based model. The natural language instruction may be given manually by an operator or the like, or automatically by computer processing. The instruction information 21 may be given as appropriate depending on the task. In one example, the instruction information 21 may be given on each occasion, such as each time a task is given. In another example, when the goal is specified in advance, as in the Tower of Hanoi, the instruction information 21 may be given in advance. In one example of this embodiment, in a situation where the environment is observed by a sensor S and the goal of a task is given by a linguistic instruction, a highly explainable output can be obtained regarding the control of the robot device R.

なお、観測データ２０及び指示情報２１の構成は、実施の形態に応じて適宜変更されてよい。観測データ２０は、センシングデータと共に、例えば、人手により与えられたデータ、コンピュータ処理により生成されたデータ等、センシングデータ以外の任意のデータを含んでよい。他の一例では、観測データ２０は、センシングデータを含まず、センシングデータ以外の任意のデータにより構成されてもよい。また、他の一例では、指示情報２１は、自然言語以外の形式で与えられてもよい。例えば、指示情報２１は、自然言語以外の記号によるテキスト形式で与えられてよい。また、例えば、指示情報２１は、環境に存在する物体の目標状態を示す画像により構成されてよい。画像は、実画像及び仮想画像の少なくともいずれかにより構成されてよい。 The configuration of the observation data 20 and the instruction information 21 may be modified as appropriate depending on the embodiment. The observation data 20 may include, in addition to sensing data, any data other than sensing data, such as manually provided data or data generated by computer processing. In another example, the observation data 20 may not include sensing data, and may instead be composed of any data other than sensing data. In another example, the instruction information 21 may be provided in a format other than natural language. For example, the instruction information 21 may be provided in a text format using symbols other than natural language. For example, the instruction information 21 may be composed of an image showing a target state of an object present in the environment. The image may be composed of at least one of a real image and a virtual image.

また、一例では、情報処理装置１は、環境に関する環境情報２２を更に取得するように構成されてよい。取得された観測データ２０及び指示情報２１から問題記述５を生成することは、取得された観測データ２０、指示情報２１及び環境情報２２から問題記述５を生成することにより構成されてよい。本実施形態の一例によれば、環境情報２２により、タスクを遂行する環境に制約を与えることで、問題記述５を生成する条件を絞ることができる。すなわち、観測データ２０及び指示情報２１に加えて、環境情報２２を説明変数として更に用いることで、タスクを遂行する環境がより特定可能となる。そのため、問題記述５を生成する精度の向上を期待することができる。 In one example, the information processing device 1 may be configured to further acquire environmental information 22 related to the environment. Generating the problem description 5 from the acquired observation data 20 and instruction information 21 may be configured by generating the problem description 5 from the acquired observation data 20, instruction information 21, and environmental information 22. According to one example of this embodiment, the environmental information 22 can be used to impose constraints on the environment in which the task is performed, thereby narrowing down the conditions for generating the problem description 5. In other words, by further using the environmental information 22 as an explanatory variable in addition to the observation data 20 and instruction information 21, it becomes possible to more accurately identify the environment in which the task is performed. Therefore, an improvement in the accuracy of generating the problem description 5 can be expected.

なお、環境情報２２は、問題記述５の生成に関与し得る、ロボット装置Ｒの関係に関するあらゆる情報を含んでよい。一例では、環境情報２２は、ドメイン記述２３及びドメイン情報２４の少なくともいずれかを含んでよい。 Note that the environment information 22 may include any information related to the relationships of the robot device R that may be involved in generating the problem description 5. In one example, the environment information 22 may include at least one of the domain description 23 and domain information 24.

ドメイン記述２３は、ロボット装置Ｒのスキルを含む、全ての問題に共通の事象を定義する。上記のとおり、ドメイン記述２３は、問題記述５と共に、プランナによる行動計画の生成に用いられてよい。問題記述５が、ＰＤＤＬ、ＰＤＤＬＳｔｒｅａｍ等の所定のフォーマットに従って与えられる場合、ドメイン記述２３も、所定のフォーマットに従って与えられてよい。例えば、ＰＤＤＬを採用する場合、問題記述５は、problem.pddlであってよく、ドメイン記述２３は、domain.pddlであってよい。また、一例では、ドメイン記述２３は、ロボット装置Ｒのスキルを定義する記述（例えば、actions）、環境に存在する対象物体の状態を定義する記述（例えば、predicates）、対象物体の種類を定義する記述（例えば、types）、及びプランナとの互換性を確認するための記述（例えば、requirements）を含んでよい。対象物体の状態は、定義されたスキル（行動）をロボット装置Ｒが実行したときに対象物体が取り得る状態を含んでよい。 The domain description 23 defines phenomena common to all problems, including the skills of the robot device R. As described above, the domain description 23, together with the problem description 5, may be used by the planner to generate an action plan. If the problem description 5 is provided in a predetermined format such as PDDL or PDDLStream, the domain description 23 may also be provided in a predetermined format. For example, if PDDL is used, the problem description 5 may be "problem.pddl" and the domain description 23 may be "domain.pddl." In one example, the domain description 23 may include descriptions that define the skills of the robot device R (e.g., "actions"), descriptions that define the states of target objects present in the environment (e.g., "predicates"), descriptions that define the types of target objects (e.g., "types"), and descriptions that confirm compatibility with the planner (e.g., "requirements"). The states of the target object may include states that the target object can take when the robot device R executes the defined skills (behavior).

ドメイン情報２４は、問題記述５を生成するドメインを限定する任意の情報を含んでよい。ドメイン情報２４は、ドメイン知識とも称してよい。ドメイン情報２４は、ドメイン記述２３以外の環境に関する任意の情報により構成されてよい。ドメイン情報２４は、ドメイン記述２３と共に問題記述５を生成する条件を補足してよい。一例では、ドメイン情報２４は、環境に存在する物体の属性情報２４１を含んでよい。属性情報２４１は、例えば、物体の名称、特徴等を含んでよい。特徴は、例えば、色、形状、大きさ等の外観に関する特徴を含んでよい。外観に関する特徴は、例えば、まな板が丸い、カウンタが黒い等である。これにより、観測データ２０に表れる物体の特徴を絞ることができ、その結果、観測データ２０に対する推論の精度の向上を期待することができる。また、一例では、後述するように、推論モジュール３がコンテキスト内学習の訓練済みモデルを含むように構成される場合、ドメイン情報２４は、当該訓練済みモデルに対する１つ以上の入出力サンプル２４３を含んでよい。例えば、出力サンプルは、問題記述５のうち、訓練済みモデルにより生成される部分の正解サンプルであってよい。入力サンプルは、例えば、その問題記述５の正解サンプルを得るために与えられる観測データ２０、指示情報２１及び環境情報２２のうち、訓練済みモデルに入力される部分のサンプルであってよい。 The domain information 24 may include any information that limits the domain for which the problem description 5 is generated. The domain information 24 may also be referred to as domain knowledge. The domain information 24 may be composed of any information about the environment other than the domain description 23. The domain information 24 may supplement, together with the domain description 23, the conditions for generating the problem description 5. In one example, the domain information 24 may include attribute information 241 of objects present in the environment. The attribute information 241 may include, for example, the name and features of an object. The features may include, for example, features related to appearance, such as color, shape, and size. Examples of appearance features are that the cutting board is round and that the counter is black. This makes it possible to narrow down the features of objects appearing in the observation data 20, and as a result, an improvement in the accuracy of inference on the observation data 20 can be expected. In another example, as described below, when the inference module 3 is configured to include a trained model capable of in-context learning, the domain information 24 may include one or more input/output samples 243 for the trained model. For example, the output sample may be a ground-truth sample of the portion of the problem description 5 that is generated by the trained model. The input sample may be, for example, a sample of the portion, input to the trained model, of the observation data 20, instruction information 21, and environment information 22 given to obtain that ground-truth sample of the problem description 5.
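The use of attribute information 241 and input/output samples 243 as context for a trained model can be sketched as follows. This is a minimal illustration only: the `Sample` class, the `build_prompt` function, and the text layout of the prompt are hypothetical and are not taken from the embodiment, which does not fix a prompt format.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical input/output sample (cf. input/output samples 243)."""
    observation: str  # text summary of the observation data
    instruction: str  # goal given in natural language
    problem: str      # ground-truth problem description for this input


def build_prompt(attributes, samples, observation, instruction):
    """Assemble a few-shot prompt: attribute information, samples, then the query."""
    lines = ["Known objects:"]
    for name, feature in attributes.items():
        lines.append(f"- {name}: {feature}")
    for s in samples:
        lines += [
            "",
            f"Observation: {s.observation}",
            f"Instruction: {s.instruction}",
            f"Problem: {s.problem}",
        ]
    # The query repeats the sample layout but leaves the answer open.
    lines += [
        "",
        f"Observation: {observation}",
        f"Instruction: {instruction}",
        "Problem:",
    ]
    return "\n".join(lines)
```

The resulting string would be passed to the trained model as its input; how the samples are selected and how many are included are left to the embodiment.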

入力データ（観測データ２０、指示情報２１及び環境情報２２）を推論モジュール３に与える形式は、特に限られなくてよく、実施の形態に応じて適宜決定されてよい。一例では、観測データ２０、指示情報２１及び環境情報２２はそのまま推論モジュール３に入力されてよい。他の一例では、観測データ２０、指示情報２１及び環境情報２２の少なくともいずれかのデータには前処理が適用されてよく、前処理後のデータが推論モジュール３に入力されてよい。前処理は、情報を解析する処理、情報を付加する処理、情報を削減する処理等の任意の演算処理を含んでよい。前処理を実行する演算モデルは、推論モジュール３に含まれてもよいし、推論モジュール３とは別個に用意されてもよい。演算モデルは、訓練済みモデル（訓練された機械学習モデル）及びルールベースモデルの少なくともいずれかにより構成されてよい。例えば、観測データ２０が画像データにより構成されるケースにおいて、推論モジュール３は、画像データの入力を受け付けるように構成されてもよい。或いは、画像データは、画像処理、解析モデル等の任意の方法で解析されてよい。この解析処理は、前処理の一例である。解析結果は、例えば、バウンディングボックスの検出結果、画像データに写る物体の識別結果等を含んでよい。推論モジュール３は、画像データに対する解析結果の入力を受け付けるように構成されてもよい。 The format in which the input data (observation data 20, instruction information 21, and environmental information 22) are provided to the inference module 3 is not particularly limited and may be determined appropriately depending on the embodiment. In one example, the observation data 20, instruction information 21, and environmental information 22 may be input to the inference module 3 as is. In another example, preprocessing may be applied to at least one of the observation data 20, instruction information 21, and environmental information 22, and the preprocessed data may be input to the inference module 3. The preprocessing may include any computational process, such as a process for analyzing information, a process for adding information, or a process for reducing information. The computational model that performs the preprocessing may be included in the inference module 3 or may be prepared separately from the inference module 3. The computational model may be composed of at least one of a trained model (a trained machine learning model) and a rule-based model. For example, in a case in which the observation data 20 is composed of image data, the inference module 3 may be configured to accept input of the image data. Alternatively, the image data may be analyzed using any method, such as image processing or an analytical model. 
This analysis process is an example of preprocessing. The analysis results may include, for example, bounding box detection results, identification results of objects appearing in the image data, etc. The inference module 3 may be configured to accept input of analysis results for the image data.
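As one possible illustration of such preprocessing, the sketch below converts object detection results into text lines that a language model could accept. The dictionary keys (`label`, `box`, `score`), the threshold `min_score`, and the function name are assumptions made for this example; the embodiment does not specify the form of the analysis results.

```python
def detections_to_text(detections, min_score=0.5):
    """Turn detector output into one line per object, dropping weak detections."""
    lines = []
    for det in detections:
        if det["score"] < min_score:
            continue  # information-reducing step: discard low-confidence results
        x0, y0, x1, y1 = det["box"]
        lines.append(f"{det['label']} at ({x0}, {y0})-({x1}, {y1})")
    return "\n".join(lines)
```

Filtering by score is an example of a process that reduces information, and attaching the label to each box is an example of a process that adds information.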

また、一例では、生成される問題記述５は、環境に存在する物体の記述５０を更に含んでもよい。物体の記述５０は、環境に存在する物体のリストに対応し得る。記述５０に含まれる物体の範囲は、実施の形態に応じて適宜決定されてよい。例えば、物体の記述５０は、対象の環境で観測され得る全ての物体のリストにより構成されてよい。また、例えば、物体の記述５０は、観測され得る一部の物体のリストにより構成されてよい。一部の物体は、タスクに関与し得る物体等の関心ある物体であってよい。この場合、物体の記述５０において、関心ない物体（例えば、タスクに関与し得ない物体）は省略されてよい。本実施形態の一例によれば、物体の記述５０により、環境に存在する物体が特定しやすくなることで、プランナ（例えば、シンボリックプランナ６１）が問題記述５から行動計画を生成しやすくすることができる。なお、初期状態及び目標状態の各記述（５１、５２）が所定のフォーマットに従うように得られる場合、物体の記述５０も、所定のフォーマットに従うように得られてよい。例えば、ＰＤＤＬを採用する場合、problem.pddlにおけるobjectsの記述が物体の記述５０の一例であり、init（initial state）の記述が初期状態の記述５１の一例であり、goalの記述が目標状態の記述５２の一例である。 In one example, the generated problem description 5 may further include a description 50 of objects present in the environment. The object description 50 may correspond to a list of objects present in the environment. The range of objects included in the description 50 may be determined appropriately depending on the embodiment. For example, the object description 50 may consist of a list of all objects that can be observed in the target environment. Alternatively, for example, the object description 50 may consist of a list of a portion of objects that can be observed. These portions of objects may be objects of interest, such as objects that may be involved in the task. In this case, objects of no interest (e.g., objects that may not be involved in the task) may be omitted from the object description 50. According to one example of this embodiment, the object description 50 makes it easier to identify objects present in the environment, thereby making it easier for a planner (e.g., a symbolic planner 61) to generate an action plan from the problem description 5. Note that if the descriptions of the initial state and the goal state (51, 52) are obtained in a predetermined format, the object description 50 may also be obtained in a predetermined format. 
For example, when PDDL is adopted, the objects section of problem.pddl is an example of the object description 50, the init (initial state) section is an example of the initial state description 51, and the goal section is an example of the goal state description 52.
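The correspondence between the three descriptions (50, 51, 52) and the sections of a PDDL problem file can be sketched as follows. The function `make_problem` and the object, type, and predicate names used with it are hypothetical; only the section keywords (`:domain`, `:objects`, `:init`, `:goal`) come from PDDL itself.

```python
def make_problem(name, domain, objects, init, goal):
    """Render a PDDL problem from objects, initial facts, and goal facts.

    objects: dict mapping object name -> type name (object description 50)
    init:    list of ground atoms written in PDDL (initial state description 51)
    goal:    list of ground atoms written in PDDL (goal state description 52)
    """
    obj_text = " ".join(f"{o} - {t}" for o, t in objects.items())
    init_text = " ".join(init)
    goal_text = " ".join(goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {obj_text})\n"
        f"  (:init {init_text})\n"
        f"  (:goal (and {goal_text})))"
    )
```

For instance, `make_problem("p1", "kitchen", {"tomato": "food"}, ["(on tomato counter)"], ["(on tomato board)"])` yields a problem whose goal section reads `(:goal (and (on tomato board)))`.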

なお、推論モジュール３の入出力の形態は、図２の例に限られなくてよく、実施の形態に応じて適宜変更されてよい。環境情報２２の少なくとも一部は省略されてよい。問題記述５において、物体の記述５０は省略されてもよい。一例では、ロボット装置Ｒ以外の他の物体が存在しない環境でロボット装置Ｒが駆動する場合（例えば、ドローンが何も存在しない空中を飛行する等）、問題記述５において、物体の記述５０は省略されてよい。 Note that the input/output form of the inference module 3 is not limited to the example shown in FIG. 2 and may be modified as appropriate depending on the embodiment. At least a portion of the environment information 22 may be omitted. The object description 50 may be omitted from the problem description 5. In one example, when the robot device R operates in an environment where no objects other than the robot device R exist (for example, a drone flying through empty air), the object description 50 may be omitted from the problem description 5.

［問題記述の修正］
一例では、情報処理装置１は、生成された問題記述５の修正に関する処理を実行するように構成されてよい。問題記述５を修正する処理は、コンピュータにより自動的に実行されてもよいし、オペレータ等の人手により手動的に実行されてもよい。 [Modification of the problem description]
In one example, the information processing device 1 may be configured to execute a process related to modifying the generated problem description 5. The process of modifying the problem description 5 may be executed automatically by a computer or manually by an operator or the like.

一例として、情報処理装置１は、生成された問題記述５がプランナ（例えば、シンボリックプランナ６１）に適合しているか否かを判定してよい。判定は、訓練済みモデル及びルールベースモデルの少なくともいずれかを用いる等の任意の方法により行われてよい。例えば、問題記述５が所定のフォーマットに従うように生成される場合、情報処理装置１は、生成された問題記述５が所定のフォーマットに従っているか否かを評価することで、プランナに適合しているか否かを判定してよい。プランナに適合していると判定された場合、生成された問題記述５は、プランナに適宜与えられてよい。一方で、生成された問題記述５がプランナに適合していないと判定された場合、情報処理装置１は、生成された問題記述５を適宜修正してよい。例えば、情報処理装置１は、修正モデルを用いて、生成された問題記述５を修正してもよい。修正モデルは、訓練済みモデル及びルールベースモデルの少なくともいずれかにより構成されてよい。また、例えば、情報処理装置１は、プランナに適合しない要因（例えば、所定のフォーマットに従っていない箇所を示す）と共に問題記述５を出力装置に出力してもよい。情報処理装置１は、オペレータによる入力装置を介した問題記述５に対する修正を受け付けてもよい。そして、情報処理装置１は、受け付けた内容に応じて、問題記述５を修正してもよい。他の一例として、情報処理装置１は、プランナに適合しているか否かを判定する処理を省略した上で、問題記述５を修正する上記処理を実行してもよい。 As an example, the information processing device 1 may determine whether the generated problem description 5 is compatible with the planner (e.g., the symbolic planner 61). This determination may be made by any method, such as using at least one of a trained model and a rule-based model. For example, if the problem description 5 is generated so as to conform to a predetermined format, the information processing device 1 may determine whether the generated problem description 5 is compatible with the planner by evaluating whether it conforms to the predetermined format. If the generated problem description 5 is determined to be compatible with the planner, it may be provided to the planner as appropriate. On the other hand, if the generated problem description 5 is determined to be incompatible with the planner, the information processing device 1 may modify the generated problem description 5 as appropriate. For example, the information processing device 1 may modify the generated problem description 5 using a modification model. The modification model may be composed of at least one of a trained model and a rule-based model. Furthermore, for example, the information processing device 1 may output the problem description 5 to an output device together with the cause of the incompatibility with the planner (e.g., an indication of the parts that do not conform to the predetermined format). The information processing device 1 may accept modifications to the problem description 5 made by an operator via an input device. Then, the information processing device 1 may modify the problem description 5 in accordance with the accepted content. As another example, the information processing device 1 may execute the above-described process of modifying the problem description 5 while omitting the process of determining whether the problem description 5 is compatible with the planner.
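A rule-based determination of whether a problem description conforms to a predetermined format can be sketched as follows, assuming PDDL as the format. The function `check_problem_format` and the particular checks (balanced parentheses and the presence of three sections) are assumptions for illustration; an actual planner may impose further requirements that this sketch does not cover.

```python
REQUIRED_SECTIONS = ("(:domain", "(:init", "(:goal")


def check_problem_format(text):
    """Return a list of human-readable issues; an empty list means none were found."""
    issues = []
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                issues.append(f"unmatched ')' at character {i}")
                depth = 0  # keep scanning so later issues are still reported
    if depth > 0:
        issues.append(f"{depth} unclosed '('")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            issues.append(f"missing section {section[1:]}")
    return issues
```

The returned list can serve both purposes described above: an empty list corresponds to a determination that the format is followed, and a non-empty list indicates the non-conforming parts that may be shown to an operator or passed to a modification model.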

問題記述５を修正する処理を実行するタイミングは、特に限られなくてよく、実施の形態に応じて適宜決定されてよい。一例では、プランナ（例えば、シンボリックプランナ６１）による行動計画の生成に問題記述５を用いる前に、情報処理装置１は、問題記述５の修正に関する処理を実行してよい。他の一例では、生成された問題記述５がプランナに与えられた後、例えば、エラーメッセージを出力する、適正な行動計画が生成不能である等、プランナによる行動計画の試行にエラーが生じたことに応じて、情報処理装置１は、問題記述５を修正する処理を実行してよい。なお、問題記述５を修正する処理は、情報処理装置１以外の外部コンピュータにより実行されてもよい。また、問題記述５を修正する処理は、省略されてもよい。 The timing of executing the process to modify the problem description 5 is not particularly limited and may be determined appropriately depending on the embodiment. In one example, the information processing device 1 may execute the process to modify the problem description 5 before the problem description 5 is used to generate an action plan by a planner (e.g., symbolic planner 61). In another example, after the generated problem description 5 is provided to the planner, the information processing device 1 may execute the process to modify the problem description 5 in response to an error occurring in the planner's attempt to generate an action plan, such as outputting an error message or being unable to generate an appropriate action plan. Note that the process to modify the problem description 5 may be executed by an external computer other than the information processing device 1. Furthermore, the process to modify the problem description 5 may be omitted.

図３は、本実施形態に係る問題記述５の修正過程の一例を模式的に示す。一例では、推論モジュール３は、コンテキスト内学習の訓練済みモデル３９を含むように構成されてよい。コンテキスト内学習とは、例えば、入出力サンプル等の入力（プロンプト）の文脈を通じて特定の推論を行う能力を習得することである。一例では、自己注意機構及び自己回帰モデルを含むことで、訓練済みモデル３９は、コンテキスト内学習の能力を獲得することができる。この訓練済みモデル３９は、例えば、大規模言語モデル（LLM：Large Language Model）、大規模視覚言語モデル（LVLM：Large Vision-Language Model）等であってよい。大規模視覚言語モデルは、visual question answering model、open vocabulary object detection model、open vocabulary object segmentation model等を含んでよい。その他、訓練済みモデル３９は、例えば、Audio Question Answering等の１つ以上の他のモダリティ（音等）と組み合わされた大規模言語モデルであってもよい。更には、訓練済みモデル３９は、例えば、Large Audio Model等の言語以外の１つ以上の他のモダリティの大規模モデルであってもよい。訓練済みモデル３９に対する入力のデータ形式は、実施の形態に応じて適宜選択されてよい。 FIG. 3 schematically illustrates an example of the process of modifying the problem description 5 according to this embodiment. In one example, the inference module 3 may be configured to include a trained model 39 capable of in-context learning. In-context learning refers to acquiring the ability to perform a specific inference through the context of the input (prompt), such as input/output samples. In one example, the trained model 39 can acquire the ability for in-context learning by including a self-attention mechanism and an autoregressive model. The trained model 39 may be, for example, a large language model (LLM) or a large vision-language model (LVLM). The large vision-language model may include a visual question answering model, an open vocabulary object detection model, an open vocabulary object segmentation model, and the like. Alternatively, the trained model 39 may be a large language model combined with one or more other modalities (such as sound), for example, for Audio Question Answering. Furthermore, the trained model 39 may be a large model of one or more modalities other than language, such as a Large Audio Model. The data format of the input to the trained model 39 may be selected appropriately depending on the embodiment.

図３の例では、コンテキスト内学習の能力を有する訓練済みモデル３９を含んでいることで、推論モジュール３は、エラーが生じた場合に、生じたエラーに適応し、生じたエラーに応じて問題記述５を修正することができる。そこで、図３の一例では、情報処理装置１は、推論モジュール３に入力データ２００を与えて、推論モジュール３の演算処理を実行することで、問題記述５を生成してよい。入力データ２００は、行動計画を生成する対象のシーンにおける観測データ２０及び指示情報２１を含む。一例では、入力データ２００は、環境情報２２を更に含んでよい。 In the example of FIG. 3, because the inference module 3 includes the trained model 39 capable of in-context learning, the inference module 3 can, when an error occurs, adapt to the error and modify the problem description 5 in response to it. Therefore, in the example of FIG. 3, the information processing device 1 may generate the problem description 5 by providing input data 200 to the inference module 3 and executing the computation of the inference module 3. The input data 200 includes the observation data 20 and the instruction information 21 for the scene for which an action plan is to be generated. In one example, the input data 200 may further include the environment information 22.

生成された問題記述５は、プランナに適宜与えられてよい。プランナは、与えられた問題記述５を用いて、行動計画の生成を試行してよい。プランナが図１の構成を有する場合、生成された問題記述５は、シンボリックプランナ６１に与えられてよく、シンボリックプランナ６１は、問題記述５から動作系列６３の生成を試行してよい。このプランナにより行動計画を生成する処理は、情報処理装置１及び情報処理装置１以外の外部コンピュータの少なくともいずれかにより実行されてよい。なお、上記のとおり、行動計画の生成には、問題記述５と共に、ドメイン記述２３が用いられてよい。 The generated problem description 5 may be provided to a planner as appropriate. The planner may attempt to generate an action plan using the provided problem description 5. When the planner has the configuration of FIG. 1, the generated problem description 5 may be provided to a symbolic planner 61, which may attempt to generate an action sequence 63 from the problem description 5. The process of generating an action plan by this planner may be executed by at least one of the information processing device 1 and an external computer other than the information processing device 1. As described above, the domain description 23 may be used together with the problem description 5 to generate the action plan.

問題記述５が適正である場合、プランナは、問題記述５から適正な行動計画を生成することができる。図１の例では、適正な動作系列６３が得られ、これに応じて、モーションプランナ６５が制御指令６７の系列を生成することができる。生成された行動計画は、ロボット装置Ｒの動作制御に適宜用いられてよい。一方で、問題記述５が適正でない場合、プランナによる行動計画の生成にエラーが生じる。問題記述５が適正ではないことは、例えば、問題記述５がプランナに適合していないこと、禁止項目を避けて行動計画を生成することができないこと、与えられたロボット装置Ｒのスキルでは行動計画を生成することができないこと等を含んでよい。問題記述５がプランナに適合していないことは、例えば、上記問題記述５が所定のフォーマットに従っていないこと等を含んでよい。この場合、プランナからエラーメッセージ６１５が出力される。 If the problem description 5 is appropriate, the planner can generate an appropriate action plan from the problem description 5. In the example of FIG. 1, an appropriate action sequence 63 is obtained, and the motion planner 65 can generate a sequence of control commands 67 accordingly. The generated action plan may be used as appropriate to control the operation of the robot device R. On the other hand, if the problem description 5 is not appropriate, an error occurs in the planner's generation of the action plan. The problem description 5 not being appropriate may include, for example, the problem description 5 not being compatible with the planner, an action plan that avoids prohibited items not being generable, or an action plan not being generable with the given skills of the robot device R. The problem description 5 not being compatible with the planner may include, for example, the problem description 5 not conforming to the predetermined format described above. In such cases, an error message 615 is output from the planner.

生成された問題記述５をプランナに与えて、ロボット装置Ｒの行動計画を生成する処理において、プランナがエラーメッセージ６１５を出力した場合、情報処理装置１は、出力されたエラーメッセージ６１５を取得してよい。エラーメッセージ６１５の構成は、特に限られなくてよく、プランナの構成等の実施の形態に応じて適宜決定されてよい。そして、情報処理装置１は、取得されたエラーメッセージ６１５及び問題記述５を推論モジュール３に与えて、推論モジュール３の演算処理を再度実行してよい。すなわち、情報処理装置１は、問題記述５及びエラーメッセージ６１５を再プロンプトとして用いて、問題記述５を生成する処理を再度実行してよい。これにより、情報処理装置１は、推論モジュール３を用いて、問題記述５及びエラーメッセージ６１５から新たな問題記述５を生成してもよい（すなわち、問題記述５を修正してもよい）。本実施形態の一例によれば、適切な問題記述５が得られなかった場合に、問題記述５を自動的に修正することができる。 If the planner outputs an error message 615 in the process of providing the generated problem description 5 to the planner and generating an action plan for the robot device R, the information processing device 1 may acquire the output error message 615. The configuration of the error message 615 is not particularly limited and may be determined appropriately depending on the embodiment, such as the configuration of the planner. The information processing device 1 may then provide the acquired error message 615 and the problem description 5 to the inference module 3 and execute the computation of the inference module 3 again. In other words, the information processing device 1 may use the problem description 5 and the error message 615 as a re-prompt and execute the process of generating the problem description 5 again. In this way, the information processing device 1 may use the inference module 3 to generate a new problem description 5 from the problem description 5 and the error message 615 (i.e., modify the problem description 5). According to one example of this embodiment, if an appropriate problem description 5 is not obtained, the problem description 5 can be modified automatically.

なお、再プロンプトの際に推論モジュール３に与えるデータは、問題記述５及びエラーメッセージ６１５に限られなくてもよい。再プロンプトの構成は、実施の形態に応じて適宜変更されてよい。他の一例では、情報処理装置１は、問題記述５及びエラーメッセージ６１５と共に、入力データ２００の少なくとも一部を推論モジュール３に与えてもよい。これにより、問題記述５を修正する精度の向上を期待することができる。 Note that the data provided to the inference module 3 during a re-prompt does not have to be limited to the problem description 5 and the error message 615. The configuration of the re-prompt may be modified as appropriate depending on the embodiment. In another example, the information processing device 1 may provide at least a portion of the input data 200 to the inference module 3 along with the problem description 5 and the error message 615. This is expected to improve the accuracy of correcting the problem description 5.

また、情報処理装置１は、上記再プロンプトによる問題記述５を修正する処理を再帰的に繰り返し実行してもよい。繰り返す回数は、特に限られなくてよく、実施の形態に応じて適宜決定されてよい。一例では、情報処理装置１は、エラーメッセージ６１５が出力されなくなるまで、再プロンプトによる問題記述５を修正する処理を繰り返し実行してもよい。他の一例では、再プロンプトによる問題記述５を修正する処理を実行する回数が予め規定されてよい。情報処理装置１は、再プロンプトによる修正処理を所定回数繰り返し実行した後でもエラーメッセージ６１５が出力される場合、再プロンプトによる修正処理を停止し、それまでの実行結果（例えば、生成された問題記述５等）を出力してよい。 The information processing device 1 may also recursively execute the process of correcting the problem description 5 through re-prompts. The number of repetitions is not particularly limited and may be determined appropriately depending on the embodiment. In one example, the information processing device 1 may repeatedly execute the process of correcting the problem description 5 through re-prompts until the error message 615 is no longer output. In another example, the number of times the process of correcting the problem description 5 through re-prompts is executed may be specified in advance. If the error message 615 is still output after the information processing device 1 has executed the correction process through re-prompts a predetermined number of times, the information processing device 1 may stop the correction process through re-prompts and output the results of the execution up to that point (e.g., the problem description 5 generated, etc.).
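The bounded re-prompting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `generate`, `plan`, and `PlannerError` are hypothetical stand-ins for the inference module 3, the planner, and the error message 615, and the re-prompt layout is an arbitrary choice.

```python
from typing import Callable, Optional, Tuple


class PlannerError(Exception):
    """Hypothetical carrier for the planner's error message (615)."""


def refine_problem(
    generate: Callable[[str], str],  # stand-in for the inference module 3
    plan: Callable[[str], list],     # stand-in for the planner
    prompt: str,                     # stand-in for the input data 200
    max_reprompts: int = 3,          # assumed predetermined repetition limit
) -> Tuple[str, Optional[list]]:
    """Return (problem description, plan), or (description, None) if errors persist."""
    problem = generate(prompt)
    for attempt in range(max_reprompts + 1):
        try:
            return problem, plan(problem)
        except PlannerError as err:
            if attempt == max_reprompts:
                break  # limit reached: stop and output the result so far
            # Re-prompt with the previous description and the error message,
            # together with the original input.
            reprompt = (
                f"{prompt}\n\nPrevious output:\n{problem}"
                f"\n\nPlanner error:\n{err}"
            )
            problem = generate(reprompt)
    return problem, None
```

Setting `max_reprompts` to a large value approximates the variant that repeats until no error message is output, although an explicit bound avoids an unbounded loop when the error cannot be resolved.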

また、再プロンプトの方法は、上記の例に限られなくてよく、実施の形態に応じて適宜変更されてよい。他の一例では、情報処理装置１は、問題記述５及びエラーメッセージ６１５と共に又はこれらに代えて、エラーメッセージ６１５に応じて修正されたプロンプトを再プロンプトとして用いることで、問題記述５を修正してもよい。例えば、エラーメッセージ６１５が出力された場合、情報処理装置１は、修正モデルを用いて、出力されたエラーメッセージ６１５に応じて、推論モジュール３に与えたプロンプト（入力データ２００）を適宜修正してよい。修正モデルは、訓練済みモデル及びルールベースモデルの少なくともいずれかにより適宜構成されてよい。そして、情報処理装置１は、修正されたプロンプトを推論モジュール３に再度与えて、推論モジュール３の演算処理を実行することで、新たな問題記述５を生成してもよい。 Furthermore, the re-prompting method is not limited to the above examples and may be modified as appropriate depending on the embodiment. In another example, the information processing device 1 may correct the problem description 5 by using, as the re-prompt, a prompt corrected in accordance with the error message 615, together with or instead of the problem description 5 and the error message 615. For example, when the error message 615 is output, the information processing device 1 may use a correction model to correct, in accordance with the output error message 615, the prompt (input data 200) that was provided to the inference module 3. The correction model may be configured as appropriate from at least one of a trained model and a rule-based model. The information processing device 1 may then provide the corrected prompt to the inference module 3 again and execute the calculation process of the inference module 3 to generate a new problem description 5.
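As an illustration of the rule-based variant of the model that corrects the prompt, the following sketch appends constraints to the prompt according to patterns found in the error message. The rules, the error wording, and the function name `correct_prompt` are all hypothetical; actual planner messages depend on the planner used.

```python
import re

# Hypothetical rules: (pattern in the error message, hint added to the prompt).
RULES = [
    (re.compile(r"undefined (object|predicate)", re.IGNORECASE),
     "Use only objects and predicates declared in the domain description."),
    (re.compile(r"syntax|parenthes", re.IGNORECASE),
     "Return a syntactically complete description with balanced parentheses."),
]


def correct_prompt(prompt: str, error_message: str) -> str:
    """Return the prompt with constraints derived from the error message."""
    hints = [hint for pattern, hint in RULES if pattern.search(error_message)]
    if not hints:
        # No rule matched: fall back to quoting the message itself.
        hints = [f"The planner reported: {error_message}"]
    return prompt + "\n\nConstraints:\n" + "\n".join(f"- {h}" for h in hints)
```

A trained model could take the place of the rule table; the interface (prompt and error message in, corrected prompt out) would stay the same.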

［推論モジュール］ [Inference module]
推論モジュール３は、観測データ２０及び指示情報２１を含む入力データから問題記述５を生成する推論処理を実行するように構成される。このような推論処理を実行可能であれば、推論モジュール３の構成は、特に限られなくてよく、実施の形態に応じて適宜決定されてよい。一例では、推論モジュール３は、ルールベースモデル及び訓練済みモデル（訓練された機械学習モデル）の少なくともいずれかにより構成されてよい。 The inference module 3 is configured to execute an inference process to generate a problem description 5 from input data including observation data 20 and instruction information 21. As long as the inference module 3 is capable of executing such an inference process, the configuration of the inference module 3 is not particularly limited and may be determined appropriately depending on the embodiment. In one example, the inference module 3 may be configured by at least one of a rule-based model and a trained model (a trained machine learning model).

ルールベースモデルは、ルールに従って、与えられた入力から推論結果（本実施形態では、問題記述５の生成結果）を導出するように構成される。ルールは、適宜設定されてよい。機械学習モデルは、機械学習により調整可能な１つ以上の演算パラメータを有するように構成される。１つ以上の演算パラメータは、目的とする推論（本実施形態では、問題記述５の生成）の演算に使用される。機械学習モデルは、例えば、ニューラルネットワーク、回帰モデル、決定木モデル、サポートベクタマシン、その他の関数式（演算モデル）等により構成されてよい。機械学習の方法は、採用する機械学習のモデルに応じて、適宜選択されてよい（例えば、誤差逆伝播法等）。 The rule-based model is configured to derive an inference result (in this embodiment, the result of generating the problem description 5) from a given input in accordance with rules. The rules may be set as appropriate. The machine learning model is configured to have one or more calculation parameters that can be adjusted by machine learning. The one or more calculation parameters are used in the calculation of the target inference (in this embodiment, the generation of the problem description 5). The machine learning model may be configured, for example, by a neural network, a regression model, a decision tree model, a support vector machine, or another functional expression (calculation model). The machine learning method may be selected as appropriate depending on the machine learning model used (for example, backpropagation).

機械学習は、訓練サンプルを使用して、演算パラメータの値を調整（最適化）することである。典型的には、入力サンプル（訓練サンプル）及び出力サンプル（教師信号、ラベル）の組み合わせによりそれぞれ構成される複数の学習データセットを用いた教師あり学習を行うことで、訓練済みモデルは生成されてよい。例えば、入力サンプルは、入力データ（観測データ２０、指示情報２１等）のサンプルであってよく、出力サンプルは、出力データ（問題記述５）のサンプルであってよい。教師あり学習では、入力サンプルを与えることで機械学習モデルから得られる出力が対応する出力サンプルに適合するものとなるように、機械学習モデルの演算パラメータの値は調整されてよい。ただし、訓練済みモデルを生成する方法は、このような例に限られなくてよく、実施の形態に応じて適宜変更されてよい。学習データセットは、上記の例に限られなくてよく、実施の形態に応じて適宜選択されてよい。例えば、コンテキスト内学習の能力を獲得させる場合等、学習データセットには、上記以外のデータが用いられてもよい。また、学習方法は、教師あり学習に限られなくてよく、例えば、教師なし学習（自己教師あり学習を含む）、強化学習等の他の方法が用いられてもよい。機械学習モデルは、オンライン及びオフラインの少なくともいずれかにより訓練されてよい。機械学習モデルに対して、転移学習、再学習、追加学習等のチューニングが適宜行われてもよい。追加学習は、例えば、参考文献７（“LoRA”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://github.com/microsoft/LoRA＞）のLoRA（Low-Rank Adaptation of Large Language Models）、参考文献８（Neil Houlsby et al., “Parameter-Efficient Transfer Learning for NLP”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf＞）のAdapter、参考文献９（Brian Lester et al., “The Power of Scale for Parameter-Efficient Prompt Tuning”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2104.08691＞）のPrompt Tuning等のように、既存の訓練済みモデルにモジュールを追加し、既存の訓練済みモデルのパラメータはそのままで、追加モジュールのパラメータを調整する機械学習を含んでよい。また、追加学習は、参考文献１０（Kecheng Zheng et al., “Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://openaccess.thecvf.com/content/ICCV2023/papers/Zheng_Regularized_Mask_Tuning_Uncovering_Hidden_Knowledge_in_Pre-Trained_Vision-Language_Models_ICCV_2023_paper.pdf＞）のMask Tuning等のように訓練済みモデルの一部を不使用にするマスクを調整する機械学習を含んでよい。上記大規模言語モデル等のコンテキスト内学習を行う能力を有する訓練済みモデルを推論モジュール３の少なくとも一部に用いる場合に、これらの追加学習を行うことで、特定の問題、ドメイン等の場面に訓練済みモデルを特化させてもよい。 Machine learning is the adjustment (optimization) of the values of calculation parameters using training samples. Typically, a trained model may be generated by supervised learning using multiple training datasets, each of which is composed of a combination of an input sample (training sample) and an output sample (teacher signal, label). For example, the input sample may be a sample of the input data (such as observation data 20 and instruction information 21), and the output sample may be a sample of the output data (problem description 5). In supervised learning, the values of the calculation parameters of a machine learning model may be adjusted so that the output obtained from the machine learning model when an input sample is provided matches the corresponding output sample. However, the method for generating a trained model is not limited to this example and may be changed as appropriate depending on the embodiment. The training dataset is not limited to the above example and may be selected as appropriate depending on the embodiment. For example, when the model is to acquire in-context learning capability, data other than those described above may be used for the training dataset. Furthermore, the learning method is not limited to supervised learning; for example, other methods such as unsupervised learning (including self-supervised learning) and reinforcement learning may be used. The machine learning model may be trained by at least one of online and offline training. The machine learning model may be tuned as appropriate by transfer learning, retraining, additional learning, or the like.
Additional learning may include machine learning that adds a module to an existing trained model and adjusts the parameters of the added module while leaving the parameters of the existing trained model unchanged, such as LoRA (Low-Rank Adaptation of Large Language Models) in Reference 7 (“LoRA”, [online], [searched October 24, 2023], Internet <URL: https://github.com/microsoft/LoRA>), Adapter in Reference 8 (Neil Houlsby et al., “Parameter-Efficient Transfer Learning for NLP”, [online], [searched October 24, 2023], Internet <URL: http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf>), and Prompt Tuning in Reference 9 (Brian Lester et al., “The Power of Scale for Parameter-Efficient Prompt Tuning”, [online], [searched October 24, 2023], Internet <URL: https://arxiv.org/abs/2104.08691>). Additional learning may also include machine learning that adjusts a mask disabling a part of a trained model, such as Mask Tuning in Reference 10 (Kecheng Zheng et al., “Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models”, [online], [searched October 24, 2023], Internet <URL: https://openaccess.thecvf.com/content/ICCV2023/papers/Zheng_Regularized_Mask_Tuning_Uncovering_Hidden_Knowledge_in_Pre-Trained_Vision-Language_Models_ICCV_2023_paper.pdf>). When a trained model capable of in-context learning, such as the large language model described above, is used for at least a part of the inference module 3, such additional learning may be performed to specialize the trained model for a specific problem, domain, or other situation.
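The idea of adding a module while leaving the existing parameters unchanged can be illustrated numerically with a low-rank adapter in the style of LoRA. The sketch below is pure Python for illustration only; it does not use the API of the repository cited as Reference 7, and the function names and toy matrices are assumptions. The frozen weight `w` is never modified; only the small matrices `a` and `b` would be trained.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float = 1.0) -> Vector:
    """Compute W x + (alpha / r) * B (A x).

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)                        # rank of the added module
    base = matvec(w, x)               # output of the unchanged model
    delta = matvec(b, matvec(a, x))   # low-rank correction
    return [y + (alpha / r) * d for y, d in zip(base, delta)]
```

When `b` is all zeros, which is the usual starting point, the adapted model reproduces the original output exactly, so tuning starts from the behavior of the existing trained model.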

一例では、推論モジュール３は、ニューラルネットワークを含んでよい。ニューラルネットワークの構造は、特に限られなくてよく、実施の形態に応じて適宜決定されてよい。ニューラルネットワークの構造は、例えば、入力層から出力層までの層の数、各層の種類、各層に含まれるノード（ニューロン）の数、各層のノード同士の結合関係等により特定されてよい。一例では、ニューラルネットワークは、再帰構造、自己注意機構、自己回帰モデル等の任意の機構を含んでよい。また、ニューラルネットワークは、例えば、全結合層、畳み込み層、プーリング層、逆畳み込み層、アンプーリング層、正規化層、ドロップアウト層、LSTM（Long short-term memory）等の任意の層を含んでよい。ニューラルネットワークは、diffusionモデル、transformerモデル、生成モデル等の任意のタイプのモデルを含んでよい。ニューラルネットワークに含まれる各ノード間の結合の重み及び各ノードの閾値が、演算パラメータの一例である。 In one example, the inference module 3 may include a neural network. The structure of the neural network is not particularly limited and may be determined appropriately depending on the embodiment. The structure of the neural network may be specified, for example, by the number of layers from the input layer to the output layer, the type of each layer, the number of nodes (neurons) included in each layer, and the connection relationships between the nodes in each layer. In one example, the neural network may include any mechanism such as a recurrent structure, a self-attention mechanism, or an autoregressive model. Furthermore, the neural network may include any layer such as a fully connected layer, a convolutional layer, a pooling layer, a deconvolutional layer, an unpooling layer, a normalization layer, a dropout layer, or an LSTM (Long Short-Term Memory). The neural network may include any type of model, such as a diffusion model, a transformer model, or a generative model. The connection weights between each node included in the neural network and the threshold value for each node are examples of calculation parameters.

上記のとおり、一例では、推論モジュール３は、コンテキスト内学習の訓練済みモデル３９を含んでよい。この場合、推論モジュール３に与える入力データは、環境情報２２のドメイン情報２４として、訓練済みモデル３９に対する１つ以上の入出力サンプル２４３を含んでよい。入出力サンプル２４３は、問題記述５を生成するドメインに応じて適宜用意されてよい。本実施形態の一例では、入出力サンプル２４３を訓練済みモデル３９に与えることで、コンテキスト内学習を行い、訓練済みモデル３９を対象のドメインに適応させることができる。すなわち、入出力サンプル２４３をドメイン毎に用意することで、推論モジュール３は、訓練済みモデル３９を交換することなく、問題記述５を汎用的に生成することができる。よって、本実施形態の一例によれば、様々なドメインにおけるタスクに対する問題記述５の生成への対応を期待することができる。 As described above, in one example, the inference module 3 may include a trained model 39 capable of in-context learning. In this case, the input data provided to the inference module 3 may include, as domain information 24 of the environment information 22, one or more input/output samples 243 for the trained model 39. The input/output samples 243 may be prepared as appropriate depending on the domain for which the problem description 5 is to be generated. In one example of this embodiment, providing the input/output samples 243 to the trained model 39 causes in-context learning to be performed, so that the trained model 39 can be adapted to the target domain. In other words, by preparing input/output samples 243 for each domain, the inference module 3 can generate problem descriptions 5 across domains without replacing the trained model 39. Therefore, according to this example of the present embodiment, it can be expected that problem descriptions 5 can be generated for tasks in a variety of domains.
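One way to provide input/output samples to a model capable of in-context learning is to place them ahead of the actual query in a single prompt. The following sketch shows this under stated assumptions: the `IOSample` structure, the field labels, and the prompt layout are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IOSample:
    """Hypothetical input/output sample (243) for one domain."""
    observation: str   # sample of the observation data
    instruction: str   # sample of the instruction information
    problem: str       # corresponding sample of the problem description


def build_prompt(samples: Sequence[IOSample],
                 observation: str, instruction: str) -> str:
    """Place the domain's samples before the actual query."""
    parts = [
        f"Observation: {s.observation}\n"
        f"Instruction: {s.instruction}\n"
        f"Problem: {s.problem}"
        for s in samples
    ]
    # The final block leaves "Problem:" open for the model to complete.
    parts.append(
        f"Observation: {observation}\n"
        f"Instruction: {instruction}\n"
        f"Problem:"
    )
    return "\n\n".join(parts)
```

Switching domains then amounts to passing a different `samples` sequence; the model itself is not exchanged.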

また、一例では、推論モジュール３がコンテキスト内学習の訓練済みモデル３９を含む場合に、ドメイン記述２３が、ロボット装置Ｒのタイプ毎に用意されてよい。加えて、ドメイン記述２３に対応する出力サンプル（問題記述５のサンプル）がドメイン情報２４としてタイプ毎に用意されてよい。これに応じて、情報処理装置１は、タイプの指定を受け付けてよい。指定方法は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。典型的には、タイプは、オペレータによる入力装置を介した操作、テキスト入力、音声入力等の手動的な方法で指定されてよい。タイプの指定は、指示情報２１に含まれてもよい。情報処理装置１は、指定されたタイプに対応するドメイン記述２３及び出力サンプル（ドメイン情報２４）を環境情報２２として推論モジュール３（訓練済みモデル３９）に与えてよい。これにより、指定されたタイプのスキルを有するロボット装置Ｒの行動計画を生成するための問題記述５を生成することができる。すなわち、本実施形態の一例によれば、推論モジュール３の汎用性を高めると共に、タイプに適合する問題記述５の生成精度の向上を期待することができる。 In one example, when the inference module 3 includes a trained model 39 for in-context learning, a domain description 23 may be prepared for each type of robot device R. Additionally, an output sample (a sample problem description 5) corresponding to the domain description 23 may be prepared for each type as domain information 24. In response, the information processing device 1 may accept a type designation. The designation method is not particularly limited and may be selected appropriately depending on the embodiment. Typically, the type may be designated manually, such as by an operator operating an input device, entering text, or entering voice information. The type designation may be included in the instruction information 21. The information processing device 1 may provide the domain description 23 and output sample (domain information 24) corresponding to the designated type as environment information 22 to the inference module 3 (trained model 39). This allows for the creation of a problem description 5 for generating a behavior plan for a robot device R having a skill of the designated type. In other words, this example of the present embodiment is expected to enhance the versatility of the inference module 3 and improve the accuracy of generating problem descriptions 5 that match the type.
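The per-type selection described above can be sketched as a simple lookup. The registry contents, the type names, and the function `environment_info_for` are hypothetical and serve only to illustrate how a designated type could select the domain description and the output sample.

```python
from typing import Dict, NamedTuple


class TypeEntry(NamedTuple):
    domain_description: str  # stand-in for the domain description 23
    output_sample: str       # stand-in for a sample of the problem description 5


def environment_info_for(robot_type: str,
                         registry: Dict[str, TypeEntry]) -> Dict[str, str]:
    """Return the environment information for the designated robot type."""
    try:
        entry = registry[robot_type]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"unknown robot type {robot_type!r}; known types: {known}"
        ) from None
    return {
        "domain_description": entry.domain_description,
        "output_sample": entry.output_sample,
    }
```

Rejecting an unknown type explicitly, rather than falling back to a default, avoids generating a problem description for skills the designated robot device does not have.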

また、推論モジュール３の入出力の形態は、実施の形態に応じて適宜決定されてよい。上記のとおり、一例では、入力データ（観測データ２０、指示情報２１及び環境情報２２）は、そのままの状態で推論モジュール３に与えられてもよい。他の一例では、入力データの少なくとも一部は、推論モジュール３に与えられる前に任意の前処理が適用されてよい。前処理は、特徴量の抽出、その他の解析、推論等の演算処理を含んでよい。推論モジュール３には、前処理後の入力データが与えられてもよい。また、一例では、推論モジュール３の出力は、推論結果（問題記述５）を直接的に示すように構成されてよい。他の一例では、推論モジュール３の出力は、推論結果を間接的に示すように構成されてよい。この場合、推論結果は、推論モジュール３の出力に対して任意の情報処理（解釈処理）を実行することにより得られてよい。問題記述５の生成は、現時点で得られるデータ（過去のデータを含んでもよい）に対してリアルタイムに遂行されてもよいし、過去に得られたデータに対して事後的に遂行されてもよい。 The input/output form of the inference module 3 may be determined appropriately depending on the embodiment. As described above, in one example, the input data (observation data 20, instruction information 21, and environmental information 22) may be provided to the inference module 3 as is. In another example, any preprocessing may be applied to at least a portion of the input data before it is provided to the inference module 3. The preprocessing may include computational processing such as feature extraction, other analysis, and inference. The inference module 3 may be provided with the preprocessed input data. In one example, the output of the inference module 3 may be configured to directly indicate the inference result (problem description 5). In another example, the output of the inference module 3 may be configured to indirectly indicate the inference result. In this case, the inference result may be obtained by performing any information processing (interpretation processing) on the output of the inference module 3. The problem description 5 may be generated in real time from data obtained at the present time (which may include past data) or retrospectively from data obtained in the past.

また、推論モジュール３は、一体的に構成されてもよいし、複数の部分要素の集合により構成されてもよい。一例では、複数の部分要素の集合により推論モジュール３を構成する場合、各部分要素は、問題記述５の一部（対応する部分）を生成するように適宜構成されてよい。例えば、問題記述５の物体、初期状態及び目標状態の各記述（５０、５１、５２）に対して、１つ以上の部分要素が用意されてよい。各部分要素は、機械学習モデル及びルールベースモデルの少なくともいずれかにより構成されてもよい。 Furthermore, the inference module 3 may be configured as an integrated unit, or may be configured from a collection of multiple subelements. In one example, when the inference module 3 is configured from a collection of multiple subelements, each subelement may be appropriately configured to generate a portion (corresponding part) of the problem description 5. For example, one or more subelements may be prepared for each description (50, 51, 52) of the object, initial state, and goal state in the problem description 5. Each subelement may be configured from at least one of a machine learning model and a rule-based model.

（推論モジュールの構成例） (Example of inference module configuration)
図４は、本実施形態に係る推論モジュール３の構成の一例を模式的に示す。図４の例では、問題記述５は、環境に存在する物体の記述５０を更に含む場面を想定している。一例では、推論モジュール３は、物体推定器３１、初期状態推定器３３及び目標推定器３５を含んでよい。物体推定器３１は、物体の記述５０に対応する１つ以上の部分要素の一例である。初期状態推定器３３は、初期状態の記述５１に対応する１つ以上の部分要素の一例である。目標推定器３５は、目標状態の記述５２に対応する１つ以上の部分要素の一例である。 FIG. 4 schematically illustrates an example of the configuration of the inference module 3 according to this embodiment. In the example of FIG. 4, a scenario is assumed in which the problem description 5 further includes descriptions 50 of objects present in the environment. In one example, the inference module 3 may include an object estimator 31, an initial state estimator 33, and a goal estimator 35. The object estimator 31 is an example of one or more subelements corresponding to the object descriptions 50. The initial state estimator 33 is an example of one or more subelements corresponding to the initial state descriptions 51. The goal estimator 35 is an example of one or more subelements corresponding to the goal state descriptions 52.
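The composition of the inference module from three subelements can be sketched as follows. The output layout is an assumption: a PDDL-style problem file is used here purely as an illustration of a description with object, initial state, and goal sections, and the function name, dictionary keys, and default names are hypothetical.

```python
from typing import Callable, Dict

# Each subelement maps (observation, instruction) to one fragment of the
# problem description. Real estimators may take other inputs as well.
SubElement = Callable[[str, str], str]


def generate_problem(sub_elements: Dict[str, SubElement],
                     observation: str, instruction: str,
                     problem: str = "task", domain: str = "demo") -> str:
    """Assemble a problem description from the three fragments."""
    parts = {
        key: sub_elements[key](observation, instruction)
        for key in ("objects", "init", "goal")
    }
    return (
        f"(define (problem {problem})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {parts['objects']})\n"
        f"  (:init {parts['init']})\n"
        f"  (:goal {parts['goal']}))"
    )
```

Because each fragment comes from its own subelement, a rule-based estimator and a trained model can be mixed freely, for example a detector-based object estimator alongside a language-model goal estimator.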

物体推定器３１は、取得された観測データ２０から環境に存在する１つ以上の物体の記述５０を生成するように構成されてよい。物体の記述５０を生成することは、物体を検出することを含んでもよい。推論モジュール３を用いて、問題記述５を生成することは、物体推定器３１を用いて、環境に存在する物体の記述５０を取得された観測データ２０から生成することを含んでよい。観測データ２０には、環境に存在する物体に関する情報が表れる。そのため、本実施形態の一例によれば、問題記述５における物体の記述５０を適切に生成することができる。 The object estimator 31 may be configured to generate descriptions 50 of one or more objects present in the environment from the acquired observation data 20. Generating the object descriptions 50 may include detecting the objects. Generating the problem description 5 using the inference module 3 may include using the object estimator 31 to generate descriptions 50 of objects present in the environment from the acquired observation data 20. Information about the objects present in the environment appears in the observation data 20. Therefore, according to one example of this embodiment, the object descriptions 50 in the problem description 5 can be appropriately generated.

観測データ２０から物体の記述５０を生成可能であれば、物体推定器３１の構成は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。推論モジュール３の構成に関する上記説明は、物体推定器３１にも適用されてよい。物体推定器３１は、ルールベースモデル及び訓練済みモデルの少なくともいずれかにより構成されてよい。 As long as it is possible to generate an object description 50 from the observation data 20, the configuration of the object estimator 31 is not particularly limited and may be selected appropriately depending on the embodiment. The above description of the configuration of the inference module 3 may also be applied to the object estimator 31. The object estimator 31 may be configured using at least one of a rule-based model and a trained model.

物体推定器３１には、特定のドメインでのみ物体の記述５０を推論可能なモデルが使用されてよい。ただし、様々なドメインのタスクに共通に利用可能であるためには、そのような特定のドメインでのみ物体の記述５０を推論可能なモデルではなく、物体の記述５０を汎用的に推論可能なモデルを物体推定器３１として使用することが望ましい。一例では、物体推定器３１は、コンテキスト内学習の訓練済みモデルを備えてよい。コンテキスト内学習を行う能力を有する訓練済みモデルを備えていることで、物体推定器３１の汎用性を高めることができる。すなわち、環境に存在する物体の記述５０を汎用的に推論することができ、これにより、様々なドメインにおけるタスクに対する物体の記述５０の生成への対応を期待することができる。 The object estimator 31 may use a model that can infer object descriptions 50 only in a specific domain. However, in order to be commonly usable for tasks in various domains, it is desirable to use a model that can infer object descriptions 50 generically as the object estimator 31, rather than a model that can infer object descriptions 50 only in a specific domain. In one example, the object estimator 31 may be equipped with a trained model for in-context learning. By providing a trained model capable of in-context learning, the versatility of the object estimator 31 can be increased. In other words, it can infer descriptions 50 of objects present in the environment generically, which can be expected to enable the generation of object descriptions 50 for tasks in various domains.

物体推定器３１の入出力の形態は、実施の形態に応じて適宜決定されてよい。物体推定器３１は、観測データ２０以外の任意のデータの入力を更に受け付けてもよい。入力データは、そのまま物体推定器３１に与えられてもよいし、前処理が適用された後に物体推定器３１に与えられてもよい。物体推定器３１は、物体の記述５０以外の任意のデータを更に出力してもよい。 The input/output form of the object estimator 31 may be determined appropriately depending on the embodiment. The object estimator 31 may further receive input of any data other than the observation data 20. The input data may be provided to the object estimator 31 as is, or may be provided to the object estimator 31 after preprocessing has been applied. The object estimator 31 may further output any data other than the object description 50.

一例では、情報処理装置１は、環境に関する環境情報２２を更に取得してもよい。物体推定器３１は、取得された環境情報２２を更に用いて、物体の記述５０を生成してもよい。すなわち、取得された観測データ２０から物体の記述５０を生成することは、取得された観測データ２０及び環境情報２２から物体の記述５０を生成することにより構成されてもよい。本実施形態の一例によれば、環境情報２２を更に用いることで、問題記述５における物体の記述５０を生成する精度の向上を期待することができる。 In one example, the information processing device 1 may further acquire environmental information 22 related to the environment. The object estimator 31 may further use the acquired environmental information 22 to generate the object description 50. In other words, generating the object description 50 from the acquired observation data 20 may be configured by generating the object description 50 from the acquired observation data 20 and environmental information 22. According to one example of this embodiment, by further using the environmental information 22, it is possible to expect an improvement in the accuracy of generating the object description 50 in the problem description 5.

物体推定器３１に与える環境情報２２は、実施の形態に応じて適宜選択されてよい。環境情報２２は、上記ドメイン記述２３及びドメイン情報２４の少なくともいずれかを含んでよい。例えば、環境情報２２は、ドメイン記述２３の少なくとも一部を含んでもよい。また、例えば、環境情報２２は、環境に存在する物体の属性情報２４１をドメイン情報２４として含んでもよい。これにより、属性情報２４１により環境に存在する物体をより特定可能となるため、問題記述５における物体の記述５０を生成する精度の向上を期待することができる。また、例えば、物体推定器３１がコンテキスト内学習の訓練済みモデルを備える場合、環境情報２２は、１つ以上の入出力サンプル２４３をドメイン情報２４として含んでもよい。入力サンプルは、観測データ２０のサンプルにより構成されてよい。入出力サンプル２４３以外の環境情報２２が物体推定器３１に与えられる場合、入力サンプルは、その環境情報２２のサンプルを含んでもよい。出力サンプルは、入力サンプルに対応する物体の記述５０の正解サンプルにより構成されてよい。 The environmental information 22 provided to the object estimator 31 may be selected appropriately depending on the embodiment. The environmental information 22 may include at least one of the domain description 23 and the domain information 24. For example, the environmental information 22 may include at least a portion of the domain description 23. Furthermore, for example, the environmental information 22 may include attribute information 241 of objects present in the environment as the domain information 24. This makes it possible to more accurately identify objects present in the environment using the attribute information 241, which is expected to improve the accuracy of generating the object description 50 in the problem description 5. Furthermore, for example, if the object estimator 31 is equipped with a trained model for in-context learning, the environmental information 22 may include one or more input/output samples 243 as the domain information 24. The input samples may be composed of samples of the observation data 20. If environmental information 22 other than the input/output samples 243 is provided to the object estimator 31, the input samples may include samples of that environmental information 22. The output samples may be composed of correct answer samples of the object description 50 corresponding to the input samples.

なお、ドメイン記述２３及びドメイン情報２４（属性情報２４１、入出力サンプル２４３）は、実施の形態に応じて適宜用意されてよい。一例では、ドメイン記述２３及びドメイン情報２４は予め用意されてよい。ドメイン記述２３及びドメイン情報２４は、情報処理装置１内のメモリ・リソースに保持されてもよいし、使用する際に外部から情報処理装置１に与えられてもよい。また、他の一例では、ドメイン記述２３及びドメイン情報２４の少なくとも一部は使用する際に適宜生成されてもよい。例えば、物体及び属性情報のリストを含む参照情報が予め用意されてよい。情報処理装置１は、指示情報２１（例えば、言語情報）及びドメイン記述２３（例えば、対象物体の種類を定義する記述）の少なくとも一方から環境に存在する物体の候補を抽出してよい。情報処理装置１は、抽出された物体の候補のリストと参照情報のリストとを照合し、物体の候補に適合する物体の属性情報を参照情報から抽出することで、物体推定器３１に与える属性情報２４１を生成してもよい。この属性情報２４１を生成する処理は、前処理として実行されてもよいし、推論モジュール３内の処理として実行されてもよい。 The domain description 23 and domain information 24 (attribute information 241, input/output samples 243) may be prepared as appropriate depending on the embodiment. In one example, the domain description 23 and domain information 24 may be prepared in advance. The domain description 23 and domain information 24 may be stored in memory resources within the information processing device 1, or may be provided to the information processing device 1 from an external source when used. In another example, at least a portion of the domain description 23 and domain information 24 may be generated as appropriate when used. For example, reference information including a list of objects and attribute information may be prepared in advance. The information processing device 1 may extract object candidates present in the environment from at least one of the instruction information 21 (e.g., linguistic information) and the domain description 23 (e.g., a description defining the type of target object). The information processing device 1 may compare the list of extracted object candidates with the list of reference information and extract attribute information of objects that match the object candidates from the reference information, thereby generating attribute information 241 to be provided to the object estimator 31. The process of generating this attribute information 241 may be performed as preprocessing, or as processing within the inference module 3.
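The generation of attribute information by matching object candidates against reference information can be sketched as follows. The tokenization, the reference table, and the function name `build_attribute_info` are hypothetical; a practical system would likely need more robust candidate extraction than simple word matching.

```python
import re
from typing import Dict, Iterable


def build_attribute_info(instruction: str,
                         domain_types: Iterable[str],
                         reference: Dict[str, dict]) -> Dict[str, dict]:
    """Return attribute information for candidates found in the reference list.

    instruction:  natural-language instruction (instruction information)
    domain_types: object types named in the domain description
    reference:    prepared list mapping object names to attribute information
    """
    # Candidate objects: words in the instruction plus declared object types.
    tokens = set(re.findall(r"[a-z_]+", instruction.lower()))
    candidates = tokens | {t.lower() for t in domain_types}
    # Keep only candidates that have an entry in the reference information.
    return {name: reference[name] for name in sorted(candidates)
            if name in reference}
```

For example, with a reference table containing `tomato`, `knife`, and `bowl`, the instruction "Slice the tomato" and the declared type `knife` would yield entries for `tomato` and `knife` only.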

初期状態推定器３３は、取得された観測データ２０から環境に存在する１つ以上の物体の初期状態の記述５１を生成するように構成されてよい。推論モジュール３を用いて、問題記述５を生成することは、初期状態推定器３３を用いて、環境に存在する物体の初期状態の記述５１を生成することを含んでよい。観測データ２０には、環境に存在する物体の初期状態が表れる。そのため、本実施形態の一例によれば、問題記述５における初期状態の記述５１を適切に生成することができる。 The initial state estimator 33 may be configured to generate a description 51 of the initial state of one or more objects present in the environment from the acquired observation data 20. Generating the problem description 5 using the inference module 3 may include generating a description 51 of the initial state of the objects present in the environment using the initial state estimator 33. The initial states of the objects present in the environment appear in the observation data 20. Therefore, according to one example of this embodiment, the description 51 of the initial state in the problem description 5 can be appropriately generated.

観測データ２０から初期状態の記述５１を生成可能であれば、初期状態推定器３３の構成は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。上記物体推定器３１と同様に、推論モジュール３の構成に関する上記説明は、初期状態推定器３３にも適用されてよい。初期状態推定器３３は、ルールベースモデル及び訓練済みモデルの少なくともいずれかにより構成されてよい。 As long as it is possible to generate an initial state description 51 from the observation data 20, the configuration of the initial state estimator 33 is not particularly limited and may be selected appropriately depending on the embodiment. As with the object estimator 31, the above description regarding the configuration of the inference module 3 may also be applied to the initial state estimator 33. The initial state estimator 33 may be configured using at least one of a rule-based model and a trained model.

初期状態推定器３３には、特定のドメインでのみ初期状態の記述５１を推論可能なモデルが使用されてよい。ただし、様々なドメインのタスクに共通に利用可能であるためには、そのような特定のドメインでのみ初期状態の記述５１を推論可能なモデルではなく、初期状態の記述５１を汎用的に推論可能なモデルを初期状態推定器３３として使用することが望ましい。一例では、初期状態推定器３３は、コンテキスト内学習の訓練済みモデルを備えてよい。コンテキスト内学習を行う能力を有する訓練済みモデルを備えていることで、初期状態推定器３３の汎用性を高めることができる。すなわち、環境に存在する物体の初期状態の記述５１を汎用的に推論することができ、これにより、様々なドメインにおけるタスクに対する初期状態の記述５１の生成への対応を期待することができる。 The initial state estimator 33 may use a model capable of inferring the initial state description 51 only in a specific domain. However, in order to be commonly usable for tasks in various domains, it is desirable to use a model capable of inferring the initial state description 51 generically as the initial state estimator 33, rather than a model capable of inferring the initial state description 51 only in such a specific domain. In one example, the initial state estimator 33 may include a trained model for in-context learning. By including a trained model capable of in-context learning, the versatility of the initial state estimator 33 can be increased. In other words, the initial state description 51 of an object existing in the environment can be inferred generically, which can be expected to support the generation of initial state descriptions 51 for tasks in various domains.

初期状態推定器３３の入出力の形態は、物体推定器３１と同様に、実施の形態に応じて適宜決定されてよい。初期状態推定器３３は、観測データ２０以外の任意のデータの入力を更に受け付けてもよい。入力データは、そのまま初期状態推定器３３に与えられてもよいし、前処理が適用された後に初期状態推定器３３に与えられてもよい。入力データは、物体推定器３１に先に与えられてよく、入力データに対する物体推定器３１の演算結果（中間の演算結果を含む）が、初期状態推定器３３に与えられてもよい。例えば、観測データ２０を初期状態推定器３３に与えることは、観測データ２０を初期状態推定器３３にそのまま与えること、観測データ２０の前処理結果を初期状態推定器３３に与えること、及び観測データ２０に対する物体推定器３１の演算結果を初期状態推定器３３に与えることを含んでよい。前処理結果及び物体推定器３１の演算結果は、例えば、観測データ２０における物体の検出結果等であってよい。ただし、入力データを与える順序は、このような例に限られなくてよい。入力データは、初期状態推定器３３に先に与えられてよく、入力データに対する初期状態推定器３３の演算結果が、物体推定器３１に与えられてもよい。例えば、観測データ２０を物体推定器３１に与えることは、観測データ２０を物体推定器３１にそのまま与えること、観測データ２０の前処理結果を物体推定器３１に与えること、及び観測データ２０に対する初期状態推定器３３の演算結果を物体推定器３１に与えることを含んでよい。また、初期状態推定器３３は、物体の初期状態の記述５１以外の任意のデータを更に出力してもよい。 The input/output form of the initial state estimator 33 may be determined appropriately depending on the embodiment, similar to the object estimator 31. The initial state estimator 33 may further accept input of any data other than the observation data 20. The input data may be provided to the initial state estimator 33 as is, or may be provided to the initial state estimator 33 after preprocessing has been applied. The input data may be provided to the object estimator 31 first, and the calculation results (including intermediate calculation results) of the object estimator 31 on the input data may be provided to the initial state estimator 33. For example, providing the observation data 20 to the initial state estimator 33 may include providing the observation data 20 as is to the initial state estimator 33, providing the preprocessing results of the observation data 20 to the initial state estimator 33, and providing the calculation results of the object estimator 31 on the observation data 20 to the initial state estimator 33. The preprocessing results and the calculation results of the object estimator 31 may be, for example, the detection results of an object in the observation data 20. However, the order in which the input data is provided is not limited to this example. 
The input data may be provided to the initial state estimator 33 first, and the results of the calculations performed by the initial state estimator 33 on the input data may be provided to the object estimator 31. For example, providing the observation data 20 to the object estimator 31 may include providing the observation data 20 directly to the object estimator 31, providing the results of preprocessing the observation data 20 to the object estimator 31, and providing the results of the calculations performed by the initial state estimator 33 on the observation data 20 to the object estimator 31. The initial state estimator 33 may also output any data other than the description 51 of the initial state of the object.

一例では、物体推定器３１と同様に、初期状態推定器３３は、環境情報２２を更に用いて、初期状態の記述５１を生成してもよい。すなわち、観測データ２０から初期状態の記述５１を生成することは、観測データ２０及び環境情報２２から初期状態の記述５１を生成することにより構成されてもよい。初期状態推定器３３に与える環境情報２２は、実施の形態に応じて適宜選択されてよい。環境情報２２は、上記ドメイン記述２３及びドメイン情報２４の少なくともいずれかを含んでよい。例えば、環境情報２２は、ドメイン記述２３の少なくとも一部を含んでもよい。また、例えば、環境情報２２は、属性情報２４１をドメイン情報２４として含んでもよい。また、例えば、初期状態推定器３３がコンテキスト内学習の訓練済みモデルを備える場合、環境情報２２は、１つ以上の入出力サンプル２４３をドメイン情報２４として含んでもよい。入力サンプルは、観測データ２０（前処理結果及び物体推定器３１の演算結果を含む）のサンプルにより構成されてよい。入出力サンプル２４３以外の環境情報２２が初期状態推定器３３に与えられる場合、入力サンプルは、その環境情報２２のサンプルを含んでもよい。出力サンプルは、入力サンプルに対応する初期状態の記述５１の正解サンプルにより構成されてよい。環境情報２２（ドメイン記述２３、ドメイン情報２４）は、実施の形態に応じて適宜用意されてよい。環境情報２２を更に用いることで、問題記述５における初期状態の記述５１を生成する精度の向上を期待することができる。 In one example, similar to the object estimator 31, the initial state estimator 33 may further use the environmental information 22 to generate the initial state description 51. That is, generating the initial state description 51 from the observation data 20 may be configured by generating the initial state description 51 from the observation data 20 and the environmental information 22. The environmental information 22 provided to the initial state estimator 33 may be selected appropriately depending on the embodiment. The environmental information 22 may include at least one of the domain description 23 and the domain information 24. For example, the environmental information 22 may include at least a portion of the domain description 23. Furthermore, for example, the environmental information 22 may include attribute information 241 as the domain information 24. Furthermore, for example, when the initial state estimator 33 has a trained model for in-context learning, the environmental information 22 may include one or more input/output samples 243 as the domain information 24. The input sample may be configured from a sample of the observation data 20 (including the preprocessing result and the calculation result of the object estimator 31). 
When environmental information 22 other than the input/output samples 243 is provided to the initial state estimator 33, the input samples may include samples of that environmental information 22. The output samples may be composed of correct samples of the initial state description 51 corresponding to the input samples. The environmental information 22 (domain description 23, domain information 24) may be prepared as appropriate depending on the embodiment. By further using the environmental information 22, it is expected that the accuracy of generating the initial state description 51 in the problem description 5 can be improved.

また、一例では、物体推定器３１の演算処理は、初期状態推定器３３よりも先に完了まで実行されてよい。この場合、物体推定器３１により生成された物体の記述５０は、初期状態の記述５１を生成するための入力データとして、初期状態推定器３３に与えられてもよい。すなわち、初期状態推定器３３は、物体推定器３１により生成された物体の記述５０を更に用いて、初期状態の記述５１を生成してもよい。初期状態推定器３３がコンテキスト内学習の訓練済みモデルを備え、初期状態推定器３３に与える環境情報２２が入出力サンプル２４３を含む場合、入力サンプルは、物体の記述５０のサンプルを含んでもよい。ただし、推論モジュール３内の演算順序は、このような例に限られなくてよい。初期状態推定器３３の演算処理は、物体推定器３１と少なくとも部分的に並列に実行されてもよいし、物体推定器３１よりも先に完了まで実行されてもよい。初期状態推定器３３の演算処理が物体推定器３１よりも先に完了まで実行される場合、初期状態推定器３３により生成された初期状態の記述５１が物体推定器３１に与えられてもよい。物体推定器３１がコンテキスト内学習の訓練済みモデルを備え、物体推定器３１に与える環境情報２２が入出力サンプル２４３を含む場合、入力サンプルは、初期状態の記述５１のサンプルを含んでもよい。 In one example, the calculation process of the object estimator 31 may be executed to completion before that of the initial state estimator 33. In this case, the object description 50 generated by the object estimator 31 may be provided to the initial state estimator 33 as input data for generating the initial state description 51. That is, the initial state estimator 33 may further use the object description 50 generated by the object estimator 31 to generate the initial state description 51. When the initial state estimator 33 comprises a trained model for in-context learning and the environmental information 22 provided to the initial state estimator 33 includes input/output samples 243, the input samples may include samples of the object description 50. However, the order of operations within the inference module 3 is not limited to this example. The calculation process of the initial state estimator 33 may be executed at least partially in parallel with that of the object estimator 31, or may be executed to completion before that of the object estimator 31. When the calculation process of the initial state estimator 33 is executed to completion before that of the object estimator 31, the initial state description 51 generated by the initial state estimator 33 may be provided to the object estimator 31. When the object estimator 31 comprises a trained model for in-context learning and the environmental information 22 provided to the object estimator 31 includes input/output samples 243, the input samples may include samples of the initial state description 51.

目標推定器３５は、取得された指示情報２１から環境に存在する１つ以上の物体の目標状態の記述５２を生成するように構成されてよい。推論モジュール３を用いて、問題記述５を生成することは、目標推定器３５を用いて、環境に存在する物体の目標状態の記述５２を生成することを含んでよい。指示情報２１には、タスクの目標に関する情報が表れる。そのため、本実施形態の一例によれば、問題記述５における目標状態の記述５２を適切に生成することができる。 The goal estimator 35 may be configured to generate a goal state description 52 of one or more objects in the environment from the acquired instruction information 21. Generating the problem description 5 using the inference module 3 may include generating a goal state description 52 of objects in the environment using the goal estimator 35. The instruction information 21 contains information related to the goal of the task. Therefore, according to one example of this embodiment, the goal state description 52 in the problem description 5 can be appropriately generated.

指示情報２１から目標状態の記述５２を生成可能であれば、目標推定器３５の構成は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。上記物体推定器３１等と同様に、推論モジュール３の構成に関する上記説明は、目標推定器３５にも適用されてよい。目標推定器３５は、ルールベースモデル及び訓練済みモデルの少なくともいずれかにより構成されてよい。 As long as it is possible to generate a description 52 of the goal state from the instruction information 21, the configuration of the goal estimator 35 is not particularly limited and may be selected appropriately depending on the embodiment. As with the object estimator 31, etc., the above description of the configuration of the inference module 3 may also be applied to the goal estimator 35. The goal estimator 35 may be configured using at least one of a rule-based model and a trained model.

目標推定器３５には、特定のドメインでのみ目標状態の記述５２を推論可能なモデルが使用されてよい。ただし、様々なドメインのタスクに共通に利用可能であるためには、そのような特定のドメインでのみ目標状態の記述５２を推論可能なモデルではなく、目標状態の記述５２を汎用的に推論可能なモデルを目標推定器３５として使用することが望ましい。一例では、目標推定器３５は、コンテキスト内学習の訓練済みモデルを備えてよい。コンテキスト内学習を行う能力を有する訓練済みモデルを備えていることで、目標推定器３５の汎用性を高めることができる。すなわち、環境に存在する物体の目標状態の記述５２を汎用的に推論することができ、これにより、様々なドメインにおけるタスクに対する目標状態の記述５２の生成への対応を期待することができる。 The goal estimator 35 may use a model that can infer goal state descriptions 52 only in a specific domain. However, in order to be commonly usable for tasks in various domains, it is desirable to use a model that can infer goal state descriptions 52 generically as the goal estimator 35, rather than a model that can infer goal state descriptions 52 only in such a specific domain. In one example, the goal estimator 35 may be equipped with a trained model for in-context learning. By providing a trained model capable of in-context learning, the versatility of the goal estimator 35 can be increased. In other words, it can infer goal state descriptions 52 of objects in the environment generically, which can be expected to enable the generation of goal state descriptions 52 for tasks in various domains.

目標推定器３５の入出力の形態は、物体推定器３１等と同様に、実施の形態に応じて適宜決定されてよい。目標推定器３５は、指示情報２１以外の任意のデータの入力を更に受け付けてもよい。入力データは、そのまま目標推定器３５に与えられてもよいし、前処理が適用された後に目標推定器３５に与えられてもよい。物体推定器３１及び初期状態推定器３３の間の関係と同様に、物体推定器３１及び初期状態推定器３３の少なくとも一方の演算結果（中間の演算結果を含む）が、目標推定器３５に与えられてもよい。或いは、目標推定器３５の演算結果（中間の演算結果を含む）が、物体推定器３１及び初期状態推定器３３の少なくとも一方に与えられてもよい。また、目標推定器３５は、物体の目標状態の記述５２以外の任意のデータを更に出力してもよい。 The input/output format of the goal estimator 35 may be determined appropriately depending on the embodiment, similar to the object estimator 31, etc. The goal estimator 35 may further accept input of any data other than the instruction information 21. The input data may be provided to the goal estimator 35 as is, or may be provided to the goal estimator 35 after preprocessing has been applied. Similar to the relationship between the object estimator 31 and the initial state estimator 33, the calculation results (including intermediate calculation results) of at least one of the object estimator 31 and the initial state estimator 33 may be provided to the goal estimator 35. Alternatively, the calculation results (including intermediate calculation results) of the goal estimator 35 may be provided to at least one of the object estimator 31 and the initial state estimator 33. The goal estimator 35 may also output any data other than the goal state description 52 of the object.

一例では、物体推定器３１等と同様に、目標推定器３５は、環境情報２２を更に用いて、目標状態の記述５２を生成してもよい。すなわち、指示情報２１から目標状態の記述５２を生成することは、指示情報２１及び環境情報２２から目標状態の記述５２を生成することにより構成されてもよい。目標推定器３５に与える環境情報２２は、実施の形態に応じて適宜選択されてよい。環境情報２２は、上記ドメイン記述２３及びドメイン情報２４の少なくともいずれかを含んでよい。例えば、環境情報２２は、ドメイン記述２３の少なくとも一部を含んでもよい。また、例えば、環境情報２２は、属性情報２４１をドメイン情報２４として含んでもよい。また、例えば、目標推定器３５がコンテキスト内学習の訓練済みモデルを備える場合、環境情報２２は、１つ以上の入出力サンプル２４３をドメイン情報２４として含んでもよい。入力サンプルは、指示情報２１のサンプルにより構成されてよい。入出力サンプル２４３以外の環境情報２２が目標推定器３５に与えられる場合、入力サンプルは、その環境情報２２のサンプルを含んでもよい。出力サンプルは、入力サンプルに対応する目標状態の記述５２の正解サンプルにより構成されてよい。環境情報２２（ドメイン記述２３、ドメイン情報２４）は、実施の形態に応じて適宜用意されてよい。環境情報２２を更に用いることで、問題記述５における目標状態の記述５２を生成する精度の向上を期待することができる。なお、物体推定器３１、初期状態推定器３３及び目標推定器３５に与える環境情報２２は、少なくとも部分的に重複してもよいし、重複していなくてもよい。 In one example, similar to the object estimator 31 and the like, the goal estimator 35 may further use the environmental information 22 to generate the goal state description 52. That is, generating the goal state description 52 from the instruction information 21 may be configured by generating the goal state description 52 from the instruction information 21 and the environmental information 22. The environmental information 22 provided to the goal estimator 35 may be selected appropriately depending on the embodiment. The environmental information 22 may include at least one of the domain description 23 and the domain information 24. For example, the environmental information 22 may include at least a portion of the domain description 23. Furthermore, for example, the environmental information 22 may include attribute information 241 as the domain information 24. Furthermore, for example, when the goal estimator 35 has a trained model for in-context learning, the environmental information 22 may include one or more input/output samples 243 as the domain information 24. The input samples may be composed of samples of the instruction information 21. When environmental information 22 other than the input/output samples 243 is provided to the goal estimator 35, the input samples may include samples of that environmental information 22. The output samples may be composed of correct samples of the goal state description 52 corresponding to the input samples. The environmental information 22 (domain description 23, domain information 24) may be prepared as appropriate depending on the embodiment. By further using the environmental information 22, it is expected that the accuracy of generating the goal state description 52 in the problem description 5 can be improved. Note that the environmental information 22 provided to the object estimator 31, the initial state estimator 33, and the goal estimator 35 may overlap at least partially, or may not overlap at all.

また、一例では、物体推定器３１の演算処理は、目標推定器３５よりも先に完了まで実行されてよい。この場合、物体推定器３１により生成される物体の記述５０は、目標状態の記述５２を生成するための入力データとして、目標推定器３５に与えられてもよい。同様に、初期状態推定器３３の演算処理は、目標推定器３５よりも先に完了まで実行されてよい。この場合、初期状態推定器３３により生成される初期状態の記述５１は、目標状態の記述５２を生成するための入力データとして、目標推定器３５に与えられてもよい。すなわち、目標推定器３５は、物体推定器３１及び初期状態推定器３３により生成される物体の記述５０及び初期状態の記述５１の少なくとも一方を更に用いて、目標状態の記述５２を生成してもよい。目標推定器３５がコンテキスト内学習の訓練済みモデルを備え、目標推定器３５に与える環境情報２２が入出力サンプル２４３を含む場合、入力サンプルは、物体の記述５０及び初期状態の記述５１の少なくとも一方のサンプルを含んでもよい。ただし、推論モジュール３内の演算順序は、このような例に限られなくてよい。目標推定器３５の演算処理は、物体推定器３１と少なくとも部分的に並列に実行されてもよいし、物体推定器３１よりも先に完了まで実行されてもよい。目標推定器３５の演算処理が物体推定器３１よりも先に完了まで実行される場合、目標推定器３５により生成された目標状態の記述５２が、物体推定器３１に与えられてもよい。同様に、目標推定器３５の演算処理は、初期状態推定器３３と少なくとも部分的に並列に実行されてもよいし、初期状態推定器３３よりも先に完了まで実行されてもよい。目標推定器３５の演算処理が初期状態推定器３３よりも先に完了まで実行される場合、目標推定器３５により生成された目標状態の記述５２が、初期状態推定器３３に与えられてもよい。物体推定器３１及び初期状態推定器３３の少なくとも一方がコンテキスト内学習の訓練済みモデルを備え、与えられる環境情報２２が入出力サンプル２４３を含む場合、入力サンプルは、目標状態の記述５２のサンプルを含んでもよい。 In one example, the calculation process of the object estimator 31 may be executed to completion before that of the goal estimator 35. In this case, the object description 50 generated by the object estimator 31 may be provided to the goal estimator 35 as input data for generating the goal state description 52. Similarly, the calculation process of the initial state estimator 33 may be executed to completion before that of the goal estimator 35. In this case, the initial state description 51 generated by the initial state estimator 33 may be provided to the goal estimator 35 as input data for generating the goal state description 52. In other words, the goal estimator 35 may further use at least one of the object description 50 generated by the object estimator 31 and the initial state description 51 generated by the initial state estimator 33 to generate the goal state description 52. When the goal estimator 35 has a trained model for in-context learning and the environmental information 22 provided to the goal estimator 35 includes input/output samples 243, the input samples may include samples of at least one of the object description 50 and the initial state description 51. However, the order of operations within the inference module 3 is not limited to this example. The calculation process of the goal estimator 35 may be executed at least partially in parallel with that of the object estimator 31, or may be executed to completion before that of the object estimator 31. If the calculation process of the goal estimator 35 is executed to completion before that of the object estimator 31, the goal state description 52 generated by the goal estimator 35 may be provided to the object estimator 31. Similarly, the calculation process of the goal estimator 35 may be executed at least partially in parallel with that of the initial state estimator 33, or may be executed to completion before that of the initial state estimator 33. If the calculation process of the goal estimator 35 is executed to completion before that of the initial state estimator 33, the goal state description 52 generated by the goal estimator 35 may be provided to the initial state estimator 33. If at least one of the object estimator 31 and the initial state estimator 33 includes a trained model for in-context learning, and the environmental information 22 provided to it includes input/output samples 243, the input samples may include samples of the goal state description 52.
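By way of non-limiting illustration, the sequential ordering described above (object estimator first, then initial state estimator, then goal estimator, each receiving the earlier outputs) might be sketched as follows. The class `ProblemDescription`, the three stub functions, and their string outputs are hypothetical placeholders introduced only for this sketch; they merely echo their inputs and do not represent the estimators of the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemDescription:
    objects: str        # stands in for the object description 50
    initial_state: str  # stands in for the initial state description 51
    goal_state: str     # stands in for the goal state description 52


# Hypothetical stub estimators: each echoes its inputs so the data flow is visible.
def estimate_objects(observation: str) -> str:
    return f"objects<{observation}>"


def estimate_initial_state(observation: str, objects: Optional[str] = None) -> str:
    return f"init<{observation}|{objects}>"


def estimate_goal(instruction: str, objects: Optional[str] = None,
                  initial_state: Optional[str] = None) -> str:
    return f"goal<{instruction}|{objects}|{initial_state}>"


def generate_problem_description(observation: str, instruction: str) -> ProblemDescription:
    # Sequential order: each later estimator also receives the earlier results.
    objects = estimate_objects(observation)
    initial_state = estimate_initial_state(observation, objects)
    goal_state = estimate_goal(instruction, objects, initial_state)
    return ProblemDescription(objects, initial_state, goal_state)
```

Because the optional arguments default to `None`, the same stubs could also be called independently of one another, which corresponds to the at least partially parallel execution mentioned above.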

なお、推論モジュール３の構成は、このような例に限られなくてよく、実施の形態に応じて適宜変更されてよい。他の一例では、物体推定器３１、初期状態推定器３３及び目標推定器３５の少なくともいずれかの組み合わせは一体的に構成されてもよい。物体推定器３１、初期状態推定器３３及び目標推定器３５の少なくともいずれかは適宜省略されてもよい。また、図４の例において、推論モジュール３がコンテキスト内学習の訓練済みモデル３９を含むことは、物体推定器３１、初期状態推定器３３及び目標推定器３５の少なくともいずれかがコンテキスト内学習の訓練済みモデルを備えることにより構成されてよい。推論モジュール３がコンテキスト内学習の訓練済みモデル３９を含み、推論モジュール３に与える環境情報２２が入出力サンプル２４３を含む場合、情報処理装置１は、入出力サンプル２４３をFew-shotプロンプティングとして用いてよい。例えば、情報処理装置１は、問題記述５を生成する前に、入出力サンプル２４３のみを訓練済みモデル３９に与えてもよいし、問題記述５を生成する際に、他の入力データと共に入出力サンプル２４３を訓練済みモデル３９に与えてもよい。そして、情報処理装置１は、訓練済みモデル３９の演算処理を実行してもよい。これにより、情報処理装置１は、訓練済みモデル３９において、コンテキスト内学習を実行し、問題記述５を生成するドメインに訓練済みモデル３９を適応させてよい。 Note that the configuration of the inference module 3 is not limited to this example and may be modified as appropriate depending on the embodiment. In another example, any combination of at least some of the object estimator 31, the initial state estimator 33, and the goal estimator 35 may be configured integrally. At least one of the object estimator 31, the initial state estimator 33, and the goal estimator 35 may be omitted as appropriate. Furthermore, in the example of FIG. 4, the inference module 3 including the trained model 39 for in-context learning may be realized by at least one of the object estimator 31, the initial state estimator 33, and the goal estimator 35 being equipped with a trained model for in-context learning. When the inference module 3 includes the trained model 39 for in-context learning and the environmental information 22 provided to the inference module 3 includes input/output samples 243, the information processing device 1 may use the input/output samples 243 for few-shot prompting. For example, the information processing device 1 may provide only the input/output samples 243 to the trained model 39 before generating the problem description 5, or may provide the input/output samples 243 together with other input data to the trained model 39 when generating the problem description 5. Then, the information processing device 1 may execute the calculation processing of the trained model 39. In this way, the information processing device 1 may execute in-context learning in the trained model 39 and adapt the trained model 39 to the domain for which the problem description 5 is generated.
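As a minimal, non-limiting sketch of the second option above (providing the input/output samples 243 together with the other input data), a few-shot prompt could be assembled as plain text. The function name `build_few_shot_prompt` and the "Input:"/"Output:" layout are assumptions made for illustration only; the embodiment does not prescribe a prompt format.

```python
from typing import List, Tuple


def build_few_shot_prompt(samples: List[Tuple[str, str]], query: str) -> str:
    """Concatenate (input, output) sample pairs ahead of the new input.

    `samples` stands in for the input/output samples 243 and `query`
    for the input data of the current task (hypothetical layout).
    """
    parts = []
    for sample_input, sample_output in samples:
        parts.append(f"Input:\n{sample_input}\nOutput:\n{sample_output}")
    # The final block is left open so that the model completes the output.
    parts.append(f"Input:\n{query}\nOutput:")
    return "\n\n".join(parts)
```

The resulting string would then be passed to whichever trained model capable of in-context learning is used; that call is omitted here because it depends on the model chosen.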

（物体推定器の具体例）
図５Ａは、本実施形態に係る物体推定器３１の一例を模式的に示す。一例では、物体推定器３１は、訓練済みモデル３１１及びルールベースモデル３１３を備えてよい。訓練済みモデル３１１は、観測データ２０及び環境情報２２が入力されると、環境情報２２を手掛かりとして用いて、観測データ２０に表れる物体を検出し、物体の検出結果２０１を出力するように構成されてよい。訓練済みモデル３１１は、コンテキスト内学習を行う能力を有する訓練済みモデルであってよい。 (Example of an object estimator)
FIG. 5A schematically illustrates an example of the object estimator 31 according to this embodiment. In one example, the object estimator 31 may include a trained model 311 and a rule-based model 313. When observation data 20 and environmental information 22 are input, the trained model 311 may be configured to detect an object appearing in the observation data 20 using the environmental information 22 as a clue, and output an object detection result 201. The trained model 311 may be a trained model capable of in-context learning.

例えば、センサＳは、カメラであってよく、観測データ２０は、画像データであってよい。環境情報２２は、“white bowl”、“round cutting board”等のように言語表現で物体の属性を示す属性情報２４１のリストであってよい。訓練済みモデル３１１は、参考文献１１（Shilong Liu et al., “Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2303.05499＞）、参考文献１２（Alexander Kirillov et al., “Segment Anything”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2304.02643＞）等で提案されるopen vocabulary object detection model又はopen vocabulary object segmentation modelであってよい。検出結果２０１は、物体の写る範囲をバウンディングボックスとして検出した結果、及び物体を識別した結果の少なくとも一方を含んでよい。ルールベースモデル３１３は、ルールに従って、訓練済みモデル３１１による物体の検出結果２０１から物体の記述５０を生成するように構成されてよい。ルールは、適宜設定されてよい。 For example, the sensor S may be a camera, and the observation data 20 may be image data. The environmental information 22 may be a list of attribute information 241 that indicates the attributes of an object in linguistic terms, such as "white bowl," "round cutting board," etc. The trained model 311 may be an open vocabulary object detection model or an open vocabulary object segmentation model proposed in Reference 11 (Shilong Liu et al., “Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection”, [online], [searched October 24, 2023], Internet <URL: https://arxiv.org/abs/2303.05499>), Reference 12 (Alexander Kirillov et al., “Segment Anything”, [online], [searched October 24, 2023], Internet <URL: https://arxiv.org/abs/2304.02643>), etc. The detection result 201 may include at least one of a result of detecting the area in which the object is captured as a bounding box and a result of identifying the object. The rule-based model 313 may be configured to generate an object description 50 from the object detection result 201 by the trained model 311 in accordance with rules. The rules may be set as appropriate.
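As one hypothetical instance of such a rule, each detected label could be normalized into a symbol name and numbered so that repeated labels remain distinct. The `Detection` class, the naming rule, and the PDDL-style `(:objects ...)` output are all assumptions made for this sketch; the embodiment leaves the rules and the format of the object description 50 open.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str                       # e.g. "white bowl" (identification result)
    box: Tuple[int, int, int, int]   # bounding box, assumed (x_min, y_min, x_max, y_max)


def describe_objects(detections: List[Detection]) -> str:
    """Rule: lower-case the label, replace spaces with '_', append a running index."""
    counts: Counter = Counter()
    names = []
    for det in detections:
        base = det.label.strip().lower().replace(" ", "_")
        counts[base] += 1
        names.append(f"{base}_{counts[base]}")
    return "(:objects " + " ".join(names) + ")"
```

Numbering every instance, including the first, keeps the rule uniform and avoids renaming an object when a second instance of the same label is detected later.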

なお、図５Ａの構成は、あくまで物体推定器３１の一例に過ぎない。物体推定器３１の構成は、図５Ａの例に限られなくてよく、実施の形態に応じて適宜変更されてよい。他の一例では、訓練済みモデル３１１には、属性情報２４１以外の環境情報２２が与えられてもよい。観測データ２０は、画像データ以外の形式のデータを更に含んでもよいし、画像データ以外の形式のデータのみで構成されてもよい。訓練済みモデル３１１及びルールベースモデル３１３は一体的に構成されてもよい。ルールベースモデル３１３は、訓練済みモデルに置き換えられてもよい。訓練済みモデル３１１が、物体の記述５０を生成するように構成されてよい。これに応じて、ルールベースモデル３１３は省略されてもよい。 Note that the configuration of FIG. 5A is merely one example of the object estimator 31. The configuration of the object estimator 31 is not limited to the example of FIG. 5A and may be modified as appropriate depending on the embodiment. In another example, the trained model 311 may be provided with environmental information 22 other than attribute information 241. The observation data 20 may further include data in a format other than image data, or may be composed solely of data in a format other than image data. The trained model 311 and the rule-based model 313 may be configured integrally. The rule-based model 313 may be replaced by the trained model. The trained model 311 may be configured to generate an object description 50. Accordingly, the rule-based model 313 may be omitted.

（初期状態推定器の具体例）
図５Ｂは、本実施形態に係る初期状態推定器３３の一例を模式的に示す。一例では、初期状態推定器３３は、検出器３３１及び訓練済みモデル３３３を備えてよい。情報処理装置１は、物体の検出結果２０１に応じて、各物体に対応する部分データ２０５を観測データ２０から抽出してよい。検出器３３１は、抽出された各物体の部分データ２０５から各物体のキャプション２０６を生成するように構成されてよい。各物体のキャプション２０６は、部分データ２０５に表れる物体に関する情報を含むように構成されてよい。訓練済みモデル３３３は、物体の検出結果２０１及びキャプション２０６から物体の初期状態の記述５１を生成するように構成されてよい。訓練済みモデル３３３が物体の検出結果２０１を用いることは、初期状態推定器３３が物体推定器３１の演算結果を観測データ２０として用いることの一例である。検出器３３１及び訓練済みモデル３３３はそれぞれ、コンテキスト内学習を行う能力を有する訓練済みモデルであってよい。 (Specific example of an initial state estimator)
FIG. 5B schematically illustrates an example of the initial state estimator 33 according to this embodiment. In one example, the initial state estimator 33 may include a detector 331 and a trained model 333. The information processing device 1 may extract partial data 205 corresponding to each object from the observation data 20 in accordance with the object detection results 201. The detector 331 may be configured to generate a caption 206 for each object from the extracted partial data 205 for each object. The caption 206 for each object may be configured to include information about the object appearing in the partial data 205. The trained model 333 may be configured to generate a description 51 of the initial state of the object from the object detection results 201 and the caption 206. The trained model 333's use of the object detection results 201 is an example of the initial state estimator 33 using the calculation results of the object estimator 31 as the observation data 20. The detector 331 and the trained model 333 may each be a trained model capable of in-context learning.

例えば、部分データ２０５は、各物体のバウンディングボックスにより抽出される部分画像データであってよい。検出器３３１は、参考文献１３（Junnan Li et al., “BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2301.12597＞）等で提案されるvisual question answering modelであってよい。検出器３３１には、部分データ２０５と共に、質問文が適宜与えられてもよい。一例では、質問文は、部分画像に写る物体を質問する所定のテキストデータ（“Q: what does this object describe? A: .”等）で構成されてよい。また、訓練済みモデル３３３は、参考文献１４（OpenAI, “GPT-4 Technical Report”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/pdf/2303.08774.pdf＞）等で提案される大規模言語モデル（LLM）であってよい。これに応じて、キャプション２０６は、部分データ２０５（部分画像データ）を説明するテキストデータであってよい。物体の検出結果２０１は、テキストデータ２０２に適宜変換されてよく、得られたテキストデータ２０２が、検出結果２０１（観測データ２０）として訓練済みモデル３３３に与えられてよい。一例では、テキストデータ２０２は、検出された物体の名称（識別結果）及び物体のバウンディングボックスの座標範囲を示すテキストにより構成されてよい。訓練済みモデル３３３には、物体推定器３１により生成された物体の記述５０が更に与えられてもよい。また、入出力サンプル２４３が、環境情報２２（ドメイン情報２４）として訓練済みモデル３３３に更に与えられてもよい。一例では、入力サンプルは、テキストデータ２０２、キャプション２０６及び物体の記述５０のサンプルにより構成されてよく、出力サンプルは、対応する初期状態の記述５１の正解サンプルにより構成されてよい。入出力サンプル２４３は、訓練済みモデル３３３のFew-shotプロンプティングに用いられてよい。 For example, the partial data 205 may be partial image data extracted by the bounding box of each object. The detector 331 may be a visual question answering model proposed in Reference 13 (Junnan Li et al., “BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models”, [online], [searched October 24, 2023], Internet <URL: https://arxiv.org/abs/2301.12597>), etc. A question may be provided to the detector 331 along with the partial data 205 as appropriate. In one example, the question may be composed of predetermined text data (such as “Q: what does this object describe? A: .”) that asks about the object appearing in the partial image. The trained model 333 may be a large language model (LLM) proposed in Reference 14 (OpenAI, “GPT-4 Technical Report”, [online], [searched October 24, 2023], Internet <URL: https://arxiv.org/pdf/2303.08774.pdf>), etc. Accordingly, the caption 206 may be text data describing the partial data 205 (partial image data). The object detection result 201 may be appropriately converted into text data 202, and the obtained text data 202 may be provided to the trained model 333 as the detection result 201 (observation data 20). In one example, the text data 202 may be composed of text indicating the name of the detected object (identification result) and the coordinate range of the object's bounding box. The trained model 333 may further be provided with the object description 50 generated by the object estimator 31. In addition, input/output samples 243 may further be provided to the trained model 333 as the environmental information 22 (domain information 24). In one example, the input samples may be composed of samples of the text data 202, the caption 206, and the object description 50, and the output samples may be composed of correct samples of the corresponding initial state description 51. The input/output samples 243 may be used for few-shot prompting of the trained model 333.
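As a non-limiting sketch of the two data-preparation steps mentioned above, the following helpers crop the partial data 205 from an image by bounding box and convert one detection into a line of text data 202. The helper names, the `(x_min, y_min, x_max, y_max)` box convention, the nested-list image, and the text layout are assumptions made for illustration; an actual implementation would likely use an image library and whatever text format suits the chosen language model.

```python
from typing import List, Sequence, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max), max bounds exclusive.
Box = Tuple[int, int, int, int]


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Extract the sub-image inside `box` from a row-major nested list of pixels."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def detection_to_text(name: str, box: Box) -> str:
    """Render the object name and bounding-box coordinates as one line of text."""
    x0, y0, x1, y1 = box
    return f"{name}: bbox=({x0}, {y0}, {x1}, {y1})"
```

Each cropped region would be passed to the captioning step, and the text lines for all detections could be joined with newlines before being given to the language model.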

なお、図５Ｂの構成は、あくまで初期状態推定器３３の一例に過ぎない。初期状態推定器３３の構成は、図５Ｂの例に限られなくてよく、実施の形態に応じて適宜変更されてよい。他の一例では、訓練済みモデル３３３には、入出力サンプル２４３以外の環境情報２２が与えられてもよい。また、他の一例では、訓練済みモデル３３３は、大規模視覚言語モデル等のように、テキスト以外の他のモダリティのデータの入力を受け付ける入力部分を備えてよい。これにより、訓練済みモデル３３３は、他のモダリティのデータの入力を受け付け可能に構成されてよい。これに応じて、観測データ２０又は検出結果２０１はそのまま訓練済みモデル３３３に与えられてもよい。キャプション２０６は、テキストデータ以外の形式のデータにより構成されてもよい。また、部分データ２０５（観測データ２０）は、画像データ以外の形式のデータを更に含んでもよいし、画像データ以外の形式のデータのみで構成されてもよい。検出結果２０１のデータ形式は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。検出器３３１は省略されてもよい。 Note that the configuration of FIG. 5B is merely one example of the initial state estimator 33. The configuration of the initial state estimator 33 is not limited to the example of FIG. 5B and may be modified as appropriate depending on the embodiment. In another example, the trained model 333 may be provided with environmental information 22 other than the input/output samples 243. In another example, the trained model 333 may include an input section that accepts input of data of modalities other than text, such as a large-scale visual language model. In this way, the trained model 333 may be configured to be able to accept input of data of other modalities. Accordingly, the observation data 20 or the detection result 201 may be provided to the trained model 333 as is. The caption 206 may be composed of data in a format other than text data. Furthermore, the partial data 205 (observation data 20) may further include data in a format other than image data, or may be composed solely of data in a format other than image data. The data format of the detection result 201 is not particularly limited and may be selected as appropriate depending on the embodiment. The detector 331 may be omitted.

（目標推定器の具体例）
図５Ｃは、本実施形態に係る目標推定器３５の一例を模式的に示す。一例では、目標推定器３５は、訓練済みモデル３５１を備えてよい。訓練済みモデル３５１は、指示情報２１から物体の目標状態の記述５２を生成するように構成されてよい。訓練済みモデル３５１は、コンテキスト内学習を行う能力を有するものであってよい。 (Example of a goal estimator)
FIG. 5C schematically illustrates an example of the goal estimator 35 according to this embodiment. In one example, the goal estimator 35 may include a trained model 351. The trained model 351 may be configured to generate a goal state description 52 of the object from the instruction information 21. The trained model 351 may be capable of in-context learning.

例えば、訓練済みモデル３５１は、上記参考文献１４等で提案される大規模言語モデル（LLM）であってよい。これに応じて、指示情報２１は、目標を自然言語で指示するテキストデータにより構成されてよい。訓練済みモデル３５１には、物体推定器３１により生成された物体の記述５０及び初期状態推定器３３により生成された初期状態の記述５１の少なくとも一方が更に与えられてもよい。また、入出力サンプル２４３が、環境情報２２（ドメイン情報２４）として訓練済みモデル３５１に更に与えられてもよい。一例では、入力サンプルは、指示情報２１、物体の記述５０及び初期状態の記述５１のサンプルにより構成されてよく、出力サンプルは、対応する目標状態の記述５２の正解サンプルにより構成されてよい。入出力サンプル２４３は、訓練済みモデル３５１のFew-shotプロンプティングに用いられてよい。訓練済みモデル３５１は、初期状態推定器３３の訓練済みモデル３３３と共通に用意されてもよいし、別個に用意されてもよい。 For example, the trained model 351 may be a large language model (LLM) proposed in Reference 14 or the like. Accordingly, the instruction information 21 may be composed of text data that specifies a goal in natural language. The trained model 351 may further be provided with at least one of the object description 50 generated by the object estimator 31 and the initial state description 51 generated by the initial state estimator 33. Furthermore, input/output samples 243 may be further provided to the trained model 351 as the environmental information 22 (domain information 24). In one example, the input samples may be composed of samples of the instruction information 21, the object description 50, and the initial state description 51, and the output samples may be composed of correct samples of the corresponding goal state description 52. The input/output samples 243 may be used for few-shot prompting of the trained model 351. The trained model 351 may be shared with the trained model 333 of the initial state estimator 33, or may be prepared separately.

なお、図５Ｃの構成は、あくまで目標推定器３５の一例に過ぎない。目標推定器３５の構成は、図５Ｃの例に限られなくてよく、実施の形態に応じて適宜変更されてよい。他の一例では、訓練済みモデル３５１には、入出力サンプル２４３以外の環境情報２２が与えられてもよい。また、他の一例では、訓練済みモデル３５１は、上記訓練済みモデル３３３と同様に、他のモダリティのデータの入力を受け付け可能に構成されてよい。これに応じて、訓練済みモデル３５１には、テキストデータ以外の形式のデータが与えられてもよい。指示情報２１は、テキストデータ以外の形式のデータを更に含んでもよいし、テキストデータ以外の形式のデータのみで構成されてもよい。 Note that the configuration of FIG. 5C is merely one example of the goal estimator 35. The configuration of the goal estimator 35 is not limited to the example of FIG. 5C and may be modified as appropriate depending on the embodiment. In another example, the trained model 351 may be provided with environmental information 22 other than the input/output samples 243. In another example, the trained model 351 may be configured to be able to accept input of data of other modalities, similar to the trained model 333 described above. Accordingly, the trained model 351 may be provided with data in a format other than text data. The instruction information 21 may further include data in a format other than text data, or may be composed solely of data in a format other than text data.

［ロボット装置］
ロボット装置Ｒは、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。ロボット装置Ｒは、例えば、生産ラインにおける産業用ロボット、自律的に動作可能に構成された自律ロボット、移動可能に構成された移動体等であってよい。産業用ロボットは、例えば、垂直多関節ロボット、水平多関節ロボット（スカラロボット）、パラレルリンクロボット、直交ロボット等であってよい。自律ロボットは、例えば、人型ロボット、案内ロボット、農業用ロボット、介護用ロボット、セキュリティロボット、運搬ロボット等であってよい。自律処理の内容は、実施の形態に応じて適宜選択されてよい。移動体は、例えば、掃除ロボット、移動可能に構成された上記自律ロボット（モバイルロボットを含む）、自動運転可能に構成された車両、自動飛行可能な飛行体（ドローン等）等を含んでよい。ロボット装置Ｒは、実空間又は仮想空間の存在であってよい。生成される問題記述５を用いた行動計画は、実空間におけるロボット装置Ｒの制御のために行われてもよいし、仮想空間におけるロボット装置Ｒのシミュレーションのために行われてもよい。 [Robot device]
The robot device R is not particularly limited and may be selected appropriately depending on the embodiment. The robot device R may be, for example, an industrial robot used in a production line, an autonomous robot configured to operate autonomously, or a mobile body configured to move. The industrial robot may be, for example, a vertical articulated robot, a horizontal articulated robot (SCARA robot), a parallel link robot, or an orthogonal robot. The autonomous robot may be, for example, a humanoid robot, a guide robot, an agricultural robot, a care robot, a security robot, or a transport robot. The content of the autonomous processing may be selected appropriately depending on the embodiment. The mobile body may include, for example, a cleaning robot, the above-mentioned autonomous robot configured to move (including a mobile robot), a vehicle configured to be self-driving, or an air vehicle capable of self-flying (such as a drone). The robot device R may exist in real space or virtual space. The action plan generated using the problem description 5 may be performed for controlling the robot device R in real space or for simulating the robot device R in virtual space.

本実施形態による問題記述５の生成は、ロボット装置Ｒが達成可能なあらゆるタスクに適用されてよい。タスクは、例えば、作業、移動等であってよい。作業は、例えば、組み立て、調理、掃除、化学実験等であってよい。ロボット装置Ｒが通信装置を備える場合、センシングデータ（観測データ２０）は、通信により得られるデータを含んでよい。例えば、移動体の行動計画（移動計画）を生成する場面では、センシングデータは、路車間通信、車車間通信等により得られるデータを含んでよい。 The generation of the problem description 5 according to this embodiment may be applied to any task that can be accomplished by the robot device R. The task may be, for example, work, movement, etc. The work may be, for example, assembly, cooking, cleaning, chemical experiments, etc. If the robot device R is equipped with a communication device, the sensing data (observation data 20) may include data obtained through communication. For example, in a situation where a behavior plan (movement plan) for a moving object is generated, the sensing data may include data obtained through road-to-vehicle communication, vehicle-to-vehicle communication, etc.

また、ロボット装置Ｒは、人間と共存する場面で運用されてよい。この場面では、指示情報２１は、人間（オペレータ）により与えられてよい。生成された問題記述５は、人間に知覚可能に出力されてよい（例えば、ディスプレイに出力されてよい）。情報処理装置１は、生成された問題記述５に対する人間の介入（人手による修正）を受け付けてよい。一例では、人間の介入により、例えば、ロボット装置Ｒの移動量、角度、姿勢、揺れ等の駆動具合が修正されてよい。推論モジュール３（物体推定器３１、初期状態推定器３３及び目標推定器３５の少なくともいずれか）にコンテキスト内学習を行う能力を有する訓練済みモデルを用いることで、情報処理装置１とオペレータとの対話形式のインタラクションを通じて、問題記述５が生成されてもよい。人間による入力は、情報処理装置１に直接的に与えられてもよいし、ロボット装置Ｒを介して間接的に情報処理装置１に与えられてもよい。人間による入力は、例えば、入力装置の操作（テキスト入力を含む）、録音、撮像等の任意の方法で行われてよい。プランナは、人間が存在することを考慮して、問題記述５から行動計画を生成するように構成されてもよい。 The robot device R may also be used in situations where it coexists with humans. In such situations, the instruction information 21 may be provided by a human (operator). The generated problem description 5 may be output in a manner perceptible to humans (e.g., output to a display). The information processing device 1 may accept human intervention (manual correction) of the generated problem description 5. In one example, human intervention may correct the driving characteristics of the robot device R, such as the amount of movement, angle, posture, and sway. By using a trained model capable of in-context learning in the inference module 3 (at least one of the object estimator 31, the initial state estimator 33, and the goal estimator 35), the problem description 5 may be generated through dialogue-style interaction between the information processing device 1 and the operator. Human input may be provided directly to the information processing device 1 or indirectly to the information processing device 1 via the robot device R. Human input may be provided by any method, such as operating an input device (including text input), audio recording, or image capture. The planner may be configured to generate an action plan from the problem description 5, taking into account the presence of humans.

［問題記述の出力］ [Problem description output]
問題記述５の出力形態は、特に限られなくてよく、実施の形態に応じて適宜選択されてよい。単純な一例では、生成された問題記述５は、そのまま出力されてよい。問題記述５を出力することは、情報処理装置１のメモリ・リソース、情報処理装置１に接続される出力装置及び外部コンピュータの少なくともいずれかに問題記述５を出力することを含んでもよい。出力方法は、実施の形態に応じて適宜選択されてよい。例えば、問題記述５は、テキスト、画像、音声又はこれらの組み合わせにより出力されてよい。 The output form of the problem description 5 is not particularly limited and may be selected appropriately depending on the embodiment. In a simple example, the generated problem description 5 may be output as is. Outputting the problem description 5 may include outputting the problem description 5 to at least one of a memory resource of the information processing device 1, an output device connected to the information processing device 1, and an external computer. The output method may be selected appropriately depending on the embodiment. For example, the problem description 5 may be output as text, an image, audio, or a combination thereof.

他の一例では、問題記述５を出力することは、問題記述５をプランナに与えて、プランナにより行動計画を生成することの少なくとも一部を含んでよい。プランナにより行動計画を生成することを含む場合、問題記述５を出力することは、生成された行動計画に従って、ロボット装置Ｒの動作を制御すること（すなわち、ロボット装置Ｒを駆動すること）を更に含んでもよい。 In another example, outputting the problem description 5 may include at least a part of providing the problem description 5 to a planner and generating an action plan using the planner. If generating an action plan using the planner is included, outputting the problem description 5 may further include controlling the operation of the robotic device R (i.e., driving the robotic device R) in accordance with the generated action plan.

プランナによる行動計画の生成は、情報処理装置１及び１台以上の外部コンピュータの少なくともいずれかにより実行されてよい。プランナが、シンボリックプランナ６１及びモーションプランナ６５を備える場合、シンボリックプランナ６１及びモーションプランナ６５の一方の演算処理を情報処理装置１が実行し、他方の演算処理を外部コンピュータが実行してもよい。シンボリックプランナ６１及びモーションプランナ６５の両方の演算処理は、情報処理装置１及び１台以上の外部コンピュータの一方で実行されてもよい。シンボリックプランナ６１及びモーションプランナ６５の両方の演算処理を１台以上の外部コンピュータが実行する場合、シンボリックプランナ６１の演算処理を実行する外部コンピュータとモーションプランナ６５の演算処理を実行する外部コンピュータとは同一であってもよいし、異なってもよい。 The generation of an action plan by the planner may be executed by at least one of the information processing device 1 and one or more external computers. If the planner includes a symbolic planner 61 and a motion planner 65, the information processing device 1 may execute the calculations for one of the symbolic planner 61 and the motion planner 65, and the external computer may execute the calculations for the other. The calculations for both the symbolic planner 61 and the motion planner 65 may be executed by either the information processing device 1 or one or more external computers. If the calculations for both the symbolic planner 61 and the motion planner 65 are executed by one or more external computers, the external computer that executes the calculations for the symbolic planner 61 and the external computer that executes the calculations for the motion planner 65 may be the same or different.

また、生成された行動計画によるロボット装置Ｒの制御は、情報処理装置１及び外部コンピュータの少なくともいずれかにより実行されてよい。プランナによる行動計画の生成の少なくとも一部及びロボット装置Ｒの制御を外部コンピュータにより実行する場合、プランナによる行動計画の生成を実行する外部コンピュータとロボット装置Ｒを制御する外部コンピュータとは同一であってもよいし、異なっていてもよい。１台以上の外部コンピュータが、行動計画を生成することの少なくとも一部を実行し、情報処理装置１が、生成された行動計画に従ってロボット装置Ｒを制御することを実行してもよい。情報処理装置１及び外部コンピュータは、データ通信等の任意の方法でデータ交換を適宜行ってよい。 Furthermore, the control of the robot device R according to the generated behavior plan may be executed by at least one of the information processing device 1 and an external computer. When at least a portion of the generation of the behavior plan by the planner and the control of the robot device R are executed by an external computer, the external computer that executes the generation of the behavior plan by the planner and the external computer that controls the robot device R may be the same or different. One or more external computers may execute at least a portion of the generation of the behavior plan, and the information processing device 1 may execute the control of the robot device R according to the generated behavior plan. The information processing device 1 and the external computer may exchange data as appropriate using any method, such as data communication.

§２構成例 §2 Configuration example
［ハードウェア構成］ [Hardware configuration]
図６は、本実施形態に係る情報処理装置１のハードウェア構成の一例を模式的に例示する。図６の一例では、本実施形態に係る情報処理装置１は、制御部１１、記憶部１２、外部インタフェース１３、入力装置１４、出力装置１５、及びドライブ１６が電気的に接続されたコンピュータである。 FIG. 6 schematically illustrates an example of the hardware configuration of the information processing device 1 according to this embodiment. In the example of FIG. 6, the information processing device 1 according to this embodiment is a computer in which a control unit 11, a storage unit 12, an external interface 13, an input device 14, an output device 15, and a drive 16 are electrically connected.

制御部１１は、ハードウェアプロセッサであるＣＰＵ（Central Processing Unit）、ＲＡＭ（Random Access Memory）、ＲＯＭ（Read Only Memory）等を含み、プログラム及び各種データに基づいて情報処理を実行するように構成される。制御部１１（ＣＰＵ）は、プロセッサ・リソースの一例である。記憶部１２は、例えば、ハードディスクドライブ、ソリッドステートドライブ等で構成されてよい。記憶部１２、ＲＡＭ及びＲＯＭは、メモリ・リソースの一例である。本実施形態では、記憶部１２は、プログラム８１、モジュールデータ３００等の各種情報を記憶する。 The control unit 11 includes a hardware processor such as a CPU (Central Processing Unit), RAM (Random Access Memory), and ROM (Read Only Memory), and is configured to execute information processing based on programs and various data. The control unit 11 (CPU) is an example of a processor resource. The storage unit 12 may be configured, for example, as a hard disk drive or solid-state drive. The storage unit 12, RAM, and ROM are examples of memory resources. In this embodiment, the storage unit 12 stores various information such as the program 81 and module data 300.

プログラム８１は、問題記述５の生成に関する情報処理（後述の図８）を情報処理装置１に実行させるためのプログラムである。プログラム８１は、当該情報処理の一連の命令を含む。モジュールデータ３００は、推論モジュール３に関する情報を示す。問題記述５を生成する際に推論モジュール３を再生可能であれば、モジュールデータ３００の構成は、特に限られなくてよく、実施の形態に応じて適宜決定されてよい。一例では、機械学習モデルを含む場合、モジュールデータ３００は、機械学習により調整された演算パラメータの値を示す情報を含んでよい。場合によって、モジュールデータ３００は、機械学習モデルの構成（例えば、ニューラルネットワークの構造等）を示す情報を更に含んでもよい。ルールベースモデルを含む場合、モジュールデータ３００は、ルールを示す情報を含んでよい。推論モジュール３が、図４の構成を有する場合、物体推定器３１、初期状態推定器３３及び目標推定器３５に関する情報は、別個のデータ（ファイル）で保持されてもよいし、少なくともいずれかの組み合わせは同一のデータで保持されてもよい。モジュールデータ３００は、プログラム８１に組み込まれてもよい。 The program 81 is a program for causing the information processing device 1 to execute information processing (FIG. 8, described later) related to the generation of the problem description 5. The program 81 includes a series of instructions for the information processing. The module data 300 indicates information related to the inference module 3. As long as the inference module 3 can be reproduced when generating the problem description 5, the configuration of the module data 300 is not particularly limited and may be determined appropriately depending on the embodiment. In one example, if the inference module 3 includes a machine learning model, the module data 300 may include information indicating the values of calculation parameters adjusted by machine learning. In some cases, the module data 300 may further include information indicating the configuration of the machine learning model (e.g., the structure of a neural network). If the inference module 3 includes a rule-based model, the module data 300 may include information indicating the rules. If the inference module 3 has the configuration shown in FIG. 4, information regarding the object estimator 31, the initial state estimator 33, and the target estimator 35 may be stored as separate data (files), or at least any combination of them may be stored as the same data. The module data 300 may be incorporated into the program 81.

外部インタフェース１３は、有線又は無線で外部装置と接続するように構成される。外部インタフェース１３は、例えば、ＵＳＢ（Universal Serial Bus）ポート、通信ポート、専用ポート等であってよい。外部インタフェース１３が通信ポートを含む場合、通信ポートの通信規格は、任意に選択されてよい。本実施形態では、情報処理装置１は、外部インタフェース１３を介して、外部装置（例えば、センサＳ、ロボット装置Ｒ、外部コンピュータ等）に接続されてよい。 The external interface 13 is configured to connect to an external device via a wired or wireless connection. The external interface 13 may be, for example, a USB (Universal Serial Bus) port, a communication port, a dedicated port, etc. If the external interface 13 includes a communication port, the communication standard of the communication port may be selected arbitrarily. In this embodiment, the information processing device 1 may be connected to an external device (e.g., a sensor S, a robot device R, an external computer, etc.) via the external interface 13.

入力装置１４は、例えば、マウス、キーボード等の入力を行うための装置である。出力装置１５は、例えば、ディスプレイ、スピーカ等の出力を行うための装置である。オペレータは、入力装置１４及び出力装置１５を利用することで、情報処理装置１を操作することができる。入力装置１４及び出力装置１５は、外部インタフェース１３を介して接続されてよい。入力装置１４及び出力装置１５は、例えば、タッチパネルディスプレイ等により一体的に構成されてもよい。 The input device 14 is a device for inputting information, such as a mouse or keyboard. The output device 15 is a device for outputting information, such as a display or speaker. An operator can operate the information processing device 1 using the input device 14 and output device 15. The input device 14 and output device 15 may be connected via the external interface 13. The input device 14 and output device 15 may also be integrated into one device, such as a touch panel display.

ドライブ１６は、記憶媒体９１に記憶されたプログラム等の各種情報を読み込むための装置である。上記プログラム８１及びモジュールデータ３００の少なくとも一方は、記憶部１２に代えて又は記憶部１２と共に、記憶媒体９１に格納されていてもよい。記憶媒体９１は、コンピュータ等の機械が各種情報（記憶されたプログラム等）を読み取り可能なように、電気的、磁気的、光学的、機械的又は化学的作用により当該情報を蓄積するように構成される。情報処理装置１は、上記プログラム８１及びモジュールデータ３００の少なくとも一方を記憶媒体９１から取得してよい。なお、記憶媒体９１は、ＣＤ、ＤＶＤ等のディスク型の記憶媒体であってもよいし、又は半導体メモリ（例えば、フラッシュメモリ）等のディスク型以外の記憶媒体であってもよい。ドライブ１６の種類は、記憶媒体９１の種類に応じて適宜選択されてよい。ドライブ１６は、外部インタフェース１３を介して接続されてもよい。 The drive 16 is a device for reading various information, such as programs, stored in the storage medium 91. At least one of the program 81 and the module data 300 may be stored in the storage medium 91 instead of or together with the storage unit 12. The storage medium 91 is configured to store various information (such as stored programs) electrically, magnetically, optically, mechanically, or chemically so that a computer or other machine can read the information. The information processing device 1 may acquire at least one of the program 81 and the module data 300 from the storage medium 91. The storage medium 91 may be a disk-type storage medium, such as a CD or DVD, or a non-disk-type storage medium, such as a semiconductor memory (e.g., flash memory). The type of drive 16 may be selected appropriately depending on the type of storage medium 91. The drive 16 may be connected via the external interface 13.

なお、情報処理装置１の具体的なハードウェア構成に関して、実施形態に応じて、適宜、構成要素の省略、置換及び追加が可能である。例えば、制御部１１は、複数のハードウェアプロセッサを含んでもよい。ハードウェアプロセッサは、マイクロプロセッサ、ＦＰＧＡ（field-programmable gate array）、ＤＳＰ（digital signal processor）、ＧＰＵ（Graphics Processing Unit）、ＡＳＩＣ（application specific integrated circuit）等により構成されてよい。記憶部１２は、制御部１１に含まれるＲＡＭ及びＲＯＭにより構成されてもよい。外部インタフェース１３、入力装置１４、出力装置１５、及びドライブ１６の少なくともいずれかは省略されてもよい。情報処理装置１は、複数台のコンピュータで構成されてもよい。この場合、各コンピュータのハードウェア構成は、一致していてもよいし、又は一致していなくてもよい。また、情報処理装置１は、提供されるサービス専用に設計された情報処理装置の他、汎用のサーバ装置、汎用のＰＣ（Personal Computer）、タブレットＰＣ、端末装置等であってもよい。外部コンピュータが利用される場合に、外部コンピュータのハードウェア構成は、情報処理装置１と同様であってよい。外部コンピュータは、提供されるサービス専用に設計された情報処理装置の他、汎用のサーバ装置、汎用のＰＣ、タブレットＰＣ、端末装置等であってもよい。 Note that, with regard to the specific hardware configuration of the information processing device 1, components may be omitted, replaced, or added as appropriate depending on the embodiment. For example, the control unit 11 may include multiple hardware processors. The hardware processor may be configured with a microprocessor, a field-programmable gate array (FPGA), a digital signal processor (DSP), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or the like. The storage unit 12 may be configured with RAM and ROM included in the control unit 11. At least one of the external interface 13, the input device 14, the output device 15, and the drive 16 may be omitted. The information processing device 1 may be configured with multiple computers. In this case, the hardware configurations of the computers may or may not be identical. Furthermore, the information processing device 1 may be an information processing device designed specifically for the service provided, as well as a general-purpose server device, a general-purpose personal computer (PC), a tablet PC, a terminal device, or the like. When an external computer is used, the hardware configuration of the external computer may be the same as that of the information processing device 1. The external computer may be an information processing device designed specifically for the service to be provided, or may be a general-purpose server device, a general-purpose PC, a tablet PC, a terminal device, or the like.

［ソフトウェア構成］ [Software configuration]
図７は、本実施形態に係る情報処理装置１のソフトウェア構成の一例を模式的に例示する。情報処理装置１の制御部１１は、記憶部１２に記憶されたプログラム８１をＲＡＭに展開し、プログラム８１に含まれる命令をＣＰＵにより実行する。これにより、情報処理装置１は、取得部１１１、生成部１１２、及び出力部１１３をソフトウェアモジュールとして備えるコンピュータとして動作する。すなわち、本実施形態では、情報処理装置１の各ソフトウェアモジュールは、制御部１１（ＣＰＵ）により実現される。 FIG. 7 schematically illustrates an example of the software configuration of the information processing device 1 according to this embodiment. The control unit 11 of the information processing device 1 loads a program 81 stored in the storage unit 12 into RAM and executes instructions included in the program 81 using the CPU. As a result, the information processing device 1 operates as a computer including an acquisition unit 111, a generation unit 112, and an output unit 113 as software modules. That is, in this embodiment, each software module of the information processing device 1 is realized by the control unit 11 (CPU).

取得部１１１は、ロボット装置Ｒの動作する環境の観測データ２０、及びロボット装置Ｒに与えるタスクの目標に関する指示情報２１を取得するように構成される。一例では、取得部１１１は、環境に関する環境情報２２を更に取得するように構成されてよい。情報処理装置１がモジュールデータ３００を保持していることで、生成部１１２は、推論モジュール３を備える。生成部１１２は、推論モジュール３を用いて、取得された観測データ２０及び指示情報２１からタスクの問題記述５を生成するように構成される。一例では、観測データ２０及び指示情報２１からタスクの問題記述５を生成することは、観測データ２０、指示情報２１及び環境情報２２からタスクの問題記述５を生成することにより構成されてよい。出力部１１３は、生成された問題記述５を出力するように構成される。 The acquisition unit 111 is configured to acquire observation data 20 of the environment in which the robot device R operates, and instruction information 21 regarding the goal of the task to be given to the robot device R. In one example, the acquisition unit 111 may further acquire environmental information 22 regarding the environment. As the information processing device 1 holds module data 300, the generation unit 112 is provided with an inference module 3. The generation unit 112 is configured to generate a task problem description 5 from the acquired observation data 20 and instruction information 21 using the inference module 3. In one example, generating the task problem description 5 from the observation data 20 and instruction information 21 may be configured by generating the task problem description 5 from the observation data 20, instruction information 21, and environmental information 22. The output unit 113 is configured to output the generated problem description 5.
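The division of roles among the acquisition unit 111, the generation unit 112, and the output unit 113 can be illustrated with a minimal Python sketch. All names here (`Inputs`, `acquire`, `generate`, `output`, `run`) are hypothetical and are not taken from the specification; the inference module 3 is stood in for by an arbitrary callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Inputs:
    observation: Any                     # observation data 20 (e.g., sensor readings)
    instruction: str                     # instruction information 21
    environment: Optional[dict] = None   # environment information 22 (optional)


# Assumption: the inference module is modeled as a callable that maps the
# acquired inputs to a problem description given as text.
InferenceModule = Callable[[Inputs], str]


def acquire(observation: Any, instruction: str,
            environment: Optional[dict] = None) -> Inputs:
    """Role of the acquisition unit: gather the inputs into one record."""
    return Inputs(observation, instruction, environment)


def generate(module: InferenceModule, inputs: Inputs) -> str:
    """Role of the generation unit: apply the inference module to the inputs."""
    return module(inputs)


def output(problem_description: str,
           sink: Callable[[str], None] = print) -> None:
    """Role of the output unit: hand the description to a display, file, planner, etc."""
    sink(problem_description)


def run(module: InferenceModule, observation: Any, instruction: str,
        environment: Optional[dict] = None,
        sink: Callable[[str], None] = print) -> str:
    inputs = acquire(observation, instruction, environment)
    problem_description = generate(module, inputs)
    output(problem_description, sink)
    return problem_description
```

Passing the sink as a parameter reflects that the output destination (memory resource, output device, or external computer) is left open by the embodiment.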

なお、本実施形態では、情報処理装置１の各ソフトウェアモジュールがいずれも汎用のＣＰＵによって実現される例について説明している。しかしながら、上記ソフトウェアモジュールの一部又は全部は、１又は複数の専用のプロセッサ又はチップセットにより実現されてもよい。上記各モジュールは、ハードウェアモジュールとして実現されてもよい。情報処理装置１のソフトウェア構成に関して、実施形態に応じて、適宜、モジュールの省略、置換及び追加が行われてもよい。 In this embodiment, an example is described in which each software module of the information processing device 1 is implemented by a general-purpose CPU. However, some or all of the above software modules may be implemented by one or more dedicated processors or chipsets. Each of the above modules may also be implemented as a hardware module. With regard to the software configuration of the information processing device 1, modules may be omitted, replaced, or added as appropriate depending on the embodiment.

§３動作例 §3 Operation example
図８は、本実施形態に係る情報処理装置１の処理手順の一例を示すフローチャートである。以下の処理手順は、コンピュータにより実行される情報処理方法の一例である。ただし、以下の情報処理装置１の処理手順は、一例に過ぎず、各ステップは可能な限り変更されてよい。また、以下の処理手順について、実施の形態に応じて、適宜、ステップの省略、置換、及び追加が可能である。 FIG. 8 is a flowchart showing an example of the processing procedure of the information processing device 1 according to this embodiment. The following processing procedure is an example of an information processing method executed by a computer. However, the following processing procedure of the information processing device 1 is merely an example, and each step may be modified to the extent possible. Furthermore, steps in the following processing procedure may be omitted, replaced, or added as appropriate depending on the embodiment.

（ステップＳ１０１） (Step S101)
ステップＳ１０１では、制御部１１は、取得部１１１として動作し、観測データ２０及び指示情報２１を取得する。一例では、観測データ２０は、１つ以上のセンサＳのセンシングデータにより構成されてよく、指示情報２１は、目標を自然言語で指示する言語情報により構成されてよい。また、一例では、制御部１１は、環境情報２２を更に取得してよい。環境情報２２は、ドメイン記述２３及びドメイン情報２４の少なくとも一方を含んでよい。ドメイン情報２４は、属性情報２４１を含んでよい。推論モジュール３がコンテキスト内学習の訓練済みモデル３９を含む場合、ドメイン情報２４は、入出力サンプル２４３を含んでよい。観測データ２０及び指示情報２１を取得すると、制御部１１は、次のステップＳ１０２に処理を進める。 In step S101, the control unit 11 operates as the acquisition unit 111 and acquires observation data 20 and instruction information 21. In one example, the observation data 20 may be composed of sensing data from one or more sensors S, and the instruction information 21 may be composed of linguistic information that specifies the goal in natural language. Also, in one example, the control unit 11 may further acquire environment information 22. The environment information 22 may include at least one of a domain description 23 and domain information 24. The domain information 24 may include attribute information 241. If the inference module 3 includes a trained model 39 for in-context learning, the domain information 24 may include input/output samples 243. After acquiring the observation data 20 and the instruction information 21, the control unit 11 proceeds to the next step S102.

（ステップＳ１０２） (Step S102)
ステップＳ１０２では、制御部１１は、生成部１１２として動作し、推論モジュール３を用いて、取得された観測データ２０及び指示情報２１からタスクの問題記述５を生成する。一例では、環境情報２２を更に取得した場合、制御部１１は、推論モジュール３を用いて、観測データ２０、指示情報２１及び環境情報２２から問題記述５を生成してよい。 In step S102, the control unit 11 operates as the generation unit 112, and uses the inference module 3 to generate a problem description 5 of the task from the acquired observation data 20 and instruction information 21. In one example, if environmental information 22 is further acquired, the control unit 11 may use the inference module 3 to generate the problem description 5 from the observation data 20, the instruction information 21, and the environmental information 22.

生成される問題記述５は、環境に存在する１つ以上の物体の初期状態及び目標状態の記述（５１、５２）を含む。一例では、生成された問題記述５は、所定のフォーマットに従うものであってよい。また、生成された問題記述５は、環境の存在する物体の記述５０を更に含んでもよい。 The generated problem description 5 includes descriptions (51, 52) of the initial and goal states of one or more objects present in the environment. In one example, the generated problem description 5 may follow a predetermined format. The generated problem description 5 may also include descriptions 50 of objects present in the environment.
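As an illustration of a problem description that follows a predetermined format, the sketch below holds the three descriptions (50, 51, 52) in a small data structure and renders them as text. It assumes a PDDL-style problem file as the format, which this passage does not specify; the class name `ProblemDescription`, its fields, and the example predicate `on` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Predicate = Tuple[str, ...]  # e.g., ("on", "cup", "table")


@dataclass
class ProblemDescription:
    name: str
    domain: str
    objects: Dict[str, str]   # description 50: object name -> type
    init: List[Predicate]     # description 51: facts that hold initially
    goal: List[Predicate]     # description 52: facts that must hold at the end

    def to_pddl(self) -> str:
        """Render the description as a PDDL-style problem (assumed format)."""
        objs = " ".join(f"{n} - {t}" for n, t in self.objects.items())
        init = " ".join("(" + " ".join(p) + ")" for p in self.init)
        goal = " ".join("(" + " ".join(p) + ")" for p in self.goal)
        return (
            f"(define (problem {self.name})\n"
            f"  (:domain {self.domain})\n"
            f"  (:objects {objs})\n"
            f"  (:init {init})\n"
            f"  (:goal (and {goal})))"
        )
```

Keeping the initial and goal states as explicit lists of facts is what makes the result readable by a human operator as well as by a planner.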

また、一例では、推論モジュール３を用いて、問題記述５を生成することは、物体推定器３１により物体の記述５０を生成すること、初期状態推定器３３により初期状態の記述５１を生成すること、及び目標推定器３５により目標状態の記述５２を生成することの少なくともいずれかを含んでよい。推論モジュール３がコンテキスト内学習の訓練済みモデル３９を含み、ドメイン情報２４が入出力サンプル２４３を含む場合、制御部１１は、入出力サンプル２４３をFew-shotプロンプティングとして用いて、問題記述５を生成するドメインに訓練済みモデル３９を適応させてよい。問題記述５を生成すると、制御部１１は、次のステップＳ１０３に処理を進める。 In one example, generating the problem description 5 using the inference module 3 may include at least one of generating an object description 50 using the object estimator 31, generating an initial state description 51 using the initial state estimator 33, and generating a goal state description 52 using the target estimator 35. If the inference module 3 includes a trained model 39 for in-context learning and the domain information 24 includes input/output samples 243, the control unit 11 may use the input/output samples 243 as few-shot prompts to adapt the trained model 39 to the domain for which the problem description 5 is generated. After generating the problem description 5, the control unit 11 proceeds to the next step S103.
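One way to picture the few-shot use of the input/output samples 243 is as prompt assembly: each sample is written out as a solved example, followed by the new, unsolved case. The sketch below is a minimal illustration under that assumption; the function name `build_few_shot_prompt`, the dictionary keys, and the prompt wording are hypothetical, and the observation is assumed to have already been converted to text.

```python
from typing import Dict, List


def build_few_shot_prompt(samples: List[Dict[str, str]],
                          observation_text: str,
                          instruction: str) -> str:
    """Concatenate solved examples, then the new query left open for the model."""
    blocks = []
    for s in samples:
        # Each sample pairs an input (observation, instruction) with its output.
        blocks.append(
            f"Observation: {s['observation']}\n"
            f"Instruction: {s['instruction']}\n"
            f"Problem description:\n{s['problem']}"
        )
    # The final block ends where the model is expected to continue.
    blocks.append(
        f"Observation: {observation_text}\n"
        f"Instruction: {instruction}\n"
        f"Problem description:"
    )
    return "\n\n".join(blocks)
```

Because the adaptation happens only through the prompt, the parameters of the trained model stay unchanged when the domain changes.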

（ステップＳ１０３） (Step S103)
ステップＳ１０３では、制御部１１は、出力部１１３として動作し、生成された問題記述５を出力する。一例では、制御部１１は、情報処理装置１のメモリ・リソース、出力装置１５及び外部コンピュータの少なくともいずれかに問題記述５を出力してよい。また、他の一例では、制御部１１は、ステップＳ１０３の処理として、問題記述５をプランナに与えて、プランナにより行動計画を生成することの少なくとも一部の処理を実行してもよい。情報処理装置１において、プランナの演算処理を実行する場合、制御部１１は、プランナに関する情報を適宜取得してよい。例えば、プランナに関する情報は、情報処理装置１のメモリ・リソース（例えば、ＲＡＭ、記憶部１２）及び外部コンピュータの少なくともいずれかに格納されていてよく、制御部１１は、プランナに関する情報をいずれかより取得してよい。プランナにより行動計画を生成する処理を外部コンピュータにより実行する場合、制御部１１は、ステップＳ１０３の処理として、生成された問題記述５を外部コンピュータに与えてもよい。また、他の一例では、情報処理装置１又は外部コンピュータにより行動計画が生成される場合に、制御部１１は、ステップＳ１０３の処理として、生成された行動計画に従って、ロボット装置Ｒの動作を制御してもよい。ロボット装置Ｒの動作を制御することは、ロボット装置Ｒを直接的に制御すること、及びロボット装置Ｒのコントローラに指示を与えることで、ロボット装置Ｒを間接的に制御することを含んでよい。なお、プランナによる行動計画の生成及びロボット装置Ｒの制御の少なくともいずれかは、ステップＳ１０３の処理とは別個に実行されてもよい。ロボット装置Ｒの制御は、省略されてもよい。ロボット装置Ｒの制御を省略する場合、プランナによる行動計画の生成も省略されてよい。問題記述５の出力が完了すると、制御部１１は、本動作例に係る情報処理装置１の処理手順を終了する。 In step S103, the control unit 11 operates as the output unit 113 and outputs the generated problem description 5. In one example, the control unit 11 may output the problem description 5 to at least one of the memory resources of the information processing device 1, the output device 15, and an external computer. In another example, the control unit 11 may provide the problem description 5 to a planner and execute at least a part of the process of generating an action plan by the planner as the process of step S103. When the information processing device 1 executes the calculation process of the planner, the control unit 11 may appropriately acquire information about the planner. For example, the information about the planner may be stored in at least one of the memory resources (e.g., RAM, storage unit 12) of the information processing device 1 and the external computer, and the control unit 11 may acquire the information about the planner from either of them. When the process of generating an action plan by the planner is executed by the external computer, the control unit 11 may provide the generated problem description 5 to the external computer as the process of step S103. In another example, when an action plan is generated by the information processing device 1 or an external computer, the control unit 11 may control the operation of the robot device R in accordance with the generated action plan as the process of step S103. Controlling the operation of the robot device R may include directly controlling the robot device R and indirectly controlling the robot device R by giving instructions to a controller of the robot device R. Note that at least one of the generation of the action plan by the planner and the control of the robot device R may be executed separately from the process of step S103. The control of the robot device R may be omitted. When the control of the robot device R is omitted, the generation of the action plan by the planner may also be omitted. When the output of the problem description 5 is completed, the control unit 11 ends the processing procedure of the information processing device 1 according to this operation example.

なお、一例では、プランナにより行動計画を生成する処理において、プランナがエラーメッセージ６１５を出力した場合、制御部１１は、問題記述５の修正に関する処理を実行してよい。例えば、図３に示されるように、推論モジュール３がコンテキスト内学習の訓練済みモデル３９を含む場合、制御部１１は、出力されたエラーメッセージ６１５を取得してよい。そして、制御部１１は、取得されたエラーメッセージ６１５及び問題記述５を推論モジュール３に与えて、推論モジュール３の演算処理を再度実行してよい。この再プロンプトの際、制御部１１は、入力データ２００の少なくとも一部を推論モジュール３に更に与えてもよい。これにより、制御部１１は、修正された新たな問題記述５を生成してよい。制御部１１は、この再プロンプトによる問題記述５を修正する処理を再帰的に繰り返し実行してもよい。 In one example, when the planner outputs an error message 615 during the process of generating an action plan, the control unit 11 may execute a process for correcting the problem description 5. For example, as shown in FIG. 3, when the inference module 3 includes a trained model 39 for in-context learning, the control unit 11 may acquire the output error message 615. The control unit 11 may then provide the acquired error message 615 and the problem description 5 to the inference module 3 and execute the calculation process of the inference module 3 again. During this re-prompting, the control unit 11 may further provide at least a portion of the input data 200 to the inference module 3. In this way, the control unit 11 may generate a new, corrected problem description 5. The control unit 11 may repeat this re-prompting process of correcting the problem description 5 recursively.
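The correction loop described above can be sketched as follows. This is a minimal illustration, not the specification's implementation: `infer` and `plan` are hypothetical callables standing in for the inference module 3 and the planner, and the `max_rounds` limit is an added assumption (the text only says the re-prompting may be repeated).

```python
from typing import Any, Callable, Optional, Tuple

# Assumed interfaces:
#   infer(inputs, previous_problem, error_message) -> problem description (str)
#   plan(problem) -> (action_plan, None) on success, (None, error_message) on failure
Infer = Callable[[Any, Optional[str], Optional[str]], str]
Plan = Callable[[str], Tuple[Optional[Any], Optional[str]]]


def refine_problem_description(infer: Infer, plan: Plan, inputs: Any,
                               max_rounds: int = 3) -> Tuple[str, Any]:
    """Generate a problem description and re-prompt while the planner reports errors."""
    problem = infer(inputs, None, None)        # first generation, no feedback yet
    last_error: Optional[str] = None
    for attempt in range(max_rounds + 1):
        action_plan, last_error = plan(problem)
        if last_error is None:
            return problem, action_plan        # the planner accepted the description
        if attempt < max_rounds:
            # Re-prompt with the previous description and the planner's error message.
            problem = infer(inputs, problem, last_error)
    raise RuntimeError(
        f"planner still failing after {max_rounds} re-prompts: {last_error}"
    )
```

Bounding the number of rounds keeps the loop from running indefinitely when the error cannot be resolved from the available inputs.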

［特徴］ [Features]
本実施形態では、ステップＳ１０２により生成される問題記述５は、タスクの初期状態から目標状態に到達する行動計画をプランナが生成するため、環境に存在する１つ以上の物体それぞれの初期状態及び目標状態の記述（５１、５２）を含んでいる。初期状態の記述５１は、環境に存在する１つ以上の物体それぞれの初期状態を説明する。一方、目標状態の記述５２は、タスクの達成により到達する、１つ以上の物体それぞれの目標状態を説明する。すなわち、各記述（５１、５２）は、タスクの遂行前後の各状態を示し、人間にとって解釈可能である。そのため、この問題記述５は、高い説明性を有している。したがって、本実施形態によれば、ロボット装置Ｒに与える動作系列（制御指令）を得るための説明性の高い出力を得ることができる。 In this embodiment, the problem description 5 generated in step S102 includes descriptions (51, 52) of the initial state and goal state of each of one or more objects in the environment, so that the planner can generate an action plan that reaches the goal state from the initial state of the task. The initial state description 51 describes the initial state of each of the one or more objects in the environment. Meanwhile, the goal state description 52 describes the goal state of each of the one or more objects that is reached by completing the task. In other words, the descriptions (51, 52) indicate the states before and after the task is performed and are human-interpretable. The problem description 5 therefore has high explainability. Consequently, according to this embodiment, a highly explainable output can be obtained as the basis for deriving the action sequence (control commands) to be given to the robot device R.

§４変形例 §4 Modifications
以上、本発明の実施の形態を詳細に説明してきたが、前述までの説明はあらゆる点において本発明の例示に過ぎない。上記実施形態において、種々の改良又は変更が適宜行われてよい。例えば、以下のような変更が可能である。なお、以下では、上記実施形態と同様の構成要素に関しては同様の符号を用い、上記実施形態と同様の点については、適宜説明を省略した。以下の変形例は適宜組み合わせ可能である。 Although the embodiments of the present invention have been described above in detail, the above description is merely an example of the present invention in every respect. Various improvements or modifications may be made to the above embodiments as appropriate. For example, the following modifications are possible. Note that, in the following, the same reference numerals are used for components similar to those in the above embodiments, and descriptions of the same points as in the above embodiments are omitted as appropriate. The following modifications can be combined as appropriate.

＜４．１＞ <4.1>
上記実施形態において、センサＳは、駆動装置に取り付けられる又は駆動装置を含むことで、観測方向を変更可能に設置されてよい。例えば、センサＳがカメラを含む場合、カメラは、電動雲台に取り付けられることで、撮影方向を変更可能に設置されてよい。電動雲台は、駆動装置の一例である。情報処理装置１は、直接的又は間接的に駆動装置を駆動可能に構成されてよい。これにより、情報処理装置１は、センサＳによる観測方向を適宜変更してもよい。 In the above embodiment, the sensor S may be attached to or include a driving device, thereby enabling the observation direction to be changed. For example, if the sensor S includes a camera, the camera may be attached to an electric pan head, thereby enabling the shooting direction to be changed. The electric pan head is an example of a driving device. The information processing device 1 may be configured to be able to directly or indirectly drive the driving device. In this way, the information processing device 1 may appropriately change the observation direction of the sensor S.

加えて、環境情報２２は、観測する対象となる物体の指定リストを含んでよい。上記ステップＳ１０１では、情報処理装置１の制御部１１は、指定リストを含む環境情報２２を取得してよい。上記属性情報２４１が、物体の指定リストを兼ねてもよい。上記ステップＳ１０２では、制御部１１は、指定リストに含まれる全ての物体を観測データ２０から検出したか否かを判定してもよい。上記図４の例では、制御部１１は、物体推定器３１又は初期状態推定器３３の演算処理において、指定リストに含まれる全ての物体を観測データ２０から検出したか否かを判定してもよい。一例では、図５Ａの訓練済みモデル３１１等のように、物体の検出器を備える場合、制御部１１は、観測データ２０に対する検出器による物体の検出結果に応じて、指定リストに含まれる全ての物体を検出したか否かを判定してよい。他の一例では、制御部１１は、物体の記述５０及び初期状態の記述５１の少なくともいずれかにおいて、指定リストに含まれる全ての物体が含まれるか否かに応じて、指定リストに含まれる全ての物体を検出したか否かを判定してもよい。指定リストに含まれる少なくとも一部の物体が観測データ２０から検出されなかった場合、制御部１１は、センサＳの駆動装置を駆動し、センサＳの観測方向を適宜変更してよい。変更する方向及び量は、例えば、ランダム、所定の規則に従う等の任意の方法で決定されてよい。そして、制御部１１は、観測データ２０を再度取得し、ステップＳ１０２の処理を再度実行してよい。指定リストに含まれる全ての物体が検出されるまで、制御部１１は、これらの処理を繰り返し実行してよい。一方で、指定リストに含まれる全ての物体が観測データ２０から検出された場合、制御部１１は、問題記述５を生成する処理を進めてよい。本変形例によれば、センサＳの観測方向を適切な方向に修正し、問題記述５を生成する精度の向上を期待することができる。 In addition, the environmental information 22 may include a designation list of objects to be observed. In step S101, the control unit 11 of the information processing device 1 may acquire the environmental information 22, including the designation list. The attribute information 241 may also serve as the object designation list. In step S102, the control unit 11 may determine whether all objects included in the designation list have been detected from the observation data 20. In the example of FIG. 4, the control unit 11 may determine whether all objects included in the designation list have been detected from the observation data 20 during the calculation process of the object estimator 31 or the initial state estimator 33. In one example, when an object detector, such as the trained model 311 of FIG. 5A, is provided, the control unit 11 may determine whether all objects included in the designation list have been detected based on the object detection results of the detector for the observation data 20. In another example, the control unit 11 may determine whether all objects included in the designation list have been detected based on whether all objects included in the designation list are included in at least one of the object description 50 and the initial state description 51. If at least some of the objects included in the designation list are not detected in the observation data 20, the control unit 11 may drive the drive device of the sensor S and appropriately change the observation direction of the sensor S. The direction and amount of change may be determined by any method, for example, randomly, according to a predetermined rule, or the like. The control unit 11 may then reacquire the observation data 20 and perform the process of step S102 again. The control unit 11 may repeat these processes until all of the objects included in the designation list are detected. On the other hand, if all of the objects included in the designation list are detected in the observation data 20, the control unit 11 may proceed with the process of generating the problem description 5. According to this modification, the observation direction of the sensor S is corrected in an appropriate direction, and the accuracy of generating the problem description 5 can be expected to improve.
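The re-observation procedure of this modification can be sketched as a simple loop. The callables `capture`, `detect`, and `move_sensor` are hypothetical stand-ins for acquiring observation data 20, detecting objects in it, and driving the sensor's driving device; the `max_moves` bound is an added assumption, since the text only says the processes may be repeated until all listed objects are detected.

```python
from typing import Any, Callable, Iterable


def observe_until_all_detected(capture: Callable[[], Any],
                               detect: Callable[[Any], Iterable[str]],
                               move_sensor: Callable[[], None],
                               required: Iterable[str],
                               max_moves: int = 10) -> Any:
    """Re-acquire observations, moving the sensor, until every listed object is found."""
    required_set = set(required)
    missing = required_set
    for attempt in range(max_moves + 1):
        observation = capture()
        missing = required_set - set(detect(observation))
        if not missing:
            return observation      # all objects in the designation list were detected
        if attempt < max_moves:
            move_sensor()           # direction and amount are decided by the caller
    raise LookupError(f"objects not found: {sorted(missing)}")
```

How `move_sensor` chooses the new direction (randomly or by a predetermined rule) is left to the caller, matching the freedom the text allows.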

なお、上記検出物体の確認処理は、実施の形態に応じて適宜変更されてよい。例えば、指定リストは、観測するか否かがどちらでもよい物体（推奨物体）を更に含んでもよい。その推奨物体のみが検出されなかった場合、制御部１１は、アラートを出力した上で、問題記述５を生成する処理を進めてもよい。或いは、制御部１１は、問題記述５を生成する処理を停止し、問題記述５を生成する処理を続けるか否かをオペレータに問い合わせてもよい。 The above-described process for confirming detected objects may be modified as appropriate depending on the embodiment. For example, the designation list may further include objects whose observation is optional (recommended objects). If only recommended objects are not detected, the control unit 11 may output an alert and then proceed with the process of generating the problem description 5. Alternatively, the control unit 11 may stop the process of generating the problem description 5 and ask the operator whether or not to continue it.
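The distinction between required and recommended objects amounts to a three-way decision, sketched below. The enum `Action` and the function `decide` are hypothetical names; the sketch implements the first option in the text (alert, then proceed) and leaves the alternative of pausing to ask the operator to the caller.

```python
from enum import Enum
from typing import Iterable, Set, Tuple


class Action(Enum):
    REOBSERVE = "reobserve"                  # a required object is missing
    ALERT_AND_PROCEED = "alert_and_proceed"  # only recommended objects are missing
    PROCEED = "proceed"                      # everything listed was detected


def decide(detected: Iterable[str],
           required: Iterable[str],
           recommended: Iterable[str] = ()) -> Tuple[Action, Set[str]]:
    """Return the next action and the set of objects that triggered it."""
    found = set(detected)
    missing_required = set(required) - found
    missing_recommended = set(recommended) - found
    if missing_required:
        return Action.REOBSERVE, missing_required
    if missing_recommended:
        return Action.ALERT_AND_PROCEED, missing_recommended
    return Action.PROCEED, set()
```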

＜４．２＞ <4.2>
ドメイン記述２３は、ロボット装置Ｒに依存する部分（例えば、ロボット装置Ｒのスキルを定義する記述等）及び環境に関連する部分（例えば、対象物体の種類を定義する記述等）を含み得る。そこで、上記実施形態において、問題記述５を生成する方法と同様に、情報処理装置１は、演算モジュールを用いて、観測データ２０、指示情報２１及び環境情報２２の少なくとも一部から、ドメイン記述２３の環境に関連する部分の少なくとも一部を生成してもよい。演算モジュールは、訓練済みモデル及びルールベースモデルの少なくともいずれかにより構成されてよい。本変形例によれば、ドメイン記述２３を用意する手間の削減を期待することができる。 The domain description 23 may include a portion dependent on the robot device R (e.g., a description defining the skills of the robot device R) and a portion related to the environment (e.g., a description defining the type of target object). Therefore, similar to the method for generating the problem description 5 in the above embodiment, the information processing device 1 may use a calculation module to generate at least a portion of the portion related to the environment of the domain description 23 from at least a portion of the observation data 20, the instruction information 21, and the environmental information 22. The calculation module may be configured with at least one of a trained model and a rule-based model. This modification is expected to reduce the effort required to prepare the domain description 23.

＜４．３＞
上記図４の例では、推論モジュール３は、問題記述５における物体、初期状態及び目標状態の各記述（５０、５１、５２）を個別に生成するように構成されている。しかしながら、推論モジュール３の構成は、このような例に限られなくてよい。他の一例では、推論モジュール３は、問題記述５における物体、初期状態及び目標状態の記述（５０、５１、５２）の少なくともいずれかの組み合わせを一体的に生成するように構成されてもよい。 <4.3>
In the example of FIG. 4 above, the inference module 3 is configured to generate the descriptions (50, 51, 52) of the objects, the initial state, and the goal state in the problem description 5 individually. However, the configuration of the inference module 3 need not be limited to this example. In another example, the inference module 3 may be configured to generate a combination of at least any of the descriptions (50, 51, 52) of the objects, the initial state, and the goal state in the problem description 5 in an integrated manner.

図９は、他の形態に係る推論モジュール３Ａの構成の一例を模式的に示す。推論モジュール３Ａは、物体検出器３７１、キャプションモデル３７３、及び大規模言語モデル３７５を備える。図９の例では、観測データ２０は、画像データにより構成されてよく、指示情報２１は、目標を自然言語で指示するテキストデータにより構成されてよい。物体検出器３７１には、上記参考文献１１、参考文献１２等で提案されるopen vocabulary object detection model又はopen vocabulary object segmentation modelが用いられてよい。キャプションモデル３７３には、上記参考文献１３等で提案されるvisual question answering modelが用いられてよい。大規模言語モデル３７５には、上記参考文献１４等で提案されるモデルが用いられてよい。 FIG. 9 schematically shows an example of the configuration of an inference module 3A according to another embodiment. The inference module 3A includes an object detector 371, a caption model 373, and a large-scale language model 375. In the example of FIG. 9, the observation data 20 may consist of image data, and the instruction information 21 may consist of text data that specifies the goal in natural language. For the object detector 371, an open vocabulary object detection model or an open vocabulary object segmentation model proposed in References 11, 12, etc. above may be used. For the caption model 373, a visual question answering model proposed in Reference 13, etc. above may be used. For the large-scale language model 375, a model proposed in Reference 14, etc. above may be used.

まず、情報処理装置１の制御部１１は、物体検出器３７１を用いて、観測データ２０から物体を検出してよい。制御部１１は、物体検出器３７１による物体の検出結果に応じて、観測データ２０から各物体の部分データ（部分画像データ）を抽出し、抽出された部分データをキャプションモデル３７３に与えてよい。制御部１１は、キャプションモデル３７３を用いて、抽出された各物体の部分データから各物体のキャプションを生成してよい。そして、制御部１１は、大規模言語モデル３７５を用いて、物体検出器３７１による物体の検出結果のテキストデータ、キャプションモデル３７３によるキャプションのテキストデータ、及び指示情報２１から問題記述５の各記述（５０、５１、５２）を生成してもよい。本変形例によれば、演算処理の簡素化を図ることができる。 First, the control unit 11 of the information processing device 1 may use the object detector 371 to detect objects from the observation data 20. The control unit 11 may extract partial data (partial image data) of each object from the observation data 20 according to the object detection results by the object detector 371, and provide the extracted partial data to the caption model 373. The control unit 11 may use the caption model 373 to generate a caption for each object from the extracted partial data of each object. Then, the control unit 11 may use the large-scale language model 375 to generate each description (50, 51, 52) of the problem description 5 from the text data of the object detection results by the object detector 371, the text data of the caption by the caption model 373, and the instruction information 21. This variant example simplifies the calculation process.
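The flow described for the inference module 3A can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the three models are passed in as plain callables, and the names `Detection`, `crop`, `generate_problem_description`, the bounding-box format, and the prompt layout are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # assumed format: (left, top, right, bottom) in pixels
Image = List[List[int]]           # toy stand-in for image data (rows of pixel values)


@dataclass
class Detection:
    label: str
    box: Box


def crop(image: Image, box: Box) -> Image:
    """Extract the partial image data of one detected object."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def generate_problem_description(
    image: Image,
    instruction: str,
    detect: Callable[[Image], Sequence[Detection]],  # stands in for object detector 371
    caption: Callable[[Image], str],                 # stands in for caption model 373
    llm: Callable[[str], str],                       # stands in for language model 375
) -> str:
    """Detect objects, caption each crop, then ask the language model for the problem."""
    detections = detect(image)
    captions = [caption(crop(image, d.box)) for d in detections]
    object_lines = [
        f"- {d.label} at {d.box}: {c}" for d, c in zip(detections, captions)
    ]
    # Hypothetical prompt layout; the source does not specify one.
    prompt = "\n".join(
        ["Detected objects:", *object_lines,
         f"Instruction: {instruction}",
         "Write the PDDL problem (objects, init, goal)."]
    )
    return llm(prompt)
```

Passing the models as callables keeps the sketch independent of any particular detector, captioning model, or language model API; stubs can be substituted to exercise the control flow.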

＜４．４＞
上記実施形態において、図３の再プロンプトの方法は、実施の形態に応じて適宜改良されてよい。一例では、図３における再プロンプトの方法として、参考文献１５（Jason Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/pdf/2201.11903.pdf＞）、参考文献１６（Takeshi Kojima et al., “Large Language Models are Zero-Shot Reasoners”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2205.11916＞）、参考文献１７（Denny Zh
ou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language
Models”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://arxiv.org/abs/2205.10625＞）等で提案されるChain-of-Thoughtプロンプトが用いられてよい。Chain-of-Thoughtプロンプトは、中間推論のステップを経て、段階的な推論を重ねて、最終的な推論結果（本実施形態では、問題記述５の生成結果）を得る方法である。例えば、情報処理装置１の制御部１１は、問題記述５及びエラーメッセージ６１５と共に、“What part of the PDDL problem do you think is causing this error ?” 等の、エラーを質問するテンプレートデータを推論モジュール３に与えて、エラーに対する説明を推論モジュール３に生成させてもよい。その後、制御部１１は、問題記述５及びエラーメッセージ６１５を推論モジュール３に与えて、新たな問題記述５を推論モジュール３に生成させてよい。 <4.4>
In the above embodiment, the re-prompting method of FIG. 3 may be improved as appropriate depending on the embodiment. In one example, a Chain-of-Thought prompt, as proposed in Reference 15 (Jason Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", [online], [retrieved October 24, 2023], Internet <URL: https://arxiv.org/pdf/2201.11903.pdf>), Reference 16 (Takeshi Kojima et al., "Large Language Models are Zero-Shot Reasoners", [online], [retrieved October 24, 2023], Internet <URL: https://arxiv.org/abs/2205.11916>), Reference 17 (Denny Zhou et al., "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models", [online], [retrieved October 24, 2023], Internet <URL: https://arxiv.org/abs/2205.10625>), etc., may be used as the re-prompting method of FIG. 3. A Chain-of-Thought prompt is a method of obtaining a final inference result (in this embodiment, the generated problem description 5) by passing through intermediate reasoning steps and building up the inference step by step. For example, the control unit 11 of the information processing device 1 may give the inference module 3 template data that asks about the error, such as "What part of the PDDL problem do you think is causing this error?", together with the problem description 5 and the error message 615, and cause the inference module 3 to generate an explanation of the error. The control unit 11 may then give the problem description 5 and the error message 615 to the inference module 3 and cause the inference module 3 to generate a new problem description 5.
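The two-stage re-prompting described in this modification can be sketched as a small loop. This is an illustrative sketch under stated assumptions rather than the embodiment's implementation: `run_planner` and `llm` are hypothetical callables, the rewrite instruction string is invented here, and including the generated explanation in the second prompt is an assumption (the text states only that the problem description and the error message are given again).

```python
from typing import Callable, Optional, Tuple

# Template question quoted in the text above.
COT_QUESTION = "What part of the PDDL problem do you think is causing this error?"


def correct_with_reprompting(
    problem: str,
    run_planner: Callable[[str], Optional[str]],  # hypothetical: returns an error message, or None on success
    llm: Callable[[str], str],                    # hypothetical: prompt text in, generated text out
    max_corrections: int = 2,
) -> Tuple[str, Optional[str]]:
    """Return the (possibly corrected) problem and the last planner error, if any."""
    error = run_planner(problem)
    for _ in range(max_corrections):
        if error is None:
            break
        # Stage 1: intermediate reasoning step, ask what causes the error.
        explanation = llm(f"{problem}\n{error}\n{COT_QUESTION}")
        # Stage 2: regenerate the problem description.
        # Assumption: the explanation is carried along as context.
        problem = llm(
            f"{problem}\n{error}\n{explanation}\n"
            "Rewrite the PDDL problem so that the error is resolved."
        )
        error = run_planner(problem)
    return problem, error
```

The loop stops as soon as the planner reports no error, so a problem description that is already valid triggers no model calls at all.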

§５実験例
上記実施形態の有効性を検証するため、以下の実験を行った。ただし、本発明は、以下の実施例に限定されるものではない。 §5 Experimental Examples
The following experiments were carried out to verify the effectiveness of the above embodiment. However, the present invention is not limited to the following examples.

（推論モジュール）
第１実施例に係る推論モジュールとして、図４に示される推論モジュールの構成を採用した。物体検出器の構成には、図５Ａの構成を採用した。物体検出器の訓練済みモデルには、参考文献１１で提案されるモデルを採用した。初期状態推定器の構成には、図５Ｂの構成を採用した。初期状態推定器の検出器には、参考文献１３で提案されるモデルを採用した。初期状態推定器の訓練済みモデルには、参考文献１４で提案されるモデルを採用した。目標推定器の構成には、図５Ｃの構成を採用した。目標推定器の訓練済みモデルには、参考文献１４で提案されるモデルを採用した。 (inference module)
The inference module configuration shown in FIG. 4 was used for the first example. The object detector configuration was the configuration shown in FIG. 5A. The trained model for the object detector was the model proposed in Reference 11. The initial state estimator configuration was the configuration shown in FIG. 5B. The detector for the initial state estimator was the model proposed in Reference 13. The trained model for the initial state estimator was the model proposed in Reference 14. The target estimator configuration was the configuration shown in FIG. 5C. The trained model for the target estimator was the model proposed in Reference 14.

また、観測データには、画像データを用いた。指示情報には、目標を自然言語で指示するテキストデータを用いた。物体推定器の訓練済みモデルには、観測データ（画像データ）と共に、言語表現で物体の属性を示す属性情報を与えた。初期状態推定器の訓練済みモデルには、物体の検出結果を示すテキストデータ、検出器によるキャプションのテキストデータ、物体推定器により生成された物体の記述、及びFew-shotプロンプティングに用いる入出力サンプルを与えた。目標推定器の訓練済みモデルには、指示情報（テキストデータ）、物体推定器により生成された物体の記述、初期状態推定器により生成された初期状態の記述及びFew-shotプロンプティングに用いる入出力サンプルを与えた。入出力サンプルの数は３つであった。 Image data was used as the observation data. Text data specifying the target in natural language was used as the instruction information. The trained model of the object estimator was given attribute information indicating the object's attributes in linguistic expressions along with the observation data (image data). The trained model of the initial state estimator was given text data indicating the object detection results, text data of captions by the detector, a description of the object generated by the object estimator, and input/output samples used for few-shot prompting. The trained model of the target estimator was given instruction information (text data), a description of the object generated by the object estimator, a description of the initial state generated by the initial state estimator, and input/output samples used for few-shot prompting. The number of input/output samples was three.

第２実施例に係る推論モジュールとして、図９に示される推論モジュールの構成を採用した。推論モジュールの物体検出器には、参考文献１１で提案されるモデルを採用した。キャプションモデルには、参考文献１３で提案されるモデルを採用した。大規模言語モデルには、参考文献１４で提案されるモデルを採用した。大規模言語モデルには、指示情報（テキストデータ）、物体検出器による検出結果のテキストデータ、キャプションモデルによるキャプションのテキストデータ、及びFew-shotプロンプティングに用いる入出力サンプルを与えた。 The inference module configuration shown in Figure 9 was used for the second example. The object detector of the inference module was the model proposed in Reference 11. The caption model was the model proposed in Reference 13. The large-scale language model was the model proposed in Reference 14. The large-scale language model was provided with instruction information (text data), text data of the detection results by the object detector, text data of captions by the caption model, and input/output samples used for few-shot prompting.

問題記述のフォーマットには、ＰＤＤＬを採用した。生成された問題記述を用いて行動計画が生成可能か否かを確認するため、シンボリックプランナを用意した。シンボリックプランナを実行する環境として、参考文献１のFast Downward及びVAL（参考文献１８：“KCL-Planning/VAL”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://github.com/KCL-Planning/VAL＞）を用いた。第１実施例及び第２実施例の推論モジュールにおいて、図３の再プロンプトによる修正機構を用意した。再プロンプトは、入力データ、生成された問題記述及びエラーメッセージにより構成した。再プロンプト
の方法には、上記Chain-of-Thoughtプロンプトを採用した。再プロンプトにより問題記述を修正する最大回数は２回に設定した。 The problem description format was PDDL. A symbolic planner was prepared to verify whether an action plan could be generated using the generated problem description. The environment for running the symbolic planner was Fast Downward and VAL (Reference 18: "KCL-Planning/VAL", [online], [searched October 24, 2023], Internet URL: https://github.com/KCL-Planning/VAL). In the inference modules of the first and second examples, a reprompting correction mechanism (Figure 3) was implemented. The reprompting consisted of the input data, the generated problem description, and an error message. The reprompting method used was the Chain-of-Thought prompt described above. The maximum number of reprompting attempts to correct the problem description was set to two.

（データセット）
図１０は、用意した各ドメインにおけるドメイン記述のtypes(Object types)、predicates、及びactionsを示す。図１０に示されるように、料理（Cooking）、ブロックワールド（Blocksworld）及びハノイの塔（Hanoi）の３つのドメインにおけるデータセットを用意した。料理のタスクは、野菜をスライスしてボウルに入れることを想定した。ロボット装置Ｒとして、左右に配置される２つのロボットアームを想定した。目標状態は、野菜の状態及び場所を指定した。 (Dataset)
Figure 10 shows the types (object types), predicates, and actions of the domain description for each domain prepared. As shown in Figure 10, datasets were prepared for three domains: Cooking, Blocksworld, and Tower of Hanoi. The cooking task was assumed to be slicing vegetables and putting them into a bowl. Two robot arms, positioned on the left and right, were assumed as the robot device R. The state and location of the vegetables were specified as the target state.

ブロックワールドは、参考文献１９（Naresh Gupta et al., “On the Complexity of Blocks-World Planning”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://www.semanticscholar.org/paper/On-the-Complexity-of-Blocks-World-Planning-Gupta-Nau/db01349fd0d29c9443e37b0b0aa4ddb948ace5ce＞）等で用いられる古典的なドメインである。重複のない７つの色付きのブロックを問題毎に用いた。ロボットアームは、最初に常に何かを掴んでいるとは限らないように設定した。目標状態は、ブロックの関係を指定した。 Block-world is a classic domain used in references such as Reference 19 (Naresh Gupta et al., "On the Complexity of Blocks-World Planning", [online], [searched October 24, 2023], Internet <URL: https://www.semanticscholar.org/paper/On-the-Complexity-of-Blocks-World-Planning-Gupta-Nau/db01349fd0d29c9443e37b0b0aa4ddb948ace5ce>). Seven unique colored blocks were used for each problem. The robot arm was initially configured so that it did not always grasp something. The goal state specified the relationship between the blocks.
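To make the three parts of a problem description concrete, the sketch below assembles a PDDL problem from an object description, an initial state, and a goal state. It is an illustration only: the function name `format_problem` and the predicate names (`on`, `ontable`, `clear`, `handempty`) follow the common textbook Blocksworld formulation and are assumptions, not the contents of the domain description shown in FIG. 10.

```python
from typing import Iterable, Mapping, Sequence


def format_problem(
    name: str,
    domain: str,
    objects: Mapping[str, str],      # object name -> type   (cf. description 50)
    init: Iterable[Sequence[str]],   # ground facts          (cf. description 51)
    goal: Iterable[Sequence[str]],   # ground facts          (cf. description 52)
) -> str:
    """Render the three descriptions as one PDDL problem definition."""
    def fact(parts: Sequence[str]) -> str:
        return "(" + " ".join(parts) + ")"

    obj_text = " ".join(f"{n} - {t}" for n, t in objects.items())
    init_text = " ".join(fact(f) for f in init)
    goal_text = " ".join(fact(f) for f in goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {obj_text})\n"
        f"  (:init {init_text})\n"
        f"  (:goal (and {goal_text})))"
    )


# Two-block toy instance (the experiments used seven blocks per problem).
text = format_problem(
    name="stack-two",
    domain="blocksworld",
    objects={"red": "block", "blue": "block"},
    init=[("ontable", "red"), ("on", "blue", "red"), ("clear", "blue"), ("handempty",)],
    goal=[("on", "red", "blue")],
)
print(text)
```

Keeping the object, initial-state, and goal-state parts as separate inputs mirrors the case where each description is produced by its own estimator.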

ハノイは、参考文献２０（Ron Alford et al., “Translating HTNs to PDDL: A Small Amount of Domain Knowledge Can Go a Long Way”, [online], [令和５年１０月２４日検索], インターネット＜ＵＲＬ：https://www.cs.umd.edu/~nau/papers/alford2009translating.pdf＞）等で用いられる古典的なドメインである。３本のペグ及び６色の１０枚のディスクを設定した。同じ色のディスクには、幅の大きい順に番号を付与した。３つのペグには、左から右に番号を付与した。初期状態及び目標状態はディスクの位置を指定した。 Hanoi is a classic domain used in, for example, Reference 20 (Ron Alford et al., "Translating HTNs to PDDL: A Small Amount of Domain Knowledge Can Go a Long Way", [online], [retrieved October 24, 2023], Internet <URL: https://www.cs.umd.edu/~nau/papers/alford2009translating.pdf>). Three pegs and ten disks in six colors were set up. Disks of the same color were numbered in descending order of width. The three pegs were numbered from left to right. The initial state and the goal state specified the positions of the disks.

図１１Ａ、図１１Ｂ及び図１１Ｃは、料理、ブロックワールド及びハノイのドメインで与えられる観測データ及び指示情報の例を示す。各ドメインでは、１つのドメイン記述を用意した。また、正解サンプルの問題記述を問題毎に用意した。ハノイでは、全ての問題で同一の指示情報を用いた。 Figures 11A, 11B, and 11C show examples of observation data and instruction information given in the Cooking, Block World, and Hanoi domains. One domain description was prepared for each domain. Sample correct answer problem descriptions were also prepared for each problem. In Hanoi, the same instruction information was used for all problems.

（評価指標）
問題記述が適切に生成されたか否かを評価するため、４つの指標（R_syntax、R_plan、R_part、R_all）を用意した。R_syntaxは、生成された問題記述がＰＤＤＬの構文に適合している比率として定義した。VALが警告及び終了コードを返さない場合、生成された問題記述は構文的に正しいと見なした。R_planは、生成された問題記述から行動計画を生成した際に、シンボリックプランナがエラーメッセージを出力しなかった比率として定義した。シンボリックプランナによる行動計画の生成を試行し、VALがエラーメッセージを返さない場合、行動計画は有効に生成されたと見なした。R_part及びR_allは、再現性を評価するため、生成された問題記述が正解サンプルの記述を全て含む比率として定義した。R_partでは、物体、初期状態及び目標状態の記述を個々に計算した。R_allでは、生成された問題記述全体で計算した。 (Evaluation indicators)
To evaluate whether problem descriptions were generated appropriately, four indices (R_syntax, R_plan, R_part, and R_all) were prepared. R_syntax was defined as the proportion of generated problem descriptions that conformed to PDDL syntax. A generated problem description was regarded as syntactically correct if VAL returned no warning and no exit code. R_plan was defined as the proportion of cases in which the symbolic planner output no error message when generating an action plan from the generated problem description. Generation of an action plan by the symbolic planner was attempted, and the action plan was regarded as validly generated if VAL returned no error message. R_part and R_all, which evaluate reproducibility, were defined as the proportion of generated problem descriptions that included all of the descriptions in the correct sample. R_part was calculated separately for the descriptions of the objects, the initial state, and the goal state. R_all was calculated over the entire generated problem description.
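The inclusion-based indices can be sketched as follows. This is a minimal illustration under stated assumptions: each description is represented as a set of fact strings, "includes all descriptions of the correct sample" is read as a subset test, and the names `ratio`, `r_part`, and `r_all` are introduced here. R_syntax and R_plan would reduce to `ratio` applied to per-problem pass/fail flags obtained from VAL and the planner.

```python
from typing import Dict, FrozenSet, Iterable, List

Facts = FrozenSet[str]
Problem = Dict[str, Facts]            # assumed keys: "objects", "init", "goal"
PARTS = ("objects", "init", "goal")


def ratio(flags: Iterable[bool]) -> float:
    """Proportion of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def r_part(generated: List[Problem], reference: List[Problem]) -> Dict[str, float]:
    """Per-part proportion of generated descriptions that contain every reference fact."""
    return {
        part: ratio(ref[part] <= gen[part] for gen, ref in zip(generated, reference))
        for part in PARTS
    }


def r_all(generated: List[Problem], reference: List[Problem]) -> float:
    """Proportion of problems in which every part contains every reference fact."""
    return ratio(
        all(ref[p] <= gen[p] for p in PARTS)
        for gen, ref in zip(generated, reference)
    )


ref = {
    "objects": frozenset({"red", "blue"}),
    "init": frozenset({"(on blue red)"}),
    "goal": frozenset({"(on red blue)"}),
}
gen_ok = {**ref, "init": frozenset({"(on blue red)", "(clear blue)"})}
gen_bad = {**gen_ok, "goal": frozenset()}   # goal fact missing

part_scores = r_part([gen_ok, gen_bad], [ref, ref])
all_score = r_all([gen_ok, gen_bad], [ref, ref])
```

In this toy case the second generated description lacks the goal fact, so the goal score and the overall score are both 0.5 while the object and initial-state scores are 1.0.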

（実験例）
第１実験例では、第１実施例の推論モジュールを用いて、３つのドメインそれぞれで問題記述を生成した。再プロンプトによる問題記述の修正は２回まで実行するように設定した。ドメイン毎に１００個の問題を用意した。入出力サンプルの組み合わせを変えて、１０個の問題記述を問題毎に生成した。生成された問題記述に対して上記４つの評価指標（R_syntax、R_plan、R_part、R_all）を計算した。 (Experimental Example)
In the first experimental example, problem descriptions were generated in each of the three domains using the inference module of the first example. Correction of the problem description by re-prompting was set to run up to two times. One hundred problems were prepared for each domain. Ten problem descriptions were generated for each problem by varying the combination of input/output samples. The four evaluation indices (R_syntax, R_plan, R_part, and R_all) were calculated for the generated problem descriptions.
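One way to vary the combination of input/output samples across the ten generations per problem is sketched below. The source does not say how the combinations were chosen, so everything here is an assumption: the function name `sample_fewshot_sets`, the existence of a larger pool of samples, the requirement that the combinations be distinct, and the seeded random selection.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def sample_fewshot_sets(
    pool: Sequence[str],   # hypothetical pool of available input/output samples
    shots: int = 3,        # three samples per prompt, as in the experiments
    runs: int = 10,        # ten generations per problem, as in the experiments
    seed: int = 0,
) -> List[Tuple[str, ...]]:
    """Pick `runs` distinct combinations of `shots` samples from the pool."""
    combos = list(itertools.combinations(pool, shots))
    if len(combos) < runs:
        raise ValueError("pool too small for the requested number of distinct combinations")
    # A seeded generator keeps the selection repeatable across runs.
    return random.Random(seed).sample(combos, runs)


fewshot_sets = sample_fewshot_sets(["s1", "s2", "s3", "s4", "s5", "s6"])
```

With a pool of six samples there are twenty possible three-sample combinations, which is enough to draw ten distinct ones.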

第２実験例では、第２実施例の推論モジュールを用いて、３つのドメインそれぞれで問題記述を生成した。生成された問題記述に対して３つの評価指標（R_syntax、R_plan、R_all）を計算した。それ以外の条件は、第１実験例と同一に設定した。 In the second experimental example, problem descriptions were generated in each of the three domains using the inference module of the second example. Three evaluation indices (R_syntax, R_plan, and R_all) were calculated for the generated problem descriptions. All other conditions were the same as in the first experimental example.

第３実験例では、第１実施例の推論モジュールを用いて、（Ａ）修正の再プロンプト（CR）及びChain-of-Thought(CoT)プロンプトあり（修正は２回まで）、（Ｂ）修正の再プロンプト（CR）及びChain-of-Thought(CoT)プロンプトあり（修正は１回まで）、（Ｃ）修正の再プロンプト（CR）あり及びChain-of-Thought(CoT)プロンプトなし（修正は１回まで）、並びに（Ｄ）修正の再プロンプト（CR）及びChain-of-Thought(CoT)プロンプトなしの４つの条件で、３つのドメインそれぞれで問題記述を生成した。生成された問題記述に対して３つの評価指標（R_syntax、R_plan、R_all）を計算した。それ以外の条件は、第１実験例と同一に設定した。 In the third experimental example, using the inference module of the first example, problem descriptions were generated in each of the three domains under four conditions: (A) with correction re-prompts (CR) and Chain-of-Thought (CoT) prompts (up to two corrections), (B) with CR and CoT prompts (up to one correction), (C) with CR but without CoT prompts (up to one correction), and (D) with neither CR nor CoT prompts. Three evaluation indices (R_syntax, R_plan, and R_all) were calculated for the generated problem descriptions. All other conditions were the same as in the first experimental example.

（実験結果）
図１２、図１３及び図１４は、第１実験例、第２実験例及び第３実験例における各評価指標の計算結果を示す。図１３の各括弧内の数値は、第１実験例の結果との差を示す。 (Experimental results)
FIGS. 12, 13, and 14 show the calculation results of each evaluation index in the first, second, and third experimental examples, respectively. The values in parentheses in FIG. 13 indicate the difference from the result of the first experimental example.

図１２に示されるとおり、第１実験例のハノイのドメインでは、R_planがやや低かったものの、各ドメインで、R_syntax及びR_planのスコアは高かった。すなわち、構文的に正しい問題記述を生成することができた。また、ブロックワールド及びハノイのドメインでは、R_allのスコアは低かったものの、料理のドメインでは、R_part及びR_allのスコアは共に高かった。そのため、問題記述の生成において、高い再現性を実現可能であることも分かった。これらの結果から、本実施形態によれば、有効な問題記述が生成可能であることが分かった。 As shown in FIG. 12, in the first experimental example the R_syntax and R_plan scores were high in every domain, although R_plan was somewhat lower in the Hanoi domain. In other words, syntactically correct problem descriptions could be generated. In addition, although the R_all score was low in the Blocksworld and Hanoi domains, both the R_part and R_all scores were high in the Cooking domain. This shows that high reproducibility is also achievable in generating problem descriptions. These results show that valid problem descriptions can be generated according to this embodiment.

また、図１３に示されるとおり、第２実施例の推論モジュールを用いても、第１実施例の推論モジュールと同等に、構文的に正しい問題記述を生成することができた。また、料理及びブロックワールドのドメインでは、R_allのスコアが低下したものの、ハノイのドメインでは、R_allのスコアは上昇した。これらの結果から、推論モジュールの構成として、図９のような一体的な構成を採用しても、有効な問題記述が生成可能であることが分かった。 As shown in FIG. 13, the inference module of the second example could also generate syntactically correct problem descriptions, on a par with the inference module of the first example. In addition, although the R_all score decreased in the Cooking and Blocksworld domains, it increased in the Hanoi domain. These results show that valid problem descriptions can be generated even when an integrated configuration such as that of FIG. 9 is adopted for the inference module.

また、図１４に示されるとおり、修正の再プロンプト（CR）及びChain-of-Thought(CoT)プロンプトを重ねることで、R_syntax、R_plan及びR_allのスコアは全て上昇した。この結果から、修正の再プロンプト（CR）及びChain-of-Thought(CoT)プロンプトが、適切な問題記述を得るのに有効であることが分かった。 As shown in FIG. 14, layering correction re-prompts (CR) and Chain-of-Thought (CoT) prompts raised the R_syntax, R_plan, and R_all scores in every case. This result shows that correction re-prompts (CR) and Chain-of-Thought (CoT) prompts are effective for obtaining appropriate problem descriptions.

本明細書は以下の開示を含む。
［付記１］
ロボット装置の動作する環境の観測データ、及び前記ロボット装置に与えるタスクの目標に関する指示情報を取得するステップ、
推論モジュールを用いて、取得された前記観測データ及び前記指示情報から前記タスクの問題記述を生成するステップであって、前記問題記述は、前記環境に存在する物体の初期状態及び目標状態の記述を含む、ステップ、並びに
生成された前記問題記述を出力するステップ、
を実行するように構成される制御部を備える、
情報処理装置。
［付記２］
生成される前記問題記述は、所定のフォーマットに従う、
付記１に記載の情報処理装置。
［付記３］
生成される前記問題記述は、前記環境に存在する前記物体の記述を更に含む、
付記１又は２に記載の情報処理装置。
［付記４］
前記観測データは、センサのセンシングデータにより構成され、
前記指示情報は、前記目標を自然言語で指示する言語情報により構成される、
付記１から付記３のいずれか一つに記載の情報処理装置。
［付記５］
前記推論モジュールは、コンテキスト内学習の訓練済みモデルを含むように構成され、
前記制御部は、生成された前記問題記述をプランナに与えて、前記ロボット装置の行動計画を生成する処理において、当該プランナがエラーメッセージを出力した場合、
出力された前記エラーメッセージを取得するステップ、並びに
前記推論モジュールを用いて、前記問題記述及び前記エラーメッセージから新たな問題記述を生成するステップ、
を更に実行するように構成される、
付記１から付記４のいずれか一つに記載の情報処理装置。
［付記６］
前記取得するステップでは、前記制御部は、前記環境に関する環境情報を更に取得するように構成され、
取得された前記観測データ及び前記指示情報から前記問題記述を生成することは、取得された前記観測データ、前記指示情報及び前記環境情報から前記問題記述を生成することにより構成される、
付記１から付記５のいずれか一つに記載の情報処理装置。
［付記７］
生成される前記問題記述は、前記環境に存在する前記物体の記述を更に含み、
前記推論モジュールは、物体推定器を含み、
前記推論モジュールを用いて、前記問題記述を生成することは、前記物体推定器を用いて、前記環境に存在する前記物体の記述を取得された前記観測データから生成することを含む、
付記１から付記６のいずれか一つに記載の情報処理装置。
［付記８］
前記物体推定器は、コンテキスト内学習の訓練済みモデルを備える、
付記７に記載の情報処理装置。
［付記９］
前記取得するステップでは、前記制御部は、前記環境に存在する前記物体の属性情報を更に取得するように構成され、
取得された前記観測データから前記物体の記述を生成することは、取得された前記観測データ及び前記属性情報から前記物体の記述を生成することにより構成される、
付記７又は付記８に記載の情報処理装置。
［付記１０］
前記推論モジュールは、初期状態推定器を含み、
前記推論モジュールを用いて、前記問題記述を生成することは、前記初期状態推定器を用いて、前記環境に存在する前記物体の前記初期状態の記述を生成することを含む、
付記１から付記９のいずれか一つに記載の情報処理装置。
［付記１１］
前記初期状態推定器は、コンテキスト内学習の訓練済みモデルを備える、
付記１０に記載の情報処理装置。
［付記１２］
前記推論モジュールは、目標推定器を含み、
前記推論モジュールを用いて、前記問題記述を生成することは、前記目標推定器を用いて、前記環境に存在する前記物体の前記目標状態の記述を生成することを含む、
付記１から付記１１に記載の情報処理装置。
［付記１３］
前記目標推定器は、コンテキスト内学習の訓練済みモデルを備える、
付記１２に記載の情報処理装置。
［付記１４］
コンピュータが、
ロボット装置の動作する環境の観測データ、及び前記ロボット装置に与えるタスクの目標に関する指示情報を取得するステップ、
推論モジュールを用いて、取得された前記観測データ及び前記指示情報から前記タスクの問題記述を生成するステップであって、前記問題記述は、前記環境に存在する物体の初期状態及び目標状態の記述を含む、ステップ、並びに
生成された前記問題記述を出力するステップ、
を実行する、
情報処理方法。
［付記１５］
コンピュータに、
ロボット装置の動作する環境の観測データ、及び前記ロボット装置に与えるタスクの目標に関する指示情報を取得するステップ、
推論モジュールを用いて、取得された前記観測データ及び前記指示情報から前記タスクの問題記述を生成するステップであって、前記問題記述は、前記環境に存在する物体の初期状態及び目標状態の記述を含む、ステップ、並びに
生成された前記問題記述を出力するステップ、
を実行させるための、
プログラム。 This specification includes the following disclosure.
[Appendix 1]
An information processing device comprising a control unit configured to execute:
a step of acquiring observation data of an environment in which a robot device operates, and instruction information regarding a goal of a task to be given to the robot device;
a step of generating a problem description of the task from the acquired observation data and instruction information using an inference module, the problem description including descriptions of an initial state and a goal state of an object present in the environment; and
a step of outputting the generated problem description.
[Appendix 2]
The information processing device according to Appendix 1, wherein the generated problem description follows a predetermined format.
[Appendix 3]
The information processing device according to Appendix 1 or 2, wherein the generated problem description further includes a description of the object present in the environment.
[Appendix 4]
The information processing device according to any one of Appendices 1 to 3, wherein the observation data consists of sensing data from a sensor, and the instruction information consists of linguistic information that specifies the goal in natural language.
[Appendix 5]
The information processing device according to any one of Appendices 1 to 4, wherein
the inference module is configured to include a trained model for in-context learning, and
the control unit is configured to further execute, when a planner outputs an error message in a process of giving the generated problem description to the planner to generate an action plan for the robot device:
a step of acquiring the output error message; and
a step of generating a new problem description from the problem description and the error message using the inference module.
[Appendix 6]
The information processing device according to any one of Appendices 1 to 5, wherein
in the acquiring step, the control unit is configured to further acquire environmental information regarding the environment, and
generating the problem description from the acquired observation data and instruction information consists of generating the problem description from the acquired observation data, instruction information, and environmental information.
[Appendix 7]
The information processing device according to any one of Appendices 1 to 6, wherein
the generated problem description further includes a description of the object present in the environment,
the inference module includes an object estimator, and
generating the problem description using the inference module includes generating the description of the object present in the environment from the acquired observation data using the object estimator.
[Appendix 8]
The information processing device according to Appendix 7, wherein the object estimator comprises a trained model for in-context learning.
[Appendix 9]
The information processing device according to Appendix 7 or 8, wherein
in the acquiring step, the control unit is configured to further acquire attribute information of the object present in the environment, and
generating the description of the object from the acquired observation data consists of generating the description of the object from the acquired observation data and attribute information.
[Appendix 10]
The information processing device according to any one of Appendices 1 to 9, wherein
the inference module includes an initial state estimator, and
generating the problem description using the inference module includes generating the description of the initial state of the object present in the environment using the initial state estimator.
[Appendix 11]
The information processing device according to Appendix 10, wherein the initial state estimator comprises a trained model for in-context learning.
[Appendix 12]
The information processing device according to any one of Appendices 1 to 11, wherein
the inference module includes a goal estimator, and
generating the problem description using the inference module includes generating the description of the goal state of the object present in the environment using the goal estimator.
[Appendix 13]
The information processing device according to Appendix 12, wherein the goal estimator comprises a trained model for in-context learning.
[Appendix 14]
An information processing method in which a computer executes:
a step of acquiring observation data of an environment in which a robot device operates, and instruction information regarding a goal of a task to be given to the robot device;
a step of generating a problem description of the task from the acquired observation data and instruction information using an inference module, the problem description including descriptions of an initial state and a goal state of an object present in the environment; and
a step of outputting the generated problem description.
[Appendix 15]
A program for causing a computer to execute:
a step of acquiring observation data of an environment in which a robot device operates, and instruction information regarding a goal of a task to be given to the robot device;
a step of generating a problem description of the task from the acquired observation data and instruction information using an inference module, the problem description including descriptions of an initial state and a goal state of an object present in the environment; and
a step of outputting the generated problem description.

１…情報処理装置、
１１…制御部、１２…記憶部、１３…外部インタフェース、
１４…入力装置、１５…出力装置、１６…ドライブ、
８１…プログラム、９１…記憶媒体、
１１１…取得部、１１２…生成部、１１３…出力部、
３…推論モジュール、３００…モジュールデータ、
３１…物体推定器、３３…初期状態推定器、
３５…目標推定器、
２０…観測データ、２１…指示情報、
２２…環境情報、
２３…ドメイン記述、２４…ドメイン情報、
２４１…属性情報、
５…問題記述、
５０…物体の記述、５１…初期状態の記述、
５２…目標状態の記述
1...information processing device,
11...control unit, 12...storage unit, 13...external interface,
14...input device, 15...output device, 16...drive,
81...program, 91...storage medium,
111...acquisition unit, 112...generation unit, 113...output unit,
3...inference module, 300...module data,
31...object estimator, 33...initial state estimator,
35...goal estimator,
20...observation data, 21...instruction information,
22...environmental information,
23...domain description, 24...domain information,
241...attribute information,
5...problem description,
50...description of objects, 51...description of initial state,
52...description of goal state

Claims

1. An information processing device comprising a control unit configured to execute:
a step of acquiring observation data of an environment in which a robot device operates, and instruction information regarding a goal of a task to be given to the robot device;
a step of generating a problem description of the task using an inference module, the inference module including a goal estimator, and the problem description including a description of a goal state of an object present in the environment, the description being generated from the acquired observation data and instruction information using the goal estimator; and
a step of outputting the generated problem description.

2. The information processing device according to claim 1, wherein the step of outputting the problem description includes outputting the problem description, including the description of the goal state, to a planner that generates an operation plan for the task of the robot device.

3. The information processing device according to claim 1, wherein the generated problem description follows a predetermined format.

4. The information processing device according to claim 1, wherein the generated problem description further includes a description of the object present in the environment.

5. The information processing device according to claim 1, wherein the observation data consists of sensing data from a sensor, and the instruction information consists of linguistic information that specifies the goal in natural language.

6. The information processing device according to claim 1, wherein
the inference module is configured to include a trained model for in-context learning, and
the control unit is further configured to execute:
a step of acquiring an error message for the generated problem description; and
a step of generating a new problem description from the problem description and the error message using the inference module.

7. The information processing device according to claim 1, wherein
in the acquiring step, the control unit is configured to further acquire environmental information regarding the environment, and
generating the problem description from the acquired observation data and instruction information consists of generating the problem description from the acquired observation data, instruction information, and environmental information.

8. The information processing device according to claim 1, wherein
the generated problem description further includes a description of the object present in the environment,
the inference module includes an object estimator, and
generating the problem description using the inference module includes generating the description of the object present in the environment from the acquired observation data using the object estimator.

9. The information processing device according to claim 8, wherein the object estimator comprises a trained model for in-context learning.

10. The information processing device according to claim 8 or 9, wherein
in the acquiring step, the control unit is configured to further acquire attribute information of the object present in the environment, and
generating the description of the object from the acquired observation data consists of generating the description of the object from the acquired observation data and attribute information.

11. The information processing device according to claim 1, wherein
the inference module includes an initial state estimator, and
generating the problem description using the inference module includes generating a description of an initial state of the object present in the environment using the initial state estimator.

12. The information processing device according to claim 11, wherein the initial state estimator comprises a trained model for in-context learning.

13. The information processing device according to claim 1, wherein the goal estimator comprises a trained model for in-context learning.

14. An information processing method in which a computer executes:
a step of acquiring observation data of an environment in which a robot device operates, and instruction information regarding a goal of a task to be given to the robot device;
a step of generating a problem description of the task using an inference module, the inference module including a goal estimator, and the problem description including a description of a goal state of an object present in the environment, the description being generated from the acquired observation data and instruction information using the goal estimator; and
a step of outputting the generated problem description.

15. A program for causing a computer to execute:
a step of acquiring observation data of an environment in which a robot device operates, and instruction information regarding a goal of a task to be given to the robot device;
a step of generating a problem description of the task using an inference module, the inference module including a goal estimator, and the problem description including a description of a goal state of an object present in the environment, the description being generated from the acquired observation data and instruction information using the goal estimator; and
a step of outputting the generated problem description.

16. The information processing device according to claim 5, wherein the instruction information in the natural language includes at least one of text input, voice input, and image input.

17. The information processing device according to claim 5, wherein
the sensing data includes at least one of image data, depth data, infrared data, measurement data of an optical sensor, radar data, LiDAR data, sound data, position data, measurement data of the robot device, and data obtained by communication, and
the measurement data of the robot device includes at least one of joint angles, hand positions, tactile data at the hand, force data at the hand, and posture measurement data.

18. The information processing device according to claim 1, wherein the description of the goal state includes position information of the target of the object.

19. The information processing device according to claim 1, wherein the problem description further includes an initial state of the object present in the environment.

20. The information processing device according to claim 6, wherein the error message is an error message output by a planner in a process of generating a behavior plan for the robot device by giving the generated problem description to the planner.