JP7793386B2

JP7793386B2 - Processing system and processing method

Info

Publication number: JP7793386B2
Application number: JP2022002707A
Authority: JP
Inventors: 研一中里
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2026-01-05
Anticipated expiration: 2042-01-12
Also published as: JP2023102322A

Description

The present invention relates to a processing system and a processing method.

In recent years, various artificial intelligence (AI) technologies have been proposed that use a behavior model to determine an action according to the state. For example, Patent Document 1 discloses a technique for training a behavior model by reinforcement learning.

Patent Document 1: US Patent Application Publication No. 2010/094786

Artificial intelligence technology could be applied at work sites such as factory production floors. At such work sites, it is important to pass on the advanced skills of experienced workers. However, skill transfer requires experienced workers to set aside time for it. It is therefore desirable to transfer skills at the work site efficiently.

In view of this problem, an object of the present invention is to provide a processing system and a processing method that enable efficient skill transfer at a work site.

To solve the above problem, a processing system determines actions using a behavior model for determining an action according to a state. The behavior model includes a first behavior model, which is an imitation model that imitates the actions of a first worker, and a second behavior model, which is an imitation model that imitates the actions of a second worker. The processing system includes a first determination unit that determines actions using the first behavior model, a first learning unit that trains the first behavior model, and a selection unit that selects a specific state from a plurality of states. The first learning unit trains the first behavior model by reinforcement learning in each of the plurality of states, and the selection unit compares the trained first behavior model with the second behavior model for each of the plurality of states and selects the specific state based on the comparison results.

To solve the above problem, a processing method determines actions using a behavior model for determining an action according to a state. The behavior model includes a first behavior model, which is an imitation model that imitates the actions of a first worker, and a second behavior model, which is an imitation model that imitates the actions of a second worker. The method includes a first step of determining actions using the first behavior model, a second step of training the first behavior model, and a third step of selecting a specific state from a plurality of states. In the second step, the first behavior model is trained by reinforcement learning in each of the plurality of states. In the third step, the trained first behavior model is compared with the second behavior model for each of the plurality of states, and the specific state is selected based on the comparison results.

According to the present invention, skills can be transferred efficiently at a work site.

FIG. 1 is a schematic diagram illustrating the general configuration of a processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the functional configuration of a first processing device according to the embodiment.
FIG. 3 is a block diagram showing an example of the functional configuration of a second processing device according to the embodiment.
FIG. 4 is a flowchart showing an example of the overall flow of processing performed by a first robot, a second robot, and a first worker according to the embodiment.
FIG. 5 is a diagram showing the entities that execute the work state selection process according to the embodiment.
FIG. 6 is a flowchart showing an example of the processing flow of the work state selection process according to the embodiment.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. The dimensions, materials, and other specific values shown in these embodiments are merely examples to aid understanding of the invention and, unless otherwise specified, do not limit the present invention. In this specification and the drawings, elements having substantially the same function and configuration are given the same reference numerals to avoid redundant explanation, and elements not directly related to the present invention are omitted from the drawings.

<Configuration of Processing System>
The configuration of a processing system 1 according to an embodiment of the present invention is described with reference to FIGS. 1 to 3.

FIG. 1 is a schematic diagram showing the general configuration of the processing system 1.

As shown in FIG. 1, the processing system 1 includes a first robot 10 and a second robot 20. The processing system 1 is used at a work site such as a factory production floor. The following describes an example in which the processing system 1 is used in a production process on a factory production floor, that is, a process for producing products. However, the processing system 1 may also be used at a work site other than one where products are produced (for example, a site where products are inspected).

The first robot 10 and the second robot 20 use artificial intelligence to determine and execute actions. Specifically, each robot has a behavior model for determining an action according to the state and uses that model to determine and execute actions. The first robot 10 includes a first processing device 11 that performs various processes related to the first robot 10. The first processing device 11 determines the actions of the first robot 10 using a first behavior model π1 and causes the first robot 10 to execute them. The second robot 20 includes a second processing device 21 that performs various processes related to the second robot 20. The second processing device 21 determines the actions of the second robot 20 using a second behavior model π2 and causes the second robot 20 to execute them.

In the production process where the processing system 1 is used, products are produced by a plurality of workers, including a first worker 30 and a second worker 40. The first worker 30 is an inexperienced worker who is less skilled than the second worker 40; for example, the first worker 30 has fewer years of work experience than the second worker 40. Conversely, the second worker 40 is a skilled worker who is more skilled than the first worker 30; for example, the second worker 40 has more years of work experience than the first worker 30.

The first robot 10 can imitate the actions of the first worker 30. In the example of FIG. 1, the first robot 10 is a humanoid robot, but it may be a robot other than a humanoid robot.

The second robot 20 can imitate the actions of the second worker 40. In the example of FIG. 1, the second robot 20 is a humanoid robot, but it may be a robot other than a humanoid robot.

In the production process where the processing system 1 is used, it is important to pass the advanced skills of the skilled second worker 40 on to the inexperienced first worker 30. As described below, by using the first robot 10 and the second robot 20, the processing system 1 promotes improvement of the first worker 30's skills and enables efficient skill transfer.

FIG. 2 is a block diagram showing an example of the functional configuration of the first processing device 11.

The first processing device 11 includes a CPU (Central Processing Unit), which is an arithmetic processing unit; a ROM (Read Only Memory), which is a memory element that stores programs used by the CPU, calculation parameters, and the like; and a RAM (Random Access Memory), which is a memory element that temporarily stores parameters that change as appropriate while the CPU executes.

As shown in FIG. 2, the first processing device 11 includes, for example, a first acquisition unit 11a, a first determination unit 11b, a first control unit 11c, a first learning unit 11d, a selection unit 11e, and a first memory unit 11f.

The first acquisition unit 11a acquires various information needed to control the first robot 10. As described above, the actions of the first robot 10 are determined using the first behavior model π1, so the first acquisition unit 11a acquires the information needed to determine actions with the first behavior model π1.

The first determination unit 11b determines the action to be executed by the first robot 10. Specifically, the first determination unit 11b determines the action of the first robot 10 using the first behavior model π1. The first behavior model π1 is stored in the first memory unit 11f and is an imitation model that imitates the actions of the first worker 30. The first behavior model π1 is, for example, a function that receives a state-action pair as input and outputs an evaluation index (degree of recommendation) for the input action.
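As a rough illustration of this input/output contract, the sketch below models π1 as a lookup table from (state, action) pairs to an evaluation index. The class name `TabularBehaviorModel`, the dictionary storage, and the default score of 0.0 for unseen pairs are assumptions for illustration only; the description does not specify how π1 is represented (it could equally be a neural network).

```python
from typing import Hashable


class TabularBehaviorModel:
    """Hypothetical behavior model: maps a (state, action) pair to an evaluation index."""

    def __init__(self, default_score: float = 0.0) -> None:
        # Evaluation index (degree of recommendation) per (state, action) pair.
        self._scores: dict[tuple[Hashable, Hashable], float] = {}
        self._default_score = default_score  # assumed score for unseen pairs

    def __call__(self, state: Hashable, action: Hashable) -> float:
        """Return the evaluation index of `action` in `state`."""
        return self._scores.get((state, action), self._default_score)

    def set_score(self, state: Hashable, action: Hashable, score: float) -> None:
        """Overwrite the evaluation index (as a learning unit would)."""
        self._scores[(state, action)] = score


pi1 = TabularBehaviorModel()
pi1.set_score("box_on_floor", "lift", 0.8)
print(pi1("box_on_floor", "lift"), pi1("box_on_floor", "push"))  # 0.8 0.0
```

Making the model callable keeps it interchangeable with any other function of the form `model(state, action) -> float`.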

Examples of the information input to the first behavior model π1 as the state include information about the production site (such as the type of work or the materials used in the work), information about the first robot 10 (such as its position and posture), and information about the workers (such as their positions and postures). This information is acquired by the first acquisition unit 11a. For example, the first acquisition unit 11a can acquire the information about the production site and the workers from images captured by a camera mounted on the first robot 10, and the information about the first robot 10 from the outputs of various sensors mounted on the first robot 10.
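One possible way to bundle these three kinds of information into a single state value is a frozen dataclass, which is hashable and can therefore serve as a key for a table-based model. The field names and the (x, y, heading) pose encoding below are assumptions; the description lists only the categories of information, not their format.

```python
from dataclasses import dataclass

Pose = tuple[float, float, float]  # assumed encoding: (x [m], y [m], heading [rad])


@dataclass(frozen=True)
class State:
    """Hypothetical state fed to the behavior model."""
    work_type: str     # production-site info: type of work
    material: str      # production-site info: material used in the work
    robot_pose: Pose   # first-robot info: position and posture (from robot sensors)
    worker_pose: Pose  # worker info: position and posture (from camera images)


s = State("organize_boxes", "cardboard", (0.0, 0.0, 0.0), (1.5, 0.2, 3.14))
scores = {(s, "lift"): 0.7}  # frozen dataclass -> hashable -> usable as a dict key
print(scores[(s, "lift")])  # 0.7
```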

For example, once the state is determined, several candidate actions are determined. The first determination unit 11b inputs each candidate action into the first behavior model π1 and compares the resulting evaluation indices to determine the action of the first robot 10. Specifically, the first determination unit 11b selects the action with the highest evaluation index as the action to be executed by the first robot 10.
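The selection rule in this paragraph is a greedy argmax over the candidate actions. A minimal sketch, assuming the behavior model is any callable `model(state, action) -> float` and that the candidate actions for the state are already known (how candidates are generated is not described here):

```python
from typing import Callable, Hashable, Sequence

BehaviorModel = Callable[[Hashable, Hashable], float]


def decide_action(model: BehaviorModel, state: Hashable,
                  candidates: Sequence[Hashable]) -> Hashable:
    """Return the candidate action with the highest evaluation index.

    Ties go to the earliest candidate, which is how max() behaves.
    """
    if not candidates:
        raise ValueError("no candidate actions for this state")
    return max(candidates, key=lambda action: model(state, action))


# Example with a hand-written scoring table standing in for π1.
table = {("s0", "lift"): 0.2, ("s0", "push"): 0.9, ("s0", "wait"): 0.1}
print(decide_action(lambda s, a: table[(s, a)], "s0", ["lift", "push", "wait"]))  # push
```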

The first control unit 11c causes the first robot 10 to execute the action determined by the first determination unit 11b using the first behavior model π1. For example, the first control unit 11c can control the first robot 10 by controlling actuators, such as motors, provided in the first robot 10.

The first learning unit 11d trains the first behavior model π1. The training of the first behavior model π1 by the first learning unit 11d is described in detail later.

The selection unit 11e selects a specific state from a plurality of states. Specifically, as described later, the selection unit 11e selects, from a plurality of work states, the work state best suited to promoting improvement of the first worker 30's skills. The work state selection process is described in detail later.

A work state is a state of the work environment, for example, the positions and number of boxes in a box-organizing task. The following describes the case where the first worker 30 organizes boxes. In this case, by organizing boxes in the work state selected by the selection unit 11e, the first worker 30 can efficiently improve his or her skill at the task. However, the task in which the first worker 30 is to improve may be a task other than organizing boxes.

The first memory unit 11f stores various information needed to control the first robot 10. Specifically, the first memory unit 11f stores the first behavior model π1, which is updated as the first learning unit 11d performs training.

FIG. 3 is a block diagram showing an example of the functional configuration of the second processing device 21.

The second processing device 21 includes a CPU, which is an arithmetic processing unit; a ROM, which is a memory element that stores programs used by the CPU, calculation parameters, and the like; and a RAM, which is a memory element that temporarily stores parameters that change as appropriate while the CPU executes.

As shown in FIG. 3, the second processing device 21 includes, for example, a second acquisition unit 21a, a second determination unit 21b, a second control unit 21c, a second learning unit 21d, and a second memory unit 21e.

The second acquisition unit 21a acquires various information needed to control the second robot 20. As described above, the actions of the second robot 20 are determined using the second behavior model π2, so the second acquisition unit 21a acquires the information needed to determine actions with the second behavior model π2.

The second determination unit 21b determines the action to be executed by the second robot 20. Specifically, the second determination unit 21b determines the action of the second robot 20 using the second behavior model π2. The second behavior model π2 is stored in the second memory unit 21e and is an imitation model that imitates the actions of the second worker 40.

Like the first behavior model π1, the second behavior model π2 is, for example, a function that receives a state-action pair as input and outputs an evaluation index (degree of recommendation) for the input action. The second determination unit 21b inputs each candidate action for the current state into the second behavior model π2 and compares the resulting evaluation indices to determine the action of the second robot 20. Specifically, the second determination unit 21b selects the action with the highest evaluation index as the action to be executed by the second robot 20. The information input to the second behavior model π2 as the state is the same as that input to the first behavior model π1, and the second acquisition unit 21a acquires it in the same manner as the first acquisition unit 11a.

The second control unit 21c causes the second robot 20 to execute the action determined by the second determination unit 21b using the second behavior model π2. For example, the second control unit 21c can control the second robot 20 by controlling actuators, such as motors, provided in the second robot 20.

The second learning unit 21d trains the second behavior model π2. The training of the second behavior model π2 by the second learning unit 21d is described in detail later.

The second memory unit 21e stores various information needed to control the second robot 20. Specifically, the second memory unit 21e stores the second behavior model π2, which is updated as the second learning unit 21d performs training.

<Operation of Processing System>
The operation of the processing system 1 according to the embodiment of the present invention is described with reference to FIGS. 4 to 6.

As described above, the processing system 1 is used, for example, on a factory production floor to promote improvement of the first worker 30's skills. In the processing system 1, the processing for this purpose is performed by the first robot 10, the second robot 20, and the first worker 30. The overall flow of this processing is described below with reference to FIGS. 4 and 5, followed by the details of the work state selection process with reference to FIG. 6.

FIG. 4 is a flowchart showing an example of the overall flow of processing performed by the first robot 10, the second robot 20, and the first worker 30. Step S101 in FIG. 4 corresponds to the start of the processing flow shown in FIG. 4.

When the processing flow shown in FIG. 4 starts, the first behavior model π1 is trained by imitation learning in step S102. This training is performed by the first learning unit 11d of the first robot 10.

In step S102, the first learning unit 11d trains the first behavior model π1 by imitation learning based on the actions taken by the first worker 30. For example, the actions the first worker 30 takes when actually performing work are recorded during or after step S104, described later. In this record, each action taken by the first worker 30 is linked to the state in which it was taken (for example, the type of work). Imitation learning of the first behavior model π1 uses the record obtained in this way. In this imitation learning, the first learning unit 11d causes the first robot 10 to take various actions and calculates a reward for each action. The reward is calculated so that the closer (more similar) the action the first robot 10 takes in a given state is to the first worker 30's action in that state, the higher the reward. The first learning unit 11d calculates such rewards for each state. The reward obtained in this way corresponds to the evaluation index (degree of recommendation) of an action described above. The first learning unit 11d then uses the relationship between the obtained state-action pairs and rewards to update the mapping in the first behavior model π1 from input (state-action pair) to output (evaluation index, or degree of recommendation, of the action).
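The reward design described for step S102 can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes actions are numeric vectors (for example, target coordinates), measures closeness with Euclidean distance mapped to a reward of 1/(1 + d), and folds each reward into a table of evaluation indices with a simple moving-average update whose rate `lr` is a hypothetical parameter.

```python
import math
from typing import Hashable, Sequence

Action = tuple[float, ...]


def imitation_reward(robot_action: Action, worker_action: Action) -> float:
    """Higher the closer the robot's action is to the worker's recorded action (1.0 if identical)."""
    return 1.0 / (1.0 + math.dist(robot_action, worker_action))


def imitation_update(scores: dict[tuple[Hashable, Action], float],
                     demonstrations: Sequence[tuple[Hashable, Action]],
                     candidates: Sequence[Action],
                     lr: float = 0.5) -> dict[tuple[Hashable, Action], float]:
    """Update evaluation indices from recorded (state, worker_action) pairs.

    For every recorded state the robot "tries" each candidate action, receives an
    imitation reward, and the stored score moves a fraction `lr` toward that reward.
    """
    for state, worker_action in demonstrations:
        for action in candidates:
            reward = imitation_reward(action, worker_action)
            old = scores.get((state, action), 0.0)
            scores[(state, action)] = old + lr * (reward - old)
    return scores


record = [("stack_boxes", (1.0, 0.0))]  # worker moved a box to (1, 0)
tried = [(1.0, 0.0), (0.0, 0.0)]        # actions the robot tries
print(imitation_update({}, record, tried, lr=1.0))
# {('stack_boxes', (1.0, 0.0)): 1.0, ('stack_boxes', (0.0, 0.0)): 0.5}
```

The same routine would apply unchanged to the second behavior model π2 if fed the second worker's records.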

After step S102, in step S103, a selection process is performed to select, from a plurality of work states, the work state best suited to promoting improvement of the first worker 30's skills. FIG. 5 is a diagram showing the entities that execute the work state selection process. As shown in FIG. 5, the work state selection process is performed by the first robot 10 and the second robot 20. The work state selection process is described in detail later.

After step S103 in FIG. 4, in step S104, the first worker 30 performs work in the work state selected by the selection process of step S103. This allows the first worker 30 to improve his or her skill at the work efficiently. If the first worker 30's skills do not improve quickly in step S104, the first worker 30 may be made to keep working until his or her skills improve to some extent, or the work state given to the first worker 30 may be changed (modified) as appropriate.

In the processing flow shown in FIG. 4, after step S104 the process returns to step S102, and steps S102, S103, and S104 are repeated. Thus, the processing for promoting improvement of the first worker 30's skills (steps S102, S103, and S104) is repeated while the first behavior model π1 is updated as the first worker 30 improves.
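The repeated S102, S103, S104 cycle amounts to a simple outer loop. In this sketch the three steps are passed in as callables with hypothetical names, and the loop is bounded by a `rounds` argument for illustration, whereas the flowchart in FIG. 4 simply returns to step S102.

```python
from typing import Callable, Hashable


def skill_transfer_loop(train_imitation: Callable[[], None],
                        select_work_state: Callable[[], Hashable],
                        practice: Callable[[Hashable], None],
                        rounds: int) -> list[Hashable]:
    """Run the FIG. 4 cycle `rounds` times and return the work states chosen."""
    chosen: list[Hashable] = []
    for _ in range(rounds):
        train_imitation()            # S102: imitation learning of π1 from the first worker's records
        state = select_work_state()  # S103: work state selection process (FIG. 6)
        practice(state)              # S104: first worker practises in that state; actions are recorded
        chosen.append(state)
    return chosen


log: list[str] = []
result = skill_transfer_loop(
    train_imitation=lambda: log.append("S102"),
    select_work_state=lambda: "work_state_2",
    practice=lambda ws: log.append(f"S104:{ws}"),
    rounds=2,
)
print(result, log)
```

Because S104 produces new records of the first worker's actions, the next pass through S102 trains π1 on the worker's improved behavior.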

FIG. 6 is a flowchart showing an example of the processing flow of the work state selection process. Step S201 in FIG. 6 corresponds to the start of the processing flow shown in FIG. 6, and step S205 corresponds to its end. The processing flow shown in FIG. 6 is an example of the processing performed in step S103 of the processing flow shown in FIG. 4.

When the processing flow shown in FIG. 6 starts, in step S202, the first learning unit 11d of the first robot 10 trains the first behavior model π1 by reinforcement learning in each of the plurality of work states.

In step S202, several candidate work states are prepared in advance, and the first learning unit 11d trains the first behavior model π1 for each prepared candidate. For ease of understanding, the following describes a case in which three candidate work states for the box-organizing task are prepared: a first work state, a second work state, and a third work state, each with a different combination of box positions and box count. However, the number of candidate work states may be other than three, and the types of work states are not limited to this example.
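For the box-organizing example, a candidate work state could be represented as the set of grid cells occupied by boxes, from which the box count follows. The grid encoding, the class name `WorkState`, and the three concrete layouts below are invented for illustration; the description does not give the actual positions or counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkState:
    """Hypothetical box-organizing work state: grid cells (row, col) occupied by boxes."""
    box_positions: tuple[tuple[int, int], ...]

    @property
    def box_count(self) -> int:
        return len(self.box_positions)


# Three illustrative candidates, each a different combination of positions and count.
CANDIDATES = (
    WorkState(((0, 0), (0, 1))),
    WorkState(((0, 0), (2, 3), (4, 1))),
    WorkState(((1, 1), (3, 3), (3, 4), (5, 0))),
)
print([ws.box_count for ws in CANDIDATES])  # [2, 3, 4]
```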

In the above example with three candidate work states, the first learning unit 11d first trains the first behavior model π1 by reinforcement learning in the first work state. In this reinforcement learning, the first learning unit 11d, for example, causes the first robot 10 to take various actions in a simulated work state that mimics the actual first work state, and calculates a reward for each action taken. The reward is calculated, for example, so that the more efficient the action taken by the first robot 10 (for example, the shorter the work time or the higher the success rate), the higher the reward. The first learning unit 11d calculates such a reward for each action and then uses the relationship between the obtained actions and rewards to update the mapping in the first behavior model π1 from input (state-action pair) to output (evaluation index, or degree of recommendation, of the action). The first behavior model π1 after reinforcement learning for the first work state is stored, for example, in the first memory unit 11f of the first robot 10.
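The efficiency-based reward and the model update can be sketched with standard one-step Q-learning. Q-learning is a substitute chosen for illustration: the description says only that the input/output mapping of π1 is updated from the obtained rewards, without naming an algorithm. The reward shape (1 for success, minus a small penalty per second of work time) and the values of `alpha`, `gamma`, and `time_penalty` are assumptions.

```python
from typing import Hashable, Iterable

QTable = dict[tuple[Hashable, Hashable], float]


def efficiency_reward(task_time_s: float, succeeded: bool,
                      time_penalty: float = 0.01) -> float:
    """Higher for successful and faster actions (assumed reward shape)."""
    return (1.0 if succeeded else 0.0) - time_penalty * task_time_s


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Iterable[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


q: QTable = {}
r = efficiency_reward(task_time_s=12.0, succeeded=True)  # approx. 0.88
q_update(q, "ws1/start", "pick_left_box", r, "ws1/done", next_actions=[])
print(q)
```

Here the Q-table entries play the role of π1's evaluation indices, so the greedy selection described for the first determination unit 11b could read them directly.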

Next, the first learning unit 11d restores the first behavior model π1 to its state before the reinforcement learning for the first work state, and then trains the first behavior model π1 by reinforcement learning in the second work state. The reinforcement learning for the second work state is the same as that for the first work state, so its description is omitted. The first behavior model π1 after reinforcement learning for the second work state is stored, for example, in the first memory unit 11f of the first robot 10.

Next, the first learning unit 11d restores the first behavior model π1 to its state before the reinforcement learning for the second work state, and then trains the first behavior model π1 by reinforcement learning in the third work state. The reinforcement learning for the third work state is also the same as that for the first work state, so its description is omitted. The first behavior model π1 after reinforcement learning for the third work state is stored, for example, in the first memory unit 11f of the first robot 10.
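The per-candidate procedure of step S202 (train in the first work state, restore π1, train in the second, and so on) is equivalent to starting each run from a saved copy of the pre-training model. A sketch, assuming π1 is a plain dictionary of scores and that `train(model, work_state)` is a hypothetical callable performing the reinforcement learning in place:

```python
import copy
from typing import Callable, Hashable, Iterable


def learn_per_work_state(base_model: dict,
                         work_states: Iterable[Hashable],
                         train: Callable[[dict, Hashable], None]) -> dict[Hashable, dict]:
    """Train an independent copy of π1 for each candidate work state.

    Every run starts from the same snapshot, so learning in one work state
    never leaks into the model trained for another; base_model is left untouched.
    """
    snapshot = copy.deepcopy(base_model)
    learned: dict[Hashable, dict] = {}
    for ws in work_states:
        model = copy.deepcopy(snapshot)  # restore π1 to its pre-RL state
        train(model, ws)                 # reinforcement learning in this work state
        learned[ws] = model              # e.g. kept in the first memory unit 11f
    return learned


def fake_train(model: dict, ws: Hashable) -> None:
    model[("trained_in", ws)] = 1.0      # stand-in for real RL updates


pi1 = {("s0", "lift"): 0.5}
models = learn_per_work_state(pi1, ["ws1", "ws2", "ws3"], fake_train)
print(sorted(models), pi1)
```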

Following step S202, in step S203, the selection unit 11e of the first robot 10 compares the post-learning first behavioral model π1 with the second behavioral model π2 for each of the plurality of work states (in the above example, the first, second, and third work states).

As described above, the second behavioral model π2 is an imitation model that imitates the behavior of the second worker 40, and the second behavioral model π2 is learned by the second learning unit 21d of the second robot 20. Specifically, the second learning unit 21d learns the second behavioral model π2 through imitation learning based on the behavior performed by the second worker 40. For example, as with the imitation learning of the first behavioral model π1, the imitation learning of the second behavioral model π2 is performed using a record of the second worker 40's behavior. In this imitation learning, the second learning unit 21d causes the second robot 20 to perform various actions and calculates a reward for each action performed by the second robot 20. Here, the second learning unit 21d calculates the reward so that the closer (more similar) the action performed by the second robot 20 in a given state is to the second worker 40's action in that state, the higher the reward. The second learning unit 21d calculates such a reward for each state. Then, using the obtained relationship between state-action pairs and rewards, the second learning unit 21d updates the relationship between the input (a state-action pair) and the output (an evaluation index of the action, i.e., its degree of recommendation) in the second behavioral model π2.
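A minimal sketch of this closeness-based reward follows, assuming numeric actions, a recorded list of (state, worker action) pairs, and the same tabular (state, action) model as above. The exponential-decay reward and the names `imitation_reward`, `imitation_learn`, and `best_action` are illustrative assumptions; the description only requires that the reward increase as the robot's action approaches the worker's.

```python
import math
import random
from typing import Dict, Hashable, List, Tuple

Model = Dict[Tuple[Hashable, float], float]


def imitation_reward(robot_action: float, worker_action: float, scale: float = 1.0) -> float:
    """Hypothetical closeness reward: 1.0 for an exact match, decaying with distance."""
    return math.exp(-abs(robot_action - worker_action) / scale)


def imitation_learn(
    model: Model,
    demonstrations: List[Tuple[Hashable, float]],
    candidate_actions: List[float],
    rounds: int = 50,
    alpha: float = 0.5,
    seed: int = 0,
) -> Model:
    """demonstrations: recorded (state, worker_action) pairs.
    In each state the robot tries an action, is rewarded by how close it is to
    the worker's action, and that (state, action) evaluation index moves toward the reward."""
    rng = random.Random(seed)
    for _ in range(rounds):
        for state, worker_action in demonstrations:
            action = rng.choice(candidate_actions)
            reward = imitation_reward(action, worker_action)
            old = model.get((state, action), 0.0)
            model[(state, action)] = old + alpha * (reward - old)
    return model


def best_action(model: Model, state: Hashable, candidate_actions: List[float]) -> float:
    """Action with the highest evaluation index (degree of recommendation) in `state`."""
    return max(candidate_actions, key=lambda a: model.get((state, a), float("-inf")))


# Hypothetical record of the second worker's actions in two states.
record = [("state_1", 1.0), ("state_2", 3.0)]
actions = [0.0, 1.0, 2.0, 3.0]
pi2 = imitation_learn({}, record, actions)
```

The learned π2 then recommends, in each recorded state, the candidate action closest to what the worker did.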

In step S203, the selection unit 11e compares each post-learning first behavioral model π1, obtained in step S202 by learning in each candidate work state, with the second behavioral model π2. Specifically, the selection unit 11e calculates the similarity of each post-learning first behavioral model π1 to the second behavioral model π2. The more the post-learning first behavioral model π1 resembles the second behavioral model π2, the higher this similarity. The similarity may be obtained, for example, by comparing the outputs of the two behavioral models when various pieces of input information are fed to each of them, or by a calculation such as one using a correlation function.
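One possible reading of "comparing outputs for various inputs" combined with "a correlation function" is the Pearson correlation of the two models' evaluation indices over a set of probe (state, action) inputs. The sketch below assumes tabular models; `model_similarity` is a hypothetical name, and treating unseen pairs as 0.0 and constant outputs as zero similarity are assumptions, not details from the source.

```python
import math
from typing import Dict, Hashable, List, Tuple

Model = Dict[Tuple[Hashable, Hashable], float]


def model_similarity(model_a: Model, model_b: Model, probes: List[Tuple[Hashable, Hashable]]) -> float:
    """Pearson correlation between two models' evaluation indices over probe
    (state, action) inputs; pairs a model has never seen are scored 0.0 (assumption)."""
    if not probes:
        raise ValueError("need at least one probe input")
    xs = [model_a.get(p, 0.0) for p in probes]
    ys = [model_b.get(p, 0.0) for p in probes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # correlation is undefined for constant outputs; treated as 0 here
    return cov / (norm_x * norm_y)


probes = [("s", "a"), ("s", "b"), ("s", "c")]
pi2 = {("s", "a"): 1.0, ("s", "b"): 0.5, ("s", "c"): 0.0}
close = {("s", "a"): 0.9, ("s", "b"): 0.6, ("s", "c"): 0.1}
reversed_ranking = {("s", "a"): 0.0, ("s", "b"): 0.5, ("s", "c"): 1.0}
```

Correlation compares the shape of the two models' preferences rather than their absolute scale, so a post-learning π1 that ranks actions the same way as π2 scores close to 1.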

For example, in the above example in which three work states are prepared as candidates, the selection unit 11e first calculates the similarity to the second behavioral model π2 of the first behavioral model π1 after reinforcement learning for the first work state. Next, the selection unit 11e calculates the similarity to the second behavioral model π2 of the first behavioral model π1 after reinforcement learning for the second work state. Then, the selection unit 11e calculates the similarity to the second behavioral model π2 of the first behavioral model π1 after reinforcement learning for the third work state.

Following step S203, in step S204, the selection unit 11e of the first robot 10 selects, as the optimal work state, the state among the plurality of work states (in the above example, the first, second, and third work states) in which the similarity of the post-learning first behavioral model π1 to the second behavioral model π2 is highest, and the processing flow shown in FIG. 6 ends.
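Steps S203 and S204 together amount to scoring each stored post-learning π1 against π2 and taking the argmax. The sketch below assumes dictionary models keyed by candidate state; `select_specific_state` is a hypothetical name, and the default `neg_l1_similarity` (negative L1 distance) is only a stand-in for whatever similarity measure the system actually uses, such as the correlation described above.

```python
from typing import Callable, Dict, Hashable, Tuple

Model = Dict[Tuple[Hashable, Hashable], float]


def neg_l1_similarity(model: Model, reference: Model) -> float:
    """Stand-in similarity (assumption): negative L1 distance over the union of keys."""
    keys = set(model) | set(reference)
    return -sum(abs(model.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)


def select_specific_state(
    learned_models: Dict[Hashable, Model],
    expert_model: Model,
    similarity: Callable[[Model, Model], float] = neg_l1_similarity,
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Score each post-RL π1 against π2 (S203) and return the most similar state (S204)."""
    scores = {state: similarity(model, expert_model) for state, model in learned_models.items()}
    best_state = max(scores, key=scores.get)  # highest similarity wins
    return best_state, scores


# Hypothetical post-RL models for three candidate work states.
pi2 = {("x", "a"): 1.0, ("x", "b"): 0.0}
learned = {
    "state_1": {("x", "a"): 0.2, ("x", "b"): 0.8},
    "state_2": {("x", "a"): 0.9, ("x", "b"): 0.1},
    "state_3": {("x", "a"): 0.5, ("x", "b"): 0.5},
}
best, scores = select_specific_state(learned, pi2)
```

Returning the full score table alongside the chosen state is a design choice for inspection; the description itself only needs the selected state.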

As described above, the second behavioral model π2 is an imitation model that imitates the behavior of the second worker 40, who is a skilled worker. Therefore, the higher the similarity of the post-learning first behavioral model π1 to the second behavioral model π2 (that is, the more the post-learning first behavioral model π1 resembles the second behavioral model π2), the closer the behavior of the first robot 10 using the post-learning first behavioral model π1 is to the behavior of the second worker 40. Accordingly, the state with the highest similarity is the optimal work state for promoting improvement in the skills of the first worker 30 and for efficiently bringing those skills closer to the skills of the second worker 40.

As described above, in the processing system 1, the first learning unit 11d of the first robot 10 learns the first behavioral model π1 through reinforcement learning in each of the plurality of work states, and the selection unit 11e compares the post-learning first behavioral model π1 with the second behavioral model π2 for each of those work states and selects the optimal work state based on the results of the comparison. This makes it possible to select the work state best suited to promoting improvement in the first worker 30's skills, and to give that work state to the first worker 30.

By having the first worker 30 work in such a work state, improvement of the first worker 30's skills can be promoted. As a result, the first worker 30's skills can be brought efficiently closer to those of the second worker 40, who is a skilled worker. In this way, the processing system 1 can efficiently improve the skills of the first worker 30 while saving the second worker 40's time, thereby enabling efficient skill transfer at the work site.

Also, as described above, after the first worker 30 has worked in the work state selected by the selection unit 11e, the first learning unit 11d of the first robot 10 learns the first behavioral model π1 through imitation learning based on the behavior performed by the first worker 30. Specifically, the first worker 30 works in the optimal work state selected in the processing flow of FIG. 6 described above (step S104 in FIG. 4), and the first worker 30's skills thereby improve. The first behavioral model π1 is then learned through imitation learning (step S102 in FIG. 4). As a result, the first behavioral model π1 can be updated so that it imitates the behavior of the first worker 30 after this improvement.

The above describes an example in which the second behavioral model π2 is an imitation model that imitates the behavior of a single second worker 40. However, the second behavioral model π2 is not limited to this. For example, a plurality of mutually different imitation models, each imitating the behavior of one of a plurality of second workers 40, may be prepared, and the second behavioral model π2 may be generated by integrating them; for instance, the arithmetic mean of these imitation models may be used as the second behavioral model π2. Generating the second behavioral model π2 by integrating a plurality of imitation models of the second workers 40's behavior in this way allows a wider variety of skills to be passed on.
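For tabular models, the arithmetic-mean integration can be sketched as an entry-wise average of evaluation indices. This assumes every imitation model shares the same (state, action) keying; `integrate_by_mean` is a hypothetical name, and counting a pair missing from one model as 0.0 is an assumption the source does not address.

```python
from typing import Dict, Hashable, List, Tuple

Model = Dict[Tuple[Hashable, Hashable], float]


def integrate_by_mean(models: List[Model]) -> Model:
    """Arithmetic mean of several imitation models' evaluation indices.
    A (state, action) pair missing from a model counts as 0.0 (assumption)."""
    if not models:
        raise ValueError("need at least one imitation model")
    keys = set().union(*models)  # union of all (state, action) keys
    return {k: sum(m.get(k, 0.0) for m in models) / len(models) for k in keys}


# Hypothetical imitation models of two skilled workers.
worker_a = {("s", "grip"): 1.0, ("s", "push"): 0.2}
worker_b = {("s", "grip"): 0.6, ("s", "push"): 0.8}
pi2 = integrate_by_mean([worker_a, worker_b])
```

With neural-network models, the analogous operation would average outputs (or, for identically structured networks, parameters), which is a separate design decision.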

The above describes an example in which the functional units of the first processing device 11 (the first acquisition unit 11a, first determination unit 11b, first control unit 11c, first learning unit 11d, selection unit 11e, and first storage unit 11f) and the functional units of the second processing device 21 (the second acquisition unit 21a, second determination unit 21b, second control unit 21c, second learning unit 21d, and second storage unit 21e) are distributed across separate robots. However, all of these functional units may be included in a single device. For example, the first robot 10 may include a simulator that internally computes the processing performed by the second robot 20 described above, in which case the second robot 20 may be omitted from the processing system 1.

<Effects of the Processing System>
The effects of the processing system 1 according to the embodiment of the present invention will now be described.

In the processing system 1, the behavioral models include a first behavioral model π1, an imitation model that imitates the behavior of the first worker 30, and a second behavioral model π2, an imitation model that imitates the behavior of the second worker 40. The processing system 1 includes a first determination unit 11b that determines actions using the first behavioral model π1, a first learning unit 11d that learns the first behavioral model π1, and a selection unit 11e that selects a specific state (in the above example, the optimal work state) from a plurality of states (in the above example, the work states). The first learning unit 11d learns the first behavioral model π1 through reinforcement learning in each of the plurality of states, and the selection unit 11e compares the post-learning first behavioral model π1 with the second behavioral model π2 for each of those states and selects the specific state based on the results of the comparison. By giving the selected specific state to the first worker 30, the first worker 30's skills can be efficiently brought closer to those of the second worker 40. Skills can therefore be transferred to the first worker 30 while saving the second worker 40's time, enabling efficient skill transfer at the work site.

Preferably, in the processing system 1, the selection unit 11e selects, as the specific state (in the above example, the optimal work state), the state among the plurality of states (in the above example, the work states) in which the similarity of the post-learning first behavioral model π1 to the second behavioral model π2 is highest. This makes it possible to appropriately select the state best suited to promoting improvement in the first worker 30's skills, so that efficient skill transfer at the work site is properly achieved.

Preferably, in the processing system 1, after the first worker 30 has worked in the specific state selected by the selection unit 11e (in the above example, the optimal work state), the first learning unit 11d learns the first behavioral model π1 through imitation learning based on the behavior performed by the first worker 30. This allows the first behavioral model π1 to be updated so that it imitates the behavior of the first worker 30 after the worker's skills have improved.

Preferably, in the processing system 1, the second behavioral model π2 is generated by integrating a plurality of imitation models that imitate the behavior of the second worker 40. This allows a wider variety of skills to be passed on.

Preferably, in the processing system 1, the second worker 40 is more skilled than the first worker 30. In that case, the above-described processing by the first learning unit 11d and the selection unit 11e properly achieves efficient improvement of the first worker 30's skills while saving the second worker 40's time, and therefore efficient skill transfer at the work site.

A preferred embodiment of the present invention has been described above with reference to the accompanying drawings. However, the present invention is of course not limited to this embodiment, and various modifications and alterations within the scope of the claims naturally also fall within the technical scope of the present invention.

For example, the processes described herein using flowcharts need not be executed in the order shown in the flowcharts. Some processing steps may be executed in parallel, additional processing steps may be employed, and some processing steps may be omitted.

Also, for example, the series of control processes performed by the processing system 1 described above may be implemented using software, hardware, or a combination of software and hardware. The programs constituting the software are stored in advance, for example, on a storage medium provided inside or outside the information processing device.

1 Processing system
10 First robot
11 First processing device
11a First acquisition unit
11b First determination unit
11c First control unit
11d First learning unit
11e Selection unit
11f First storage unit
20 Second robot
21 Second processing device
21a Second acquisition unit
21b Second determination unit
21c Second control unit
21d Second learning unit
21e Second storage unit
30 First worker
40 Second worker
π1 First behavioral model
π2 Second behavioral model

Claims

A processing system (1) for determining an action using a behavior model for determining an action according to a state,
The behavioral models include a first behavioral model (π1) that is an imitation model that imitates the behavior of a first worker (30) and a second behavioral model (π2) that is an imitation model that imitates the behavior of a second worker (40),
The processing system (1) comprises:
a first determination unit (11b) that determines an action using the first behavioral model (π1);
a first learning unit (11d) that learns the first behavioral model (π1); and
a selection unit (11e) that selects a specific state from a plurality of states;
wherein
the first learning unit (11d) learns the first behavioral model (π1) by reinforcement learning in each of the plurality of states; and
the selection unit (11e) compares the first behavioral model (π1) after learning with the second behavioral model (π2) for each of the plurality of states, and selects the specific state based on a result of the comparison.
Processing system.

the selection unit (11e) selects, as the specific state, the state among the plurality of states in which the degree of similarity of the first behavioral model (π1) after learning to the second behavioral model (π2) is highest;
The processing system of claim 1.

the first learning unit (11d) learns the first behavioral model (π1) by imitation learning based on the behavior of the first worker (30) after the first worker (30) works in the specific state selected by the selection unit (11e);
The processing system according to claim 1 or 2.

The second behavioral model (π2) is generated by integrating a plurality of imitation models that imitate the behavior of the second worker (40).
The processing system according to any one of claims 1 to 3.

The second worker (40) is a worker with higher skills than the first worker (30).
The processing system according to any one of claims 1 to 4.

A processing method for determining an action using a behavior model for determining an action according to a state, comprising:
The behavioral models include a first behavioral model (π1) that is an imitation model that imitates the behavior of a first worker (30) and a second behavioral model (π2) that is an imitation model that imitates the behavior of a second worker (40),
a first step of determining an action using the first behavioral model (π1);
a second step of learning the first behavioral model (π1); and
a third step of selecting a specific state from a plurality of states;
wherein
in the second step, the first behavioral model (π1) is learned by reinforcement learning in each of the plurality of states; and
in the third step, the first behavioral model (π1) after learning is compared with the second behavioral model (π2) for each of the plurality of states, and the specific state is selected based on a result of the comparison.
Processing method.