JP7601241B2 - Support device, support method, and support program

Info

Publication number: JP7601241B2
Application number: JP2023546606A
Authority: JP
Inventors: 美沙深井; 将志田所; 晴夫大石
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2024-12-17
Anticipated expiration: 2041-09-07
Also published as: WO2023037423A1; JPWO2023037423A1

Description

The present invention relates to a support device, a support method, and a support program.

When reinforcement learning of a model is performed in a real-world environment, care must be taken not to cause irreparable harm to that environment.

For example, reinforcement learning is used for autonomous driving of automobiles and for robot control. During reinforcement learning in a real-world environment, however, the robot's hardware may be damaged or the automobile may be involved in a collision.

To address this, Non-Patent Document 1 discloses a reinforcement learning technique, based on Lyapunov's method, that rewards safe behavior.

Non-Patent Document 2 discloses a technique called CARL (Cautious Adaptation For Reinforcement Learning), in which uncertainty about transition dynamics and catastrophic states is learned in a sandbox, and the agent then learns in the real environment so as to avoid catastrophic states.

Non-Patent Document 1: Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh, "A Lyapunov-based Approach to Safe Reinforcement Learning", arXiv:1805.07708v1 [cs.LG], 20 May 2018
Non-Patent Document 2: Jesse Zhang, Brian Cheung, Chelsea Finn, Sergey Levine, Dinesh Jayaraman, "Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings", arXiv:2008.06622v1 [cs.LG], 15 Aug 2020

However, with the conventional techniques, it can be difficult to perform reinforcement learning for supporting business operations easily and at low cost.

In reinforcement learning for supporting business operations, an agent is made to try various actions in the business environment by trial and error and to learn the optimal action. This requires preparing an environment in which the agent can act.

For example, in business operations it is often unclear what should be defined as safe behavior. It is therefore difficult to apply the technique of Non-Patent Document 1 to business operations and design appropriate rewards.

In addition, preparing a simulation environment such as the sandbox described in Non-Patent Document 2 is very costly, and it is difficult to prepare a simulation environment that is well suited to learning a business operation.

To solve the above problems and achieve the object, a support device includes: an acquisition unit that acquires environmental information, which is information about the environment in a task, and behavioral information, which is information about behavior in the task; an extraction unit that extracts the environmental information and the behavioral information in association with each other; and a design unit that designs an environment and a reward for reinforcement learning of a model that performs the task, based on the combination of the environmental information and the behavioral information associated by the extraction unit.

According to the present invention, reinforcement learning for supporting business operations can be performed easily and at low cost.

FIG. 1 is a diagram showing an example of the configuration of a support device according to a first embodiment.
FIG. 2 is a diagram showing an example of environmental information and behavioral information.
FIG. 3 is a diagram showing an example of environmental information and behavioral information.
FIG. 4 is a diagram showing an example of environmental information and behavioral information.
FIG. 5 is a diagram showing an example of environmental information and behavioral information.
FIG. 6 is a diagram showing an example of environment design and action design.
FIG. 7 is a flowchart showing the flow of the acquisition process.
FIG. 8 is a flowchart showing the flow of the extraction process.
FIG. 9 is a flowchart showing the flow of the learning process.
FIG. 10 is a flowchart showing the flow of the execution process.
FIG. 11 is a diagram showing an example of a computer that executes the support program.

Embodiments of the support device, support method, and support program according to the present application are described in detail below with reference to the drawings. The present invention is not limited to the embodiments described below.

Conventionally, reinforcement learning of a model that supports business operations is performed using the environment in which the operations are actually carried out. This can adversely affect the operations. According to the embodiments, by contrast, reinforcement learning of such a model can be performed while suppressing adverse effects on the operations.

In the embodiments, the tasks include any task performed by humans. Examples include input work on a terminal device such as a PC (Personal Computer), responding to customer inquiries by voice or text, and inspecting equipment.

For example, a model that automates input work on a terminal device automatically performs operations on the terminal device that imitate human input work, based on captured images of the screen displayed by the terminal device. The model may use, for example, a neural network.

[Configuration of the First Embodiment]
First, the configuration of the support device according to the first embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the configuration of the support device according to the first embodiment.

As shown in FIG. 1, the support system 1 includes a support device 10 and a terminal device 20. The support device 10 is connected to the terminal device 20. The terminal device 20 is a device, such as a PC, on which a worker performs work related to a task. The support device 10 may also be connected to a camera, a microphone, a wearable device, or the like carried by the worker.

Each unit of the support device 10 is described here. As shown in FIG. 1, the support device 10 includes an input/output unit 11, a storage unit 12, and a control unit 13.

The input/output unit 11 is an interface for inputting and outputting data. For example, the input/output unit 11 is a NIC (Network Interface Card). The input/output unit 11 can send data to and receive data from other devices.

The input/output unit 11 may also be connected to input devices such as a mouse and a keyboard, and to output devices such as a display and a speaker.

The storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk. The storage unit 12 may instead be a rewritable semiconductor memory such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non Volatile Static Random Access Memory).

The storage unit 12 stores the OS (Operating System) and various programs executed by the support device 10. The storage unit 12 also stores learning information 121 and model information 122.

The learning information 121 is information for performing reinforcement learning. The learning information 121 includes the rewards and the environment used in reinforcement learning.

The model information 122 is information, such as parameters, for constructing the model that supports the task. If the model is a neural network, the model information 122 includes the weights and biases of each node.

The model receives information indicating the work environment as input and outputs information about an action. The support device 10 supports the work based on the output information.
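As an illustration only, the relationship between the model information 122 (weights and biases) and the model's input and output could be sketched as follows. The single linear layer, the dictionary layout, the feature vector, and the action names are all assumptions made for this sketch; the specification does not prescribe a network structure.

```python
from typing import Dict, List

# Hypothetical model information: one weight row and one bias per candidate action.
model_info: Dict[str, list] = {
    "weights": [[0.2, -0.1], [0.0, 0.4], [-0.3, 0.1]],
    "biases": [0.0, 0.1, 0.0],
    "actions": ["click", "type", "move_cursor"],
}


def select_action(info: Dict[str, list], features: List[float]) -> str:
    """Score each action from environment features and return the best one."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(info["weights"], info["biases"])
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return info["actions"][best]


# Environment features are assumed to be extracted elsewhere (for example from a screen capture).
chosen = select_action(model_info, [1.0, 0.0])
```

Updating the model information during learning would correspond to changing the numbers in `weights` and `biases`; the selection function itself stays the same.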

The control unit 13 controls the entire support device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The control unit 13 also has an internal memory for storing programs that define various processing procedures and control data, and executes each process using the internal memory. The control unit 13 functions as various processing units when the various programs run. For example, the control unit 13 includes an acquisition unit 131, an extraction unit 132, a design unit 133, a learning unit 134, and an execution unit 135.

The acquisition unit 131 acquires environmental information, which is information about the environment in a task, and behavioral information, which is information about behavior in the task.

The acquisition unit 131 can acquire information about the behavior of a worker performing the task as behavioral information, and information about the worker's environment as environmental information. The acquisition unit 131 can also acquire the content of operations performed by the worker on the terminal device 20 as behavioral information, and the state of the terminal device 20, which changes in response to those operations, as environmental information.

For example, the acquisition unit 131 acquires, as environmental information, an image in the worker's line of sight captured by a camera and sound around the worker collected by a microphone.

As another example, the acquisition unit 131 acquires, as environmental information, information such as images and audio output from the terminal device 20 operated by the worker.

An image acquired by the acquisition unit 131 may be a still image, such as a screen capture, or a moving image. The acquisition unit 131 may acquire collected audio as an audio file, or may acquire text converted from the collected audio.

For example, the acquisition unit 131 acquires, as behavioral information, information on the worker's body movements detected by a sensor attached to the worker and the content of the worker's speech collected by a microphone.

As another example, the acquisition unit 131 acquires, as behavioral information, the content of operations the worker performed on the terminal device 20. The content of an operation includes the time at which a key was pressed, the type of key pressed, the trajectory of mouse movement, and the position and time of a mouse click.
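The operation content listed above could be held in a small record type, for instance. The following is a minimal sketch; the class name `OperationEvent` and all field names are assumptions introduced for illustration and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OperationEvent:
    """One operation by the worker on the terminal device (hypothetical layout)."""
    kind: str                                   # e.g. "click", "key_press", "mouse_move"
    timestamp: float                            # time of the operation, in seconds
    key: Optional[str] = None                   # type of key pressed, if any
    position: Optional[Tuple[int, int]] = None  # cursor coordinates of a click
    trajectory: List[Tuple[int, int]] = field(default_factory=list)  # mouse path


# A key press and a click, as they might be logged during a task.
events = [
    OperationEvent(kind="key_press", timestamp=12.40, key="o"),
    OperationEvent(kind="click", timestamp=15.02, position=(320, 480)),
]
```

Keeping a timestamp on every event is what later allows each operation to be matched with the screen state before and after it.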

In this case, the acquisition unit 131 can acquire the content of the worker's operations on the terminal device 20 as behavioral information, and captured images of the screen of the terminal device 20, which changes in response to the operations, as environmental information.

Furthermore, the acquisition unit 131 may acquire, as behavioral information together with the operation content, information identifying the application or window the worker is operating on the terminal device 20.

The extraction unit 132 extracts the environmental information and the behavioral information in association with each other. In other words, the extraction unit 132 extracts combinations of the environmental information and the behavioral information acquired by the acquisition unit 131.

The extraction unit 132 can extract behavioral information about an action in the task in association with environmental information about at least one of the environment before the action was taken and the environment affected by the action.

The association between environmental information and behavioral information is described with reference to FIGS. 2 to 5. FIGS. 2, 3, 4, and 5 are diagrams showing examples of environmental information and behavioral information.

FIGS. 2 and 3 show examples of environmental information and behavioral information for a task that uses the terminal device 20.

In the example of FIG. 2, the extraction unit 132 extracts a captured image 51a and operation content 52a in association with each other. The captured image 51a corresponds to environmental information. The operation content 52a corresponds to behavioral information.

Suppose the worker fills in various fields and then clicks (presses) button 511a with the mouse. In this case, the operation content 52a includes the fact that the type of operation event is a click and the coordinates of the cursor at the time of the click.

In this case, for example, the environment before the action is taken is a captured image of the screen before button 511a is pressed. The environment affected by the action is a captured image of the screen displayed after button 511a is pressed.
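One way to associate each action with the environment before and after it is to match timestamps, as in the following sketch. This is an assumption for illustration: the specification does not state how the association is computed, and the names `Pair` and `associate` as well as the file names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pair:
    action: str                # behavioral information
    env_before: Optional[str]  # environment before the action was taken
    env_after: Optional[str]   # environment affected by the action


def associate(captures: List[Tuple[float, str]],
              actions: List[Tuple[float, str]]) -> List[Pair]:
    """Pair each (time, action) with the captures just before and just after it.

    `captures` must be sorted by time.
    """
    times = [t for t, _ in captures]
    pairs = []
    for t_action, label in actions:
        i = bisect_right(times, t_action)  # count of captures taken at or before the action
        before = captures[i - 1][1] if i > 0 else None
        after = captures[i][1] if i < len(captures) else None
        pairs.append(Pair(label, before, after))
    return pairs


captures = [(0.0, "form_screen.png"), (2.0, "confirmation_screen.png")]
actions = [(1.5, "click button 511a")]
pairs = associate(captures, actions)
```

Here the click at 1.5 s is paired with the form screen as the prior environment and the confirmation screen as the affected environment.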

The extraction unit 132 may extract the entire captured image 51a as environmental information, or may extract only the part of the captured image 51a that contains the button 511a that was the target of the operation.

The captured image 51a in FIG. 2 is an image cut out from part of the screen shown on the display. The environmental information may instead be a captured image of the entire screen shown on the display, including the OS taskbar and the toolbar of a browser or a given application.

In the example of FIG. 3, the extraction unit 132 extracts a captured image 51b and operation content 52b in association with each other. The captured image 51b corresponds to environmental information. The operation content 52b corresponds to behavioral information.

Suppose that, to enter "yokosuka" in romaji, the worker first presses the "y" key on the keyboard with the focus in text box 511b and then presses the "o" key.

In this case, the operation content 52b includes the fact that the type of operation event is a press of the "o" key.

In this case, for example, the environment before the action is taken is a captured image of the screen before the "o" key is pressed. The environment affected by the action is a captured image of the screen in which the hiragana "よ" (yo) has been entered in text box 511b after the "o" key is pressed. In romaji input, typing "o" after "y" displays the hiragana "よ".

The extraction unit 132 may extract the entire captured image 51b as environmental information, or may extract only the part of the captured image 51b that contains the text box 511b that was the target of the operation.

The captured image 51b in FIG. 3 is an image cut out from part of the screen shown on the display. The environmental information may instead be a captured image of the entire screen shown on the display, including the OS taskbar and the toolbar of a browser or a given application.

The extraction unit 132 may associate a series of multiple pieces of environmental information with one piece of behavioral information and extract them together. For example, the extraction unit 132 can extract, as environmental information, captured images of multiple screen frames in chronological order up to the point at which a given operation occurs on the terminal device 20.
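Collecting the frames that lead up to an operation could look like the sketch below. The choice of window (from just after a start time, such as the previous operation, up to the operation itself) and the name `frames_until` are assumptions made for illustration.

```python
from typing import List, Tuple


def frames_until(frames: List[Tuple[float, str]], start: float, end: float) -> List[str]:
    """Return, in chronological order, the frames captured in the interval (start, end]."""
    return [name for t, name in sorted(frames) if start < t <= end]


# Frames as (capture time in seconds, frame identifier).
frames = [(0.5, "f0"), (1.0, "f1"), (1.5, "f2"), (2.5, "f3")]

# Frames associated with an operation at t = 2.0 that follows an operation at t = 0.5.
window = frames_until(frames, start=0.5, end=2.0)
```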

The extraction unit 132 may additionally extract, in association with the behavioral information, environmental information about any environment whose similarity to the environment of the already extracted environmental information is equal to or greater than a threshold.

For example, in association with the operation content 52b, the extraction unit 132 may extract not only the captured image 51b but also past captured images similar to the captured image 51b.

For example, the number of words that appear in both of two captured images is taken as their similarity. The extraction unit 132 then regards captured images whose similarity is equal to or greater than the threshold as similar to each other.

In other words, the extraction unit 132 extracts not only the environmental information at the moment the action indicated by the behavioral information was performed, but also similar environmental information from the past.
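The word-count similarity described above can be sketched as a set intersection. In this sketch the words of each capture are assumed to have been extracted already (for example by OCR, which is not shown), and the threshold value of 3 and the capture names are arbitrary illustrative choices.

```python
from typing import Dict, List, Set


def word_overlap(words_a: Set[str], words_b: Set[str]) -> int:
    """Similarity: the number of words that appear in both captures."""
    return len(words_a & words_b)


def similar_captures(target: Set[str], past: Dict[str, Set[str]], threshold: int) -> List[str]:
    """Return the names of past captures whose similarity to the target meets the threshold."""
    return [name for name, words in past.items()
            if word_overlap(target, words) >= threshold]


current = {"name", "address", "phone", "submit"}
past = {
    "capture_001": {"name", "address", "submit", "cancel"},  # shares 3 words
    "capture_002": {"search", "results"},                    # shares 0 words
}
matches = similar_captures(current, past, threshold=3)
```

The matched past captures would then be associated with the same behavioral information as the current capture.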

FIG. 4 shows an example of environmental information and behavioral information for telephone response work.

In the example of FIG. 4, the extraction unit 132 extracts audio 51c of a customer's telephone inquiry and audio 52c of the operator's answer in association with each other. The audio 51c corresponds to environmental information. The audio 52c corresponds to behavioral information.

The extraction unit 132 may extract a text transcription of the audio instead of the audio itself.

In this case, for example, the environment before the action is taken is the audio 51c of the customer's telephone inquiry. The environment affected by the action is the audio of what the customer says next in response to the audio 52c of the operator's answer.

FIG. 5 shows an example of environmental information and behavioral information for equipment inspection work.

In the example of FIG. 5, the extraction unit 132 extracts video 51d from the viewpoint of a worker who is moving and the position 52d of the destination to which the worker moved, in association with each other. The video 51d corresponds to environmental information. The position 52d corresponds to behavioral information.

In this case, for example, the environment before the action is taken is the video 51d from the worker's viewpoint while moving. The environment affected by the action is the video from the worker's viewpoint after the move.

The design unit 133 designs the environment and the rewards for reinforcement learning of the model that performs the task, based on the combinations of environmental information and behavioral information associated by the extraction unit 132.

The design unit 133 assumes that the action indicated by the behavioral information is the "correct action" and designs the rewards so that a reward is given when the agent takes the same kind of action in the environment in which that action was taken.

FIG. 6 is a diagram showing an example of environment design and action design. As shown in FIG. 6, the design unit 133 performs environment design and reward design.

For example, if clicking a given button is the "correct action", the design unit 133 designs the rewards so that a positive reward is given for clicking on the button and for moving the cursor onto the button. Conversely, the design unit 133 designs the rewards so that a negative reward (penalty) is given for clicking anywhere other than on the button.
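The button example above could be expressed as a reward function like the following sketch. The reward magnitudes (1.0, 0.1, -1.0), the rectangle representation of the button, and the action names are assumptions chosen for illustration; the specification only states which actions receive positive or negative rewards.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # left, top, right, bottom in screen coordinates


def inside(rect: Rect, point: Tuple[int, int]) -> bool:
    """Return True if the point lies within the rectangle (edges included)."""
    left, top, right, bottom = rect
    x, y = point
    return left <= x <= right and top <= y <= bottom


def reward(kind: str, point: Tuple[int, int], button: Rect) -> float:
    """Reward for one agent action when clicking `button` is the correct action."""
    on_button = inside(button, point)
    if kind == "click":
        return 1.0 if on_button else -1.0  # click on the button, or penalty elsewhere
    if kind == "move" and on_button:
        return 0.1                         # smaller positive reward for reaching the button
    return 0.0                             # other actions are neither rewarded nor penalized


button_511a: Rect = (100, 50, 180, 80)
```

For instance, `reward("click", (120, 60), button_511a)` yields the positive reward, while the same click at `(10, 10)` yields the penalty.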

Furthermore, the design unit 133 designs the environment so that, when the agent performs the same operation as the worker (that is, clicks the button), the environment transitions to the captured image taken after the operation, and when the agent performs any other operation, the environment does not transition and the agent is made to act again on the same screen.

Here, the extraction unit 132 is assumed to extract both the environmental information about the environment before the action is taken and the environmental information about the environment affected by the action. The design unit 133 then designs the environment so that the environment before the action is presented to the agent and, when the agent takes the "correct action", the environment transitions to the environment affected by the action.
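The transition rule described above can be sketched as a small replay environment that advances only when the agent reproduces the recorded operation. The class name `ReplayEnvironment`, the `(screen, reward, finished)` return convention, the reward values, and the file names are assumptions for this sketch.

```python
from typing import List, Tuple


class ReplayEnvironment:
    """Replays recorded screens; advances only on the worker's recorded action."""

    def __init__(self, steps: List[Tuple[str, str]], final_screen: str) -> None:
        self.steps = steps                # (screen before the action, recorded action)
        self.final_screen = final_screen  # screen after the last recorded action
        self.index = 0

    def observe(self) -> str:
        """Return the screen currently presented to the agent."""
        if self.index < len(self.steps):
            return self.steps[self.index][0]
        return self.final_screen

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Apply an action and return (next screen, reward, finished)."""
        if self.index >= len(self.steps):
            return self.final_screen, 0.0, True
        if action == self.steps[self.index][1]:
            self.index += 1               # same operation as the worker: transition
            reward = 1.0
        else:
            reward = -1.0                 # any other operation: stay on the same screen
        return self.observe(), reward, self.index >= len(self.steps)


env = ReplayEnvironment([("form.png", "click_submit")], final_screen="result.png")
wrong = env.step("click_cancel")   # the screen does not change
right = env.step("click_submit")   # the screen transitions to the post-operation capture
```

Because the environment is built only from recorded captures, the agent's mistakes never reach the real terminal device.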

The design unit 133 stores the content of the design in the storage unit 12 as the learning information 121.

The learning unit 134 constructs a learning environment for reinforcement learning of the model in accordance with the learning information 121. The learning unit 134 then updates the model information 122 based on the results of having the agent act in the learning environment.

If the task uses the terminal device 20, the learning unit 134 presents captured images of the screen of the terminal device 20 to the agent as the environment and has the agent select the action to take on each captured image (such as clicking or moving the cursor).

If the task involves moving on foot, the learning unit 134 presents to the agent, as the environment, a moving image from the viewpoint of the walking worker or still images cut out from that moving image, and has the agent select the direction in which to proceed.

In this way, environmental information recorded during the worker's task is used in place of a learning environment, and the agent acts in that environment, so learning can be performed without affecting the actual task.

The agent is a simulated entity that selects actions according to the output of the model constructed from the model information 122.

The learning unit 134 may perform learning based on the environmental information and behavioral information of a single worker. In this case, actions that reflect the characteristics of that worker can be expected to be learned.

Alternatively, the learning unit 134 may perform learning by combining the environmental information and behavioral information of multiple workers. In this case, more efficient work procedures can be expected to be learned.

The execution unit 135 generates a sequence of task-related actions using the model that has undergone reinforcement learning based on the environment and rewards designed by the design unit 133.

For example, the execution unit 135 inputs environmental information from the actual task into the trained model constructed from the model information 122 and identifies an action based on the output.

Specifically, the execution unit 135 uses the trained model to generate a sequence of actions from environmental information acquired during the worker's task, and supports the task based on the generated sequence.

Support for the task may consist of performing the work directly or of presenting the worker with the actions to take in the task.

For example, the execution unit 135 automatically fills in fields based on captured images of the screen of the terminal device. As another example, the execution unit 135 may infer the next piece of work from the worker's viewpoint video and provide information about the inferred work by voice.

[Processing of the First Embodiment]
FIG. 7 is a flowchart showing the flow of the acquisition process. As shown in FIG. 7, if the worker has not finished the work (step S101, No), the acquisition unit 131 acquires environmental information during the worker's task (step S102).

Then, if the worker has taken an action (step S103, Yes), the acquisition unit 131 acquires the worker's behavioral information (step S104).

If the worker has not taken an action (step S103, No), the acquisition unit 131 returns to step S101.

If the worker has finished the work (step S101, Yes), the acquisition unit 131 ends the process.
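The acquisition flow of steps S101 to S104 could be sketched as the loop below. The input format, in which each observation is a pair of the current environment and an optional action, is an assumption for illustration; in practice the data would come from devices such as a camera or the terminal device 20.

```python
from typing import Iterable, List, Optional, Tuple


def acquire(observations: Iterable[Tuple[str, Optional[str]]]
            ) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Log environmental information on every pass and behavioral information when present."""
    env_log: List[str] = []
    action_log: List[Tuple[int, str]] = []
    for environment, action in observations:  # S101: repeat until the work is finished
        env_log.append(environment)           # S102: acquire environmental information
        if action is not None:                # S103: did the worker take an action?
            # S104: acquire behavioral information, with the index of the current environment
            action_log.append((len(env_log) - 1, action))
    return env_log, action_log


session = [("screen_a", None), ("screen_a", "click"), ("screen_b", None)]
env_log, action_log = acquire(session)
```

Storing the environment index next to each action is one possible way to make the later association step straightforward.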

図８は、抽出処理の流れを示すフローチャートである。抽出部１３２は、取得部１３１によって取得されたすべての行動情報について、対応した環境が抽出されていない場合（ステップＳ２０１、Ｎｏ）、ターゲットとする行動情報を決定する（ステップＳ２０２）。 Figure 8 is a flowchart showing the flow of the extraction process. If no corresponding environment has been extracted for all of the behavioral information acquired by the acquisition unit 131 (step S201, No), the extraction unit 132 determines the behavioral information to be targeted (step S202).

Then, the extraction unit 132 extracts the environmental information corresponding to the targeted behavioral information (step S203).

When the corresponding environment has been extracted for all of the behavioral information acquired by the acquisition unit 131 (step S201, Yes), the extraction unit 132 ends the process.
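As a rough illustration of the extraction loop of FIG. 8, the following Python sketch pairs each recorded action with the environment observed before it and the environment observed after it. The function name `extract_pairs`, the `(environment index, action)` input format, and the dictionary keys are hypothetical; the source only states that behavioral information and corresponding environmental information are associated.

```python
from typing import Dict, List, Optional, Tuple


def extract_pairs(
    environments: List[str],
    behaviors: List[Tuple[int, str]],
) -> List[Dict[str, Optional[str]]]:
    pairs: List[Dict[str, Optional[str]]] = []
    pending = list(behaviors)
    while pending:                             # S201 No: some behavior is still unpaired
        index, action = pending.pop(0)         # S202: choose the target behavior
        before = environments[index]           # environment before the action
        after = (environments[index + 1]       # environment influenced by the action,
                 if index + 1 < len(environments) else None)  # if one was observed
        pairs.append({"action": action,        # S203: associate environment and behavior
                      "env_before": before,
                      "env_after": after})
    return pairs                               # S201 Yes: every behavior is paired


pairs = extract_pairs(["a", "b", "c"], [(0, "click"), (2, "type")])
print(pairs[0])
```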

FIG. 9 is a flowchart showing the flow of the learning process. Here, it is assumed that the design unit 133 has already designed the reward and the environment for reinforcement learning.

As shown in FIG. 9, if behavior similar to the worker's cannot yet be generated for the acquired environmental information (step S301, No), the learning unit 134 selects the environmental information to target (step S302).

The learning unit 134 uses the targeted environmental information as the environment for reinforcement learning and learns, by trial and error, the actions to be taken (step S303). The learning unit 134 updates the model information 122 based on the learning results.

Once behavior similar to the worker's can be generated for the acquired environmental information (step S301, Yes), the learning unit 134 ends the process.
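The learning loop of FIG. 9 can be sketched with a very small tabular learner. Everything concrete here is an assumption for illustration: the reward (+1 when the tried action matches the worker's action, -1 otherwise), the value-table update with step size 0.5, the uniformly random exploration, and the names `train`, `demos`, and `actions`. The source leaves the reward design to the design unit 133 and does not fix a learning algorithm.

```python
import random
from typing import Dict, List


def train(demos: Dict[str, str], actions: List[str],
          seed: int = 0, max_steps: int = 10_000) -> Dict[str, str]:
    """demos maps each environment to the action the worker took in it."""
    rng = random.Random(seed)
    # Value table standing in for the model information 122.
    q = {env: {a: 0.0 for a in actions} for env in demos}

    def greedy(env: str) -> str:
        return max(actions, key=lambda a: q[env][a])

    for _ in range(max_steps):
        # S301: can the worker's behavior be reproduced for every environment?
        unsolved = [env for env in demos if greedy(env) != demos[env]]
        if not unsolved:
            break                                  # S301 Yes: stop learning
        env = unsolved[0]                          # S302: choose the target environment
        action = rng.choice(actions)               # S303: trial and error
        reward = 1.0 if action == demos[env] else -1.0   # assumed reward design
        q[env][action] += 0.5 * (reward - q[env][action])  # update the model
    return {env: greedy(env) for env in demos}


policy = train({"login_screen": "type_password"}, ["click_ok", "type_password"])
print(policy)
```

The `max_steps` cap is only a safeguard for the sketch; with this reward, the worker's action becomes the greedy choice as soon as it has been tried once or every competing action has been penalized.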

FIG. 10 is a flowchart showing the flow of the execution process. Here, it is assumed that the execution unit 135 constructs the trained model from the model information 122.

As shown in FIG. 10, if the worker has not finished the work (step S401, No), the execution unit 135 acquires the worker's environmental information (step S402).

Then, if an appropriate action sequence can be generated for the environment (step S403, Yes), the execution unit 135 executes the action sequence generated using the model (step S404).

If an appropriate action sequence cannot be generated for the environment (step S403, No), the execution unit 135 returns to step S401.

When the worker has finished the work (step S401, Yes), the execution unit 135 ends the process.
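The execution loop of FIG. 10 might look like the following Python sketch. The trained model is represented here by a plain dictionary from an environment to an action sequence, and "an appropriate action sequence can be generated" is approximated by "the dictionary has a non-empty entry for this environment"; both are simplifying assumptions, as are the names `run_support` and `execute`.

```python
from typing import Callable, Dict, Iterable, List


def run_support(
    observations: Iterable[str],
    policy: Dict[str, List[str]],
    execute: Callable[[str], None],
) -> int:
    """Returns the number of observations for which a sequence was executed."""
    handled = 0
    for environment in observations:         # S401 No: work not finished; S402: observe
        sequence = policy.get(environment)   # S403: can a sequence be generated?
        if sequence:                         # S403 Yes
            for action in sequence:          # S404: execute the generated sequence
                execute(action)
            handled += 1
        # S403 No: skip and go back to S401
    return handled                           # S401 Yes: work finished


performed: List[str] = []
run_support(["order_form"], {"order_form": ["fill_name", "press_submit"]},
            performed.append)
print(performed)
```

Passing `execute` as a callback keeps the sketch neutral between the two forms of support described earlier: it could perform the action directly or merely present it to the worker.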

[Effects of the First Embodiment]
As described above, the acquisition unit 131 acquires environmental information, which is information about the environment in a task, and behavioral information, which is information about behavior in the task. The extraction unit 132 extracts the environmental information and the behavioral information in association with each other. The design unit 133 designs the environment and the reward for reinforcement learning of a model that executes the task, based on the combination of environmental information and behavioral information associated by the extraction unit 132.

In this way, the support device 10 can design reinforcement learning based on the behavior and environment related to the task. As a result, according to the embodiment, reinforcement learning for supporting a task can be performed easily and at low cost.

In addition, the acquisition unit 131 acquires information about the behavior of the worker performing the task as the behavioral information, and acquires information about the environment related to the worker as the environmental information. Focusing on the worker's behavior and environment in this way makes it easy to acquire the behavioral information and the environmental information.

In addition, the acquisition unit 131 acquires the content of the worker's operations on the terminal device 20 as the behavioral information, and acquires the state of the terminal device 20, which changes in response to the worker's operations, as the environmental information. This makes it easy to acquire environmental information for tasks that use a terminal device.

In addition, the extraction unit 132 extracts, in association with each other, first behavioral information related to a behavior in the task and first environmental information related to at least one of the environment before the behavior was taken and the environment influenced by the behavior. This makes it easy to design reinforcement learning from related behavioral information and environmental information.

In addition, the extraction unit 132 further extracts second environmental information, related to an environment whose similarity to that environment is equal to or greater than a threshold, in association with the first behavioral information. Associating multiple similar pieces of environmental information with the behavioral information in this way can improve the accuracy of reinforcement learning.
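The threshold test for second environmental information can be sketched as follows. The source does not say how similarity is computed, so this example assumes environments are represented as numeric feature vectors and uses cosine similarity; the names `cosine_similarity` and `similar_environments` and the default threshold of 0.9 are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0                      # treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)


def similar_environments(
    first_env: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """Names of candidate environments whose similarity is >= threshold."""
    return [name for name, vector in candidates.items()
            if cosine_similarity(first_env, vector) >= threshold]


candidates = {"near": [1.0, 0.1], "far": [0.0, 1.0]}
print(similar_environments([1.0, 0.0], candidates))
```

Each environment returned here would be associated with the same first behavioral information, giving the learner several similar environments per demonstrated action.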

In addition, the execution unit 135 generates a sequence of actions related to the task using a model that has undergone reinforcement learning based on the environment and the reward designed by the design unit 133. This reduces the human work and judgment that the task requires.

[System Configuration, etc.]
The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to that illustrated; all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by that CPU, or as hardware using wired logic. The program may be executed not only by a CPU but also by another processor such as a GPU.

Of the processes described in this embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed as desired unless otherwise specified.

[Program]
In one embodiment, the support device 10 can be implemented by installing, on a desired computer, a support program that executes the above support processing as packaged software or online software. For example, by causing an information processing device to execute the support program, the information processing device can be made to function as the support device 10. The information processing device here includes desktop and notebook personal computers. It also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants).

The support device 10 can also be implemented as a support server device that treats a terminal device used by a user as a client and provides the client with services related to the above support processing. For example, the support server device is implemented as a server device that provides a support service that takes behavioral information and environmental information in a task as input and outputs a trained model for supporting the task. In this case, the support server device may be implemented as a Web server, or as a cloud that provides services related to the above support processing through outsourcing.

FIG. 11 is a diagram showing an example of a computer that executes the support program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the support device 10 is implemented as the program module 1093, in which computer-executable code is written. The program module 1093 is stored, for example, in the hard disk drive 1090. For example, a program module 1093 for executing the same processing as the functional configuration of the support device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the above-described embodiment is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary, and executes the processing of the above-described embodiment.

The program module 1093 and the program data 1094 need not be stored in the hard disk drive 1090; they may instead be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read from that computer by the CPU 1020 via the network interface 1070.

REFERENCE SIGNS LIST
1 Support system
10 Support device
11 Input/output unit
12 Storage unit
13 Control unit
20 Terminal device
51a, 51b Captured image
51c, 52c Audio
51d Video
52a, 52b Operation content
52d Position
121 Learning information
122 Model information
131 Acquisition unit
132 Extraction unit
133 Design unit
134 Learning unit
135 Execution unit
511a Button
511b Text box

Claims

1. A support device comprising:
an acquisition unit that acquires environmental information, which is information about an environment in a task, and behavioral information, which is information about behavior in the task;
an extraction unit that extracts, in association with each other, first behavioral information related to the behavior in the task and first environmental information related to at least one of an environment before the behavior was taken and an environment influenced by the behavior, and further extracts, in association with the first behavioral information, second environmental information related to an environment whose similarity to the environment is equal to or greater than a threshold; and
a design unit that designs an environment and a reward in reinforcement learning of a model that executes the task, based on a combination of the first environmental information, the second environmental information, and the first behavioral information associated by the extraction unit.

2. The support device according to claim 1, wherein the acquisition unit acquires information about the behavior of a worker performing the task as the behavioral information, and acquires information about an environment related to the worker as the environmental information.

3. The support device according to claim 2, wherein the acquisition unit acquires the content of the worker's operation of a terminal device as the behavioral information, and acquires the state of the terminal device, which changes in response to the worker's operation, as the environmental information.

4. The support device according to claim 1, further comprising an execution unit that generates a sequence of actions related to the task using a model that has undergone reinforcement learning based on the environment and the reward designed by the design unit.

5. A support method executed by a support device, the method comprising:
an acquisition step of acquiring environmental information, which is information about an environment in a task, and behavioral information, which is information about behavior in the task;
an extraction step of extracting, in association with each other, first behavioral information related to the behavior in the task and first environmental information related to at least one of an environment before the behavior was taken and an environment influenced by the behavior, and further extracting, in association with the first behavioral information, second environmental information related to an environment whose similarity to the environment is equal to or greater than a threshold; and
a design step of designing an environment and a reward in reinforcement learning of a model that executes the task, based on a combination of the first environmental information, the second environmental information, and the first behavioral information associated in the extraction step.

6. A support program for causing a computer to function as the support device according to any one of claims 1 to 4.