JP7745906B2

JP7745906B2 - Video analysis system

Info

Publication number: JP7745906B2
Application number: JP2023525140A
Authority: JP
Inventors: 渉三神谷
Original assignee: Imbesideyou Inc
Current assignee: Imbesideyou Inc
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2025-09-30
Anticipated expiration: 2041-05-31
Also published as: JPWO2022254492A1; WO2022254492A1

Description

本発明は、複数人の参加者で行われるオンラインセッションによって得られる動画像をもとに参加者の生体反応を解析する動画像分析システムに関する。 The present invention relates to a video image analysis system that analyzes the biological responses of participants based on video images obtained during an online session conducted with multiple participants.

発言者の発言に対して他者が受ける感情を解析する技術が知られている（例えば、特許文献１参照）。対象者の表情の変化を長期間にわたり時系列的に解析し、その間に抱いた感情を推定する技術も知られている（例えば、特許文献２参照）。感情の変化に最も影響を与えた要素を特定する技術も知られている（例えば、特許文献３～５参照）。対象者の普段の表情と現在の表情とを比較して、表情が暗い場合にアラートを発する技術も知られている（例えば、特許文献６参照）。対象者の平常時（無表情時）の表情と現在の表情とを比較して、対象者の感情の度合いを判定するようにした技術も知られている（例えば、特許文献７～９参照）。組織としての感情や、個人が感じるグループ内の雰囲気を分析する技術も知られている（例えば、特許文献１０、１１参照）。 Technology is known that analyzes the emotions felt by others in response to a speaker's comments (see, for example, Patent Document 1). Technology is also known that analyzes changes in a subject's facial expression over a long period of time and estimates the emotions felt during that time (see, for example, Patent Document 2). Technology is also known that identifies the factors that most influenced changes in emotions (see, for example, Patent Documents 3 to 5). Technology is also known that compares a subject's usual facial expression with their current facial expression and issues an alert if the facial expression is gloomy (see, for example, Patent Document 6). Technology is also known that compares a subject's normal (expressionless) facial expression with their current facial expression to determine the subject's level of emotion (see, for example, Patent Documents 7 to 9). Technology is also known that analyzes organizational emotions and the atmosphere felt by individuals within a group (see, for example, Patent Documents 10 and 11).

Patent Document 1: JP 2019-58625 A
Patent Document 2: JP 2016-149063 A
Patent Document 3: JP 2020-86559 A
Patent Document 4: JP 2000-76421 A
Patent Document 5: JP 2017-201499 A
Patent Document 6: JP 2018-112831 A
Patent Document 7: JP 2011-154665 A
Patent Document 8: JP 2012-8949 A
Patent Document 9: JP 2013-300 A
Patent Document 10: JP 2011-186521 A
Patent Document 11: WO15/174426

上述したすべての技術は、現実空間におけるコミュニケーションが主である状況におけるサブ的な機能にすぎない。即ち、昨今の業務のＤＸ（ＤｉｇｉｔａｌＴｒａｎｓｆｏｒｍａｔｉｏｎ）化や、世界的な感染症の流行等を受け、業務や授業等のコミュニケーションがオンラインで行われることが主とされる状況に生まれたものではない。 All of the above technologies are merely secondary functions in situations where communication in the real world is the primary mode of communication. In other words, they were not created in response to the recent digital transformation (DX) of work and the global pandemic, in which communication for work, classes, etc. is primarily conducted online.

本発明は、会議や講義等、オンラインコミュニケーションが主となる状況において、より効率的なコミュニケーションを行うために、これらのコミュニケーションを客観的に評価することを目的とする。 The present invention aims to objectively evaluate communication in situations where online communication is the norm, such as meetings and lectures, in order to ensure more efficient communication.

According to the present invention, there is provided a video analysis system that analyzes the reactions of users based on video obtained by photographing each user in an environment where an online session is held among a plurality of users, regardless of whether the user is displayed on a screen during the online session, the system comprising:
a video acquisition unit that acquires, for each of the plurality of users, video obtained by photographing the user during the online session; and
an analysis unit that analyzes changes in a biological reaction of the user based on the video acquired by the video acquisition unit,
wherein the plurality of users include at least a target patient who receives counseling through the online session and a counselor,
the system further comprising:
a storage unit that stores, as a counseling record, question information posed by the counselor to the target patient, the target patient's answer information in response to the question information, and information on the target patient's biological reaction at the time of the answer, in association with each other; and
a comparison unit that compares the counseling records in chronological order to evaluate the effect of the counseling on the target patient.
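The storage unit and comparison unit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record fields, the scalar `reaction_index`, the grouping of records by identical question and answer text, and the use of an earliest-to-latest difference as the "effect" are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CounselingRecord:
    """One stored association of question, answer, and reaction (field names are hypothetical)."""
    session_date: date
    question: str          # question posed by the counselor
    answer: str            # the target patient's answer
    reaction_index: float  # assumed scalar summary of the reaction at answer time


def compare_over_time(records):
    """Group records by (question, answer) and return, for each group seen in at
    least two records, the latest reaction index minus the earliest one."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.question, record.answer)].append(record)

    trends = {}
    for key, items in groups.items():
        if len(items) < 2:
            continue  # nothing to compare against
        items.sort(key=lambda r: r.session_date)
        trends[key] = items[-1].reaction_index - items[0].reaction_index
    return trends


records = [
    CounselingRecord(date(2021, 6, 1), "How did you sleep?", "Fine", -0.5),
    CounselingRecord(date(2021, 7, 1), "How did you sleep?", "Fine", 0.25),
    CounselingRecord(date(2021, 7, 1), "How is work?", "Busy", -0.25),
]
trends = compare_over_time(records)
```

Exact text matching stands in here for "answers that can be regarded as the same"; a real system would need a looser notion of equivalence.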

本開示によれば、ビデオセッションの動画像を分析評価することにより、特に内容に関する評価を客観的に行うことができる。 According to the present disclosure, by analyzing and evaluating the video footage of a video session, evaluation can be made objectively, particularly with regard to content.

特に、本発明によれば、オンラインコミュニケーションが主となる状況において、より効率的なコミュニケーションを行うために、交わされたコミュニケーションを客観的に評価することができる。 In particular, the present invention makes it possible to objectively evaluate communication in order to communicate more efficiently in situations where online communication is the norm.

FIG. 1 is a diagram showing the overall system according to an embodiment of the present invention.
FIG. 2 is an example of a functional block diagram of an evaluation terminal according to the embodiment of the present invention.
FIG. 3 is a diagram showing functional configuration example 1 of the evaluation terminal according to the embodiment of the present invention.
FIG. 4 is a diagram showing functional configuration example 2 of the evaluation terminal according to the embodiment of the present invention.
FIG. 5 is a diagram showing functional configuration example 3 of the evaluation terminal according to the embodiment of the present invention.
FIG. 6 is a diagram showing another configuration of functional configuration example 3 of the evaluation terminal according to the embodiment of the present invention.
FIG. 7 is a diagram showing another configuration of functional configuration example 3 of the evaluation terminal according to the embodiment of the present invention.
FIG. 8 is a functional block diagram of the system according to the embodiment of the present invention.
FIG. 9 is a diagram illustrating how the system is used in the embodiment of the present invention.
FIG. 10 is a diagram illustrating how the system is used in the embodiment of the present invention.
FIG. 11 is a diagram illustrating how the system is used in the embodiment of the present invention.

The contents of the embodiments of the present disclosure are listed and described below. The present disclosure has the following configurations.
[Item 1]
A video analysis system that analyzes the reactions of users based on video obtained by photographing each user in an environment where an online session is held among a plurality of users, regardless of whether the user is displayed on a screen during the online session, the system comprising:
a video acquisition unit that acquires, for each of the plurality of users, video obtained by photographing the user during the online session; and
an analysis unit that analyzes changes in a biological reaction of the user based on the video acquired by the video acquisition unit,
wherein the plurality of users include at least a target patient who receives counseling through the online session and a counselor,
the system further comprising:
a storage unit that stores, as a counseling record, question information posed by the counselor to the target patient, the target patient's answer information in response to the question information, and information on the target patient's biological reaction at the time of the answer, in association with each other; and
a comparison unit that compares the counseling records in chronological order to evaluate the effect of the counseling on the target patient.
[Item 2]
The video analysis system according to Item 1,
wherein the comparison unit compares, in chronological order, the biological reactions observed when the target patient gives answers that can be regarded as the same to each instance of the same question asked by the counselor on multiple occasions.
[Item 3]
The video analysis system according to Item 1 or Item 2, further comprising
an emotion evaluation unit that evaluates the degree of the user's emotion according to evaluation criteria standardized among a plurality of users, based on the change in the biological reaction analyzed for the user by the analysis unit,
wherein the emotion evaluation unit evaluates a degree of emotion that is based on the magnitude of the difference between the current biological reaction and the biological reaction in normal times, and that is adjusted according to how readily the same emotion arises in the user.
[Item 4]
The video analysis system according to any one of Items 1 to 3, further comprising:
a uniqueness determination unit that determines whether the change in the biological reaction analyzed for the user in one online session is unique compared with the change in the biological reaction analyzed for the user in an online session earlier in time than the one online session; and
a clustering unit that clusters change patterns of the biological reaction based on the content of the change in the biological reaction determined to be unique by the uniqueness determination unit and the magnitude of the change from before.
[Item 5]
The video analysis system according to any one of Items 1 to 4, comprising
an emotion evaluation unit that evaluates the degree of a subject's emotion according to evaluation criteria standardized among a plurality of subjects, based on the change in the biological reaction analyzed by the analysis unit,
wherein the emotion evaluation unit evaluates a degree of emotion that is based on the magnitude of the difference between the current biological reaction and the biological reaction in normal times, and that is adjusted according to how readily the same emotion arises in the subject.

以下に添付図面を参照しながら、本開示の好適な実施の形態について詳細に説明する。なお、本明細書及び図面において、実質的に同一の機能構成を有する構成要素については、同一の符号を付することにより重複説明を省略する。 Preferred embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. Note that in this specification and drawings, components having substantially the same functional configuration will be assigned the same reference numerals, and redundant explanations will be omitted.

<Basic functions>
The video session evaluation system of this embodiment operates in an environment where a video session (hereinafter referred to as an online session, whether one-way or two-way) is held among multiple people. It analyzes and evaluates the emotions of an analysis subject among those people that differ from the emotions of the others. Emotion here means a feeling that arises in response to one's own or another person's words or actions, such as pleasure or displeasure and its degree.

The online session is, for example, an online conference, online class, or online chat. Terminals installed in multiple locations are connected to a server via a communication network such as the Internet, and video is exchanged among the terminals through the server. The video handled in the online session includes facial images and audio of the users of the terminals, as well as images of materials shared and viewed by multiple users. Each terminal can switch between facial images and material images to display only one of them, or divide the display area to show both at the same time. It can also display the image of one person in full screen, or display the images of some or all users in split small screens.

One or more of the users participating in the online session can be designated as analysis subjects. For example, the leader, facilitator, or administrator of the online session (hereinafter collectively referred to as the organizer) designates one of the users as the analysis subject. The organizer is, for example, the lecturer of an online class, the chairperson or facilitator of an online conference, or the coach of a coaching session. The organizer is usually one of the users participating in the online session, but may be a separate person who does not participate. Alternatively, all participants may be analyzed without designating an analysis subject.

In the video session evaluation system according to this embodiment, when a video session is established between multiple terminals, at least the video acquired from that video session is displayed. The terminal acquires the displayed video and identifies at least the facial images contained in it for each predetermined frame unit. An evaluation value for each identified facial image is then calculated, and the evaluation value is shared as needed. In this embodiment in particular, the acquired video is stored on the terminal, analyzed and evaluated on the terminal, and the results are provided to the user of that terminal. Therefore, even a video session that contains personal or confidential information can be analyzed and evaluated without providing the video itself to an external evaluation organization or the like. If necessary, only the evaluation results (evaluation values) can be provided to an external terminal, where they can be visualized or used for cross-analysis.

図１に示されるように、本実施の形態によるビデオセッション評価システムは、少なくともカメラ部及びマイク部等の入力部と、ディスプレイ等の表示部とスピーカー等の出力部とを有するユーザ端末１０、２０と、ユーザ端末１０、２０に双方向のビデオセッションを提供するビデオセッションサービス端末３０と、ビデオセッションに関する評価の一部を行う評価端末４０とを備えている。 As shown in Figure 1, the video session evaluation system of this embodiment includes user terminals 10 and 20 having at least an input unit such as a camera unit and a microphone unit, a display unit such as a display, and an output unit such as a speaker, a video session service terminal 30 that provides a two-way video session to the user terminals 10 and 20, and an evaluation terminal 40 that performs part of the evaluation of the video session.

<Hardware configuration example>
Each functional block, functional unit, and functional module described below can be implemented by any of hardware provided in a computer, a DSP (Digital Signal Processor), or software. When implemented in software, for example, it is in practice built from the computer's CPU, RAM, ROM, and the like, and is realized by running a program stored in a recording medium such as RAM, ROM, a hard disk, or a semiconductor memory. The series of processes performed by the system and terminals described in this specification may be realized using software, hardware, or a combination of the two. A computer program for implementing each function of the information sharing support device 10 according to this embodiment can be created and installed on a PC or the like. A computer-readable recording medium storing such a computer program can also be provided. The recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, or a flash memory. The computer program may also be distributed without a recording medium, for example via a network.

本実施の形態による評価端末は、ビデオセッションサービス端末から動画像を取得し、当該動画像内に含まれる少なくとも顔画像を所定のフレーム単位ごとに識別すると共に、顔画像に関する評価値を算出する（詳しくは後述する）。 The evaluation terminal in this embodiment acquires moving images from a video session service terminal, identifies at least facial images contained in the moving images for each specified frame unit, and calculates an evaluation value for the facial images (details will be described later).

<How to obtain the video>
As shown in FIG. 2, the video session service provided by the video session service terminal (hereinafter sometimes simply called "this service") enables two-way communication by image and audio between the user terminals 10 and 20. With this service, the display of a user terminal shows the video captured by the camera unit of the other party's user terminal, and the speaker outputs the audio captured by the microphone unit of the other party's user terminal. The service is also configured so that both or either of the user terminals can record the video and audio (collectively, "video, etc.") in a storage unit on at least one of the user terminals. The recorded video information Vs (hereinafter "recorded information") is cached on the user terminal that started the recording and is stored only locally on one of the user terminals. If necessary, the user can view the recorded information or share it with others within the scope of use of this service.

<Functional configuration example 1>
Fig. 3 is a block diagram showing a configuration example according to this embodiment. As shown in Fig. 4, the video session evaluation system of this embodiment is realized as a functional configuration of the user terminal 10. That is, the user terminal 10 includes, as its functions, a video acquisition unit 11, a biological reaction analysis unit 12, a uniqueness determination unit 13, a related event identification unit 14, a clustering unit 15, and an analysis result notification unit 16.

動画像取得部１１は、オンラインセッション中に各端末が備えるカメラにより複数人（複数のユーザ）を撮影することによって得られる動画像を各端末から取得する。各端末から取得する動画像は、各端末の画面上に表示されるように設定されているものか否かは問わない。すなわち、動画像取得部１１は、各端末に表示中の動画像および非表示中の動画像を含めて、動画像を各端末から取得する。 The video acquisition unit 11 acquires video from each terminal, obtained by capturing images of multiple people (multiple users) using a camera equipped on each terminal during an online session. The video acquired from each terminal may or may not be configured to be displayed on the screen of each terminal. In other words, the video acquisition unit 11 acquires video from each terminal, including video that is currently being displayed and video that is not currently being displayed on each terminal.

The biological reaction analysis unit 12 analyzes changes in biological reactions for each of the multiple people based on the video acquired by the video acquisition unit 11 (regardless of whether it is currently displayed on a screen). In this embodiment, the biological reaction analysis unit 12 separates the acquired video into an image set (a collection of frame images) and audio, and analyzes changes in biological reactions from each. For example, it analyzes the user's facial image using the frame images separated from the video, and thereby analyzes changes in biological reactions relating to at least one of facial expression, gaze, pulse, and facial movement. It also analyzes the audio separated from the video, and thereby analyzes changes in biological reactions relating to at least one of the user's speech content and voice quality.

When a person's emotions change, the change appears as changes in biological reactions such as facial expression, gaze, pulse, facial movement, speech content, and voice quality. In this embodiment, changes in the user's emotions are analyzed by analyzing changes in the user's biological reactions. The emotion analyzed in this embodiment is, as one example, the degree of pleasure or displeasure. The biological reaction analysis unit 12 quantifies the changes in biological reactions according to predetermined criteria, and thereby calculates a biological reaction index value that reflects the content of those changes.

表情の変化の解析は、例えば以下のようにして行う。すなわち、フレーム画像ごとに、フレーム画像の中から顔の領域を特定し、事前に機械学習させた画像解析モデルに従って特定した顔の表情を複数に分類する。そして、その分類結果に基づいて、連続するフレーム画像間でポジティブな表情変化が起きているか、ネガティブな表情変化が起きているか、およびどの程度の大きさの表情変化が起きているかを解析し、その解析結果に応じた表情変化指標値を出力する。 Analysis of facial expression changes is performed, for example, as follows: For each frame image, a facial region is identified within the frame image, and the identified facial expressions are classified into multiple categories according to an image analysis model that has been trained in advance by machine learning. Based on the classification results, it is then analyzed whether a positive or negative facial expression change has occurred between consecutive frame images, and the extent of the change, and a facial expression change index value corresponding to the analysis results is output.
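The step from per-frame classification to a signed change index can be sketched as follows. This is only an illustration: the expression labels, the valence assigned to each label, and the use of a probability-weighted valence difference between consecutive frames are assumptions, since the source does not specify the classifier's categories or how the index is scaled.

```python
# Hypothetical valence assigned to each expression class (labels and values are illustrative).
VALENCE = {"happy": 1.0, "neutral": 0.0, "sad": -1.0}


def frame_valence(probs):
    """Expected valence of one frame, given class probabilities from a classifier."""
    return sum(VALENCE[label] * p for label, p in probs.items())


def expression_change_indices(frames):
    """Signed valence change between each pair of consecutive frames.
    Positive values mean a positive expression change, negative values a
    negative one, and the absolute value gives the size of the change."""
    values = [frame_valence(f) for f in frames]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Classifier output for three consecutive frames (made-up numbers).
frames = [
    {"happy": 0.0, "neutral": 1.0, "sad": 0.0},
    {"happy": 0.5, "neutral": 0.5, "sad": 0.0},
    {"happy": 0.0, "neutral": 0.5, "sad": 0.5},
]
changes = expression_change_indices(frames)
```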

Changes in gaze are analyzed, for example, as follows. For each frame image, the eye region is identified within the frame image and the direction of both eyes is analyzed to determine where the user is looking: for example, at the face of the speaker being displayed, at the shared material being displayed, or outside the screen. The analysis may also cover whether the gaze movement is large or small and whether it is frequent or infrequent. Changes in gaze are also related to the user's degree of concentration. The biological reaction analysis unit 12 outputs a gaze change index value according to the result of the gaze analysis.

Changes in pulse are analyzed, for example, as follows. For each frame image, the facial region is identified within the frame image. Then, using a trained image analysis model that captures the numerical value of the facial color information (G of RGB), changes in the G color of the facial surface are analyzed. Arranging the results along the time axis forms a waveform representing the change in color information, and the pulse is identified from this waveform. A person's pulse quickens when they are tense and slows when they feel calm. The biological reaction analysis unit 12 outputs a pulse change index value according to the result of the pulse analysis.
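Identifying the pulse from the green-channel waveform can be sketched as follows. The source does not say how the pulse is read off the waveform, so the frequency-peak search below is a substituted technique chosen for illustration. The function name, the 0.7 to 3.0 Hz search band, and the assumption that the per-frame mean green value of the facial region has already been extracted are all hypothetical.

```python
import math


def estimate_bpm(green_means, fps, lo_hz=0.7, hi_hz=3.0):
    """Estimate pulse rate (beats per minute) from per-frame mean green values
    by finding the strongest DFT component inside a plausible heart-rate band."""
    n = len(green_means)
    mean = sum(green_means) / n
    signal = [g - mean for g in green_means]  # remove the constant (DC) component

    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2):
        freq = k * fps / n  # frequency of DFT bin k in Hz
        if not lo_hz <= freq <= hi_hz:
            continue
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


fps = 30
# Synthetic 10-second trace with a 1.2 Hz oscillation, i.e. 72 beats per minute.
trace = [128 + 2 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
bpm = estimate_bpm(trace, fps)
```

With a 10-second window the frequency resolution is 0.1 Hz (6 beats per minute), so real footage would need a longer window or interpolation, plus filtering for motion and lighting changes.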

Changes in facial movement are analyzed, for example, as follows. For each frame image, the facial region is identified within the frame image and the direction of the face is analyzed to determine where the user is looking: for example, at the face of the speaker being displayed, at the shared material being displayed, or outside the screen. The analysis may also cover whether the facial movement is large or small and whether it is frequent or infrequent. Facial movement and gaze movement may be analyzed together, for example to determine whether the user is looking straight at the displayed speaker's face, looking at it with upturned or downturned eyes, or looking at it from an angle. The biological reaction analysis unit 12 outputs a facial direction change index value according to the result of the facial direction analysis.

Speech content is analyzed, for example, as follows. The biological reaction analysis unit 12 applies known speech recognition processing to the audio of a specified duration (for example, about 30 to 150 seconds) to convert it into a character string, and then performs morphological analysis on the string to remove words that are not needed to represent the conversation, such as particles and articles. The remaining words are vectorized and analyzed to determine whether a positive or negative emotional change has occurred and how large it is, and a speech content index value is output according to the result.
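The pipeline after speech recognition (drop function words, vectorize, score polarity) can be sketched as follows. This is a simplified stand-in under stated assumptions: whitespace tokenization replaces morphological analysis, and the stop-word list, the polarity lexicon, and averaging over content words are hypothetical choices that do not come from the source.

```python
from collections import Counter

# Illustrative stop words standing in for particles, articles, and similar function words.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "of", "i", "it"}
# Hypothetical polarity lexicon; a real system would use a trained model.
POLARITY = {"good": 1.0, "glad": 1.0, "tired": -1.0, "worried": -1.0}


def speech_content_index(transcript):
    """Average polarity of the content words in a recognized transcript.
    The sign gives the direction of the emotional change, the magnitude its size."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    if not content:
        return 0.0
    counts = Counter(content)  # simple bag-of-words vector
    score = sum(POLARITY.get(word, 0.0) * count for word, count in counts.items())
    return score / len(content)


index = speech_content_index("I was tired and worried, but the talk was good.")
```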

Voice quality is analyzed, for example, as follows. The biological reaction analysis unit 12 applies known audio analysis processing to the audio of a specified duration (for example, about 30 to 150 seconds) to identify the acoustic features of the voice. Based on those features, it analyzes whether a positive or negative change in voice quality has occurred and how large it is, and outputs a voice quality change index value according to the result.

The biological reaction analysis unit 12 calculates the biological reaction index value using at least one of the facial expression change index value, gaze change index value, pulse change index value, facial direction change index value, speech content index value, and voice quality change index value calculated as described above. For example, it calculates the biological reaction index value as a weighted combination of these six index values.
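The weighted calculation can be sketched as follows. The weight values, the key names, and the renormalization over whichever component indices are available are assumptions for illustration; the source states only that a weighted calculation is used and that at least one component is required.

```python
# Hypothetical weights; the source does not give values.
WEIGHTS = {
    "expression": 0.25,
    "gaze": 0.125,
    "pulse": 0.25,
    "face_direction": 0.125,
    "speech_content": 0.125,
    "voice_quality": 0.125,
}


def biological_reaction_index(indices, weights=WEIGHTS):
    """Weighted average over the component index values that are present."""
    used = {name: value for name, value in indices.items() if name in weights}
    if not used:
        raise ValueError("at least one component index value is required")
    total_weight = sum(weights[name] for name in used)
    return sum(weights[name] * value for name, value in used.items()) / total_weight


value = biological_reaction_index({"expression": 0.5, "pulse": -0.5, "gaze": 1.0})
```

Dividing by the total weight of the components actually supplied keeps the result on the same scale when, for example, audio is unavailable and only image-based indices can be computed.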

The uniqueness determination unit 13 determines whether the change in biological response analyzed for the analysis subject is unique compared with the changes in biological response analyzed for persons other than the analysis subject. In this embodiment, the uniqueness determination unit 13 makes this determination based on the biological response index values calculated by the biological response analysis unit 12 for each of the multiple users.

For example, the uniqueness determination unit 13 calculates the variance of the biological response index values calculated by the biological response analysis unit 12 for each of the multiple persons, and determines whether the change in biological response analyzed for the analysis subject is unique compared with others by comparing the index value calculated for the analysis subject against that variance.
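One way to read this comparison against the variance is sketched below. The threshold factor `k`, the decision to compute the mean and variance over the other participants only, and the function name `is_unique` are assumptions; the specification only says that the subject's index value is compared with the variance.

```python
from statistics import mean, pvariance
from typing import Sequence


def is_unique(subject_value: float, other_values: Sequence[float], k: float = 2.0) -> bool:
    """Flag the subject when its squared deviation exceeds k^2 times the others' variance."""
    if not other_values:
        raise ValueError("index values of other participants are required")
    mu = mean(other_values)
    var = pvariance(other_values, mu)
    if var == 0.0:
        # All others reacted identically; any difference at all stands out.
        return subject_value != mu
    return (subject_value - mu) ** 2 > (k ** 2) * var
```

A scalar test of this kind can only detect differences in the size of the index value. Distinguishing reactions that are equally large but different in nature would need the individual component values rather than a single combined index.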

There are three possible patterns in which the change in biological response analyzed for the analysis subject is unique compared with others. First, no particularly large change in biological response occurs in the others, but a relatively large change occurs in the analysis subject. Second, no particularly large change occurs in the analysis subject, but a relatively large change occurs in the others. Third, relatively large changes occur in both the analysis subject and the others, but the nature of the change differs between them.

The related event identification unit 14 identifies an event occurring with respect to at least one of the analysis subject, the others, and the environment at the time when a change in biological response determined to be unique by the uniqueness determination unit 13 occurs. For example, the related event identification unit 14 identifies, from the video, the analysis subject's own words and actions at the time when the unique change in biological response occurs in the analysis subject. It also identifies, from the video, the words and actions of others at that time, and it identifies, from the video, the environment at that time. The environment is, for example, a shared document displayed on the screen or something appearing in the analysis subject's background.

The clustering unit 15 analyzes the degree of correlation between a change in biological response determined to be unique by the uniqueness determination unit 13 (for example, one or a combination of gaze, pulse, facial movement, speech content, and voice quality) and the event occurring when that unique change occurs (the event identified by the related event identification unit 14). If the correlation is determined to be at or above a certain level, the clustering unit 15 clusters the analysis subject or the event based on the result of the correlation analysis.

For example, if the unique change in biological response corresponds to a negative emotional change and the event occurring when that change occurs is also a negative event, a correlation at or above the certain level is detected. The clustering unit 15 clusters the analysis subject or the event into one of multiple pre-segmented categories according to the content of the event, its degree of negativity, the magnitude of the correlation, and so on.

Similarly, if the unique change in biological response corresponds to a positive emotional change and the event occurring when that change occurs is also a positive event, a correlation at or above the certain level is detected. The clustering unit 15 clusters the analysis subject or the event into one of multiple pre-segmented categories according to the content of the event, its degree of positivity, the magnitude of the correlation, and so on.
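The threshold-then-categorize behavior described for the clustering unit 15 can be illustrated with a deliberately simple sketch. Everything concrete here is an assumption: reactions and events are reduced to signed valences in [-1, 1], their product is used as a stand-in for the correlation measure, and the threshold, the cut-off for "strong", and the category labels are hypothetical.

```python
from typing import Optional


def cluster(
    reaction_valence: float,
    event_valence: float,
    threshold: float = 0.25,
) -> Optional[str]:
    """Return a pre-segmented category label, or None when the correlation is too weak."""
    # Stand-in correlation: positive only when reaction and event share a sign.
    correlation = reaction_valence * event_valence
    if correlation < threshold:
        return None
    polarity = "negative" if reaction_valence < 0 else "positive"
    strength = "strong" if correlation >= 0.64 else "moderate"
    return f"{polarity}/{strength}"
```

A negative reaction paired with a negative event yields a positive product and is therefore categorized, while a reaction and an event of opposite sign fall below the threshold and are left unclustered.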

The analysis result notification unit 16 notifies the person who designated the analysis subject (the analysis subject or the organizer of the online session) of at least one of the change in biological response determined to be unique by the uniqueness determination unit 13, the event identified by the related event identification unit 14, and the category assigned by the clustering unit 15.

For example, the analysis result notification unit 16 notifies the analysis subject of the subject's own words and actions as the event occurring when a unique change in biological response, different from that of others, occurs in the analysis subject (any of the three patterns described above; the same applies below). This allows the analysis subject to recognize that he or she feels differently from others when speaking or acting in a certain way. The unique change in biological response identified for the analysis subject may also be notified to the analysis subject at this time, and the changes in biological response of the others used for comparison may be notified as well.

For example, the analysis subject may speak or act without particular awareness and with ordinary feelings, or may speak or act deliberately with a certain feeling. If the emotion that others experience in response differs from the emotion the analysis subject had at the time, the analysis subject is notified of his or her own words and actions at that time. This makes it possible to discover words and actions that, contrary to the subject's own expectations, are well received or poorly received by others.

The analysis result notification unit 16 also notifies the organizer of the online session of the event occurring when a unique change in biological response, different from that of others, occurs in the analysis subject, together with the unique change in biological response itself. This allows the organizer to learn which events influence which emotional changes as a phenomenon specific to the designated analysis subject, and to take appropriate measures for the analysis subject accordingly.

The analysis result notification unit 16 also notifies the organizer of the online session of the clustering result for the analysis subject or for the event occurring when a unique change in biological response, different from that of others, occurs in the analysis subject. Depending on the category into which the designated analysis subject has been clustered, the organizer can grasp behavioral tendencies specific to the analysis subject and predict behaviors or conditions that may arise in the future, and can then take appropriate measures for the analysis subject.

The above embodiment describes an example in which a biological response index value is calculated by quantifying the change in biological response according to predetermined criteria, and whether the change analyzed for the analysis subject is unique compared with others is determined based on the index values calculated for each of the multiple persons. The invention is not limited to this example. For example, the following approach may be used.

That is, the biological response analysis unit 12 analyzes the gaze movement of each of the multiple persons and generates a heat map indicating gaze direction. The uniqueness determination unit 13 compares the heat map generated for the analysis subject with the heat maps generated for the others, and thereby determines whether the change in biological response analyzed for the analysis subject is unique compared with the changes analyzed for the others.
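A heat map comparison of this kind might look like the following sketch. The grid resolution, the use of histogram intersection as the similarity measure, the `min_overlap` threshold, and all function names are assumptions for illustration; gaze points are assumed to be normalized screen coordinates in the range 0 to 1.

```python
from typing import Iterable, List, Sequence, Tuple

HeatMap = List[List[float]]


def gaze_heatmap(points: Iterable[Tuple[float, float]], rows: int = 3, cols: int = 3) -> HeatMap:
    """Bin normalized (x, y) gaze points into a grid whose cells sum to 1."""
    grid = [[0.0] * cols for _ in range(rows)]
    count = 0
    for x, y in points:
        r = min(int(y * rows), rows - 1)
        c = min(int(x * cols), cols - 1)
        grid[r][c] += 1.0
        count += 1
    if count == 0:
        raise ValueError("no gaze points")
    return [[v / count for v in row] for row in grid]


def heatmap_overlap(a: HeatMap, b: HeatMap) -> float:
    """Histogram intersection: 1.0 for identical maps, 0.0 for disjoint ones."""
    return sum(min(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def gaze_is_unique(subject: HeatMap, others: Sequence[HeatMap], min_overlap: float = 0.5) -> bool:
    """Compare the subject's map with the cell-wise mean of the others' maps."""
    n = len(others)
    mean_map = [
        [sum(m[r][c] for m in others) / n for c in range(len(subject[0]))]
        for r in range(len(subject))
    ]
    return heatmap_overlap(subject, mean_map) < min_overlap
```

With this measure, a participant looking at a corner of the screen while everyone else looks at the center has zero overlap with the group and is flagged as unique.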

Thus, in this embodiment, the video of the video session is stored in the local storage of the user terminal 10, and the analysis described above is performed on the user terminal 10. Although performance may depend on the machine specifications of the user terminal 10, the video can be analyzed without providing its information to any external party.

<Functional Configuration Example 2>
As shown in FIG. 4, the video session evaluation system of this embodiment may include, as its functional configuration, a video acquisition unit 11, a biological response analysis unit 12, and a response information presentation unit 13a. The response information presentation unit 13a presents information indicating the changes in biological response analyzed by the biological response analysis unit 12a, including those of participants who are not displayed on the screen. For example, the response information presentation unit 13a presents this information to the leader, facilitator, or administrator of the online session (hereinafter collectively referred to as the organizer). The organizer of the online session is, for example, the instructor of an online class, the chairperson or facilitator of an online meeting, or the coach of a coaching session. The organizer is usually one of the multiple users participating in the online session, but may be a separate person who does not participate in it.

In this way, in an environment where an online session is held among multiple people, the organizer of the online session can also grasp the state of participants who are not displayed on the screen.

<Functional Configuration Example 3>
FIG. 5 is a block diagram showing a configuration example according to this embodiment. As shown in FIG. 5, in the video session evaluation system of this embodiment, functions similar to those of the first embodiment described above are given the same reference numerals, and their description may be omitted. The system according to this embodiment includes a camera unit that acquires the video of the video session, a microphone unit that acquires the audio, an analysis unit that analyzes and evaluates the video, an object generation unit that generates display objects (described later) based on information obtained by evaluating the acquired video, and a display unit that displays both the video of the video session and the display objects while the video session is running.

As in the description above, the analysis unit includes the video acquisition unit 11, the biological response analysis unit 12, the uniqueness determination unit 13, the related event identification unit 14, the clustering unit 15, and the analysis result notification unit 16. The function of each element is as described above.

Based on the result of the analysis unit's analysis of the video acquired from the video session, the object generation unit superimposes on the video, as needed, an object indicating the recognized face region and information indicating the content of the analysis and evaluation described above. When the faces of multiple people appear in the video, the objects may identify and indicate the faces of all of them. In addition, even when the camera function of the video session is stopped on the other party's terminal (that is, stopped in software within the video session application rather than by physically covering the camera), if the other party's camera has recognized the other party's face, an object may be displayed at the position where that face is located. This allows both parties to confirm that the other is in front of the terminal even when the camera function is turned off. In this case, for example, the video session application may hide the information acquired from the camera while displaying the object corresponding to the face recognized by the analysis unit. Alternatively, the video information acquired from the video session and the information recognized by the analysis unit may be placed on separate display layers, and the layer for the former may be hidden. When there are multiple regions for displaying video, the object may be displayed in all of them or only in some of them. For example, it may be displayed only in the video on the guest side.

The embodiments of the invention described in Basic Configuration Examples 1 to 3 above may be realized as a single device, or may be realized by multiple devices (for example, cloud servers) partly or wholly connected via a network. For example, the control unit 110 and the storage 130 of each terminal 10 may be realized by different servers connected to each other via a network. That is, the system includes the user terminals 10 and 20, a video session service terminal 30 that provides a two-way video session to the user terminals 10 and 20, and an evaluation terminal 40 that evaluates the video session, and the following variations and combinations of configurations are conceivable.
(1) All processing on the user terminal only
As shown in FIG. 6, by performing the processing of the analysis unit on the terminal conducting the video session, the analysis and evaluation results can be obtained at the same time as the video session (in real time), although a certain level of processing capability is required.
(2) Processing shared between the user terminal and the evaluation terminal
As shown in FIG. 7, the analysis unit may be provided in an evaluation terminal connected via a network or the like. In this case, the video acquired by the user terminal is shared with the evaluation terminal either simultaneously with the video session or afterward. After the analysis unit in the evaluation terminal analyzes and evaluates it, the information on the object 50 and the object 100 is shared with the user terminal, together with or separately from the video data (that is, information including at least the analysis data is shared), and displayed on the display unit.

The following system is realized using any of Functional Configuration Examples 1 to 3 described above, or a combination of them.

<Embodiment>

The video analysis system according to this embodiment analyzes user reactions in an environment where an online session is held among multiple users, based on video obtained by photographing each user during the online session regardless of whether that user is displayed on the screen.

As shown in FIG. 10, the system includes a video acquisition unit that acquires, for each of the multiple users, video obtained by photographing the user during the online session. It also includes an analysis unit that analyzes changes in the user's biological response based on the video acquired by the video acquisition unit. The multiple users include at least a target patient who receives counseling through the online session and a counselor. The system includes a storage unit that stores, as a counseling record, the question information the counselor put to the target patient, the target patient's answer information for that question, and information on the target patient's biological response at the time of the answer, all in association with one another. The system further includes a comparison unit that compares the counseling records in chronological order to evaluate the effect of the counseling on the target patient.

As shown in FIGS. 11 to 13, when the target patient gives answers that can be evaluated as the same to each instance of the same question asked by the counselor on multiple occasions, the comparison unit compares the respective biological responses in chronological order. That is, as shown in FIG. 11, the answer data for each question the counselor asked the target patient is stored in association with the change in biological response at the time of the answer.

As shown in FIG. 13, even if the patient answers "I'm fine" in the same way every time the counselor asks "How are you feeling?", the score indicating happiness in the biological response may differ from session to session, as shown in FIG. 12. Capturing such changes in biological response makes it possible to estimate whether the counseling treatment is progressing well.
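The counseling record and its chronological comparison can be sketched as below. The record fields, the use of exact string equality to decide that two answers "can be evaluated as the same", and the first-to-last score difference as the measure of effect are all simplifying assumptions; the names `CounselingRecord`, `score_trend`, and `counseling_effect` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CounselingRecord:
    session_date: date
    question: str
    answer: str
    happiness_score: float  # hypothetical index value derived from the biological response


def score_trend(
    records: Iterable[CounselingRecord], question: str, answer: str
) -> List[Tuple[date, float]]:
    """Scores for one question/answer pair, ordered by session date."""
    matching = sorted(
        (r for r in records if r.question == question and r.answer == answer),
        key=lambda r: r.session_date,
    )
    return [(r.session_date, r.happiness_score) for r in matching]


def counseling_effect(
    records: Iterable[CounselingRecord], question: str, answer: str
) -> Optional[float]:
    """Change in score from the first to the latest matching session, if comparable."""
    trend = score_trend(records, question, answer)
    if len(trend) < 2:
        return None
    return trend[-1][1] - trend[0][1]
```

Under these assumptions, a positive return value means the score for the same question and answer has risen since the first session, which would be read as a sign that the counseling is progressing well.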

The embodiments described above may be implemented in appropriate combinations. The effects described in this specification are merely explanatory or illustrative and are not limiting. In other words, the technology according to the present disclosure may achieve, in addition to or in place of the above effects, other effects that are apparent to those skilled in the art from the description in this specification.

10, 20 User terminal
30 Video session service terminal
40 Evaluation terminal

Claims

1. A video analysis system that analyzes reactions of a user based on video images obtained by photographing the user in an environment where an online session is held by a plurality of users, regardless of whether the user is displayed on a screen during the online session, comprising:
a video acquisition unit that acquires, for each of the plurality of users, a video obtained by photographing the user during the online session;
an analysis unit that analyzes a change in a biological reaction of the user based on the video acquired by the video acquisition unit,
the plurality of users include at least a patient who is a target of receiving counseling using the online session and a counselor;
a storage unit that stores, as a counseling record, information on questions posed by the counselor to the target patient, information on the target patient's answers to the questions, and information on the target patient's biological reactions at the time of the answers, in association with each other;
and a comparison unit that, when the target patient gives answers that can be evaluated as the same to each of the same questions asked by the counselor multiple times, calculates index values corresponding to changes in the biological reaction at the time of each of the answers based on the counseling record, compares the index values in chronological order, and evaluates the effectiveness of the counseling on the target patient.

2. The video analysis system according to claim 1,
an emotion evaluation unit that evaluates a level of emotion of the user in accordance with evaluation criteria that are standardized among a plurality of users based on the change in the biological reaction of the user analyzed by the analysis unit;
the emotion evaluation unit evaluates the degree of emotion based on the magnitude of difference between a current biological response and a normal biological response, and the degree of emotion is adjusted according to the likelihood of the user experiencing the same emotion.
Video image analysis system.

3. The video analysis system according to claim 1,
a uniqueness determination unit that determines whether the change in the biological reaction analyzed for the user in one online session is unique compared to the change in the biological reaction analyzed for the user in an online session temporally earlier than the one online session;
and a clustering unit that clusters the change patterns of the biological reaction based on the content of the change in the biological reaction determined to be unique by the uniqueness determination unit and the magnitude of the change from the past.
Video image analysis system.

4. The video analysis system according to claim 1,
an emotion evaluation unit that evaluates the degree of emotion of the subject in accordance with evaluation criteria that are standardized among a plurality of subjects based on the change in the biological reaction analyzed by the analysis unit;
the emotion evaluation unit evaluates the degree of emotion based on the magnitude of difference between a current biological response and a normal biological response, and the degree of emotion is adjusted according to the likelihood of the subject experiencing the same emotion.
Video image analysis system.