JP7528638B2

JP7528638B2 - Communication System

Info

Publication number: JP7528638B2
Application number: JP2020143020A
Authority: JP
Inventors: 光留菅田
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2024-08-06
Anticipated expiration: 2040-08-26
Also published as: JP2022038487A

Description

The present invention relates to a communication system.

Patent Document 1 discloses a technology that uses image information and the like to detect the degree to which a user participates in communication with the people around them.

Patent Document 1: JP 2008-046801 A

There has been a demand for a communication system that can detect the degree to which a user participates in communication with the people around them without relying on image information, which is costly to process, while still suppressing any loss of detection accuracy.

The present disclosure has been made to solve this problem, and aims to provide a communication system that can detect the degree of a user's participation in communication with the people around them while reducing information processing costs and suppressing a decrease in detection accuracy.

A communication system including a determination device connected to at least three wearable devices, the determination device comprising:
a speech determination unit that determines, based on voice data detected by the connected wearable devices, which of the users of the wearable devices is speaking;
an interest action determination unit that determines, based on acceleration data detected by the wearable devices, whether each user of the wearable devices is nodding; and
an interest determination unit that calculates, during the speech section of the speaker, the time transition of the nodding density, which is the number of nods in a predetermined period of time, and calculates the correlation of the time transition of the nodding density between non-speakers.

The present disclosure provides a communication system that can detect the degree of a user's participation in communication with the people around them while reducing information processing costs and suppressing a decrease in detection accuracy.

FIG. 1 is a block diagram showing the configuration of a communication system according to a first embodiment.
FIG. 2 is a flowchart showing the operation of the determination device according to the first embodiment.
FIG. 3 is a diagram showing a specific example of a method, performed in the determination device according to the first embodiment, for estimating the similarity of interests among non-speakers with respect to a speaker.

Specific embodiments to which the present disclosure is applied are described in detail below with reference to the drawings. In the drawings, identical elements are given identical reference signs, and redundant descriptions are omitted as appropriate for clarity.

First Embodiment
First, a communication system 1 according to a first embodiment is described. FIG. 1 is a block diagram showing the configuration of the communication system 1 according to the first embodiment. As shown in FIG. 1, the communication system 1 includes three or more wearable devices 10 and a determination device 20 (communication determination device 20). The configurations of the wearable devices 10 and the determination device 20 are described first, followed by the operation of the communication system 1.

<Configuration of Wearable Device 10>
The wearable device 10 is worn by a user. For example, each user wears one wearable device 10, so multiple wearable devices 10 are worn by multiple users. Although FIG. 1 shows three wearable devices 10, the number is not limited to three; four or more wearable devices 10 may be used.

The wearable device 10 is, for example, a badge. However, the wearable device 10 is not limited to a badge and may be a headset, earphones, glasses, a necklace, a pendant, or the like, as long as it can be worn by a user. The wearable device 10 includes a sensor 11.

The sensor 11 detects physical information about the user of the wearable device 10. For example, the sensor 11 includes a microphone that picks up the user's utterances, i.e., the voice of the user of the wearable device 10. The sensor 11 also includes an acceleration sensor that detects the user's movements, i.e., the acceleration of the user of the wearable device 10.

Each wearable device 10 has a transceiver (not shown) and is connected to the determination device 20 via a wireless or wired communication line. For example, each wearable device 10 may be communicatively connected to the determination device 20 via short-range wireless communication such as Bluetooth (registered trademark), or via a network such as the Internet. Each wearable device 10 transmits the information it detects to the determination device 20 over the communication line, and receives information such as control signals from the determination device 20 over the same line.

The wearable devices 10 may also synchronize their clocks. For example, each wearable device 10 may receive Network Time Protocol (NTP) time from the determination device 20, which is connected to the Internet, and synchronize its clock to it.
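Clock synchronization matters because the speech sections and nod timelines recorded by different devices are later compared on a common time axis. The text does not say how the received NTP time is applied; as an illustration only, the standard NTP on-wire offset and round-trip-delay formulas (RFC 5905) could be used as below. The function names are hypothetical.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP clock-offset and round-trip-delay calculation.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    Returns (offset, delay) in the same units as the inputs.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def to_common_time(local_timestamps, offset):
    """Shift device-local timestamps onto the server's time base."""
    return [t + offset for t in local_timestamps]
```

For example, with timestamps in milliseconds `(1000, 1600, 1600, 1200)`, the device clock is 500 ms behind the server and the round trip takes 200 ms.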

The wearable devices 10 may also be connected to one another by wireless or wired communication lines, either directly via short-range wireless communication such as Bluetooth (registered trademark) or via a network such as the Internet, and may exchange various kinds of information with one another. The determination device 20 may obtain the distance between the wearable devices 10 from the short-range wireless communication between them.
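The text does not specify how a distance is derived from short-range wireless communication. One common approach, shown here purely as an assumption, is to invert the log-distance path-loss model using the received signal strength (RSSI). The reference level at 1 m (`rssi_at_1m`) and the path-loss exponent `n` are hypothetical calibration parameters that would have to be measured for real hardware.

```python
import math


def estimate_distance_m(rssi_dbm, rssi_at_1m=-59.0, n=2.0):
    """Invert the log-distance path-loss model:
        rssi = rssi_at_1m - 10 * n * log10(d)
    so  d = 10 ** ((rssi_at_1m - rssi) / (10 * n)).
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * n))


def pairwise_distances(rssi_table, **model_params):
    """rssi_table maps (device_a, device_b) -> RSSI measured between them (dBm)."""
    return {pair: estimate_distance_m(rssi, **model_params)
            for pair, rssi in rssi_table.items()}
```

RSSI is noisy indoors, so in practice the readings would likely be averaged over several packets before conversion.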

<Configuration of Determination Device 20>
The determination device 20 is connected to each wearable device 10 via a wireless or wired communication line. The determination device 20 is, for example, an information processing device such as a personal computer (PC), a server, or a smartphone. The determination device 20 may also be provided in the cloud so that it can collect the information acquired by each wearable device 10 via the Internet.

The determination device 20 estimates the similarity of interests among the users of the wearable devices 10 based on information acquired by the multiple wearable devices 10. The determination device 20 includes a storage unit 21, a speech determination unit 22, an interest action determination unit 23, and an interest determination unit 24.

The storage unit 21 stores, for example, the information detected by the sensors 11 of the wearable devices 10, such as voice data of utterances and acceleration data.

The speech determination unit 22 determines the speaker of an utterance and its speech section. Several methods can be used to determine the speaker. For example, the speech determination unit 22 may determine the speaker according to whether the sound pressure detected by the sensor 11 exceeds a threshold. Alternatively, the speech determination unit 22 may first form a virtual group of users who are close to one another, based on the distances between the wearable devices 10 obtained via short-range wireless communication or the like, and then determine as the speaker the one user in that virtual group whose sound pressure is the highest and exceeds that of every other user in the group by at least a predetermined margin.
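The group-based method above can be sketched as follows, assuming pairwise distances are already available and sound pressure is expressed as a level in dB. Grouping by single linkage (any two devices within `max_dist` join the same group) and all function and parameter names are assumptions made for illustration; the text only says that nearby users form a virtual group.

```python
from typing import Dict, List, Optional, Tuple


def form_groups(devices: List[str], dist: Dict[Tuple[str, str], float],
                max_dist: float) -> List[List[str]]:
    """Single-link grouping: any two devices within max_dist end up in the same group."""
    parent = {d: d for d in devices}

    def find(d: str) -> str:
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for (a, b), distance in dist.items():
        if distance <= max_dist:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for d in devices:
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())


def speaker_in_group(group: List[str], level_db: Dict[str, float],
                     margin_db: float) -> Optional[str]:
    """Return the loudest member if it exceeds every other member by at least margin_db."""
    loudest = max(group, key=lambda d: level_db[d])
    others = [d for d in group if d != loudest]
    if all(level_db[loudest] - level_db[o] >= margin_db for o in others):
        return loudest
    return None
```

A single-member group trivially passes the margin test in this sketch; whether such a lone user should count as a speaker is not addressed by the text.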

As a method for determining the speaker more accurately, the speech determination unit 22 examines the voice data acquired by the sensor 11 of a target wearable device 10, i.e., the device whose wearer is being tested as the possible speaker, and determines that the user wearing the target wearable device 10 is the speaker if that voice data contains a speech section.

Specifically, in the voice data acquired by the sensor 11 of the target wearable device 10, the speech determination unit 22 determines any section in which the sound pressure is below a stationary-noise threshold to be a first non-speech section, which reflects the influence of stationary noise. Stationary noise is sound originating from the surrounding environment whose sound pressure persists continuously within a certain range, such as the sound of a running air conditioner or background chatter.

In the same voice data, if the sound pressure is equal to or greater than the stationary-noise threshold and is similar to the sound pressure in the voice data acquired by the sensor 11 of a comparison wearable device 10 located within a predetermined distance of the target wearable device 10, the speech determination unit 22 determines the section to be a second non-speech section, which reflects the influence of sudden non-stationary noise. Sudden non-stationary noise is sound that does not originate from the voice of the user wearing the wearable device 10 and occurs abruptly, such as a sudden loud voice from a nearby person or a loud noise.

If the sound pressure is equal to or greater than the stationary-noise threshold, is not similar to the sound pressure at the comparison wearable device 10, and is below a threshold that decreases with the distance to the comparison wearable device 10, the speech determination unit 22 determines the section to be a third non-speech section, in which another person's speech has been picked up.

The speech determination unit 22 then determines the sections of the voice data acquired by the sensor 11 of the target wearable device 10, other than the first to third non-speech sections, to be speech sections in which the user wearing the target wearable device 10 is speaking. In this way, the speech determination unit 22 determines whether the user of the target wearable device 10 is the speaker. The method of determining the speaker is not limited to those described above, and several of the methods described above may be combined.
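The four-way classification above can be sketched per audio frame as follows. This is an illustration under stated assumptions, not the patented implementation: sound pressure is a per-frame level in dB; "similar" is modelled as an absolute difference within `similarity_tol_db`; and the distance-dependent threshold is modelled, hypothetically, as the comparison device's level minus an inverse-distance attenuation term `20*log10(d)` plus a slack `crosstalk_margin_db`. All names and default values are hypothetical.

```python
import math
from typing import List, Tuple

STATIONARY = "non_speech_1"  # below the stationary-noise threshold
SUDDEN = "non_speech_2"      # similar level at a nearby device: sudden noise
CROSSTALK = "non_speech_3"   # another person's speech picked up
SPEECH = "speech"


def classify_frame(target_db: float,
                   comparisons: List[Tuple[float, float]],
                   noise_threshold_db: float,
                   similarity_tol_db: float = 3.0,
                   crosstalk_margin_db: float = 6.0) -> str:
    """Classify one frame of the target device's audio.

    comparisons: (level_db, distance_m) for each comparison device, already
    restricted to devices within the predetermined distance of the target.
    """
    if target_db < noise_threshold_db:
        return STATIONARY
    if any(abs(target_db - lvl) <= similarity_tol_db for lvl, _ in comparisons):
        return SUDDEN
    for lvl, dist in comparisons:
        # Hypothetical threshold that decreases as the distance grows.
        threshold = lvl - 20.0 * math.log10(max(dist, 1.0)) + crosstalk_margin_db
        if target_db < threshold:
            return CROSSTALK
    return SPEECH


def speech_sections(labels: List[str]) -> List[Tuple[int, int]]:
    """Merge consecutive SPEECH frames into [start, end) frame-index ranges."""
    sections, start = [], None
    for i, lab in enumerate(labels):
        if lab == SPEECH and start is None:
            start = i
        elif lab != SPEECH and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(labels)))
    return sections
```

The order of the checks mirrors the order in which the text defines the three non-speech sections; everything not excluded is treated as the wearer's own speech.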

The interest action determination unit 23 determines, from the data on the movement of the user of the wearable device 10 acquired by the sensor 11, a characteristic action that indicates interest. Specifically, the interest action determination unit 23 determines a "nod" by the user wearing the wearable device 10 from the acceleration detected by the sensor 11. The interest action is not limited to nodding and may be, for example, clapping.

The interest action determination unit 23 determines a "nod" as follows. For example, the interest action determination unit 23 extracts the vertical component of the three-axis (X, Y, Z) acceleration time-series data acquired by the sensor 11 for each predetermined time interval, and calculates the mean and standard deviation of that interval. If the calculated standard deviation is smaller than a predetermined value, it determines that a nod occurred in that interval; a necessary condition in this case is that the interval contains no large movement such as walking or a change of posture. Alternatively, the interest action determination unit 23 may determine that a nod occurred in an interval if some point in that interval deviates from the calculated mean by more than a predetermined deviation; a necessary condition in this case is that the vertical movement is an isolated one.
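Both window-based rules can be sketched as follows. The thresholds are left as parameters because the text gives no values. `std_min` (which keeps a completely still window from counting as a nod under the literal "standard deviation below a value" rule), the use of horizontal samples as a stand-in for the "no walking or posture change" condition, and `max_outliers` as the reading of "isolated" are all hypothetical additions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def windows(signal: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split a series into consecutive, non-overlapping windows; a trailing partial window is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def nod_by_std(vertical: Sequence[float], horizontal: Sequence[float],
               std_max: float, std_min: float, large_motion_max: float) -> bool:
    """Std rule from the text: the vertical std of the window is below std_max.

    Hypothetical additions: std_min rejects a completely still window, and the
    'no large movement' condition is approximated by requiring every horizontal
    sample to stay within +/- large_motion_max.
    """
    if any(abs(h) > large_motion_max for h in horizontal):
        return False
    return std_min < pstdev(vertical) < std_max


def nod_by_outlier(vertical: Sequence[float], dev: float, max_outliers: int = 3) -> bool:
    """Outlier rule: some samples deviate from the window mean by more than dev,
    and there are only a few of them (read here as an 'isolated' vertical movement)."""
    m = mean(vertical)
    n_out = sum(1 for v in vertical if abs(v - m) > dev)
    return 1 <= n_out <= max_outliers
```

In use, `windows()` would cut the vertical acceleration stream into the predetermined intervals, and one of the two rules would be applied to each interval.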

As another method of detecting a "nod", the interest action determination unit 23 may extract the three-axis (X, Y, Z) acceleration time-series data acquired by the sensor 11 for each predetermined time interval, feed the values of that interval into a deep-learning convolutional neural network (CNN), and determine that a nod occurred in that interval if the output value is equal to or greater than a predetermined value.
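The text does not describe the network architecture. The sketch below shows only the shape of the decision: a minimal one-layer 1-D CNN forward pass (convolution over time across the X, Y, Z channels, ReLU, global max pooling, a dense layer, and a sigmoid) whose score is compared with a threshold. The architecture and the hand-set weights in the test are placeholders; a real model would be trained on labelled nod data.

```python
import math
from typing import List, Sequence

Window = Sequence[Sequence[float]]  # shape [time][3]: X, Y, Z acceleration


def conv1d_relu(x: Window, kernels: List[List[List[float]]],
                biases: List[float]) -> List[List[float]]:
    """'Valid' 1-D convolution over time; kernels[f][k][c] = filter f, tap k, channel c."""
    k_len = len(kernels[0])
    n_ch = len(x[0])
    out = []
    for f, kernel in enumerate(kernels):
        row = []
        for t in range(len(x) - k_len + 1):
            s = biases[f] + sum(kernel[k][c] * x[t + k][c]
                                for k in range(k_len) for c in range(n_ch))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def nod_score(x: Window, kernels, biases, dense_w: List[float], dense_b: float) -> float:
    """Conv -> ReLU -> global max pool -> dense -> sigmoid, giving a score in (0, 1)."""
    pooled = [max(row) for row in conv1d_relu(x, kernels, biases)]
    z = dense_b + sum(w * p for w, p in zip(dense_w, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def is_nod(x: Window, params, threshold: float = 0.5) -> bool:
    """Nod if the network output is at or above the predetermined value."""
    return nod_score(x, *params) >= threshold
```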

The interest determination unit 24 estimates the similarity of interests among the non-speakers with respect to the speaker, based on the information determined by the speech determination unit 22 and the interest action determination unit 23. Specifically, it calculates the time transition of each non-speaker's nodding density within the speech section of the speaker determined by the speech determination unit 22. Here, the nodding density is, for example, the number of nods by the user of a wearable device 10 within a predetermined period of time. The interest determination unit 24 then calculates the correlation of these time transitions between non-speakers. The similarity of the non-speakers' interest in the speaker can therefore be estimated from the magnitude of this correlation.
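A nodding-density time series for one non-speaker can be sketched by counting detected nods in consecutive bins across the speaker's speech section. Using non-overlapping bins of `bin_s` seconds is an assumption; a sliding window would be an equally valid reading of "the number of nods in a predetermined period of time".

```python
import math
from typing import List, Sequence


def nod_density_series(nod_times: Sequence[float], start: float, end: float,
                       bin_s: float) -> List[int]:
    """Count nods in consecutive bins of bin_s seconds covering the speech section [start, end)."""
    n_bins = max(1, math.ceil((end - start) / bin_s))
    counts = [0] * n_bins
    for t in nod_times:
        if start <= t < end:  # nods outside the speech section are ignored
            counts[int((t - start) // bin_s)] += 1
    return counts
```

For example, nods at 1, 2, 12, 25, and 31 s during a 0 to 30 s speech section with 10 s bins give `[2, 1, 1]`; the nod at 31 s falls outside the section.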

<Hardware Configuration>
The wearable device 10 or the determination device 20 may be implemented in hardware built mainly around a microcomputer that includes a CPU (Central Processing Unit) performing calculation, determination, and control processing; a ROM (Read Only Memory) storing the arithmetic and control programs executed by the CPU; a RAM (Random Access Memory) storing various data; and an interface unit (I/F) that exchanges signals with the outside. The CPU, ROM, RAM, and interface unit are connected to one another via a data bus or the like.

<Operation of Determination Device 20>
Next, the operation of the communication system 1 according to the first embodiment is described, focusing on the operation of the determination device 20. FIG. 2 is a flowchart showing the operation of the determination device 20 according to the first embodiment.

Assume a situation in which three or more users are having a conversation. Each user uses one wearable device 10, so multiple wearable devices 10 are used by multiple users. The users of the wearable devices 10 are, for example, participants in group work.

The storage unit 21 of the determination device 20 stores the information detected by the sensors 11 of the wearable devices 10, such as voice data capturing the utterances of the users of the wearable devices 10 and acceleration data capturing their movements.

First, as shown in FIG. 2, in step S101, the speech determination unit 22 of the determination device 20 determines the speaker among the users of the wearable devices 10 based on the voice data acquired by the wearable devices 10, and further determines the speech section of the speaker. The speech section corresponds to the period during which the speaker is speaking. For this purpose, the speech determination unit 22 acquires the voice data of each user of a wearable device 10 from the storage unit 21.

Next, in step S102, the interest action determination unit 23 determines whether each user of a wearable device 10 is nodding based on the acceleration data acquired by the wearable devices 10. For this purpose, the interest action determination unit 23 acquires the acceleration data of each user of a wearable device 10 from the storage unit 21.

Next, the interest determination unit 24 estimates the similarity of the non-speakers' interest in the speaker based on the information determined by the speech determination unit 22 and the interest action determination unit 23. First, in step S103, the interest determination unit 24 calculates the time transition of each non-speaker's nodding density within the speech section of the speaker determined by the speech determination unit 22. Here, the nodding density is, for example, the number of times the user of a wearable device 10 nods in a predetermined period of time. Next, in step S104, the interest determination unit 24 calculates the correlation of the time transitions of the nodding density between the non-speakers. The interest determination unit 24 can thus estimate the similarity of the non-speakers' interest in the speaker from the magnitude of that correlation.

Next, a specific example of the method, shown in steps S103 and S104, for estimating the similarity of interests among non-speakers with respect to a speaker is described with reference to FIGS. 2 and 3. FIG. 3 is a diagram showing a specific example of this method in the determination device 20 according to the first embodiment.

In the example shown in FIG. 3, users A to D each use a wearable device 10. User A is the speaker, while users B, C, and D are non-speakers. User A speaks from speech start point I to speech end point II, and the time from speech start point I to speech end point II is the speech section.

First, in step S103, the interest determination unit 24 calculates the time transition of the nodding density for each non-speaker (users B to D) within the speech section of the speaker (user A). Here, the nodding density is, for example, the number of times the user of a wearable device 10 nods in a predetermined period of time.

Next, in step S104, the interest determination unit 24 calculates the correlation of the time transitions of the nodding density between the non-speakers, and estimates the similarity of the non-speakers' interest in the speaker from the magnitude of that correlation. For example, if the interest determination unit 24 determines that the time transitions of user B and user C are highly correlated, it estimates that user B and user C have similar interests in user A. Conversely, if it determines that the time transitions of user B and user D have a low correlation, it estimates that user B and user D have different interests in user A.
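Step S104 and the FIG. 3 example can be sketched as a pairwise correlation over the non-speakers' nodding-density series. Choosing the Pearson coefficient, a "high" threshold of 0.7, and treating a flat series as zero correlation are all assumptions; the text says only "correlation" and gives no threshold.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat series carries no co-variation information
    return sxy / sqrt(sxx * syy)


def interest_similarity(series: Dict[str, Sequence[float]], high: float = 0.7
                        ) -> Dict[Tuple[str, str], Tuple[float, str]]:
    """Correlate every pair of non-speakers and label the pair similar or different."""
    result = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        result[(a, b)] = (r, "similar" if r >= high else "different")
    return result
```

With toy density series `{"B": [0, 2, 3, 1], "C": [0, 2, 3, 1], "D": [3, 1, 0, 2]}` (illustrative values, not data from FIG. 3), the pair B and C is labelled similar (r = 1.0) and the pair B and D different (r = -1.0), matching the qualitative outcome described above.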

The communication system 1 according to the first embodiment estimates the similarity of the non-speakers' interest in a speaker from information with a low processing cost, such as voice data and acceleration data, rather than from image information, which is voluminous and costly to process. From this similarity of interests, the degree to which the non-speakers participate in the communication centered on the speaker can be detected. For example, the more non-speakers share similar interests in the speaker, the more strongly it can be inferred that the non-speakers are participating in the communication centered on the speaker. It is therefore possible to provide a communication system that can detect the degree of a user's participation in communication with the people around them while reducing information processing costs and suppressing a decrease in detection accuracy.
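The text states only that more non-speakers with similar interests indicates greater participation; it does not define a numeric index. As a hypothetical illustration, participation could be summarized as the fraction of non-speakers whose nodding-density series correlates highly with at least one other non-speaker. The function name and the definition are assumptions.

```python
from typing import Iterable, Tuple


def participation_ratio(non_speakers: Iterable[str],
                        similar_pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of non-speakers that belong to at least one 'similar' pair
    (hypothetical participation index)."""
    members = set(non_speakers)
    linked = set()
    for a, b in similar_pairs:
        if a in members and b in members:
            linked.update((a, b))
    return len(linked) / len(members) if members else 0.0
```

In the FIG. 3 scenario, where only users B and C are judged similar, this index would be 2/3 for the three non-speakers.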

Furthermore, when the degree of a user's participation in communication with the people around them is detected from image information, as in Patent Document 1, the user's actions must be captured by a camera or the like. This gives rise to the following additional problems (1) to (3): (1) the degree of participation cannot be determined when the user is outside the imaging range or is facing away from the camera; (2) users feel psychological resistance to being filmed; and (3) cameras must be installed, which adds cost. The communication system 1 according to the first embodiment estimates the similarity of the non-speakers' interest in a speaker without using image information. It can therefore solve problems (1) to (3) above and detect the degree of a user's participation in communication with the people around them while suppressing a decrease in detection accuracy.

Patent Document 1 addresses a situation in which an information provider, face to face with an information acquirer, provides information about an object of explanation that the information acquirer can access directly, and it provides an interest tendency information output device that acquires and outputs interest tendency information, i.e., information indicating the interest tendency of the information acquirer. Specifically, the interest tendency information output device of Patent Document 1 includes a motion information receiving unit that receives motion information indicating the movement of the information acquirer; a voice information receiving unit that receives voice information about the voice uttered by the information acquirer; an interest tendency information generating unit that generates the interest tendency information from the motion information and the voice information; and an interest tendency information output unit that outputs the interest tendency information. With this configuration, the device of Patent Document 1 seeks to meet the need, in communication settings such as group work, to identify people with similar interests in order to understand their relationships more deeply.

However, the interest tendency information output device of Patent Document 1 estimates interests by capturing the content of the speaker's speech and associating listeners' reactions with that content. It must therefore handle large amounts of information, such as speech content and video, so interests cannot be determined easily with a simple system. In addition, speakers feel psychological resistance to having the content of their speech captured.

In contrast, the communication system 1 according to the first embodiment does not need to capture the content of the speaker's speech and estimate interests by associating reactions with it, and can therefore reduce information processing costs. It can also reduce the speaker's psychological resistance to having the content of their speech captured.

The present invention is not limited to the above embodiment and can be modified as appropriate without departing from its spirit.

Each component in the above-described embodiment may be implemented in hardware, software, or both, and may consist of a single piece of hardware or software or of multiple pieces of hardware or software. The functions (processing) of each device may be implemented by a computer having a CPU, memory, and the like. For example, a program for performing the method of the embodiment may be stored in a storage device, and each function may be implemented by the CPU executing that program.

These programs can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROMs (compact disc read-only memories), CD-Rs, CD-R/Ws, and semiconductor memories (e.g., mask ROMs, PROMs (programmable ROMs), EPROMs (erasable PROMs), flash ROMs, and RAMs (random access memories)). The programs may also be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer readable media can supply the programs to a computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.

1 Communication system
10 Wearable device
11 Sensor
20 Determination device (communication determination device)
21 Storage unit
22 Speech determination unit
23 Interest action determination unit
24 Interest determination unit

Claims

A communication system including a determination device connected to at least three wearable devices used by at least three users, each user having one of the devices, the determination device comprising:
an utterance determination unit that determines a speaker among the at least three users and a speech section of the speaker based on voice data detected by a microphone included in each of the at least three wearable devices;
an interest action determination unit that determines whether each of the at least three users is nodding based on acceleration data detected by an acceleration sensor included in each of the at least three wearable devices; and
an interest determination unit that calculates a time transition of a nod density, which is the number of nods of each of at least two non-speakers in a predetermined time period during the speech section of the speaker, calculates a correlation of the time transition of the nod density between the at least two non-speakers, and estimates, based on the magnitude of the correlation, a similarity of interest of the at least two non-speakers in the speaker as a degree of participation of the at least two non-speakers in communication centered on the speaker,
wherein the utterance determination unit
determines a speaker among the at least three users based on the sound pressure level of the voice data, and determines, as a speech section spoken by the user wearing a target wearable device, namely the one of the at least three wearable devices whose user is to be determined as the speaker, a section other than a first non-speech section, a second non-speech section, and a third non-speech section determined based on voice data detected by the microphone provided in the target wearable device; and
wherein the interest action determination unit
extracts, for each predetermined time interval, a vertical value of acceleration time-series data included in the acceleration data of each of the at least two non-speakers, and determines that a nod of a non-speaker has occurred in the predetermined time interval if a value calculated from the vertical value for the predetermined time interval satisfies a predetermined relationship with a predetermined value.