JP3783504B2 - Dialog recording system

Info

Publication number: JP3783504B2
Application number: JP2000020001A
Authority: JP
Inventors: 直樹林
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2000-01-28
Filing date: 2000-01-28
Publication date: 2006-06-07
Anticipated expiration: 2020-01-28
Also published as: JP2001211440A

Description

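[0001]
[Technical field of the invention]
The present invention relates to a dialogue recording system and a storage medium, and in particular to a technique for recording a dialogue among a plurality of dialogue participants together with images of the place where the dialogue occurs.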
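[0002]
[Prior art]
Recording a dialogue among several people (a conversation or a meeting) and analyzing the record often yields new knowledge. For example, when a dialogue with a customer is examined afterwards (especially when the record is shown to a superior or a top salesperson for their opinion), problems that went unnoticed at the time, or better solutions, can come to light. Similarly, protocol analysis, a method of analyzing people's utterances, is widely known to be effective for investigating matters such as the usability of a system.
[0003]
Known prior art for recording dialogue information that includes images relates to video conference systems. In particular, as a technique for recording video of a speaker, there are systems that identify the speaker from audio information and capture video centered on that speaker with a camera. Related techniques are disclosed in, for example, JP-A-4-122184, JP-A-5-122689, JP-A-6-178295, JP-A-7-92988, and JP-A-9-37224.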
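[0004]
[Problems to be solved by the invention]
However, the prior art has the following two problems.
[0005]
The first is that the sense of being recorded affects human communication. In the prior art, the side taking the record (the organizer of the dialogue or one of the participants) prepares for recording before the dialogue starts and records every utterance made by the participants during the dialogue. The side being recorded (the remaining participants) may be asked beforehand to consent to the recording. Once they consent, they cannot help being aware during the dialogue that their remarks are being recorded by someone else and that the record will later be used in some way. This is likely to inhibit the participants' remarks or, conversely, to make them excessively aggressive or ingratiating, and free and open communication is impeded. If, on the other hand, no consent is given in advance and a good idea or opinion takes shape during the dialogue, nothing has been recorded, so the record must be reconstructed from the participants' memories.
[0006]
The second concerns the place of the dialogue. The prior art requires a fully equipped recording environment to be set up in advance, but dialogues do not take place only in such environments. For example, a dialogue that arises by chance in an open space, such as a chat while standing in a corridor, may provide important hints or ideas for carrying out work. Likewise, when visiting a customer, a place where apparatus based on the prior art can be installed, such as a meeting space, is not always available.
[0007]
The present invention has been made in view of the above problems. Its object is to provide a dialogue recording system that lets the dialogue participants control, after the dialogue, whether a dialogue record including images is kept, and that can produce a record of the desired quality even in a place without a fully equipped recording environment.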
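[0008]
[Means for solving the problems]
To solve the above problems, a dialogue recording system according to the present invention comprises: utterance information recording means for recording, in chronological order and for each dialogue participant, each piece of utterance information input from voice input means worn by that participant, the utterance time of each piece of utterance information, and the silent time between pieces of utterance information; image information recording means for recording, in chronological order and for each dialogue participant, image information captured by photographing means worn by that participant; dialogue order determining means for determining the dialogue order of the participants from the chronologically ordered utterance information recorded by the utterance information recording means of each participant, the corresponding utterance times, and the silent times, by shifting the utterance time slots of the participants relative to one another and combining them so that no state arises in which one participant is speaking while another participant is speaking; dialogue information generating means for generating dialogue information exchanged among the participants by adopting and combining, for each position in the dialogue order determined by the dialogue order determining means, the utterance information of the participant at that position and, from the image information recorded in the image information recording means of a predetermined one of either that participant or the other participants, the image information captured during the utterance time corresponding to that utterance information; and dialogue information recording means for recording the dialogue information.
[0009]
In this configuration, utterance information and image information on the dialogue situation are recorded in time series for each participant, and the utterance information and image information of the participants are combined in time series, based on the utterance time information generated for the utterance information, to generate the dialogue information. In this way, each participant can control after the dialogue whether a dialogue record including images is kept. Moreover, during the dialogue it suffices to record each participant's own utterances and images, so complex control such as steering a camera by detecting speech, as in conventional conference systems, becomes unnecessary.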
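[0010]
In one aspect of the present invention, the image information recording means is worn on the body of each dialogue participant and records image information representing the surroundings of the corresponding participant, and the dialogue information generating means, in generating the dialogue information, associates with each piece of utterance information the image information recorded by the image information recording means of a participant other than the participant who made that utterance.
[0011]
More preferably, the image information recording means includes a camera worn on the head of the dialogue participant and records image information corresponding to the participant's gaze direction.
[0012]
In this aspect, dialogue information in which a speaker's utterance and an image capturing that speaker can be played back in parallel is created easily. In a dialogue, non-speakers are very likely to be looking at the speaker. Therefore, if cameras are worn on the participants' heads, the image information recording means of the non-speakers are likely to hold images of the speaker. Combining the utterance information with the images recorded by participants other than the speaker thus makes it possible to associate an utterance appropriately with an image of the person who made it.
[0013]
In another aspect of the present invention, the image information recording means is worn on the body of each dialogue participant and records image information of the corresponding participant's own appearance, and the dialogue information generating means, in generating the dialogue information, associates the image information recorded by the image information recording means of the participant who made the utterance concerned.
[0014]
In this aspect too, dialogue information in which a speaker's utterance and an image capturing that speaker can be played back in parallel is created easily.
[0015]
In another aspect of the present invention, the utterance information recording means and the image information recording means are provided in a user individual recording device worn by each dialogue participant and respectively record the information acquired by the voice input means and the photographing means worn by that participant, and the user individual recording device comprises control means with which the user controls whether the information recorded in the utterance information recording means and the image information recording means may be passed to the dialogue order determining means and the dialogue information generating means.
[0016]
In this configuration, each user's utterances and the images of the dialogue situation are recorded temporarily in the user individual recording device. Because whether the recorded images and utterances are compiled into dialogue information can be controlled on the user individual recording device side, each user can decide after the dialogue has ended whether a dialogue record is kept.
[0017]
In a preferred aspect, the user individual recording device has a function of presenting to the user the image information recorded in its image information recording means, accepting designation of images that the user does not want left in the dialogue record, and creating image information from which the designated images have been deleted.
[0018]
According to this aspect, a dialogue record can be created after deleting the images that each user judges to be undesirable.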
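[0019]
[Embodiments of the invention]
Preferred embodiments of the present invention are described in detail below with reference to the drawings.
[0020]
FIG. 1 shows the configuration of a dialogue recording system according to an embodiment of the present invention. As shown in the figure, the dialogue recording system in this embodiment includes a plurality of dialogue recording devices 10A, 10B, and 10C. Each dialogue participant is uniquely assigned one dialogue recording device 10A, 10B, or 10C. The dialogue recording devices 10A, 10B, and 10C have the same configuration.
[0021]
One dialogue recording device 10A consists of two units. One is a glasses-type unit 20A that the participant wears like a pair of glasses; the other is a box-type unit 30A that the participant wears on the body using a holster or belt. The glasses-type unit 20A handles audio and video input and output, and the box-type unit 30A handles information processing such as processing of the input and output audio and video and editing of dialogue information. The other dialogue recording devices 10B and 10C likewise consist of glasses-type units 20B and 20C and box-type units 30B and 30C with the same functions.
[0022]
Each of the dialogue recording devices 10A to 10C can exchange data with the other dialogue recording devices 10A to 10C using a beam communication function (described in detail later). The same communication function can also be used to save data to a storage 40 equipped with a data transmission and reception function.
[0023]
In this embodiment, each of the dialogue recording devices 10A to 10C records the speech of the participant wearing it and the images visible from that participant (or images of that participant). The audio and image information recorded by the devices 10A to 10C in one dialogue session is then gathered into one of the devices 10A to 10C and edited, producing an audio and image dialogue record of the plurality of participants.
[0024]
The functional units making up the dialogue recording devices 10A to 10C are described below, beginning with those of the glasses-type unit 20A.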
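[0025]
[1] Glasses-type unit
(1) Image capturing unit
The image capturing unit 202 captures images of the surroundings of the dialogue participant (user) and is implemented by, for example, a CCD camera. The CCD camera is mounted on the glasses-type unit 20A facing forward and captures the direction in which the participant's face is turned. The image captured by the image capturing unit 202 corresponds to the user's field of view. The output of the image capturing unit 202 is analog RGB data, which is sent to the image data generation unit 316 of the box-type unit 30A.
[0026]
The captured image can be checked by selecting “Camera image confirmation” from the menu of the command input unit 308 of the box-type unit 30A, described later. When this item is selected, the image being captured by the image capturing unit 202 (analog RGB data) is displayed on the information display unit 204 of the glasses-type unit 20A via the UI control unit 306, and the participant can check it.
[0027]
In this embodiment, the image capturing unit 202 is configured to capture the direction in which the participant's face is turned in order to capture the participant's surroundings. Its functionality can be extended further. For example, by giving the image capturing unit 202 a gaze tracking function and an image capture function linked to it (for example, digital zoom and panning by extracting the image from a subset of the CCD pixels), the image of the region the participant is gazing at can be extracted and recorded.
[0028]
(2) Information display unit
The information display unit 204 displays information to the user (dialogue participant). The information to be displayed is supplied by the UI control unit 306 of the box-type unit 30A. In this embodiment, the information display unit 204 takes the form of a projection monitor that projects the RGB data sent from the UI control unit 306 onto the inner side of the glasses lens.
[0029]
(3) Voice input unit
The voice input unit 206 inputs the voice of the user (dialogue participant). In this embodiment it is implemented by a highly directional microphone, which is mounted on the glasses-type unit 20A pointing toward the user's mouth. Using “Microphone volume adjustment” in the operation menu of the command input unit 308 of the box-type unit 30A, the user can adjust the microphone sensitivity so that only the user's own voice is picked up. The output of the voice input unit 206 is analog audio data, which is sent to the voice data generation unit 314 of the box-type unit 30A.
[0030]
(4) Voice output unit
The voice output unit 208 outputs, among other things, the audio of the dialogue data stored in the dialogue data storage unit 330 of the box-type unit 30A. It converts the digital audio data that forms the audio content of the dialogue data into analog audio and plays it through a speaker. The playback volume is adjusted by selecting “Speaker volume adjustment” from the menu of the command input unit 308 and pressing the “+” or “−” button.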
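[0031]
[2] Box-type unit
The functional units of the glasses-type unit have been described above. Next, the functional units of the box-type unit 30A are described.
[0032]
(1) Date and time data generation unit
The date and time data generation unit 302 supplies the current date and time data to the other functional units. The date and time data consists of year, month, day, hour, minute, second, and millisecond, as in 1998.12.6.17.25.13.335. This unit has a built-in clock, which is set using the command input unit 308 described later.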
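As an illustration of this date and time format, the following minimal Python sketch converts between a `datetime` and the dot-separated string. The function names are hypothetical, and the absence of zero padding is an assumption inferred from the single example above (the day is written as 6, not 06).

```python
from datetime import datetime


def format_timestamp(dt):
    """Render year.month.day.hour.minute.second.millisecond without zero padding (assumed)."""
    fields = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second,
              dt.microsecond // 1000)
    return ".".join(str(field) for field in fields)


def parse_timestamp(text):
    """Parse the dot-separated form back into a datetime."""
    year, month, day, hour, minute, second, millis = (int(p) for p in text.split("."))
    return datetime(year, month, day, hour, minute, second, millis * 1000)
```

Parsing and re-formatting the example value from the text should return the same string, which is a quick way to check that the seven fields are read in the stated order.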
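[0033]
(2) Identification number storage unit
The identification number storage unit 304 stores the identification number uniquely assigned to each dialogue recording device 10. The stored identification number is used when the beam communication control unit 312, described later, communicates with another dialogue recording device 10.
[0034]
(3) UI control unit
The UI control unit 306 works with the other functional units and interacts with the user through the command input unit 308, the information display unit 204 of the glasses-type unit 20, and so on.
[0035]
(4) Command input unit
The command input unit 308 lets the user enter commands. Specifically, the following five buttons are provided.
[0036]
・“Menu” button: used to show or hide the operation menu. The operation menu has twelve top-level items: a) Play dialogue record, b) Generate dialogue record, c) Set judgment time, d) Send data, e) Receive data, f) Delete data, g) Comment on data, h) Microphone volume adjustment, i) Speaker volume adjustment, j) Camera image confirmation, k) Time adjustment, and l) Show identification number. Each item has a submenu for setting details.
[0037]
・“+” button: used to change a value in the positive direction, for example to select the next option in a list or to increase a number.
[0038]
・“−” button: used to change a value in the negative direction, for example to select the previous option in a list or to decrease a number.
[0039]
・“○” button: used for confirmations of all kinds, for example to start playback of a dialogue record, to choose the current option, or to answer “yes” to a confirmation message.
[0040]
・“×” button: used for cancellations of all kinds, for example to stop playback of a dialogue record, to return from a submenu to the parent menu, or to answer “no” to a confirmation message.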
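[0041]
(5) Beam emitting and receiving unit
The beam emitting and receiving unit 310 receives or emits a beam under the control of the beam communication control unit 312 (described in detail below). In this embodiment, the range the beam reaches (the distance and relative angle between emitting and receiving units) is kept small to prevent the communication from reaching other people.
[0042]
(6) Beam communication control unit
The beam communication control unit 312 works with the beam communication control unit 312 of another person's dialogue recording device to implement beam communication between dialogue recording devices. The data communicated are the narration data and dialogue data described later.
[0043]
Beam communication is started from the command input unit 308. To exchange data between dialogue recording devices, the menu item “Receive data” is first selected on the receiving dialogue recording device 10 to put it into a reception waiting state; then the menu item “Send data” is selected on the sending dialogue recording device 10, the data to be sent is specified, and the data is sent. To identify the sender, the identification number stored in the identification number storage unit 304 of the sending device 10 is transmitted together with the data. The receiving device 10 stores the received identification number in association with the data. Depending on the type of data received, the data is stored in either the narration data storage unit 324 or the dialogue data storage unit 330, both described later.
[0044]
In this embodiment a beam is used for communication between dialogue recording devices. Because the beam limits the communication range, this method secures a considerable degree of confidentiality without any special encryption of the data exchanged. Instead of a beam, the data may be encrypted and communicated by radio. Radio greatly relaxes the constraints on unit position and orientation compared with a beam. And because the data is encrypted, another person within radio range who receives it cannot decrypt it, which prevents others from viewing the dialogue record.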
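[0045]
(7) Voice data generation unit
The voice data generation unit 314 generates digital audio data from the analog audio data output by the voice input unit 206 of the glasses-type unit 20A. The generated digital audio data is sent to the narration data generation unit 320.
[0046]
(8) Image data generation unit
The image data generation unit 316 generates digital image data from the analog RGB data output by the image capturing unit 202 of the glasses-type unit 20A. In this embodiment, digital image data is generated from the analog image of the image capturing unit 202 at fixed intervals (for example, every 0.2 seconds). The generated image data is sent to the image data buffer unit 318 described later.
[0047]
(9) Image data buffer unit
The image data buffer unit 318 holds the image data generated by the image data generation unit 316 for a fixed period (for example, 60 seconds). When new image data arrives from the image data generation unit 316 and the buffer already holds that period's worth of image data, the new data overwrites the oldest image data.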
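The overwrite-oldest behaviour described for the image data buffer unit is that of a ring buffer. The following is a minimal Python sketch, assuming the example values of one frame every 0.2 seconds and a 60-second window (300 frames). The class and method names are hypothetical, timestamps are integer milliseconds for simplicity, and frames are stood in for by arbitrary objects.

```python
from collections import deque


class ImageBuffer:
    """Fixed-window frame buffer; the oldest frame is dropped when the buffer is full."""

    def __init__(self, window_ms=60_000, interval_ms=200):
        # 60 s of frames at one frame per 0.2 s gives 300 slots.
        self.capacity = window_ms // interval_ms
        self._frames = deque(maxlen=self.capacity)

    def push(self, timestamp_ms, frame):
        # deque(maxlen=...) discards the oldest entry automatically once full.
        self._frames.append((timestamp_ms, frame))

    def frames_between(self, start_ms, end_ms):
        """Return the buffered frames captured in [start_ms, end_ms]."""
        return [frame for t, frame in self._frames if start_ms <= t <= end_ms]

    def __len__(self):
        return len(self._frames)
```

A `deque` with `maxlen` is used here only because it gives the overwrite behaviour for free; the text does not say how the buffer is implemented.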
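[0048]
(10) Narration data generation unit
The narration data generation unit 320 generates narration data. Narration data is an example of the utterance information, image information, and utterance time information of the present invention, and is sequence data made up of tuples of utterance order, time, utterance, image, and simultaneous-utterance flag.
[0049]
FIG. 2 shows an example of narration data. It shows the narration data generated by the individual dialogue recording devices when the dialogue shown in FIG. 3 takes place between two participants (here, participants A and B) in this dialogue recording system. FIG. 2(a) is the narration data of participant A, and FIG. 2(b) is the narration data of participant B.
[0050]
A temporally continuous utterance is converted by the voice input unit 206, and the resulting digital audio data, or a null value indicating silence, is stored in the “Utterance” column of each row in the figure. Whether an utterance is temporally continuous (that is, whether data should be stored in the next row as a separate utterance) is decided by whether the break in the user's voice exceeds a fixed time (for example, 0.2 seconds).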
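The row-splitting rule above can be sketched as follows in Python. This is an illustrative sketch, not the patent's implementation: it assumes that some voice activity detector has already produced voiced spans as `(start_ms, end_ms)` pairs, uses text labels in place of digital audio data, and omits the image column. `Row`, `build_narration`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    order: int
    duration_ms: int
    speech: Optional[str]        # label standing in for audio data; None means silence
    simultaneous: bool = False   # default FALSE, set later by the determination step


def build_narration(voiced, total_ms, gap_ms=200):
    """Turn voiced spans into alternating silence and utterance rows."""
    # Spans separated by a pause of at most gap_ms belong to the same utterance.
    merged = []
    for start, end in voiced:
        if merged and start - merged[-1][1] <= gap_ms:
            merged[-1][1] = end
        else:
            merged.append([start, end])

    rows, cursor = [], 0
    for n, (start, end) in enumerate(merged, 1):
        if start > cursor:  # silence before this utterance
            rows.append(Row(len(rows) + 1, start - cursor, None))
        rows.append(Row(len(rows) + 1, end - start, f"utterance-{n}"))
        cursor = end
    if total_ms > cursor:   # trailing silence
        rows.append(Row(len(rows) + 1, total_ms - cursor, None))
    return rows
```

With a 0.1-second pause inside the first utterance and a 2-second pause before the second, the sketch yields five rows: silence, utterance, silence, utterance, silence.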
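[0051]
The “Time” column of each row holds the time taken by the utterance (or silence) in that row (in this embodiment, in seconds, with precision to 1/100 of a second). The time is measured using the time data generated by the date and time data generation unit 302.
[0052]
The “Order” column of each row holds a number indicating the order of the utterance (or silence) stored in that row.
[0053]
The “Image” column of each row holds the image data stored in the image data buffer unit 318 during the utterance (or silence) stored in that row.
[0054]
The “Simultaneous-utterance flag” column of each row holds a Boolean value (TRUE or FALSE) indicating whether the utterance in that row may have been made at the same time as, and overlapping, another participant's utterance. TRUE indicates that it may have been simultaneous; FALSE indicates otherwise. The default value is FALSE. The value is set by the simultaneous-utterance determination unit 322 described later.
[0055]
When the narration data shown in the figure is generated, a fixed period of silence (for example, 60 seconds) following an utterance is taken as the boundary of one piece of narration data. That is, when silence has continued for the fixed period, the utterances up to that point, the silent portion after the last utterance, and the silent portion immediately before the first utterance are stored as one piece of narration data in the narration data storage unit 324 described later. The simultaneous-utterance determination unit 322 sets the values in the “Simultaneous-utterance flag” column before the narration data is stored in the narration data storage unit 324.
[0056]
As described later, the values in the “Time” column of the narration data are used when the participants' utterances are combined to generate dialogue data, and various modifications are possible. For example, the start time or end time of each utterance may be stored instead.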
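[0057]
(11) Simultaneous-utterance determination unit
The simultaneous-utterance determination unit 322 identifies, among the utterances in the narration data, expressions that tend to be spoken over another participant's utterance in order to interrupt or encourage the dialogue, such as “ee” (yes), “oo” (oh), “ano” (um), and “chotto” (just a moment) (hereinafter called simultaneous-utterance expressions). The result is used by the dialogue order determination unit 326, described later, when it determines how to combine multiple pieces of narration data.
[0058]
In this embodiment, whether an utterance is a simultaneous-utterance expression is judged by its duration, because an utterance made while another participant is speaking is likely to be short. If the duration of an utterance is no longer than the judgment time preset in the simultaneous-utterance determination unit 322, the unit judges the utterance to be a simultaneous-utterance expression and sets the “Simultaneous-utterance flag” column in that utterance's row of the narration data to TRUE.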
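[0059]
The process of setting the judgment time used as the criterion for simultaneous-utterance expressions is started when the user selects “Set judgment time” from the operation menu of the command input unit 308. In this process, the user is asked to speak several simultaneous-utterance expressions that he or she commonly uses; the speech is captured through the voice input unit 206 and its duration is measured, and the minimum of the measured durations plus a margin (for example, 0.1 seconds) is set in the simultaneous-utterance determination unit 322 as the judgment time. Whether to continue or end the process is decided by having the user press the “○” or “×” button of the command input unit after each expression is spoken.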
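The duration-based judgment and the calibration of the judgment time can be sketched together in a few lines of Python. The function names are hypothetical and durations are integer milliseconds; the rule itself (minimum sampled duration plus a margin of, for example, 0.1 seconds, then flagging utterances no longer than that) follows the description above.

```python
def calibrate_judgment_time_ms(sample_durations_ms, margin_ms=100):
    """Judgment time = shortest sampled expression + margin, as described in the text."""
    if not sample_durations_ms:
        raise ValueError("at least one sampled expression is required")
    return min(sample_durations_ms) + margin_ms


def flag_simultaneous(utterance_durations_ms, judgment_time_ms):
    """TRUE for utterances whose duration does not exceed the judgment time."""
    return [duration <= judgment_time_ms for duration in utterance_durations_ms]
```

For instance, if the user's sampled expressions last 350, 420, and 300 ms, the judgment time becomes 400 ms, and utterances of 250 ms and 400 ms are flagged while one of 1800 ms is not.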
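[0060]
Although this embodiment judges simultaneous-utterance expressions by duration, it can be modified to judge them using speech recognition. In that modification, instead of setting a judgment time, the expressions the user speaks as ones he or she commonly uses are stored as digital audio data, and the judgment is made by comparing the acoustic features (such as frequency distribution, volume changes, and duration) of the stored data with those of an utterance.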
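[0061]
(12) Narration data storage unit
The narration data storage unit 324 stores narration data generated by the device's own narration data generation unit 320, or narration data received from another dialogue recording device 10 via the beam communication control unit 312, together with date and time data, a comment, and the identification number associated with that narration data. The narration data stored in the narration data storage unit 324 is used in the dialogue data generation process described later.
[0062]
The date and time data stored with the narration data is the date and time data generated by the date and time data generation unit 302 at the moment the data was stored. The identification number stored with the narration data is the one held in the identification number storage unit 304 if the narration data was generated by the device's own narration data generation unit 320, or the identification number received along with the narration data if it was received from another device. The comment stored with the narration data is one entered by the user from the command input unit 308. To enter a comment, the user, for example, selects “Comment on data” in the operation menu of the command input unit 308 and speaks the comment. The spoken comment is input through the voice input unit 206, converted to digital audio data, and stored as the comment in the narration data storage unit 324.
[0063]
The user can send, receive, and delete the narration data stored in the narration data storage unit 324, using the command input unit 308. During these operations, the date and time data and identification number of each piece of narration data are shown on the information display unit 204 of the glasses-type unit 20, and the corresponding comment is played through the voice output unit. The date and time data, identification number, and comment thus help the user recall the content of the narration data. With their help, and if necessary after playing back the narration data itself to check it, the user identifies the target narration data and then sends, receives, or deletes it.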
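[0064]
(13) Dialogue order determination unit
The dialogue order determination unit 326 determines how the narration data is to be combined when the dialogue data generation unit 328, described later, combines the narration data of the participants stored in the narration data storage unit 324 to generate dialogue data.
[0065]
As the example in FIG. 4 shows, dialogue data combines the utterances and images of all participants' narration data in the order in which the utterances were made. FIG. 5 shows the sequence of audio and images seen and heard when the dialogue data of FIG. 4 is played back.
[0066]
As FIG. 5 shows, in the dialogue data generated in this embodiment, the images recorded on participant B's side are played while the audio recorded on participant A's side is played, and the images recorded on A's side are played while the audio recorded on B's side is played. The image capturing unit 202 captures the user's frontal direction (gaze direction), and during a dialogue each participant is generally very likely to be looking toward the speaker, so the audio of an utterance is played back together with the appearance of the speaker at that time.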
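[0067]
The basic algorithm for determining how narration data is combined rests on the fact that in a dialogue, while one person is speaking, the others are mostly silent; it therefore looks for places where one person's speech portions fit temporally into the other persons' silent portions. For example, if the utterance time slots of participant A are a1, a2, and a3 as shown in FIG. 6(a), those of participant B are b1 and b2 as shown in FIG. 6(b), and those of participant C are c1, c2, and c3 as shown in FIG. 6(c), the relative positions of the participants' utterance time slots are determined as shown in FIG. 6(d).
[0068]
That is, in this order determination the users' utterance time slots are shifted relative to one another so that, as far as possible, no user is speaking while another user is speaking. Specifically, across the narration data of different users, the time values of the tuples whose content is digital audio data are compared with the time values of the tuples whose content is a null value to find a way of combining them.
[0069]
In a dialogue, however, some utterances may overlap those of other participants. In this embodiment, utterances judged by the simultaneous-utterance determination unit 322 to be simultaneous-utterance expressions, that is, utterances whose “Simultaneous-utterance flag” column in the narration data is TRUE, are regarded as silence, which makes a combination easier to find. More specifically, an utterance whose flag is TRUE and the silent portions before and after it are added together and treated as a single silent portion in the combination determination by the basic algorithm. This is an exception to the basic algorithm.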
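The text states only the principle of the order determination, so the following Python sketch is one possible reading for two participants, not the patent's algorithm. It assumes each device's rows are given as `(duration_ms, label_or_None, simultaneous_flag)` tuples on that device's own unsynchronized clock, treats flagged utterances as silence, and searches a grid of candidate clock offsets. The cost function (minimize overlapping speech first, then the overall span of the combined timeline) and all names are assumptions made for illustration.

```python
def speech_spans(rows):
    """Return (start, end, label) for unflagged utterances; flagged ones count as silence."""
    spans, t = [], 0
    for duration, label, flagged in rows:
        if label is not None and not flagged:
            spans.append((t, t + duration, label))
        t += duration
    return spans


def cost(a, b, offset):
    """(total overlapping speech, overall span) when b is shifted by offset."""
    shifted = [(s + offset, e + offset, label) for s, e, label in b]
    overlap = sum(max(0, min(a1, b1) - max(a0, b0))
                  for a0, a1, _ in a for b0, b1, _ in shifted)
    combined = a + shifted
    span = max(e for _, e, _ in combined) - min(s for s, _, _ in combined)
    return overlap, span


def determine_order(rows_a, rows_b, search_ms=5000, step_ms=100):
    """Find the offset of B relative to A and the resulting utterance order."""
    a, b = speech_spans(rows_a), speech_spans(rows_b)
    offsets = range(-search_ms, search_ms + 1, step_ms)
    best = min(offsets, key=lambda off: cost(a, b, off))
    merged = a + [(s + best, e + best, label) for s, e, label in b]
    return best, [label for _, _, label in sorted(merged)]
```

Preferring the most compact timeline among the zero-overlap offsets is what rules out the trivial answer of sliding one recording entirely before or after the other. Extending the search to three or more participants, as in FIG. 6, would need a joint search over several offsets and is not attempted here.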
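[0070]
(14) Dialogue data generation unit
The dialogue data generation unit 328 combines the narration data of the participants stored in the narration data storage unit 324 to generate dialogue data. Its function is described with reference to FIG. 7.
[0071]
First, the user specifies which narration data is to be combined to create the dialogue data (S102). The user selects “Generate dialogue record” from the operation menu of the command input unit 308 and specifies which of the pieces of narration data stored in the narration data storage unit 324 to combine. The specification is notified to the dialogue data generation unit 328.
[0072]
In response, the dialogue data generation unit 328 uses the dialogue order determination unit 326 to determine how the pieces of narration data selected by the user are to be combined and obtains the combination (S104).
[0073]
According to the combination obtained, the dialogue data generation unit 328 combines the narration data and generates new dialogue data. It first reserves an area in the dialogue data storage unit 330 for the new dialogue data (S106) and then stores the generated dialogue data in that area (S108, S110).
[0074]
In the generated dialogue data (see FIG. 4, for example), the “Order”, “Time”, and “Utterance” columns of each row hold the values for each utterance of each user, following the order of the combination described above (S108). Where utterances overlap, the later of the overlapping utterances is placed immediately after the earlier one. In FIG. 4, for example, utterance B2 comes immediately after utterance A2.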
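[0075]
As the value of the “Image” column of the dialogue data (see FIG. 4), the dialogue data generation unit 328 stores image data recorded, during the time of the utterance, by the dialogue recording device of a participant other than the speaker of the utterance in the same row (S110). That is, when the dialogue data generation unit 328 copies the digital audio data from the “Utterance” column of one piece of narration data into the “Utterance” column of a row of the dialogue data, it copies into that row's “Image” column image data from the “Image” column of different narration data (that of another participant). The image data copied is the image that was being recorded while the utterance in that row was being made. Which images were recorded during the utterance can be identified from the combination of narration data and the “Time” values. The images copied are selected to match the utterance time. For example, B1, the “Utterance” value in the second row of the dialogue data in FIG. 4, lies between utterances A1 and A2, so the “Image” column of that row receives images selected to match B1's utterance time from those stored in the “Image” column of the third row of the narration data shown in FIG. 2(a).
[0076]
The example in FIG. 4 involves two participants, so the relatively simple process of fitting one participant's utterances into the other's silent intervals sufficed; with three or more participants, some additional care is needed. In this embodiment, with three or more participants, which of the images captured by the devices of the non-speakers is selected is decided by a probabilistic trial. That is, with three participants A, B, and C, when A is speaking, the images of B's side and of C's side are each given a probability of 1/2, and a random-number trial decides which side's images to use. Various modifications of the selection method for multiple non-speakers are conceivable. For example, although the example above treats everyone with equal probability, the selection probabilities may be varied according to the participants' personal profiles, their positions during the dialogue, and so on. Alternatively, images may be selected by a fixed rule without probabilities (for example, alternating between B's images and C's images in the example above).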
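The two selection policies mentioned for three or more participants, an equal-probability random choice among the non-speakers and a fixed alternating rule, can be sketched in Python as follows. The function names are hypothetical, participants are identified by plain strings, and the sketch only decides whose images to use; looking up the frames for the utterance time is left out.

```python
import random
from itertools import cycle


def choose_random(speaker, participants, rng):
    """Pick one non-speaker with equal probability (1/2 each when there are three people)."""
    others = [p for p in participants if p != speaker]
    return rng.choice(others)


def make_round_robin(participants):
    """Return a chooser that alternates among the non-speakers for each speaker."""
    cycles = {s: cycle([p for p in participants if p != s]) for s in participants}
    return lambda speaker: next(cycles[speaker])
```

Passing an explicit `random.Random` instance keeps the random policy reproducible in tests; weighting by profile or position, which the text also mentions, could be added with `rng.choices(others, weights=...)`.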
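[0077]
(15) Dialogue data storage unit
The dialogue data storage unit 330 stores the dialogue data generated by the dialogue data generation unit 328 together with date and time data, a comment, and an identification number. It likewise stores dialogue data received from another dialogue recording device 10 via the beam communication control unit 312. These functions are similar to those of the narration data storage unit 324 described above.
[0078]
Dialogue data stored in the dialogue data storage unit 330 can be played back using the information display unit 204 and the voice output unit 208 of the glasses-type unit 20. To play dialogue data, the user selects “Play dialogue record” from the operation menu of the command input unit 308. A list of the dialogue data stored in the dialogue data storage unit 330 is then shown on the information display unit together with information such as the date and time of storage. When one item is selected from the list with the “+” or “−” button, the comment of the selected dialogue data is played. Pressing the “○” button then plays back the content of the dialogue data in order. Pressing the “×” button during playback stops it.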
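[0079]
Preferred embodiments of the present invention have been described above. According to the embodiment, each user's utterances and the images corresponding to each user's field of view (the narration data) are stored individually in the dialogue recording device 10 (box-type unit 30) carried by each user; when the narration data is later exchanged among the dialogue recording devices 10, at least one dialogue recording device 10 can automatically combine the narration data to generate one piece of dialogue data that includes images.
[0080]
Thus, in the system of this embodiment, a person's utterances and the images corresponding to that person's field of view can at first be recorded only in that person's own dialogue recording device, and the data is handed to others to generate dialogue data only when that person consents. This enables disclosure control of the record: a dialogue record is generated only if the participants agree after the dialogue, and if they do not agree, no dialogue record exists anywhere.
[0081]
In addition, in the system of this embodiment there is no need whatsoever for the users' dialogue recording devices 10 to exchange data or synchronize with one another during the dialogue. A dialogue record including images of the speakers can therefore be made far more easily than with the prior art, even in places where a network connection is unavailable, where acoustic or radio conditions are poor, or where installing microphones and cameras is physically difficult.
[0082]
Further, the system of this embodiment generates dialogue data containing images as well as speech content. In the dialogue data, each participant's utterance is associated with images captured at that time by another participant's dialogue recording device 10 (image capturing unit 202) so that the two can be played back together. In the embodiment, the image capturing unit 202 captures the user's frontal direction (gaze direction), and during a dialogue each user is generally very likely to be looking toward the speaker. With this way of editing utterances and images, even when several people meet by chance in a place with no recording or filming system and begin a dialogue, the audio of each utterance and the image at that time (which very probably captures the speaker) can be associated with each other and kept in the dialogue data.
[0083]
In this embodiment the dialogue recording device captures things other than the participant who carries it, but it can be modified to capture that participant instead. In that modification, when the dialogue data generation unit 328 edits the dialogue data, it simply selects images from the narration data of the device of the participant who is speaking. In this case each person's appearance is recorded in that person's own dialogue recording device, which is advantageous for privacy protection.
[0084]
Also in this embodiment, regarding the images used when a dialogue record is generated, a participant who consents to generating the dialogue record thereby permits the use of all the images recorded in the dialogue recording device he or she holds. This may be changed so that a participant can selectively permit or refuse the use in dialogue data of each image in the narration data stored in his or her own device.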
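[0085]
One such modification is the following scheme.
[0086]
In this scheme, before a participant sends narration data to another person's dialogue recording device for generating a dialogue record, the participant checks the images in the narration data using the information display unit 204. If the participant finds an image he or she does not want used, designating it with the command input unit 308 deletes it from the narration data to be sent. When this narration data is then sent to another dialogue recording device, the dialogue data edited on that device will not incorporate any image the user did not permit.
[0087]
In this case, on the dialogue recording device that generates the dialogue data, the dialogue data generation unit 328, when selecting the image for a given point in the dialogue data, selects it from those of the specified pieces of narration data in which an image for that point exists. If there is no such narration data, the dialogue data generation unit 328 instead selects an image registered in advance in the dialogue recording device 10 (for example, an image with the caption “No image (SOUND ONLY)” on a blue background) and incorporates it into the dialogue data.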
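The fallback described above can be sketched in Python as a lookup with a default. This is an illustrative sketch under simplifying assumptions: for one utterance, `candidates` maps each participant to the image their device holds for that time, with `None` standing for an image the owner deleted before sending. `PLACEHOLDER`, `pick_image`, and the first-available choice among the non-speakers are hypothetical.

```python
# Stand-in for the pre-registered image (e.g. blue background with a caption).
PLACEHOLDER = "No image (SOUND ONLY)"


def pick_image(speaker, candidates):
    """Return an available non-speaker image, or the placeholder if none exists."""
    for participant, image in candidates.items():
        if participant != speaker and image is not None:
            return image
    return PLACEHOLDER
```

When several non-speakers still have an image for the utterance, the choice among them could follow the random or alternating policy described earlier rather than taking the first one found.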
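[0088]
This scheme lets a participant select and delete images that would be undesirable to leave in the record, for example when the participant was looking at information at hand that others should not see, or was looking away.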
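[0089]
[Effects of the invention]
In the present invention, utterance information and image information are recorded for each dialogue participant, and dialogue information is generated from them based on the utterance time information generated for each piece of utterance information. This lets the participants control after the dialogue whether a dialogue record including images is kept. Also, because the utterance information and image information are recorded in units carried individually by the participants and the dialogue information is generated from those records after the dialogue, a record of the desired quality can be kept even when people meet by chance. In particular, in one aspect of the invention, images captured from participants other than the person speaking are used for the dialogue record, so it is easy to keep images of the speaker in the record even in chance encounters.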
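[Brief description of the drawings]
[FIG. 1] A diagram showing the configuration of a dialogue recording system according to an embodiment of the present invention.
[FIG. 2] A diagram showing an example of narration data.
[FIG. 3] A diagram showing an example of a dialogue between participant A and participant B.
[FIG. 4] A diagram showing an example of dialogue data generated by a dialogue recording device.
[FIG. 5] A diagram schematically showing the dialogue situation corresponding to the dialogue data of FIG. 4.
[FIG. 6] A diagram explaining the procedure for combining multiple pieces of narration data.
[FIG. 7] A flowchart explaining the processing of the dialogue data generation unit.
[Explanation of reference numerals]
10A, 10B, 10C: dialogue recording device; 20A, 20B, 20C: glasses-type unit; 30A, 30B, 30C: box-type unit; 40: storage with data transmission and reception function; 202: image capturing unit; 204: information display unit; 206: voice input unit; 208: voice output unit; 302: date and time data generation unit; 304: identification number storage unit; 306: UI control unit; 308: command input unit; 310: beam emitting and receiving unit; 312: beam communication control unit; 314: voice data generation unit; 316: image data generation unit; 318: image data buffer unit; 320: narration data generation unit; 322: simultaneous-utterance determination unit; 324: narration data storage unit; 326: dialogue order determination unit; 328: dialogue data generation unit; 330: dialogue data storage unit.
[0001]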
BACKGROUND OF THE INVENTION
The present invention relates to a dialog recording system and a storage medium, and more particularly to a technique for recording a dialog by a plurality of dialog participants including an image of a dialog place.
[0002]
[Prior art]
In many cases, new knowledge can be obtained by recording a dialogue (a conversation or a meeting) among multiple people and analyzing the record. For example, when the content of a dialogue with a customer is reviewed later (especially when the record is shown to a senior manager or a top salesperson for their opinion), problems that were not noticed on the spot, or better solutions, can be identified. Alternatively, it is widely known that protocol analysis, a technique for analyzing a person's utterances, is effective for examining matters such as the usability of a system.
[0003]
As a conventional technique for recording dialogue information including images, a technique related to video conference systems is known. In particular, as a technique for recording video of a speaker, there is a technique that identifies the speaker from voice information and captures video centered on that speaker with a camera. Related techniques are disclosed in, for example, JP-A-4-122184, JP-A-5-122689, JP-A-6-178295, JP-A-7-92988, and JP-A-9-37224.
[0004]
[Problems to be solved by the invention]
However, the prior art has the following two problems.
[0005]
The first is that the feeling of being recorded affects human communication. In the conventional technology, the side that takes the record of a dialogue (the organizer of the dialogue or one of the participants) prepares for recording before the start of the dialogue and records all utterances of the participants during the dialogue. The side being recorded (the remaining participants) may be asked to consent to the recording in advance. Once they consent, they cannot help being aware during the dialogue that their remarks are being recorded by others and that the record will be used later in some form. This is likely to suppress the participants' remarks or, conversely, to lead them to take an excessively aggressive or ingratiating attitude, and free and open communication is hindered. If, on the other hand, no consent is given in advance and a good idea or opinion takes shape during the dialogue, it has not been recorded, so the record has to be reconstructed from the participants' memories.
[0006]
The second relates to the place of dialogue. In the prior art, a complete recording environment needs to be established in advance, but dialogues do not take place only in such environments. For example, a chance conversation in an open space, such as a chat while standing in a corridor, can provide important hints and ideas for carrying out work. Moreover, when visiting a customer, it is not always possible to use a place where an apparatus based on the conventional technology can be installed, such as a meeting space.
[0007]
The present invention has been made in view of the above-mentioned problems. Its object is to provide a dialogue recording system that allows the dialogue participants to control, after the dialogue, whether to keep a dialogue record including images, and that can produce a record of the desired quality even in a place without a complete recording environment.
[0008]
[Means for Solving the Problems]
In order to solve the above-described problems, a dialogue recording system according to the present invention comprises: utterance information recording means for recording, in chronological order and for each dialogue participant, each piece of utterance information input from voice input means worn by that participant, the utterance time of each piece of utterance information, and the silent time between pieces of utterance information; image information recording means for recording, in chronological order and for each dialogue participant, image information captured by photographing means worn by that participant; dialogue order determining means for determining the dialogue order of the participants from the chronologically ordered utterance information recorded by the utterance information recording means of each participant, the corresponding utterance times, and the silent times, by shifting the utterance time slots of the participants relative to one another and combining them so that no state arises in which one participant is speaking while another participant is speaking; dialogue information generating means for generating dialogue information exchanged among the participants by adopting and combining, for each position in the dialogue order determined by the dialogue order determining means, the utterance information of the participant at that position and, from the image information recorded in the image information recording means of a predetermined one of either that participant or the other participants, the image information captured during the utterance time corresponding to that utterance information; and dialogue information recording means for recording the dialogue information.
[0009]
In this configuration, the speech information and the image information related to the conversation status are recorded in time series for each dialogue participant, and based on the utterance time information for generating the speech information, the speech information and images of each dialogue participant are recorded. Dialog information is generated by combining information in time series. In this way, each participant can control whether or not to keep a dialogue record including images after the dialogue. Also, with this configuration, at the time of dialogue, it is only necessary to record each person's remarks and images for each conversation participant, so complicated control such as camera control by detecting remarks as in a conventional conference system is unnecessary. become.
[0010]
Moreover, in one aspect of the present invention, the image information recording means is worn on the body of each dialogue participant and records image information representing the surroundings of the corresponding participant, and the dialogue information generating means, when generating the dialogue information, associates with each piece of utterance information the image information recorded by the image information recording means of a participant other than the participant who made that utterance.
[0011]
More preferably, the image information recording means includes a camera mounted on the head of the dialogue participant, and records image information corresponding to the gaze direction of the dialogue participant.
[0012]
In this aspect, it is possible to easily create dialog information that can simultaneously reproduce a speaker's speech and an image that captures the speaker. That is, in the dialogue, non-speakers are more likely to see the speaker. Therefore, if a camera is mounted on the head of a conversation participant, it is highly possible that an image of the speaker is recorded in the non-speaker image information recording means. Thus, by combining the speech information with the recorded images of the conversation participants other than the speaker, it is possible to appropriately associate the speech with an image that captures the appearance of the speaker of the speech.
[0013]
Further, in another aspect of the present invention, the image information recording means is worn on the body of each dialogue participant and records image information of the corresponding participant's own appearance, and the dialogue information generating means, when generating the dialogue information, associates the image information recorded by the image information recording means of the participant who made the utterance concerned.
[0014]
Also in this aspect, it is possible to easily create dialog information that can simultaneously reproduce a speaker's speech and an image that captures the speaker.
[0015]
In another aspect of the present invention, the utterance information recording means and the image information recording means are provided in a user individual recording device worn by each dialogue participant and respectively record the information acquired by the voice input means and the photographing means worn by that participant, and the user individual recording device comprises control means with which the user controls whether the information recorded in the utterance information recording means and the image information recording means may be passed to the dialogue order determining means and the dialogue information generating means.
[0016]
In this configuration, each user's utterances and the images of the dialogue situation are recorded temporarily in the user individual recording device. Because whether the recorded images and utterances are compiled into dialogue information can be controlled on the user individual recording device side, each user can decide whether to keep a dialogue record after the dialogue has finished.
[0017]
In a preferred aspect, the user individual recording device has a function of presenting to the user the image information recorded in its image information recording means, accepting designation of images that the user does not want left in the dialogue record, and creating image information from which the designated images have been deleted.
[0018]
According to this aspect, a dialogue record can be created after deleting the images that each user has judged to be undesirable.
[0019]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
[0020]
FIG. 1 is a diagram showing a configuration of a dialog recording system according to an embodiment of the present invention. As shown in the figure, in the present embodiment, the dialog recording system includes a plurality of dialog recording devices 10A, 10B, and 10C. One dialogue recording device 10A, 10B or 10C is uniquely given to one dialogue participant. The dialog recording devices 10A, 10B, and 10C have the same configuration.
[0021]
One dialog recording apparatus 10A is composed of two units. One is a glasses-type unit 20A that a dialog participant wears like glasses, and the other is a box-type unit 30A that the dialog participant wears on the body using a holster or a belt. The glasses-type unit 20A is in charge of input / output of audio and video, and the box-type unit 30A is in charge of various information processing such as processing relating to audio and video to be input / output and editing of dialogue information. The other dialog recording devices 10B and 10C are also composed of glasses-type units 20B and 20C and box-type units 30B and 30C having the same function.
[0022]
Each of the dialog recording devices 10A to 10C can transmit / receive data to / from the other dialog recording devices 10A to 10C using a beam communication function (details will be described later). In addition, data can be stored in the storage 40 with data transmission / reception function by using the communication function.
[0023]
In the present embodiment, each of the dialogue recording devices 10A to 10C records the speech of the dialogue participant wearing the device, the images seen from that participant (or images of that participant), and the like. The voice and image information recorded by the devices 10A to 10C in one dialogue session is then aggregated in one of the devices 10A to 10C and edited, thereby generating a dialogue record, with audio and images, of the plurality of dialogue participants.
[0024]
The functional units that make up the dialogue recording devices 10A to 10C are described below. First, the functional units of the glasses-type unit 20A will be described.
[0025]
[1] Glasses type unit
(1) Image shooting unit
The image capturing unit 202 has a function of capturing images of the surroundings of a dialogue participant (user), and is realized by, for example, a CCD camera. The CCD camera is attached to the glasses-type unit 20A facing forward and captures the direction in which the participant's face is turned. An image captured by the image capturing unit 202 corresponds to an image of the user's field of view. The output of the image capturing unit 202 is analog RGB data, which is sent to the image data generation unit 316 of the box-type unit 30A.
[0026]
Note that the captured image can be checked by selecting “Camera image confirmation” in the menu of the command input unit 308 of the box-type unit 30A, described later. When this is selected, the image (analog RGB data) being captured by the image capturing unit 202 is displayed on the information display unit 204 of the glasses-type unit 20A via the UI control unit 306, and the dialogue participant can check the image.
[0027]
In this embodiment, the image capturing unit 202 is configured to capture the direction in which the participant's face is turned in order to capture the participant's surroundings. Its function can be further enhanced. For example, by providing the image capturing unit 202 with a gaze tracking function and an image capture function linked to it (for example, digital zoom and panning using a method of extracting an image from a subset of the CCD pixels), an image of the region the participant is gazing at can be extracted and recorded.
[0028]
(2) Information display section
The information display unit 204 has a function for displaying information to the user (dialog participant). The displayed information is given from the UI control unit 306 of the box-type unit 30A. In the present embodiment, the information display unit 204 takes the form of a projection monitor that projects the RGB data sent from the UI control unit 306 onto the inside of the lens of the glasses.
[0029]
(3) Voice input unit
The voice input unit 206 has a function for inputting the voice of the user (dialog participant). In the present embodiment, it is realized by a microphone having strong directivity. This microphone is attached to the glasses-type unit 20A toward the user's mouth. The user can adjust the microphone sensitivity so that only the voice of the user can be input using the microphone volume adjustment in the operation menu of the command input unit 308 of the box-shaped unit 30A. The output of the voice input unit 206 is analog voice data, which is sent to the voice data generation unit 314 of the box-shaped unit 30A.
[0030]
(4) Audio output unit
The audio output unit 208 has a function of outputting the audio of the dialog data stored in the dialog data storage unit 330 of the box-shaped unit 30A. The audio output unit 208 converts the digital audio data that forms the audio content of the dialog data into analog audio and reproduces it through a speaker. The playback volume of the speaker is adjusted by selecting “Speaker volume adjustment” from the menu of the command input unit 308 and pressing the “+” or “−” button.
[0031]
[2] Box unit
The functional units of the glasses-type unit have been described above. Next, the functional units of the box-shaped unit 30A will be described.
[0032]
(1) Date / time data generation unit
The date / time data generation unit 302 has a function of passing the current date and time data to other functional units. The date and time data is composed of year, month, day, hour, minute, second, and millisecond, as in 1998.12.26.17.25.13.335. This functional unit has a built-in clock, and the time is adjusted using the command input unit 308 described later.
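As an illustration of the seven-field date and time data described above, the following is a minimal Python sketch. The function names `format_date_time` and `parse_date_time` are hypothetical, and the choice of unpadded, dot-separated fields is an assumption made for the example rather than something the embodiment specifies.

```python
from datetime import datetime


def format_date_time(dt: datetime) -> str:
    """Join year, month, day, hour, minute, second, millisecond with dots."""
    fields = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second,
              dt.microsecond // 1000)  # microseconds -> milliseconds
    return ".".join(str(f) for f in fields)


def parse_date_time(text: str) -> datetime:
    """Inverse of format_date_time; expects exactly seven dot-separated fields."""
    year, month, day, hour, minute, second, ms = (int(p) for p in text.split("."))
    return datetime(year, month, day, hour, minute, second, ms * 1000)
```

A value produced by `format_date_time` can be passed back to `parse_date_time` to recover the same instant at millisecond resolution.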
[0033]
(2) Identification number storage unit
The identification number storage unit 304 has a function of storing an identification number uniquely given to each dialog recording device 10. The identification number stored in this function unit is used when a beam communication control unit 312 described later communicates with another dialogue recording apparatus 10.
[0034]
(3) UI control unit
The UI control unit 306 has a function of interacting with the user using the command input unit 308, the information display unit 204 of the glasses-type unit 20, and the like in cooperation with other function units.
[0035]
(4) Command input unit
The command input unit 308 has a function for a user to input a command. Specifically, the following five buttons are prepared.
[0036]
- “Menu” button: Used by the user to switch the operation menu between displayed and hidden. The operation menu has twelve top-level items: a) playback of dialog record, b) generation of dialog record, c) setting of judgment time, d) data transmission, e) data reception, f) data deletion, g) data comment, h) microphone volume adjustment, i) speaker volume adjustment, j) camera image confirmation, k) time adjustment, and l) identification number display. Each item has submenus for setting details.
[0037]
- “+” button: Used to change a value in the positive direction, such as selecting the next option in a list or increasing a value.
[0038]
- “−” button: Used to change a value in the negative direction, such as selecting the previous option in a list or lowering a value.
[0039]
- “O” button: Used for various confirmations, such as starting playback of a dialog record, selecting an option, or answering “Yes” to a confirmation message.
[0040]
- “X” button: Used for various cancellations, such as stopping playback of a dialog record, returning from a submenu to the upper menu, or answering “No” to a confirmation message.
[0041]
(5) Beam receiving and emitting unit
The beam receiving / emitting unit 310 has a function of receiving or emitting a beam under the control of the beam communication control unit 312 (described in detail later). In this embodiment, to prevent the communication contents from reaching other people, the reach of the beam (the distance and relative angle between the receiving and emitting units) is kept small.
[0042]
(6) Beam communication control unit
The beam communication control unit 312 has a function of realizing beam communication between dialog recording devices in cooperation with the beam communication control unit 312 of another person's dialog recording device. The data communicated is the narration data and dialog data described later.
[0043]
The start of beam communication is instructed from the command input unit 308. When data is exchanged between dialog recording devices, the menu item “data reception” is first selected on the receiving dialog recording device 10 to put it in a reception-waiting state; the menu item “data transmission” is then selected on the transmitting dialog recording device 10, the data to be transmitted is designated, and the data is transmitted. To identify the sender, the identification number stored in the identification number storage unit 304 of the transmitting dialog recording device 10 is transmitted together with the data. The receiving dialog recording device 10 stores the received identification number and data in association with each other. The data is stored in either the narration data storage unit 324 or the dialog data storage unit 330, both described later, according to the type of the received data.
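The exchange described above (the receiver waits, the sender attaches its identification number, and the receiver files the data by type) can be sketched as follows. This is only an illustrative Python model under stated assumptions: the class names `Packet` and `Recorder`, the string type tags "narration" and "dialog", and the rule that the waiting state ends after one reception are all hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    sender_id: str   # identification number of the transmitting device
    kind: str        # "narration" or "dialog" (hypothetical type tags)
    payload: object  # the narration data or dialog data itself


@dataclass
class Recorder:
    device_id: str
    narration_store: list = field(default_factory=list)  # stands in for unit 324
    dialog_store: list = field(default_factory=list)      # stands in for unit 330
    receiving: bool = False

    def wait_for_data(self):
        """Corresponds to selecting "data reception" on the receiving side."""
        self.receiving = True

    def send(self, receiver, kind, payload):
        """Corresponds to "data transmission"; the sender ID travels with the data."""
        return receiver.receive(Packet(self.device_id, kind, payload))

    def receive(self, packet):
        if not self.receiving:
            return False  # not waiting for reception, so nothing is stored
        store = self.narration_store if packet.kind == "narration" else self.dialog_store
        store.append((packet.sender_id, packet.payload))  # keep ID and data together
        self.receiving = False  # assumption: one reception per wait
        return True
```

For example, after `b.wait_for_data()`, a call to `a.send(b, "narration", data)` leaves the pair of A's identification number and the data in B's narration store.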
[0044]
In this embodiment, a beam is used for communication between dialog recording devices. Because the beam limits the communication range, a considerable degree of communication secrecy can be secured without special encryption of the transmitted and received data. Instead of a beam, the data may be encrypted and communicated by radio waves. Radio waves greatly relax the restrictions on the position and orientation of the units compared with a beam. Further, because the data is encrypted, another person within radio range who receives it cannot decrypt it, so others can be prevented from seeing the conversation record.
[0045]
(7) Audio data generator
The audio data generation unit 314 has a function of generating digital audio data from analog audio data that is an output of the audio input unit 206 of the glasses-type unit 20A. The generated digital audio data is sent to the narration data generation unit 320.
[0046]
(8) Image data generation unit
The image data generation unit 316 has a function of generating digital image data from analog RGB data that is an output of the image photographing unit 202 of the glasses-type unit 20A. In the present embodiment, digital image data is generated from the analog image of the image capturing unit 202 at regular intervals (for example, 0.2 seconds). The generated image data is sent to an image data buffer unit 318 described later.
[0047]
(9) Image data buffer unit
The image data buffer unit 318 has a function of storing the image data generated by the image data generation unit 316 for a predetermined time (for example, 60 seconds). When new image data is sent from the image data generation unit 316 and image data for the predetermined time is already stored, the oldest image data is overwritten with the new data.
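The overwrite-the-oldest behavior described above is that of a ring buffer. The following Python sketch shows one way to model it, using the example values from the text (one frame every 0.2 seconds, 60 seconds retained). The class name `ImageDataBuffer` and its methods are hypothetical, and frames are represented by arbitrary Python objects.

```python
from collections import deque

FRAME_INTERVAL_S = 0.2  # example sampling interval from the text
BUFFER_SPAN_S = 60      # example retention time from the text


class ImageDataBuffer:
    def __init__(self, span_s=BUFFER_SPAN_S, interval_s=FRAME_INTERVAL_S):
        # Number of frames that fit in the retention time (300 for the example values).
        self.capacity = round(span_s / interval_s)
        # A deque with maxlen drops the oldest entry when a new one is appended.
        self._frames = deque(maxlen=self.capacity)

    def push(self, timestamp, frame):
        """Store a new frame, overwriting the oldest one when the buffer is full."""
        self._frames.append((timestamp, frame))

    def frames_between(self, start, end):
        """Return the buffered frames whose timestamps lie in [start, end]."""
        return [frame for t, frame in self._frames if start <= t <= end]
```

With the example values, the buffer never holds more than 300 frames, which is the number produced in 60 seconds at 0.2-second intervals.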
[0048]
(10) Narration data generator
The narration data generation unit 320 has a function of generating narration data. Narration data is an example of the utterance information, image information, and utterance time information according to the present invention, and is an ordered sequence of entries, each consisting of an utterance order, a time, an utterance, images, and a simultaneous speech flag.
[0049]
An example of narration data is shown in FIG. 2. This figure shows the narration data generated by the individual dialog recording devices when the dialog shown in FIG. 3 takes place between two dialog participants (here, dialog participants A and B) in this dialog recording system. FIG. 2A shows the narration data of dialog participant A, and FIG. 2B shows the narration data of dialog participant B.
[0050]
Temporally continuous speech is converted via the voice input unit 206, and the resulting digital voice data, or a null value indicating an unvoiced period, is stored in the “speech” column of each row in FIG. 2. Whether utterances are continuous in time (that is, whether data should be stored in the next row as a separate utterance) is judged by whether the user's voice has been interrupted for a fixed time (for example, 0.2 seconds).
[0051]
In the “time” column of each row, the duration of the utterance (or unvoiced period) in the same row is stored (in this embodiment, in seconds with a resolution of 1/100 second). The time is measured using the time data generated by the date / time data generation unit 302.
[0052]
In the “order” column of each row, a numerical value indicating the order of utterances (or unvoiced) stored in the row is stored.
[0053]
In the “image” column of each row, the image data stored in the image data buffer unit 318 during the speech (or unvoiced) stored in the row is stored.
[0054]
In addition, the “simultaneous speech flag” column of each row stores a Boolean value (TRUE or FALSE) indicating whether the utterance stored in the row may have been made at the same time as an utterance of another dialog participant. TRUE indicates that it may have been made at the same time, and FALSE indicates otherwise. The default value is FALSE. This value is set by the simultaneous speech determination unit 322 described later.
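The row layout described above (order, time, speech, image, simultaneous speech flag) and the 0.2-second rule for separating utterances can be sketched in Python as follows. This is a minimal sketch under assumptions: the names `NarrationRow` and `build_rows` are hypothetical, voiced intervals are assumed to come from some voice detection step that the text does not detail, and a text label stands in for the digital voice data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PAUSE_THRESHOLD_S = 0.2  # example interruption time from the text


@dataclass
class NarrationRow:
    order: int                      # position of the row in the narration data
    time: float                     # duration in seconds
    speech: Optional[str]           # stand-in for voice data; None means unvoiced
    images: List[str] = field(default_factory=list)
    simultaneous: bool = False      # default FALSE, set later by unit 322


def build_rows(voiced, total_s, pause_s=PAUSE_THRESHOLD_S):
    """Turn voiced (start, end) intervals into alternating unvoiced/speech rows."""
    merged = []
    for start, end in voiced:
        # An interruption shorter than pause_s does not start a new utterance.
        if merged and start - merged[-1][1] < pause_s:
            merged[-1][1] = end
        else:
            merged.append([start, end])
    rows, cursor = [], 0.0
    for i, (start, end) in enumerate(merged):
        if start > cursor:  # unvoiced period before this utterance
            rows.append(NarrationRow(len(rows) + 1, round(start - cursor, 2), None))
        rows.append(NarrationRow(len(rows) + 1, round(end - start, 2), f"utterance-{i + 1}"))
        cursor = end
    if total_s > cursor:  # trailing unvoiced period
        rows.append(NarrationRow(len(rows) + 1, round(total_s - cursor, 2), None))
    return rows
```

In this sketch, two voiced intervals separated by only 0.1 second end up in the same row, while a gap of 2 seconds produces a separate unvoiced row between two speech rows.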
[0055]
When generating the narration data shown in the figure, if there is no voice for a certain time (for example, 60 seconds) from a certain utterance, it is determined as a break of one narration data. That is, when there is no voice for a certain period of time, the voice until then, the voiceless part after the last voice and the voiceless part immediately before the first voice are stored as one narration data in the narration data storage unit 324 described later. At this time, the simultaneous speech determination unit 322 sets the value of the “simultaneous speech flag” field, and then stores the narration data in the narration data storage unit 324.
[0056]
As will be described later, the value in the “time” column recorded in the narration data is used when generating dialogue data by combining the comments of each dialogue participant, and various modifications can be made. For example, the speech start time or the speech end time can be stored for each speech.
[0057]
(11) Simultaneous speech determination unit
The simultaneous speech determination unit 322 has a function of determining, among the utterances in the narration data, expressions that tend to be spoken over another dialog participant's utterance, such as “Yes”, “Oh”, “That”, and “A little” (hereinafter referred to as simultaneous speech expressions). The determination result is used when the dialog order determination unit 326 described later determines how to combine a plurality of narration data.
[0058]
In the present embodiment, whether an utterance is a simultaneous speech expression is determined from the duration of the utterance. This is because an utterance made while another dialog participant is speaking is likely to be short. If the duration of an utterance is equal to or less than the determination time preset in the simultaneous speech determination unit 322, the unit determines that the utterance is a simultaneous speech expression and sets the value of the “simultaneous speech flag” column accompanying the utterance in the narration data to TRUE.
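The duration-based determination described above amounts to a threshold test on each voiced row. A minimal Python sketch follows; the function name `flag_simultaneous` and the dictionary keys are hypothetical, and unvoiced rows are assumed to keep the default value FALSE.

```python
def flag_simultaneous(rows, judgment_time_s):
    """Set the simultaneous speech flag on rows whose utterance is short enough.

    rows: list of dicts with "time" (seconds) and "speech" (None when unvoiced).
    """
    for row in rows:
        voiced = row["speech"] is not None
        # TRUE only for utterances no longer than the determination time.
        row["simultaneous"] = voiced and row["time"] <= judgment_time_s
    return rows
```

For instance, with a determination time of 0.5 seconds, a 0.35-second "yes" is flagged TRUE while a 4.2-second utterance is not.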
[0059]
The process of setting the determination time, which is the criterion for judging simultaneous speech expressions, is started when the user selects the “determination time setting” item from the operation menu of the command input unit 308. In this process, the user utters several simultaneous speech expressions that he or she commonly uses; each utterance is taken in from the voice input unit 206 and its duration is measured, and the minimum of the measured durations plus a margin (for example, 0.1 second) is set in the simultaneous speech determination unit 322 as the determination time. Whether to continue or end the process is decided by having the user press the “O” or “X” button of the command input unit 308 each time one expression has been uttered.
[0060]
In the present embodiment, simultaneous speech expressions are determined from their duration. However, this can be modified so that they are determined using voice recognition. In this modification, instead of setting a determination time, the expressions that the user frequently utters as simultaneous speech expressions are stored as digital voice data, and an utterance is judged by comparing its voice characteristics (frequency distribution, volume change, duration, etc.) with those of the stored voice.
[0061]
(12) Narration data storage unit
The narration data storage unit 324 has a function of storing the narration data generated by the narration data generation unit 320 of the own device, or narration data received from another dialog recording device 10 via the beam communication control unit 312, together with the date and time data, comment, and identification number associated with that narration data. The narration data stored in the narration data storage unit 324 is used in the dialog data generation process described later.
[0062]
The date / time data stored together with the narration data is date / time data generated by the date / time data generation unit 302 at the time of storing the data. The identification number stored together with the narration data is the identification number stored in the identification number storage unit 304 if the narration data is generated by the narration data generation unit 320 of the own device. If it is received from the device, it is the identification number received at the same time. The comment stored together with the narration data is a comment input from the command input unit 308 by the user. The comment is input by, for example, the user selecting “Comment on data” from the operation menu of the command input unit 308 and speaking the content of the comment. The comment content spoken by the user is input via the voice input unit 206, converted into digital voice data, and stored as a comment in the narration data storage unit 324.
[0063]
The user can transmit, receive, and delete the narration data stored in the narration data storage unit 324. These operations are performed using the command input unit 308. During the operation, the date and time data and identification number corresponding to each narration data are displayed on the information display unit 204 of the glasses-type unit 20, and the comment corresponding to the narration data is reproduced from the audio output unit 208. In this way, the date and time data, the identification number, and the comment help the user recall the contents of the narration data. With the help of this information, and in some cases after reproducing and checking the narration data itself, the user specifies the target narration data and performs operations such as transmission, reception, and deletion.
[0064]
(13) Dialogue order determination unit
The dialog order determination unit 326 has a function of determining how the narration data of the dialog participants stored in the narration data storage unit 324 are to be combined when the dialog data generation unit 328 described later generates dialog data.
[0065]
As shown in FIG. 4, the dialogue data is a combination of narration data utterances and images of all dialogue participants according to the utterance order. FIG. 5 is a diagram showing the order of audio and images that can be seen and heard when the conversation data in FIG. 4 is reproduced.
[0066]
As shown in FIG. 5, in the dialog data generated in the present embodiment, the image recorded on the dialog participant B side is reproduced while the voice recorded on the dialog participant A side is reproduced, and the image recorded on the A side is reproduced while the voice recorded on the B side is reproduced. The image capturing unit 202 captures the front direction (line-of-sight direction) of the user. In general, each dialog participant is highly likely to be looking toward the speaker during a dialog, so the speaker's voice is played back together with the speaker's appearance at that time.
[0067]
The basic algorithm for determining the combination of narration data relies on the fact that, in a dialog, the others are mostly silent while one person is speaking; it therefore looks for positions where one person's utterance parts fit, in time, into another person's silent parts. For example, when the utterance time zones of dialog participant A are a1, a2, and a3 as shown in FIG. 6A, those of dialog participant B are b1 and b2 as shown in FIG. 6B, and those of dialog participant C are c1, c2, and c3 as shown in FIG. 6C, the relative positions of the utterance time zones of the dialog participants are determined as shown in FIG. 6.
[0068]
That is, in this order determination, the utterance time zones of the users are shifted forward and backward relative to one another so that, as far as possible, no user is speaking while another user is speaking. Specifically, between the narration data of different users, the time values of entries whose content is digital audio data are compared with the time values of entries whose content is a null value, and a way of combining them is found.
[0069]
However, some utterances in a dialog overlap with those of other dialog participants. In the present embodiment, an utterance judged by the simultaneous speech determination unit 322 to be a simultaneous speech expression, that is, an utterance whose “simultaneous speech flag” column in the narration data is TRUE, is regarded as silent so that a combination is easier to find. More specifically, the time of an utterance whose “simultaneous speech flag” column is TRUE and the times of the unvoiced parts before and after it are added together, and the result is treated as a single unvoiced part in the combination determination by the basic algorithm. This is an exception to the basic algorithm.
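The combination determination described above can be illustrated with a small Python sketch for two participants. The text does not specify how the search is carried out, so the following is only one possible realization under explicit assumptions: times are integers in hundredths of a second, the relative shift is found by brute-force search, and among shifts with equal overlap the one giving the shortest overall span is preferred (this tie-break is an assumption of the sketch). All function names are hypothetical.

```python
def collapse_simultaneous(rows):
    """Treat flagged utterances as silence and merge adjacent unvoiced parts.

    rows: list of (time, speech, simultaneous); returns list of (time, speech).
    """
    out = []
    for time, speech, simultaneous in rows:
        if simultaneous:
            speech = None  # a simultaneous speech expression counts as unvoiced
        if speech is None and out and out[-1][1] is None:
            out[-1] = (out[-1][0] + time, None)  # add up neighboring unvoiced times
        else:
            out.append((time, speech))
    return out


def voiced_intervals(rows, shift=0):
    """Return (start, end, speech) for each utterance, shifted on a common timeline."""
    intervals, t = [], shift
    for time, speech in rows:
        if speech is not None:
            intervals.append((t, t + time, speech))
        t += time
    return intervals


def cost(a, b):
    """Total overlap between the two speakers, then overall span as tie-break."""
    overlap = sum(max(0, min(ae, be) - max(as_, bs))
                  for as_, ae, _ in a for bs, be, _ in b)
    both = a + b
    span = max(e for _, e, _ in both) - min(s for s, _, _ in both)
    return (overlap, span)


def determine_order(rows_a, rows_b, max_shift=1000, step=10):
    """Search the shift of B relative to A and return (shift, utterance order)."""
    a = voiced_intervals(collapse_simultaneous(rows_a))
    rows_b = collapse_simultaneous(rows_b)
    best = min(range(-max_shift, max_shift + 1, step),
               key=lambda s: cost(a, voiced_intervals(rows_b, s)))
    merged = sorted(a + voiced_intervals(rows_b, best))
    return best, [speech for _, _, speech in merged]


# Example data (hundredths of a second); "uh-huh" is a flagged short utterance.
rows_a = [(50, None, False), (300, "A1", False), (200, None, False),
          (400, "A2", False), (100, None, False)]
rows_b = [(20, None, False), (180, "B1", False), (100, None, False),
          (30, "uh-huh", True), (290, None, False), (80, "B2", False),
          (30, None, False)]
```

With the example data, the flagged utterance and its neighboring unvoiced parts collapse into one unvoiced part of 420, and B1 is placed into the gap between A1 and A2.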
[0070]
(14) Dialog data generator
The dialog data generation unit 328 has a function of generating dialog data by combining the narration data of the dialog participants stored in the narration data storage unit 324. The function of the dialog data generation unit 328 will be described with reference to FIG. 7.
[0071]
First, in creating dialogue data, the user designates which narration data is to be combined (S102). The user selects “Generate Dialog Record” from the operation menu of the command input unit 308 and designates a combination of a plurality of narration data stored in the narration data storage unit 324. This specified content is notified to the dialog data generation unit 328.
[0072]
In response to this notification, the dialog data generation unit 328 uses the dialog order determination unit 326 to determine how to combine the plurality of narration data selected by the user, and obtains a combination of the narration data (S104).
[0073]
In accordance with the combination obtained in this way, the dialogue data generation unit 328 combines the narration data and newly generates dialogue data. At this time, the dialog data generation unit 328 first secures an area for storing new dialog data in the dialog data storage unit 330 (S106), and stores the generated dialog data in the area (S108, S110).
[0074]
Here, in the generated dialog data (see, for example, FIG. 4), the “order”, “time”, and “speech” columns of each row store the values corresponding to each user's utterances, arranged in the order given by the combination (S108). Where utterances overlap, the one made later is placed immediately after the earlier one. For example, in FIG. 4, utterance B2 comes immediately after utterance A2.
[0075]
Further, the dialog data generation unit 328 stores, as the value of the “image” column of each row of the dialog data (see FIG. 4), image data recorded during the utterance time by the dialog recording device of a dialog participant other than the speaker of the utterance in the “speech” column of that row (S110). That is, when the digital voice data stored in the “speech” column of certain narration data is copied as the value of the “speech” column of a row of the dialog data, the dialog data generation unit 328 copies image data from the “image” column of different narration data (that is, narration data of another participant) into the “image” column of that row. The image data to be copied is the image recorded while the utterance in the “speech” column was being made. Which image was recorded during the utterance can be identified from the combination of the narration data and the values in the “time” column. For example, B1, the value of the “speech” column in the second row of the dialog data in FIG. 4, lies between utterances A1 and A2; the “image” column of that row therefore receives an image selected, according to the utterance time of B1, from the images stored in the “image” column of the third row of the narration data of dialog participant A in FIG. 2.
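The image selection described above can be sketched in Python as follows. The sketch assumes that the images within each narration row were sampled at a fixed interval from the start of the row, and that the relative shift of the other participant's narration data on the common timeline is already known from the combination; both the helper names and these assumptions are hypothetical. Times are in hundredths of a second.

```python
def frames_on_timeline(rows, shift=0, interval=20):
    """Place each image of the other participant's narration data on the timeline.

    rows: list of (time, images); images are assumed to be sampled every
    `interval` hundredths of a second from the start of the row.
    """
    frames, t = [], shift
    for time, images in rows:
        frames.extend((t + i * interval, image) for i, image in enumerate(images))
        t += time
    return frames


def images_for_utterance(frames, start, end):
    """Pick the images recorded while the utterance [start, end) was being made."""
    return [image for t, image in frames if start <= t < end]
```

For example, if B's utterance occupies the interval from 40 to 100 on the common timeline, the images that A's device recorded in that interval are the ones copied into the corresponding row of the dialog data.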
[0076]
The example in FIG. 4 involves two dialog participants, so it is enough to insert the other party's utterance into a silent section, which is a relatively simple process; a little more ingenuity is needed when there are three or more dialog participants. In the present embodiment, when there are three or more participants in the dialog, which of the devices other than the speaker's supplies the image is chosen by a probabilistic trial. That is, with three participants A, B, and C, when A is speaking the image on the B side and the image on the C side are each given a probability of 1/2, and a trial using a random number decides which is used. Various modifications are conceivable for selecting an image when there are a plurality of non-speakers. For example, although each person is treated with equal probability in the above example, the selection probability may be changed according to the personal profile of the dialog participant, the positional relationship during the dialog, and the like. Alternatively, the selection may follow a fixed rule without using probability (for example, using the B side image and the C side image alternately in the previous example).
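The selection rules described above (equal-probability trial, weighted trial, and alternation) can be sketched in Python with the standard `random` and `itertools` modules. The function names and the form of the `weights` argument are hypothetical; how weights would be derived from a profile or positional relationship is not specified in the text and is left open here.

```python
import itertools
import random


def pick_image_source(speaker, participants, rng=random, weights=None):
    """Choose which non-speaker's device supplies the image for one utterance."""
    others = [p for p in participants if p != speaker]
    if weights is None:
        return rng.choice(others)  # equal probability for every non-speaker
    # Weighted trial, e.g. weights reflecting profile or position (assumption).
    return rng.choices(others, weights=[weights[p] for p in others], k=1)[0]


def alternating_sources(speaker, participants):
    """Rule-based alternative: cycle through the non-speakers in turn."""
    return itertools.cycle([p for p in participants if p != speaker])
```

With participants A, B, and C and A speaking, `pick_image_source` returns B or C, and `alternating_sources` yields B, C, B, and so on.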
[0077]
(15) Dialog data storage unit
The dialog data storage unit 330 has a function of storing the dialog data generated by the dialog data generation unit 328, date and time data, comments, and identification numbers. The dialogue data storage unit 330 has a function of similarly storing dialogue data received from another dialogue recording device 10 via the beam communication control unit 312. These functions are similar to the functions of the narration data storage unit 324 described above.
[0078]
The dialog data stored in the dialog data storage unit 330 can be reproduced using the information display unit 204 and the audio output unit 208 of the glasses-type unit 20. To reproduce dialog data, the user selects “playback of dialog record” from the operation menu of the command input unit 308. A list of the dialog data stored in the dialog data storage unit 330 is then displayed on the information display unit 204 together with information such as the date and time of storage. When an item in this list is selected by pressing the “+” or “−” button, the comment of the selected dialog data is reproduced. When the “O” button is pressed at this point, the contents of the dialog data are reproduced in order. Pressing the “X” button during playback stops playback.
[0079]
The preferred embodiment of the present invention has been described above. According to this embodiment, each user's utterances and the images corresponding to that user's field of view (narration data) are stored individually in the dialog recording device 10 (box unit 30) carried by that user. If the narration data are later exchanged between dialog recording devices 10, at least one dialog recording device 10 can automatically combine them to generate a single set of dialog data including images.
[0080]
As described above, in the system according to the present embodiment, a person's utterances and the images corresponding to that person's field of view are first recorded only on that person's own dialog recording device, and only when the person agrees is the data passed to others so that dialog data can be generated. Disclosure of the record can therefore be controlled: a dialog record is generated only if the participants agree after the dialog, and no dialog record exists if they do not.
[0081]
Further, in the system of the present embodiment, there is no need to transmit or receive data, or to synchronize, between the users' dialog recording devices 10 during the dialog. Therefore, even in places where a network connection is impossible, places where sound or radio wave conditions are poor, and places where microphones and cameras are physically difficult to install, dialog records that include images of the speakers can be taken far more easily.
[0082]
Further, the system according to the present embodiment can generate dialog data that includes not only the utterance contents but also images. In the dialog data, an utterance of one participant is associated with an image photographed at that time by the dialog recording device 10 (image capturing unit 202) of another participant, and the two can be reproduced simultaneously. In the embodiment, the image capturing unit 202 captures the front direction (line-of-sight direction) of the user, and each user is generally highly likely to be looking toward the speaker during a dialog. With this editing method, even when a conversation arises among several people who happen to meet in a place where no recording or shooting system is installed, the audio of each utterance and the image at that time (one capturing the speaker) can be kept in the dialog data in association with each other.
[0083]
In the present embodiment, the dialog recording device captures images of something other than the dialog participant carrying it, but this can be modified so that it captures that participant himself or herself. In this modification, when the dialog data generation unit 328 edits the dialog data, it selects the image from the narration data of the device of the dialog participant who is speaking. In this case, each participant's appearance is recorded only in his or her own dialog recording device, which is advantageous for privacy protection.
[0084]
Also, in the present embodiment, with respect to the images used when generating a dialog record, a dialog participant's agreement to generate the dialog record means that the participant has permitted the use of all the recorded images held in his or her dialog recording device. This may be modified so that the dialog participant can selectively decide, for each image of the narration data stored in his or her device, whether it may be used in the dialog data.
[0085]
As such a modification, for example, the following method is possible.
[0086]
In this method, a dialog participant confirms an image included in the narration data using the information display unit 204 before transmitting the narration data to another person's dialog recording device for generating a dialog record. Here, if an image that is not desired to be used is found, the image is deleted from the narration data to be transmitted by designating the image using the command input unit 308. Therefore, if this narration data is transmitted to another dialog recording device, the dialog data edited by that device will not include images that the user did not permit.
[0087]
In this case, when selecting the image of the dialog data for a certain time, the dialog data generation unit 328 of the dialog recording device on the generating side selects an image from whichever of the designated narration data contains an image for that time. If there is no such narration data, the dialog data generation unit 328 selects an alternative image registered in advance in the dialog recording device 10 (for example, an image with a caption such as “No image (SOUND ONLY)” on a blue background) and incorporates it into the dialog data.
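The fallback to an alternative image described above can be sketched in a few lines of Python. The function name `select_image` and the use of `None` to mark an image that its owner deleted before transmission are assumptions of this sketch; the placeholder text follows the example caption given in the text.

```python
PLACEHOLDER_IMAGE = "No image (SOUND ONLY)"  # stand-in for the pre-registered image


def select_image(candidates, placeholder=PLACEHOLDER_IMAGE):
    """Return the first available image for a given time, or the placeholder.

    candidates: the image for that time from each designated narration data,
    with None where the owner deleted the image before transmission.
    """
    for image in candidates:
        if image is not None:
            return image
    return placeholder
```

If every designated narration data lacks an image for the time in question, the alternative image is incorporated into the dialog data instead.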
[0088]
With this method, a dialog participant can personally select and delete images that would be inconvenient to have recorded, for example images taken while he or she was looking at information that should not become known to others, or while looking away.
[0089]
[Effect of the invention]
In the present invention, utterance information and image information are recorded for each dialog participant, and dialog information is generated from them on the basis of the utterance time information generated for each piece of utterance information. This allows the dialog participants to control, after the dialog, whether a dialog record including images is kept. Also, because the information is recorded in units carried individually by the dialog participants and the dialog information is generated from those records after the dialog, records of the desired quality can be kept even in cases such as a chance encounter. In particular, in one aspect of the present invention, images taken from participants other than the one speaking are used for the dialog record, which makes it easy to keep images of the speaker.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a dialog recording system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of narration data.
FIG. 3 is a diagram illustrating an example of a dialogue between a dialogue participant A and a dialogue participant B.
FIG. 4 is a diagram illustrating an example of dialogue data generated by the dialogue recording apparatus.
FIG. 5 is a diagram schematically showing a dialog state corresponding to the dialog data of FIG. 4.
FIG. 6 is a diagram illustrating a procedure for combining a plurality of narration data.
FIG. 7 is a flowchart for explaining processing of a dialog data generation unit.
[Explanation of symbols]
10A, 10B, 10C Dialog recording device, 20A, 20B, 20C glasses-type unit, 30A, 30B, 30C box unit, 40 storage with data transmission / reception function, 202 image capturing unit, 204 information display unit, 206 voice input unit, 208 Audio output unit, 302 Date / time data generation unit, 304 Identification number storage unit, 306 UI control unit, 308 Command input unit, 310 Beam receiving / emitting unit, 312 Beam communication control unit, 314 Audio data generation unit, 316 Image data generation unit, 318 Image data buffer unit, 320 Narration data generation unit, 322 Simultaneous speech determination unit, 324 Narration data storage unit, 326 Dialogue order determination unit, 328 Dialogue data generation unit, 330 Dialogue data storage unit

Claims

A dialog recording system comprising:
utterance information recording means for recording, for each dialogue participant and in chronological order, each piece of utterance information input from voice input means worn by that dialogue participant, the utterance time of each piece of utterance information, and the silent time between pieces of utterance information;
image information recording means for recording, for each dialogue participant and in chronological order, image information captured by photographing means worn by that dialogue participant;
dialogue order determination means for determining the dialogue order of the dialogue participants from the chronologically ordered utterance information recorded by the utterance information recording means of each dialogue participant, the utterance time corresponding to each piece of utterance information, and each silent time, by shifting the utterance time periods of the dialogue participants relative to one another and combining them so that no state arises in which one dialogue participant is speaking while another dialogue participant is speaking;
dialogue information generation means for generating dialogue information on the dialogue held between the dialogue participants by adopting and combining, for each turn in the dialogue order determined by the dialogue order determination means, the utterance information of the dialogue participant corresponding to that turn and, from among the image information recorded in the image information recording means of a predetermined one of that dialogue participant or the other dialogue participants, the image information captured during the utterance time corresponding to that utterance information; and
dialogue information recording means for recording the dialogue information.
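The dialogue order determination recited in claim 1 can be illustrated with a minimal Python sketch. It is not the procedure of the embodiment (FIGS. 6 and 7 are not reproduced here); it is a brute-force stand-in that assumes two participants, utterance periods already converted to start and end times on each device's own clock (obtainable by accumulating the recorded utterance times and silent times), and a hypothetical search range and step. The names `Utterance`, `find_offset`, and `dialogue_order` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str
    start: float  # seconds, on the speaker's own device clock (assumed)
    end: float


def overlaps(a, b):
    """True if the two utterance periods share any interval of time."""
    return a.start < b.end and b.start < a.end


def shift(utterances, offset):
    """Move every utterance period by the same offset."""
    return [Utterance(u.speaker, u.start + offset, u.end + offset)
            for u in utterances]


def find_offset(fixed, moving, max_shift=5.0, step=0.5):
    """Return the smallest-magnitude offset for `moving` such that no
    utterance overlaps one in `fixed`, or None if none is found.
    max_shift and step are hypothetical tuning values."""
    n = int(round(max_shift / step))
    candidates = sorted((k * step for k in range(-n, n + 1)), key=abs)
    for offset in candidates:
        shifted = shift(moving, offset)
        if not any(overlaps(a, b) for a in fixed for b in shifted):
            return offset
    return None


def dialogue_order(fixed, moving, **search):
    """Shift `moving` relative to `fixed`, then merge both into one
    chronologically ordered list of turns."""
    offset = find_offset(fixed, moving, **search)
    if offset is None:
        raise ValueError("no non-overlapping alignment found")
    merged = sorted(fixed + shift(moving, offset), key=lambda u: u.start)
    return offset, merged
```

Calling `dialogue_order(a, b)` with participant A's and participant B's utterance lists returns the offset applied to B together with the merged turn sequence; an exhaustive search like this is only practical for short recordings and small clock differences.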

The dialog recording system according to claim 1, wherein
the image information recording means is worn on the body of each dialogue participant and records image information representing the surroundings of that dialogue participant, and
the dialogue information generation means, in generating the dialogue information, associates with each piece of utterance information the image information recorded by the image information recording means of a dialogue participant other than the dialogue participant who made that utterance.

The dialog recording system according to claim 2, wherein
the image information recording means includes a camera mounted on the head of the dialogue participant and records image information according to the gaze direction of that dialogue participant.

The dialog recording system according to claim 1, wherein
the image information recording means is worn on the body of each dialogue participant and records image information of that dialogue participant himself or herself, and
the dialogue information generation means, in generating the dialogue information, associates with each piece of utterance information the image information recorded by the image information recording means of the dialogue participant who made that utterance.
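The two image-association variants (images taken by another participant's device, as in claim 2, or by the speaker's own device, as in claim 4) can be sketched as a single selection function. This is a minimal illustration under assumed names: `Frame`, `frames_for_utterance`, and the `source` parameter are hypothetical, and frame times are assumed to have already been placed on a common dialogue timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    owner: str     # participant whose device captured the frame (assumed field)
    time: float    # capture time on the common dialogue timeline, in seconds
    image_id: str


def frames_for_utterance(frames, speaker, start, end, source="other"):
    """Select the frames captured during one utterance period.

    source="other": frames from devices of participants other than the speaker.
    source="self":  frames from the speaker's own device.
    """
    if source not in ("other", "self"):
        raise ValueError("source must be 'other' or 'self'")
    want_self = source == "self"
    return [f for f in frames
            if start <= f.time <= end and (f.owner == speaker) == want_self]
```

With `source="other"`, a head-mounted camera on the listener would typically be facing the speaker, so the selected frames tend to show the person who is talking.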

The dialog recording system according to any one of claims 1 to 4, wherein
the utterance information recording means and the image information recording means are provided in a user individual recording device worn by each dialogue participant, and respectively record the information acquired by the voice input means and the photographing means worn by that dialogue participant, and
the user individual recording device comprises control means by which the user controls whether the information recorded in the utterance information recording means and the image information recording means may be passed to the dialogue order determination means and the dialogue information generation means.

6. The dialog recording system according to claim 5, wherein
the user individual recording device has a function of presenting to the user the image information recorded in the image information recording means of that device, accepting designation of an image that the user does not want to leave in the dialogue record, and generating image information from which the designated image has been deleted.
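The user-side control of claims 5 and 6 can be pictured as a gate applied on each participant's device before any data is handed over for dialogue generation. The following is a minimal sketch, assuming a record is a plain dictionary with hypothetical `utterances` and `frames` keys and that each frame carries a hypothetical `image_id`; the actual data layout of the embodiment is not specified here.

```python
def redact(frames, rejected_ids):
    """Return a new frame list without the images the user rejected."""
    rejected = set(rejected_ids)
    return [f for f in frames if f["image_id"] not in rejected]


def release(record, approved, rejected_ids=()):
    """Hand over a copy of the record only if the user approved it.

    Returns None when the user declines, so nothing reaches the
    dialogue order determination or dialogue information generation steps.
    The stored record itself is left unmodified.
    """
    if not approved:
        return None
    return {
        "utterances": list(record["utterances"]),
        "frames": redact(record["frames"], rejected_ids),
    }
```

Returning a redacted copy rather than editing the stored record in place is a design choice of this sketch, made so that a declined or partially redacted release does not alter what remains on the user's own device.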