JP3829005B2

JP3829005B2 - Virtual environment presentation device

Info

Publication number: JP3829005B2
Application number: JP4553798A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1998-02-26
Filing date: 1998-02-26
Publication date: 2006-10-04
Anticipated expiration: 2018-02-26
Also published as: JPH11249772A

Description

【０００１】
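【Technical Field of the Invention】
The present invention relates to a virtual environment presentation device with three main applications. The first is learning performances and manual techniques that people mainly perform on other people, such as medical procedures (emergency treatment, surgery, massage, nursing care, etc.) and dance or gymnastics. The second is learning practical skills that must be acquired physically in settings where manuals cannot be brought in, such as educational sites. The third is supporting maintenance work on equipment that requires many drawings.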
【０００２】
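【Description of the Related Art】
Learning the performances, manual techniques, or practical skills (hereinafter simply "performances") that people mainly perform on other people, such as medical procedures, leaves no opportunity to consult a manual in the actual field. One-on-one instruction by an instructor is therefore the most efficient approach and is indispensable.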
【０００３】
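Nevertheless, not everyone can always receive one-on-one instruction. Besides such instruction, printed materials, videos, and the like are therefore still used as instruction books and manuals. It has also become possible to learn using CD-ROMs and, more recently, the Internet WWW (World Wide Web).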
【０００４】
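【Problems to Be Solved by the Invention】
However, it is difficult to bring these manuals into the actual field. Even if they could be brought in, it would be difficult to consult them while learning a performance. Each medium also has its own problems, as follows.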
【０００５】
(1) Printed materials: portable, but they cannot be consulted, nor their pages turned, during a performance.
【０００６】
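(2) Video: requires no operation and lets the user perform while watching from a distance. However, it does not allow rewinding, speed changes, or angle changes. It therefore cannot meet a beginner's need to slow the tempo, or the need to change the angle to see parts hidden in a blind spot. Furthermore, when the model performer faces the opposite direction from the user, the user must constantly mirror the technique mentally, which is difficult for beginners.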
【０００７】
(3) CD-ROM: solves the problems of video, but requires mouse operation.
【０００８】
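(4) WWW: can provide a wider range of content than CD-ROM. However, it likewise requires mouse operation, and it is even less portable because it needs a line connection and the like.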
【０００９】
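The present invention was made to solve the above conventional problems. Its object is to provide a so-called walking-dictionary environment, in which a user can receive support from an online manual while continuing a performance or task in the actual field.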
【００１０】
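【Means for Solving the Problems】
The present invention solves the above problem with a virtual environment presentation device comprising two parts:
- An information presentation device, configured so that presentation of multimedia information can be controlled by image recognition means and voice recognition means.
- A user interface device that can be worn by the user. It has at least video input means, video output means, audio input means, and audio output means. It enables hands-free remote interactive operation of the information presentation device and see-through viewing of the information that device provides.

The video input means inputs the user's field-of-view image. From this image, the image recognition means of the information presentation device recognizes the user's line of sight. Model videos or live-action videos are prepared in advance for a plurality of directions. The video whose angle is closest to the recognized line of sight is provided to the video output means, so that the angle of the video is controlled to approach the line of sight.

The audio input means inputs the user's voice. From this voice, the voice recognition means of the information presentation device recognizes the character string uttered by the user. The content of the multimedia information provided by the information presentation device is switched based on that character string.

Further, the video input by the video input means and the video provided by the information presentation device are combined by image processing, and the composite image is provided to the video output means.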
【００１１】
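That is, in the present invention, the video output means of the user interface device presents manual information that supports a performance or the like to the user as video. The user can switch that video to the desired information content without using the hands, i.e., hands-free, for example by voice or line of sight through the audio input means, video input means, and the like. The user can therefore receive the desired support information while continuing the performance.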
【００１２】
【Embodiments of the Invention】
Embodiments of the present invention will now be described in detail with reference to the drawings.
【００１３】
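FIG. 1 is a block diagram conceptually showing an emergency medical support system (virtual environment presentation device) according to an embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of the system more concretely. FIG. 3 is a front view of the headset (user interface device) of the support system of this embodiment, shown as worn by a user, and FIG. 4 is a plan view of the headset.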
【００１４】
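The emergency medical support system of this embodiment is a home-use system. It realizes a virtual environment in which a user can learn a performance hands-free while watching, from a distance, a model video provided on an electronic medium as a manual.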
【００１５】
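As shown in FIGS. 1 and 2, the support system of this embodiment comprises two parts. The first is a personal computer main body A (information presentation device, client terminal), configured to present and control multimedia information including manual videos for emergency medical support. The second is a headset 10 (user interface device) connected wirelessly to the main body A.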
【００１６】
In this embodiment, the multimedia information is provided to the computer main body A via the Internet from a file server (multimedia server) C of a WWW server B.
【００１７】
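In FIG. 1, HTML (HyperText Markup Language) is the language used to create Internet home pages. Java is a programming language developed by Sun Microsystems; programs written in it (applets) are executed when an Internet home page is displayed. VRML (Virtual Reality Modeling Language) is a language for describing three-dimensional graphics on the Internet. MIDI (Musical Instrument Digital Interface) is a standard for exchanging music data between computers.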
【００１８】
The WWW is currently the most promising medium. It can provide rich content online and can, technically, offer functions such as rewinding, speed change, and angle change.
【００１９】
In this embodiment, the user interface realizes the following five functions so that the user can consult an electronic manual while learning a performance.
【００２０】
(1) The computer can be operated without using either hand (hands-free operation).
【００２１】
(2) The user can move as much as the performance requires, without hindrance (portability and light weight).
【００２２】
(3) The model performance (virtual video) and the real scene can be viewed simultaneously (see-through).
【００２３】
(4) The angle of the model performance is as close as possible to the user's viewpoint (at the very least, not reversed).
【００２４】
(5) The speed of the model performance can be varied according to the user's skill, including during the performance.
【００２５】
In this embodiment, these functions are realized by the following system configuration.
【００２６】
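As shown in FIGS. 3 and 4, the headset 10 integrates the following components:
- liquid crystal goggles 12;
- a small video camera 14;
- headphones 16 consisting of left and right ear pads;
- a hands-free microphone 18, movable in the direction of the arrow;
- a penlight 20 fixed on top of the liquid crystal goggles 12;
- an adjustable headband 10A.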
【００２７】
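The liquid crystal goggles 12 are of the see-through type, with an optical see-through function using a translucent liquid crystal display. The goggles 12 and the headphones 16 connect to a receiver 22 (video receiving means, audio receiving means) fastened with a belt at the user's waist. The goggles use a goggle video cable 12C and the headphones a headphone audio cable 16C. The receiver 22 contains a battery, which powers the video and audio receiving means. The same battery also supplies the power the goggles 12 need through the goggle video cable 12C.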
【００２８】
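Likewise, the small video camera 14 and the microphone 18 connect to a transmitter 24 (video transmitting means, audio transmitting means). The camera uses a video camera cable 14C and the microphone a microphone audio cable 18C. The transmitter 24 also contains a battery, which powers the video and audio transmitting means. The same battery supplies the power the video camera 14 needs through the video camera cable 14C, and the power the microphone 18 needs through the microphone audio cable 18C.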
【００２９】
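In this embodiment, as shown in FIG. 2, the computer main body A (client terminal) includes:
- a voice recognition module 30;
- a status control unit 32;
- a voice synthesis module (or MIDI sound source) 34;
- a playback speed control unit 36;
- an image processing unit 38, which performs rendering, video decoding and playback, and compositing with live-action background images;
- an image recognition module 40 with a camera parameter extraction function.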
【００３０】
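A user wearing the headset 10 must be able to view a manual provided on an electronic medium while performing. For this purpose, the headset 10 and the computer main body A are connected by a wireless analog link through the receiver 22 and the transmitter 24.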
【００３１】
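In the downlink, the image processing unit 38 transmits a video signal by infrared, and the liquid crystal goggles 12 receive it through the receiver 22. The headphones 16 likewise receive an audio signal from the voice synthesis module 34.

In the uplink, the video camera 14 outputs a video signal to the image processing unit 38 and the image recognition module 40. The transmitter 24 sends this signal by VHF-band radio. The microphone 18 likewise outputs an audio signal to the voice recognition module 30.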
【００３２】
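The headset 10 and the computer main body A thus exchange video and audio wirelessly in both directions. To avoid interference between the two directions, two types of currently available transmission equipment are used together: infrared and VHF band.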
【００３３】
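The computer main body A is connected to the WWW server B over an Internet digital line. The status control unit 32 sends the performance status to the file server C, and the image recognition module 40 sends it viewpoint information. In the other direction, the file server C supplies audio information to the voice synthesis module 34 and video information to the image processing unit 38.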
【００３４】
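In this embodiment, function (1) is realized by operating the computer through voice input and recognition. The steps are as follows:
1. The microphone 18 inputs the user's voice.
2. The voice recognition module 30 of the computer main body A recognizes the character string the user uttered.
3. Based on that string, the status control unit 32 sends a switching signal to the file server C.

This switches the video and audio multimedia content that the file server C outputs to the computer main body A. It is the same content the main body A supplies to the liquid crystal goggles 12 and the headphones 16.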
【００３５】
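This function must also work for HTML anchors (underlined character strings that jump to a predetermined page when clicked). A control program is therefore added to the HTML document that jumps to the destination specified by the character code obtained through voice recognition.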
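In the embodiment, the control program added to the HTML is a Java applet. The following is a minimal Python sketch of the same idea. It builds a table from each anchor's visible text to its link target, then looks up the string returned by speech recognition. It assumes that the user speaks the anchor text exactly. The class and function names are hypothetical; only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects a mapping from visible anchor text to href for every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.anchors = {}
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors["".join(self._text).strip()] = self._href
            self._href = None


def jump_target(html, recognized):
    """Return the href whose anchor text equals the recognized string, or None."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.anchors.get(recognized.strip())
```

For example, `jump_target('<a href="cpr.html">Cardiac massage</a>', "Cardiac massage")` returns `"cpr.html"`. A real system would also need fuzzy matching, because recognizers rarely return anchor text verbatim.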
【００３６】
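FIG. 5 is a block diagram conceptually showing this. The sequence is as follows:
1. When the computer main body A connects to the server over the Internet, an HTML statement for launching the main management program starts that program, which is written as a Java applet.
2. Key input on one of this program's menu buttons, with a mouse or the like, causes an HTML statement to start the multimedia display program, also written as a Java applet.
3. Key input on one of the display program's menu buttons then reads the desired file information from the file server C.

Switching the menus used to exchange information with the WWW server B currently requires a mouse click. This arrangement makes it possible by key input as well.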
【００３７】
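When a voice command is input from the microphone 18 to the voice recognition module 30, the module 30 converts it into the character code for a given menu button. The system then executes the same input processing as if that character code had been keyed in. This realizes the desired anchor function of jumping to another HTML page by voice command.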
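The key point of this paragraph is that a voice command is turned into a key code and then reuses the keyboard path unchanged. The following is a minimal Python sketch under that reading. The word-to-key table and the `Menu` class are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, Optional

# Hypothetical assignment of menu buttons to single key characters.
MENU_KEYS: Dict[str, str] = {"video": "v", "play": "p", "stop": "s", "yes": "y", "no": "n"}


class Menu:
    """Toy menu whose buttons are triggered by key characters; pressed keys are logged."""

    def __init__(self):
        self.log = []
        # One handler per key; the default argument binds each key at creation time.
        self.handlers: Dict[str, Callable[[], None]] = {
            key: (lambda k=key: self.log.append(k)) for key in MENU_KEYS.values()
        }

    def on_key(self, key: str) -> bool:
        """Keyboard entry point: run the button bound to this key, if any."""
        handler = self.handlers.get(key)
        if handler is None:
            return False
        handler()
        return True


def on_voice(menu: Menu, recognized: str) -> bool:
    """Voice entry point: map the recognized word to its key code, then reuse on_key."""
    key: Optional[str] = MENU_KEYS.get(recognized.strip().lower())
    return key is not None and menu.on_key(key)
```

Routing voice through `on_key` means that every menu stays operable by both keyboard and voice, with no second code path to maintain.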
【００３８】
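To realize function (2), the headset 10 can be worn on the user's head, as shown in FIGS. 3 and 4. It carries the liquid crystal goggles 12, the small video camera 14, the headphones 16, and the microphone 18. As described above, the headset 10 is connected wirelessly to the computer main body A, which serves as the WWW terminal. In effect, this connects the headset 10 wirelessly to the WWW server B.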
【００３９】
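To realize function (3), the image processing unit 38 composites two videos: the live-action video input by the video camera 14, and the CG (computer graphics) model video provided by the computer main body A. It supplies the composite image to the liquid crystal goggles 12 with their see-through function closed. Making the background of a CG image a completely uniform color is easy. The live-action video can therefore be embedded into that background by so-called chroma-key compositing without degrading either image, so a clear model video can be provided. Alternatively, the see-through function can be used as is, although the model video then becomes less distinct.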
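Chroma-key compositing, as described here, amounts to a per-pixel rule. Wherever the CG frame shows the uniform background color, take the camera pixel; elsewhere, keep the CG pixel. The following is a minimal pure-Python sketch. It assumes RGB tuples, pure blue as the key color, and a per-channel tolerance; the patent does not specify the key color or any tolerance.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def chroma_key(cg: Image, camera: Image, key: Pixel = (0, 0, 255), tol: int = 0) -> Image:
    """Replace each CG pixel within `tol` of `key` (on every channel) with the
    camera pixel at the same position; all other CG pixels are kept."""
    if len(cg) != len(camera) or any(len(a) != len(b) for a, b in zip(cg, camera)):
        raise ValueError("images must have the same size")
    out = []
    for cg_row, cam_row in zip(cg, camera):
        row = []
        for c, r in zip(cg_row, cam_row):
            is_background = all(abs(ch - k) <= tol for ch, k in zip(c, key))
            row.append(r if is_background else c)
        out.append(row)
    return out
```

Because the CG background is generated rather than filmed, `tol` can stay at or near zero. This is why the paragraph notes that neither image is degraded.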
【００４０】
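To realize function (4), one of the four types of camera work shown in FIG. 6 may, for example, be selected automatically. The selection is based on the silhouette of the subject (for example, the practitioner) in the line-of-sight video captured by the user's video camera 14.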
【００４１】
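Alternatively, the video camera 14 may input the user's field-of-view image, and the image recognition module 40 of the computer main body A may recognize the user's line of sight from that image. The angle of the video supplied to the liquid crystal goggles 12 is then controlled to approach that line of sight.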
【００４２】
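Specifically, only the orientation of the subject's face is recognized from the user's field-of-view image, using the moment method of image processing. The video whose angle is closest to that orientation is then provided. For this purpose, one of two preparations is made. Either the model video is produced in 3D CG and rendered from several angles per scene, or live-action video shot from several directions is prepared.

Accurately recognizing the viewpoint is difficult with current image processing. Rendering 3D CG is time-consuming, but a terminal with sufficient processing power (for example, one equipped with a graphics engine) may render the video in real time, so that the angle can be changed freely. Another method attaches a position sensor to the goggles so that the video follows the viewpoint exactly. That method is unsuitable for this application, however, because it severely restricts the range of movement.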
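The patent names the moment method but gives no formula, so the following Python sketch shows two separate steps under stated assumptions.

The first step computes one standard moment-based quantity: the principal-axis orientation of a binary silhouette, from second-order central moments (θ = ½·atan2(2μ11, μ20 − μ02)). How such an angle maps to a face direction is left as an assumption. Note that the image y-axis points down.

The second step picks, from the pre-rendered or pre-shot view angles, the one with the smallest circular distance to an estimated direction. The view angles themselves are hypothetical.

```python
import math
from typing import List, Sequence


def principal_axis_deg(mask: List[List[int]]) -> float:
    """Principal-axis orientation in degrees, in (-90, 90], of the nonzero pixels
    of a binary mask (x to the right, y downward), from central moments."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty mask")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def nearest_view(direction_deg: float, view_angles: Sequence[float]) -> float:
    """Return the prepared view angle with the smallest circular distance to direction_deg."""
    def circular_distance(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(view_angles, key=lambda v: circular_distance(direction_deg, v))
```

With views prepared at 0°, 90°, 180°, and 270°, an estimated direction of 350° selects the 0° view rather than 270°.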
【００４３】
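In this embodiment, the camera work available through the liquid crystal goggles 12 includes two skeleton display functions, for CG video only. One displays the practitioner as a skeleton, so that movements hidden behind the practitioner's back can be seen through. The other displays the person being treated (the patient) as a skeleton, so that the movements can be seen anatomically. Concrete examples are given later.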
【００４４】
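Similarly, for CG video only, there is a single-part motion display function. When several parts (head, hands, etc.) move in coordination, the video can be played with every part other than a selected part stopped, or with only the selected part stopped.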
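The following minimal Python sketch illustrates the single-part motion display. It assumes that the CG animation is stored as one pose value per frame for each named part, and it uses placeholder numbers where a real system would use joint transforms. A part that is "stopped" holds its pose from a chosen freeze frame. The function name and the mode names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical animation layout: part name -> one pose value per frame.
Tracks = Dict[str, Sequence[float]]


def single_part_playback(tracks: Tracks, part: str, mode: str,
                         freeze_frame: int = 0) -> List[Dict[str, float]]:
    """Per-frame poses in which only `part` moves (mode="only") or only `part`
    is held still (mode="hold"); held parts keep their pose at `freeze_frame`."""
    if mode not in ("only", "hold"):
        raise ValueError("mode must be 'only' or 'hold'")
    n = min(len(seq) for seq in tracks.values())
    if not 0 <= freeze_frame < n:
        raise ValueError("freeze_frame out of range")
    frames = []
    for t in range(n):
        pose = {}
        for name, seq in tracks.items():
            # "only": the selected part moves; "hold": every other part moves.
            moving = (name == part) == (mode == "only")
            pose[name] = seq[t] if moving else seq[freeze_frame]
        frames.append(pose)
    return frames
```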
【００４５】
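There is also a function that expands the time axis of a designated section when the model video presented on the liquid crystal goggles 12 is played back.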
【００４６】
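Moving-picture playback basically represents motion that changes over time as a sequence of still images. The database therefore stores each successive still image as the image data of frames 1 to 7 and so on, as shown in FIG. 7(A).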
【００４７】
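However, simple slow playback, which merely slows the playback speed of the frames in FIG. 7(A), cannot handle some parts of the performance. Two examples are:
- the start or end of a repeated movement, which appears only once in the model performance and is easily overlooked;
- the transition between repeated movements, which an expert blends smoothly into the repetition, so that it is easily overlooked and is sometimes simplified or omitted.

Such video needs ultra-slow-motion playback, showing detailed images between, for example, frames 2 and 3. For this purpose, images of frames 2.1 to 2.7 and so on, which subdivide the interval between those two frames, are prepared separately, as shown in FIG. 7(B).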
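The difference between slow playback and the FIG. 7(B) approach is clear in terms of frame order. Slow playback shows the same frames for longer. Ultra-slow playback splices in separately prepared in-between frames. The following minimal Python sketch builds that order. It assumes that the in-between frames are stored under the index of the frame they follow; the function name and frame labels are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def playback_sequence(frames: Sequence[str],
                      detail: Dict[int, Sequence[str]],
                      expand: Iterable[int] = ()) -> List[str]:
    """Frame order for playback.

    frames: the normal frames (frame 1, 2, 3 ... of FIG. 7(A)), indexed from 0 here.
    detail: separately prepared in-between frames, keyed by the index of the frame
            they follow (e.g. frames 2.1, 2.2 ... after frame 2, as in FIG. 7(B)).
    expand: indices whose in-between frames are shown (ultra-slow motion).
    """
    expand_set = set(expand)
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        if i in expand_set:
            out.extend(detail.get(i, ()))
    return out
```

The playback frame rate is left unchanged. Only the designated section gains extra frames, so the rest of the performance keeps its normal timing.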
【００４８】
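Further, in this embodiment, function (5) is realized as follows. The microphone 18 inputs the user's voice, and the voice recognition module 30 of the computer main body A recognizes several character strings uttered by the user. Based on the time intervals between those strings, the playback speed control unit 36 controls the playback speed of time-series multimedia information, such as support video and audio. The voice synthesis module 34 supplies this information to the headphones 16, and the image processing unit 38 supplies it to the liquid crystal goggles 12. A "character string" here includes a single character such as "a", "i", or "u".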
【００４９】
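That is, the frame rate of moving-picture playback is made variable, and the background music is written in MIDI. Suppose the user says "1, 2, 3" ("ichi, nii, san") into the microphone 18 as a sequence of strings at a chosen pace. The interval between the onsets of "1" and "3" is measured. Playback can then be controlled so that, for example, 30 frames of video and eight quarter notes fill that interval.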
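The example in this paragraph reduces to simple arithmetic. Let T be the time in seconds between the onsets of "1" and "3". Fitting 30 frames into T gives a frame rate of 30/T frames per second. Fitting eight quarter notes into T gives a quarter-note length of T/8 seconds, which is a tempo of 480/T beats per minute.

The following Python sketch computes these values. The third return value, microseconds per quarter note, is the unit that a Standard MIDI File Set Tempo meta-event carries. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def tempo_from_count(onsets_s: Sequence[float], frames_per_span: int = 30,
                     beats_per_span: int = 8) -> Tuple[float, float, int]:
    """From the onset times (s) of the spoken "1", "2", "3", return
    (video frames per second, tempo in BPM, microseconds per quarter note)
    such that the given frames and quarter notes fill the "1"-to-"3" span."""
    if len(onsets_s) < 3:
        raise ValueError('need onsets for "1", "2" and "3"')
    span = onsets_s[2] - onsets_s[0]
    if span <= 0:
        raise ValueError("onsets must increase")
    fps = frames_per_span / span
    quarter_s = span / beats_per_span
    bpm = 60.0 / quarter_s
    return fps, bpm, round(quarter_s * 1_000_000)
```

Counting "1, 2, 3" at one word per second gives T = 2 s. The result is 15 frames per second and 240 BPM (250,000 µs per quarter note). Counting twice as fast doubles both rates.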
【００５０】
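FIG. 8 shows this function conceptually. In the figure, the utterances "1, 2, 3" appear as pulses 1 to 3 of the user input pulse train at the top. The pulse interval of the system clock changes at the timing of pulse 3. Three intervals are controlled in synchronization with this change:
- for moving pictures, the interval at which prepared frame data are overwritten;
- for CG images, the interval at which a frame drawn in the background buffer is swapped to the front;
- for MIDI, the interval at which strong-beat notes are played.

In MIDI playback-interval control, the sound intensity can also be modulated by the fluctuation component of the pulse amplitude.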
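FIG. 8 is described only qualitatively, so the following Python sketch fills in two formulas as explicit assumptions.

The first assumption concerns the clock. At pulse 3, the new tick is the pulse 1 to pulse 3 time divided into a fixed number of ticks.

The second assumption concerns accents. Each strong-beat note's MIDI velocity is a base value, shifted in proportion to the pulse's deviation from the mean amplitude. The result is clamped to 1 to 127, the note-on velocity range that does not mean note-off.

The function names, the base velocity, and the gain are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Pulse = Tuple[float, float]  # (time in seconds, amplitude)


def retime_clock(pulses: Sequence[Pulse], ticks_per_span: int = 8) -> float:
    """New system-clock tick (s), computed at pulse 3: the time from pulse 1
    to pulse 3 divided evenly into `ticks_per_span` ticks."""
    (t1, _), _, (t3, _) = pulses[:3]
    if t3 <= t1:
        raise ValueError("pulse times must increase")
    return (t3 - t1) / ticks_per_span


def accent_velocities(pulses: Sequence[Pulse], base: int = 80,
                      gain: float = 40.0) -> List[int]:
    """MIDI velocities for the strong-beat notes: `base`, shifted by each pulse's
    deviation from the mean amplitude, clamped to 1..127."""
    amps = [a for _, a in pulses]
    m = mean(amps)
    return [max(1, min(127, round(base + gain * (a - m)))) for a in amps]
```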
【００５１】
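An external synchronization signal can also be input to the playback speed control unit 36. For example, a biosignal such as the heartbeat of the user, or of the person facing the user, can be input to the unit. The playback speed of the time-series multimedia information can then be controlled based on that biosignal in the same way.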
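The patent does not say how a biosignal is turned into a playback speed. The following Python sketch makes one assumption explicit: the model video has a nominal beat rate, and playback is scaled so that this beat follows the measured heart rate. The heart rate is estimated from the median interval between detected beats, which tolerates an occasional missed or extra detection. The function name and `nominal_bpm` are hypothetical.

```python
from statistics import median
from typing import Sequence


def speed_from_heartbeat(beat_times_s: Sequence[float], nominal_bpm: float) -> float:
    """Playback-speed factor that makes the video's nominal beat follow the
    measured heartbeat (>1 plays faster, <1 plays slower)."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    if min(intervals) <= 0:
        raise ValueError("beat times must increase")
    measured_bpm = 60.0 / median(intervals)
    return measured_bpm / nominal_bpm
```

The factor returned here could drive the same frame-rate and MIDI-tempo control as the spoken count.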
【００５２】
As shown in FIG. 1, the support system also has a CRT monitor, so that other people can see the video the user is viewing through the goggles.
【００５３】
Next, the use of the emergency medical support system of this embodiment is described.
【００５４】
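As society ages, an increase in lifestyle-related diseases (formerly called adult diseases) can no longer be avoided. Among these, cardiovascular diseases such as angina pectoris and myocardial infarction carry the risk of attacks in which every second counts. In such emergencies, life-saving and first-aid measures before the patient reaches a hospital are extremely important. Where no doctor, imaging equipment, or therapeutic drugs are at hand, appropriate care by the people present decides the patient's fate. The support system of this embodiment is particularly effective in such an environment.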
【００５５】
First, as shown in FIG. 3, the headset 10, which is the user interface device, is put on the user's head.
【００５６】
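Next, the user enters a code into the computer main body A, by key input or by voice command through the microphone 18. This connects the main body A to the WWW server B and brings the system to a state in which video, audio, and other information on the model performance can be received from the file server C.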
【００５７】
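The main management program shown in FIG. 5 can then display the mode menu for emergency medical support on the liquid crystal goggles 12. This menu consists of the still images (1) to (10) that FIG. 9 shows as an outline of the terminal control program, displayed one after another.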
【００５８】
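For convenience, FIG. 10 shows the menu screen as it appears on the CRT monitor. The liquid crystal goggles 12 display the image shown in the window of this screen (here, image (6) of FIG. 9). Instructions and questions such as those on the screen are conveyed to the user through the headphones 16.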
【００５９】
The user acts on the instructions and answers the questions "yes" or "no" through the microphone 18. In this way, the user advances through the menu images shown in FIG. 9.
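This yes/no dialogue behaves like a small decision graph. Each menu screen has a successor for "yes" and a successor for "no". The actual branching between screens (1) to (10) of FIG. 9 is not given in the text, so the graph below is purely hypothetical. This Python sketch shows only the navigation mechanism. An unrecognized word, or a missing branch, leaves the current screen displayed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical menu graph: screen id -> (next screen on "yes", next screen on "no").
# This does not reproduce the screens of FIG. 9.
MENU: Dict[int, Tuple[Optional[int], Optional[int]]] = {
    1: (2, 3),
    2: (4, None),
    3: (4, 1),
    4: (None, None),
}


def answer(screen: int, utterance: str,
           menu: Dict[int, Tuple[Optional[int], Optional[int]]] = MENU) -> int:
    """Next screen for a recognized "yes"/"no"; anything else keeps the current screen."""
    yes_next, no_next = menu[screen]
    word = utterance.strip().lower()
    nxt = yes_next if word == "yes" else no_next if word == "no" else None
    return screen if nxt is None else nxt
```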
【００６０】
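Suppose, for illustration, that the user selects the cardiac massage mode (9) from the menu of FIG. 9. The user then speaks the "video" command shown on the screen of FIG. 10 into the microphone 18. The multimedia display program shown in FIG. 5 starts, and the necessary video and audio information is loaded from the file server C. A model video of cardiac massage is then displayed as a moving picture on the liquid crystal goggles 12.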
【００６１】
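FIG. 11 shows one frame of this model video on a monitor screen similar to that of FIG. 10. In this frame, the person being treated (the patient) is displayed as the skeleton described above. In practice, the screen shows an operation menu as text, and the user chooses an item by speaking its character code. This displays the support video in the desired state. The user then learns cardiac massage while watching the model performance.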
【００６２】
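The operation menu allows speed (Speed), play, and stop to be selected by voice input. The start of the video can also be selected, either by the angle of the video camera 14 or by voice input. This selection covers four views corresponding to the camera work of FIG. 6: front, top, side, and close-up.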
【００６３】
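The successive frames of the cardiac massage moving picture are not illustrated. Other video display functions of the emergency medical support system of this embodiment are described below.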
【００６４】
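FIG. 12 is one image from a scene in which the patient lies face up across the frame and the practitioner leans over the patient's chest to perform cardiac massage. In this image, the state of the patient above the chest is hard to see. Therefore, as shown in the corresponding FIG. 13, the practitioner can be displayed as a skeleton.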
【００６５】
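FIG. 14 is a front view of a scene in which the practitioner places both hands on the patient's chest. This image, however, does not show where the practitioner's hands are relative to the patient's heart. Therefore, as shown in FIG. 15, the patient can be displayed as a skeleton. This makes the position of the practitioner's hands relative to the heart inside the rib cage visible.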
【００６６】
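Such skeleton displays can be provided by creating them separately in advance and storing them in the database, in addition to the moving-picture frames for the four types of camera work. This is done, for example, for images that show key points of the performance.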
【００６７】
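As detailed above, the emergency medical support system of this embodiment gives a user performing or practicing emergency care the following capabilities:
- operate the system without using either hand;
- move as much as the performance requires;
- view the model video and the real scene at the same time, from the direction the user wants;
- change the speed of the model video to suit the user's skill, even during the performance.

Emergency care can therefore be learned easily, even at home.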
【００６８】
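The present invention has been described concretely above. It is not limited to the embodiment described, however, and can be modified in various ways without departing from its gist.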
【００６９】
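For example, the embodiment shows the virtual environment presentation device as an emergency medical support system, but the invention is not limited to this. Likewise, the specific external shape and configuration of the headset are not limited.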
【００７０】
Also, the embodiment uses a WWW server as the multimedia information server. Another network may be used instead, or the device may communicate directly with a database without a network.
【００７１】
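Also, the embodiment shows a stationary computer main body (terminal). Because computers are becoming smaller, the user may instead wear the computer main body itself, with a wireless connection between the main body and the data server.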
【００７２】
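【Effects of the Invention】
As described above, the present invention can provide an environment in which the user receives support through video and audio while continuing a performance or task in the actual field.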
【００７３】
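Further, the present invention helps in field settings where no instructor is present. When the user has momentarily forgotten something or needs to consult a document, documents can be searched and consulted without interrupting the work. When the system operates in an Internet environment, it may also offer a solution in difficult situations that a single instructor could not handle.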
【００７４】
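Further, consider a dim, cramped work site where many drawings must be consulted. Conventionally, the work had to proceed through radio contact with staff at the design office. With the present invention, the worker can view the drawings virtually on site, so the work proceeds efficiently.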
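【Brief Description of the Drawings】
【FIG. 1】Block diagram showing the schematic configuration of an emergency medical support system according to an embodiment of the present invention
【FIG. 2】Block diagram conceptually showing the concrete configuration of the emergency medical support system
【FIG. 3】Front view of the headset of the emergency medical support system, shown as worn
【FIG. 4】Plan view of the headset, seen from above
【FIG. 5】Block diagram conceptually showing the method of switching WWW content by voice
【FIG. 6】Explanatory diagram showing the types of camera work for the playback video
【FIG. 7】Explanatory diagram showing the relationship between the standard playback video sequence and a sequence with an expanded time axis
【FIG. 8】Explanatory diagram of data playback speed control
【FIG. 9】Explanatory diagram outlining the control program for emergency medical support
【FIG. 10】Explanatory diagram showing a monitor screen corresponding to a virtual image
【FIG. 11】Explanatory diagram showing a monitor screen corresponding to another virtual image
【FIG. 12】Explanatory diagram showing one scene of the model video
【FIG. 13】Explanatory diagram showing an example of the skeleton display for FIG. 12
【FIG. 14】Explanatory diagram showing another scene of the model video
【FIG. 15】Explanatory diagram showing an example of the skeleton display for FIG. 14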
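【Description of Reference Numerals】
A…computer main body
B…WWW server
C…file server
10…headset
12…liquid crystal goggles
14…small video camera
16…headphones
18…microphone
20…penlight
22…receiver
24…transmitter
30…voice recognition module
32…status control unit
34…voice synthesis module
36…playback speed control unit
38…image processing unit
40…image recognition module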
BACKGROUND OF THE INVENTION
In particular, the present invention is a place of practice in which it is difficult to bring in manuals, such as medical practice (emergency treatment, surgery, massage, nursing care, etc.), dance / gymnastics, etc. The present invention relates to a virtual environment presentation device that is suitable for learning practical skills that can be learned by the body at an educational site or the like, or for supporting maintenance work of facilities that require many drawings.
[0002]
[Prior art]
One-to-one education by instructors is the best way to acquire various performances / manuals or practical skills (hereinafter also referred to simply as performance) that humans perform interpersonally, such as medical practice, because there is no room to refer to the manual in practice. Efficient and essential.
[0003]
That said, not everyone can always get one-on-one education. Therefore, in addition to this one-on-one education, printed materials, videos, etc. are still used as instructional books and manuals, and it is now possible to learn using CD-ROM and recently the Internet WWW (World Wide Web) media. Yes.
[0004]
[Problems to be solved by the invention]
However, it is difficult to bring these manuals into the field of practice, and even if they are brought in, it is difficult to refer to the manual at the same time while learning the performance, and depending on the medium, the following There are various problems.
[0005]
(1) Printed material: Good portability, but cannot be referenced during the performance or turn pages.
[0006]
(2) Video: There is an advantage that you can perform while watching from a distance without any operation, but you can not go back, change speed, change angle, so beginners who want to slow down the tempo or change angle Can't meet the need to see the spot in the blind spot. Also, if the model performance and the user's facing direction are opposite, the procedure must be translated constantly in the head, which may be difficult for beginners.
[0007]
(3) CD-ROM: The video problem has been solved, but mouse operation is required.
[0008]
(4) WWW: Although there is an advantage that the content that can be provided is expanded as compared with the CD-ROM, a mouse operation is similarly required, and portability such as line connection is inferior.
[0009]
The present invention has been made to solve the above-mentioned conventional problems, and provides an environment in which the support can be received by an online manual while continuing acting and working in a place of practice, that is, a walking dictionary environment. This is the issue.
[0010]
[Means for Solving the Problems]
The present invention relates to an information presentation device configured to be capable of presentation control of multimedia information by an image recognition unit and a voice recognition unit in a virtual environment presentation device, a hands-free remote dialogue operation on the device, and the device. A user interface device that can be attached to a user and includes at least a video input unit, a video output unit, a voice input unit, and a voice output unit, which enable see-through viewing of provided information. A user's visual field image is input, and based on the visual field image, the user's line of sight is recognized by the image recognition means of the information presentation device, The image of the angle closest to the recognized line of sight is selected from model images or live-action images in multiple directions prepared in advance. Provided to the video output means And the While controlling the video angle to approach the line of sight, the voice input means inputs the user's voice, and recognizes the character string uttered by the user by the voice recognition means of the information presentation device from the voice, The content of the multimedia information provided by the information presentation device is switched based on the character string, and the video input by the video input means and the video provided by the information presentation device are processed by image processing. By synthesizing and providing the synthesized image to the video output means, the problem is solved.
[0011]
That is, in the present invention, the video input means of the user interface device allows manual information supporting performance etc. to be presented to the user as a video, and the video is sent via the audio output means, the video output means, etc. Thus, for example, the user can switch to the desired information content without using hands, that is, hands-free, based on the voice and line of sight of the user, so that the user supports the performance while continuing the performance, etc. Desired support information can be received.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0013]
FIG. 1 is a block diagram conceptually showing an emergency medical support system (virtual environment presentation device) according to an embodiment of the present invention, and FIG. 2 is a block diagram showing the configuration of the system more specifically. FIG. 3 is a front view showing the headset (user interface device) constituting the support system of the present embodiment together with a state in which the user is wearing, and FIG. 4 is a plan view thereof.
[0014]
The emergency medical support system according to the present embodiment is a home-oriented system that realizes a virtual environment where hands-free learning can be performed while viewing an example video provided as an electronic manual as a manual from a remote location. .
[0015]
As shown in FIGS. 1 and 2, the support system of this embodiment is a personal computer main body (information presentation) configured to be able to present and control multimedia information including manual images for emergency medical assistance. Device, client terminal) A and a headset (user interface device) 10 wirelessly connected to the main body A.
[0016]
In this embodiment, the multimedia information is provided to the computer main body A from the file server (multimedia server) C of the WWW server B via the Internet.
[0017]
In FIG. 1, HTML (Hyper Text Markup Language) is a language used to create a homepage on the Internet, Java is a programming language developed by Sun Microsystems, and a program (applet) created in this language is the Internet. Executed when displaying the homepage of VRML (Virtual Reality Modeling Language) is a language that can express three-dimensional graphics on the Internet, and MIDI (Musical Instruments Digital Interface) is a standard for exchanging music data on a computer.
[0018]
The WWW is currently the most promising media in that it can provide abundant content online, and can provide functions such as backtracking, speed change, and angle change technically.
[0019]
In the present embodiment, the following five functions required as a user interface are realized in order to allow the user to refer to the electronic manual while learning acting or the like.
[0020]
(1) A computer can be operated without using both hands (hands-free operation).
[0021]
(2) It must be able to move to the extent that it does not interfere with performance (portability and lightness).
[0022]
(3) A model performance (virtual video) and a real image can be viewed simultaneously (see-through).
[0023]
(4) The angle of the model performance should be as close as possible to the user's viewpoint (at least not in the opposite direction).
[0024]
(5) The speed of the model performance is variable according to the skill of the user and can be changed during the performance.
[0025]
In the present embodiment, the above-described functions are realized by adopting the following system configuration.
[0026]
That is, as shown in FIGS. 3 and 4, the headset 10 includes a liquid crystal goggles 12, a small video camera 14, headphones 16 composed of left and right ear pads, a hands-free microphone 18 movable in the direction of the arrow, and the liquid crystal goggles. The penlight 20 fixed on the head 12 and the telescopic headband 10A are integrated.
[0027]
The liquid crystal goggles 12 are a see-through type having an optical see-through function using a translucent liquid crystal display. The goggles 12 and the headphones 16 are connected to a receiver (video receiving means, audio receiving means) 22 fixed by a belt around the user's waist via a goggle video cable 12C and a headphone audio cable 16C, respectively. Yes. The receiver 22 has a built-in battery, and supplies power to the goggles 12 via the goggle video cable 12C in addition to supplying power to the video receiving means and audio receiving means. .
[0028]
The small video camera 14 and the microphone 18 are connected to a transmitter (video transmission means, audio transmission means) 24 via a video camera cable 14C and a microphone audio cable 18C, respectively. The transmitter 24 also has a built-in battery. In addition to supplying power to the video transmission means and audio transmission means, the transmitter 24 supplies necessary power to the video camera 14 via the video camera cable 14C. The necessary power is supplied to the microphone 18 via the microphone audio cable 18C.
[0029]
In the present embodiment, the computer body A (client terminal) includes a speech recognition module 30, a status control unit 32, a speech synthesis module (or MIDI sound source) 34, a playback speed control unit 36, as shown in FIG. An image processing unit 38 that performs rendering, video decoding and reproduction, and synthesis with a real background image, and an image recognition module 40 having a camera parameter extraction function are provided.
[0030]
Then, when the user attaches the headset 10 to the head, the user can watch the manual provided on the electronic medium while performing the performance. A is connected to the wireless analog line via the receiver 22 and the transmitter 24.
[0031]
That is, the liquid crystal goggles 12 are input with a video signal transmitted by infrared from the image processing unit 38 via the receiver 22, and the headphone 16 is similarly input with an audio signal from the voice synthesis module 34. It has become. On the other hand, a video signal transmitted by radio waves in the VHF band is output from the video camera 14 to the image processing unit 38 and the image recognition module 40 via the transmitter 24, and audio is similarly transmitted from the microphone 18 to the voice recognition module 30. A signal is output.
[0032]
As described above, in order to avoid interference between the headset 10 and the computer main body A in order to avoid bidirectional interference between video and audio, there are two types of infrared and VHF bands currently available. We are taking measures to share transmission equipment.
[0033]
The computer main unit A is connected to the WWW server B via the Internet / digital line, the performance status is output from the status control unit 32 to the file server C, and the viewpoint information is received from the image recognition module 40. On the contrary, audio information is input from the file server C to the audio synthesizing module 34, and video information is input to the image processing unit 38.
[0034]
In the present embodiment, the function (1) is realized by operating a computer by voice input / recognition. Specifically, a user's voice is input by the microphone 18, a character string uttered by the user is recognized from the voice by the voice recognition module 30 of the computer main body A, and the character string is based on the character string. By sending a switching signal to the file server C by the status control unit 32, the contents of the video / audio multimedia information output from the file server C to the computer main body A and provided to the liquid crystal goggles 12 and the headphones 16 from the main body A Can be switched.
[0035]
In order to realize this function, HTML anchors (characters with underline: a function that jumps to a predetermined page when clicked) are made to jump to a point specified based on a character code obtained by voice recognition. Adopted a method of adding a simple control program to an HTML sentence.
[0036]
FIG. 5 is a block diagram conceptually showing this. When the computer main body A is connected to the server via the Internet, the main management program main body described in the Java applet is started by the main management program starting HTML statement. After that, by inputting a key to the menu button of the main body with a mouse or the like, the multimedia display program main body described in the Java applet can be started by the multimedia display program starting HTML sentence. In this state, information on a desired file can be read from the file server C by performing key input to the menu button of the display program body. As described above, the menu for information transmission can be switched with the WWW server B only by mouse click input, but it can be realized by key input.
[0037]
When a voice command is input from the microphone 18 to the voice recognition module 30, the module 30 converts the voice command into a character code corresponding to a predetermined menu button, and executes an input process equivalent to key input of the character code. Is done. That is, a desired anchor function for jumping to another HTML page by voice comma is realized.
[0038]
In order to realize the function (2), as shown in FIGS. 3 and 4, the headset 10 having the liquid crystal goggles 12, the small video camera 14, the headphones 16, and the microphone 18 is connected to the user's head. The headset 10 and the computer main body A, which is a WWW terminal, are connected wirelessly as described above, and as a result, wireless connection between the headset 10 and the WWW server B is realized. is doing.
[0039]
In order to realize the function (3), the image processing unit 38 synthesizes an actual image input by the video camera 14 and a model image by CG (computer graphics) provided by the computer main body A. The composite image is provided to the liquid crystal goggles 12 with the see-through function closed. By doing so, it is easy to make the background portion completely solid in the CG image. Therefore, if the real image is embedded in the background portion by using a so-called chroma key composition method, the quality of both is improved. Since it can be prevented from deteriorating, a clear model image can be provided. However, although the model image is unclear, the see-through function can be used as it is.
[0040]
In order to realize the function (4), for example, four types of camera work as shown in FIG. 6 are obtained from a silhouette image of a subject (for example, a practitioner) in a line-of-sight image captured by a user with the video camera 14. May be selected automatically.
[0041]
Further, the video camera 14 inputs a user's visual field image, recognizes the user's line of sight by the image recognition module 10 of the computer main body A based on the visual field image, and determines the angle of the video provided to the liquid crystal goggles 12. You may make it control so that it may approach the said eyes | visual_axis.
[0042]
Specifically, only the orientation of the subject's face is recognized from the user's visual field image by the moment method of image processing, and an image having an angle closest to that direction is provided. For this purpose, an exemplary video is produced by 3D CG, and a video that is rendered at an angle in several directions per scene is prepared, or a live-action video taken from several directions is prepared. Although it is difficult to recognize an accurate viewpoint with the current image processing, the 3D CG method has a problem that it takes time to calculate, but if the processing capability of the terminal is large, the graphics engine For example, the angle may be arbitrarily changed by using a method of rendering in real time. In addition, there is a method in which a position sensor is attached to goggles to completely follow the viewpoint, which is not suitable for this application because of a large restriction on the moving range.
[0043]
In the present embodiment, the camera work of the image that the user can see through the liquid crystal goggles 12 is only for the CG image, so that the operation hidden on the back of the practitioner can be seen through. There is also a function for displaying the operator (skeleton) and a function for displaying the patient (patient) in a skeleton so that the operation can be seen anatomically. About this, a specific example is shown later.
[0044]
Similarly, although only in the case of CG images, when multiple parts (heads, hands, etc.) are moving in cooperation, all parts other than the selected part are stopped and played back, There is also a single part motion display function that can be stopped and played back.
[0045]
In addition, there is a function of expanding the time axis with respect to the designated area when reproducing the model video presented on the liquid crystal goggles 12.
[0046]
The playback video of a movie basically displays a temporally changing motion as a continuous still image. Therefore, each still image that changes continuously is stored in the database as image data of frames 1 to 7... As shown in FIG.
[0047]
However, since it only appears once in the model performance, the beginning or end of the repeated performance that is easily overlooked, and the skilled person can be smoothly combined with the repeated performance, easily overlooked, sometimes simplified or omitted. There is a case where slow playback in which the playback speed of the frame in FIG. For such video, for example, as shown in FIG. 2B, the two frames are subdivided so as to be compatible with super slow motion playback that plays back detailed video between frames 2 and 3. The images of the frames 2.1 to 2.7... Prepared separately are prepared.
[0048]
Further, in the present embodiment, in order to realize the function (5), the microphone 18 inputs a user's voice, and the user speaks from the voice by the voice recognition module 30 of the computer main body A. Based on the time interval between the character string and the character string, the reproduction speed control unit 36 recognizes the headphone 16 and the liquid crystal goggles 12 from the voice synthesis module 34 and the image processing unit 38 of the computer main body A. The playback speed of time-series multimedia information such as support video and audio provided to each is controlled. Note that the character string here includes one character such as “A”, “I”, “U”.
[0049]
That is, the frame rate for moving image playback is configured to be variable, and the BGM sound is described in MIDI. For example, when the user utters “1, 2, 3” as a plurality of character strings “1, 2, 3,” at predetermined intervals using the microphone 18, the pronunciation of “1” and “3” starts. It is possible to control the time interval to be measured and to reproduce, for example, 30 frames of moving images and 8 quadrants at that interval.
[0050]
FIG. 8 conceptually shows this function. In this figure, the utterance of “1, 2, 3” is replaced with pulses 1 to 3 of the top user input pulse, and the pulse interval of the system clock changes at the timing of pulse 3. Yes. In synchronization with this, the interval for overwriting ready-made frame data is controlled for moving images, the interval for switching the frame drawn in the background to the front is controlled for CG images, and the interval for playing strong notes is controlled for MIDI. It has become so. Note that in MIDI performance interval control, it is also possible to modulate the sound intensity with a fluctuation component of the amplitude of the pulse.
[0051]
An external synchronization signal can also be input to the playback speed control unit 36. When a biological signal, such as the heartbeat of the user or of the person the user is facing, is input to the speed control unit 36, the playback speed of the time-series multimedia information can likewise be controlled based on that biological signal.
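A heartbeat-driven variant of the same control might look like the sketch below. It assumes that the biological signal has already been reduced to beat timestamps (for example, detected R-peaks) and that the model video has a known reference tempo; both assumptions, along with the names `heart_rate_bpm`, `playback_rate`, and `reference_bpm`, are illustrative rather than taken from the source.

```python
import statistics
from typing import Sequence


def heart_rate_bpm(beat_times_s: Sequence[float]) -> float:
    """Heart rate from beat timestamps, using the median beat-to-beat interval
    so that a single missed or extra beat has little effect."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    return 60.0 / statistics.median(intervals)


def playback_rate(beat_times_s: Sequence[float], reference_bpm: float) -> float:
    """Playback-speed multiplier: 1.0 when the measured heart rate equals the
    hypothetical reference tempo at which the model video was recorded."""
    return heart_rate_bpm(beat_times_s) / reference_bpm


beats = [0.0, 0.5, 1.0, 1.5]               # 120 beats per minute
print(playback_rate(beats, reference_bpm=100.0))
```

The median is used here instead of the mean purely as a robustness choice for the sketch.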
[0052]
In addition, as shown in FIG. 1, the support system is provided with a CRT monitor so that other people can see the video the user is viewing through the goggles.
[0053]
Next, the way the emergency medical support system of this embodiment is used is described.
[0054]
In recent years, as society ages, an increase in lifestyle-related diseases (formerly called adult diseases) has become unavoidable. Among them, circulatory diseases such as angina pectoris and myocardial infarction can cause sudden attacks in which every moment counts. In such emergencies, life-saving and first-aid treatment before the patient is transported to hospital is critical; with no doctor, imaging equipment, or medication at hand, appropriate action by the people at home decides the patient's fate. The support system of this embodiment is particularly effective in such an environment.
[0055]
First, as shown in FIG. 3, the headset 10, which serves as the user interface device, is worn on the user's head.
[0056]
Next, the user enters a code into the computer main body A by key input or by voice input through the microphone 18; main body A then connects to the WWW server B and starts up, ready to receive information such as video and audio of the model performance from the file server C.
[0057]
Then, under the main management program shown in FIG. 5, the still images (1) to (10), shown in FIG. 9 as an outline of the terminal control program, can be displayed in sequence on the liquid crystal goggles 12 as a mode menu for emergency medical assistance.
[0058]
For convenience, FIG. 10 shows the menu as it appears on the CRT monitor; the liquid crystal goggles 12 display the video shown in the window of this screen (here, image (6) of FIG. 9). At the same time, instructions and questions like those on the screen are conveyed to the user through the headphones 16.
[0059]
The user carries out the operations according to the instructions and answers the questions with “Yes” or “No” through the microphone 18, thereby advancing in sequence through the menu images shown in FIG.
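The yes/no navigation through the menu images can be modeled as a small state machine, which is roughly the role of the status control unit 32. The Python sketch below is hypothetical: the source does not describe the branching of the menu, so the transitions, prompts, and the names `MenuNode` and `VoiceMenu` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MenuNode:
    image: str                    # still image shown on the goggles, e.g. "(1)"
    prompt: str                   # instruction or question played on the headphones
    on_yes: Optional[str] = None  # id of the next node after "Yes"
    on_no: Optional[str] = None   # id of the next node after "No"


class VoiceMenu:
    """Advance through menu images in response to recognized "Yes"/"No" answers."""

    def __init__(self, nodes: Dict[str, MenuNode], start: str) -> None:
        self.nodes = nodes
        self.current = start

    def answer(self, word: str) -> MenuNode:
        node = self.nodes[self.current]
        reply = word.strip().lower()
        if reply == "yes" and node.on_yes is not None:
            self.current = node.on_yes
        elif reply == "no" and node.on_no is not None:
            self.current = node.on_no
        # Any other utterance, or an answer with no branch, keeps the current image.
        return self.nodes[self.current]


# Hypothetical transitions; only images (1) to (10) and the cardiac
# massage mode (9) are mentioned in the text.
NODES = {
    "1": MenuNode("(1)", "question for image (1)", on_yes="2", on_no="3"),
    "2": MenuNode("(2)", "question for image (2)"),
    "3": MenuNode("(3)", "question for image (3)", on_no="9"),
    "9": MenuNode("(9)", "cardiac massage mode"),
}
menu = VoiceMenu(NODES, start="1")
print(menu.answer("No").image)  # (3)
print(menu.answer("no").image)  # (9)
```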
[0060]
Here, for convenience, assume that the user has selected the cardiac massage mode (9) from the menu of FIG. When the user speaks the command “video” shown on the screen of FIG. 10 into the microphone 18, the multimedia display program shown in FIG. 5 is started, the required video and audio information is loaded from the file server C, and a model video of cardiac massage is displayed as a moving image on the liquid crystal goggles 12.
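Switching content by a spoken command amounts to looking up the recognized string for the current mode and loading the corresponding resource from the server. The sketch below assumes a hypothetical server address and content paths (the source gives neither); `content_for` and `CONTENT` are illustrative names, and only `urllib.parse.urljoin` is a real standard-library call.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical server address and content paths, for illustration only.
BASE_URL = "http://www-server-b.example/emergency/"

CONTENT: Dict[Tuple[str, str], str] = {
    ("cardiac_massage", "video"): "massage/model.html",
    ("cardiac_massage", "stop"): "massage/stopped.html",
}


def content_for(mode: str, recognized: str) -> Optional[str]:
    """Map a recognized utterance in the current mode to the URL to load, or None."""
    path = CONTENT.get((mode, recognized.strip().lower()))
    return urljoin(BASE_URL, path) if path else None


print(content_for("cardiac_massage", "Video"))
print(content_for("cardiac_massage", "hello"))  # None: unknown command is ignored
```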
[0061]
FIG. 11 shows one frame of this model video, in which the patient is displayed as a skeleton as described above, on the same kind of monitor screen as in FIG. In practice, the user selects items from the operation menu shown as text on this screen by speaking them, displays the support video in the desired state, and learns how to perform the massage while watching the model performance.
[0062]
The operation menu offers speed, play, and stop, which are selected by voice input. The four views (front, top, side, and close-up) can be selected either by the angle of the video camera 14 or by voice input; they correspond to the camera work shown in FIG.
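Choosing the prepared view closest to the user's line of sight can be sketched as picking the camera direction with the largest cosine similarity to the gaze vector. This assumes the gaze has already been estimated as a 3D direction from the field-of-view image of the video camera 14, which the sketch does not cover; the coordinate convention, the direction vectors, and the name `closest_view` are hypothetical. The close-up view has no direction of its own, so in this sketch it would be selected by voice only.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]

# Hypothetical viewing directions of the prepared camera angles, in a frame
# with y pointing up; the source does not specify a coordinate convention.
VIEW_DIRECTIONS: Dict[str, Vec] = {
    "front": (0.0, 0.0, 1.0),
    "top": (0.0, -1.0, 0.0),  # looking down at the patient
    "side": (1.0, 0.0, 0.0),
}


def _normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("zero-length direction vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def closest_view(gaze: Vec) -> str:
    """Return the prepared view whose direction makes the smallest angle with the gaze."""
    g = _normalize(gaze)

    def cosine(name: str) -> float:
        d = _normalize(VIEW_DIRECTIONS[name])
        return sum(a * b for a, b in zip(g, d))

    # The largest cosine corresponds to the smallest angle.
    return max(VIEW_DIRECTIONS, key=cosine)


print(closest_view((0.2, -0.9, 0.3)))  # top
```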
[0063]
Although the sequence of frames making up the cardiac massage video is not illustrated, another image display function of the emergency medical support system of this embodiment is described next.
[0064]
FIG. 12 is an image of a scene, viewed from the side, in which a practitioner leaning over the chest of a patient lying on his back is performing cardiac massage. The scene in this image is difficult to understand. Therefore, as shown in the corresponding FIG. 13, the practitioner can be displayed as a skeleton.
[0065]
FIG. 14 is a front view of the scene in which the practitioner places his hands on the patient's chest. From this image alone, the position of the practitioner's hands relative to the patient's heart cannot be determined. Therefore, as shown in FIG. 15, the patient can be displayed as a skeleton so that the position of the practitioner's hands relative to the heart inside the rib cage can be seen.
[0066]
Such skeleton displays can be supported, for example, by preparing a separate database in advance, in addition to the video frames for the four camera angles, that holds images for the key points of the technique.
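The separate skeleton database can be thought of as an overlay table that is consulted before the regular frames. The sketch below is illustrative only: the keying by (camera angle, frame number), the file names, and `frame_to_show` are assumptions, not details from the source.

```python
from typing import Dict, Tuple

FrameKey = Tuple[str, int]  # (camera angle, frame number)


def frame_to_show(
    regular: Dict[FrameKey, str],
    skeleton: Dict[FrameKey, str],
    angle: str,
    frame: int,
    skeleton_on: bool,
) -> str:
    """Pick the image for one frame of one camera angle.

    The skeleton table only holds entries for key-point frames; every other
    frame, or any frame while skeleton display is off, falls back to the
    regular video frame.
    """
    key = (angle, frame)
    if skeleton_on and key in skeleton:
        return skeleton[key]
    return regular[key]


# Hypothetical file names for illustration.
regular = {("side", 10): "side_0010.png", ("side", 11): "side_0011.png"}
skeleton = {("side", 10): "side_0010_skeleton.png"}
print(frame_to_show(regular, skeleton, "side", 10, skeleton_on=True))  # side_0010_skeleton.png
print(frame_to_show(regular, skeleton, "side", 11, skeleton_on=True))  # side_0011.png
```

Keeping the skeleton images in their own table means only the key-point frames need extra storage, while the four regular camera sequences stay untouched.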
[0067]
As described above, with the emergency medical support system of this embodiment, a user performing or practising emergency medical treatment can operate the system without using either hand, so the system does not interfere with the performance; the model video and the real scene can be viewed at the same time, from the direction the user wants; and the speed of the model video can be changed during the performance to suit the user's skill. Therefore, even people at home can easily learn how to perform emergency medical care.
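Viewing the model video and the real scene at the same time requires combining the camera image with the provided video. Alpha blending, shown below as a minimal pure-Python sketch on small grayscale images, is one common way to do this; the source says only that the two videos are combined by image processing, so the blending rule, the `alpha` parameter, and the name `composite` are assumptions.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def composite(camera: Image, model: Image, alpha: float = 0.5) -> Image:
    """Blend the model video frame over the camera (real) frame.

    Each output pixel is alpha * model + (1 - alpha) * camera, rounded to an int.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(camera) != len(model) or any(len(a) != len(b) for a, b in zip(camera, model)):
        raise ValueError("images must have the same size")
    return [
        [round(alpha * m + (1.0 - alpha) * c) for c, m in zip(cam_row, mod_row)]
        for cam_row, mod_row in zip(camera, model)
    ]


print(composite([[0, 100]], [[200, 100]], alpha=0.25))  # [[50, 100]]
```

A lower `alpha` keeps the real scene dominant and shows the model faintly, which suits a see-through display where the user's own hands must stay visible.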
[0068]
Although the present invention has been described specifically above, it is not limited to the embodiment described, and various modifications can be made without departing from the scope of the invention.
[0069]
For example, although the embodiment describes the case in which the virtual environment presentation device is an emergency medical support system, the present invention is not limited to this. Likewise, the specific external shape and configuration of the headset are not limited to those shown.
[0070]
In the above embodiment, a WWW server is used as the multimedia information server; however, another network may be used, or the database may be accessed directly without a network.
[0071]
Moreover, although a stationary computer main body (terminal) has been shown, computers continue to shrink, so the computer main body itself may be worn by the user and connected wirelessly to the data server.
[0072]
[Effect of the invention]
As described above, the present invention provides an environment in which the user can receive support through video and audio while continuing to perform or work at the actual site of practice.
[0073]
In addition, when the user forgets something or needs to consult a document at a site where no instructor is present, the present invention allows the information to be searched for and referred to without interrupting the work. When operated in an Internet environment, it also offers the possibility of finding a solution even in difficult situations that a single instructor could not handle.
[0074]
Furthermore, when many drawings must be consulted at a dark, cramped work site, the work conventionally had to proceed while communicating by radio with the person in charge at the design office; with the present invention, the drawings can be viewed virtually on site, so the work can be carried out efficiently.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an emergency medical support system according to an embodiment of the present invention.
FIG. 2 is a block diagram conceptually showing a specific configuration of the emergency medical support system.
FIG. 3 is a front view showing the headset constituting the emergency medical support system as worn by the user.
FIG. 4 is a plan view of the headset as seen from above.
FIG. 5 is a block diagram conceptually showing a method of switching WWW content by voice.
FIG. 6 is an explanatory diagram showing the types of camera work used for the playback video.
FIG. 7 is an explanatory diagram showing the relationship between the standard playback video sequence and a playback video sequence with an expanded time axis.
FIG. 8 is an explanatory diagram for explaining data playback speed control.
FIG. 9 is an explanatory diagram showing an outline of the control program for emergency medical assistance.
FIG. 10 is an explanatory diagram showing a monitor screen corresponding to a virtual image.
FIG. 11 is an explanatory diagram showing a monitor screen corresponding to another virtual image.
FIG. 12 is an explanatory diagram showing one scene of the model video.
FIG. 13 is an explanatory diagram showing an example of the skeleton display of FIG.
FIG. 14 is an explanatory diagram showing another scene of the model video.
FIG. 15 is an explanatory diagram showing an example of the skeleton display of FIG. 14.
[Explanation of symbols]
A ... Computer body
B ... WWW server
C ... File server
10 ... Headset
12 ... Liquid crystal goggles
14 ... Small video camera
16 ... Headphones
18 ... Microphone
20 ... Penlight
22 ... Receiver
24 ... Transmitter
30 ... Voice recognition module
32 ... Status control unit
34 ... Speech synthesis module
36 ... Playback speed control unit
38 ... Image processing unit
40 ... Image recognition module

Claims

A virtual environment presentation device comprising: an information presentation device configured so that the presentation of multimedia information can be controlled by image recognition means and voice recognition means; and a user interface device that can be worn by a user, has at least video input means, video output means, voice input means, and voice output means, and enables hands-free remote interactive operation of the information presentation device and see-through viewing of the information provided by that device, wherein:
the video input means inputs an image of the user's field of view; the image recognition means of the information presentation device recognizes the user's line of sight based on the field-of-view image; from model videos or live-action videos prepared in advance for a plurality of directions, the video whose angle is closest to the recognized line of sight is provided to the video output means; and the angle of that video is controlled so as to approach the line of sight;
the voice input means inputs the user's voice; the voice recognition means of the information presentation device recognizes, from the voice, a character string uttered by the user; and the content of the multimedia information provided by the information presentation device is switched based on the character string; and
the video input by the video input means and the video provided by the information presentation device are combined by image processing, and the combined image is provided to the video output means.

The virtual environment presentation device according to claim 1, wherein the voice input means inputs the user's voice, the voice recognition means of the information presentation device recognizes from the voice a plurality of character strings uttered by the user, and the playback speed of the time-series multimedia information provided by the information presentation device is controlled based on the time interval between the character strings.

The virtual environment presentation device according to claim 1, wherein a biological signal including the heartbeat of the user or of a person facing the user is input, and the playback speed of the time-series multimedia information provided by the information presentation device is controlled based on the biological signal.

The virtual environment presentation device according to claim 1, wherein the user interface device including the video input means, the video output means, the voice input means, and the voice output means comprises a headset that can be worn on the user's head.

The virtual environment presentation device according to claim 4, wherein the headset can exchange information wirelessly with the information presentation device via video transmission means, video reception means, voice transmission means, and voice reception means.

The virtual environment presentation device according to claim 4, wherein the information presentation device is connected to a data server that provides multimedia information.

The virtual environment presentation device according to claim 6, wherein the headset is connected by wire to the information presentation device worn on the user's body, and the information presentation device is connected wirelessly to the data server.

The virtual environment presentation device according to claim 6, wherein the information presentation device and the data server are, respectively, a WWW client and a WWW server connected to the Internet.