JP4141782B2

JP4141782B2 - Image display device, image display method, and program

Info

Publication number: JP4141782B2
Application number: JP2002287473A
Authority: JP
Inventors: 声揚黄; 淳富士本
Original assignee: Ptopa Inc; Aruze Corp
Current assignee: Universal Entertainment Corp; Ptopa Inc
Priority date: 2002-09-30
Filing date: 2002-09-30
Publication date: 2008-08-27
Anticipated expiration: 2022-09-30
Also published as: JP2004126784A

Description

【０００１】
【発明の属する技術分野】
本発明は、話者の居る位置に基づいて特定の画像を画面上に表示させる画像表示装置、画像表示方法及びプログラムに関する。
【０００２】
【従来の技術】
従来から、コンピュータが特定の話者に話し掛ける音声システムがある（例えば、特許文献１参照。）。これにより、遊技場等の人が多く出入りするような場所に音声システムを構築すれば、その音声を発するコンピュータは、一種の広告塔としての機能を果たすことができる。また、特に一人暮らしの者が、自宅に音声システムを配備すれば、音声を発するコンピュータが、その者からの音声に応じて所定の回答文を出力するので、かかる者は、少しでも一人暮らしの寂しさを紛らわすことができる。
【０００３】
【特許文献１】
特開２００２−１６９８０４（第５−１３、第１５図）
【０００４】
【発明が解決しようとする課題】
しかしながら、話者が上記コンピュータに対して発話したとしても、コンピュータが単にその発話に対して所定の回答文を出力していただけであるので、その話者は何か物足りなさを感じていた。このため、従来からは、話者に飽きさせることのないキャラクタの動作を見せることで、話者をより楽しませ、話者がそのキャラクタとの間でコミュニケーションを取っているかのような感覚を味わせることのできるシステムの開発が望まれていた。
【０００５】
そこで、本発明は以上の点に鑑みてなされたものであり、話者が居る位置に応じて画面に表示された画像を変化させることで、その話者が、その変化された画像を見て恰も他の話者との間でコミュニケーションを取っているかのような感覚を味わうことのできる画像表示装置、画像表示方法及びプログラムを提供することを課題とする。
【０００６】
【課題を解決するための手段】
本発明は、上記課題を解決すべくなされたものであり、画面中のいずれかの領域に表示される複数の画像のそれぞれには、各領域にそれぞれ相応する配置関係の位置情報のうち、話者が居るであろうと予想される位置情報が対応付けられ、その各画像を予め記憶し、話者から発せられた音声に基づいて話者の位置を推定し、推定された位置に基づいて、その位置と予め記憶された各位置情報とを照合し、その各位置情報の中から、上記位置に応じた位置情報を検索し、検索された位置情報に基づいて、位置情報に対応付けられた画像を取得し、取得された画像を、画面の、位置推定手段で推定された位置に応じた位置情報に相応する配置関係の領域に表示することを特徴とする。
【０００７】
このような本願に係る発明によれば、画像表示装置が、推定した話者の位置に基づいて、その位置と一致する位置情報に対応付けられた画像を画面に表示することができる。即ち、画像表示装置は、話者が居る位置に応じて特定の画像を画面に表示することができる。
【０００８】
上記構成においては、各画像のそれぞれには、所定のキャラクタが含まれ、各画像のそれぞれに対応付けられた位置情報には、キャラクタを特定の領域に表示させるための領域情報が対応付けられ、検索された位置情報に対応付けられた領域情報に基づいて、その領域情報に対応する領域に、検索された位置情報に対応付けられた画像に含まれるキャラクタを表示することが好ましい。尚、この各領域は、キャラクタが移動する方向に設けることが好ましい。
【０００９】
これにより、キャラクタが移動する方向に各領域が設けられている場合には、画像表示装置は、例えば、話者の移動に伴なって画面上のキャラクタを同方向に移動させることができる。この場合、話者が特定の方向に移動すると、画面上のキャラクタが同方向に移動するので、その話者は、そのキャラクタとの間で恰もコミュニケーションを取っているかのような感覚を味わうことができる。
【００１０】
尚、画像表示装置は、所定のキーワードを予め複数記憶し、話者から発せられた音声に対応する文字列を認識し、認識された文字列に基づいて文字列と一致するキーワードを検索することができた場合には、検索された位置情報に対応付けられた画像を取得することが好ましい。
【００１１】
この場合には、画像表示装置は、話者から発せられた音声に対応する文字列に基づいて、その文字列と一致するキーワードを検索することができたときは、検索された位置情報に対応付けられた画像を表示することができる。この結果、話者は、自己の特定の言葉に反応して画面上の画像が移動するので、恰もその画像との間でコミュニケーションを取っているかのような感覚を味わうことができる。
【００１２】
尚、各キーワードの内容としては、各話者との間で最初又は最後に交わされる挨拶文等が挙げられる。この場合には、話者は、特定の挨拶文を発すれば、その挨拶文に反応して画面上の画像が移動するので、話者は、画面上の画像との間で親密な関係を有するかのような感覚を味わうことができる。この結果、話者は、画面上の画像との間でより多くの出来事を話そうとする動機付けが高まり、退屈な時間をより楽しく過ごすことができる。
【００１３】
【発明の実施の形態】
［実施形態］
（画像表示装置の基本構成）
本発明に係る画像表示装置について図面を参照しながら説明する。図１は、本実施形態に係る画像表示装置１００の内部構造を示す図である。同図に示すように、画像表示装置１００は、話者の居る位置に応じた画像を画面に表示させるものである。画像表示装置１００は、本実施形態では、音声入力部１１０と、位置推定部１２０と、画像検索部１３０と、画像記憶部１４０と、出力部１５０とを有する。
【００１４】
前記音声入力部１１０は、話者から発せられた音声を取得するものである。この音声入力部１１０は、本実施形態では、複数のマイクロホンで構成することができる。具体的に、話者から発せられた音声を取得した音声入力部１１０は、取得した音声を音声信号として位置推定部１２０に出力する。
【００１５】
位置推定部１２０は、話者から発せられた音声に基づいて話者の位置を推定するものである。ここで、話者の位置は、図２に示すように、座標系で示すことができる。この座標系は、話者が居るであろう位置を示す仮想的なものである。これら各座標は、本実施形態では、（ｘi、ｙj）{ｉ＝１、２、・・・ｎ；ｊ＝１、２、・・・ｍ}で示すものとする。尚、位置推定部１２０は、本実施形態では、推定した位置を座標系で特定しているが、極座標系で特定してもよい。
【００１６】
具体的に、音声入力部１１０から音声信号が入力された位置推定部１２０は、先ず入力された複数の音声信号に基づいて、それら音声信号の相互相関関数を、全てのマイクロホンの組み合わせについて計算する。この相互相関関数を計算した位置推定部１２０は、計算した相互相関関数に基づいて、予め決められた一の基準マイクロホンと他のマイクロホンとの間の最大値を与える時間差を求める。
【００１７】
この位置推定部１２０は、求めた時間差に基づいて話者（音源）の位置を推定する（参考文献：特開平１１−３０４９０６）。話者の位置を推定した位置推定部１２０は、推定した位置を位置信号として画像検索部１３０に出力する。尚、その他の複数のマイクロホンから得られる音声信号を処理して話者の位置を推定する方法は、文献「音響システムと信号処理」、大賀他、電子情報通信学会の７章に詳述されている。
【００１８】
画像検索部１３０は、位置推定部１２０で推定された位置に基づいて、その位置と予め記憶された各位置情報とを照合し、各位置情報の中から、その位置と一致する位置情報を検索するものである。また、画像検索部１３０は、検索した位置情報に基づいて、その位置情報に対応付けられた画像を取得するものでもある。
【００１９】
ここで、複数の画像のそれぞれには、話者が位置するであろうと予測される位置情報が対応付けられている。この各画像（画像パターン１−１、画像パターン１−２・・・）は、本実施形態では、図４に示すように、予め画像記憶部１４０に記憶されている。
【００２０】
具体的に、位置推定部１２０から位置信号が入力された画像検索部１３０は、入力された位置信号に対応する話者の位置と、予め記憶された各位置情報とを照合し、各位置情報の中から、その話者の位置と一致する位置情報を検索する。画像検索部１３０は、その検索した位置情報に対応付けられた画像パターンを取得する。画像検索部１３０は、その取得した画像パターンを画像信号として出力部１５０に出力する。
【００２１】
出力部１５０は、画像検索部１３０で検索された画像パターンを画面に表示するものである。出力部１５０は、本実施形態では、液晶ディスプレイ等が挙げられる。具体的に、画像検索部１３０から画像信号が入力された出力部１５０は、入力された画像信号に基づいて、画像信号に対する画像パターンを画面に表示する。
【００２２】
尚、各画像のそれぞれには、所定のキャラクタを含めてもよい。また、その各画像のそれぞれに対応付けられた位置情報には、上記キャラクタを特定の領域に表示させるための領域情報を対応付けてもよい。この特定の領域は、図５に示すように、本実施形態では、キャラクタが移動する方向、例えば水平方向に設けられるものとする。ここで、キャラクタは、本実施形態では、現存の人物、架空の人物又は動物等を意味するものである。
【００２３】
具体的に、キャラクタは、本実施形態では、図５及び図６に示すように、話者が居る位置に応じて、基準領域、Ａ領域〜Ｄ領域のいずれかの領域に移動する。この基準領域は、本実施形態では、キャラクタが最初に位置する領域を意味する。キャラクタは、その基準領域を中心として、Ａ領域〜Ｄ領域のいずれかの領域へと移動する。
【００２４】
即ち、出力部１５０は、画像検索部１３０で検索された位置情報に対応付けられた領域情報に基づいて、その領域情報に対応する領域に、検索された位置情報に対応付けられた画像に含まれるキャラクタを表示する。これにより、出力部１５０は、話者が居る位置に応じてキャラクタを移動させることができる。
【００２５】
尚、各領域は、図６に示すように、話者が居る（位置する）であろうと予測される上記位置情報と相応する配置関係に設けることが好ましい。即ち、各領域は、各位置情報に対応する各位置の関係と相対的に釣り合いが取れた配置関係に設けることが好ましい。
【００２６】
例えば、各位置情報に対応する各位置が、図６に示すように、音声入力部１１０の中央部分（同図中の”基準領域”）である場合には、上記基準領域は、その各位置と相対的に釣り合いが取れた部分、即ち画面の中央部分（図５中の”基準領域”）に設ける。これと同様にして、Ａ〜Ｄ領域も各位置情報に対応する各位置と相応する部分に設けることができる。これにより、話者が特定の方向に移動したときは、画面上のキャラクタは、その方向と同一の方向に移動することができる。
【００２７】
（画像表示装置を用いた画像表示方法）
上記構成を有する画像表示装置による画像表示方法は、以下の手順により実施することができる。図７は、本実施形態に係る画像表示方法の手順を示すフロー図である。
【００２８】
先ず、音声入力部１１０が、話者から発せられた音声を取得するステップを行う（Ｓ１０１）。この音声入力部１１０は、本実施形態では、複数のマイクロホンで構成することができる。具体的に、話者から発せられた音声を取得した音声入力部１１０は、取得した音声を音声信号として位置推定部１２０に出力する。
【００２９】
そして、位置推定部１２０が、話者から発せられた音声に基づいて話者の位置を推定するステップを行う（Ｓ１０２）。具体的に、音声入力部１１０から音声信号が入力された位置推定部１２０は、先ず入力された複数の音声信号に基づいて、それら音声信号の相互相関関数を、全てのマイクロホンの組み合わせについて計算する。
【００３０】
この相互相関関数を計算した位置推定部１２０は、計算した相互相関関数に基づいて、予め決められた一の基準マイクロホンと他のマイクロホンとの間の最大値を与える時間差を求める。位置推定部１２０は、求めた時間差に基づいて話者（音源）の位置を推定する（参考文献：特開平１１−３０４９０６）。話者の位置を推定した位置推定部１２０は、推定した位置を位置信号として画像検索部１３０に出力する。
【００３１】
次いで、出力部１５０が、画像検索部１３０で検索された画像を画面に表示するステップを行う（Ｓ１０３）。具体的に、位置推定部１２０から位置信号が入力された画像検索部１３０は、入力された位置信号に対応する話者の位置と、予め記憶された各位置情報とを照合し、各位置情報の中から、その話者の位置と一致する位置情報を検索する。
【００３２】
その画像検索部１３０は、その検索した位置情報に対応付けられた画像パターンを取得する。画像検索部１３０は、その取得した画像パターンを画像信号として出力部１５０に出力する。画像検索部１３０から画像信号が入力された出力部１５０は、入力された画像信号に基づいて、画像信号に対する画像パターンを画面に表示する。
【００３３】
（画像表示装置及び画像表示方法による作用及び効果）
このような本願に係る発明によれば、出力部１５０が、位置推定部１２０で推定された話者の位置に基づいて、その位置と一致する位置情報に対応付けられた画像を画面に表示することができる。即ち、出力部１５０は、話者が居る位置に応じて特定の画像を画面に表示することができる。
【００３４】
また、出力部１５０は、画像検索部１３０で検索された位置情報に対応付けられた領域情報に基づいて、その領域情報に対応する領域に、検索された位置情報に対応付けられた画像に含まれるキャラクタを表示することができる。これにより、キャラクタが移動する方向に各領域が設けられている場合には、出力部１５０は、例えば、話者の移動に伴なって画面上のキャラクタを同方向に移動させることができるので、その話者は、そのキャラクタとの間で恰もコミュニケーションを取っているかのような感覚を味わうことができる。
【００３５】
［変更例］
尚、本発明は、上記実施形態に限定されるものではなく、以下に示す変更を加えることができる。
【００３６】
（第一変更例）
図８に示すように、音声入力部１１０を構成する各マイクロホン１１１ａ〜１１１ｃは、特定の方向に配備してもよい。本変更例では、各マイクロホン１１１ａ〜１１１ｃは、一列に並べるものとする。また、位置推定部１２０は、各マイクロホンで取得された音声に基づいて、話者が各マイクロホンに向かって音声を発した方向と各マイクロホンが配備されている方向との間でなす角度を算出するものであってもよい。
【００３７】
ここで、図８は、二つのマイクロホン１１１ｂ、１１１ｃに平面波が入力する様子を示すものである。この二つのマイクロホン１１１ｂ、１１１ｃに平面波が入力された時間差が幾何学的に何を示しているのかを説明するものである。同図中の破線は平面波の等位相面を示す。同図は、これらの平面波が、先ずマイクロホン１１１ｂに到達し、遅れてマイクロホン１１１ｃに到達する様子を描いている。
【００３８】
同図に示すように、各平面波が各マイクロホン１１１ｂ、１１１ｃに到達する時間差（到達時間差）は、各マイクロホンの間隔と入射角度θの余角との積を正規化音速ｃで除したものである。すなわち、到達時間差は、マイクロホン間隔ｃｏｓθ／ｃとして表現することができる。この入射角度θは、本変更例では、話者が各マイクロホン１１１ａ〜１１１ｃに向かって音声を発した方向と各マイクロホン１１１ａ〜１１１ｃが配備されている方向との間でなす角度を意味する。
【００３９】
上記式を変形すると、入射角度θは、ａｒｃｃｏｓ（ｃ到達時間差／マイクロホン間隔）となる。従って、入射角度θは、到達時間差とマイクロホン間隔が分かれば特定することができる。
【００４０】
具体的に、各マイクロホンから音声信号が入力された位置推定部１２０は、入力された各音声信号に基づいて、少なくとも二つの音声信号が入力された時間差を算出する。この時間差は上記到達時間差とすることができる。この到達時間差を算出した位置推定部１２０は、上式に従って、算出した到達時間差と、上記各音声信号を取得した各マイクロホンの間隔とに基づいて平面波の入射角度θを算出する。位置推定部１２０は、その算出した入射角度θを角度信号として画像検索部１３０に出力する。
【００４１】
画像検索部１３０は、位置推定部１２０で算出された入射角度θに基づいて、この入射角度θと予め設定された各符号とを照合し、各符号の中から、入射角度θと一致する符号を検索するものであってもよい。この符号は、本変更例では角度情報として表現する。
【００４２】
ここで、本変更例では、各画像のそれぞれには、所定のキャラクタの顔が含まれ、その各画像を順次切り替えることにより顔は、所定の回転角をもって回転するように表示されるものとなっている。また、画像に対応付けられた角度情報の大きさは、本変更例では、画像に含まれる顔の回転角の大きさと一致するものとする。
【００４３】
すなわち、図１０に示すように、正面を向いている顔は、回転角を有しないので、例えば、正面を向いた顔の画像２−０には、角度情報０を対応付ける。同様にして、図１１に示すように、斜め３０°の方向を向いている顔は、回転角３０°を有するので、例えば、斜め３０°の方向を向いている顔の画像２−３０には、角度情報３０を対応付ける。
【００４４】
具体的に、位置推定部１２０から角度信号が入力された画像検索部１３０は、入力された角度信号に対応する入射角度θに基づいて、各角度情報の中から、入射角度θ（例えば、３）と一致する角度情報３を検索する。この角度情報３を検索した画像検索部１３０は、検索した角度情報３に基づいて、角度情報３に対応付けられた画像２−３を取得する。出力部１５０は、画像検索部１３０で検索された画像２−３を画面に表示する。
【００４５】
このような本変更例に係る発明によれば、画像検索部１３０は、位置推定部１２０で算出された入射角度θに基づいて、その入射角度θと一致する角度情報に対応付けられた画像を画面に表示させることができる。また、画像検索部１３０は、画面上の画像を一義的に動作させるのではなく、話者の位置に応じて画像を動作させることができるので、その話者は、画面上に表示された画像との間でコミュニケーションを取っているかのような感覚を味わうことができる。
【００４６】
更に、画像検索部１３０は、話者から発せられた音声の入射角度に応じて、画面に表示された顔の画像の回転角を変えることができる。これにより、話者が所定の音声を発すれば、画面に表示された顔の画像が話者の方向を向くので、その話者は、恰もその顔画像との間でコミュニケーションを取っているかのような感覚を味わうことができる。
【００４７】
（第二変更例）
尚、画像記憶部１４０は、所定のキーワードを予め複数記憶するものであってもよい。また、音声入力部１１０は、話者から発せられた音声に対応する文字列を認識するものであってもよい。更に、画像検索部１３０は、音声入力部１１０で認識された文字列に基づいて文字列と一致するキーワードを検索することができた場合には、自部で検索した位置情報に対応付けられた画像を取得することが好ましい。
【００４８】
この場合には、画像検索部１３０は、話者から発せられた音声に対応する文字列に基づいて、その文字列と一致するキーワードを検索することができたときは、検索した位置情報に対応付けられた画像を出力部１５０で表示させることができる。この結果、話者は、自己の特定の言葉に反応して画面上の画像が移動するので、恰もその画像との間でコミュニケーションを取っているかのような感覚を味わうことができる。
【００４９】
尚、各キーワードの内容としては、各話者との間で最初又は最後に交わされる挨拶文等が挙げられる。この場合には、話者は、特定の挨拶文を発すれば、その挨拶文に反応して画面上の画像が移動するので、話者は、画面上の画像との間で親密な関係を有するかのような感覚を味わうことができる。この結果、話者は、画面上の画像との間でより多くの出来事を話そうとする動機付けが高まり、退屈な時間をより楽しく過ごすことができる。
【００５０】
［プログラム］
上記画像表示装置及び画像表示方法で説明した内容は、パーソナルコンピュータ等の汎用コンピュータで、所定のプログラム言語で記述された専用プログラムを実行することにより実現することができる。
【００５１】
このような本実施形態に係るプログラムによれば、話者が居る位置に応じて画面に表示させる画像を変化させることで、その話者が、その変化された画像を見て恰も他の話者との間でコミュニケーションを取っているような感覚を味わうことができるという作用効果を奏する画像表示装置及び画像表示方法を一般的な汎用コンピュータで容易に実現することができる。
【００５２】
尚、プログラムは、記録媒体に記録することができる。この記録媒体は、図１２に示すように、例えば、ハードディスク２００、フレキシブルディスク３００、コンパクトディスク４００、ＩＣチップ５００、カセットテープ６００などが挙げられる。このようなプログラムを記録した記録媒体によれば、プログラムの保存、運搬、販売などを容易に行うことができる。
【００５３】
【発明の効果】
以上説明したように本発明によれば、話者が居る位置に応じて画面に表示された画像を変化させることで、その話者は、その変化された画像を見て恰も他の話者との間でコミュニケーションを取っているかのような感覚を味わうことができる。
【図面の簡単な説明】
【図１】本実施形態に係る画像表示装置の内部構成を示すブロック図である。
【図２】本実施形態における音声入力部の前で話者が行動するであろうと予想される仮想的な範囲を示す図である。
【図３】本実施形態における位置推定部で推定された話者の位置を示す図である。
【図４】本実施形態における画像記憶部で記憶される各位置情報及び各画像の内容を示す図である。
【図５】本実施形態における画面で所定のキャラクタが移動する様子を示す図である。
【図６】本実施形態における出力部が話者の位置に応じてキャラクタを画面上に表示させる各領域を示す図である。
【図７】本実施形態に係る画像表示方法の手順を示すフロー図である。
【図８】本変更例における位置推定部が平面波の入射角度を特定する様子を示す図である。
【図９】本変更例における画像記憶部で記憶する各角度情報及び各画像の内容を示す図である。
【図１０】変更例における出力部で表示する人物のキャラクタを示す図である（その１）。
【図１１】変更例における出力部で表示する人物のキャラクタを示す図である（その２）。
【図１２】本実施形態におけるプログラムを記録する記録媒体を示す図である。
【符号の説明】
１００…画像表示装置、１１０…音声入力部、１１１…マイクロホン、１２０…位置推定部、１３０…画像検索部、１４０…画像記憶部、１５０…出力部、１６０…画像検索部、２００…ハードディスク、３００…フレキシブルディスク、４００…コンパクトディスク、５００…ＩＣチップ、６００…カセットテープ
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image display device, an image display method, and a program for displaying a specific image on a screen based on a position where a speaker is present.
[0002]
[Prior art]
Conventionally, there are voice systems in which a computer talks to a specific speaker (see, for example, Patent Document 1). If such a voice system is installed in a place where many people come and go, such as a game hall, the computer that emits the voice can function as a kind of advertising tower. In addition, particularly when a person living alone installs a voice system at home, the computer outputs a predetermined answer sentence in response to that person's voice, so the person can ease the loneliness of living alone, if only a little.
[0003]
[Patent Document 1]
JP 2002-169804 (FIGS. 5-13 and 15)
[0004]
[Problems to be solved by the invention]
However, even when the speaker spoke to the computer, the computer merely output a predetermined answer sentence in response, and the speaker was left feeling that something was missing. For this reason, there has been a demand for a system that shows the speaker character behavior that does not become boring, thereby entertaining the speaker more and giving the speaker the feeling of communicating with the character.
[0005]
Accordingly, the present invention has been made in view of the above points. Its object is to provide an image display device, an image display method, and a program that change the image displayed on the screen according to the position where the speaker is present, so that the speaker, on seeing the changed image, can feel as if he or she is communicating with another speaker.
[0006]
[Means for Solving the Problems]
The present invention has been made to solve the above problems. Each of a plurality of images, each displayed in one of the areas of a screen, is associated with position information indicating a position where a speaker is expected to be, the position information being in an arrangement relationship that corresponds to the respective areas, and the images are stored in advance. The position of the speaker is estimated based on the voice uttered by the speaker. The estimated position is collated with each piece of position information stored in advance, and the position information corresponding to that position is retrieved. The image associated with the retrieved position information is acquired, and the acquired image is displayed in the area of the screen whose arrangement corresponds to the position information for the position estimated by the position estimating means.
[0007]
According to the invention of the present application, the image display device can display on the screen the image associated with the position information that matches the estimated position of the speaker. In other words, the image display device can display a specific image on the screen according to the position where the speaker is present.
[0008]
In the above configuration, it is preferable that each image includes a predetermined character, that the position information associated with each image is in turn associated with area information for displaying the character in a specific area, and that the character included in the image associated with the retrieved position information is displayed in the area indicated by the area information associated with that position information. The areas are preferably arranged along the direction in which the character moves.
[0009]
Thus, when the areas are arranged along the direction in which the character moves, the image display device can, for example, move the character on the screen in the same direction as the speaker moves. In this case, when the speaker moves in a specific direction, the character on the screen moves in the same direction, so the speaker can feel as if he or she is communicating with the character.
[0010]
Preferably, the image display device stores a plurality of predetermined keywords in advance, recognizes a character string corresponding to the voice uttered by the speaker, and, when it finds a keyword that matches the recognized character string, acquires the image associated with the retrieved position information.
[0011]
In this case, when the image display device finds a keyword that matches the character string corresponding to the voice uttered by the speaker, it can display the image associated with the retrieved position information. As a result, the image on the screen moves in response to specific words spoken by the speaker, so the speaker can feel as if he or she is communicating with the image.
[0012]
Examples of keywords include the greetings exchanged with a speaker at the beginning or end of a conversation. In this case, when the speaker utters a specific greeting, the image on the screen moves in response to it, so the speaker can feel as if he or she has a close relationship with the image on the screen. As a result, the speaker becomes more motivated to talk about more things with the image on the screen and can spend otherwise boring time more enjoyably.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
[Embodiment]
(Basic configuration of image display device)
An image display device according to the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing the internal structure of the image display device 100 according to the present embodiment. As shown in the figure, the image display device 100 displays on the screen an image corresponding to the position where the speaker is present. In the present embodiment, the image display device 100 includes a voice input unit 110, a position estimation unit 120, an image search unit 130, an image storage unit 140, and an output unit 150.
[0014]
The voice input unit 110 acquires voice uttered by a speaker. In this embodiment, the voice input unit 110 can be composed of a plurality of microphones. Specifically, the voice input unit 110 that has acquired the voice uttered by the speaker outputs the acquired voice to the position estimation unit 120 as a voice signal.
[0015]
The position estimation unit 120 estimates the position of the speaker based on the voice uttered by the speaker. Here, the position of the speaker can be expressed in a coordinate system, as shown in FIG. 2. This coordinate system is a virtual one indicating the positions where the speaker may be. In the present embodiment, the coordinates are written (x_i, y_j), where i = 1, 2, ..., n and j = 1, 2, ..., m. In the present embodiment, the position estimation unit 120 specifies the estimated position in this coordinate system, but it may instead specify the position in a polar coordinate system.
[0016]
Specifically, on receiving the audio signals from the voice input unit 110, the position estimation unit 120 first calculates the cross-correlation function of the audio signals for every combination of microphones. Based on the calculated cross-correlation functions, it then obtains, between one predetermined reference microphone and each of the other microphones, the time difference that gives the maximum value of the cross-correlation function.
[0017]
The position estimation unit 120 estimates the position of the speaker (sound source) based on the obtained time differences (reference: Japanese Patent Laid-Open No. 11-304906). Having estimated the position of the speaker, the position estimation unit 120 outputs the estimated position to the image search unit 130 as a position signal. Other methods of estimating the position of a speaker by processing audio signals obtained from a plurality of microphones are described in detail in Chapter 7 of "Acoustic System and Signal Processing", Oga et al., Institute of Electronics, Information and Communication Engineers.
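The time-difference step described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of the cited reference: the function names, the test pulse, the lag search range, and the 8000 Hz sampling rate are all hypothetical, and the cross-correlation is computed by direct summation in plain Python.

```python
def cross_correlation(ref, other, lag):
    """Sum of ref[n] * other[n + lag] over the overlapping samples."""
    total = 0.0
    for n, r in enumerate(ref):
        m = n + lag
        if 0 <= m < len(other):
            total += r * other[m]
    return total


def arrival_lag(ref, other, max_lag):
    """Lag (in samples) at which the cross-correlation is largest.

    A positive result means `other` received the sound later than `ref`.
    """
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(ref, other, lag))


# Hypothetical test signal: a short pulse that reaches the second
# microphone 3 samples after the reference microphone.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0]
ref_mic = pulse + [0.0] * 8
other_mic = [0.0] * 3 + pulse + [0.0] * 5

lag = arrival_lag(ref_mic, other_mic, max_lag=6)
sample_rate_hz = 8000  # assumed sampling rate
time_difference_s = lag / sample_rate_hz
```

With more than two microphones, the same search would be repeated for the reference microphone paired with each of the other microphones, and the resulting time differences would feed whichever position-estimation method is used.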
[0018]
The image search unit 130 collates the position estimated by the position estimation unit 120 with each piece of position information stored in advance and searches that position information for the entry that matches the position. The image search unit 130 also acquires the image associated with the retrieved position information.
[0019]
Here, each of the plurality of images is associated with position information indicating a position where the speaker is predicted to be. In the present embodiment, the images (image pattern 1-1, image pattern 1-2, ...) are stored in advance in the image storage unit 140, as shown in FIG. 4.
[0020]
Specifically, on receiving the position signal from the position estimation unit 120, the image search unit 130 collates the speaker position indicated by the position signal with each piece of position information stored in advance and retrieves the position information that matches the speaker position. The image search unit 130 acquires the image pattern associated with the retrieved position information and outputs it to the output unit 150 as an image signal.
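The collation of an estimated position with stored position information can be pictured as a table lookup. The sketch below is a minimal illustration: the table contents, the name "image pattern 2-1", the helper names, and the 0.5 m grid cell are assumptions and are not taken from FIG. 4.

```python
# Hypothetical table in the spirit of FIG. 4: each stored position
# (grid indices i, j) is associated with one image pattern.
IMAGE_TABLE = {
    (1, 1): "image pattern 1-1",
    (1, 2): "image pattern 1-2",
    (2, 1): "image pattern 2-1",
}


def to_grid(x_m, y_m, cell_m=0.5):
    """Snap an estimated position in metres to 1-based grid indices.

    The 0.5 m cell size is an assumed value for illustration.
    """
    return (int(x_m // cell_m) + 1, int(y_m // cell_m) + 1)


def find_image(position, table=IMAGE_TABLE):
    """Return the image pattern whose position information matches, or None."""
    return table.get(position)
```

For example, `find_image(to_grid(0.2, 0.7))` looks up grid position (1, 2). A position with no matching entry yields `None`, in which case a caller might simply leave the current image on the screen.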
[0021]
The output unit 150 displays on the screen the image pattern retrieved by the image search unit 130. In the present embodiment, the output unit 150 is, for example, a liquid crystal display. Specifically, on receiving the image signal from the image search unit 130, the output unit 150 displays on the screen the image pattern corresponding to that image signal.
[0022]
Each image may include a predetermined character. Further, the position information associated with each of the images may be associated with area information for displaying the character in a specific area. As shown in FIG. 5, this specific area is provided in the direction in which the character moves, for example, in the horizontal direction in this embodiment. Here, in the present embodiment, the character means an existing person, a fictional person, an animal, or the like.
[0023]
Specifically, in the present embodiment, as shown in FIGS. 5 and 6, the character moves to the reference area or to one of areas A to D according to the position where the speaker is present. In the present embodiment, the reference area is the area in which the character is initially positioned. The character moves from the reference area, which serves as the center, to one of areas A to D.
[0024]
That is, based on the area information associated with the position information retrieved by the image search unit 130, the output unit 150 displays the character included in the image associated with that position information in the area corresponding to the area information. In this way, the output unit 150 can move the character according to the position where the speaker is present.
[0025]
As shown in FIG. 6, the areas are preferably arranged in a relationship corresponding to the position information, that is, to the positions where the speaker is predicted to be. In other words, the areas are preferably laid out so that their arrangement is proportionate to the relationship between the positions corresponding to the pieces of position information.
[0026]
For example, when the position corresponding to the position information is the central portion of the voice input unit 110 ("reference area" in FIG. 6), the reference area is provided in the part of the screen that is proportionately placed relative to that position, that is, the central portion of the screen ("reference area" in FIG. 5). Similarly, areas A to D can be provided in the parts of the screen that correspond to the positions given by the position information. Thus, when the speaker moves in a specific direction, the character on the screen can move in the same direction.
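One way to picture the proportional layout of areas is to divide the expected horizontal range of the speaker into equal bands, one per screen area. The sketch below assumes a left-to-right order of A, B, reference, C, D; the actual layout of FIG. 5 and FIG. 6 is not reproduced here, and the function name and range values are hypothetical.

```python
# Assumed left-to-right order of the on-screen areas; the actual layout of
# FIG. 5 and FIG. 6 is not reproduced here.
AREAS = ["A", "B", "reference", "C", "D"]


def area_for(x, x_min, x_max, areas=AREAS):
    """Pick the screen area whose horizontal band contains the speaker's x."""
    if not x_min < x_max:
        raise ValueError("x_min must be smaller than x_max")
    # Clamp so that a speaker outside the expected range maps to an edge area.
    clamped = min(max(x, x_min), x_max)
    fraction = (clamped - x_min) / (x_max - x_min)
    index = min(int(fraction * len(areas)), len(areas) - 1)
    return areas[index]
```

Under these assumptions, a speaker standing at the center of the range maps to the reference area, and a speaker walking toward one edge moves the character through the neighboring areas in the same direction.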
[0027]
(Image display method using image display device)
The image display method by the image display apparatus having the above configuration can be implemented by the following procedure. FIG. 7 is a flowchart showing the procedure of the image display method according to the present embodiment.
[0028]
First, the voice input unit 110 performs a step of acquiring voice uttered by a speaker (S101). In this embodiment, the voice input unit 110 can be composed of a plurality of microphones. Specifically, the voice input unit 110 that has acquired the voice uttered by the speaker outputs the acquired voice to the position estimation unit 120 as a voice signal.
[0029]
Next, the position estimation unit 120 performs the step of estimating the position of the speaker based on the voice uttered by the speaker (S102). Specifically, on receiving the audio signals from the voice input unit 110, the position estimation unit 120 first calculates the cross-correlation function of the audio signals for every combination of microphones.
[0030]
Based on the calculated cross-correlation functions, the position estimation unit 120 obtains, between one predetermined reference microphone and each of the other microphones, the time difference that gives the maximum value of the cross-correlation function. The position estimation unit 120 estimates the position of the speaker (sound source) based on the obtained time differences (reference: Japanese Patent Laid-Open No. 11-304906). Having estimated the position of the speaker, the position estimation unit 120 outputs the estimated position to the image search unit 130 as a position signal.
[0031]
Next, the output unit 150 performs the step of displaying on the screen the image retrieved by the image search unit 130 (S103). Specifically, on receiving the position signal from the position estimation unit 120, the image search unit 130 collates the speaker position indicated by the position signal with each piece of position information stored in advance and retrieves the position information that matches the speaker position.
[0032]
The image search unit 130 acquires an image pattern associated with the searched position information. The image search unit 130 outputs the acquired image pattern to the output unit 150 as an image signal. The output unit 150 to which the image signal is input from the image search unit 130 displays an image pattern for the image signal on the screen based on the input image signal.
[0033]
(Operation and effect of image display device and image display method)
According to the invention of the present application, the output unit 150 can display on the screen the image associated with the position information that matches the position of the speaker estimated by the position estimation unit 120. That is, the output unit 150 can display a specific image on the screen according to the position where the speaker is present.
[0034]
Further, based on the area information associated with the position information retrieved by the image search unit 130, the output unit 150 can display the character included in the image associated with that position information in the area corresponding to the area information. Thus, when the areas are arranged along the direction in which the character moves, the output unit 150 can, for example, move the character on the screen in the same direction as the speaker moves, so the speaker can feel as if he or she is communicating with the character.
[0035]
[Modified examples]
The present invention is not limited to the above embodiment, and the following modifications can be made.
[0036]
(First modified example)
As shown in FIG. 8, the microphones 111a to 111c constituting the voice input unit 110 may be arranged along a specific direction. In this modified example, the microphones 111a to 111c are arranged in a line. Further, the position estimation unit 120 may calculate, based on the sound acquired by each microphone, the angle formed between the direction from which the speaker utters the voice toward the microphones and the direction in which the microphones are arranged.
[0037]
FIG. 8 shows plane waves arriving at the two microphones 111b and 111c, and illustrates what the time difference between their arrival at the two microphones represents geometrically. The broken lines in the figure show the equiphase surfaces of the plane waves. The figure depicts these plane waves reaching the microphone 111b first and the microphone 111c after a delay.
[0038]
As shown in the figure, the time difference with which a plane wave reaches the microphones 111b and 111c (the arrival time difference) is the product of the microphone interval and the cosine of the incident angle θ, divided by the normalized sound velocity c. That is, the arrival time difference can be expressed as (microphone interval × cos θ) / c. In this modified example, the incident angle θ means the angle formed between the direction from which the speaker utters the voice toward the microphones 111a to 111c and the direction in which the microphones 111a to 111c are arranged.
[0039]
Rearranging the above expression gives the incident angle θ = arccos(c × arrival time difference / microphone interval). Therefore, the incident angle θ can be determined if the arrival time difference and the microphone interval are known.
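The rearranged expression can be checked numerically with a short sketch. The text speaks of a normalized sound velocity c; the code below instead assumes physical units (seconds, metres, and a sound speed of 343 m/s), and the function name and the clamping of the ratio are illustrative additions rather than part of the described device.

```python
import math


def incident_angle_deg(time_difference_s, mic_spacing_m, sound_speed_m_s=343.0):
    """theta = arccos(c * arrival time difference / microphone interval), in degrees."""
    ratio = sound_speed_m_s * time_difference_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1],
    # where arccos is undefined, so clamp it first.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))
```

A time difference of zero gives 90 degrees (the wave arrives broadside to the line of microphones), while the largest possible time difference, microphone interval divided by c, gives 0 degrees (the wave travels along the line of microphones).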
[0040]
Specifically, on receiving the audio signals from the microphones, the position estimation unit 120 calculates the time difference between the input of at least two of the audio signals. This time difference can be used as the arrival time difference. Using the above expression, the position estimation unit 120 then calculates the incident angle θ of the plane wave from the calculated arrival time difference and the interval between the microphones that acquired those audio signals. The position estimation unit 120 outputs the calculated incident angle θ to the image search unit 130 as an angle signal.
[0041]
The image search unit 130 may collate the incident angle θ calculated by the position estimation unit 120 with each of a set of preset codes and search those codes for the one that matches the incident angle θ. In this modified example, these codes are referred to as angle information.
[0042]
Here, in this modified example, each image includes the face of a predetermined character, and switching the images in sequence makes the face appear to rotate through predetermined rotation angles. In this modified example, the value of the angle information associated with an image matches the rotation angle of the face included in that image.
[0043]
That is, as shown in FIG. 10, a face facing the front has no rotation angle, so, for example, angle information 0 is associated with the image 2-0 of the face facing the front. Similarly, as shown in FIG. 11, a face turned obliquely by 30° has a rotation angle of 30°, so, for example, angle information 30 is associated with the image 2-30 of the face turned obliquely by 30°.
[0044]
Specifically, on receiving the angle signal from the position estimation unit 120, the image search unit 130 searches the angle information for the entry that matches the incident angle θ indicated by the angle signal; for an incident angle θ of, for example, 3, it retrieves angle information 3. The image search unit 130 then acquires the image 2-3 associated with angle information 3, and the output unit 150 displays the image 2-3 on the screen.
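The angle lookup can be sketched as below. The text describes retrieving angle information that matches the incident angle; because a measured angle rarely equals a stored value exactly, this sketch substitutes a nearest-value match, which is an assumption and not something the description states. The table contents follow the image names used above, and the function name is hypothetical.

```python
# Hypothetical table in the spirit of FIG. 9: angle information in degrees
# mapped to the face image with the same rotation angle.
ANGLE_IMAGES = {0: "image 2-0", 3: "image 2-3", 30: "image 2-30"}


def image_for_angle(theta_deg, table=ANGLE_IMAGES):
    """Return the image whose angle information is closest to theta_deg."""
    nearest = min(table, key=lambda angle: abs(angle - theta_deg))
    return table[nearest]
```

An exact match such as `image_for_angle(3)` behaves as in the description, and an in-between value such as 28.4 falls back to the closest stored rotation angle.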
[0045]
According to this modified example, the image search unit 130 can cause the screen to display the image associated with the angle information that matches the incident angle θ calculated by the position estimation unit 120. In addition, rather than moving the image on the screen in a single fixed way, the image search unit 130 can move the image according to the position of the speaker, so the speaker can feel as if he or she is communicating with the image displayed on the screen.
[0046]
Furthermore, the image search unit 130 can change the rotation angle of the face image displayed on the screen according to the incident angle of the voice uttered by the speaker. Thus, when the speaker utters a predetermined voice, the face image displayed on the screen turns toward the speaker, so the speaker can feel as if he or she is communicating with the face image.
[0047]
(Second modified example)
The image storage unit 140 may store a plurality of predetermined keywords in advance. The voice input unit 110 may recognize a character string corresponding to the voice uttered by the speaker. Further, it is preferable that, when the image search unit 130 finds a keyword that matches the character string recognized by the voice input unit 110, it acquires the image associated with the position information that it has retrieved.
[0048]
In this case, when the image search unit 130 finds a keyword that matches the character string corresponding to the voice uttered by the speaker, it can cause the output unit 150 to display the image associated with the retrieved position information. As a result, the image on the screen moves in response to specific words spoken by the speaker, so the speaker can feel as if he or she is communicating with the image.
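The keyword condition can be sketched as a simple gate placed in front of the position lookup. The keyword list, the case-insensitive comparison, the function name, and the sample table are assumptions for illustration; speech recognition itself is outside the sketch and is represented by an already recognized string.

```python
# Hypothetical keyword list; the text only says greetings are typical.
KEYWORDS = {"hello", "good morning", "goodbye"}


def select_image(recognized_text, position, image_table, keywords=KEYWORDS):
    """Return the image for `position` only if the utterance is a stored keyword."""
    if recognized_text.strip().lower() not in keywords:
        return None  # no keyword match: leave the screen unchanged
    return image_table.get(position)


# Hypothetical usage with a one-entry position table.
sample_table = {(1, 1): "image pattern 1-1"}
greeting_result = select_image("Hello", (1, 1), sample_table)
small_talk_result = select_image("nice weather", (1, 1), sample_table)
```

Here the greeting triggers the image for the speaker's position, while an utterance that is not a stored keyword produces no change.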
[0049]
Examples of the keywords include greetings exchanged with a speaker at the beginning or end of a conversation. In this case, when the speaker utters a specific greeting, the image on the screen moves in response to the greeting, so the speaker can feel a sense of closeness to the image on the screen. As a result, the speaker becomes more motivated to talk with the image on the screen about more topics, and can spend otherwise dull time more enjoyably.
[0050]
[Program]
The processing described for the image display apparatus and the image display method can be realized by executing, on a general-purpose computer such as a personal computer, a dedicated program written in a predetermined programming language.
[0051]
According to the program of the present embodiment, an image display device and an image display method can easily be realized on a general-purpose computer, with the effect that, by changing the image displayed on the screen according to the position of the speaker, the speaker who sees the changed image can feel as if he or she is communicating with another speaker.
[0052]
The program can be recorded on a recording medium. Examples of the recording medium include a hard disk 200, a flexible disk 300, a compact disk 400, an IC chip 500, and a cassette tape 600, as shown in FIG. 12. A recording medium on which the program is recorded makes it easy to store, transport, and sell the program.
[0053]
[Effect of the invention]
As described above, according to the present invention, by changing the image displayed on the screen according to the position of the speaker, the speaker who sees the changed image can feel as if he or she is communicating with another speaker.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an internal configuration of an image display apparatus according to an embodiment.
FIG. 2 is a diagram illustrating a virtual range in which a speaker is expected to act in front of a voice input unit according to the present embodiment.
FIG. 3 is a diagram illustrating a speaker position estimated by a position estimation unit according to the present embodiment.
FIG. 4 is a diagram showing position information and contents of each image stored in an image storage unit in the present embodiment.
FIG. 5 is a diagram illustrating a state in which a predetermined character moves on the screen according to the present embodiment.
FIG. 6 is a diagram illustrating each region where an output unit according to the present embodiment displays a character on the screen according to the position of a speaker.
FIG. 7 is a flowchart showing a procedure of an image display method according to the present embodiment.
FIG. 8 is a diagram illustrating a state in which a position estimation unit according to the present modification specifies an incident angle of a plane wave.
FIG. 9 is a diagram illustrating each angle information stored in an image storage unit and contents of each image in the present modification example.
FIG. 10 is a diagram (part 1) showing a human character displayed on the output unit in the present modification.
FIG. 11 is a diagram (part 2) showing a human character displayed on the output unit in the present modification.
FIG. 12 is a diagram showing a recording medium for recording a program in the present embodiment.
[Explanation of symbols]
100 ... Image display apparatus
110 ... Voice input unit
111 ... Microphone
120 ... Position estimation unit
130 ... Image search unit
140 ... Image storage unit
150 ... Output unit
160 ... Image search unit
200 ... Hard disk
300 ... Flexible disk
400 ... Compact disk
500 ... IC chip
600 ... Cassette tape

Claims

An image display device comprising:
image storage means for storing in advance a plurality of images, each to be displayed in one of the regions of a screen, each of the images being associated with position information at which a speaker is expected to be present, from among position information whose arrangement corresponds to the respective regions;
position estimation means for estimating the position of the speaker based on the voice uttered by the speaker;
image acquisition means for comparing the position estimated by the position estimation means with each piece of position information stored in advance, searching the stored position information for the position information corresponding to that position, and acquiring the image associated with the retrieved position information; and
display means for displaying the image acquired by the image acquisition means in the region of the screen whose arrangement corresponds to the position information corresponding to the position estimated by the position estimation means.

The image display device according to claim 1, further comprising:
keyword storage means for storing a plurality of predetermined keywords in advance; and
character recognition means for recognizing a character string corresponding to the voice uttered by the speaker,
wherein, when the image acquisition means finds a keyword that matches the character string recognized by the character recognition means, the image acquisition means acquires the image associated with the retrieved position information.

The image display device according to claim 1, wherein:
each of the images includes a predetermined character, and the position information associated with each of the images is associated with region information for displaying the character in a specific region; and
the display means displays, based on the region information associated with the position information retrieved by the image acquisition means, the character included in the image associated with the retrieved position information in the region corresponding to that region information.

The image display device according to claim 3,
wherein each of the regions is provided in a direction in which the character moves.

An image display method in which a plurality of images are stored in advance, each of the images being associated with position information at which a speaker is expected to be present and being displayed in the region of a screen whose arrangement corresponds to that position information, the method comprising the steps of:
estimating the position of the speaker based on the voice uttered by the speaker;
comparing the estimated position with each piece of position information stored in advance, searching the stored position information for the position information corresponding to that position, and acquiring the image associated with the retrieved position information; and
displaying the acquired image in the region of the screen whose arrangement corresponds to the position information corresponding to the estimated position.

The image display method according to claim 5,
wherein a plurality of predetermined keywords are stored in advance, the method further comprising the steps of:
recognizing a character string corresponding to the voice uttered by the speaker; and
acquiring the image associated with the retrieved position information when a keyword that matches the recognized character string is found.

The image display method according to claim 5, wherein:
each of the images includes a predetermined character, and the position information associated with each of the images is associated with region information for displaying the character in a specific region; and
the method further comprises the step of displaying, based on the region information associated with the retrieved position information, the character included in the image associated with the retrieved position information in the region corresponding to that region information.

The image display method according to claim 7,
wherein each of the regions is provided in a direction in which the character moves.

A program for use where a plurality of images, each to be displayed in one of the regions of a screen, are stored in advance, each of the images being associated with position information at which a speaker is expected to be present, from among position information whose arrangement corresponds to the respective regions,
the program causing a computer to execute the steps of:
estimating the position of the speaker based on the voice uttered by the speaker;
comparing the estimated position with each piece of position information stored in advance, searching the stored position information for the position information corresponding to that position, and acquiring the image associated with the retrieved position information; and
displaying the acquired image in the region of the screen whose arrangement corresponds to the position information corresponding to the estimated position.

The program according to claim 9,
wherein a plurality of predetermined keywords are stored in advance, the program further causing the computer to execute the steps of:
recognizing a character string corresponding to the voice uttered by the speaker; and
acquiring the image associated with the retrieved position information when a keyword that matches the recognized character string is found.

The program according to claim 9 or 10, wherein:
each of the images includes a predetermined character, and the position information associated with each of the images is associated with region information for displaying the character in a specific region; and
the program further causes the computer to execute processing for displaying, based on the region information associated with the retrieved position information, the character included in the image associated with the retrieved position information in the region corresponding to that region information.

The program according to claim 11,
wherein each of the regions is provided in a direction in which the character moves.