JP4003544B2

JP4003544B2 - Display / voice linkage system, server and method

Info

Publication number: JP4003544B2
Application number: JP2002166294A
Authority: JP
Inventors: 隆浩村上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-06-06
Filing date: 2002-06-06
Publication date: 2007-11-07
Anticipated expiration: 2022-06-06
Also published as: JP2004015443A

Description

【０００１】
【発明の属する技術分野】
本発明は、ユーザの個人情報を用いることなくＷｅｂページの表示による表示サービスと音声対話を行うことによる音声サービスとの連携を図ることができる表示・音声連携システム、表示・音声連携サーバ、および表示・音声連携方法に関する。
【０００２】
【従来の技術】
従来から、インターネットなどの通信ネットワークに接続されているＷＷＷ（World Wide Web）サーバによるＷｅｂページの表示を利用した表示サービスと、公衆電話回線などの通信ネットワークに接続されている音声対話サーバによる音声対話機能を用いた音声サービスとを連携させた表示・音声連携システムが利用されている。
【０００３】
図８は、従来の表示・音声連携システムの構成例を示すブロック図である。表示・音声連携システム１００は、表示サービスを実行するＷＷＷサーバ１２０と、音声サービスを実行する音声対話サーバ１３０と、ブラウザ機能１４１および通話機能１４２を備えた例えば携帯電話端末などのユーザ端末１４０とを含む。ここでは、ＷＷＷサーバ１２０は、着信メロディ（電話機での着信時に着信音として用いられるメロディ）を提供するための着信メロディ提供サイトを運営しているものとする。ＷＷＷサーバ１２０およびユーザ端末１４０は、それぞれ、インターネットなどの通信ネットワーク１５０に接続される。また、音声対話サーバ１３０およびユーザ端末１４０は、公衆電話回線網１６０に接続される。なお、公衆電話回線網１６０は、通信ネットワーク１５０に接続されている。
【０００４】
次に、表示・音声連携システム１００の動作例について説明する。ユーザ端末１４０は、ユーザの操作に応じて、ブラウザ機能１４１を用いてＷＷＷサーバ１２０が提供する着信メロディ提供サイトにアクセスし、着信メロディ提供サイトにおける着信メロディを選択するためのＷｅｂページを自己が備える表示装置に表示する。このＷｅｂページには、着信メロディとして取得しようとするメロディのタイトル（以下、「着信メロディのタイトル」という）を入力するためのタイトル入力領域が含まれる。
【０００５】
次いで、ユーザ端末１４０は、ユーザからの指示があると、通話機能１４２を用いて音声対話サーバ１３０に発呼する。ユーザ端末１４０と音声対話サーバ１３０との接続が確立すると、音声対話サーバ１３０は、ユーザが取得を希望している着信メロディのタイトルを特定するための音声対話処理を実行する。音声対話処理によって着信メロディのタイトルが特定され、音声対話処理が終了すると、音声対話サーバ１３０は、音声対話処理によって特定された着信メロディのタイトルを示す情報を例えば専用回線などの通信ネットワークを介してＷＷＷサーバ１２０に送信する。
【０００６】
また、音声対話処理が終了すると、ユーザ端末１４０は、ブラウザ機能１４１を用いてＷＷＷサーバ１２０が提供する着信メロディ提供サイトにアクセスし、Ｗｅｂページの更新を要求する。ＷＷＷサーバ１２０は、Ｗｅｂページの更新要求に応じて、音声対話サーバから取得した着信メロディのタイトルを示す情報を反映させたＷｅｂページを示すデータを送信する。すると、ユーザ端末１４０の表示装置に、着信メロディを選択するためのＷｅｂページが、タイトル入力領域にタイトルが表示された状態で表示される。つまり、ユーザ端末１４０の表示装置に表示されているＷｅｂページのタイトル入力領域に、ユーザ端末１４０と音声対話サーバ１３０との間で実行された音声対話によって特定された着信メロディのタイトルが入力されたことになる。
【０００７】
このように、表示・音声連携システムを利用することで、例えば、携帯電話端末などのユーザ端末が備える表示装置に表示されている情報入力領域への情報入力を、音声を発声することによって行うことができる。
【０００８】
【発明が解決しようとする課題】
上記のような表示・音声連携システムにおいては、ＷＷＷサーバによる表示サービスと音声サーバによる音声サービスは通信経路が異なるため、ＷＷＷサーバと音声サーバとを関連付けて、表示サービスと音声サービスとの連携を図る必要がある。従来は表示サービスと音声サービスとの連携は、例えば特開２００１−２６８２４１に開示されているシステムのように、表示・音声連携システムを利用するユーザ端末における音声通話のための発信者番号にもとづいて図られている。
【０００９】
従って、表示・音声連携システムを利用するためには、ＷＷＷサーバによる表示サービスを利用して使用する端末装置の電話番号をあらかじめ登録しておく必要があった。このように、個人情報を開示したあとでなければ表示・音声連携システムを利用することができないため、表示・音声連携システムの利用を促進することが困難であるという問題があった。
【００１０】
また、表示・音声連携システムが提供するサービスを受けるときには、そのサービスの提供のために必要とされているか否かにかかわらず、ログイン操作が必要であった。ログイン操作は、電話番号を入力することで行われたり、ＷＷＷサーバに電話番号を登録するユーザ登録の際に定められたユーザ名を入力することで行われたりする。このように、表示・音声連携システムを利用する度にログイン操作を行わなければならず、ユーザにとって煩わしい操作を強いられるという問題があった。
【００１１】
本発明は上述した問題を解消し、個人情報を開示することなく簡単な操作で表示と音声の連携サービスを受けることができるようにすることを目的とする。
【００１２】
【課題を解決するための手段】
上記の問題を解決するために、本発明の表示・音声連携システム（例えば表示・音声連携システム１０）は、通信ネットワークに接続される端末装置（例えばユーザ端末４０）と、Ｗｅｂページを用いて情報の提供や収集を行うＷＷＷサーバ（例えばＷＷＷサーバ２０）と、通信ネットワークを介して音声による情報の入出力によって音声対話処理を実行する音声対話サーバ（例えば音声対話サーバ３０）とを備えた表示・音声連携システムであって、端末装置とＷＷＷサーバとの間で行われた一連の処理の流れを示すセッション情報を記憶するセッションデータベースを備え、Ｗｅｂページを表示するためのＷｅｂページデータは、音声対話サーバに向けて発呼するための発呼データ（例えば電話番号データ）を含み、ＷＷＷサーバは、端末装置からのＷｅｂページの取得要求に応じて、音声対話サーバとの連携を図るための連携データ（例えば図４に示すデータ）として使用する文字列データを決定し、セッション情報及び端末装置についての通信管理情報（例えばCookieなどのセッション識別子）に連携データを対応付けしてセッションデータベースに保存するとともに（例えばステップＳ１０３）、連携データを発呼データに関連付けしてＷｅｂページデータに設定したあと（例えばステップＳ１０４）、連携データが設定されたＷｅｂページデータを端末装置に向けて送信する処理（例えばステップＳ１０５）を実行し、端末装置は、受信したＷｅｂページデータにもとづいてＷｅｂページを表示するブラウザ機能（例えばブラウザ機能４１）と、ユーザからの要求に応じてＷｅｂページデータに含まれている発呼データを用いて音声対話サーバに向けて発呼し、当該発呼データに関連付けされている連携データとしての文字列データにもとづくトーンを出力する通話機能（例えば通話機能４２）とを有し、音声対話サーバは、端末装置からのトーンを文字列データとすることで連携データを生成し（例えばステップＳ１１０）、生成した連携データをＷＷＷサーバに送信し、ＷＷＷサーバは、さらに、セッションデータベースが記憶するセッション情報のうち、音声対話サーバから受信した連携データに対応するセッション情報を特定し、特定したセッション情報から端末装置が要求した音声によるサービスがいずれのサービスであるかを判断し、端末装置が要求した音声によるサービスの判断結果を音声対話サーバに送信し、音声対話サーバは、さらに、ＷＷＷサーバから受信した判断結果に基づいて生成した連携データを用いて音声対話処理の実行内容を決定する（例えばステップＳ１１１）ようにしたものである。
ＷＷＷサーバは、音声対話サーバから受信した連携データに対応するセッション情報を特定すると、特定したセッション情報の中から、音声対話サーバから受信した連携データと同一の連携データが設定されているＷｅｂページデータを特定し、特定したＷｅｂページデータから端末装置が要求した音声によるサービスがいずれのサービスであるかを判断する構成とされてもよい。
セッションデータベースは、セッション情報及び通信管理情報を連携データに対応付けて記憶する構成とされてもよい。
ＷＷＷサーバは、数字又は記号からなる文字列を更新する文字列カウンタが更新した文字列を抽出することによって、連携データとして文字列を決定する構成とされてもよい。
【００１３】
上記の構成としたことで、連携データを用いてＷＷＷサーバと音声対話サーバとの連携を図ることができるようになり、端末装置を使用するユーザは、個人情報を開示することなく簡単な操作で表示と音声の連携サービスを受けることができるようになる。
【００１４】
音声対話サーバが、連携データが設定されていたＷｅｂページデータにもとづくＷｅｂページの表示内容に合致した音声対話が行われるように、音声対話処理の実行内容を決定する構成とされていてもよい。
【００１５】
上記の構成としたことで、連携データを用いて、音声対話による処理が選択されたＷｅｂページによるサービスの内容に合致した音声対話処理を実行することができる。
【００１６】
ＷＷＷサーバが、端末装置からのＷｅｂページの取得要求に応じて文字列データを生成し（例えばステップＳ１０２）、生成した文字列データを使用する連携データに決定する構成とされていてもよい。
【００１７】
上記の構成としたことで、ＷＷＷサーバが生成した連携データを用いて、ＷＷＷサーバと音声対話サーバとの連携を図ることができる。
【００１８】
音声対話サーバが、音声対話処理を実行し、音声対話処理結果を示す音声対話処理結果データを、生成した連携データと同一の連携データが対応付けされている端末装置についての通信管理情報に対応付けしてシステム内（例えばＷＷＷサーバ２０、データベースサーバ）に保存するための処理（例えばＷＷＷサーバ２０に向けて連携データおよび音声対話処理結果データを送信する処理）を実行するように構成されていてもよい。
【００１９】
上記の構成としたことで、音声対話サーバによる音声対話処理の結果を、音声対話処理によるサービスを受けた端末装置に関する情報に関連付けしてシステム内に保存しておくことができる。よって、ＷＷＷサーバが、端末装置に関する情報を特定することによって、その端末装置によって行われた音声対話の結果を示す情報を取得することができる。
【００２０】
ＷＷＷサーバが、端末装置からのＷｅｂページ取得要求に応じて、端末装置についての通信管理情報に対応付けされてシステム内に保存されている音声対話処理結果データを取得し、音声対話処理結果データが示す音声対話処理結果を反映させたＷｅｂページデータを送信するように構成されていてもよい。
【００２１】
上記の構成としたことで、音声対話処理の結果を、Ｗｅｂページに反映させることができる。
【００２２】
ＷＷＷサーバが、端末装置についての通信管理情報に対応付けされてシステム内に保存されている音声対話処理結果データを取得し、音声対話処理結果データが示す音声対話処理結果を反映させたＷｅｂページデータを端末装置に送信するように構成されていてもよい。
【００２３】
上記の構成としたことで、Ｗｅｂページ取得要求を端末装置に行わせることなく、音声対話処理の結果を、Ｗｅｂページに反映させることができる。
【００２４】
連携データは、文字に対応したトーンが端末装置にて発せられる当該文字を任意に組合せた複数の文字からなる文字列データであるように構成されていてもよい。
【００２５】
上記の構成としたことで、文字列データとして通信ネットワークを介して連携データを送受することができるとともに、音データとして通信ネットワークを介して連携データを送受することができる。
【００２６】
通信管理情報として、端末装置に対応して管理されている端末管理情報（例えばCookie）が用いられる構成とされていてもよい。
【００２７】
上記の構成としたことで、端末管理情報に対応付けして連携データなどの各種の情報を保存しておくことができる。
【００２８】
また、本発明の表示・音声連携サーバは、Ｗｅｂページを用いて情報の提供や収集を行うＷＷＷサーバ（例えばＷＷＷサーバ２０）と、通信ネットワークを介して音声による情報の入出力によって音声対話処理を実行する音声対話サーバ（例えば音声対話サーバ３０）とを備えた表示・音声連携サーバ（例えばＷＷＷサーバ２０と音声対話サーバ３０とからなるサーバ）であって、通信ネットワークに接続される端末装置とＷＷＷサーバとの間で行われた一連の処理の流れを示すセッション情報を記憶するセッションデータベースを備え、Ｗｅｂページを表示するためのＷｅｂページデータは、音声対話サーバに向けて発呼するための発呼データを含み、ＷＷＷサーバは、端末装置からのＷｅｂページの取得要求に応じて、音声対話サーバとの連携を図るための連携データとして文字列データを決定し、セッション情報及び端末装置についての通信管理情報に生成した連携データを対応付けしてセッションデータベースに保存するとともに、生成した連携データを発呼データに関連付けしてＷｅｂページデータに設定したあと、生成した連携データが設定されたＷｅｂページデータを端末装置に向けて送信する処理を実行し、音声対話サーバは、端末装置がＷｅｂページデータに含まれている発呼データを用いて当該音声対話サーバに発呼したことに応じて、当該端末装置との接続を確立するための処理を実行し、接続が確立されている端末装置によって発呼データに関連付けされている連携データとしての文字列データにもとづくトーンが発せられたことに応じて、当該トーンを文字列データとして連携データを生成し、セッションデータベースが記憶するセッション情報のうち、生成した連携データに対応するセッション情報を特定し、特定したセッション情報から端末装置が要求した音声によるサービスがいずれのサービスであるかを判断し、判断結果に基づいて音声対話処理の実行内容を決定するようにしたものである。
【００２９】
上記の構成としたことで、連携データを用いてＷＷＷサーバと音声対話サーバとの連携を図ることができるようになり、端末装置を使用するユーザに対して、個人情報を開示させることなく簡単な操作で行うことができる表示と音声の連携サービスを提供することができる。
【００３０】
音声対話サーバが、連携データが設定されていたＷｅｂページデータにもとづくＷｅｂページの表示内容に合致した音声対話が行われるように、音声対話処理の実行内容を決定するように構成されていてもよい。
【００３１】
上記の構成としたことで、連携データを用いて、音声対話による処理が選択されたＷｅｂページによるサービスの内容に合致した音声対話処理を実行することができる。
【００３２】
また、本発明の表示・音声連携方法は、Ｗｅｂページを用いて情報の提供や収集を行うＷＷＷサーバ（例えばＷＷＷサーバ２０）と、通信ネットワークを介して音声による情報の入出力によって音声対話処理を実行する音声対話サーバ（例えば音声対話サーバ３０）との連携を図るための表示・音声連携方法であって、Ｗｅｂページを表示するためのＷｅｂページデータは、音声対話サーバに向けて発呼するための発呼データを含み、ＷＷＷサーバは、通信ネットワークに接続された端末装置（例えばユーザ端末４０）からのＷｅｂページの取得要求に応じて、音声対話サーバとの連携を図るための連携データとして文字列データを決定し（例えばステップＳ１０２）、端末装置とＷＷＷサーバとの間で行われた一連の処理の流れを示すセッション情報、及び端末装置についての通信管理情報に、生成した連携データを対応付けしてセッションデータベースに保存するとともに（例えばステップＳ１０３）、生成した連携データを発呼データに関連付けしてＷｅｂページデータに設定したあと（例えばステップＳ１０４）、生成した連携データが設定されたＷｅｂページデータを端末装置に向けて送信する処理を実行し（例えばステップＳ１０５）、音声対話サーバが、端末装置がＷｅｂページデータに含まれている発呼データを用いて当該音声対話サーバに発呼したことに応じて、当該端末装置との接続を確立するための処理を実行し、接続が確立されている端末装置によって発呼データに関連付けされている連携データとしての文字列データにもとづくトーンが発せられたことに応じて、当該トーンを文字列データとして連携データを生成し（例えばステップＳ１１０）、生成した連携データをＷＷＷサーバに送信し、ＷＷＷサーバは、さらに、セッションデータベースが記憶するセッション情報のうち、音声対話サーバから受信した連携データに対応するセッション情報を特定し、特定したセッション情報から端末装置が要求した音声によるサービスがいずれのサービスであるかを判断し、端末装置が要求した音声によるサービスの判断結果を音声対話サーバに送信し、音声対話サーバは、さらに、ＷＷＷサーバから受信した判断結果に基づいて生成した連携データを用いて音声対話処理の実行内容を決定する（例えばステップＳ１１１）ものである。
【００３３】
上記の構成としたことで、連携データを用いてＷＷＷサーバと音声対話サーバとの連携を図ることができるようになり、端末装置を使用するユーザに対して、個人情報を開示させることなく簡単な操作で行うことができる表示と音声の連携サービスを提供することができる。
【００３４】
音声対話サーバが、連携データが設定されていたＷｅｂページデータにもとづくＷｅｂページの表示内容に合致した音声対話が行われるように、音声対話処理の実行内容を決定するように構成されていてもよい。
【００３５】
上記の構成としたことで、連携データを用いて、音声対話による処理が選択されたＷｅｂページによるサービスの内容に合致した音声対話処理を実行することができる。
【００３６】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
図１は、本発明の一実施形態である表示・音声連携システム１０の構成の例を示すブロック図である。表示・音声連携システム１０は、ＷＷＷサーバ２０と、音声対話サーバ３０と、ユーザ端末４０とを含む。ＷＷＷサーバ２０およびユーザ端末４０は、それぞれ、インターネットなどの通信ネットワーク５０に接続される。また、音声対話サーバ３０およびユーザ端末４０は、公衆電話回線網６０に接続される。なお、公衆電話回線網６０は、通信ネットワーク５０に接続されている。以下の説明において、公衆電話回線網６０を含むネットワークを通信ネットワーク５０ということがある。
【００３７】
ＷＷＷサーバ２０は、例えばインターネットサーバなどの情報処理装置により構成される。ＷＷＷサーバ２０は、例えばＨＴＭＬ（Hypertext Markup Language）などのマークアップ言語により作成されたＷｅｂページデータを管理し、Ｗｅｂページデータにもとづいて表示されるＷｅｂページを用いて、各種の情報の提供や取得を行う機能を有している。Ｗｅｂページには、例えば、商品の受注を行うためのものや、アンケートの回収を行うためのものなどがある。
【００３８】
ＷＷＷサーバ２０は、本例では、Cookieと呼ばれるユーザを識別するための文字列情報を利用して本システムを利用する各ユーザを管理する。ここで、Cookieを利用してユーザ管理を行う場合の処理について簡単に説明する。先ず、ＷＷＷサーバ２０は、Cookieを生成し、ユーザ端末（例えばユーザ端末４０、具体的には、ユーザ端末４０に搭載されているブラウザ）に向けて送信する。Cookieを取得すると、ユーザ端末は、Cookieが格納されたファイル（Cookieファイル）を保存する。その後は、ユーザ端末は、ＷＷＷサーバ２０にアクセスする際に、ユーザ端末に搭載されているブラウザの機能によってCookieファイルを送信する。ＷＷＷサーバ２０は、取得したCookieファイルによって、ユーザに関する情報を認識する。このようにしてユーザに関する情報を認識することができるため、ＷＷＷサーバ２０は、最新のユーザ情報を把握することができるようになる。
【００３９】
また、ＷＷＷサーバ２０は、Ｗｅｂページを表示するためのＷｅｂページデータの他、各ユーザ端末についてのセッション（ユーザ端末とＷＷＷサーバとの間で行われた一連の処理の流れ）を示すセッション情報、各ユーザ端末に付与されているセッション識別子（各ユーザ端末の通信履歴などの通信に関する各種の情報をユーザ端末毎に管理するために用いられる通信管理情報の一例）、後述する連携データなどが格納されるデータベース２１を備えている（図５参照）。なお、本例では、セッション識別子として、上述したCookieが用いられる。また、各セッション情報は、それぞれ、セッション情報が示すセッションに関与したユーザ端末についてのセッション識別子に対応付けされている。従って、セッション識別子に対応付けされているセッション情報は、そのセッション識別子が示すユーザ端末と、ＷＷＷサーバ２０とのセッションを示す情報である。
【００４０】
音声対話サーバ３０は、一般公衆回線網６０を介して入力した音声データが示す音声を認識する音声認識機能と、文字情報にもとづいて音声合成して音声データ出力を行う音声合成機能とを有する。音声対話サーバ３０は、音声認識機能と音声合成機能とを用いて、音声による情報の伝達や情報の取得を行う音声対話処理を実行する。この例では、音声対話サーバ３０は、ＷＷＷサーバ２０と連携して各種のサービスを提供する。例えば、ＷＷＷサーバ２０が運営しているＷｅｂページに設けられている情報入力領域に入力される情報を、音声対話処理によって取得するサービスを行う。この音声対話サーバ３０は、音声認識や音声合成を行うための辞書データを有している。
【００４１】
ユーザ端末４０は、図１に示すように、一般公衆回線網６０を介して接続先との間で音声通話を行うための通話機能４２を有するとともに、自己が備える例えばＬＣＤ（Liquid Crystal Display）などの表示装置にＷｅｂページを表示したり、自己が備える入力装置を用いてＷｅｂページ上で文字入力や情報選択を行うためのブラウザ機能４１を有している。ユーザ端末４０は、例えばＰＤＣ（Personal Digital Cellular）規格に準拠したディジタル携帯電話などの携帯電話端末によって構成される。ユーザ端末４０は、通信ネットワーク５０への接続や、通信ネットワーク５０を利用した情報の送受などを行うことができる環境（例えばブラウザなどのソフトウェアや、ハードウェアなどにおける環境）を備えている。
【００４２】
次に、本例の表示・音声連携システム１０の動作について図面を参照して説明する。図２、図３は、本例の表示・音声連携システム１０における表示・音声連携処理および処理タイミングの一例を示すタイミングチャートである。
【００４３】
先ず、ユーザ端末４０は、ユーザの操作に応じて、通信ネットワーク５０を介してＷＷＷサーバ２０にアクセスする（ステップＳ１０１）。例えば、ＷＷＷサーバ２０が提供しているＷｅｂページのＵＲＬ（Uniform Resource Locator）を指定することでアクセスする。
【００４４】
ユーザ端末４０からのアクセスがあり、Ｗｅｂページを表示するためのＷｅｂページデータの取得要求があった場合には、ＷＷＷサーバ２０は、先ず、音声対話サーバ３０との連携を図るための連携データを生成する（ステップＳ１０２）。連携データは、例えば図４に示すように、特定のトーンを発することを電話機に指定することができる数字や記号（例えば「＃」）を、複数個任意に組合せた文字列によって構成される。この例では、連携データは、既に生成されて保存している他の連携データの何れにも一致しない文字列となるように生成される。
【００４５】
なお、この例では、Ｗｅｂページデータには、音声対話サーバ３０との音声対話による処理を選択するための音声対話選択領域をＷｅｂページ上に表示するための音声対話選択領域表示データと、音声対話サーバ３０に向けて発呼するための電話番号を示す電話番号データとが、互いに関連付けされた状態で含まれている。すなわち、音声対話選択領域表示データと電話番号データとが、マークアップ言語によってＷｅｂページデータ内に表記されている。また、Ｗｅｂページデータ内に、マークアップ言語によって、音声対話選択領域表示データが示す音声対話選択領域が選択されると、電話番号データが示す電話番号を用いて発呼を行うように指示する記述がなされている。
【００４６】
次いで、ＷＷＷサーバ２０は、生成した連携データを、ユーザ端末４０についてのセッション識別子に対応付けた状態でデータベース２１に保存する（ステップＳ１０３）。図５は、データベース２１の格納状態の例を示す説明図である。図５に示すように、各セッション識別子に対応付けされた状態で、セッション情報、連携データ、音声対話結果情報などの各種の情報が格納されている。従って、連携データに対応するセッション情報やセッション識別子を確認することで、その連携データが設定されたＷｅｂページデータをどのユーザ端末が取得したかを特定することができるようになる。
【００４７】
また、ＷＷＷサーバ２０は、生成した連携データを、ユーザ端末４０に送信するＷｅｂページデータの中に設定する（ステップＳ１０４）。具体的には、マークアップ言語で構成されているＷｅｂページデータ内の所定の箇所に、連携データを表記する処理を行う。この例では、連携データは、Ｗｅｂページデータに含まれている音声対話サーバ３０の電話番号を示す電話番号データに関連付けされた状態で設定される。
【００４８】
次いで、ＷＷＷサーバ２０は、連携データを設定したＷｅｂページデータを、ユーザ端末４０に向けて通信ネットワーク５０を介して送信する（ステップＳ１０５）。送信されるＷｅｂページデータには、音声対話サーバ３０の電話番号を示す電話番号データと、ステップＳ１０２で生成された連携データとが含まれている。
【００４９】
ユーザ端末４０は、Ｗｅｂページデータを受信すると、ブラウザ機能４１によって、受信したＷｅｂページデータにもとづくＷｅｂページを自己が備える表示装置に表示する（ステップＳ１０６）。
【００５０】
図６は、ユーザ端末４０に表示されるＷｅｂページの表示状態の例を示す説明図である。ここでは、ＷＷＷサーバ２０が、チケットの予約受付、チケットの予約内容の変更、チケットの予約の取消しなどのサービスを提供している場合を例に説明する。図６には、チケットの予約内容の変更を行うためのＷｅｂページの表示状態の例が示されている。図６に示すように、Ｗｅｂページには、現在のチケットの予約内容を表示する表示領域と、変更後の予約内容を入力する入力領域と、音声対話によって変更後の予約内容を入力することを選択する音声対話選択領域７０とが設けられている。
【００５１】
Ｗｅｂページにおいて音声対話選択領域７０が押下されると、ユーザ端末４０のブラウザ機能４１は、通話機能４２を呼び出し（ステップＳ１０７）、音声対話選択領域７０を表示させるための音声対話選択領域表示データに関連付けされている電話番号データが示す電話番号を用いて発呼することを指示する。呼び出された通話機能４２は、ブラウザ機能４１からの指示に従って、Ｗｅｂページデータ内に設定されている電話番号データが示す電話番号を用いて、音声対話サーバ３０に向けて発呼を行う（ステップＳ１０８）。
【００５２】
音声対話サーバ３０がユーザ端末４０からの発呼に応じて通信回線が接続状態になったことを確認すると、ユーザ端末４０は、発呼に用いた電話番号データに関連付けされている連携データが示す文字列にもとづいて、文字列の各文字に対応するトーンを発する処理を実行する（ステップＳ１０９）。
【００５３】
ユーザ端末４０からのトーンが入力すると、音声対話サーバ３０は、入力したトーンに対応する文字列を生成することで、文字列による連携データを生成する（ステップＳ１１０）。
【００５４】
連携データを生成すると、音声対話サーバ３０は、生成した連携データを用いて音声対話処理の実行内容を決定する（ステップＳ１１１）。具体的には、例えば、音声対話サーバ３０は、先ず、生成した連携データをＷＷＷサーバ２０に送信し、ＷＷＷサーバ２０に、ステップＳ１１０で生成した連携データと同一の連携データに関連付けされているセッション情報（例えば、最近追加された数バイト分のデータなど、セッション情報の一部であってもよい）をデータベース２１から探索させる。次いで、ＷＷＷサーバ２０は、探索したセッション情報の中からステップＳ１１０で生成された連携データと同一の連携データが設定されているＷｅｂページデータを特定する。この特定したＷｅｂページデータにもとづいて、ユーザ端末４０がどのＷｅｂページを経由して音声対話サーバ３０に向けて発呼を行ったかを確認することができる。ＷＷＷサーバ２０は、特定したＷｅｂページデータから、ユーザ端末４０を用いてどのようなサービスを音声によって受けようとしていたかを確認し、その確認結果を音声対話サーバ３０に送信する。そして、音声対話サーバ３０は、受信した確認結果を示す情報にもとづいて、実行する音声対話処理の内容を決定する。例えば、図６に示したＷｅｂページを経由して音声対話サーバ３０に向けて発呼を行ったことが特定された場合には、チケットの変更を音声対話によって行うための音声対話処理を実行することに決定する。このようにして、ステップＳ１１１での音声対話処理の実行内容が決定されるようにすればよい。
【００５５】
なお、ステップＳ１１１での音声対話処理の実行内容の決定は、どのようにして行われるようにしてもよい。例えば、本システムを、例えばデータベース２１の格納データを管理するデータベースサーバを有する構成とし、音声対話サーバ３０がデータベースサーバにアクセスすることで、ユーザ端末４０を用いてどのようなサービスを音声によって受けようとしていたかを確認するようにしてもよい。
【００５６】
音声対話処理の実行内容を決定すると、音声対話サーバ３０は、決定した音声対話処理を実行し、音声対話処理にて、ユーザ端末４０に対して音声による情報の報知を行うとともに、ユーザ端末４０からの音声を入力して情報を取得する（ステップＳ１１２）。
【００５７】
音声対話処理を終了すると、音声対話サーバ３０は、音声対話処理の結果を示す音声対話処理結果データと、ステップＳ１１０にて生成した文字列の連携データをＷＷＷサーバ２０に向けて送信する（ステップＳ１１３）。ＷＷＷサーバ２０は、受信した音声対話処理結果データを、受信した連携データと同一の連携データに対応付けされているセッション識別子に対応付けして保存する（ステップＳ１１４）。
【００５８】
また、音声対話処理を終了すると、ユーザ端末４０の通話機能４２は、ブラウザ機能４１を呼び出す（ステップＳ１１５）。呼び出されたユーザ端末４０のブラウザ機能４１は、ＷＷＷサーバ２０に対して、ユーザ端末４０の表示装置に表示されている表示情報の更新を要求する（ステップＳ１１６）。ＷＷＷサーバ２０は、更新要求に応じて、ユーザ端末４０についてのセッション識別子に対応付けされている音声対話処理結果データを読み出して、音声対話処理の結果を反映させたＷｅｂページデータを作成する（ステップＳ１１７）。そして、音声対話処理の結果を反映させたＷｅｂページデータを送信する（ステップＳ１１８）。
【００５９】
Ｗｅｂページデータを受信すると、ユーザ端末４０のブラウザ機能４１によって、受信したＷｅｂページデータにもとづくＷｅｂページが表示される（ステップＳ１１９）。Ｗｅｂページの表示内容は、例えば図７に示すように、音声対話処理によって入力された情報の内容が反映された状態となっている。なお、図７は、図６に示すＷｅｂページから音声対話が選択され、音声対話処理によってチケット予約の変更内容を示す情報が音声入力されたあと、音声対話処理の結果が反映されたＷｅｂページの表示状態の例を示す説明図である。
【００６０】
以上説明したように、ＷＷＷサーバ２０が生成した文字列による連携データを、連携データにもとづくトーンによって音声対話サーバ３０に伝達する構成としたので、ＷＷＷサーバ２０が生成した連携データによって、音声対話サーバ３０が、どのＷｅｂページを経由してユーザ端末４０が接続してきたかを確認することができ、ＷＷＷサーバ２０によるＷｅｂページによるサービスと、音声対話サーバ３０による音声対話によるサービスとを連携させることができる。このように、連携データを用いて表示と音声の連携を図っているので、端末装置を使用するユーザは、ユーザ端末の電話番号などの個人情報を開示することなく表示と音声の連携サービスを受けることができるようになる。なお、ユーザ端末４０は、音声対話サーバ３０に発呼するときに発信者電話番号を通知しなくてよいので、ＷＷＷサーバ２０だけでなく音声対話サーバ３０に対しても個人情報を開示する必要はない。
【００６１】
また、上述したように、Cookieと呼ばれるユーザ管理情報（端末管理情報）を用いてユーザ管理（端末管理）を行う構成としているので、ユーザは、ログイン動作の必要のない簡単な操作で表示と音声の連携サービスを受けることができるようになる。
【００６２】
また、上述したように、ＷＷＷサーバ２０が生成した文字列による連携データをセッション情報などに対応付けして保存し、音声対話サーバ３０が音声対話結果データを連携データとともに送信する構成としたことで、ＷＷＷサーバ２０が、音声対話の結果を示す情報を、音声対話を行ったユーザ端末４０についてのセッション情報に対応付けして保存することができる。
【００６３】
なお、上述した実施の形態では、音声対話処理を実行する音声対話サーバ３０を用いる構成とし、音声対話処理を音声認識や音声合成を行うことによって実行するようにしていたが、音声対話処理を人間が行うようにしてもよい。すなわち、音声対話サーバ３０の代わりに、オペレーションセンタを備える構成としてもよい。この場合、オペレーションセンタでは、通信回線が接続されたあとに入力したトーンから文字列の連携データを生成する処理などは上述した音声対話サーバ３０と同様に実行されるが、音声対話処理はオペレータによって行われる。
【００６４】
また、上述した実施の形態では、ＷＷＷサーバ２０が、Ｗｅｂページデータの送信要求を受けたときに、任意の文字列による連携データを生成する構成としていたが、ＷＷＷサーバ２０から音声対話サーバ３０に向けての一方の連携だけを図るようにする場合（上述した音声・表示連携処理におけるステップＳ１１２までの処理を行う場合）には、あらかじめ定められている文字列による連携データを用いる構成としてもよい。この場合、各Ｗｅｂページデータに、それぞれ、あらかじめ定められている所定の連携データ（Ｗｅｂページデータ毎に定められている別個の連携データ）を設定しておくようにすればよい。また、音声対話サーバ３０が、各連携データが設定されているＷｅｂページデータによるＷｅｂページの内容（例えば、チケット予約をするＷｅｂページであるなどのような内容）を示す情報が格納されているデータベースを備えるようにすればよい。また、上述したあらかじめ定められている文字列と、ＷＷＷサーバ２０が、Ｗｅｂページデータの送信要求を受けたときに生成する任意の文字列を組み合わせ、連携データとして用いる構成としてもよい。この場合、音声対話サーバ３０が、各連携データが設定されているＷｅｂページデータによるＷｅｂページの内容（例えば、チケット予約をするＷｅｂページであるなどのような内容）を示す情報が格納されているデータベースを備え、音声対話処理結果は任意の文字列と関連付けられシステム内に保存されるようにすればよい。
【００６５】
また、上述した実施の形態では、ＷＷＷサーバ２０が、連携データとしての文字列データを、既に生成されて保存している他の連携データの何れにも一致しない文字列となるように生成する構成としていたが、他の連携データの内容とは無関係に、特定のトーンを発することを電話機に指定することができる数字や記号からなる文字列を更新していく文字更新カウンタ（文字更新手段の一例）から文字列を抽出することで、連携データとしての文字列を生成するようにしてもよい。このように構成しても、連携データとしての文字列の桁数を多くすれば、他の連携データと一致した連携データが生成されることは防止できる。この場合、文字更新カウンタは、例えば、０〜９の数字や「＃」などの記号の組合せからなる所定桁数の文字列をランダムに更新する構成とすればよい。そして、ＷＷＷサーバ２０が、ステップＳ１０２にて、文字更新カウンタから文字列を抽出し、連携データとしての文字列データを生成するようにすればよい。このように構成すれば、連携データとしての文字列データをランダムに決定することができ、自己と前後して他人に付与された連携データであっても、自己に付与された連携データにもとづいて他人の連携データを予測することができないようにすることができる。すなわち、文字列データを規則的な順番で生成する構成とすると、あるユーザ端末に対して付与した連携データと、次にアクセスしてきた他のユーザ端末に付与した連携データとが、連番となってしまう。よって、連携データが付与されたユーザ端末のユーザは、自己に付与された連携データから他人に付与された連携データを容易に予測できてしまう。しかし、文字更新カウンタを用いて連携データを生成する構成とすれば、他人に付与された連携データを予測することは不可能となる。従って、他人に付与された連携データを音声対話サーバ３０に送信し、音声対話サーバ３０やＷＷＷサーバ２０に誤った処理を実行させるような行為は防止される。よって、システム１０を安全に運用することができるようになる。
【００６６】
なお、文字更新カウンタは、生成する文字列と同じ桁数の文字列をランダムに更新するものに限らず、例えば１桁などの他の桁数の文字あるいは文字列をランダムに更新するものであってもよい。この場合、ＷＷＷサーバ２０が、生成する文字列の桁数の文字を抽出するまで、文字更新カウンタから文字または文字列を数回抽出し、抽出した文字または文字列を組合せて連携データとしての文字列を生成するようにすればよい。
【００６７】
また、上述した実施の形態では、セッション識別子としてCookieを用いる構成としていたが、ブラウザ機能４１にCookieを取り扱う機能が搭載されていないユーザ端末により本システムが利用される場合には、ＷＷＷサーバ２０が、ユーザ端末に向けて送信するＷｅｂページデータ（例えばＵＲＬ）にセッションを識別するためのパラメータを付加することとし、そのパラメータによって各ユーザ端末におけるセッションを管理するようにすればよい。
【００６８】
また、上述した実施の形態では、ユーザ端末４０が携帯電話端末であるものとして説明していたが、ブラウザ機能と通話機能とをともに備えるものであれば、ＰＤＡ(Personal Digital Assistants)やパーソナルコンピュータなどの他の端末装置であってもよい。
【００６９】
また、上述した実施の形態では、ステップＳ１１３にて、音声対話サーバ３０が音声対話処理結果データと文字列の連携データをＷＷＷサーバ２０に向けて送信する構成としていたが、本システムがデータベース２１の格納データを管理するデータベースサーバを有する構成とし、そのデータベースサーバに向けて送信する構成としてもよい。この場合、データベースサーバは、受信した音声対話処理結果データを、受信した連携データと同一の連携データに対応付けされているセッション識別子に対応付けして保存するようにし、ＷＷＷサーバ２０からの音声対話処理結果の問い合わせに応じて保存している音声対話処理結果データをＷＷＷサーバ２０に送信するようにしてもよい。すなわち、データベース２１の格納情報は、ＷＷＷサーバ２０によって管理されていなくてもよく、本システムに含まれる他のサーバ（例えば音声対話サーバ３０、データベースサーバ）によって管理されていても、複数のサーバによって共通に管理されていてもよい。つまり、データベース２１の格納情報は、本システムで管理できるような状態で保存されていれば、何処にどのような状態で保存されていてもよい。
【００７０】
また、上述した実施の形態では、ＷＷＷサーバ２０が、ユーザ端末４０からのＷｅｂページ取得要求（ステップＳ１１６の表示情報更新要求）に応じて、音声対話処理結果データを取得して（ステップＳ１１７）、音声対話処理結果データが示す音声対話処理結果を反映させたＷｅｂページデータを送信する構成（ステップＳ１１８）としていたが、音声対話処理結果データを保存したあと（ステップＳ１１４）に、ユーザ端末４０からのＷｅｂページ取得要求の有無に関わらず、音声対話処理結果データが示す音声対話処理結果を反映させたＷｅｂページデータをユーザ端末４０に送信する構成としてもよい。このように構成すれば、Ｗｅｂページ取得要求をユーザ端末４０に行わせることなく、音声対話処理の結果を、Ｗｅｂページに反映させることができる。
【００７１】
また、上述した実施の形態では、ユーザ端末４０が、ステップＳ１０８にて音声対話サーバ３０に向けて発呼を行い、接続が確立したあとにステップＳ１０９にてトーンを発するようにしていたが、ユーザ端末４０は電話番号と連携データの文字列によるトーンとを同時に出力し、公衆電話回線網６０に接続されている交換機が、ユーザ端末４０と音声対話サーバ３０との接続が確立したあとにトーンを音声対話サーバ３０に向けて出力する構成としてもよい。
【００７２】
また、上述した各実施の形態では、Ｗｅｂページデータを生成するための表示用言語としてＨＴＭＬを例にしていたが、携帯電話端末のブラウザでWebページの表示などを行うために広く用いられているC-HTML(Compact HTML)などの携帯電話端末用のマークアップ言語や、HDML(Handheld Device Markup Language)、WML(Wireless Markup Language)などの他のマークアップ言語を用いるようにしてもよい。
【００７３】
さらに、上述した各実施の形態では、音声サーバ、ユーザ端末ともに公衆電話回線網６０に接続されているが、VoIP(Voice over Internet Protocol)等のＩＰネットワークに接続してもよい。
【００７４】
【発明の効果】
以上のように、本発明の表示・音声連携システムによれば、端末装置とＷＷＷサーバとの間で行われた一連の処理の流れを示すセッション情報を記憶するセッションデータベースを備え、ＷＷＷサーバが、端末装置からのＷｅｂページの取得要求に応じて、音声対話サーバとの連携を図るための連携データとして使用する文字列データを決定し、セッション情報及び端末装置についての通信管理情報に連携データを対応付けしてセッションデータベースに保存するとともに、連携データを発呼データに関連付けしてＷｅｂページデータに設定したあと、連携データが設定されたＷｅｂページデータを端末装置に向けて送信する処理を実行する。また、端末装置が、受信したＷｅｂページデータにもとづいてＷｅｂページを表示するブラウザ機能と、ユーザからの要求に応じてＷｅｂページデータに含まれている発呼データを用いて音声対話サーバに向けて発呼し、当該発呼データに関連付けされている連携データとしての文字列データにもとづくトーンを出力する通話機能とを有する。さらに、音声対話サーバが、端末装置からのトーンを文字列データとすることで連携データを生成し、生成した連携データをＷＷＷサーバに送信する。ＷＷＷサーバは、さらに、セッションデータベースが記憶するセッション情報のうち、音声対話サーバから受信した連携データに対応するセッション情報を特定し、特定したセッション情報から端末装置が要求した音声によるサービスがいずれのサービスであるかを判断し、端末装置が要求した音声によるサービスの判断結果を音声対話サーバに送信する。音声対話サーバは、さらに、ＷＷＷサーバから受信した判断結果に基づいて音声対話処理の実行内容を決定する。このように構成したことで、連携データを用いてＷＷＷサーバと音声対話サーバとの連携を図ることができるようになり、端末装置を使用するユーザは、個人情報を開示することなく簡単な操作で表示と音声の連携サービスを受けることができるようになる。
【００７５】
音声対話サーバが、連携データが設定されていたＷｅｂページデータにもとづくＷｅｂページの表示内容に合致した音声対話が行われるように、音声対話処理の実行内容を決定する構成とされているので、連携データを用いて、音声対話による処理が選択されたＷｅｂページによるサービスの内容に合致した音声対話処理を実行することができる。
【００７６】
ＷＷＷサーバが、端末装置からのＷｅｂページの取得要求に応じて文字列データを生成し、生成した文字列データを使用する連携データに決定する構成とされているので、ＷＷＷサーバが生成した連携データを用いて、ＷＷＷサーバと音声対話サーバとの連携を図ることができる。
【００７７】
音声対話サーバが、音声対話処理を実行し、音声対話処理結果を示す音声対話処理結果データを、生成した連携データと同一の連携データが対応付けされている端末装置についての通信管理情報に対応付けしてシステム内に保存するための処理を実行するように構成されているので、音声対話サーバによる音声対話処理の結果を、音声対話処理によるサービスを受けた端末装置に関する情報に関連付けしてシステム内に保存しておくことができる。
【００７８】
ＷＷＷサーバが、端末装置からのＷｅｂページ取得要求に応じて、端末装置についての通信管理情報に対応付けされてシステム内に保存されている音声対話処理結果データを取得し、音声対話処理結果データが示す音声対話処理結果を反映させたＷｅｂページデータを送信するように構成されているので、音声対話処理の結果を、Ｗｅｂページに反映させることができる。
【００７９】
ＷＷＷサーバが、端末装置についての通信管理情報に対応付けされてシステム内に保存されている音声対話処理結果データを取得し、音声対話処理結果データが示す音声対話処理結果を反映させたＷｅｂページデータを端末装置に送信するように構成されているので、Ｗｅｂページ取得要求を端末装置に行わせることなく、音声対話処理の結果を、Ｗｅｂページに反映させることができる。
【００８０】
連携データは、文字に対応したトーンが端末装置にて発せられる当該文字を任意に組合せた複数の文字からなる文字列データであるように構成されているので、文字列データとして通信ネットワークを介して連携データを送受することができるとともに、音データとして通信ネットワークを介して連携データを送受することができる。
【００８１】
ＷＷＷサーバが、少なくとも１つの文字を更新する文字更新手段を備え、文字更新手段から抽出した文字によって連携データとしての文字列データを決定するように構成されているので、連携データとしての文字列データをランダムに決定することができ、自己と前後して他人に付与された連携データであっても、自己に付与された連携データにもとづいて予測することはできないので、他人に付与された連携データを音声対話サーバに送信して処理を実行させるような行為が防止され、システムを安全に運用することができる。
【００８２】
通信管理情報として、端末装置に対応して管理されている端末管理情報が用いられる構成とされているので、端末管理情報に対応付けして連携データなどの各種の情報を保存しておくことができる。
【００８３】
また、本発明の表示・音声連携サーバによれば、通信ネットワークに接続される端末装置とＷＷＷサーバとの間で行われた一連の処理の流れを示すセッション情報を記憶するセッションデータベースを備え、ＷＷＷサーバが、端末装置からのＷｅｂページの取得要求に応じて、音声対話サーバとの連携を図るための連携データとして文字列データを決定し、セッション情報及び端末装置についての通信管理情報に生成した連携データを対応付けしてセッションデータベースに保存するとともに、生成した連携データを発呼データに関連付けしてＷｅｂページデータに設定したあと、生成した連携データが設定されたＷｅｂページデータを端末装置に向けて送信する処理を実行する。また、音声対話サーバが、端末装置がＷｅｂページデータに含まれている発呼データを用いて当該音声対話サーバに発呼したことに応じて、当該端末装置との接続を確立するための処理を実行し、接続が確立されている端末装置によって発呼データに関連付けされている連携データとしての文字列データにもとづくトーンが発せられたことに応じて、当該トーンを文字列データとして連携データを生成し、セッションデータベースが記憶するセッション情報のうち、生成した連携データに対応するセッション情報を特定し、特定したセッション情報から端末装置が要求した音声によるサービスがいずれのサービスであるかを判断し、判断結果に基づいて音声対話処理の実行内容を決定する。このように構成されているので、連携データを用いてＷＷＷサーバと音声対話サーバとの連携を図ることができるようになり、端末装置を使用するユーザに対して、個人情報を開示させることなく簡単な操作で行うことができる表示と音声の連携サービスを提供することができる。
【００８４】
音声対話サーバが、連携データが設定されていたＷｅｂページデータにもとづくＷｅｂページの表示内容に合致した音声対話が行われるように、音声対話処理の実行内容を決定するように構成されているので、連携データを用いて、音声対話による処理が選択されたＷｅｂページによるサービスの内容に合致した音声対話処理を実行することができる。
【００８５】
また、本発明の表示・音声連携方法によれば、ＷＷＷサーバが、通信ネットワークに接続された端末装置からのＷｅｂページの取得要求に応じて、音声対話サーバとの連携を図るための連携データとして文字列データを決定し、端末装置とＷＷＷサーバとの間で行われた一連の処理の流れを示すセッション情報、及び端末装置についての通信管理情報に、生成した連携データを対応付けしてセッションデータベースに保存するとともに、生成した連携データを発呼データに関連付けしてＷｅｂページデータに設定したあと、生成した連携データが設定されたＷｅｂページデータを端末装置に向けて送信する処理を実行し、音声対話サーバが、端末装置がＷｅｂページデータに含まれている発呼データを用いて当該音声対話サーバに発呼したことに応じて、当該端末装置との接続を確立するための処理を実行し、接続が確立されている端末装置によって発呼データに関連付けされている連携データとしての文字列データにもとづくトーンが発せられたことに応じて、当該トーンを文字列データとして連携データを生成し、生成した連携データをＷＷＷサーバに送信し、ＷＷＷサーバは、さらに、セッションデータベースが記憶するセッション情報のうち、音声対話サーバから受信した連携データに対応するセッション情報を特定し、特定したセッション情報から端末装置が要求した音声によるサービスがいずれのサービスであるかを判断し、端末装置が要求した音声によるサービスの判断結果を音声対話サーバに送信し、音声対話サーバは、さらに、ＷＷＷサーバから受信した判断結果に基づいて音声対話処理の実行内容を決定するので、連携データを用いてＷＷＷサーバと音声対話サーバとの連携を図ることができるようになり、端末装置を使用するユーザに対して、個人情報を開示させることなく簡単な操作で行うことができる表示と音声の連携サービスを提供することができる。
【００８６】
音声対話サーバが、連携データが設定されていたＷｅｂページデータにもとづくＷｅｂページの表示内容に合致した音声対話が行われるように、音声対話処理の実行内容を決定するように構成されているので、連携データを用いて、音声対話による処理が選択されたＷｅｂページによるサービスの内容に合致した音声対話処理を実行することができる。
【図面の簡単な説明】
【図１】本発明の一実施の形態における表示・音声連携システムの構成の例を示すブロック図である。
【図２】本発明の一実施の形態における表示・音声連携処理および処理タイミングの一例を示すタイミングチャートである。
【図３】本発明の一実施の形態における表示・音声連携処理および処理タイミングの一例を示すタイミングチャートである。
【図４】連携データの一例を示す説明図である。
【図５】データベースの格納状態の例を示す説明図である。
【図６】Ｗｅｂページの表示状態の例を示す説明図である。
【図７】音声対話処理結果が反映されたＷｅｂページの表示状態の例を示す説明図である。
【図８】従来の表示・音声連携システムの構成の例を示すブロック図である。
【符号の説明】
１０表示・音声連携システム
２０ＷＷＷサーバ
２１データベース
３０音声対話サーバ
４０ユーザ端末
４１ブラウザ機能
４２通話機能
５０通信ネットワーク
６０一般公衆電話回線
７０音声対話選択領域
[0001]
[Technical field to which the invention belongs]
The present invention relates to a display / voice cooperation system, a display / voice cooperation server, and a display / voice cooperation method that can link a display service, which displays Web pages, with a voice service, which conducts a voice dialogue, without using the user's personal information.
[0002]
[Prior art]
Conventionally, display / voice cooperation systems have been used that link a display service, which displays Web pages from a WWW (World Wide Web) server connected to a communication network such as the Internet, with a voice service, which uses the voice dialogue function of a voice dialogue server connected to a communication network such as a public telephone line.
[0003]
FIG. 8 is a block diagram showing a configuration example of a conventional display / voice cooperation system. The display / voice cooperation system 100 includes a WWW server 120 that executes a display service, a voice dialogue server 130 that executes a voice service, and a user terminal 140, such as a mobile phone terminal, that has a browser function 141 and a call function 142. Here, the WWW server 120 is assumed to operate an incoming melody providing site that provides incoming melodies (melodies used as the ringing tone when a telephone receives a call). The WWW server 120 and the user terminal 140 are each connected to a communication network 150 such as the Internet. The voice dialogue server 130 and the user terminal 140 are connected to the public telephone line network 160. The public telephone line network 160 is in turn connected to the communication network 150.
[0004]
Next, an operation example of the display / voice cooperation system 100 will be described. In response to a user operation, the user terminal 140 uses the browser function 141 to access the incoming melody providing site provided by the WWW server 120, and displays on its own display device the Web page for selecting an incoming melody on that site. This Web page includes a title input area for entering the title of the melody to be acquired as an incoming melody (hereinafter referred to as the "title of the incoming melody").
[0005]
Next, on receiving an instruction from the user, the user terminal 140 places a call to the voice dialogue server 130 using the call function 142. When the connection between the user terminal 140 and the voice dialogue server 130 is established, the voice dialogue server 130 executes voice dialogue processing to identify the title of the incoming melody that the user wishes to acquire. When the title has been identified and the voice dialogue processing ends, the voice dialogue server 130 transmits information indicating the identified title to the WWW server 120 via a communication network such as a dedicated line.
[0006]
When the voice dialogue processing ends, the user terminal 140 also uses the browser function 141 to access the incoming melody providing site provided by the WWW server 120 and requests an update of the Web page. In response to the update request, the WWW server 120 transmits data for a Web page that reflects the information, acquired from the voice dialogue server, indicating the title of the incoming melody. The Web page for selecting an incoming melody is then displayed on the display device of the user terminal 140 with the title shown in the title input area. In other words, the title of the incoming melody identified through the voice dialogue between the user terminal 140 and the voice dialogue server 130 has been entered into the title input area of the Web page displayed on the display device of the user terminal 140.
[0007]
In this way, a display / voice cooperation system makes it possible, for example, to enter information by speaking into an information input area displayed on the display device of a user terminal such as a mobile phone terminal.
[0008]
[Problems to be solved by the invention]
In a display / voice cooperation system as described above, the display service provided by the WWW server and the voice service provided by the voice server use different communication paths, so the WWW server and the voice server must be associated with each other in order to link the display service with the voice service. Conventionally, this link has been established on the basis of the caller number used for the voice call by the user terminal that uses the display / voice cooperation system, as in the system disclosed in Japanese Patent Laid-Open No. 2001-268241.
[0009]
Therefore, in order to use the display / voice cooperation system, the telephone number of the terminal device to be used had to be registered in advance through the display service of the WWW server. Because the system could be used only after personal information had been disclosed, it was difficult to promote its use.
[0010]
In addition, receiving a service provided by the display / voice cooperation system required a login operation, regardless of whether the login was actually needed to provide that service. The login operation is performed either by entering a telephone number or by entering a user name chosen during the user registration in which the telephone number is registered with the WWW server. Having to log in every time the display / voice cooperation system is used forced a troublesome operation on the user.
[0011]
An object of the present invention is to solve the problems described above and to enable a user to receive a display and voice cooperation service with a simple operation, without disclosing personal information.
[0012]
[Means for Solving the Problems]
In order to solve the above problems, the display / voice cooperation system of the present invention (for example, the display / voice cooperation system 10) comprises a terminal device connected to a communication network (for example, the user terminal 40), a WWW server that provides and collects information using Web pages (for example, the WWW server 20), and a voice dialogue server that executes voice dialogue processing by inputting and outputting information by voice via the communication network (for example, the voice dialogue server 30). The system includes a session database that stores session information indicating the flow of a series of processes performed between the terminal device and the WWW server, and the Web page data for displaying a Web page includes call data (for example, telephone number data) for placing a call to the voice dialogue server. In response to a Web page acquisition request from the terminal device, the WWW server determines character string data to be used as cooperation data (for example, the data shown in FIG. 4) for cooperation with the voice dialogue server, stores the cooperation data in the session database in association with the session information and with communication management information about the terminal device (for example, a session identifier such as a cookie) (for example, step S103), sets the cooperation data in the Web page data in association with the call data (for example, step S104), and then transmits the Web page data in which the cooperation data has been set to the terminal device (for example, step S105). The terminal device has a browser function (for example, the browser function 41) that displays a Web page based on the received Web page data, and a call function (for example, the call function 42) that, in response to a request from the user, places a call to the voice dialogue server using the call data included in the Web page data and outputs tones based on the character string data serving as the cooperation data associated with that call data. The voice dialogue server generates cooperation data by converting the tones from the terminal device into character string data (for example, step S110) and transmits the generated cooperation data to the WWW server. The WWW server then identifies, among the session information stored in the session database, the session information corresponding to the cooperation data received from the voice dialogue server, determines from the identified session information which voice service the terminal device has requested, and transmits the determination result to the voice dialogue server. The voice dialogue server then determines the execution content of the voice dialogue processing using the generated cooperation data, based on the determination result received from the WWW server (for example, step S111).
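The step of setting the cooperation data in the Web page data in association with the call data (step S104) can be pictured with a minimal Python sketch. The text only says that both items are written in a markup language, so everything concrete here is an assumption made for illustration: the function name `build_voice_link`, the use of a `tel:` link for the call data, and the `data-cooperation` attribute that carries the cooperation data.

```python
from html import escape

# Characters assumed to have a corresponding telephone tone (digits, "*", "#").
TONE_CHARS = set("0123456789*#")


def build_voice_link(phone_number: str, cooperation_data: str, label: str) -> str:
    """Return a hypothetical anchor element that pairs call data with cooperation data."""
    if not cooperation_data or not set(cooperation_data) <= TONE_CHARS:
        raise ValueError("cooperation data must consist of tone-capable characters")
    # The attribute name "data-cooperation" is illustrative, not taken from the patent.
    return (
        f'<a href="tel:{escape(phone_number, quote=True)}" '
        f'data-cooperation="{escape(cooperation_data, quote=True)}">'
        f"{escape(label)}</a>"
    )
```

A terminal that understands this hypothetical markup would dial the number in `href` and, once connected, emit the tones for the string in `data-cooperation`.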
  When identifying the session information corresponding to the cooperation data received from the voice dialogue server, the WWW server may be configured to identify, from the identified session information, the Web page data in which the same cooperation data as the received cooperation data is set, and to determine from the identified Web page data which voice service the terminal device has requested.
  The session database may be configured to store session information and communication management information in association with linkage data.
  The WWW server may be configured to determine a character string as linkage data by extracting a character string updated by a character string counter that updates a character string composed of numbers or symbols.
[0013]
With the above configuration, the WWW server and the voice dialogue server can cooperate using the cooperation data, and a user of the terminal device can receive the linked display and voice service with a simple operation and without disclosing personal information.
[0014]
The voice conversation server may be configured to determine the execution contents of the voice conversation processing so that the voice conversation matching the display contents of the Web page based on the Web page data for which the cooperation data has been set is performed.
[0015]
With the above-described configuration, it is possible to execute voice conversation processing that matches the content of the service by the Web page for which processing by voice conversation is selected, using the cooperation data.
[0016]
The WWW server may be configured to generate character string data in response to a Web page acquisition request from the terminal device (for example, step S102) and to determine link data that uses the generated character string data.
[0017]
With the above configuration, it is possible to achieve cooperation between the WWW server and the voice conversation server using the cooperation data generated by the WWW server.
[0018]
The voice dialogue server may be configured to execute the voice dialogue processing and then execute a process (for example, a process of transmitting the cooperation data and the voice dialogue processing result data to the WWW server 20) for saving, in the system (for example, in the WWW server 20 or a database server), voice dialogue processing result data indicating the result of the voice dialogue processing in association with the communication management information about the terminal device that is associated with the same cooperation data as the generated cooperation data.
[0019]
With the above configuration, the result of the voice dialogue processing by the voice dialogue server can be stored in the system in association with the information related to the terminal device that has received the service by the voice dialogue processing. Therefore, the WWW server can acquire information indicating the result of the voice conversation performed by the terminal device by specifying the information related to the terminal device.
[0020]
In response to a Web page acquisition request from the terminal device, the WWW server may be configured to acquire the voice dialogue processing result data stored in the system in association with the communication management information about the terminal device, and to transmit Web page data reflecting the voice dialogue processing result indicated by that data.
[0021]
With the above configuration, the result of the voice interaction process can be reflected on the Web page.
[0022]
The WWW server may be configured to acquire the voice dialogue processing result data stored in the system in association with the communication management information about the terminal device, and to transmit to the terminal device Web page data reflecting the voice dialogue processing result indicated by that data.
[0023]
With the above configuration, the result of the voice interaction process can be reflected on the Web page without causing the terminal device to make a Web page acquisition request.
[0024]
The cooperation data may be character string data consisting of a plurality of characters, arbitrarily combined, for each of which the terminal device can emit a corresponding tone.
[0025]
With the above-described configuration, it is possible to transmit / receive cooperative data as character string data via a communication network, and to transmit / receive cooperative data as sound data via a communication network.
[0026]
As the communication management information, terminal management information (for example, Cookie) managed corresponding to the terminal device may be used.
[0027]
With the above configuration, various types of information such as cooperation data can be stored in association with the terminal management information.
[0028]
  In addition, the display / voice cooperation server of the present invention (for example, a server comprising the WWW server 20 and the voice dialogue server 30) has a WWW server (for example, the WWW server 20) that provides and collects information using Web pages and a voice dialogue server (for example, the voice dialogue server 30) that executes voice dialogue processing by inputting and outputting information by voice via a communication network. The display / voice cooperation server includes a session database that stores session information indicating the flow of a series of processes performed between a terminal device connected to the communication network and the WWW server, and the Web page data for displaying a Web page includes call data for making a call to the voice dialogue server. In response to a Web page acquisition request from the terminal device, the WWW server determines character string data as cooperation data for cooperation with the voice dialogue server, stores the generated cooperation data in the session database in association with the session information and the communication management information about the terminal device, sets the generated cooperation data in the Web page data in association with the call data, and then transmits the Web page data in which the generated cooperation data is set to the terminal device. When called by the terminal device using the call data included in the Web page data, the voice dialogue server executes a process of establishing a connection with the terminal device. When the terminal device with which the connection has been established emits a tone based on the character string data serving as the cooperation data associated with the call data, the voice dialogue server generates the cooperation data by converting the tone into character string data, identifies, from the session information stored in the session database, the session information corresponding to the generated cooperation data, determines from the identified session information which voice service the terminal device has requested, and determines the execution content of the voice dialogue processing based on the determination result.
[0029]
With the above configuration, the WWW server and the voice dialogue server can be linked using the cooperation data, and a display and voice cooperation service can be provided that the user of the terminal device can use with a simple operation and without disclosing personal information.
[0030]
The voice conversation server may be configured to determine the execution contents of the voice conversation processing so that the voice conversation matching the display contents of the web page based on the web page data for which the cooperation data has been set is performed.
[0031]
With the above-described configuration, it is possible to execute voice conversation processing that matches the content of the service by the Web page for which processing by voice conversation is selected, using the cooperation data.
[0032]
  In addition, the display / voice cooperation method of the present invention is a method for coordinating a WWW server (for example, the WWW server 20) that provides and collects information using Web pages with a voice dialogue server (for example, the voice dialogue server 30) that executes voice dialogue processing by inputting and outputting information by voice via a communication network. The Web page data for displaying a Web page includes call data for making a call to the voice dialogue server. In response to a Web page acquisition request from a terminal device (for example, the user terminal 40) connected to the communication network, the WWW server executes a process of determining character string data as cooperation data for cooperation with the voice dialogue server (for example, step S102), a process of storing, in the session database and in association with the generated cooperation data, session information indicating the flow of a series of processes performed between the terminal device and the WWW server and communication management information about the terminal device (for example, step S103), a process of setting the generated cooperation data in the Web page data in association with the call data (for example, step S104), and a process of transmitting the Web page data in which the generated cooperation data is set to the terminal device (for example, step S105). When called by the terminal device using the call data included in the Web page data, the voice dialogue server executes a process of establishing a connection with the terminal device and, when the terminal device with which the connection has been established emits a tone based on the character string data serving as the cooperation data associated with the call data, generates the cooperation data by converting the tone into character string data (for example, step S110) and transmits the generated cooperation data to the WWW server. The WWW server further identifies, from the session information stored in the session database, the session information corresponding to the cooperation data received from the voice dialogue server, determines from the identified session information which voice service the terminal device has requested, and transmits the determination result to the voice dialogue server. The voice dialogue server then determines the execution content of the voice dialogue processing based on the determination result received from the WWW server (for example, step S111).
[0033]
With the above configuration, the WWW server and the voice dialogue server can be linked using the cooperation data, and a display and voice cooperation service can be provided that the user of the terminal device can use with a simple operation and without disclosing personal information.
[0034]
The voice conversation server may be configured to determine the execution contents of the voice conversation processing so that the voice conversation matching the display contents of the web page based on the web page data for which the cooperation data has been set is performed.
[0035]
With the above-described configuration, it is possible to execute voice conversation processing that matches the content of the service by the Web page for which processing by voice conversation is selected, using the cooperation data.
[0036]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing an example of the configuration of a display / voice cooperation system 10 according to an embodiment of the present invention. The display / voice cooperation system 10 includes a WWW server 20, a voice interaction server 30, and a user terminal 40. The WWW server 20 and the user terminal 40 are each connected to a communication network 50 such as the Internet. The voice interaction server 30 and the user terminal 40 are connected to a public telephone line network 60. The public telephone line network 60 is connected to the communication network 50. In the following description, a network including the public telephone line network 60 may be referred to as a communication network 50.
[0037]
The WWW server 20 is configured by an information processing device such as an Internet server, for example. The WWW server 20 manages Web page data written in a markup language such as HTML (Hypertext Markup Language), and has a function of providing and acquiring various types of information using Web pages displayed based on the Web page data. Web pages include, for example, pages for ordering products and pages for collecting questionnaires.
[0038]
In this example, the WWW server 20 manages each user who uses this system by using character string information for identifying a user called a cookie. Here, a process when user management is performed using Cookie will be briefly described. First, the WWW server 20 generates a cookie and transmits it to a user terminal (for example, the user terminal 40, specifically, a browser installed in the user terminal 40). When the cookie is acquired, the user terminal stores a file (cookie file) in which the cookie is stored. Thereafter, when accessing the WWW server 20, the user terminal transmits a Cookie file by the function of the browser installed in the user terminal. The WWW server 20 recognizes information about the user from the acquired cookie file. Since the information about the user can be recognized in this way, the WWW server 20 can grasp the latest user information.
[0039]
  The WWW server 20 has a database 21 (see the drawings) that stores, in addition to Web page data for displaying Web pages, session information indicating a session for each user terminal (the flow of a series of processes performed between the user terminal and the WWW server), a session identifier assigned to each user terminal (an example of communication management information used to manage, for each user terminal, various information related to communication such as its communication history), cooperation data described later, and the like. In this example, the above-described cookie is used as the session identifier. Each piece of session information is associated with the session identifier of the user terminal involved in the session that the session information indicates. Therefore, the session information associated with a session identifier is information indicating a session between the user terminal indicated by that session identifier and the WWW server 20.
[0040]
The voice dialogue server 30 has a voice recognition function for recognizing the voice indicated by voice data input via the public telephone line network 60, and a voice synthesis function for synthesizing voice based on character information and outputting voice data. Using the voice recognition function and the voice synthesis function, the voice dialogue server 30 executes voice dialogue processing for conveying and obtaining information by voice. In this example, the voice dialogue server 30 provides various services in cooperation with the WWW server 20. For example, it provides a service in which information to be entered in an information input area of a Web page operated by the WWW server 20 is acquired by voice dialogue processing. The voice dialogue server 30 has dictionary data for performing voice recognition and voice synthesis.
[0041]
As shown in FIG. 1, the user terminal 40 has a call function 42 for making a voice call with a connection destination via the public telephone line network 60, and a browser function 41 for displaying a Web page on a display device of the user terminal 40, such as an LCD (Liquid Crystal Display), and for entering characters and selecting information on the Web page using an input device of the user terminal 40. The user terminal 40 is configured by a mobile phone terminal such as a digital mobile phone conforming to the PDC (Personal Digital Cellular) standard, for example. The user terminal 40 has an environment (for example, software such as a browser, and hardware) that allows it to connect to the communication network 50 and to transmit and receive information over the communication network 50.
[0042]
Next, the operation of the display / voice cooperation system 10 of this example will be described with reference to the drawings. FIGS. 2 and 3 are timing charts showing an example of display / voice cooperation processing and processing timing in the display / voice cooperation system 10 of this example.
[0043]
First, the user terminal 40 accesses the WWW server 20 via the communication network 50 in accordance with a user operation (step S101). For example, it is accessed by specifying a URL (Uniform Resource Locator) of a Web page provided by the WWW server 20.
[0044]
When the user terminal 40 accesses the WWW server 20 and requests Web page data for displaying a Web page, the WWW server 20 first generates cooperation data for cooperation with the voice dialogue server 30 (step S102). As shown in FIG. 4, for example, the cooperation data is a character string in which a plurality of numbers and symbols (for example, “#”) for which a telephone can be instructed to emit a specific tone are arbitrarily combined. In this example, the cooperation data is generated so that it does not match any other cooperation data that has already been generated and saved.
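Step S102 can be sketched as follows. This is a minimal illustration, assuming the tone-capable characters are the digits and “#” mentioned above; the function name, the default length, and the use of Python's `secrets` module are assumptions made for the example and are not taken from the text.

```python
import secrets

# Assumed alphabet: the digits plus "#", as in the example given in the text.
TONE_ALPHABET = "0123456789#"

def generate_cooperation_data(existing, length=8):
    """Return a string over TONE_ALPHABET that matches nothing in `existing`."""
    while True:
        candidate = "".join(secrets.choice(TONE_ALPHABET) for _ in range(length))
        if candidate not in existing:  # retry until no saved string matches
            return candidate

# Hypothetical set of cooperation data that has already been generated and saved.
saved = {"12#45678"}
new_data = generate_cooperation_data(saved)
saved.add(new_data)
```

The retry loop is one simple way to satisfy the requirement that new cooperation data not match any saved cooperation data; other strategies would serve equally well.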
[0045]
In this example, the Web page data includes voice dialogue selection area display data, for displaying on the Web page a voice dialogue selection area for selecting processing by voice dialogue with the voice dialogue server 30, and telephone number data indicating a telephone number for making a call to the voice dialogue server 30, in a state where the two are associated with each other. That is, the voice dialogue selection area display data and the telephone number data are written in the Web page data in the markup language. The Web page data also contains a markup-language description instructing that, when the voice dialogue selection area indicated by the voice dialogue selection area display data is selected, a call be made using the telephone number indicated by the telephone number data.
[0046]
  Next, the WWW server 20 stores the generated cooperation data in the database 21 in a state associated with the session identifier for the user terminal 40 (step S103). The figure is an explanatory diagram showing an example of the storage state of the database 21. As shown in the figure, various information such as session information, cooperation data, and voice dialogue result information is stored in a state associated with each session identifier. Therefore, by checking the session information and session identifier corresponding to a given cooperation data, it is possible to identify which user terminal acquired the Web page data in which that cooperation data is set.
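The storage state described for step S103 can be pictured with a small in-memory stand-in for the database 21. The record fields follow the text (session information, cooperation data, voice dialogue result); the class name, function names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SessionRecord:
    session_info: str                   # e.g. which Web page was served
    cooperation_data: str               # string set in that Web page data
    voice_result: Optional[str] = None  # filled in after the voice dialogue

# Hypothetical in-memory stand-in for database 21, keyed by session identifier.
database = {}

def store(session_id, session_info, cooperation_data):
    """Step S103: save the cooperation data under the session identifier."""
    database[session_id] = SessionRecord(session_info, cooperation_data)

def find_session_id(cooperation_data):
    """Return the session identifier whose record holds this cooperation data."""
    for session_id, record in database.items():
        if record.cooperation_data == cooperation_data:
            return session_id
    return None

store("cookie-abc", "ticket-change page", "12#45")
store("cookie-xyz", "ticket-cancel page", "98#76")
```

Looking a record up by its cooperation data is what lets the system tell which user terminal acquired a given page without knowing anything else about the caller.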
[0047]
Further, the WWW server 20 sets the generated cooperation data in the Web page data to be transmitted to the user terminal 40 (step S104). Specifically, the cooperation data is written at a predetermined location in the Web page data written in the markup language. In this example, the cooperation data is set in a state in which it is associated with the telephone number data, included in the Web page data, that indicates the telephone number of the voice dialogue server 30.
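One way to picture step S104 is a link element that carries both the telephone number and the cooperation data. The text does not specify the markup, so the sketch below is only an assumption: it uses an HTML `tel:` link for the telephone number and a hypothetical `data-cooperation-data` attribute for the cooperation data, with a placeholder number.

```python
from html import escape

def set_cooperation_data(phone_number, cooperation_data, label):
    """Return link markup associating cooperation data with a telephone number.

    The attribute name `data-cooperation-data` is hypothetical.
    """
    return (
        f'<a href="tel:{escape(phone_number, quote=True)}" '
        f'data-cooperation-data="{escape(cooperation_data, quote=True)}">'
        f"{escape(label)}</a>"
    )

# Placeholder number and cooperation data, for illustration only.
link = set_cooperation_data("0000000000", "12#45", "Enter changes by voice")
```

Keeping the two values on the same element mirrors the requirement that the cooperation data be associated with the telephone number data rather than merely present somewhere in the page.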
[0048]
Next, the WWW server 20 transmits the Web page data set with the cooperation data to the user terminal 40 via the communication network 50 (step S105). The transmitted Web page data includes telephone number data indicating the telephone number of the voice interaction server 30 and the cooperation data generated in step S102.
[0049]
When the user terminal 40 receives the Web page data, the browser function 41 displays the Web page based on the received Web page data on its own display device (step S106).
[0050]
  FIG. 6 is an explanatory diagram showing an example of the display state of a Web page displayed on the user terminal 40. Here, a case will be described as an example where the WWW server 20 provides services such as ticket reservation reception, ticket reservation content change, and ticket reservation cancellation. FIG. 6 shows an example of the display state of a Web page for changing the reservation contents of a ticket. As shown in FIG. 6, the Web page is provided with a display of the current ticket reservation contents, an input area for entering the changed reservation contents, and a voice dialogue selection area 70 that is selected in order to enter the changed reservation contents by voice dialogue.
[0051]
When the voice dialogue selection area 70 is pressed on the Web page, the browser function 41 of the user terminal 40 calls the call function 42 (step S107) and instructs it to make a call using the telephone number indicated by the telephone number data associated with the voice dialogue selection area display data for displaying the voice dialogue selection area 70. In accordance with the instruction from the browser function 41, the called call function 42 makes a call to the voice dialogue server 30 using the telephone number indicated by the telephone number data set in the Web page data (step S108).
[0052]
When it is confirmed that the communication line has been connected in response to the call from the user terminal 40, the user terminal 40 emits, based on the character string indicated by the cooperation data associated with the telephone number data used for the call, a tone corresponding to each character of that character string (step S109).
[0053]
When a tone from the user terminal 40 is input, the voice dialogue server 30 generates a character string corresponding to the input tone, thereby generating linkage data based on the character string (step S110).
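Steps S109 and S110 amount to a character-to-tone mapping on the terminal and the inverse mapping on the server. The text only says "tone"; the sketch below assumes standard DTMF signalling, in which each key is a pair of one low and one high frequency. Signal detection itself is out of scope, so tones are represented directly as frequency pairs, and the function names are hypothetical.

```python
# Standard DTMF keypad layout: rows share a low frequency, columns a high one (Hz).
LOW = (697, 770, 852, 941)
HIGH = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")

TONE_TO_CHAR = {
    (low, high): KEYS[row][col]
    for row, low in enumerate(LOW)
    for col, high in enumerate(HIGH)
}
CHAR_TO_TONE = {char: pair for pair, char in TONE_TO_CHAR.items()}

def tones_for(text):
    """Terminal side (step S109): one frequency pair per character."""
    return [CHAR_TO_TONE[char] for char in text]

def decode_tones(pairs):
    """Server side (step S110): rebuild the character string from the tones."""
    return "".join(TONE_TO_CHAR[pair] for pair in pairs)

received = decode_tones(tones_for("12#45"))
```

Because the mapping is one-to-one, the string rebuilt by the server is identical to the cooperation data the WWW server set in the page, which is what makes the later lookup possible.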
[0054]
When the cooperation data is generated, the voice dialogue server 30 determines the execution content of the voice dialogue processing using the generated cooperation data (step S111). Specifically, for example, the voice dialogue server 30 first transmits the generated cooperation data to the WWW server 20, and the WWW server 20 searches the database 21 for the session information associated with the same cooperation data as the cooperation data generated in step S110 (this may be a part of the session information, such as the most recently added several bytes of data). Next, the WWW server 20 identifies, from the retrieved session information, the Web page data in which the same cooperation data as the cooperation data generated in step S110 is set. The identified Web page data makes it possible to confirm through which Web page the user terminal 40 made the call to the voice dialogue server 30. From the identified Web page data, the WWW server 20 confirms which service the user intended to receive by voice using the user terminal 40, and transmits the confirmation result to the voice dialogue server 30. The voice dialogue server 30 then determines the content of the voice dialogue processing to be executed based on the information indicating the received confirmation result. For example, when it is determined that the call to the voice dialogue server 30 was made via the Web page shown in FIG. 6, the voice dialogue server 30 decides to execute voice dialogue processing for changing the ticket reservation by voice dialogue. The execution content of the voice dialogue processing in step S111 may be determined in this way.
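The exchange in step S111 can be sketched as two functions, one per server. All table contents, service names, and prompts below are hypothetical placeholders; in the system described, the first lookup would go through the session information in the database 21 and the two functions would communicate over a network rather than by direct call.

```python
# Hypothetical WWW server data: cooperation data -> page served in that session,
# and page -> the voice service that page offers.
SESSION_PAGE = {"12#45": "ticket_change", "98#76": "ticket_cancel"}
PAGE_SERVICE = {
    "ticket_change": "change_reservation",
    "ticket_cancel": "cancel_reservation",
}

# Hypothetical opening prompts held by the voice dialogue server.
DIALOGUES = {
    "change_reservation": "Please say the new date for your reservation.",
    "cancel_reservation": "Please confirm that you want to cancel.",
}

def www_determine_service(cooperation_data):
    """WWW server side: identify the page for this cooperation data, then its service."""
    page = SESSION_PAGE.get(cooperation_data)
    return PAGE_SERVICE.get(page)

def voice_decide_dialogue(cooperation_data):
    """Voice dialogue server side: ask the WWW server, then pick the dialogue to run."""
    service = www_determine_service(cooperation_data)
    if service is None:
        return None  # unknown cooperation data: no dialogue can be selected
    return DIALOGUES[service]

prompt = voice_decide_dialogue("12#45")
```

The voice dialogue server never sees the session identifier or any caller information here; the cooperation data alone selects the dialogue.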
[0055]
Note that the execution content of the voice dialogue processing in step S111 may be determined in any way. For example, the system may include a database server that manages the data stored in the database 21, and the voice dialogue server 30 may access the database server to confirm which service the user intended to receive by voice using the user terminal 40.
[0056]
When the execution content of the voice dialogue processing is determined, the voice dialogue server 30 executes the determined voice dialogue processing, in which it notifies the user terminal 40 of information by voice and obtains information from the user terminal 40 (step S112).
[0057]
When the voice dialogue processing is completed, the voice dialogue server 30 transmits voice dialogue processing result data indicating the result of the voice dialogue processing, together with the character string cooperation data generated in step S110, to the WWW server 20 (step S113). The WWW server 20 stores the received voice dialogue processing result data in association with the session identifier associated with the same cooperation data as the received cooperation data (step S114).
[0058]
When the voice dialogue processing is finished, the call function 42 of the user terminal 40 calls the browser function 41 (step S115). The called browser function 41 requests the WWW server 20 to update the display information displayed on the display device of the user terminal 40 (step S116). In response to the update request, the WWW server 20 reads out the voice dialogue processing result data associated with the session identifier for the user terminal 40 and creates Web page data reflecting the result of the voice dialogue processing (step S117). It then transmits the Web page data reflecting the result of the voice dialogue processing (step S118).
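Steps S113 to S118 can be sketched as a store keyed by cooperation data followed by a read keyed by session identifier. The dictionary layout, function names, sample result, and page text are hypothetical; a real page would be markup rather than a single line of text.

```python
# Hypothetical stand-in for database 21: session identifier -> stored fields.
database = {
    "cookie-abc": {"cooperation_data": "12#45", "voice_result": None},
}

def store_voice_result(cooperation_data, result):
    """Step S114: attach the result to the session whose cooperation data matches."""
    for record in database.values():
        if record["cooperation_data"] == cooperation_data:
            record["voice_result"] = result
            return True
    return False  # no session holds this cooperation data

def render_page(session_id):
    """Steps S117 and S118: build page text reflecting any stored result."""
    result = database[session_id]["voice_result"]
    changed = result if result is not None else "(not entered)"
    return f"Changed reservation: {changed}"

store_voice_result("12#45", "2 tickets, evening show")
page = render_page("cookie-abc")
```

The write side uses the cooperation data sent by the voice dialogue server, while the read side uses the session identifier sent by the browser; the record in the database is what joins the two channels.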
[0059]
When the Web page data is received, the browser function 41 of the user terminal 40 displays a Web page based on the received Web page data (step S119). For example, as shown in FIG. 7, the display content of the Web page reflects the information entered through the voice dialogue processing. FIG. 7 is an explanatory diagram showing an example of the display state of the Web page after voice dialogue is selected on the Web page shown in FIG. 6, information indicating the changes to the ticket reservation is entered by voice dialogue processing, and the result of the voice dialogue processing is reflected.
[0060]
As described above, the cooperation data, a character string generated by the WWW server 20, is conveyed to the voice dialogue server 30 as tones based on that cooperation data. Using the cooperation data generated by the WWW server 20, the voice dialogue server 30 can therefore confirm through which Web page the user terminal 40 was connected, and the Web page service of the WWW server 20 and the voice dialogue service of the voice dialogue server 30 can be linked. Because display and voice are linked using the cooperation data, the user of the terminal device can receive the linked display and voice service without disclosing personal information such as the telephone number of the user terminal. Since the user terminal 40 does not need to notify its caller telephone number when calling the voice dialogue server 30, there is no need to disclose personal information either to the WWW server 20 or to the voice dialogue server 30.
[0061]
In addition, as described above, since user management (terminal management) is performed using user management information (terminal management information) called a cookie, the user can receive the linked display and voice service with a simple operation that requires no login.
[0062]
In addition, as described above, the cooperation data by the character string generated by the WWW server 20 is stored in association with the session information, and the voice conversation server 30 transmits the voice conversation result data together with the cooperation data. The WWW server 20 can store information indicating the result of the voice conversation in association with the session information about the user terminal 40 that has performed the voice conversation.
[0063]
In the above-described embodiment, the voice dialogue server 30 executes the voice dialogue processing by performing voice recognition and voice synthesis, but the voice dialogue may instead be conducted by a human operator. In other words, an operation center may be provided in place of the voice dialogue server 30. In this case, the operation center generates the character string cooperation data from the tones input after the communication line is connected, in the same manner as the voice dialogue server 30 described above, but the voice dialogue itself is conducted by the operator.
[0064]
In the above-described embodiment, the WWW server 20 generates cooperation data from an arbitrary character string when it receives a Web page data transmission request. When cooperation in only one direction is intended (that is, when only the processing up to step S112 of the above-described display / voice cooperation processing is performed), cooperation data based on a predetermined character string may be used instead. In this case, it suffices to set predetermined cooperation data (separate cooperation data determined for each Web page data) in each Web page data in advance, and to provide the voice dialogue server 30 with a database that stores information indicating the content of the Web page (for example, that it is a Web page for ticket reservation) based on the Web page data in which each cooperation data is set. The predetermined character string described above may also be combined with an arbitrary character string generated when the WWW server 20 receives a Web page data transmission request, and the combination used as cooperation data. In this case, the voice dialogue server 30 may be provided with a database that stores information indicating the content of the Web page (for example, that it is a Web page for ticket reservation) based on the Web page data in which each cooperation data is set, and the voice dialogue processing result may be stored in the system in association with the arbitrary character string.
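The combined form of cooperation data can be sketched as a fixed-width page code followed by a per-request part. The text does not say how the two strings are combined, so the fixed prefix width, the page codes, and the table contents below are all assumptions made for illustration.

```python
import secrets

PAGE_CODE_LENGTH = 3  # assumed fixed width of the predetermined part

# Hypothetical table held by the voice dialogue server: page code -> page content.
PAGE_CONTENTS = {"101": "ticket reservation", "102": "ticket reservation change"}

def make_combined(page_code, random_length=5):
    """Append an arbitrary digit string to the page's predetermined code."""
    suffix = "".join(secrets.choice("0123456789") for _ in range(random_length))
    return page_code + suffix

def split_combined(cooperation_data):
    """Return (page content or None, arbitrary part) for a combined string."""
    page_code = cooperation_data[:PAGE_CODE_LENGTH]
    arbitrary_part = cooperation_data[PAGE_CODE_LENGTH:]
    return PAGE_CONTENTS.get(page_code), arbitrary_part

combined = make_combined("102")
content, arbitrary_part = split_combined(combined)
```

With this layout the voice dialogue server can choose the dialogue from its own table, and the arbitrary part remains available as the key under which the dialogue result is stored.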
[0065]
In the above-described embodiment, the WWW server 20 generates the character string data used as cooperation data so that it does not match any other cooperation data already generated and stored. Alternatively, regardless of the contents of other cooperation data, the WWW server 20 may generate the character string used as cooperation data by extracting it from a character update counter (an example of character update means) that updates a character string made up of numbers and symbols that can be designated to cause a telephone to emit specific tones. Even with this configuration, increasing the number of digits of the character string used as cooperation data can prevent generation of cooperation data that matches other cooperation data. In this case, the character update counter may be configured to randomly update a character string having a predetermined number of digits, composed of, for example, the numbers 0 to 9 and symbols such as "#". In step S102, the WWW server 20 may then extract a character string from the character update counter and generate the character string data used as cooperation data. With this configuration, the character string data used as cooperation data is determined at random, so that a user cannot predict, from the cooperation data given to that user, the cooperation data given to others immediately before or after. By contrast, if character string data were generated in a regular order, the cooperation data assigned to one user terminal and the cooperation data assigned to the next user terminal to access the server would be consecutive numbers, and the user of a terminal could easily predict the cooperation data given to others from the cooperation data given to that user.
When cooperation data is generated using the character update counter, however, the cooperation data given to others cannot be inferred. This prevents a user from transmitting cooperation data given to another person to the voice dialogue server 30 and thereby causing the voice dialogue server 30 or the WWW server 20 to operate erroneously. The system 10 can therefore be operated safely.
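A minimal Python sketch of random cooperation data generation follows. It combines the random selection discussed in this paragraph with the uniqueness check of the embodiment. The character set, the default of 8 digits, and the function name are assumptions made for illustration; the specification does not fix them.

```python
import secrets
from typing import Set

# Characters assumed to correspond to telephone key tones (numbers 0 to 9 and "#").
TONE_CHARS = "0123456789#"


def generate_cooperation_data(in_use: Set[str], digits: int = 8) -> str:
    """Draw random strings until one is not already issued, then record and return it."""
    while True:
        candidate = "".join(secrets.choice(TONE_CHARS) for _ in range(digits))
        if candidate not in in_use:
            in_use.add(candidate)
            return candidate
```

Because each character is drawn independently at random, consecutive values are not serial numbers, which is the property the paragraph relies on. The `secrets` module is used here rather than `random` because the values are meant to be hard to guess.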
[0066]
Note that the character update counter is not limited to one that updates a character string having the same number of digits as the character string to be generated; it may randomly update a character or character string of another length, such as a single digit. In this case, the WWW server 20 extracts a character or character string from the character update counter repeatedly until it has extracted as many characters as the character string to be generated requires, and combines the extracted characters or character strings to generate the character string used as cooperation data.
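The repeated extraction described here might look like the following Python sketch. The class `CharacterUpdateCounter`, its `extract` method, and the truncation of any surplus characters are assumptions of this illustration rather than details given in the specification.

```python
import secrets


class CharacterUpdateCounter:
    """Models a counter whose value is a random string of `width` characters;
    each read yields a fresh value."""

    def __init__(self, alphabet: str = "0123456789#", width: int = 1) -> None:
        self.alphabet = alphabet
        self.width = width

    def extract(self) -> str:
        return "".join(secrets.choice(self.alphabet) for _ in range(self.width))


def build_cooperation_data(counter: CharacterUpdateCounter, digits: int) -> str:
    """Extract repeatedly and concatenate until `digits` characters are collected."""
    parts = []
    collected = 0
    while collected < digits:
        piece = counter.extract()
        parts.append(piece)
        collected += len(piece)
    # If the counter width does not divide `digits`, drop the surplus characters.
    return "".join(parts)[:digits]
```

With a one-digit counter, an 8-digit string takes eight extractions; with a three-digit counter it takes three, and the ninth character is discarded.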
[0067]
In the above-described embodiment, a cookie is used as the session identifier. However, when the system is used by a user terminal whose browser function 41 cannot handle cookies, the WWW server 20 may add a parameter for identifying the session to the Web page data (for example, to a URL) transmitted to the user terminal, and manage the session with each user terminal based on that parameter.
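As one possible illustration of the URL-parameter approach, the Python sketch below appends a session parameter to a link and reads it back. The parameter name `sid` and the example URL are hypothetical; only standard-library functions from `urllib.parse` are used.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sid"  # hypothetical parameter name


def add_session_param(url: str, session_id: str) -> str:
    """Return `url` with the session parameter set, replacing any existing one."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_session_param(url: str) -> Optional[str]:
    """Return the session identifier carried in `url`, or None if absent."""
    for key, value in parse_qsl(urlsplit(url).query):
        if key == SESSION_PARAM:
            return value
    return None
```

For example, `add_session_param("http://example.com/reserve?page=2", "abc123")` yields `http://example.com/reserve?page=2&sid=abc123`, and the server would rewrite every link in the page it returns in the same way.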
[0068]
In the above-described embodiment, the user terminal 40 is described as a mobile phone terminal. However, any other terminal device, such as a PDA (Personal Digital Assistant) or a personal computer, may be used as long as it has both a browser function and a call function.
[0069]
In the above-described embodiment, the voice dialogue server 30 transmits the voice dialogue processing result data and the character string cooperation data to the WWW server 20 in step S113. Alternatively, the system may include a database server that manages the stored data, and the voice dialogue server 30 may transmit the data to that database server. In this case, the database server stores the received voice dialogue processing result data in association with the session identifier that is associated with the same cooperation data as the received cooperation data, and transmits the stored voice dialogue processing result data to the WWW server 20 in response to a voice dialogue processing result inquiry from the WWW server 20. In other words, the information stored in the database 21 need not be managed by the WWW server 20; it may be managed by another server included in the system (for example, the voice dialogue server 30 or the database server), or managed jointly by a plurality of servers. That is, as long as the information stored in the database 21 is kept in a state in which the present system can manage it, it may be stored anywhere and in any form.
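The role of such a database server can be sketched in Python as below. The class and method names and the in-memory dictionaries are assumptions for illustration; the specification does not prescribe a storage format or interface.

```python
from typing import Dict, Optional


class DatabaseServer:
    """In-memory stand-in for a database server holding the data of the database 21."""

    def __init__(self) -> None:
        self._session_by_data: Dict[str, str] = {}    # cooperation data -> session identifier
        self._result_by_session: Dict[str, str] = {}  # session identifier -> result data

    def register(self, cooperation_data: str, session_id: str) -> None:
        """Record which session a piece of cooperation data was issued for."""
        self._session_by_data[cooperation_data] = session_id

    def save_result(self, cooperation_data: str, result: str) -> bool:
        """Store a dialogue result under the session tied to the received cooperation data."""
        session_id = self._session_by_data.get(cooperation_data)
        if session_id is None:
            return False  # unknown cooperation data: nothing is stored
        self._result_by_session[session_id] = result
        return True

    def query_result(self, session_id: str) -> Optional[str]:
        """Answer a result inquiry from the WWW server for one session."""
        return self._result_by_session.get(session_id)
```

The voice dialogue server would call `save_result` with the cooperation data it decoded, and the WWW server would call `query_result` with the session identifier of the requesting terminal, so neither side needs the other's key.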
[0070]
In the above-described embodiment, the WWW server 20 acquires the voice dialogue processing result data (step S117) in response to a Web page acquisition request from the user terminal 40 (the display information update request in step S116), and transmits Web page data reflecting the voice dialogue processing result indicated by that data (step S118). Alternatively, after the voice dialogue processing result data is saved (step S114), the WWW server 20 may transmit the Web page data reflecting the voice dialogue processing result to the user terminal 40 regardless of whether a Web page acquisition request has been made. With this configuration, the result of the voice dialogue processing can be reflected on the Web page without requiring the user terminal 40 to make a Web page acquisition request.
[0071]
In the above-described embodiment, the user terminal 40 calls the voice dialogue server 30 in step S108 and, after the connection is established, emits a tone in step S109. Alternatively, the user terminal 40 may output the telephone number and the tone based on the character string of the cooperation data at the same time, and an exchange connected to the public telephone line network 60 may output the tone toward the voice dialogue server 30 after establishing the connection between the user terminal 40 and the voice dialogue server 30.
[0072]
In each of the above-described embodiments, HTML is used as an example of the display language for generating Web page data. However, a markup language for mobile phone terminals that is widely used for displaying Web pages on mobile phone browsers, such as C-HTML (Compact HTML), or another markup language such as HDML (Handheld Device Markup Language) or WML (Wireless Markup Language), may be used instead.
[0073]
Furthermore, in each of the embodiments described above, both the voice dialogue server and the user terminal are connected to the public telephone line network 60, but they may instead be connected via an IP network that carries voice using, for example, VoIP (Voice over Internet Protocol).
[0074]
[Effects of the invention]
As described above, according to the display / voice cooperation system of the present invention, a session database is provided that stores session information indicating the flow of a series of processes performed between the terminal device and the WWW server. In response to a Web page acquisition request from the terminal device, the WWW server determines character string data to be used as cooperation data for cooperating with the voice dialogue server, stores the cooperation data in the session database in association with the session information and with communication management information about the terminal device, sets the cooperation data in the Web page data in association with the call data, and then transmits the Web page data in which the cooperation data is set to the terminal device. The terminal device has a browser function for displaying the Web page based on the received Web page data, and a call function for, in response to a request from the user, calling the voice dialogue server using the call data included in the Web page data and outputting a tone based on the character string data that is the cooperation data associated with the call data. The voice dialogue server generates cooperation data by converting the tone from the terminal device into character string data, and transmits the generated cooperation data to the WWW server. The WWW server identifies, from among the session information stored in the session database, the session information corresponding to the cooperation data received from the voice dialogue server, determines from the identified session information which voice service the terminal device is requesting, and transmits the determination result to the voice dialogue server. The voice dialogue server then determines the execution content of the voice dialogue processing based on the determination result received from the WWW server. With this configuration, the WWW server and the voice dialogue server can cooperate using the cooperation data, and a user of the terminal device can receive a display / voice cooperation service with simple operations and without disclosing personal information.
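The exchange between the two servers summarized in this paragraph can be sketched as follows. This is a simplified Python model under stated assumptions: the class names, the `PAGE_TO_SERVICE` table, and the string values are hypothetical, and network transport, tone decoding, and the dialogue itself are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SessionInfo:
    session_id: str      # session identifier (a cookie value in the embodiment)
    terminal_info: str   # communication management information; format is hypothetical
    page: str            # Web page in which the cooperation data was set


# Hypothetical mapping from a Web page to the voice service that matches it.
PAGE_TO_SERVICE = {
    "/tickets": "ticket reservation dialogue",
    "/ringtones": "ringtone title dialogue",
}


class WwwServer:
    def __init__(self) -> None:
        self.session_db: Dict[str, SessionInfo] = {}  # cooperation data -> session info

    def store(self, cooperation_data: str, info: SessionInfo) -> None:
        """Associate issued cooperation data with its session information."""
        self.session_db[cooperation_data] = info

    def determine_service(self, cooperation_data: str) -> Optional[str]:
        """Identify the session for the received data and decide the requested service."""
        info = self.session_db.get(cooperation_data)
        if info is None:
            return None
        return PAGE_TO_SERVICE.get(info.page)


class VoiceDialogueServer:
    def __init__(self, www_server: WwwServer) -> None:
        self.www_server = www_server

    def select_dialogue(self, decoded_tones: str) -> str:
        """Send decoded cooperation data to the WWW server and act on its determination."""
        service = self.www_server.determine_service(decoded_tones)
        return service if service is not None else "no matching session"
```

Note that the voice dialogue server in this sketch never sees a telephone number or a user name; the only key passed between the servers is the cooperation data.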
[0075]
Since the voice dialogue server is configured to determine the execution content of the voice dialogue processing so that a voice dialogue matching the display content of the Web page based on the Web page data in which the cooperation data is set is performed, the cooperation data can be used to execute voice dialogue processing that matches the content of the service provided by the Web page from which the voice dialogue processing was selected.
[0076]
Since the WWW server is configured to generate character string data in response to a Web page acquisition request from the terminal device and to determine the cooperation data using the generated character string data, the cooperation data generated by the WWW server can be used to make the WWW server and the voice dialogue server cooperate.
[0077]
Since the voice dialogue server is configured to execute voice dialogue processing and to store, in the system, voice dialogue processing result data indicating the result of that processing in association with the communication management information about the terminal device that is associated with the same cooperation data as the generated cooperation data, the result of the voice dialogue processing by the voice dialogue server can be stored in the system in association with information about the terminal device that received the service by the voice dialogue processing.
[0078]
Since the WWW server is configured to acquire, in response to a Web page acquisition request from the terminal device, the voice dialogue processing result data stored in the system in association with the communication management information about the terminal device, and to transmit Web page data reflecting the voice dialogue processing result indicated by that data, the result of the voice dialogue processing can be reflected on the Web page.
[0079]
Since the WWW server is configured to acquire the voice dialogue processing result data stored in the system in association with the communication management information about the terminal device, and to transmit, to the terminal device, Web page data reflecting the voice dialogue processing result indicated by that data, the result of the voice dialogue processing can be reflected on the Web page without causing the terminal device to make a Web page acquisition request.
[0080]
Since the cooperation data is character string data composed of a plurality of characters, each of which causes the terminal device to emit a tone corresponding to that character, the cooperation data can be transmitted and received as character string data via the communication network, and can also be transmitted and received as sound data via the communication network.
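To illustrate how one string can travel both as characters and as sound, the Python sketch below maps each character to a pair of tone frequencies and back. It assumes that the tones are standard DTMF key tones, which the specification does not state explicitly; the function names are likewise introduced here.

```python
from typing import Dict, List, Tuple

# Standard DTMF frequencies in Hz for a 12-key telephone keypad (assumed tone scheme).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYPAD = ("123", "456", "789", "*0#")

CHAR_TO_TONE: Dict[str, Tuple[int, int]] = {
    ch: (ROW_HZ[r], COL_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, ch in enumerate(row)
}
TONE_TO_CHAR = {pair: ch for ch, pair in CHAR_TO_TONE.items()}


def encode(cooperation_data: str) -> List[Tuple[int, int]]:
    """Terminal side: turn each character into its (low, high) frequency pair."""
    return [CHAR_TO_TONE[ch] for ch in cooperation_data]


def decode(tones: List[Tuple[int, int]]) -> str:
    """Voice dialogue server side: recover the character string from detected pairs."""
    return "".join(TONE_TO_CHAR[pair] for pair in tones)
```

Under this assumption the key "1" corresponds to the pair (697, 1209) and "#" to (941, 1477). Detecting the frequencies in the actual audio signal is outside the scope of this sketch.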
[0081]
Since the WWW server includes character update means for updating at least one character and is configured to determine the character string data used as cooperation data based on characters extracted from the character update means, the character string data used as cooperation data can be determined at random. A user therefore cannot predict, from the cooperation data given to that user, the cooperation data given to others immediately before or after. This prevents cooperation data given to another person from being sent to the voice dialogue server and processed, so that the system can be operated safely.
[0082]
Since terminal management information managed for each terminal device is used as the communication management information, various information such as cooperation data can be stored in association with the terminal management information.
[0083]
In addition, according to the display / voice cooperation server of the present invention, a session database is provided that stores session information indicating the flow of a series of processes performed between a terminal device connected to a communication network and the WWW server. In response to a Web page acquisition request from the terminal device, the WWW server determines character string data as cooperation data for cooperating with the voice dialogue server, stores the generated cooperation data in the session database in association with the session information and with communication management information about the terminal device, sets the generated cooperation data in the Web page data in association with the call data, and then transmits the Web page data in which the generated cooperation data is set to the terminal device. The voice dialogue server executes processing for establishing a connection with the terminal device in response to the terminal device calling the voice dialogue server using the call data included in the Web page data. When the connected terminal device emits a tone based on the character string data that is the cooperation data associated with the call data, the voice dialogue server generates cooperation data by converting the tone into character string data, identifies, from among the session information stored in the session database, the session information corresponding to the generated cooperation data, determines from the identified session information which voice service the terminal device is requesting, and determines the execution content of the voice dialogue processing based on the determination result. With this configuration, the WWW server and the voice dialogue server can cooperate using the cooperation data, and a display / voice cooperation service that can be used with simple operations, without disclosing personal information, can be provided to the user of the terminal device.
[0084]
Since the voice dialogue server is configured to determine the execution content of the voice dialogue processing so that a voice dialogue matching the display content of the Web page based on the Web page data in which the cooperation data is set is performed, the cooperation data can be used to execute voice dialogue processing that matches the content of the service provided by the Web page from which the voice dialogue processing was selected.
[0085]
In addition, according to the display / voice cooperation method of the present invention, the WWW server, in response to a Web page acquisition request from a terminal device connected to the communication network, determines character string data as cooperation data for cooperating with the voice dialogue server, stores the generated cooperation data in the session database in association with session information indicating the flow of a series of processes performed between the terminal device and the WWW server and with communication management information about the terminal device, sets the generated cooperation data in the Web page data in association with the call data, and then transmits the Web page data in which the generated cooperation data is set to the terminal device. The voice dialogue server executes processing for establishing a connection with the terminal device in response to the terminal device calling the voice dialogue server using the call data included in the Web page data, generates cooperation data by converting into character string data the tone that the connected terminal device emits based on the character string data that is the cooperation data associated with the call data, and transmits the generated cooperation data to the WWW server. The WWW server further identifies, from among the session information stored in the session database, the session information corresponding to the cooperation data received from the voice dialogue server, determines from the identified session information which voice service the terminal device is requesting, and transmits the determination result to the voice dialogue server. The voice dialogue server then determines the execution content of the voice dialogue processing based on the determination result received from the WWW server. The WWW server and the voice dialogue server can thus cooperate using the cooperation data, and a display / voice cooperation service that can be used with simple operations, without disclosing personal information, can be provided.
[0086]
Since the voice dialogue server is configured to determine the execution content of the voice dialogue processing so that a voice dialogue matching the display content of the Web page based on the Web page data in which the cooperation data is set is performed, the cooperation data can be used to execute voice dialogue processing that matches the content of the service provided by the Web page from which the voice dialogue processing was selected.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an example of a configuration of a display / voice cooperation system according to an embodiment of the present invention.
FIG. 2 is a timing chart showing an example of display / voice cooperation processing and processing timing in an embodiment of the present invention.
FIG. 3 is a timing chart showing an example of display / voice cooperation processing and processing timing in an embodiment of the present invention.
FIG. 4 is an explanatory diagram showing an example of cooperation data.
FIG. 5 is an explanatory diagram showing an example of a database storage state.
FIG. 6 is an explanatory diagram illustrating an example of a display state of a Web page.
FIG. 7 is an explanatory diagram illustrating an example of a display state of a Web page in which a voice conversation processing result is reflected.
FIG. 8 is a block diagram showing an example of the configuration of a conventional display / voice cooperation system.
[Explanation of symbols]
10 Display / Voice Cooperation System
20 WWW server
21 Database
30 Spoken Dialogue Server
40 User terminal
41 Browser function
42 Call function
50 Communication network
60 Public telephone line
70 Voice dialogue selection area

Claims

A display / voice cooperation system comprising a terminal device connected to a communication network, a WWW server that provides and collects information using a Web page, and a voice dialogue server that executes voice dialogue processing by inputting and outputting information by voice via the communication network,
A session database for storing session information indicating a flow of a series of processing performed between the terminal device and the WWW server;
Web page data for displaying the Web page includes call data for calling the voice dialogue server,
The WWW server, in response to a Web page acquisition request from the terminal device, determines character string data to be used as cooperation data for realizing cooperation with the voice dialogue server, stores the cooperation data in the session database in association with the session information and with communication management information about the terminal device, sets the cooperation data in the Web page data in association with the call data, and then executes a process of transmitting the Web page data in which the cooperation data is set to the terminal device;
The terminal device includes a browser function for displaying the Web page based on the received Web page data, and a call function for, in response to a request from a user, calling the voice dialogue server using the call data included in the Web page data and outputting a tone based on the character string data serving as the cooperation data associated with the call data;
The voice dialogue server generates cooperation data by converting the tone from the terminal device into character string data, and transmits the generated cooperation data to the WWW server.
The WWW server further identifies, from among the session information stored in the session database, the session information corresponding to the cooperation data received from the voice dialogue server, determines from the identified session information which voice service the terminal device is requesting, and transmits the determination result to the voice dialogue server,
The voice dialogue server further determines the execution content of the voice dialogue processing based on the determination result received from the WWW server.

The display / voice cooperation system according to claim 1, wherein, when the WWW server identifies the session information corresponding to the cooperation data received from the voice dialogue server, the WWW server identifies, from the identified session information, the Web page data in which the same cooperation data as the cooperation data received from the voice dialogue server is set, and determines the voice service requested by the terminal device from the identified Web page data.

3. The display / voice cooperation system according to claim 1, wherein the session database stores session information and communication management information in association with the cooperation data.

The display / voice cooperation system according to any one of claims 1 to 3, wherein the WWW server determines the character string used as the cooperation data by extracting a character string from a character string update counter that updates a character string composed of numbers or symbols.

The display / voice cooperation system according to any one of claims 1 to 4, wherein the voice dialogue server determines the execution content of the voice dialogue processing so that a voice dialogue matching the display content of the Web page based on the Web page data in which the cooperation data is set is performed.

The display / voice cooperation system according to any one of claims 1 to 5, wherein the WWW server generates character string data in response to the Web page acquisition request from the terminal device, and determines the cooperation data using the generated character string data.

The display / voice cooperation system according to claim 6, wherein the voice dialogue server executes the voice dialogue processing, and executes processing for storing, in the system, voice dialogue processing result data indicating the result of the voice dialogue processing in association with the communication management information about the terminal device that is associated with the same cooperation data as the generated cooperation data.

The display / voice cooperation system according to claim 7, wherein, in response to a Web page acquisition request from the terminal device, the WWW server acquires the voice dialogue processing result data stored in the system in association with the communication management information about the terminal device, and transmits Web page data reflecting the voice dialogue processing result indicated by the voice dialogue processing result data.

The display / voice cooperation system according to claim 7, wherein the WWW server acquires the voice dialogue processing result data stored in the system in association with the communication management information about the terminal device, and transmits, to the terminal device, Web page data reflecting the voice dialogue processing result indicated by the voice dialogue processing result data.

The display / voice cooperation system according to any one of claims 1 to 9, wherein the cooperation data is character string data composed of a plurality of characters, each of which causes the terminal device to emit a tone corresponding to that character.

The display / voice cooperation system according to claim 10, wherein the WWW server includes character update means for updating at least one character, and determines the character string data used as the cooperation data based on the character extracted from the character update means.

The display / voice cooperation system according to any one of claims 1 to 11, wherein terminal management information managed for each terminal device is used as the communication management information.

A display / voice cooperation server comprising a WWW server that provides and collects information using a web page and a voice dialogue server that executes voice dialogue processing by inputting and outputting information by voice via a communication network,
A session database for storing session information indicating a flow of a series of processing performed between a terminal device connected to a communication network and the WWW server;
Web page data for displaying the Web page includes call data for calling the voice dialogue server,
The WWW server, in response to a Web page acquisition request from the terminal device, determines character string data as cooperation data for realizing cooperation with the voice dialogue server, stores the generated cooperation data in the session database in association with the session information and with communication management information about the terminal device, sets the generated cooperation data in the Web page data in association with the call data, and then executes a process of transmitting the Web page data in which the generated cooperation data is set to the terminal device;
The voice dialogue server executes processing for establishing a connection with the terminal device in response to the terminal device calling the voice dialogue server using the call data included in the Web page data; in response to the connected terminal device emitting a tone based on the character string data serving as the cooperation data associated with the call data, generates cooperation data by converting the tone into character string data; identifies, from among the session information stored in the session database, the session information corresponding to the generated cooperation data; determines from the identified session information which voice service the terminal device is requesting; and determines the execution content of the voice dialogue processing based on the determination result.

The display / voice cooperation server according to claim 13, wherein the voice dialogue server determines the execution content of the voice dialogue processing so that a voice dialogue matching the display content of the Web page based on the Web page data in which the cooperation data is set is performed.

A display / voice cooperation method for making a WWW server that provides and collects information using a Web page cooperate with a voice dialogue server that executes voice dialogue processing by inputting and outputting information by voice via a communication network, wherein
Web page data for displaying the Web page includes call data for calling the voice dialogue server,
The WWW server
In response to a Web page acquisition request from a terminal device connected to the communication network, determines character string data as cooperation data for realizing cooperation with the voice dialogue server,
Stores the generated cooperation data in a session database in association with session information indicating a flow of a series of processes performed between the terminal device and the WWW server and with communication management information about the terminal device, and, after setting the generated cooperation data in the Web page data in association with the call data, executes processing for transmitting the Web page data in which the generated cooperation data is set to the terminal device,
The voice dialogue server
In response to the terminal device calling the voice dialogue server using the call data included in the Web page data, executes processing for establishing a connection with the terminal device,
In response to the connected terminal device emitting a tone based on the character string data serving as the cooperation data associated with the call data, generates cooperation data by converting the tone into character string data,
Transmits the generated cooperation data to the WWW server,
The WWW server further includes:
Identifies, from among the session information stored in the session database, the session information corresponding to the cooperation data received from the voice dialogue server,
Determines from the identified session information which voice service the terminal device is requesting,
Transmits the determination result for the voice service requested by the terminal device to the voice dialogue server,
The voice dialogue server further determines the execution content of the voice dialogue processing based on the determination result received from the WWW server.

The display / voice cooperation method according to claim 15, wherein the voice dialogue server determines the execution content of the voice dialogue processing so that a voice dialogue matching the display content of the Web page based on the Web page data in which the cooperation data is set is performed.