JP4811059B2

JP4811059B2 - Agent device

Info

Publication number: JP4811059B2
Application number: JP2006061251A
Authority: JP
Inventors: 英裕大橋
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2006-03-07
Filing date: 2006-03-07
Publication date: 2011-11-09
Anticipated expiration: 2026-03-07
Also published as: JP2007241535A

Description

The present invention relates to an agent device that changes the agent's behavior according to usage conditions.

Agent functions have long been developed and studied for a wide range of electrical and electronic devices, including personal computers, car navigation systems, mobile devices such as mobile phones and PDAs (Personal Digital Assistants), and AV (Audio Visual) devices such as VCRs and DVD players. Such functions display an image of an anthropomorphic agent on an image display device, or output the agent's voice from a voice output device, to provide operation guidance for the device, route guidance, e-mail transmission and reception, and the like.

In recent years, with the rapid progress of information technology, agent functions have advanced further, and agent technology that responds in a more natural, human-like manner has been developed. For example, Patent Document 1 discloses an agent device for a car navigation system that grasps the situation of the vehicle from detection signals sent by various sensors installed in the vehicle and automatically changes the pitch, intensity, tone quality, length, and so on of the voice uttered by the agent according to the type and degree of that situation.

Patent Document 2 discloses an agent interface that includes voice recognition means for recognizing the user's voice, autonomously judges the user's emotion, which changes with the intensity of the recognized voice and with time, and automatically performs an action appropriate to the situation.
JP 11-259271 A; JP 2005-222331 A

As disclosed in Patent Documents 1 and 2, agent technologies have been invented that change the agent's behavior into behavior favorable to the user according to the user's mood or the situation at hand. At present, however, few inventions concern agent functions that go further and deliberately behave defiantly toward the user.

Having the agent respond defiantly makes its responses, which tend to be uniform, more expressive. The agent itself becomes more human-like, and the user can feel a greater sense of familiarity toward it.

The present invention has been made to solve the above problem. Its object is to have the agent perform defiant behavior so that the user feels closer to the agent, thereby making the agent function more suitable.

To solve the above problem, the invention according to claim 1 comprises:
display means for displaying a plurality of image data items indicating states of an anthropomorphic agent;
storage means for storing the image data and a plurality of voice data items including specific voice data whose content is an apology;
voice synthesis means for synthesizing the voice data stored in the storage means;
voice output means for outputting the voice data synthesized by the voice synthesis means;
input means for inputting instructions from outside;
voice recognition means for recognizing voice data among the instructions input through the input means;
detection means for detecting voice data input to the voice recognition means;
timing means for measuring the time from when the detection means detects voice data until it detects the next voice data;
determination means for determining whether the time measured by the timing means exceeds a predetermined time; and
control means for, when the determination means determines that the predetermined time has been exceeded, causing the voice synthesis means to synthesize voice data prompting input through the input means and causing the voice output means to output it to the outside,
wherein the control means temporarily stops the output of voice data from the voice output means when, after the voice data prompting input through the input means has been output, the determination means further determines that a predetermined time has been exceeded,
when the voice recognition means recognizes voice data after the output of voice data from the voice output means has been temporarily stopped, compares the recognized voice data with the specific voice data stored in the storage means to determine whether they match, and
when it determines that the recognized voice data matches the specific voice data, resumes the temporarily stopped output of voice data from the voice output means.

The invention according to claim 2 is the agent device according to claim 1, wherein the control means further causes the display means to display image data prompting input through the input means when the determination means determines that the predetermined time has been exceeded.

The invention according to claim 3 is the agent device according to claim 1, wherein, when the determination means determines that the predetermined time has been exceeded, the control means causes the display means to display image data prompting input through the input means instead of causing the voice output means to output the voice data prompting input through the input means.

The invention according to claim 4 is the agent device according to claim 2 or 3, wherein the image data prompting input through the input means shows the anthropomorphic agent in an idle or bored state.

According to the present invention, timing means measures the time from when the detection means detects input voice data until it detects the next voice data, and when the determination means determines that this time exceeds a predetermined time, the voice output means can output a voice prompting the user to interact with the agent. For example, outputting a voice saying that the agent has nothing to do or is bored presents a defiant attitude toward the user. This gives an element of surprise to agent responses that tend to be uniform, makes the user more interested in interacting with the agent, and makes the agent function more suitable.
In particular, when image display means is also provided and an image of the agent in an idle or bored state is displayed, the user's interest in the agent can be strengthened further.

Next, the best mode for carrying out the present invention is described with reference to the drawings.
The agent device of the present invention can be applied to a variety of electronic devices, provided that the operating state of the electrical or electronic device can be detected. Examples include mobile devices such as mobile phones, car navigation systems, and desktop personal computers. Here, detecting the operating state of an electrical or electronic device covers, for example, detecting input signals from input devices (user interfaces) that send input signals in response to the user's own operation, such as a remote control, keyboard, or mouse, and detecting input signals from sensors that can detect changes in the external environment of the electronic device (for example, a temperature sensor that detects heat generated when the electronic device operates). In other words, it suffices that these interfaces and sensors allow the device to detect that it is being used. This embodiment describes an example in which the agent device of the present invention is applied to a mobile phone.

The block diagram of FIG. 1 shows the functional configuration of a mobile phone 100 to which the present invention is applied. The mobile phone 100 comprises a mobile phone control unit 20, a communication/call function unit 21, a display unit 22, an input unit 23, an agent CPU 1, a RAM 2, a storage unit 3, a DSP (Digital Signal Processor) 4, a microphone 5, a sound source IC 6, a speaker 7, and a clock circuit 8, which are electrically connected by a bus 9.
The agent CPU 1 could of course be integrated with the mobile phone control unit 20, but the two are configured separately here for simplicity.

The mobile phone control unit 20 comprises a CPU, a ROM, a RAM, and the like, and performs overall control of the mobile phone 100 in accordance with a prestored operation program and application programs such as a voice call program and a mail transmission/reception program. It also controls the display unit 22, receiving agent image data sent from the agent CPU 1 and displaying it on the display unit 22. In addition, it forwards to the agent CPU 1 the input signals from the input unit 23, which consists of various keys for entering telephone numbers and characters.

The agent CPU 1 reads the operation program and the agent program 10 stored in the storage unit 3, loads them into the RAM 2, which serves as a work area, and supervises control of the agent function in cooperation with these programs. More specifically, following the instructions of the operation program and the agent program 10, it performs the "agent function suspension process" described below.

The "agent function suspension process" measures the time during which the mobile phone 100 is not used and, according to that time, urges the user to use the mobile phone 100 through the agent image displayed on the display unit 22 and the agent voice output from the speaker 7. If the mobile phone 100 remains unused despite this urging, the process temporarily suspends the agent function. After suspending the agent function, the process also requests a predetermined action from the user and restores the agent function once that request is satisfied.

The storage unit 3 stores various programs and data, including the operation program that governs the agent function, the agent program 10, image data 11, voice data 12, and input voice sample data 13.

The agent program 10 is an application program that performs the "agent function suspension process" in cooperation with the agent CPU 1. The agent program 10 stores threshold time data that the agent CPU 1 uses as a reference when judging how long the mobile phone 100 has been unused. More specifically, the agent CPU 1 starts measuring time with a counter (not shown) when it detects an input signal from the input unit 23. It continues measuring until it detects the next input signal, monitors whether the measured time exceeds the threshold time data, and, if so, outputs agent image data to the display unit 22 and/or agent voice data to the speaker 7. In this embodiment, three threshold time data items, first through third, are stored as an example.
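
The counter-and-threshold monitoring described in this paragraph might be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `IdleCounter`, its methods, and the use of seconds as the unit are assumptions, and the threshold values (2, 10, and 36 hours) are the example values the embodiment gives in its walkthrough of FIG. 5.

```python
class IdleCounter:
    """Hypothetical idle-time counter; names and units are illustrative only."""

    def __init__(self, thresholds_s):
        # Threshold times in seconds, kept in ascending order.
        self.thresholds_s = sorted(thresholds_s)
        self.elapsed_s = 0

    def on_input(self):
        # An input signal restarts the measurement from zero.
        self.elapsed_s = 0

    def advance(self, seconds):
        # Called as time passes without any input signal.
        self.elapsed_s += seconds

    def exceeded(self):
        # Number of thresholds the measured time has strictly exceeded.
        return sum(1 for t in self.thresholds_s if self.elapsed_s > t)


HOUR = 3600
counter = IdleCounter([2 * HOUR, 10 * HOUR, 36 * HOUR])
counter.advance(3 * HOUR)
print(counter.exceeded())  # prints 1: only the 2-hour threshold is exceeded
```

Returning a count rather than a single flag is a design choice for this sketch; it lets a caller pick a progressively stronger reaction as more thresholds are passed.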

The image data 11 includes image data showing the anthropomorphic agent in an idle or bored state. FIGS. 2(a) to 2(d) show examples of the image data 11. FIG. 2(a) shows the agent image displayed on the display unit 22 when the agent function is operating normally (that is, when the mobile phone 100 is used frequently). The image may be a still image, but a moving image is preferable because it gives the user a greater sense of familiarity. When the mobile phone 100 is in ordinary use, the agent shows laughing or smiling expressions, as conventional agents do.

FIG. 2(b) shows an image of the agent yawning. Displaying an image of an agent with time on its hands when the mobile phone 100 is not used much conceptually links the actual usage of the mobile phone 100 with the agent image.

FIG. 2(c) shows an image of the agent sleeping. As with FIG. 2(b), the agent image can be conceptually linked with the actual usage of the mobile phone 100. Moreover, of the two actions "yawning" and "sleeping", "sleeping" generally expresses a more pronounced degree of idleness or boredom. Therefore, displaying an image such as FIG. 2(c) when the mobile phone 100 has gone unused for a longer time links the agent image even more closely with the fact that the phone has not been used for a longer time.

FIG. 2(d) shows an image in which the agent has "run away from home", "fled", "gone missing", or "vanished". As with FIGS. 2(b) and 2(c), the agent image can be conceptually linked with the actual usage of the mobile phone 100. In particular, actions such as running away, fleeing, going missing, or vanishing suggest that the agent has run out of patience with the other party. The agent can thus express a human-like defiance, drawing the user's interest in the agent more strongly.

The voice data 12 includes voice data indicating that the agent is idle or bored. Uttering such voices can prompt the user to use the mobile phone 100. FIG. 3 shows an example of the voice data 12. The voice data 12 consists of lines such as "Ahh... I've got nothing to do..." and "Hey! Play with me!!", which convey that the agent is idle or bored. In the "agent function suspension process", the agent CPU 1 performs voice synthesis according to the program's instructions and outputs such idle or bored lines from the speaker 7. In the voice synthesis processing, the agent CPU 1 synthesizes voice data that reads out agent lines such as those in FIG. 3, based on sound piece data, which represents the waveforms of words, and phoneme data, which represents the waveforms that make up phonemes. The synthesis can use either a recording-editing method or a rule-based synthesis method. The recording-editing method builds a database of word-unit speech read aloud by a person in advance and outputs the words joined together. The rule-based synthesis method outputs speech by joining relatively small units such as phonemes (consonants and vowels) or kana.
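
As a rough illustration of the recording-editing method, the sketch below joins prerecorded word-unit waveforms taken from a lookup table. The function name `synthesize_by_concatenation`, the dictionary layout, the romanized keys, and the optional silence gap are all assumptions made for this example; real waveforms would be sampled audio rather than the toy lists used here.

```python
def synthesize_by_concatenation(words, word_db, gap_samples=0):
    """Join prerecorded word waveforms (lists of samples) in order.

    Hypothetical helper: `word_db` maps a word to its recorded samples.
    """
    output = []
    for index, word in enumerate(words):
        if word not in word_db:
            raise KeyError(f"no recording for word: {word!r}")
        if index > 0 and gap_samples > 0:
            # Insert a short silence between consecutive words.
            output.extend([0.0] * gap_samples)
        output.extend(word_db[word])
    return output


# Toy database; the sample values are placeholders, not real audio.
word_db = {"hima": [0.1, 0.2], "da": [0.3], "na": [0.4, 0.5]}
line = synthesize_by_concatenation(["hima", "da", "na"], word_db, gap_samples=1)
print(len(line))  # prints 7: five samples plus two one-sample gaps
```

A rule-based synthesizer would follow the same joining idea but with much smaller units, such as phonemes or kana, as the keys.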

The input voice sample data 13 is a digital recording of specific phrases that trigger the return (resumption) of the agent function after it has been suspended by the "agent function suspension process". When the agent CPU 1 obtains voice data from the user via the microphone 5 and the DSP 4, it compares that data with the sample voice data stored in the input voice sample data 13 and searches for sample voice data that matches it exactly or matches it to a degree exceeding a predetermined threshold (ratio). When such sample voice data is found, the agent CPU 1 recognizes that the input voice is one of the specific phrases and restores the agent function.
The input voice sample data 13 stores voice data of specific phrases in which the user apologizes to the agent. FIG. 4 shows an example of the input voice sample data. Phrases are stored by pattern (category): apologies such as "I'm sorry", "Please! Come back!", and "It was my fault", and excuses such as "Well, the train was late" and "I've been working overtime lately".
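
The exact-or-above-threshold matching could look roughly like the sketch below. Note the substitution: the description compares digitized voice data, whereas this example compares already-transcribed text, using Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The function name, the English renderings of the FIG. 4 example phrases, and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical sample phrases (English renderings of the FIG. 4 examples).
SAMPLE_PHRASES = ["i'm sorry", "please! come back!", "it was my fault"]


def is_recovery_phrase(heard, samples=SAMPLE_PHRASES, threshold=0.8):
    """Return True if `heard` matches any sample at or above `threshold`."""
    text = heard.strip().lower()
    for sample in samples:
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they differ.
        similarity = SequenceMatcher(None, text, sample).ratio()
        if similarity >= threshold:
            return True
    return False


print(is_recovery_phrase("I'm sorry"))  # an exact match after normalization
print(is_recovery_phrase("hello"))      # too dissimilar to every sample
```

Lowering the threshold makes the agent more forgiving of imperfect recognition, at the cost of accepting unrelated utterances more often.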

The DSP 4 is a voice processing device that samples, at a predetermined period via an A/D converter (not shown), the analog voice input by the user through the microphone 5 and generates digital voice data. The agent CPU 1 compares the generated voice data with the input voice sample data, and the voice is recognized when the two match exactly or match to a degree at or above a predetermined threshold.
The microphone 5 picks up voice from the user or others and outputs it to the DSP 4 as an analog signal. The mouthpiece (not shown) of the mobile phone 100 may also serve as the microphone 5.

The sound source IC 6 is an integrated circuit that generates an analog voice signal from the voice data sent by the agent CPU 1. The generated analog signal is output from the speaker 7. In this embodiment, the agent CPU 1, the sound source IC 6, and the speaker 7 function as the voice output means.

The clock circuit 8 comprises elements such as a crystal oscillator and outputs a signal to the agent CPU 1 at a fixed period. The agent CPU 1 can measure time by continuously receiving this signal. In this embodiment, the agent CPU 1 and the clock circuit 8 function as the timing means.

The display unit 22 displays the image of the anthropomorphic agent based on the image data 11, as well as the various images used by the call function and other functions of the mobile phone 100.

Next, the operation of the mobile phone 100 is described with reference to the flowchart in FIG. 5. The following processing may in some cases be performed directly by a processor or the like (not shown), but for simplicity it is described as being performed by the agent CPU 1 under program instructions.
As the threshold time data used to detect how long the mobile phone 100 has been unused, the first threshold time data is "2 hours", the second threshold time data is "10 hours", and the third threshold time data is "36 hours".

In step S101, the agent CPU 1 detects, via the mobile phone control unit 20, an input signal sent from the input unit 23. An input signal from the input unit 23, which consists of various keys, means that the user is actually using the mobile phone 100.
In step S102, triggered by the detection of the input signal in step S101, the agent CPU 1 starts measuring time based on the signal sent from the clock circuit 8.

In step S103, the agent CPU 1 monitors input signals from the input unit 23 received via the mobile phone control unit 20 and determines whether a further input signal is detected after timing started in step S102. If no further input signal is detected, the agent CPU 1 proceeds to step S104 (step S103: NO). If an input signal is detected, it returns to step S102, resets the counter, and starts timing again (step S103: YES).

In step S104, the agent CPU 1 compares the time elapsed since timing started in step S102 with the first threshold time data (2 hours) stored in advance in the agent program 10 and determines whether the elapsed time exceeds it. If so, the agent CPU 1 proceeds to step S105 (step S104: YES). If not, it returns to step S103 (step S104: NO).

In step S105, the agent CPU 1 synthesizes predetermined voice data from the voice data 12 stored in the storage unit 3, sends it to the sound source IC 6, and outputs a voice (for example, "Ahh... I've got nothing to do...") through the speaker 7.

In step S106, the agent CPU 1 again monitors input signals from the input unit 23 received via the mobile phone control unit 20 and determines whether an input signal is detected. If an input signal is detected, it returns to step S102, resets the counter, and starts timing again. If no input signal is detected, it compares the elapsed time with the second threshold time data (10 hours) stored in advance in the agent program 10 and determines whether the elapsed time exceeds it (step S107). If the elapsed time exceeds the second threshold time data, the agent CPU 1 proceeds to step S108 (step S107: YES). If not, it returns to step S106 (step S107: NO).

In step S108, the agent CPU 1 reads image data 11 (for example, the image data of FIG. 2(b) or 2(c)) stored in the storage unit 3 and displays the image of the anthropomorphic agent on the display unit 22.

In step S109, the agent CPU 1 once more monitors input signals from the input unit 23 received via the mobile phone control unit 20 and determines whether an input signal is detected. If no input signal is detected, the agent CPU 1 proceeds to step S110 (step S109: NO). If an input signal is detected, it returns to step S102, resets the counter, and starts timing again (step S109: YES).

In step S110, the agent CPU 1 compares the time elapsed since timing started in step S102 with the third threshold time data (36 hours) stored in advance in the agent program 10 and determines whether the elapsed time exceeds it. If so, the agent CPU 1 proceeds to step S111 (step S110: YES). If not, it returns to step S109 (step S110: NO).

In step S111, the agent CPU 1 reads the image data 11 (the image data of FIG. 2(d)) stored in the storage unit 3 and displays the image on the display unit 22.

In step S112, the agent CPU 1 determines whether input signals from the input unit 23 are detected consecutively within a predetermined time. If they are not, the agent CPU 1 waits until they are (step S112: NO). In other words, the agent CPU 1 keeps displaying the image shown in step S111 (FIG. 2(d)) until it detects consecutive input signals. This creates the effect that, once the agent has "run away", it will not come back until a predetermined condition is met, which is a defiant attitude toward the user.
If the agent CPU 1 does detect consecutive input signals from the input unit 23 within the predetermined time, it proceeds to step S113 (step S112: YES).

In step S113, the agent CPU 1 determines whether it recognizes, in the voice input through the microphone 5, speech that matches the input voice sample data 13 (or matches it at or above a predetermined threshold). Because words of apology are stored as the input voice sample data 13, the user must speak a predetermined apology to the agent through the microphone 5. This further strengthens the effect of an agent that behaves defiantly toward the user, in that once it has "run away" it does not come back until a predetermined condition is met.
If the determination in step S113 is NO, that is, if the agent CPU 1 does not recognize words of apology, the process returns to step S112. With this configuration, once the agent has "run away" it does not come back until two predetermined conditions are met, which makes its defiant attitude toward the user even more pronounced.
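The threshold-based match in step S113 might look like the sketch below. It is a simplification under stated assumptions: the device compares voice input against the input voice sample data 13, whereas this example compares already-recognized text using Python's `difflib.SequenceMatcher`. The sample phrases and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for the input voice sample data 13.
APOLOGY_SAMPLES = ["i am sorry", "please forgive me"]
# Hypothetical similarity threshold; the description gives no value.
MATCH_THRESHOLD = 0.8


def matches_apology(recognized_text, samples=APOLOGY_SAMPLES,
                    threshold=MATCH_THRESHOLD):
    """Return True if the text matches any sample at or above the threshold."""
    text = recognized_text.strip().lower()
    return any(
        SequenceMatcher(None, text, sample).ratio() >= threshold
        for sample in samples
    )
```

Matching at or above a threshold, rather than requiring an exact match, tolerates small recognition errors while still rejecting unrelated speech.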

In step S114, the agent CPU 1 cancels the temporary suspension of the agent function and returns to the normal processing state. That is, it displays the image shown in FIG. 2(a) on the display unit 22 and executes the agent function in the various processes of the mobile phone 100.
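Steps S112 to S114 together form a small state machine, sketched below. The function name and the string state labels are hypothetical. The sketch also assumes that a YES result in step S113 leads to step S114, which the description implies but does not spell out.

```python
def recovery_step(state, consecutive_inputs, apology_recognized):
    """Advance the recovery flow by one step and return the next state.

    "S112": waiting for consecutive input signals
    "S113": waiting for words of apology
    "S114": suspension cancelled, normal agent function resumed
    """
    if state == "S112":
        return "S113" if consecutive_inputs else "S112"
    if state == "S113":
        # A failed apology sends the flow back to S112, so both
        # conditions must be met again before the agent returns.
        return "S114" if apology_recognized else "S112"
    return state  # S114 is terminal in this sketch
```

Returning to S112 rather than staying in S113 is what makes the user satisfy both conditions in sequence.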

As described above, the mobile phone 100 to which the present invention is applied determines whether the time elapsed from detecting an input signal entered by the user via the input unit 23 until detecting the next input signal exceeds a predetermined threshold time. If it does, the mobile phone 100 displays an image of the agent in an idle or bored state, or outputs a voice message to that effect, so that the agent appears to become defiant toward the user. This realizes an agent function that behaves in a more human-like manner.

In addition, the mobile phone 100 displays images or outputs voice that are defiant toward the user, such as the agent being idle or bored, based on the time during which the mobile phone 100 has actually not been used. The period of non-use and the agent's behavior are therefore linked in the user's mind, which more effectively achieves the effect an agent function is meant to have: making the mobile phone 100 itself appear to have a human-like will.

In addition, the mobile phone 100 (ultimately) displays an image in which the agent "runs away from home", based on the time during which the mobile phone 100 is not used. Having the agent perform this very human act of "running away" can evoke in the user the feeling of having exhausted the agent's patience. This makes the agent less predictable and can increase the user's interest in communicating with it.

In addition, once the agent has "run away", the agent function of the mobile phone 100 remains temporarily suspended until the phone detects input signals from the input unit 23 consecutively at predetermined intervals and recognizes words of apology from the user. Requiring the user to perform the kind of act that arises between real people in order to cancel the suspension has the effect of bringing the relationship between the agent and the user closer to one of equals.

The best mode for carrying out the present invention has been described above, but the present invention is not limited to the examples given. In particular, the examples above described image data and voice data defiant toward the user whose content is an idle or bored state, but it is of course possible to apply any state that evokes a sense of defiance toward the user, such as an angry, sullen, ignoring, selfish, sad, or sulking state.
Similarly, the configuration described requires repeated pressing of the dial keys or the like and the input of words of apology in order to return the agent function from the temporarily suspended state to the normal state, but it is not limited to this example. For example, as shown in FIG. 4, words of excuse may be input in place of words of apology. It goes without saying that various other forms can also be applied, such as a long press of a dial key, requiring words of excuse after words of apology or the reverse, or recognizing words of apology a predetermined number of times in succession.
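The alternative recovery conditions listed above can be treated as different required sequences of recognized events. The following sketch assumes each recognized utterance has already been classified into a label such as "apology" or "excuse"; the labels, constants, and function name are all hypothetical.

```python
# Hypothetical recovery configurations drawn from the variations above.
APOLOGY_THEN_EXCUSE = ("apology", "excuse")
EXCUSE_THEN_APOLOGY = ("excuse", "apology")
THREE_APOLOGIES = ("apology",) * 3  # "a predetermined number of times"


def recovery_satisfied(required, observed):
    """Return True if the most recent events equal `required`, in order.

    `observed` is the list of classified events so far, oldest first.
    Any other event in between breaks the run, so the required events
    must occur in succession.
    """
    n = len(required)
    return n > 0 and list(observed[-n:]) == list(required)
```

Expressing each variation as data keeps the checking logic unchanged when a different combination of apology and excuse is chosen.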

FIG. 1 is a block diagram showing the configuration of a mobile phone provided with the agent device of the present invention.
FIGS. 2(a) to 2(d) are schematic diagrams showing examples of the agent displayed on the display unit of a mobile phone provided with the agent device of the present invention.
FIG. 3 is a schematic diagram showing an example of the voice data synthesized and output by a mobile phone provided with the agent device of the present invention.
FIG. 4 is a schematic diagram showing an example of the input voice sample data stored in a mobile phone provided with the agent device of the present invention.
FIG. 5 is a flowchart showing the processing procedure of a mobile phone provided with the agent device of the present invention.

Explanation of symbols

1 Agent CPU
2 RAM
3 Storage unit
4 DSP
5 Microphone
6 Sound source IC
7 Speaker
8 Clock circuit
9 Bus
10 Agent program
11 Image data
12 Voice data
13 Input voice sample data
20 Mobile phone control unit
21 Communication / call function unit
22 Display unit
23 Input unit

Claims

An agent device comprising:
display means for displaying a plurality of image data indicating the state of an anthropomorphic agent;
storage means for storing the image data and a plurality of voice data including specific voice data that includes gratitude;
voice synthesis means for synthesizing the voice data stored in the storage means;
voice output means for outputting the voice data synthesized by the voice synthesis means;
input means for inputting an instruction from outside;
voice recognition means for recognizing voice data among the instructions input by the input means;
detection means for detecting voice data input to the voice recognition means;
timing means for measuring the time from when voice data is detected by the detection means until the next voice data is detected;
determination means for determining whether the time measured by the timing means exceeds a predetermined time; and
control means for, when the determination means determines that the predetermined time is exceeded, further causing the voice synthesis means to synthesize voice data prompting input from outside by the input means and causing the voice output means to output that voice data,
wherein the control means temporarily stops the output of voice data from the voice output means when the determination means determines that a predetermined time is exceeded after the output of the voice data prompting input by the input means,
when voice data is recognized by the voice recognition means after the output of voice data from the voice output means has been temporarily stopped, compares the voice data recognized by the voice recognition means with the specific voice data stored in the storage means to determine whether they match, and
when it determines that the voice data recognized by the voice recognition means matches the specific voice data, resumes the temporarily stopped output of voice data from the voice output means.

The agent device according to claim 1,
wherein, when the determination means determines that the predetermined time is exceeded, the control means further causes the display means to display image data prompting input by the input means.

The agent device according to claim 1,
wherein, when the determination means determines that the predetermined time is exceeded, the control means causes the display means to display image data prompting input by the input means, instead of causing the voice output means to output the voice data prompting input by the input means.

The agent device according to claim 2 or 3,
wherein the image data prompting input by the input means shows the anthropomorphic agent appearing idle or bored.