JP7605582B2 - Object guidance control device, object guidance control method, and program

Info

Publication number: JP7605582B2
Application number: JP2019144180A
Authority: JP
Inventors: 優坪内; 隆史松本
Original assignee: Shimizu Corp
Current assignee: Shimizu Corp
Priority date: 2019-08-06
Filing date: 2019-08-06
Publication date: 2024-12-24
Anticipated expiration: 2039-08-06
Also published as: JP2021025884A

Description

The present invention relates to a target guidance control device, a target guidance control method, and a program.

Guidance systems that provide route guidance and information about surrounding facilities in vehicles are in widespread use. Car navigation devices equipped with such guidance systems are commonly used for route guidance in automobiles and other vehicles. Such guidance systems are also used to provide sightseeing guidance to tourists.

For example, Patent Document 1 below discloses a guide system that can automatically provide sightseeing guidance to tourists even when no guide, such as a bus tour guide, is on board the vehicle. In this technology, when an imaging device installed so that it can capture the outside of the vehicle captures an image of a sightseeing target, guidance is provided by audio output based on audio data prepared in advance.

In recent years, as smartphones have spread, applications that provide route guidance and information about nearby facilities on a smartphone have also become common. As a result, a smartphone is sometimes used in place of a car navigation device. Furthermore, with advances in voice recognition technology, speaker-type voice interface devices that can be controlled by voice (for example, smart speakers) are also becoming popular. In the future, therefore, car navigation devices may be replaced by smart speakers that provide route guidance and information about facilities around the vehicle based on the user's voice input.

Patent Document 1: JP 2017-194898 A

However, when a conventional smart speaker is installed inside a moving vehicle, it can acquire information from inside the vehicle but has difficulty acquiring information from outside it. For example, suppose a user designates a landmark outside the vehicle as a guidance target without naming it, such as by asking "What is that building?" The smart speaker can recognize that the user has requested guidance, but it is difficult for it to recognize which landmark outside the vehicle the user has designated. Because the smart speaker cannot reliably recognize the designated landmark, it may provide incorrect guidance.

In view of the above problem, an object of the present invention is to provide a target guidance control device, a target guidance control method, and a program capable of improving the accuracy of guidance regarding a guidance target that a user riding in a vehicle has designated as a target for which guidance is requested.

In order to solve the above problem, a target guidance control device according to one aspect of the present invention includes: a detection unit that detects candidates for a guidance target designated by a user riding in a vehicle, based on the voice of the user, an in-vehicle captured image in which a pose of the user designating the guidance target (the target for which the user requests guidance) is captured, and a captured image of the surroundings of the vehicle; an identification unit that identifies the position and orientation of the vehicle in spatial information around the vehicle based on position information of the vehicle, identifies the direction of the guidance target in the spatial information with reference to the identified orientation of the vehicle, determines whether the guidance target can be uniquely determined from the candidates based on the identified direction, identifies the guidance target if it can be determined, and, if it cannot be determined, prompts the user to speak in order to narrow down the guidance target and supplies the voice spoken by the user to the detection unit; and an output control unit that causes output information regarding the guidance target to be output based on the spatial information corresponding to the identified guidance target.

A target guidance control method according to one aspect of the present invention is executed by a processor and includes: detecting candidates for a guidance target designated by a user riding in a vehicle, based on the voice of the user, an in-vehicle captured image in which a pose of the user designating the guidance target (the target for which the user requests guidance) is captured, and a captured image of the surroundings of the vehicle; identifying the position and orientation of the vehicle in spatial information around the vehicle based on position information of the vehicle, identifying the direction of the guidance target in the spatial information with reference to the identified orientation of the vehicle, determining whether the guidance target can be uniquely determined from the candidates based on the identified direction, identifying the guidance target if it can be determined, and, if it cannot be determined, prompting the user to speak in order to narrow down the guidance target and supplying the voice spoken by the user to the detection unit; and causing output information regarding the guidance target to be output based on the spatial information corresponding to the identified guidance target.

A program according to one aspect of the present invention causes a computer to function as: a detection unit that detects candidates for a guidance target designated by a user riding in a vehicle, based on the voice of the user, an in-vehicle captured image in which a pose of the user designating the guidance target (the target for which the user requests guidance) is captured, and a captured image of the surroundings of the vehicle; an identification unit that identifies the position and orientation of the vehicle in spatial information around the vehicle based on position information of the vehicle, identifies the direction of the guidance target in the spatial information with reference to the identified orientation of the vehicle, determines whether the guidance target can be uniquely determined from the candidates based on the identified direction, identifies the guidance target if it can be determined, and, if it cannot be determined, prompts the user to speak in order to narrow down the guidance target and supplies the voice spoken by the user to the detection unit; and an output control unit that causes output information regarding the guidance target to be output based on the spatial information corresponding to the identified guidance target.
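The control flow described in these aspects (detect candidates, check whether the target can be uniquely determined, otherwise prompt the user and feed the new utterance back to detection) can be sketched as follows. This is only an illustrative sketch: the function names `detect`, `narrow`, and `ask_user`, and the `max_rounds` limit, are assumptions introduced here and do not appear in the source.

```python
from typing import Callable, List, Optional


def resolve_target(
    detect: Callable[[str], List[str]],        # hypothetical stand-in for the detection unit
    narrow: Callable[[List[str]], List[str]],  # hypothetical stand-in for map-based narrowing
    ask_user: Callable[[str], str],            # hypothetical prompt; returns the user's reply
    first_utterance: str,
    max_rounds: int = 3,                       # assumption: the source states no retry limit
) -> Optional[str]:
    """Return the single guidance target, or None if it stays ambiguous."""
    utterance = first_utterance
    for _ in range(max_rounds):
        candidates = narrow(detect(utterance))
        if len(candidates) == 1:
            # The target was uniquely determined.
            return candidates[0]
        # Not unique: prompt the user and supply the reply to detection again.
        utterance = ask_user("Which one do you mean?")
    return None
```

In this sketch the retry limit simply prevents an endless dialogue; a real system could instead fall back to reading out the remaining candidates.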

According to the present invention, it is possible to improve the accuracy of guidance regarding a guidance target that a user riding in a vehicle has designated as a target for which guidance is requested.

FIG. 1 is a diagram showing an overview of a target guidance control system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the target guidance control system according to the embodiment.
FIG. 3 is an explanatory diagram of an example of detection of guidance target candidates according to the embodiment.
FIG. 4 is an explanatory diagram of an example of association between vehicle position information and a three-dimensional map according to the embodiment.
FIG. 5 is a flowchart showing the flow of processing in the target guidance control system according to the embodiment.
FIG. 6 is a flowchart showing the flow of the guidance target identification process according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing an overview of a target guidance control system 1 according to an embodiment of the present invention. The target guidance control system 1 is composed of the target guidance control device 10, the vehicle 20, and the network 30 shown in FIG. 1.

The target guidance control device 10 is a device that controls guidance (hereinafter also referred to as "guidance control") regarding a guidance target that a user 40 of the vehicle 20 has designated as a target for which guidance is requested. The target guidance control device 10 is realized, for example, by a computer such as a server device.

As shown in FIG. 1, the target guidance control device 10 is connected to the vehicle 20 via the network 30 and can exchange information with the vehicle 20. The target guidance control device 10 according to this embodiment performs guidance control based on information received from the vehicle 20 via the network 30. The target guidance control device 10 also transmits information related to guidance control to the vehicle 20 via the network 30.

The vehicle 20 is a moving body that carries passengers (for example, the user 40). As an example, the vehicle 20 according to this embodiment is realized as an autonomous vehicle. When the vehicle 20 is an autonomous vehicle, the user 40 does not necessarily need to drive it. The vehicle 20 may also be a non-autonomous vehicle.

The vehicle 20 is provided with sensor devices for acquiring various kinds of information. For example, as shown in FIG. 1, the vehicle 20 is provided with a microphone 212, in-vehicle cameras 214a, 214b, and 214c, exterior cameras 216a and 216b, and a speaker 242. The microphone 212 is installed so that it can pick up sound produced inside the vehicle 20. The in-vehicle cameras 214a to 214c (hereinafter also referred to as "in-vehicle cameras 214") are installed so that they can capture the interior of the vehicle 20. For example, as shown in FIG. 1, the in-vehicle cameras 214 are installed at the upper front, upper center, and upper rear of the vehicle 20, respectively, and each captures images (still images or moving images) of the interior of the vehicle 20 from its position. The exterior cameras 216a and 216b (hereinafter also referred to as "exterior cameras 216") are installed so that they can capture the outside of the vehicle 20. For example, as shown in FIG. 1, the exterior cameras 216 are installed at the front and rear of the vehicle 20, respectively, and each captures images of the outside of the vehicle 20 from its position. Note that the types, number, and positions of the sensor devices provided in the vehicle 20 are not limited to the example shown in FIG. 1.

The vehicle 20 is also provided with an output device for outputting output information. For example, as shown in FIG. 1, a speaker 242 is provided in the vehicle 20. The speaker 242 is installed so that it can output information regarding the guidance target (hereinafter also referred to as "guidance information") as output information toward the interior of the vehicle 20. Note that the types, number, and positions of the output devices provided in the vehicle 20 are not limited to the example shown in FIG. 1. The output device may also be a terminal carried by the user 40, such as a smartphone, a tablet terminal, or a hearable terminal.

The guidance target is the landmark designated by the user 40 among the landmarks existing around the vehicle 20. As shown in FIG. 1, buildings 50 and 51 exist as landmarks around the vehicle 20. As seen from the user 40 inside the vehicle 20, which is traveling in the direction of arrow 22, the building 50 is to the front right of the user 40 and the building 51 is to the front left. Note that landmarks are not limited to buildings.
The guidance information includes introductory information about the guidance target, route information to the guidance target, and the like.

In FIG. 1, the user 40 says "the building at the front right," as shown in speech bubble 42, thereby designating the guidance target by voice. At the same time, the user 40 designates the guidance target by pointing a finger in the direction of arrow 44 (to the front right as seen from the user 40). From these inputs, the guidance target in FIG. 1 is identified as the building 50. Note that the user 40 may phrase the designation as a question, for example, "What is the building at the front right?"

This process of identifying the guidance target (hereinafter also referred to as the "guidance target identification process") is performed by the target guidance control device 10 based on the voice acquired in the vehicle 20 and the action by which the user 40 designates the guidance target. The designating action is, for example, pointing with a finger, pointing with a rod-shaped object (for example, a pointer or a pen), or pointing with the whole arm. The target guidance control device 10 may also use information indicating the orientation of the face and the gaze direction of the user 40 in the guidance target identification process. Details of the guidance target identification process are described later.

Note that the target guidance control device 10 may be realized as a device of which some or all of the components can be provided in the vehicle 20. The components of the target guidance control device 10 provided in the vehicle 20 may be connected to each device of the vehicle 20 by wire or wirelessly so that they can exchange information with those devices. The components of the target guidance control device 10 not provided in the vehicle 20 may be realized by, for example, a server device, and may cooperate via the network 30 with the components provided in the vehicle 20 and with each device of the vehicle 20.

The overview of this embodiment has been described above. Next, a configuration example of the target guidance control system 1 according to this embodiment is described. FIG. 2 is a block diagram showing a configuration example of the target guidance control system 1 according to this embodiment. The following describes the configuration in the case where none of the components of the target guidance control device 10 are provided in the vehicle 20 and all of them are realized by a server device or the like.

(Configuration example of vehicle 20)
As shown in FIG. 2, the vehicle 20 includes an acquisition unit 210, a communication unit 220, a control unit 230, and a guidance information output unit 240.

(Acquisition unit 210)
The acquisition unit 210 has a function of acquiring various kinds of information. This function can be realized by a variety of sensor devices. The acquisition unit 210 according to this embodiment includes, for example, sensor devices such as the microphone 212, the in-vehicle cameras 214, and the exterior cameras 216 described above. The acquisition unit 210 may also include a sensor device that acquires position information of the vehicle 20; as an example, the acquisition unit 210 includes a GPS (Global Positioning System) receiver. After these sensor devices acquire sensing information, the acquisition unit 210 outputs the sensing information to the communication unit 220.

The sensor devices included in the acquisition unit 210 are not limited to these examples. For example, the acquisition unit 210 may include a ranging sensor such as LiDAR (Laser Imaging Detection and Ranging) as a sensor device that acquires information indicating conditions inside and outside the vehicle 20.

(Communication unit 220)
The communication unit 220 has a function of communicating with external devices. The communication unit 220 outputs information received from an external device to the control unit 230. As an example, the communication unit 220 outputs guidance information received from the target guidance control device 10 via the network 30 to the control unit 230. The communication unit 220 also transmits information input from the acquisition unit 210 to the target guidance control device 10. As an example, the communication unit 220 transmits sensing information input from the acquisition unit 210 to the target guidance control device 10 via the network 30.

(Control unit 230)
The control unit 230 has a function of controlling the overall operation of the vehicle 20. The control unit 230 is realized, for example, by causing a CPU (Central Processing Unit) provided as hardware in the vehicle 20 to execute a program. For example, the control unit 230 controls the operation of each sensor device included in the acquisition unit 210. The control unit 230 also controls the output, to the guidance information output unit 240, of guidance information input from the communication unit 220. When the vehicle 20 is an autonomous vehicle, the control unit 230 also controls the autonomous driving of the vehicle 20.

(Guidance information output unit 240)
The guidance information output unit 240 has a function of outputting guidance information. The guidance information is input from, for example, the control unit 230 and can be output by a variety of output devices. For example, when the guidance information output unit 240 includes an audio output device such as a speaker, it outputs the guidance information, converted into audio, from the speaker. When the guidance information output unit 240 includes a display device such as a display, it shows the guidance information, converted into an image, on the display.

(Configuration example of target guidance control device 10)
As shown in FIG. 2, the target guidance control device 10 includes a communication unit 110, a control unit 120, and a storage unit 130.

(Communication unit 110)
The communication unit 110 has a function of communicating with external devices. The communication unit 110 outputs information received from an external device to the control unit 120. As an example, the communication unit 110 outputs sensing information received from the vehicle 20 via the network 30 to the control unit 120. The communication unit 110 also transmits information input from the control unit 120 to the vehicle 20. As an example, the communication unit 110 transmits guidance information input from the control unit 120 to the vehicle 20 via the network 30.

(Control unit 120)
The control unit 120 has a function of controlling the overall operation of the target guidance control device 10. The control unit 120 is realized, for example, by causing a CPU provided as hardware in the target guidance control device 10 to execute a program. To realize this function, the control unit 120 includes a detection unit 122, an identification unit 124, and an output control unit 126.

(Detection unit 122)
The detection unit 122 has a function of detecting candidates for the guidance target. For example, the detection unit 122 detects candidates based on the voice of the user 40 riding in the vehicle 20, the action by which the user 40 designates the guidance target, and a captured image of the surroundings of the vehicle 20. Specifically, the detection unit 122 detects a group of candidates by analyzing the captured image. It then detects, from that group, the candidates located in the designated direction, using the natural language related to the guidance target that it detects by analyzing the voice and the direction that it detects by analyzing the action of the user 40.

An example of detecting guidance target candidates is described here with reference to FIG. 3. FIG. 3 is an explanatory diagram of an example of detection of guidance target candidates according to this embodiment. First, the detection unit 122 analyzes the voice acquired by the microphone 212 of the vehicle 20 and detects natural language from it. When a voice command indicating that the user 40 is requesting guidance is found in the detected natural language, the detection unit 122 analyzes the voice further and detects the natural language related to the guidance target. For example, the detection unit 122 detects the utterance "the building at the front right" shown in speech bubble 42 as natural language related to the guidance target. In the example shown in FIG. 3, the user 40 is facing the direction of arrow 22, which is the traveling direction of the vehicle 20, so the detection unit 122 can detect from this natural language that the user 40 is referring to the building 50 at the front right of the vehicle 20.
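As a rough illustration of how a direction might be extracted from such an utterance, the sketch below maps direction words to an angle measured clockwise from the vehicle's traveling direction. The function name `parse_direction`, the keyword matching, and the specific angles (45 and 90 degrees) are assumptions made for this example; the source does not specify how the natural language is analyzed, and a real system would need proper speech and language processing (for instance, "right" can also mean "correct").

```python
import re
from typing import Optional


def parse_direction(utterance: str) -> Optional[float]:
    """Return a rough direction in degrees, clockwise from the vehicle heading.

    Hypothetical keyword-based parser: 0 is straight ahead, positive is to
    the right, negative is to the left. Returns None if no direction is found.
    """
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    front = bool(words & {"front", "ahead"}) or "前" in utterance
    right = "right" in words or "右" in utterance
    left = "left" in words or "左" in utterance
    if right == left:
        # Neither side, or both sides, were mentioned.
        return 0.0 if (front and not right) else None
    side = 1.0 if right else -1.0
    # "Front right" is treated as a 45-degree diagonal, plain "right" as 90 degrees.
    return side * (45.0 if front else 90.0)
```

An utterance with no direction word, such as "What is that building?", yields `None`, in which case the pointing pose would have to carry the direction on its own.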

Furthermore, the detection unit 122 analyzes the pose of the user 40 in the interior image captured by the in-vehicle cameras 214 of the vehicle 20 and detects that the user 40 is pointing in the direction of arrow 44. The detection unit 122 also analyzes the scenery in the exterior image captured by the exterior cameras 216 of the vehicle 20 and detects the buildings 50 and 51 shown in FIG. 3 as the group of candidates for the guidance target. From these detection results, the detection unit 122 detects, from the group of candidates, the building 50, which is at the front right of the vehicle 20 and in the direction of arrow 44, as a candidate for the guidance target.
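The combination of the spoken direction and the pointing direction can be sketched as a filter over the candidate group. Everything concrete here is assumed for illustration: the `Candidate` type, the idea that each candidate already carries a bearing relative to the vehicle heading (estimated somehow from the exterior image), and the tolerance values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    bearing_deg: float  # assumed: degrees clockwise from the vehicle heading


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def filter_candidates(
    group: List[Candidate],
    spoken_deg: Optional[float],    # direction from the utterance, if any
    pointed_deg: Optional[float],   # direction from the pointing pose, if any
    spoken_tol: float = 45.0,       # assumed tolerance for coarse spoken directions
    pointed_tol: float = 30.0,      # assumed tolerance for pointing
) -> List[Candidate]:
    """Keep candidates consistent with every direction cue that is available."""
    kept = []
    for c in group:
        ok_speech = spoken_deg is None or angle_diff(c.bearing_deg, spoken_deg) <= spoken_tol
        ok_point = pointed_deg is None or angle_diff(c.bearing_deg, pointed_deg) <= pointed_tol
        if ok_speech and ok_point:
            kept.append(c)
    return kept
```

With a group such as `[Candidate("building 50", 40.0), Candidate("building 51", -35.0)]`, a spoken direction of 45 degrees and a pointing direction of 50 degrees keep only building 50. The filter may also return several candidates or none, which is where the later narrowing step comes in.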

As described above, the detection unit 122 can link the guidance target designated by the user 40 to the group of candidates detected from the scenery outside the vehicle 20, based on the voice and the action of the user 40 inside the moving vehicle 20. With this configuration, the detection unit 122 can detect a landmark outside the moving vehicle 20 as a candidate for the guidance target. In addition, because the detection unit 122 uses both the voice and the action of the user 40 inside the vehicle 20 when detecting candidates, it can improve the accuracy of candidate detection. Note that the detection unit 122 may detect a single candidate or multiple candidates from the group.

(Identification unit 124)
The identification unit 124 has a function of identifying the guidance target designated by the user 40 from the candidates detected by the detection unit 122. For example, the identification unit 124 identifies the position and orientation of the vehicle 20 in spatial information around the vehicle 20 based on the position information of the vehicle 20, identifies the direction of the guidance target in the spatial information with reference to the identified orientation of the vehicle 20, and identifies the guidance target from the candidates based on the identified direction. The spatial information around the vehicle 20 according to this embodiment is, for example, a three-dimensional map of the surroundings of the vehicle 20 and landmark information that includes metadata of the landmarks on the three-dimensional map. Note that the spatial information is not limited to these examples.

The association between the position information of the vehicle 20 and the three-dimensional map is described here with reference to FIG. 4. FIG. 4 is an explanatory diagram of an example of the association between the position information of the vehicle 20 and the three-dimensional map according to this embodiment. As shown in FIG. 4, the three-dimensional map 60 is assumed to contain three-dimensional data of landmarks such as the building 50, the building 51, and the mountain 52.

First, the identification unit 124 identifies the position and orientation of the vehicle 20 on the three-dimensional map from the position information acquired by the GPS of the vehicle 20, and maps the vehicle 20 onto the three-dimensional map. The orientation of the vehicle 20 may be identified based on, for example, sensing information from a gyro sensor or the movement history of the vehicle 20. Next, on the three-dimensional map, the identification unit 124 identifies the direction of the guidance target designated by the user 40 with reference to the orientation of the vehicle 20 indicated by arrow 22. The identification unit 124 then identifies, from the candidates, the building 50, which is the candidate located in the identified direction (arrow 44), as the guidance target.
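The geometry of this step can be sketched in a simplified two-dimensional form. The sketch assumes a flat map with x pointing east and y pointing north, derives the vehicle heading from two consecutive positions (one of the options mentioned above, the movement history), and selects landmarks whose bearing lies near the designated direction. The function names, the coordinates, and the tolerance are all assumptions for illustration; the source works with a three-dimensional map and does not give formulas.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # assumed map coordinates: (x east, y north)


def bearing(from_xy: Point, to_xy: Point) -> float:
    """Bearing from one point to another, in degrees clockwise from map north."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def landmarks_in_direction(
    vehicle_xy: Point,
    heading_deg: float,           # vehicle heading, clockwise from map north
    relative_deg: float,          # designated direction, clockwise from the heading
    landmarks: Dict[str, Point],
    tolerance_deg: float = 20.0,  # assumed angular tolerance
) -> List[str]:
    """Return the names of landmarks lying in the designated direction."""
    absolute = (heading_deg + relative_deg) % 360.0
    hits = []
    for name, xy in landmarks.items():
        diff = abs((bearing(vehicle_xy, xy) - absolute + 180.0) % 360.0 - 180.0)
        if diff <= tolerance_deg:
            hits.append(name)
    return hits


# Hypothetical layout: the vehicle moved north from (0, 0) to (0, 10).
heading = bearing((0.0, 0.0), (0.0, 10.0))
landmarks = {
    "building 50": (50.0, 60.0),   # front right of the vehicle
    "building 51": (-50.0, 60.0),  # front left of the vehicle
    "mountain 52": (0.0, 500.0),   # straight ahead
}
front_right = landmarks_in_direction((0.0, 10.0), heading, 45.0, landmarks)
```

In this made-up layout only building 50 lies within the tolerance of the 45-degree direction. If the list held more than one name, the target would not be uniquely determined and further narrowing would be needed.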

以上説明したように、特定部１２４は、車両２０の位置情報と３次元地図に基づき、検出部１２２が検出した案内対象の候補から、ユーザ４０が指定した案内対象を３次元地図上で一義的に特定することができる。かかる構成により、特定部１２４は、検出部１２２が１つの案内対象の候補を検出した場合には、案内対象であることの確度を高めることができる。また、特定部１２４は、検出部１２２が複数の案内対象の候補を検出した場合には、案内対象を絞り込むことができる。これにより、案内対象の特定精度が向上するため、ユーザ４０が指定した案内対象に関する案内の精度も向上することが可能となる。 As described above, the identification unit 124 can uniquely identify the guidance target specified by the user 40 on the 3D map from the candidate guidance targets detected by the detection unit 122 based on the position information of the vehicle 20 and the 3D map. With this configuration, when the detection unit 122 detects one candidate guidance target, the identification unit 124 can increase the certainty that the target is a guidance target. Furthermore, when the detection unit 122 detects multiple candidate guidance targets, the identification unit 124 can narrow down the guidance targets. This improves the accuracy of identifying the guidance target, and therefore makes it possible to improve the accuracy of guidance regarding the guidance target specified by the user 40.
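
As an illustration only, the direction-based narrowing described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the map is flattened to two dimensions, the user's pointing direction is given as an angle relative to the vehicle heading, and the names `Landmark`, `bearing_deg`, `angular_diff_deg`, `identify_target`, and the 10-degree tolerance are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Landmark:
    name: str
    x: float  # east coordinate in the map frame (metres, assumed)
    y: float  # north coordinate in the map frame (metres, assumed)


def bearing_deg(from_xy, to_xy):
    """Bearing from one map point to another (0 = north, clockwise, in degrees)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angular_diff_deg(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def identify_target(vehicle_xy, vehicle_heading_deg, pointing_offset_deg,
                    candidates, tolerance_deg=10.0):
    """Keep the candidates that lie in the direction the user indicated.

    The indicated direction is the vehicle heading (arrow 22) plus the
    pointing angle measured relative to the vehicle (giving arrow 44).
    """
    target_bearing = (vehicle_heading_deg + pointing_offset_deg) % 360.0
    return [
        c for c in candidates
        if angular_diff_deg(bearing_deg(vehicle_xy, (c.x, c.y)), target_bearing)
        <= tolerance_deg
    ]


# Hypothetical coordinates for the landmarks of FIG. 4.
candidates = [
    Landmark("building 50", 100.0, 100.0),
    Landmark("building 51", -100.0, 100.0),
    Landmark("mountain 52", 0.0, 500.0),
]
hits = identify_target((0.0, 0.0), 0.0, 45.0, candidates)
```

With the vehicle heading north and the user pointing 45 degrees to the right, only building 50 remains in `hits`. If more than one candidate remained, further narrowing would be needed.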

（出力制御部１２６）
出力制御部１２６は、車両２０における出力を制御する機能を有する。例えば、出力制御部１２６は、特定した案内対象と対応する空間情報に基づき、案内対象に関する出力情報を車両２０の出力装置に出力させる。具体的に、まず、出力制御部１２６は、特定した案内対象のメタデータをランドマーク情報から取得し、メタデータに基づき案内情報を生成する。そして、出力制御部１２６は、生成した案内情報を通信部１１０に車両２０へ送信させ、車両２０の出力装置に案内情報を出力させる。 (Output control unit 126)
The output control unit 126 has a function of controlling the output in the vehicle 20. For example, the output control unit 126 causes an output device of the vehicle 20 to output output information related to the identified guidance target based on spatial information corresponding to the identified guidance target. Specifically, the output control unit 126 first acquires metadata of the identified guidance target from the landmark information, and generates guidance information based on the metadata. Then, the output control unit 126 causes the communication unit 110 to transmit the generated guidance information to the vehicle 20, and causes an output device of the vehicle 20 to output the guidance information.

案内対象がビルである場合、メタデータは、例えば、ビルの名称、住所、フロア数、各フロアのテナント情報等である。出力制御部１２６は、これらのメタデータに基づき、例えば、「あのビルはＡＡＡビルです。」という紹介文を紹介情報として生成する。そして、出力制御部１２６は、車両２０の出力装置に応じて、出力装置に紹介情報を出力させる。例えば、出力装置が音声出力装置である場合、出力制御部１２６は、紹介文を音声に変換して音声出力装置に出力させる。また、出力装置が表示装置である場合、出力制御部１２６は、紹介文を画像に変換して表示装置に出力させる。かかる構成により、ユーザ４０に対して、案内対象の特定後に取得するランドマーク情報に基づき、特定した案内対象や取得したランドマーク情報に応じた柔軟な案内を行うことができる。 When the guidance target is a building, the metadata is, for example, the building's name, address, number of floors, tenant information for each floor, etc. Based on this metadata, the output control unit 126 generates an introduction text, for example, "That building is the AAA Building," as the introduction information. Then, the output control unit 126 causes the output device to output the introduction information according to the output device of the vehicle 20. For example, when the output device is an audio output device, the output control unit 126 converts the introduction text into audio and outputs it to the audio output device. Also, when the output device is a display device, the output control unit 126 converts the introduction text into an image and outputs it to the display device. With this configuration, flexible guidance can be provided to the user 40 according to the identified guidance target and the acquired landmark information, based on the landmark information acquired after the guidance target is identified.
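
The generation of an introduction text from metadata and its conversion according to the output device could look roughly like the sketch below. The metadata keys (`name`, `floors`), the function names, and the returned dictionary format are assumptions made for illustration; actual speech synthesis and image rendering are replaced by placeholders.

```python
def build_introduction(metadata):
    """Compose a short introduction text from landmark metadata."""
    text = f"That building is the {metadata['name']}."
    floors = metadata.get("floors")
    if floors is not None:
        text += f" It has {floors} floors."
    return text


def render_for_device(text, device_kind):
    """Choose the output form according to the kind of output device."""
    if device_kind == "audio":
        # Placeholder: the text would be handed to a speech synthesizer here.
        return {"kind": "speech", "payload": text}
    if device_kind == "display":
        # Placeholder: the text would be rendered into an image here.
        return {"kind": "image", "payload": text}
    raise ValueError(f"unsupported output device: {device_kind}")


intro = build_introduction({"name": "AAA Building", "floors": 12})
output = render_for_device(intro, "audio")
```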

案内情報の生成時、出力制御部１２６は、車両２０やユーザ４０の状況、ユーザ４０のプロファイル情報に応じて、案内情報を生成してもよい。車両２０の状況は、例えば、車両２０の位置、向き、速度、移動経路等である。ユーザ４０の状況は、例えば、ユーザ４０の姿勢、車内におけるユーザ４０の位置等である。ユーザ４０のプロファイル情報は、例えば、ユーザ４０の名前、性別、身長、体重、趣味、対象案内制御システムの使用履歴等である。出力制御部１２６は、ユーザ４０のプロファイル情報に基づき、例えば、「あのビルはＡＡＡビルです。ビルの中には、以前にご案内したＢＢＢカフェがあります。」という紹介文を紹介情報として生成する。かかる構成により、ユーザ４０に対して、車両２０やユーザ４０の状況、ユーザ４０のプロファイル情報に応じた柔軟な案内を行うことができる。 When generating the guidance information, the output control unit 126 may generate the guidance information according to the status of the vehicle 20 and the user 40, and the profile information of the user 40. The status of the vehicle 20 is, for example, the position, direction, speed, and moving route of the vehicle 20. The status of the user 40 is, for example, the posture of the user 40, and the position of the user 40 in the vehicle. The profile information of the user 40 is, for example, the name, sex, height, weight, hobbies, and usage history of the target guidance control system of the user 40. Based on the profile information of the user 40, the output control unit 126 generates, for example, an introduction text such as "That building is the AAA building. Inside the building is the BBB cafe that we introduced previously." as the introduction information. With this configuration, flexible guidance can be provided to the user 40 according to the status of the vehicle 20 and the user 40, and the profile information of the user 40.
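
One possible way to reflect the usage history in the introduction text is sketched below. It assumes, purely for illustration, that the metadata carries a `tenants` list and that the profile carries a `usage_history` list of previously introduced names; neither field name comes from the embodiment.

```python
def personalize(base_text, metadata, profile):
    """Append a remark when a tenant of the building was introduced before."""
    introduced_before = set(profile.get("usage_history", []))
    known_tenants = [t for t in metadata.get("tenants", []) if t in introduced_before]
    if known_tenants:
        return (f"{base_text} Inside the building is the {known_tenants[0]}, "
                "which we introduced previously.")
    return base_text


text = personalize(
    "That building is the AAA Building.",
    {"tenants": ["BBB Cafe", "CCC Books"]},
    {"usage_history": ["BBB Cafe"]},
)
```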

（記憶部１３０）
記憶部１３０は、各種情報を記憶する機能を有する。記憶部１３０は、記憶媒体、例えば、ＨＤＤ（Hard Disk Drive）、フラッシュメモリ、ＥＥＰＲＯＭ（Electrically Erasable Programmable Read Only Memory）、ＲＡＭ（Random Access read/write Memory）、ＲＯＭ（Read Only Memory）、またはこれらの記憶媒体の任意の組み合わせによって構成される。記憶部１３０は、例えば、不揮発性メモリを用いることができる。 (Storage unit 130)
The storage unit 130 has a function of storing various information. The storage unit 130 is configured by a storage medium, for example, a hard disk drive (HDD), a flash memory, an electrically erasable programmable read only memory (EEPROM), a random access read/write memory (RAM), a read only memory (ROM), or any combination of these storage media. The storage unit 130 may be, for example, a non-volatile memory.

記憶部１３０、例えば、３次元地図やランドマーク情報等の車両２０の周囲の空間情報を記憶する。また、記憶部１３０は、車両２０の取得部２１０が取得したセンシング情報、対象案内制御装置１０の特定部１２４が案内対象として特定したランドマーク、出力制御部１２６が生成した出力情報、ユーザ４０のプロファイル情報等を記憶してもよい。 The storage unit 130 stores spatial information about the surroundings of the vehicle 20, such as a three-dimensional map and landmark information. The storage unit 130 may also store sensing information acquired by the acquisition unit 210 of the vehicle 20, landmarks identified as guidance targets by the identification unit 124 of the target guidance control device 10, output information generated by the output control unit 126, profile information of the user 40, etc.

以上、対象案内制御装置１０の構成要素の全部が車両２０には備えられずサーバ装置等により実現される場合の対象案内制御システム１の構成例について説明した。なお、対象案内制御システム１の構成例はかかる例に限定されない。例えば、対象案内制御装置１０の構成要素の全部が車両２０に備えられる場合、外部装置と通信を行う機能を有する構成要素として、通信部１１０または通信部２２０の少なくとも一方が車両２０に設けられればよい。 The above describes a configuration example of the object guidance control system 1 in which none of the components of the object guidance control device 10 is provided in the vehicle 20 and all of them are realized by a server device or the like. Note that the configuration of the object guidance control system 1 is not limited to this example. For example, when all of the components of the object guidance control device 10 are provided in the vehicle 20, it is sufficient for at least one of the communication unit 110 and the communication unit 220 to be provided in the vehicle 20 as the component that communicates with external devices.

（対象案内制御システム１における処理の流れ）
続いて、本実施形態に係る対象案内制御システム１における処理の流れについて説明する。図５は、本実施形態に係る対象案内制御システム１における処理の流れを示すフローチャートである。なお、以下では、対象案内制御装置１０の構成要素の全部が車両２０には備えられず、サーバ装置等により実現される場合の対象案内制御システム１における処理の流れの例について説明する。 (Processing flow in the object guidance control system 1)
Next, the processing flow in the object guidance control system 1 according to the present embodiment will be described. Fig. 5 is a flowchart showing the processing flow in the object guidance control system 1 according to the present embodiment. Note that the following describes an example of the processing flow in the case where none of the components of the object guidance control device 10 is provided in the vehicle 20 and all of them are realized by a server device or the like.

まず、車両２０のマイク２１２は、車両２０の車内における音声を取得する（Ｓ１０２）。マイク２１２が取得した音声は、ネットワーク３０を介して対象案内制御装置１０へ送信される。マイク２１２が音声を取得すると、車両２０の制御部２３０は、車両２０の車内カメラ２１４をオンにする（Ｓ１０４）。また、制御部２３０は、車両２０の車外カメラ２１６もオンにする（Ｓ１０６）。 First, the microphone 212 of the vehicle 20 captures voice within the vehicle 20 (S102). The voice captured by the microphone 212 is transmitted to the target guidance control device 10 via the network 30. When the microphone 212 captures the voice, the control unit 230 of the vehicle 20 turns on the in-vehicle camera 214 of the vehicle 20 (S104). The control unit 230 also turns on the exterior camera 216 of the vehicle 20 (S106).

次いで、ユーザ４０が発話すると（Ｓ１０８）、マイク２１２がユーザ４０の発話を含む音声を取得し、当該音声がネットワーク３０を介して対象案内制御装置１０へ送信される。次いで、対象案内制御装置１０の制御部１２０は、案内を求めるボイスコマンドが受信した音声に含まれるか否かを判定する（Ｓ１１０）。音声にボイスコマンドが含まれない場合（Ｓ１１０／ＮＯ）、対象案内制御システム１は、音声にボイスコマンドが含まれるまでＳ１０８とＳ１１０の処理を行う。音声にボイスコマンドが含まれる場合（Ｓ１１０／ＹＥＳ）、制御部１２０は、案内対象特定処理を行う（Ｓ１１２）。なお、案内対象特定処理の詳細は後述される。 Next, when the user 40 speaks (S108), the microphone 212 acquires voice including the user's utterance, and the voice is transmitted to the object guidance control device 10 via the network 30. Next, the control unit 120 of the object guidance control device 10 determines whether the received voice contains a voice command requesting guidance (S110). If the voice does not contain a voice command (S110/NO), the object guidance control system 1 performs the processes of S108 and S110 until the voice contains a voice command. If the voice contains a voice command (S110/YES), the control unit 120 performs a guidance target identification process (S112). Details of the guidance target identification process will be described later.

案内対象特定処理後、制御部１２０は、案内対象として特定したランドマークを記録する（Ｓ１１４）。次いで、制御部１２０は、ランドマークのメタデータを３次元地図から取得する（Ｓ１１６）。次いで、制御部１２０は、車両２０やユーザ４０の状況とユーザ４０のプロファイル情報に応じて紹介文を生成する（Ｓ１１８）。紹介文の生成後、制御部１２０は、紹介文を音声に変換し、変換した音声を車両２０のスピーカ２４２に出力させることで、紹介文の読み上げを行う（Ｓ１２０）。紹介文の読み上げ後、制御部１２０は、次の対話があるか否かを判定する（Ｓ１２２）。次の対話がある場合（Ｓ１２２／ＹＥＳ）、制御部１２０は、Ｓ１１０から処理を繰り返す。次の対話がない場合（Ｓ１２２／ＮＯ）、制御部１２０は、次に音声が取得されるまで処理を終了する。 After the guidance target identification process, the control unit 120 records the landmark identified as the guidance target (S114). Next, the control unit 120 acquires metadata of the landmark from the 3D map (S116). Next, the control unit 120 generates an introduction text according to the situation of the vehicle 20 and the user 40 and the profile information of the user 40 (S118). After generating the introduction text, the control unit 120 converts the introduction text into audio and reads out the converted audio from the speaker 242 of the vehicle 20 (S120). After reading out the introduction text, the control unit 120 determines whether there is a next dialogue (S122). If there is a next dialogue (S122/YES), the control unit 120 repeats the process from S110. If there is no next dialogue (S122/NO), the control unit 120 ends the process until the next voice is acquired.
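
The steps from S110 to S120 for a single utterance can be sketched as below. This is an illustrative outline only: each processing step is passed in as a callable so that the sketch stays self-contained, and all function and parameter names are hypothetical.

```python
def handle_utterance(utterance, is_guidance_command, identify_target,
                     get_metadata, make_introduction, speak, history):
    """Process one utterance; return True when guidance was given."""
    if not is_guidance_command(utterance):   # S110/NO: wait for the next utterance
        return False
    landmark = identify_target(utterance)    # S112: guidance target identification
    history.append(landmark)                 # S114: record the identified landmark
    metadata = get_metadata(landmark)        # S116: acquire the landmark metadata
    speak(make_introduction(metadata))       # S118 and S120: generate and read out
    return True


# Usage with trivial stand-ins for each step.
spoken = []
history = []
handled = handle_utterance(
    "What is that building?",
    is_guidance_command=lambda u: "what is" in u.lower(),
    identify_target=lambda u: "building 50",
    get_metadata=lambda lm: {"name": "AAA Building"},
    make_introduction=lambda md: f"That building is the {md['name']}.",
    speak=spoken.append,
    history=history,
)
```

Whether a further dialogue follows (S122) would be decided by the caller, which invokes the function again for the next utterance.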

（案内対象特定処理の流れ）
続いて、Ｓ１１２の案内対象特定処理の詳細について説明する。図６は、本実施形態に係る案内対象特定処理の流れを示すフローチャートである。 (Flow of the guidance target identification process)
Next, the details of the guidance target identification process of S112 will be described. Fig. 6 is a flowchart showing the flow of the guidance target identification process according to this embodiment.

まず、制御部１２０は、音声に含まれる自然言語を分析する（Ｓ２０２）。次いで、制御部１２０は、車両２０からネットワーク３０を介して受信する車内カメラ２１４の撮像画像に写るユーザ４０のポーズを分析する（Ｓ２０４）。次いで、制御部１２０は、車両２０からネットワーク３０を介して受信する車外カメラ２１６の撮像画像に写る車外風景を分析する（Ｓ２０６）。 First, the control unit 120 analyzes the natural language contained in the voice (S202). Next, the control unit 120 analyzes the pose of the user 40 shown in the image captured by the in-vehicle camera 214 and received from the vehicle 20 via the network 30 (S204). Next, the control unit 120 analyzes the scenery outside the vehicle shown in the image captured by the exterior camera 216 and received from the vehicle 20 via the network 30 (S206).

自然言語、ポーズ、および車外風景の分析後、制御部１２０は、案内対象を検出できたか否かを判定する（Ｓ２０８）。案内対象を検出できなかった場合（Ｓ２０８／ＮＯ）、制御部１２０は、ユーザ４０に対して再度発話を促す（Ｓ２２０）。 After analyzing the natural language, the pose, and the scenery outside the vehicle, the control unit 120 determines whether or not a guidance target has been detected (S208). If no guidance target has been detected (S208/NO), the control unit 120 prompts the user 40 to speak again (S220).

案内対象を検出できた場合（Ｓ２０８／ＹＥＳ）、制御部１２０は、車両２０の位置と向きを取得する（Ｓ２１０）。次いで、制御部１２０は、車両２０の位置と向きを３次元地図と対応付ける（Ｓ２１２）。次いで、制御部１２０は、ユーザ４０のポーズの向きを車両２０の向きと対応付ける（Ｓ２１４）。次いで、制御部１２０は、車両２０の車外風景が写った撮像画像を３次元地図と対応付ける（Ｓ２１６）。 If the guidance target is detected (S208/YES), the control unit 120 acquires the position and orientation of the vehicle 20 (S210). Next, the control unit 120 associates the position and orientation of the vehicle 20 with the three-dimensional map (S212). Next, the control unit 120 associates the orientation of the pose of the user 40 with the orientation of the vehicle 20 (S214). Next, the control unit 120 associates a captured image showing the scenery outside the vehicle 20 with the three-dimensional map (S216).

各対応付け後、制御部１２０は、案内対象が３次元地図上で一義的に決定できるか否かを判定する（Ｓ２１８）。案内対象を一義的に決定できなかった場合（Ｓ２１８／ＮＯ）、制御部１２０は、ユーザ４０に対して、案内対象を絞り込むための発話を促す（Ｓ２２２）。案内対象を一義的に決定できた場合（Ｓ２１８／ＹＥＳ）、制御部１２０は、案内対象特定処理を終了する。 After these associations have been made, the control unit 120 determines whether the guidance target can be uniquely determined on the three-dimensional map (S218). If the guidance target cannot be uniquely determined (S218/NO), the control unit 120 prompts the user 40 to speak in order to narrow down the guidance targets (S222). If the guidance target can be uniquely determined (S218/YES), the control unit 120 ends the guidance target identification process.
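
The two branch points of the guidance target identification process (S208 and S218) can be summarized in a small decision function. The sketch below is an assumption-laden illustration: the `Outcome` enumeration and the `decide` function are hypothetical names, and the inputs are taken to be the list of detected candidates and the subset of them found in the indicated direction on the three-dimensional map.

```python
from enum import Enum


class Outcome(Enum):
    IDENTIFIED = "identified"          # S218/YES: the process ends
    ASK_AGAIN = "ask_again"            # S220: prompt the user to speak again
    ASK_TO_NARROW = "ask_to_narrow"    # S222: prompt the user to narrow down


def decide(detected_candidates, candidates_in_direction):
    """Return the outcome and, when uniquely determined, the guidance target."""
    if not detected_candidates:              # S208/NO
        return Outcome.ASK_AGAIN, None
    if len(candidates_in_direction) == 1:    # S218/YES
        return Outcome.IDENTIFIED, candidates_in_direction[0]
    return Outcome.ASK_TO_NARROW, None       # S218/NO


result = decide(["building 50", "building 51"], ["building 50"])
```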

以上、対象案内制御装置１０の構成要素の全部が車両２０には備えられず、サーバ装置等により実現される場合の対象案内制御システム１における処理の流れの例について説明した。なお、対象案内制御システム１における処理の流れはかかる例に限定されない。例えば、対象案内制御装置１０の構成要素の全部が車両２０に備えられる場合、対象案内制御システム１は、ネットワーク３０を介さずに処理を行い得る。 The above describes an example of the processing flow in the object guidance control system 1 in the case where none of the components of the object guidance control device 10 is provided in the vehicle 20 and all of them are realized by a server device or the like. Note that the processing flow in the object guidance control system 1 is not limited to this example. For example, when all of the components of the object guidance control device 10 are provided in the vehicle 20, the object guidance control system 1 can perform the processing without going through the network 30.

以上説明したように、本実施形態に係る対象案内制御装置１０は、移動している車両２０の車内のユーザ４０の音声とユーザ４０の動作に基づき、車両２０の車外風景から検出された案内対象の候補群とユーザ４０が指定した案内対象とを結び付ける。これにより、対象案内制御装置１０は、移動している車両２０の車外のランドマークを案内対象の候補として検出することができる。
また、本実施形態に係る対象案内制御装置１０は、車両２０の車内のユーザ４０の音声とユーザ４０の動作の両方を用いて案内対象の候補を検出する。これにより、対象案内制御装置１０は、案内対象の検出の精度を向上することができる。
また、本実施形態に係る対象案内制御装置１０は、車両２０の位置情報と３次元地図に基づき、３次元地図上で案内対象を一義的に特定する。これにより、対象案内制御装置１０は、案内対象の特定の精度を向上し、さらに、ユーザ４０が指定した案内対象に関する案内の精度も向上することができる。
したがって、本実施形態に係る対象案内制御装置１０は、車両２０に乗車しているユーザ４０が案内を求める対象として指定した案内対象に関する案内の精度を向上することができる。 As described above, the object guidance control device 10 according to this embodiment links a group of candidate guidance targets detected from the scenery outside the vehicle 20 with a guidance target designated by the user 40, based on the voice and actions of the user 40 inside the moving vehicle 20. This allows the object guidance control device 10 to detect landmarks outside the moving vehicle 20 as candidate guidance targets.
Furthermore, the object guidance control device 10 according to the present embodiment detects candidates for guidance targets using both the voice of the user 40 inside the vehicle 20 and the actions of the user 40. This allows the object guidance control device 10 to improve the accuracy of detection of guidance targets.
Furthermore, the object guidance control device 10 according to this embodiment uniquely identifies the guidance object on the 3D map based on the position information of the vehicle 20 and the 3D map. This enables the object guidance control device 10 to improve the accuracy of identifying the guidance object, and further improve the accuracy of guidance regarding the guidance object designated by the user 40.
Therefore, the object guidance control device 10 according to this embodiment can improve the accuracy of guidance regarding a guidance target that the user 40 riding in the vehicle 20 has designated as a target for which guidance is requested.

（変形例）
以上、本発明の実施形態について説明した。続いて、本発明の実施形態の変形例について説明する。なお、以下に説明する各変形例は、単独で本発明の実施形態に適用されてもよいし、組み合わせで本発明の実施形態に適用されてもよい。また、各変形例は、本発明の実施形態で説明した構成に代えて適用されてもよいし、本発明の各実施形態で説明した構成に対して追加的に適用されてもよい。 (Modification)
The embodiments of the present invention have been described above. Next, modified examples of the embodiments of the present invention will be described. Each modified example described below may be applied alone to the embodiments of the present invention, or may be applied in combination to the embodiments of the present invention. Furthermore, each modified example may be applied in place of the configuration described in the embodiments of the present invention, or may be applied in addition to the configuration described in each embodiment of the present invention.

（第１の変形例）
上述した実施形態では、対象案内制御装置１０がユーザ４０に指定された案内対象に関する案内の制御を行う例について説明したが、かかる例に限定されない。車両２０の周囲にユーザ４０が興味を持ちそうなランドマークがある場合、対象案内制御装置１０は、ユーザ４０からの指定がなくても当該ランドマークを案内対象として案内の制御を行ってもよい。ユーザ４０が興味を持ちそうなランドマークは、例えば、車両２０の位置情報やユーザ４０のプロファイル情報等に基づき検出される。かかる構成により、ユーザ４０に対して、ユーザ４０に応じた柔軟な案内を行うことができる。 (First Modification)
In the above-described embodiment, an example has been described in which the object guidance control device 10 controls guidance regarding a guidance target specified by the user 40, but the present invention is not limited to such an example. If there is a landmark around the vehicle 20 that is likely to interest the user 40, the object guidance control device 10 may control guidance with that landmark as the guidance target even without a designation from the user 40. A landmark that is likely to interest the user 40 is detected based on, for example, the position information of the vehicle 20 and the profile information of the user 40. With such a configuration, flexible guidance suited to the user 40 can be provided.

（第２の変形例）
上述した実施形態では、対象案内制御装置１０が車両２０の出力装置に案内情報を出力させる例について説明したが、かかる例に限定されない。例えば、検出部１２２が検出した自然言語から、ユーザ４０が経路変更を求めていることを示すボイスコマンドが検出された場合、案内情報は車両２０の制御部２３０に対して出力されてもよい。この時、制御部２３０には、変更後の経路を示す経路情報が案内情報として入力される。そして、制御部２３０は、入力された経路情報に基づき、車両２０が変更後の経路に沿って移動するように、車両２０の自動運転を制御する。 (Second Modification)
In the above-described embodiment, an example has been described in which the object guidance control device 10 causes the output device of the vehicle 20 to output guidance information, but the present invention is not limited to such an example. For example, when a voice command indicating that the user 40 is requesting a route change is detected from the natural language detected by the detection unit 122, the guidance information may be output to the control unit 230 of the vehicle 20. At this time, route information indicating the changed route is input to the control unit 230 as the guidance information. Then, the control unit 230 controls the automatic driving of the vehicle 20 based on the input route information so that the vehicle 20 moves along the changed route.

（第３の変形例）
上述した実施形態では、出力制御部１２６がメタデータに基づき案内情報を生成する例について説明したが、係る例に限定されない。出力制御部１２６は、ウェブサービスから案内対象に関する情報を取得し、取得した情報に基づき案内情報を生成してもよい。 (Third Modification)
In the above embodiment, an example in which the output control unit 126 generates guidance information based on metadata has been described, but the present invention is not limited to this example. The output control unit 126 may obtain information on the guidance target from a web service and generate guidance information based on the obtained information.

以上、本発明の実施形態の変形例について説明した。なお、上述した実施形態における対象案内制御装置１０をコンピュータで実現するようにしてもよい。その場合、この機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することによって実現してもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含んでもよい。また上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよく、ＦＰＧＡ（Field Programmable Gate Array）等のプログラマブルロジックデバイスを用いて実現されるものであってもよい。 The above describes modified examples of the embodiment of the present invention. The object guidance control device 10 in the above embodiment may be realized by a computer. In that case, a program for realizing its functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. Note that the term "computer system" here includes an OS and hardware such as peripheral devices. Furthermore, the term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The term "computer-readable recording medium" may also include a medium that dynamically holds a program for a short period of time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as a volatile memory inside a computer system that serves as a server or client in such a case.
Furthermore, the above program may be a program for realizing a part of the above-mentioned function, or may be a program that can realize the above-mentioned function in combination with a program already recorded in the computer system, or may be a program that is realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

以上、図面を参照してこの発明の実施形態について詳しく説明してきたが、具体的な構成は上述のものに限られることはなく、この発明の要旨を逸脱しない範囲内において様々な設計変更等をすることが可能である。 The above describes in detail an embodiment of the present invention with reference to the drawings, but the specific configuration is not limited to the above, and various design changes can be made without departing from the spirit of the present invention.

１…対象案内制御システム、１０…対象案内制御装置、２０…車両、３０…ネットワーク、１１０…通信部、１２０…制御部、１２２…検出部、１２４…特定部、１２６…出力制御部、１３０…記憶部、２１０…取得部、２１２…マイク、２１４…車内カメラ、２１６…車外カメラ、２２０…通信部、２３０…制御部、２４０…案内情報出力部 1... object guidance control system, 10... object guidance control device, 20... vehicle, 30... network, 110... communication unit, 120... control unit, 122... detection unit, 124... identification unit, 126... output control unit, 130... storage unit, 210... acquisition unit, 212... microphone, 214... in-vehicle camera, 216... outside-vehicle camera, 220... communication unit, 230... control unit, 240... guidance information output unit

Claims

a detection unit that detects candidates for a guidance target designated by a user based on a voice of the user riding in the vehicle, an image captured inside the vehicle in which a pose of the user designating a guidance target for which the user requests guidance is captured, and an image captured around the vehicle;
an identification unit that identifies a position and a direction of the vehicle in spatial information around the vehicle based on position information of the vehicle, identifies a direction of the guidance target in the spatial information based on the identified direction of the vehicle, determines whether the guidance target can be uniquely determined from the candidates based on the identified direction, identifies the guidance target if it can be determined, and supplies a voice uttered by the user to the detection unit by prompting the user to speak to narrow down the guidance targets if it cannot be determined;
an output control unit that outputs output information related to the identified guidance target based on the spatial information corresponding to the identified guidance target;
An object guidance control device comprising:

The object guidance control device according to claim 1, wherein the detection unit detects a group of candidates for the guidance target by analyzing the image captured around the vehicle, and detects, from the group of candidates, the candidate that exists in a direction detected by analyzing the pose of the user in the image captured inside the vehicle, based on that direction and on natural language related to the guidance target detected by analyzing the voice.

The object guidance control device according to claim 1 or 2, wherein the output control unit acquires metadata of the identified guidance target from the spatial information and generates the output information based on the metadata.

Detecting candidates for a guidance target designated by a user based on a voice of the user riding in the vehicle, an in-vehicle captured image in which a pose of the user designating a guidance target for which the user requests guidance is captured, and a captured image of the surroundings of the vehicle;
specifying a position and orientation of the vehicle in spatial information around the vehicle based on position information of the vehicle, specifying a direction of the guidance target in the spatial information based on the specified orientation of the vehicle, determining whether or not the guidance target can be uniquely determined from the candidates based on the specified direction, specifying the guidance target if it can be determined, and supplying a voice uttered by the user to the detection unit by prompting the user to speak to narrow down the guidance targets if it cannot be determined;
outputting output information related to the identified guidance target based on the spatial information corresponding to the guidance target;
An object guidance control method executed by a processor, comprising:

A program for causing a computer to function as:
a detection unit that detects candidates for a guidance target designated by a user based on a voice of the user riding in the vehicle, an image captured inside the vehicle in which a pose of the user designating a guidance target for which the user requests guidance is captured, and an image captured around the vehicle;
an identification unit that identifies a position and a direction of the vehicle in spatial information around the vehicle based on position information of the vehicle, identifies a direction of the guidance target in the spatial information based on the identified direction of the vehicle, determines whether the guidance target can be uniquely determined from the candidates based on the identified direction, identifies the guidance target if it can be determined, and supplies a voice uttered by the user to the detection unit by prompting the user to speak to narrow down the guidance targets if it cannot be determined; and
an output control unit that outputs output information related to the identified guidance target based on the spatial information corresponding to the identified guidance target.