
JP7699471B2 - Voice guidance device, voice guidance method, and voice guidance program

Info

Publication number: JP7699471B2
Application number: JP2021088998A
Authority: JP
Inventors: 彰堀口; 西蔵羽山; 啓祐安藤; 桂相澤
Original assignee: Sohgo Security Services Co Ltd
Current assignee: Sohgo Security Services Co Ltd
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2025-06-27
Anticipated expiration: 2041-05-27
Also published as: JP2022181825A

Description

The present invention relates to a voice guidance device, a voice guidance method, and a voice guidance program.

Today, voice guidance such as "The ticket gate is 5 m ahead" or "There is a pedestrian crossing 3 m ahead" is provided to visually impaired users and others, both inside and outside stations and other facilities.

Patent Document 1 (JP 2020-125907 A) discloses a voice guidance system for visually impaired people that provides voice guidance to a user (a visually impaired person) passing through a station according to the direction in which the user is moving. This avoids giving the user guidance that is not needed and so reduces unnecessary voice output.

Patent Document 1: JP 2020-125907 A

However, conventional voice guidance systems, including the system of Patent Document 1, have a problem: when several visually impaired users are close to one another, it is hard for each user to tell whether the guidance they hear is intended for them or for another visually impaired user.

For example, suppose that several visually impaired users are at the same place and that one user and another user are walking in different directions. Suppose that, in this situation, the guidance "The ticket gate is 5 m straight ahead" is given to the first user. If the other user, who is walking in the opposite direction, mistakes this guidance as being intended for them, that user will not reach the ticket gate even after walking 5 m straight ahead.

The present invention has been made in view of the above problem. Its object is to provide a voice guidance device, a voice guidance method, and a voice guidance program that let a visually impaired user recognize that the voice guidance being output is intended for that user, so that voice guidance works effectively for each user without confusion.

To solve the above problem and achieve the object, the present invention comprises: a detection unit that analyzes an image captured by a camera device to detect visually impaired users and to detect at least the current position of each such user; an assignment unit that, when the detection unit detects a plurality of visually impaired users, generates guidance voices based on guidance voice data of different voice qualities assigned to the respective users; and an output control unit that controls output of the guidance voice generated from the guidance voice data assigned to each user, via a voice output device corresponding to at least the current position of that user as detected by the detection unit.

According to the present invention, a visually impaired user can be made to recognize that the guidance voice being output is intended for that user. Voice guidance can therefore work effectively for each user without confusion.

Fig. 1 is a diagram showing an example of the system configuration of a voice guidance system according to an embodiment.
Fig. 2 is a block diagram of an analysis device provided in the voice guidance system of the embodiment.
Fig. 3 is a diagram showing an example of the format of map data stored in the analysis device.
Fig. 4 is a diagram showing an example of the format of guidance voice data stored in the analysis device.
Fig. 5 is a schematic diagram of a user information table.
Fig. 6 is a functional block diagram of the functions realized when the CPU of the analysis device executes the voice guidance program.
Fig. 7 is a flowchart showing the first half of the voice guidance operation of the voice guidance system of the embodiment.
Fig. 8 is a flowchart showing the second half of the voice guidance operation of the voice guidance system of the embodiment.
Fig. 9 is a first schematic diagram for explaining the voice guidance operation of the voice guidance system of the embodiment.
Fig. 10 is a second schematic diagram for explaining the voice guidance operation of the voice guidance system of the embodiment.
Fig. 11 is a third schematic diagram for explaining the voice guidance operation of the voice guidance system of the embodiment.
Fig. 12 is a fourth schematic diagram for explaining the voice guidance operation of the voice guidance system of the embodiment.

An embodiment of a voice guidance system to which the present invention is applied is described below with reference to the drawings.

(System Configuration)
Fig. 1 shows the system configuration of the voice guidance system of the embodiment. As shown in Fig. 1, the voice guidance system is formed by interconnecting a plurality of terminal devices 60 and an analysis device 3, which is an administrator terminal device installed in, for example, a management room, via a wide area network such as the Internet or a private network such as a LAN (Local Area Network).

The terminal devices 60 are installed at geographically different positions, for example at predetermined intervals along a passage used by users. Each terminal device 60 includes a camera device 1 and a speaker device 2. The camera device 1 is, for example, a fixed-point camera that captures images of users passing through a passage or the like within a fixed imaging area. The camera device 1 may instead be a camera whose imaging area can be changed. The speaker device 2 is an example of a voice output device and outputs the guidance voice.

The analysis device 3 analyzes the images captured by the camera device 1 of each terminal device 60 to extract the characteristics of visually impaired users. The analysis device 3 then assigns a different voice to each visually impaired user and outputs the guidance voice from the speaker device 2 installed at the position to which the user moves. As a result, voice guidance can be given to each user without confusion even when several visually impaired users are close to one another.

(Hardware Configuration of the Analysis Device)
Fig. 2 is a block diagram showing the hardware configuration of the analysis device 3. As shown in Fig. 2, the analysis device 3 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, and a communication unit 14. The analysis device 3 also includes an HDD (Hard Disk Drive) 15, an input/output interface (input/output I/F) 16, and a communication interface (communication I/F) 17.

The communication unit 14 performs wired communication via a network such as the Internet or a LAN, as well as wireless communication such as Bluetooth (registered trademark) or Wi-Fi (registered trademark). The HDD 15 stores a voice guidance program for providing voice guidance to visually impaired users, map data 50, guidance voice data 51, and a user information table 52.

As shown in Fig. 3, the map data 50 covers the facilities, tenants, ticket gates, elevators, and the like located in the geographic range where voice guidance is provided (hereinafter also called the "service area"). For each of these, it stores facility information including the facility name, together with position information, in association with output condition information indicating the conditions under which voice guidance for that facility is given.
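As a rough illustration of how one record of the map data 50 might be organized, the following Python sketch pairs facility information and position information with an output condition. The field names, the sample coordinates, and the use of a maximum announcement distance as the output condition are assumptions made for illustration only; the source does not specify the format beyond what Fig. 3 shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    name: str              # facility information, e.g. "ticket gate"
    lat: float             # position information (latitude)
    lon: float             # position information (longitude)
    max_distance_m: float  # hypothetical output condition: announce within this range


# Hypothetical sample records for a small service area.
MAP_DATA = [
    MapEntry("ticket gate", 35.6812, 139.7671, 10.0),
    MapEntry("elevator", 35.6815, 139.7668, 5.0),
]


def entries_to_announce(distance_by_name):
    """Return names of facilities whose output condition is met.

    distance_by_name maps a facility name to the user's distance from it in metres.
    Facilities with no known distance are never announced.
    """
    return [
        entry.name
        for entry in MAP_DATA
        if distance_by_name.get(entry.name, float("inf")) <= entry.max_distance_m
    ]
```

With this layout, a user 5 m from the ticket gate and 8 m from the elevator would be told about the ticket gate only.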

The guidance voice data 51 is data for providing voice guidance based on the map data 50, and multiple audibly different sets of guidance voice data 51 are stored. More specifically, the guidance voice data 51 is stored as divided voice data, that is, voice data for individual words such as "I", "Taro", "Hanako", "5", "meters", "ahead", "ticket gate", "there is", "black", "white", "color", "cardigan", "coat", "baseball", "cap", "long", "short", and "hair", together with Japanese particles such as に, が, の, and を.

As one example, as shown in Fig. 4, each of the audibly different sets of guidance voice data 51 is given a guidance voice ID that uniquely identifies it, and the sets are stored separately as guidance voice data 51 with a male speaker and guidance voice data 51 with a female speaker. Even among voices of the same gender, the stored sets of guidance voice data 51 are easy to tell apart by ear. For each set of guidance voice data 51, whether the speaker is male or female, information indicating the voice frequency, sound pressure, pitch, and speech rate is also stored.

When there are multiple visually impaired users, the voice guidance system of the embodiment uses factors such as the speaker's gender, voice frequency, sound pressure, pitch, and speech rate shown in Fig. 4 to assign each user guidance voice data 51 that the users can easily tell apart, and provides voice guidance with it.

Fig. 5 is a schematic diagram of the user information table 52. As shown in Fig. 5, the user information table 52 stores the correspondence between visually impaired users and the guidance voice data 51 assigned to each of them. As described in detail later, it stores, for each user, the user ID, information on the user's characteristics (personal characteristics), and the ID of the guidance voice data 51 assigned to that user (guidance voice ID).
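The table described here can be pictured as a small in-memory structure. The sketch below is only an illustration: the class and method names are hypothetical, and matching personal characteristics by exact set equality is an assumption, since the source does not say how two sets of characteristics are compared.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class UserRecord:
    user_id: int         # uniquely identifies the user
    features: frozenset  # personal characteristics, e.g. {"black coat", "backpack"}
    voice_id: str        # guidance voice ID assigned to this user


class UserInfoTable:
    """Hypothetical stand-in for the user information table 52."""

    def __init__(self):
        self._records = []
        self._next_id = count(1)

    def find(self, features):
        """Return the record whose characteristics match exactly, or None."""
        key = frozenset(features)
        for record in self._records:
            if record.features == key:
                return record
        return None

    def register(self, features, voice_id):
        """Issue a new user ID and store it with the characteristics and voice ID."""
        record = UserRecord(next(self._next_id), frozenset(features), voice_id)
        self._records.append(record)
        return record
```

A real system would presumably need a tolerant comparison, because characteristics extracted from different camera views will rarely be identical.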

A display unit 18 and an operation unit 19 are connected to the input/output I/F 16 when necessary. The communication I/F 17 is connected to the network 5 via a network cable when necessary.

(Functional Configuration of the Analysis Device)
Fig. 6 is a functional block diagram of the functions realized in software when the CPU 11 executes the voice guidance program stored in the HDD 15. As shown in Fig. 6, by executing the voice guidance program the CPU 11 functions as a video acquisition unit 21, a map data acquisition unit 22, an image analysis unit 23, an output voice allocation unit 24, a communication control unit 25, a speaker switching unit 26, and an emergency processing unit 27.

The video acquisition unit 21 acquires captured images of users moving through the geographic range imaged by each camera device 1. The map data acquisition unit 22 acquires from the HDD 15 the map data 50 corresponding to the latitude and longitude of the geographic range imaged by each camera device 1. The image analysis unit 23 is an example of a detection unit. Based on the images captured by each camera device 1, it analyzes the personal characteristics of each user and determines, among other things, whether the user is visually impaired.

When a user is determined to be visually impaired, the image analysis unit 23 determines whether characteristic information (personal characteristics) matching the analyzed characteristics of that user is stored in the user information table 52. If it is not, the image analysis unit 23 issues a user ID that uniquely identifies the analyzed user and stores the issued user ID in the user information table 52 in association with the analyzed characteristic information.

The image analysis unit 23 also detects the user's current position based on the latitude and longitude corresponding to each coordinate of the image captured by the camera device 1. In addition, the image analysis unit 23 detects the user's direction of movement from the difference between the current positions of the same user in a series of captured images, for example over several frames. The image analysis unit 23 further detects whether a visually impaired user is making a "movement asking for help", such as raising the white cane about 50 cm above the head or swinging the white cane from side to side in front of the face.
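One simple way to obtain a latitude and longitude for an image coordinate is linear interpolation between known corner coordinates. The sketch below assumes a top-down, axis-aligned view with no lens distortion, which is an idealization introduced here for illustration; the source only states that each image coordinate corresponds to a latitude and longitude. The function name and parameters are hypothetical.

```python
def pixel_to_latlon(x, y, width, height, top_left, bottom_right):
    """Linearly interpolate (lat, lon) for pixel (x, y).

    top_left and bottom_right are the (lat, lon) of the image corners.
    width and height are the image size in pixels and must both exceed 1.
    """
    lat0, lon0 = top_left
    lat1, lon1 = bottom_right
    lat = lat0 + (lat1 - lat0) * (y / (height - 1))
    lon = lon0 + (lon1 - lon0) * (x / (width - 1))
    return lat, lon
```

For an obliquely mounted camera, a planar homography calibrated from reference points would be a more realistic choice than this linear mapping.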

The output voice allocation unit 24 is an example of an assignment unit. It assigns guidance voice data 51 stored in the HDD 15 to each visually impaired user detected by the image analysis unit 23. Specifically, when the image analysis unit 23 registers a new user, the output voice allocation unit 24 selects, from the guidance voice data 51 stored in the HDD 15, guidance voice data 51 whose voice quality differs from that of the guidance voices already assigned. The output voice allocation unit 24 then stores the ID of that guidance voice data 51 (guidance voice ID) in the user information table 52 as the guidance voice for the new user. The output voice allocation unit 24 also refers to the user information table 52 and generates a guidance voice for each user based on the guidance voice data 51 of the different voice quality assigned to that user.

The speaker switching unit 26 is an example of an output control unit. According to the current position of a visually impaired user, it switches the speaker device 2 that outputs the guidance voice, and outputs the guidance voice generated from the guidance voice data 51 of the voice quality assigned to that user. Voice guidance therefore follows each user's movement in the voice quality assigned to that user. The communication control unit 25 communicates with each terminal device 60 to acquire the images captured by the camera device 1, transmit the guidance voice to the speaker device 2, and so on. When the image analysis unit 23 detects a movement asking for help by a visually impaired user or the like, the emergency processing unit 27 sends an emergency notification to an administrator or the like based on that result. The emergency processing unit 27 also controls the output, via the speaker device 2 corresponding to the position of the user asking for help, of a message saying that staff are coming to help immediately.
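The switching behavior can be sketched as picking, for each detected position, the speaker closest to the user. Selecting by straight-line distance is an assumption made for illustration; the source says only that the speaker is switched according to the user's current position. Positions are expressed here in metres on a hypothetical local grid.

```python
import math


def nearest_speaker(user_pos, speakers):
    """Return the ID of the speaker closest to user_pos.

    speakers maps a speaker ID to its (x, y) position in metres.
    """
    return min(speakers, key=lambda sid: math.dist(user_pos, speakers[sid]))


def speakers_along_path(path, speakers):
    """Return the speaker chosen at each position of a user's path."""
    return [nearest_speaker(pos, speakers) for pos in path]
```

As a user walks past speakers placed every 10 m, the chosen speaker changes from one to the next, so the guidance voice follows the user.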

In this example, the video acquisition unit 21 through the emergency processing unit 27 are realized in software by the voice guidance program. However, all or some of them may be realized in hardware such as an IC (Integrated Circuit).

The voice guidance program may be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM or a flexible disk (FD). It may also be provided recorded on a computer-readable recording medium such as a CD-R, a DVD (Digital Versatile Disc), a Blu-ray (registered trademark) disc, or a semiconductor memory. The voice guidance program may also be provided for installation via a network such as the Internet, or provided pre-installed in a ROM or the like in the device.

(Voice Guidance Operation)
Figs. 7 and 8 are flowcharts showing the flow of the voice guidance operation in the voice guidance system of the embodiment. Fig. 7 shows the first half of the voice guidance operation, and Fig. 8 shows the second half.

(Step S1)
First, in step S1 of the flowchart in Fig. 7, the video acquisition unit 21 acquires the images of users (passersby) captured by each camera device 1.

(Step S2)
After step S1, in step S2, the image analysis unit 23 analyzes the user's characteristics (personal characteristics), current position, and direction of movement from the captured images acquired in step S1.

As the user's characteristics, the image analysis unit 23 detects the user's age and gender using a predetermined algorithm. By analyzing the captured image, the image analysis unit 23 also detects characteristics such as the user's clothing, the color of the clothing, belongings such as a handbag or backpack, and the color of the belongings. These characteristics are called personal characteristics.

The image analysis unit 23 also detects the user's current position based on the latitude and longitude corresponding to each coordinate of the image captured by the camera device 1, and detects the user's direction of movement from the difference between the current positions of the same user in a series of captured images, for example over several frames.
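Deriving a direction of movement from position differences can be sketched as a compass bearing from the first to the last position in a short series of frames. This is a minimal sketch under a flat-earth approximation, which is reasonable over the few metres a camera covers; the function name and the choice of a bearing in degrees are assumptions, not details from the source.

```python
import math


def movement_bearing(positions):
    """Return the bearing in degrees (0 = north, 90 = east) of the movement.

    positions is a list of (lat, lon) pairs for the same user in successive frames.
    Returns None if the first and last positions are identical.
    """
    (lat0, lon0), (lat1, lon1) = positions[0], positions[-1]
    d_north = lat1 - lat0
    # Scale the longitude difference so east-west and north-south are comparable.
    d_east = (lon1 - lon0) * math.cos(math.radians((lat0 + lat1) / 2))
    if d_north == 0 and d_east == 0:
        return None
    return math.degrees(math.atan2(d_east, d_north)) % 360
```

Using only the first and last positions ignores jitter in between; averaging over more frames would be a natural refinement.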

(Step S3)
After step S2, in step S3, the image analysis unit 23 determines whether each user in the captured image is a non-disabled person or a visually impaired user.

As one example, a visually impaired user carries a white cane (a safety cane for the blind). In contrast, an elderly person who is not visually impaired but has difficulty walking uses a cane of another color, such as brown or black. The image analysis unit 23 therefore determines whether a user in the captured image is visually impaired based on whether the cane the user carries is white.

Visually impaired users also move in a characteristic way: they walk while lightly tapping the ground with the white cane to check for obstacles. The image analysis unit 23 uses the presence or absence of this characteristic movement as a further factor in determining whether a user is visually impaired.

A visually impaired user may also be accompanied by a guide dog. An ordinary dog has a single cord-like lead attached to its collar or harness. A guide dog, in contrast, wears a distinctively shaped harness called a "U-shaped harness" or a "bar-handle harness". The image analysis unit 23 uses the shape of such a harness as a further factor in determining whether a user is visually impaired.

"U-shaped harnesses" and "bar-handle harnesses" are in many cases white. The image analysis unit 23 therefore also uses the color of the harness worn by a dog as a factor in determining whether the user accompanied by that dog is visually impaired.

If a user has entered with a dog a place where entry with dogs is restricted, the dog is likely to be a guide dog and the user is likely to be visually impaired. The image analysis unit 23 therefore determines that a user accompanied by a dog in such a place is a visually impaired user.

Guide dogs usually walk on the user's left side relative to the direction of travel. The image analysis unit 23 therefore also uses whether the user is accompanied by a dog walking on the left side as a factor in determining whether the user is visually impaired.

In addition, the image analysis unit 23 uses factors such as whether the user is wearing sunglasses and whether a disability mark is present in determining whether the user is visually impaired.
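The source lists the factors used in step S3 but does not say how they are combined. One possible combination, shown purely as an assumption, is a weighted score compared against a threshold. The factor names, weights, and threshold below are all hypothetical values chosen for illustration.

```python
# Hypothetical weights for the factors named in the description.
WEIGHTS = {
    "white_cane": 0.6,
    "cane_tapping_motion": 0.3,
    "guide_dog_harness_shape": 0.5,
    "white_harness": 0.2,
    "dog_in_restricted_area": 0.6,
    "dog_on_left_side": 0.1,
    "sunglasses": 0.1,
    "disability_mark": 0.3,
}

THRESHOLD = 0.6  # hypothetical decision threshold


def is_visually_impaired(observed_factors):
    """Return True if the summed weights of the observed factors reach the threshold.

    observed_factors is an iterable of factor names; unknown names are ignored.
    """
    score = sum(WEIGHTS[f] for f in observed_factors if f in WEIGHTS)
    return score >= THRESHOLD
```

With these illustrative weights, a white cane or a dog in a restricted area is decisive on its own, while weak cues such as sunglasses count only in combination with others.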

(Step S3: No → Step S1)
If the captured image shows only non-disabled users (step S3: No), the process returns to step S1, and the processing of steps S1 to S3 is repeated. If a user in the captured image is determined to be visually impaired (step S3: Yes), the process proceeds to step S4.

(Step S4)
In step S4, the image analysis unit 23 refers to the user information table 52 using the characteristics (personal characteristics) of the user determined to be visually impaired, which were obtained by analyzing the captured image in step S2. It determines whether the user is already registered according to whether the same characteristic information is stored in association with a user ID and a guidance voice ID.

(Step S4: No → Step S14)
If the user is not a known user registered in the user information table 52 (step S4: No), the process proceeds to step S14.

(Step S14: Assigning a New Guidance Voice ID)
In step S14, the image analysis unit 23 issues a new user ID to the user determined in step S4 not to be a known user. The image analysis unit 23 stores the issued user ID in the user information table 52 in association with the analyzed characteristic information (personal characteristics) of that user. At the same time, the output voice allocation unit 24 assigns guidance voice data 51 of a voice quality that is not currently assigned to any other user. The output voice allocation unit 24 then stores the ID of that guidance voice data 51 (guidance voice ID) in the user information table 52 in association with the user's user ID and characteristics. In other words, the output voice allocation unit 24 assigns audibly distinct guidance voice data 51 as the guidance voice data 51 for each user.

(Details of Step S14)
The audible difference is produced by varying one or more of gender, voice frequency (for example, formant frequency), sound pressure, pitch, speech rate, and the like. Specifically, when there are two visually impaired users, for example, the output voice allocation unit 24 assigns guidance voice data 51 with a male voice to one user and guidance voice data 51 with a female voice to the other. Alternatively, the output voice allocation unit 24 assigns to one user guidance voice data 51 with a male voice that has a high voice frequency and a fast speech rate, and assigns to the other user guidance voice data 51 that is also a male voice but has a low voice frequency and a slow speech rate.

Assigning each user guidance voice data 51 of a different voice quality based on gender, voice frequency, sound pressure, pitch, and speech rate in this way makes it easier for each user to recognize in advance the voice quality of their own guidance voice.
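The assignment described above can be sketched as choosing, among the voices not yet in use, the one that differs most from the voices already assigned. The dissimilarity score below, including its weights and the subset of attributes it uses, is a hypothetical construction for illustration; the source names the attributes but gives no formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    voice_id: str    # guidance voice ID
    gender: str      # "male" or "female"
    pitch_hz: float  # representative pitch
    rate: float      # relative speech rate, 1.0 = normal


def difference(a, b):
    """Hypothetical dissimilarity score; larger means easier to tell apart."""
    return (
        (a.gender != b.gender) * 2.0
        + abs(a.pitch_hz - b.pitch_hz) / 100.0
        + abs(a.rate - b.rate)
    )


def assign_voice(catalog, in_use):
    """Pick the unassigned profile most different from the profiles in use."""
    free = [v for v in catalog if v not in in_use]
    if not free:
        raise LookupError("no unassigned guidance voice left")
    if not in_use:
        return free[0]
    # Maximize the distance to the closest voice already assigned.
    return max(free, key=lambda v: min(difference(v, u) for u in in_use))
```

With one male voice already assigned, this picks a female voice before a second male voice, which mirrors the two-user example in the text.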

(Step S14 → Step S15: Generating the Guidance Voice for Pre-recognition)
After step S14, in step S15, the output voice allocation unit 24 uses the guidance voice data 51 corresponding to the guidance voice ID assigned in step S14 to generate a guidance voice whose content is the "pre-recognition voice guidance" described below. In doing so, the output voice allocation unit 24 generates a guidance voice that includes a proper noun identifying the speaker of the guidance voice data 51 (for example, "Taro" or "Hanako"), which is associated with each set of guidance voice data 51. Outputting this guidance voice to the user lets the user know in advance that guidance will be given to them in the voice of a specific speaker.

(Detailed description of step S15)
Using the guidance voice data 51 corresponding to the assigned guidance voice ID, the output voice allocation unit 24 generates a guidance voice, for example "I, Hanako, will guide you," that includes a proper noun such as "Taro" or "Hanako" indicating the speaker who provides the voice guidance. This lets the user recognize in advance that guidance for them will be given in, for example, the voice quality of "Hanako."

Note that this example adds a given name such as "Taro" or "Hanako" as the proper noun. Alternatively, a surname or a full name may be added, or another kind of proper noun such as a place name, country name, or building name.

After the processing of step S15, the process proceeds to step S6.

(Step S4: Yes)
On the other hand, if the user is determined in step S4 to be a known user already registered in the user information table 52 (step S4: Yes), the process proceeds to step S5.

(Step S5: Acquisition of the assigned guidance voice ID)
In step S5, the output voice allocation unit 24 acquires the guidance voice ID assigned to the user from the user information table 52. The process then proceeds to step S6.

In step S6, the output voice allocation unit 24 generates a guidance voice containing voice guidance about facilities and the like corresponding to the user's current position and moving direction, using the guidance voice data 51 corresponding to the guidance voice ID acquired in step S5 or assigned in step S13.

(Detailed description of step S6)
Specifically, when the user's current position is, for example, near a store, the output voice allocation unit 24 reads from the HDD 15, for each word such as "右手" (right-hand side), "に" (on), "店舗" (store), "A", "が" (subject particle), and "ございます" (there is), the guidance voice data 51 corresponding to the assigned guidance voice ID. The output voice allocation unit 24 then combines the read guidance voice data 51 to generate a guidance voice whose content is voice guidance corresponding to the user's current position and moving direction, such as "Store A is on your right."
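
The word-by-word assembly in step S6 can be sketched as below. This is a minimal illustration under stated assumptions: the clip store is modelled as an in-memory dictionary keyed by a hypothetical `(voice_id, word)` pair, and each clip is an opaque byte string. How the guidance voice data 51 is actually laid out on the HDD 15 is not specified in the description.

```python
from typing import Dict, List, Tuple

# Hypothetical store: (guidance voice ID, word) -> recorded audio bytes
ClipStore = Dict[Tuple[str, str], bytes]


def build_guidance(store: ClipStore, voice_id: str, words: List[str]) -> bytes:
    """Concatenate the per-word clips of one voice into a single utterance."""
    missing = [w for w in words if (voice_id, w) not in store]
    if missing:
        # Refuse to mix voices: every word must exist in the assigned voice
        raise KeyError(f"voice {voice_id!r} has no clip for {missing}")
    return b"".join(store[(voice_id, w)] for w in words)


# Placeholder byte strings standing in for real audio clips
store: ClipStore = {
    ("F1", "右手"): b"<migite>",
    ("F1", "に"): b"<ni>",
    ("F1", "店舗"): b"<tenpo>",
    ("F1", "A"): b"<A>",
    ("F1", "が"): b"<ga>",
    ("F1", "ございます"): b"<gozaimasu>",
}
sentence = ["右手", "に", "店舗", "A", "が", "ございます"]
audio = build_guidance(store, "F1", sentence)
```

Failing when a word is missing, rather than borrowing a clip from another voice, keeps the whole utterance in a single voice quality, which is the property the embodiment relies on.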

(Steps S7 to S8: Determination of the speaker device and output of the guidance voice)
After step S6, the process proceeds to step S7, where the speaker switching unit 26 determines the speaker device 2 that will output the guidance voice, based on each user's current position, or on the current position and moving direction. The process then proceeds to step S8, where the speaker switching unit 26 outputs, via the speaker device 2 determined in step S7, the guidance voice that the output voice allocation unit 24 generated in step S6 and whose content is voice guidance about facilities and the like corresponding to the user's current position and moving direction. Alternatively, the speaker switching unit 26 outputs, via the speaker device 2 determined in step S7, a guidance voice containing both the pre-recognition voice guidance generated by the output voice allocation unit 24 in step S14 and the voice guidance about facilities and the like corresponding to the user's current position and moving direction.
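
One way to picture the speaker choice in step S7 is the sketch below. It rests on assumptions that are not in the description: speaker positions are 2D coordinates, the selection rule is "nearest speaker", and when a moving direction is known, speakers ahead of the user are preferred. The function name `select_speaker` and the coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def select_speaker(speakers: Dict[str, Point], position: Point,
                   heading: Optional[Point] = None) -> str:
    """Return the ID of the speaker device that should output the guidance voice."""
    candidates = speakers
    if heading is not None:
        # Assumed policy: keep only speakers lying ahead of the user,
        # i.e. with a non-negative dot product against the heading vector.
        ahead = {
            sid: (x, y) for sid, (x, y) in speakers.items()
            if (x - position[0]) * heading[0] + (y - position[1]) * heading[1] >= 0
        }
        if ahead:  # fall back to all speakers when none lies ahead
            candidates = ahead
    return min(
        candidates,
        key=lambda sid: math.hypot(candidates[sid][0] - position[0],
                                   candidates[sid][1] - position[1]),
    )


# Hypothetical layout: three terminals along a straight passage
speakers = {"60a": (0.0, 0.0), "60b": (5.0, 0.0), "60c": (10.0, 0.0)}
by_position = select_speaker(speakers, (2.0, 0.0))
by_heading = select_speaker(speakers, (2.0, 0.0), heading=(1.0, 0.0))
```

In this layout, a user at (2, 0) is nearest to 60a, but when walking toward increasing x the sketch hands the guidance over to 60b, the next speaker along the route.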

(Specific example of steps S15 to S8)
The flow from generating a guidance voice containing the "pre-recognition voice guidance" to outputting it from the speaker device 2 is described here with a specific example. Suppose the user characteristics analyzed by the image analysis unit 23 are those of a woman wearing a black coat. The output voice allocation unit 24 reads from the HDD 15, for each word such as "黒" (black), "色" (color), "の" (of), "コート" (coat), "を" (object particle), "着た" (wearing), "女性" (woman), "の" (of), and "方" (person), the guidance voice data 51 corresponding to the guidance voice ID assigned to that user. The output voice allocation unit 24 then combines the read guidance voice data 51 to generate a guidance voice containing voice guidance that lets the analyzed user recognize that the guidance is addressed to them (pre-recognition voice guidance), such as "Lady in a black coat," together with voice guidance based on that user's current position and moving direction.

The speaker switching unit 26 then determines, based on the user's current position and moving direction, the speaker device that will output the guidance voice; for example, it selects the speaker device 2 of the terminal device 60 whose camera device 1 captured the analyzed image of that user. The speaker switching unit 26 outputs the guidance voice described above via the determined speaker device 2. Because the user hears the pre-recognition voice guidance, the user can recognize in advance both that the voice guidance about to be output is addressed to them and what the voice quality of that guidance voice is.

Once voice guidance for one or more users has started in this way, the speaker switching unit 26 selects the speaker device 2 corresponding to each user's current position and moving direction and outputs the guidance voice in the voice quality assigned to that user. Voice guidance for a user is thus given from start to finish in the voice quality first assigned. Therefore, even when several visually impaired users are close together, each user receives guidance in a different voice quality, can recognize throughout that the guidance being output is addressed to them, and does not confuse it with guidance for others. Voice guidance can therefore function effectively for each user.

(Steps S9 and S10)
Next, after step S8, the process proceeds to step S9 in the flowchart of FIG. 8, where the output voice allocation unit 24 determines whether the user has moved outside the service area. Specifically, the output voice allocation unit 24 refers to the date and time of the last access to the user information table 52 and, if that time is a certain period (for example, one hour) or more before the current time, determines that the user has moved outside the service area. When the user is determined to have moved outside the service area (step S9: Yes), the output voice allocation unit 24 deletes the information about that user from the user information table 52 (step S10).
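
The timeout check of steps S9 and S10 can be sketched as below. This is an illustration only: the user information table 52 is modelled as a plain dictionary whose records carry a hypothetical `last_access` field, and the one-hour threshold is the example value given in the description.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def purge_stale_users(table: Dict[str, dict], now: datetime,
                      timeout: timedelta = timedelta(hours=1)) -> List[str]:
    """Delete users whose last access is `timeout` or more in the past (S9 -> S10)."""
    stale = [uid for uid, rec in table.items()
             if now - rec["last_access"] >= timeout]
    for uid in stale:
        del table[uid]  # S10: erase the user's record, freeing the voice ID
    return stale


now = datetime(2021, 5, 27, 12, 0, 0)
table = {
    "userA": {"voice_id": "M1", "last_access": now - timedelta(minutes=5)},
    "userB": {"voice_id": "F1", "last_access": now - timedelta(hours=2)},
}
removed = purge_stale_users(table, now)
```

Collecting the stale IDs first and deleting afterwards avoids modifying the dictionary while iterating over it.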

Alternatively, the image analysis unit 23 determines whether the user has moved outside the facility by analyzing camera video of, for example, the facility entrances and exits (step S9). When the camera video confirms that the user has moved outside the facility (step S9: Yes), the output voice allocation unit 24 deletes the information about that user from the user information table 52 (step S10).

(Specific example of voice guidance for multiple users)
To explain more specifically, FIGS. 9 to 12 schematically show voice guidance given to visually impaired users A and B. First, as shown in FIG. 9, suppose that in a first passage of a store, user A walks straight ahead from the left and user B walks straight ahead from the right. A second passage meets the first passage so as to form a T-junction. Terminal devices 60a to 60h, each corresponding to the terminal device 60 shown in FIG. 1, are arranged at predetermined intervals along the first and second passages. Suppose that user A is imaged by the camera device 1 of terminal device 60a and user B by the camera device 1 of terminal device 60f, and that the analysis finds user A to have the characteristics of "a woman wearing a black coat" and user B those of "a man wearing a gray suit."

User A is walking close to the speaker device 2 of terminal device 60a in the first passage, and user B is walking close to the speaker device 2 of terminal device 60f in the first passage. In this case, the speaker switching unit 26 selects the speaker device 2 of terminal device 60a to output voice guidance for user A and the speaker device 2 of terminal device 60f to output voice guidance for user B. Suppose also that the output voice allocation unit 24 has assigned to user A the guidance voice data 51 of the male speaker Taro (guidance voice ID: M1), and to user B the guidance voice data 51 of the female speaker Hanako (guidance voice ID: F1), whose voice quality differs from that of Taro.

The speaker switching unit 26 outputs, via the speaker device 2 of terminal device 60a, a guidance voice generated from the guidance voice data 51 with guidance voice ID M1 assigned to user A and containing pre-recognition voice guidance, for example, "Lady in a black coat, I, Taro, will guide you." User A can thereby recognize that voice guidance for her will be given in the male voice of Taro. Giving pre-recognition voice guidance based on personal characteristics in this way further helps user A recognize that the guidance about to be given is meant for her.

Similarly, the speaker switching unit 26 outputs, via the speaker device 2 of terminal device 60f, a guidance voice generated from the guidance voice data 51 with guidance voice ID F1 assigned to user B and containing pre-recognition voice guidance, for example, "Gentleman in a gray suit, I, Hanako, will guide you. Store B is ahead on your left." User B can thereby recognize that voice guidance for him will be given in the female voice of Hanako. Giving pre-recognition voice guidance based on personal characteristics in this way further helps user B recognize that the guidance about to be given is meant for him.

Next, as shown in FIG. 10, suppose that users A and B, each walking straight ahead, have advanced to positions near the second passage. In this case, the speaker switching unit 26 selects the speaker device 2 of terminal device 60b to output the guidance voice for user A and the speaker device 2 of terminal device 60d to output the guidance voice for user B.

The speaker switching unit 26 then outputs, via the speaker device 2 of terminal device 60b, a guidance voice generated from the guidance voice data 51 with guidance voice ID M1 assigned to user A, for example, "A T-junction is ahead. Turn right for store A, or go straight for store B." The speaker switching unit 26 also outputs, via the speaker device 2 of terminal device 60d, a guidance voice generated from the guidance voice data 51 with guidance voice ID F1 assigned to user B, for example, "A T-junction is ahead. Turn left for store A."

Next, as shown in FIG. 11, suppose that users A and B reach the T-junction at almost the same time. In this case, the same speaker device, namely the speaker device 2 of terminal device 60c, is selected for both users. The speaker switching unit 26 outputs, via the speaker device 2 of terminal device 60c, a guidance voice generated from the guidance voice data 51 with guidance voice ID M1 assigned to user A, for example, "This is a T-junction. Turn right for store A, or go straight for store B." The speaker switching unit 26 also outputs, via the same speaker device 2 of terminal device 60c, a guidance voice generated from the guidance voice data 51 with guidance voice ID F1 assigned to user B, for example, "This is a T-junction. Turn left for store A."

In the example of FIG. 11, users A and B are close to each other, but each already knows the voice quality of the guidance voice intended for them. Moreover, the guidance voice with guidance voice ID M1 used for user A and the guidance voice with guidance voice ID F1 used for user B differ in voice quality, so users A and B can each tell the guidance voice intended for them from the one intended for the other user without confusion. Consequently, even when voice guidance for users A and B is output almost simultaneously through the same speaker device 2, each user can pick out the guidance voice in their own assigned voice quality and act on it. Voice guidance can therefore function effectively for both users A and B.

Furthermore, as shown in FIG. 12, when user A continues straight along the first passage and moves to a position close to the speaker device 2 of terminal device 60e, the speaker switching unit 26 outputs, via the speaker device 2 of terminal device 60f, a guidance voice generated from the guidance voice data 51 with guidance voice ID M1 assigned to user A, for example, "You will soon arrive at store B. Store B is on your right." User A can thereby recognize that she has reached store B.

Likewise, when user B, having entered the second passage, moves to a position close to the speaker device 2 of terminal device 60h, a guidance voice generated from the guidance voice data 51 with guidance voice ID F1 assigned to user B, for example, "You will soon arrive at store A. Store A is on your left," is output via the speaker device 2 of terminal device 60h. User B can thereby recognize that he has moved close to store A.

By switching the speaker device 2 as each user moves and outputting guidance voices of the different voice qualities assigned to the users, voice guidance can be given to each user without causing confusion.

(Emergency processing)
Next, in step S11 of the flowchart of FIG. 8, the image analysis unit 23 detects whether a visually impaired user is making a "movement calling for help," such as raising a white cane about 50 cm above the head or waving the white cane from side to side in front of the face. If no such movement is detected (step S11: No), the process returns to step S1.

If, on the other hand, a "movement calling for help" is detected (step S11: Yes), the emergency processing unit 27 issues an emergency notification indicating that a visually impaired user is calling for help, for example via the display unit 18 (step S12).

At the same time, in step S13, the speaker switching unit 26 gives voice guidance such as "An emergency notification has been sent to the administrator. Help will arrive shortly, so please wait a moment." via the speaker device 2 corresponding to the current position of the user calling for help. That is, in the guidance voice of the voice quality assigned to that user, the speaker switching unit 26 announces that the administrator has been contacted in response to the call for help and asks the user to wait a while. The visually impaired user who called for help can thereby recognize that an administrator or the like is acting on the request, which gives the user a sense of security. In addition, on receiving the emergency notification, an assistant such as an administrator or security guard can respond, for example by going straight to the position of the user and providing assistance.

(Effects of the embodiment)
As is clear from the above description, when a plurality of visually impaired users are close to one another, the voice guidance system of the embodiment assigns guidance voice data 51 of a different voice quality to each user and generates the guidance voices accordingly. It then outputs each guidance voice, in its assigned voice quality, via the speaker device 2 corresponding to the position to which each user has moved. As a result, even when several visually impaired users are in the same place, each user can easily pick out the guidance voice intended for them, and voice guidance can function effectively for each user.

In addition, by analyzing personal characteristics and giving, in the assigned voice quality, voice guidance that lets the user recognize that the voice is addressed to them (pre-recognition voice guidance), each user can more easily distinguish their own guidance voice from the others.

In addition, by including in the voice guidance a proper noun indicating its speaker, such as "Taro" or "Hanako," each user can be made more aware of the voice guidance addressed to them.

In addition, because the speaker device 2 that outputs the guidance voice is switched as the user moves, it is possible to prevent the inconvenience that would arise if the guidance voice were output constantly from the same speaker device 2, namely the voice guidance becoming noise for people without disabilities, staff of nearby stores, nearby residents, and others.

In the embodiment described above, each visually impaired user (passerby) is uniquely identified using the user information table 52, in which the personal characteristics of each such user are registered. The invention is not limited to this, however, and the following arrangement may also be used.

For example, a visually impaired user (passerby) carries a wireless tag, such as a BLE tag, that transmits radio waves containing its own identification information. BLE stands for "Bluetooth (registered trademark) Low Energy." A receiving device for the radio waves transmitted by the wireless tag is provided in the terminal device 60 together with, for example, the camera device 1 and the speaker device 2.

The receiving device receives the radio waves from the wireless tag and transmits the identification information contained in them to the analysis device 3 via the network 5. The analysis device 3 analyzes the image captured by the camera device 1 that is provided in the same terminal device 60 as the receiving device that received the identification information, associates the received identification information with the image of the user detected by that analysis, and registers it in a database. As in the embodiment described above, each visually impaired user (passerby) can thereby be uniquely identified.
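
The association between a received tag ID and a camera detection can be sketched as follows. This is a hedged illustration, not the registration logic of the analysis device 3: the database is a plain dictionary, and the names `register_tag`, `detections`, and the record fields are hypothetical. The only point taken from the description is that the tag ID is tied to the user detected by the camera of the same terminal device 60 whose receiver heard the tag.

```python
from typing import Dict


def register_tag(database: Dict[str, dict], tag_id: str, terminal_id: str,
                 detections: Dict[str, dict]) -> bool:
    """Link a tag ID to the user seen by the camera of the receiving terminal."""
    features = detections.get(terminal_id)
    if features is None:
        return False  # that terminal's camera has detected no user yet
    database[tag_id] = {"terminal_id": terminal_id, "features": features}
    return True


# Hypothetical analysis results, keyed by the terminal whose camera produced them
detections = {"60a": {"description": "woman wearing a black coat"}}
database: Dict[str, dict] = {}
ok = register_tag(database, "tag-0001", "60a", detections)
not_seen = register_tag(database, "tag-0002", "60f", detections)
```

Keying the database by tag ID means later receptions of the same ID at other terminals can look up the user directly, without repeating feature matching.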

Finally, the embodiment described above is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The embodiment and its modifications are included in the scope and gist of the invention and in the invention described in the claims and its equivalents.

Reference Signs List
1 Camera device
2 Speaker device
3 Analysis device
5 Network
11 CPU
12 ROM
13 RAM
14 Communication unit
15 HDD
16 Input/output interface (input/output I/F)
17 Communication interface (communication I/F)
18 Display unit
19 Operation unit
21 Video acquisition unit
22 Map data acquisition unit
23 Image analysis unit
24 Output voice allocation unit
25 Communication control unit
26 Speaker switching unit
27 Emergency processing unit
50 Map data
51 Guidance voice data
52 User information table

Claims

A voice guidance device comprising:
a detection unit that detects a visually impaired user and detects at least a current position of the visually impaired user by analyzing an image captured by a camera device;
an allocation unit that generates a guidance voice based on guidance voice data of different voice qualities assigned to each user when a plurality of visually impaired users are detected by the detection unit; and
an output control unit that controls output of the guidance voice generated based on the guidance voice data of different voice qualities assigned to each user via a voice output device corresponding to at least the current position of each visually impaired user detected by the detection unit.

The voice guidance device according to claim 1, wherein the allocation unit allocates guidance voice data that differs in at least one or more of gender, voice frequency, sound pressure, pitch, and speaking speed to each visually impaired user.

The voice guidance device according to claim 1 or 2, wherein the detection unit detects characteristics of each visually impaired user, and the output control unit performs pre-recognition voice guidance indicating the detected characteristics of each user, using guidance voice data of the voice quality assigned to that user.

The voice guidance device according to claim 1, wherein the allocation unit generates the guidance voice data including a proper noun indicating a speaker providing the voice guidance when generating the guidance voice data.

A voice guidance method comprising:
a detection step in which a detection unit detects a visually impaired user by analyzing an image captured by a camera device, and detects at least a current position of the visually impaired user;
an allocation step in which, when a plurality of visually impaired users are detected in the detection step, an allocation unit generates a guidance voice based on guidance voice data of different voice qualities assigned to each user; and
an output control step in which an output control unit controls output of the guidance voice generated based on the guidance voice data of different voice qualities assigned to each user via a voice output device corresponding to at least the current position of each visually impaired user detected in the detection step.

A voice guidance program causing a computer to function as:
a detection unit that detects a visually impaired user and detects at least a current position of the visually impaired user by analyzing an image captured by a camera device;
an allocation unit that generates a guidance voice based on guidance voice data of different voice qualities assigned to each user when a plurality of visually impaired users are detected by the detection unit; and
an output control unit that controls output of the guidance voice generated based on the guidance voice data of different voice qualities assigned to each user via a voice output device corresponding to at least the current position of each visually impaired user detected by the detection unit.