JP7790109B2 - Text-to-speech system and text-to-speech device

Info

Publication number: JP7790109B2
Application number: JP2021190285A
Authority: JP
Inventors: 里葉子芦田; 直都神山; 美貴角場; 亮町田
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2020-12-02
Filing date: 2021-11-24
Publication date: 2025-12-23
Anticipated expiration: 2041-11-24
Also published as: JP2022088329A

Description

The present invention relates to a text-to-speech system and a text-to-speech device.

Patent Document 1 discloses application software that uses a smartphone's camera function to recognize characters on printed matter and reads the recognized characters aloud, so that even small characters can be understood.

Patent Document 1: JP 2014-127197 A

However, although the application software of Patent Document 1 is useful for users with myopia or presbyopia because it makes printed matter easier to read, it merely reads text aloud. It therefore lacks playfulness and does not engage children or their families.

The present invention has been made in view of these circumstances, and its object is to provide a text-to-speech system and a text-to-speech device that children and their families can enjoy.

The present application includes multiple means for solving the above problem. As one example, a text-to-speech system comprises an imaging device and a main unit having a placement surface on which an object bearing an imaging target can be placed. The imaging device comprises a grip, a window for looking at the imaging target, an imaging unit capable of imaging the imaging target while the imaging target is being viewed through the window, and a transmitting unit that transmits image data captured by the imaging unit to the main unit. The main unit comprises a receiving unit that receives the image data, an analysis unit that analyzes the image data received by the receiving unit, and an output unit that outputs audio based on the analysis result of the analysis unit.

According to the present invention, children and their families can enjoy themselves.

FIG. 1 is an external perspective view showing an example of the configuration of the text-to-speech system of the first embodiment.
FIG. 2 is a block diagram showing an example of the configuration of the text-to-speech system of the first embodiment.
FIG. 3 is a schematic diagram showing an example of the configuration of a BGM list.
FIG. 4 is an explanatory diagram showing an example of sounds.
FIG. 5 is a schematic diagram showing an example of a method by which the correction unit corrects the imaging range.
FIG. 6 is a flowchart showing an example of the processing procedure of the text-to-speech system of the first embodiment.
FIG. 7 is a block diagram showing an example of the configuration of a text-to-speech device.
FIG. 8 is a block diagram showing an example of the configuration of the text-to-speech system of the third embodiment.
FIG. 9 is a flowchart showing an example of the processing procedure of the text-to-speech system of the third embodiment.
FIG. 10 is a diagram showing an example of the configuration of the information processing system of the fourth embodiment.
FIG. 11 is a diagram showing an example of the processing procedure of the information processing system of the fourth embodiment.
FIG. 12 is a diagram showing an example of the processing procedure of the interest analysis function.
FIG. 13 is a diagram showing an example of an analysis result of the interest analysis function.
FIG. 14 is a diagram showing an example of the processing procedure of the interest type analysis function.
FIG. 15 is a diagram showing an example of an analysis result of the interest type analysis function.
FIG. 16 is a diagram showing an example of the processing procedure of the activity type analysis function.
FIG. 17 is a diagram showing an example of an analysis result of the activity type analysis function.
FIG. 18 is a diagram showing an example of the processing procedure of the favorite color analysis function.
FIG. 19 is a diagram showing an example of an analysis result of the favorite color analysis function.
FIG. 20 is a diagram showing an example of the configuration of the information processing system of the fifth embodiment.
FIG. 21 is a diagram showing an example of the processing procedure of the information processing system of the fifth embodiment.
FIG. 22 is a diagram showing an example of the configuration of the information processing system of the sixth embodiment.
FIG. 23 is a diagram showing an example of the processing procedure of the imaging device 50 of the sixth embodiment.
FIG. 24 is a diagram showing an example of the transition of interest analysis results.
FIG. 25 is a diagram showing an example of interest analysis results by age group, by region, and over time.

(First embodiment)
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is an external perspective view showing an example of the configuration of a text-to-speech system 100 according to the first embodiment. The text-to-speech system 100 includes an imaging device 50 and a main unit 10. The imaging device 50 includes a grip 62 and a window 61 provided at one end of the grip 62. The grip 62 is the part that a user (e.g., an infant, a child, or a family member) holds when picking up the imaging device 50. The window 61 is fitted with a lens (magnifying glass) or with transparent resin or glass, or is simply an opening. Through the window 61, the user can look at an imaging target (e.g., a character string such as a sentence, or a figure, including a photograph) printed on an object (e.g., a picture book, an illustrated guide, or a children's book).

The imaging device 50 also includes an imaging unit 51 capable of imaging the imaging target while it is being viewed through the window 61, a distance sensor 52 that detects the distance to the object, and a button (shutter button) 63 that accepts an operation to start imaging by the imaging unit 51. The imaging unit 51 and the distance sensor 52 are provided on one side of the grip 62 (the side facing the imaging target when it is viewed through the window 61), and the button 63 is provided on the other side of the grip 62 (the side facing the user's face). The imaging unit 51 can be composed of at least one camera. The distance sensor 52 may be any sensor capable of detecting distance. Instead of using the distance sensor 52, the distance may be measured from the parallax between multiple cameras.
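The parallax-based alternative to the distance sensor 52 can be illustrated with the standard pinhole stereo relation Z = f * B / d. This is only a sketch: the patent does not specify the computation, and the function name, parameters, and example values below are assumptions made for illustration.

```python
def distance_from_parallax(focal_length_px: float,
                           baseline_mm: float,
                           disparity_px: float) -> float:
    """Estimate the distance (mm) to the object from stereo disparity.

    Uses the pinhole stereo relation Z = f * B / d, where f is the focal
    length in pixels, B the spacing between the two cameras, and d the
    horizontal pixel shift of the same point between the two images.
    All names and units here are hypothetical.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_mm / disparity_px


# Example: f = 800 px, cameras 20 mm apart, 100 px disparity
print(distance_from_parallax(800.0, 20.0, 100.0))  # 160.0
```

A larger disparity means the object is closer, which is why a disparity of zero or less is rejected.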

The main unit 10 has a placement surface 21 on which an object can be placed. The placement surface 21 is rectangular in plan view and slopes downward toward its central portion 23 from each of a pair of edge portions 22 located on either side of the central portion 23. A picture book, illustrated guide, children's book, or the like can therefore be laid open on the placement surface 21 with its two facing pages forming an angle of less than 180 degrees, which makes the text and figures easier to see.

The main unit 10 includes restricting portions 24 that restrict movement of the object. They are provided on the other pair of edge portions of the placement surface 21, extending along the direction in which the placement surface 21 slopes, and protrude from the placement surface 21. This prevents a picture book, illustrated guide, children's book, or the like placed on the placement surface 21 from sliding off.

The main unit 10 has a storage section 25 formed in the placement surface 21 to house the imaging device 50. In plan view, the storage section 25 can have the same shape as the imaging device 50. The imaging device 50 may fit into the storage section 25, or the two may be held together by a magnet or the like. This reduces the risk of losing the imaging device 50 and helps infants and children learn the habit of tidying up.

The main unit 10 may be provided with an indicator light (e.g., an LED) 26 that shows the status of the main unit 10. The indicator light 26 can show statuses such as running on external power, running on battery, operating, charging, and fault. Although not illustrated, a touch-operable display panel may also be provided, and the required settings may be made through the display panel.

FIG. 2 is a block diagram showing an example of the configuration of the text-to-speech system 100 of the first embodiment. In addition to the imaging unit 51 and distance sensor 52 described above, the imaging device 50 includes a correction unit 53, a memory 54, a processor 55, and a communication unit 56. The processor 55 can control the imaging device 50 as a whole. The memory 54 is composed of semiconductor memory or the like and can store image data captured by the imaging unit 51.

The communication unit 56 provides communication with the main unit 10 via a home network 1 such as a wireless LAN. The imaging device 50 (e.g., the processor 55) can transmit image data captured by the imaging unit 51 (including image data temporarily stored in the memory 54) to the main unit 10 via the communication unit 56.

The correction unit 53 can correct the imaging range of the imaging unit 51 according to the distance detected by the distance sensor 52, so that the imaging target within the field of view through the window 61 can be imaged. Details of the correction unit 53 are described later.

The main unit 10 includes a control unit 11, a communication unit 12, an analysis unit 13, a speech synthesis unit 14, an order estimation unit 15, a storage unit 16, a microphone 17, a speaker 18, and an emotion index calculation unit 19. The analysis unit 13 includes a character string analysis unit 131 and a figure analysis unit 132. The storage unit 16 is composed of, for example, semiconductor memory and can store a BGM list 161 and an audio data list 162. The control unit 11 can be composed of a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), and the like.

The communication unit 12 provides communication with the imaging device 50 via the home network 1. The communication unit 12 can receive image data transmitted by the imaging device 50.

The analysis unit 13 analyzes the image data received via the communication unit 12. Specifically, the analysis unit 13 performs image recognition on the received image data to determine whether the imaging target is a character string or a figure. Known image recognition methods can be used, involving processing such as preprocessing, feature extraction, and matching or classification.

The character string analysis unit 131 is equipped with an image processing engine and a language processing engine, and can analyze image data and output a character string (text). Known methods can be used to extract character strings from image data.

The figure analysis unit 132 is equipped with an image processing engine and can analyze image data to determine what a figure (including a photograph) contained in the image represents. Imaging targets include, but are not limited to, vehicles such as trains and automobiles, animals, insects, and musical instruments.

The control unit 11 can output audio through the speaker 18 based on the analysis result of the analysis unit 13.

With the above configuration, an infant or child only has to look through the window 61 of the imaging device 50 at something interesting in a picture book, illustrated guide, or children's book and press the button 63, and the system tells them about it by voice. Infants and children thus gain an audio-based experience that is fun and stimulates their interest in things. Beyond simply looking at things for play, when they look at something they wonder about or are interested in, the audio experience can lead them to new discoveries and offer the "joy of knowing." Family members (e.g., parents) can also enjoy the experience together with the infant or child.

Next, the first to third methods of preparing the audio to be output are described.

In the first method, the text (sentences) printed in picture books, illustrated guides, children's books, and the like is recorded in advance, and the recorded audio is stored in the storage unit 16 as the audio data list 162. For each book, the audio data list 162 associates information identifying a text with the audio data for that text. Reading aloud is performed by outputting from the speaker 18 the audio data that corresponds to the character string (text) obtained by the character string analysis unit 131. When recorded audio is played back, the attributes of the speaker may be made changeable. For example, a male or female voice, a young or elderly voice, or the voice of an anime voice actor may be selected according to preference. These settings may be made using a touch-operable display panel. A parent's voice may also be recorded with the microphone 17 and played back.
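The per-book association between text and recorded audio could be sketched as a nested lookup table. The book identifier, file name, sample sentence, and the normalization step below are all assumptions for illustration; the patent only states that text information and audio data are associated for each book.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalize recognized text so minor OCR differences still match.

    NFKC folds full-width/half-width variants; all whitespace is removed.
    This normalization is an assumption, not part of the patent.
    """
    return "".join(unicodedata.normalize("NFKC", text).split())


# Hypothetical audio data list: book ID -> {normalized text: audio file}
AUDIO_DATA_LIST = {
    "picture_book_01": {
        normalize("むかしむかし、あるところに"): "book01_p01.wav",
    },
}


def find_audio(book_id: str, recognized_text: str,
               audio_list=AUDIO_DATA_LIST):
    """Return the recorded audio for the recognized text, or None."""
    return audio_list.get(book_id, {}).get(normalize(recognized_text))


print(find_audio("picture_book_01", "むかしむかし、 あるところに"))  # book01_p01.wav
```

Returning None for unknown text leaves room for falling back to one of the synthesis-based methods.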

The second and third methods can be carried out by the speech synthesis unit 14. The second method synthesizes speech by concatenating pre-recorded speech units. Specifically, speech can be synthesized by concatenating recorded characters (e.g., 「あ」 and 「か」), words, and phrases. The speaking rate, pitch, intonation, and so on can be adjusted so that the result sounds natural. Corpus-based speech synthesis may also be used. Corpus-based speech synthesis predicts the fundamental frequency, phoneme duration, and similar properties from linguistic features of the text (sentences, phrases, accent phrases, morphemes, phonemes, accents, and so on), then selects from a prepared speech database the speech units that best match the predicted values and concatenates them.
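The selection step of corpus-based synthesis can be sketched as picking, for each target phoneme, the stored unit closest to the predicted fundamental frequency and duration. This is a simplified illustration: the `Unit` type, the cost weights, and the sample database are assumptions, and practical systems typically also score how well adjacent units join, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    phoneme: str
    f0_hz: float        # fundamental frequency of the recorded unit
    duration_ms: float  # duration of the recorded unit
    wave_id: str        # hypothetical reference to the stored waveform


def target_cost(unit: Unit, f0_hz: float, duration_ms: float,
                w_f0: float = 1.0, w_dur: float = 1.0) -> float:
    """Weighted distance between a stored unit and the predicted target."""
    return (w_f0 * abs(unit.f0_hz - f0_hz)
            + w_dur * abs(unit.duration_ms - duration_ms))


def select_units(targets, database):
    """Pick the best-matching unit for each (phoneme, f0, duration) target."""
    selected = []
    for phoneme, f0_hz, duration_ms in targets:
        candidates = [u for u in database if u.phoneme == phoneme]
        if not candidates:
            raise KeyError(f"no recorded unit for phoneme {phoneme!r}")
        selected.append(min(candidates,
                            key=lambda u: target_cost(u, f0_hz, duration_ms)))
    return selected


database = [
    Unit("a", 200.0, 80.0, "a_001"),
    Unit("a", 260.0, 120.0, "a_002"),
    Unit("k", 0.0, 40.0, "k_001"),
]
targets = [("k", 0.0, 45.0), ("a", 250.0, 110.0)]
print([u.wave_id for u in select_units(targets, database)])  # ['k_001', 'a_002']
```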

The third method synthesizes speech using speech features of pre-recorded speech. Specifically, the speech synthesis unit 14 includes a trained model that has learned speech features from recorded speech, and it can generate a speech waveform from the speech features output by the trained model. The speech features include, for example, mel-frequency cepstral coefficients (MFCC), line spectral pairs (LSP), and the fundamental frequency.

Based on the analysis result of the analysis unit 13, the control unit 11 can output BGM (background music) through the speaker 18. This is described in detail below.

The emotion index calculation unit 19 can calculate an emotion index by performing semantic analysis on the character string extracted by the character string analysis unit 131. For example, it can extract words that express emotion from the character string and calculate the emotion index from those words. Emotions are divided into positive and negative, and each emotion word is classified in advance as positive or negative. In addition, a value indicating the strength of the emotion is assigned in advance to each emotion word. The emotion index calculation unit 19 can then calculate the emotion index for the extracted character string from the polarity (positive or negative) and strength value of each extracted word. The emotion index may be calculated per paragraph or per group of several paragraphs.
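The lexicon-based calculation described above can be sketched as follows. The lexicon entries, the summing of signed strengths, the clamping to levels 1 to 3, and the whitespace tokenization are all assumptions for illustration. Japanese text would need a morphological analyzer instead of `split()`, and the patent does not state how word values are combined.

```python
# Hypothetical lexicon: word -> (polarity, strength).
# Polarity is +1 (positive) or -1 (negative); strength is 1 to 3.
EMOTION_LEXICON = {
    "happy": (+1, 2),
    "fun": (+1, 1),
    "delighted": (+1, 3),
    "sad": (-1, 2),
    "scary": (-1, 3),
    "lonely": (-1, 1),
}


def emotion_index(text: str):
    """Return ("positive" | "negative", level 1-3), or None if neutral."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    score = 0
    for word in words:
        if word in EMOTION_LEXICON:
            polarity, strength = EMOTION_LEXICON[word]
            score += polarity * strength
    if score == 0:
        return None  # no emotion words, or positives and negatives cancel
    level = min(3, abs(score))  # clamp to the three levels
    return ("positive" if score > 0 else "negative", level)


print(emotion_index("The rabbit was happy and the party was fun."))
# ('positive', 3)
print(emotion_index("The forest was scary, but the rabbit was happy."))
# ('negative', 1)
```

Calling the function once per paragraph, or once on several paragraphs joined together, corresponds to the two calculation units mentioned in the text.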

Using the BGM list 161 stored in the storage unit 16, the control unit 11 can output from the speaker 18 the BGM that corresponds to the emotion index calculated by the emotion index calculation unit 19.

FIG. 3 is a schematic diagram showing an example of the configuration of the BGM list 161. The BGM list 161 defines the correspondence between emotion indices and BGM. The emotion index can be, for example, positive levels 1 to 3 and negative levels 1 to 3, where a higher level number indicates a stronger emotion. Multiple BGM tracks are associated with each level. For example, as shown in the figure, BGM1a, BGM1b, BGM1c, and BGM1d are associated with positive level 1, and the other emotion indices are handled in the same way. The suffixes a to d identify different scenes in the story or text of a picture book or the like. For example, while the emotion index is positive level 1, the BGM can be changed to match the scene whenever the scene changes. This creates a sense of immersion, makes the audio experience more enjoyable, and evokes a feeling of excitement.
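The BGM list lookup can be sketched as a table keyed by emotion index, with the scene selecting one track within the entry. Only the positive level 1 entry (BGM1a to BGM1d) is taken from the description of FIG. 3. The data layout and the function name are assumptions, and the remaining five entries are left out because their track names are not given.

```python
# Scene identifiers a-d, in the order used by the track suffixes.
SCENES = "abcd"

# Emotion index (polarity, level) -> tracks for scenes a, b, c, d.
# Only positive level 1 is listed in the description; other rows omitted.
BGM_LIST = {
    ("positive", 1): ["BGM1a", "BGM1b", "BGM1c", "BGM1d"],
}


def select_bgm(emotion, scene: str, bgm_list=BGM_LIST):
    """Return the track for this emotion index and scene, or None."""
    tracks = bgm_list.get(emotion)
    if tracks is None:
        return None  # no entry for this emotion index
    return tracks[SCENES.index(scene)]


print(select_bgm(("positive", 1), "a"))  # BGM1a
print(select_bgm(("positive", 1), "c"))  # BGM1c (same emotion, new scene)
```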

The control unit 11 can also output a sound through the speaker 18 based on the analysis result of the analysis unit 13. Specifically, the control unit 11 can output through the speaker 18 a sound related to the analysis result of the figure analysis unit 132, that is, to what the figure (including a photograph) contained in the image represents.

FIG. 4 is an explanatory diagram showing an example of sounds. As shown in the figure, the imaging target (a figure contained in the image) can be, for example, a train, an automobile, an animal, an insect, or a musical instrument. If the figure is a train, the sound of a running train is output. If the figure is an animal, the animal's cry is output. If the figure is a musical instrument, the sound of the instrument is output. Thus, when an infant or child opens an illustrated guide or the like and looks at an interesting figure through the window 61, a sound related to that figure is played. By looking at something they wonder about or find interesting, infants and children can learn what sound it makes, which leads them to new discoveries.

The order estimation unit 15 can estimate the order in which character strings are read aloud from the arrangement (layout and the like) of the character strings extracted by the character string analysis unit 131. For example, if the document is laid out vertically, the strings can be read from top to bottom, and if it is laid out horizontally, they can be read from left to right. Text can therefore be read aloud whether the object is written vertically or horizontally.
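One way to sketch this estimation is to infer the layout from the shape of the detected text boxes and sort accordingly. The `TextBox` type, the aspect-ratio heuristic, and the right-to-left ordering of vertical columns (the usual convention in Japanese typesetting) are assumptions. The patent states only top-to-bottom for vertical text and left-to-right for horizontal text.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    x: float       # left edge
    y: float       # top edge (y grows downward)
    width: float
    height: float
    text: str


def is_vertical(boxes) -> bool:
    """Assume vertical layout when most boxes are taller than wide."""
    tall = sum(1 for b in boxes if b.height > b.width)
    return tall > len(boxes) / 2


def reading_order(boxes):
    """Return box texts in estimated reading order."""
    if not boxes:
        return []
    if is_vertical(boxes):
        # Each box is a column read top to bottom; columns run right to left.
        ordered = sorted(boxes, key=lambda b: (-b.x, b.y))
    else:
        # Each box is a line read left to right; lines run top to bottom.
        ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    return [b.text for b in ordered]


horizontal = [TextBox(0, 20, 100, 10, "second line"),
              TextBox(0, 0, 100, 10, "first line")]
vertical = [TextBox(10, 0, 10, 100, "left column"),
            TextBox(40, 0, 10, 100, "right column")]
print(reading_order(horizontal))  # ['first line', 'second line']
print(reading_order(vertical))    # ['right column', 'left column']
```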

Next, the correction unit 53 is described.

FIG. 5 is a schematic diagram showing an example of how the correction unit 53 corrects the imaging range. According to the distance between the window 61 and an object such as a picture book, the correction unit 53 can correct the imaging range so that the imaging target visible within the frame of the window 61 (within the field of view) can be imaged. As shown in the figure, when the window 61 is at position P1 relative to the surface of the picture book, the range visible through the window 61 is S1. Because the window 61 is circular, the field of view is strictly speaking circular, but for convenience the imaging range S1 is taken to be the rectangle in which that circle is inscribed. When the window 61 is moved slightly away from the picture book to position P2, the distance between the window 61 and the picture book increases and the range visible through the window 61 becomes larger, so the imaging range S2 is corrected to be larger than the imaging range S1.
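The relationship between distance and imaging range can be illustrated with similar triangles, under the assumption (not stated in the patent) that the user's eye sits at a fixed distance behind the window. The visible circle on the page then has diameter w * (e + d) / e, where w is the window diameter, e the eye-to-window distance, and d the window-to-page distance. The function name and default values are hypothetical.

```python
def imaging_range_side(window_diameter_mm: float,
                       object_distance_mm: float,
                       eye_distance_mm: float = 300.0) -> float:
    """Side length (mm) of the square imaging range on the page.

    The visible circle has diameter w * (e + d) / e by similar triangles;
    the imaging range is the square in which that circle is inscribed,
    so its side equals the circle's diameter. The fixed eye distance is
    an assumption made for this sketch.
    """
    if window_diameter_mm <= 0 or eye_distance_mm <= 0:
        raise ValueError("window diameter and eye distance must be positive")
    if object_distance_mm < 0:
        raise ValueError("object distance must not be negative")
    return (window_diameter_mm * (eye_distance_mm + object_distance_mm)
            / eye_distance_mm)


s1 = imaging_range_side(60.0, 50.0)   # window at position P1 (closer)
s2 = imaging_range_side(60.0, 100.0)  # window at position P2 (farther)
print(s1, s2)  # 70.0 80.0
```

As in FIG. 5, moving the window farther from the page yields a larger range (S2 larger than S1).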

Next, the operation of the text-to-speech system 100 is described.

FIG. 6 is a flowchart showing an example of the processing procedure of the text-to-speech system 100 of the first embodiment. The imaging device 50 accepts a shutter button operation (S11), detects the distance to the object (S12), images the imaging target (S13), and corrects the captured image data according to the distance to the object (S14). The imaging device 50 then transmits the resulting image data to the main unit 10 (S15) and ends its processing. Alternatively, the image data may be transmitted uncorrected, together with the range to be analyzed later.

The main unit 10 receives the image data (S16) and determines whether the imaging target is a character string (S17). If it is a character string (YES in S17), the main unit 10 reads the character string aloud (S18), calculates an emotion index from the character string (S19), and outputs BGM corresponding to the emotion index (S20).

The main unit 10 determines from the character string whether the scene has changed (S21). If the scene has changed (YES in S21), it changes the BGM, outputs it (S22), and ends the processing. If the scene has not changed (NO in S21), the main unit 10 ends the processing.

If the imaging target is not a character string in step S17 (NO in S17), the main unit 10 determines whether the imaging target is a figure (S23). If it is a figure (YES in S23), the main unit 10 outputs a sound related to the content of the figure, as illustrated in FIG. 4 (S24), and ends the processing. If the imaging target is not a figure (NO in S23), the main unit 10 ends the processing.
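The branching in steps S17 to S24 can be sketched as a single dispatch function that returns the audio actions to perform. The `Analysis` type, the callback parameters, the sound file names, and the action tuples are hypothetical; the sketch mirrors only the order of decisions in the flowchart, not an actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Analysis:
    kind: str                    # "text", "figure", or "other"
    text: Optional[str] = None   # set when kind == "text"
    label: Optional[str] = None  # set when kind == "figure"


# Hypothetical sound files for the figure categories of FIG. 4.
SOUNDS = {
    "train": "train_running.wav",
    "animal": "animal_cry.wav",
    "instrument": "instrument.wav",
}


def handle_image(analysis, previous_scene, emotion_of, scene_of, bgm_for):
    """Return the list of audio actions for one received image (S17-S24)."""
    actions = []
    if analysis.kind == "text":                              # S17: YES
        actions.append(("read_aloud", analysis.text))        # S18
        index = emotion_of(analysis.text)                    # S19
        actions.append(("play_bgm", bgm_for(index, previous_scene)))  # S20
        scene = scene_of(analysis.text)
        if scene != previous_scene:                          # S21: YES
            actions.append(("change_bgm", bgm_for(index, scene)))     # S22
    elif analysis.kind == "figure":                          # S23: YES
        actions.append(("play_sound", SOUNDS.get(analysis.label)))    # S24
    return actions                                           # otherwise: end


result = handle_image(
    Analysis("text", text="Hello"),
    previous_scene="a",
    emotion_of=lambda text: ("positive", 1),
    scene_of=lambda text: "b",
    bgm_for=lambda index, scene: f"BGM{index[1]}{scene}",
)
print(result)
# [('read_aloud', 'Hello'), ('play_bgm', 'BGM1a'), ('change_bgm', 'BGM1b')]
```

Passing the analysis steps in as callbacks keeps the control flow testable without real image recognition or audio output.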

(Second embodiment)
In the first embodiment, the text-to-speech system 100 includes the imaging device 50 and the main unit 10. In the second embodiment, the functions of the main unit 10 are incorporated into the imaging device 50 to form a text-to-speech device.

FIG. 7 is a block diagram showing an example of the configuration of a text-to-speech device 200. The external shape of the text-to-speech device 200 is the same as that of the imaging device 50 of the first embodiment, with a grip, a shutter button, and a window. The text-to-speech device 200 includes a control unit 201, an imaging unit 202, a distance sensor 203, a correction unit 204, an analysis unit 205 having a character string analysis unit 206 and a figure analysis unit 207, a speech synthesis unit 208, an order estimation unit 209, a storage unit 210 that stores a BGM list 211 and an audio data list 212, a microphone 213, a speaker 214, and an emotion index calculation unit 215. The functions of these components are the same as those of the corresponding components in the first embodiment, so their description is omitted.

(Third embodiment)
Like the first embodiment, the third embodiment includes the imaging device 50 and a main unit 30. It additionally includes a server 300, into which the main functions of the main unit 10 of the first embodiment are incorporated.

FIG. 8 is a block diagram showing an example of the configuration of the text-to-speech system of the third embodiment. The imaging device 50 has the same configuration as the imaging device 50 of the first embodiment. The main unit 30 includes a control unit 36, a first communication unit 31, a second communication unit 32, a storage unit 33 that stores a BGM list 331, a microphone 34, and a speaker 35.

第１通信部３１は、宅内ネットワーク１を介して、撮像装置５０との間の通信機能を実現する。 The first communication unit 31 realizes communication functions with the imaging device 50 via the home network 1.

第２通信部３２は、インターネットなどの通信ネットワーク２を介して、サーバ３００との間の通信機能を実現する。ＢＧＭリスト３３１、マイク３４、スピーカ３５は第１実施の形態の場合と同様であるので説明は省略する。 The second communication unit 32 realizes communication functions with the server 300 via a communication network 2 such as the Internet. The BGM list 331, microphone 34, and speaker 35 are the same as in the first embodiment, so their description will be omitted.

サーバ３００は、制御部３０１、通信部３０２、文字列解析部３０４及び図解析部３０５を有する解析部３０３、音声合成部３０６、順序推定部３０７、音声データリスト３０９を記憶する記憶部３０８、及び感情指標算出部３１０を備える。 The server 300 includes a control unit 301, a communication unit 302, an analysis unit 303 having a character string analysis unit 304 and a figure analysis unit 305, a voice synthesis unit 306, an order estimation unit 307, a memory unit 308 that stores a voice data list 309, and an emotion index calculation unit 310.

通信部３０２は、通信ネットワーク２を介して、本体装置３０との間の通信機能を実現する。制御部３０１、解析部３０３、文字列解析部３０４、図解析部３０５、音声合成部３０６、順序推定部３０７、音声データリスト３０９、及び感情指標算出部３１０の各機能は第１実施形態の場合と同様であるので説明を省略する。 The communication unit 302 realizes communication functions with the main device 30 via the communication network 2. The functions of the control unit 301, analysis unit 303, character string analysis unit 304, figure analysis unit 305, voice synthesis unit 306, order estimation unit 307, voice data list 309, and emotion index calculation unit 310 are the same as in the first embodiment, so their description is omitted.

図９は第３実施形態の音声読み上げシステムの処理手順の一例を示すフローチャートである。撮像装置５０の処理は第１実施形態の場合と同様であるので省略する。本体装置３０は、撮像装置５０が送信した画像データを受信し（Ｓ３１）、受信した画像データをサーバ３００へ送信する（Ｓ３２）。 Figure 9 is a flowchart showing an example of the processing procedure of the text-to-speech system of the third embodiment. The processing of the imaging device 50 is the same as in the first embodiment, so a description thereof is omitted. The main device 30 receives image data transmitted by the imaging device 50 (S31) and transmits the received image data to the server 300 (S32).

サーバ３００は、画像データを受信し（Ｓ３３）、撮像対象が文字列であるか否かを判定する（Ｓ３４）。撮像対象が文字列である場合（Ｓ３４でＹＥＳ）、サーバ３００は、文字列を音声データに変換し、変換した音声データを本体装置３０へ送信する（Ｓ３５）。本体装置３０は、音声を出力する（Ｓ３６）。これにより文字列を読み上げることができる。 The server 300 receives the image data (S33) and determines whether the imaged object is a character string (S34). If the imaged object is a character string (YES in S34), the server 300 converts the character string into audio data and transmits the converted audio data to the main device 30 (S35). The main device 30 outputs audio (S36). This allows the character string to be read aloud.

サーバ３００は、文字列に基づいて感情指標を算出し（Ｓ３７）、算出した感情指標を本体装置３０へ送信する（Ｓ３８）。本体装置３０は、感情指標を受信し（Ｓ３９）、受信した感情指標に応じたＢＧＭを出力する（Ｓ４０）。 The server 300 calculates an emotion index based on the character string (S37) and transmits the calculated emotion index to the main device 30 (S38). The main device 30 receives the emotion index (S39) and outputs background music according to the received emotion index (S40).

サーバ３００は、文字列に基づいてシーンが変わったか否かを判定し（Ｓ４１）、シーンが変わった場合（Ｓ４１でＹＥＳ）、本体装置３０に対してシーンの変更を通知する（Ｓ４２）。本体装置３０は、ＢＧＭを変更して出力する（Ｓ４３）。シーンが変わっていない場合（Ｓ４１でＮＯ）、サーバ３００は、処理を終了する。 The server 300 determines whether the scene has changed based on the character string (S41), and if the scene has changed (YES in S41), it notifies the main device 30 of the scene change (S42). The main device 30 changes the background music and outputs it (S43). If the scene has not changed (NO in S41), the server 300 ends the process.

ステップＳ３４において、撮像対象が文字列でない場合（Ｓ３４でＮＯ）、サーバ３００は、撮像対象が図であるか否かを判定する（Ｓ４４）。撮像対象が図である場合（Ｓ４４でＹＥＳ）、サーバ３００は、図の内容に関する音データを本体装置３０へ送信し（Ｓ４５）、処理を終了する。本体装置３０は、音データを受信し、受信した音データに基づいて音を出力し（Ｓ４６）、処理を終了する。撮像対象が図でない場合（Ｓ４４でＮＯ）、サーバ３００は、処理を終了する。 In step S34, if the imaged object is not a character string (NO in S34), the server 300 determines whether the imaged object is a figure (S44). If the imaged object is a figure (YES in S44), the server 300 transmits sound data relating to the content of the figure to the main device 30 (S45) and ends the process. The main device 30 receives the sound data, outputs sound based on the received sound data (S46), and ends the process. If the imaged object is not a figure (NO in S44), the server 300 ends the process.
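The server-side branching of steps S34 to S45 can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function `handle_image`, the message names, and the four callables passed in (`synthesize`, `emotion_index`, `scene_of`, `sound_for`) are hypothetical stand-ins for the voice synthesis unit 306, the emotion index calculation unit 310, the scene determination, and the voice data list 309. Treating a missing previous scene as "no change" is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, object]  # (message type, payload) sent to the main device 30


def handle_image(
    kind: str,                     # hypothetical analysis result: "text", "figure", or anything else
    content: str,                  # recognized character string or figure label
    previous_scene: Optional[str],
    synthesize: Callable[[str], bytes],     # stand-in for voice synthesis
    emotion_index: Callable[[str], float],  # stand-in for emotion index calculation
    scene_of: Callable[[str], str],         # stand-in for scene determination
    sound_for: Callable[[str], bytes],      # stand-in for a sound lookup by figure content
) -> List[Message]:
    """Return the messages the server would send for one received image."""
    messages: List[Message] = []
    if kind == "text":                                       # S34: YES
        messages.append(("speech", synthesize(content)))     # S35
        messages.append(("emotion_index", emotion_index(content)))  # S37, S38
        scene = scene_of(content)
        if previous_scene is not None and scene != previous_scene:  # S41: YES
            messages.append(("scene_changed", scene))        # S42
    elif kind == "figure":                                   # S44: YES
        messages.append(("sound", sound_for(content)))       # S45
    # Neither a character string nor a figure: nothing is sent (S44: NO).
    return messages
```

In this sketch the main device 30 would play the `speech` or `sound` payload and select or change the BGM when it receives `emotion_index` or `scene_changed`.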

本実施の形態の音声読み上げシステム及び音声読み上げ装置は、幼児や子供に対しては、家の中でのおうち遊び、保育園や幼稚園等の施設での遊び、一時預かり所（例えば、ショッピングモール、スーパー、美容室など）等での遊びに利用することができる。また、ユーザは幼児や子供に限定されるものではなく、老人、外国人、ハンディを持った人が本実施の形態の音声読み上げシステム及び音声読み上げ装置を利用してもよい。 The text-to-speech system and text-to-speech device of this embodiment can be used by infants and children for play at home, at facilities such as nurseries and kindergartens, and at temporary care facilities (for example, shopping malls, supermarkets, beauty salons, etc.). Furthermore, users are not limited to infants and children; elderly people, foreigners, and people with disabilities may also use the text-to-speech system and text-to-speech device of this embodiment.

本実施の形態の音声読み上げシステムの撮像装置、本体装置及び音声読み上げ装置は、玩具として販売又はレンタルによって利用者に提供することができる。また、サーバの機能をクラウド上で提供し対価としての利用料を得ることもできる。 The imaging device, main unit, and text-to-speech device of the text-to-speech system of this embodiment can be sold or rented to users as toys. It is also possible to provide server functions on the cloud and receive usage fees in return.

（第４実施形態） (Fourth embodiment)
第４実施形態では、前述の音声読み上げシステム１００と同様の機能を用いて、親などの保護者が、幼児などの子供の興味関心（特に最新の興味関心や、成長又は環境変化などによる興味関心の移り変わりなど）を把握することができる情報処理装置システム、情報処理装置、撮像装置などについて説明する。 The fourth embodiment describes an information processing device system, an information processing device, an imaging device, and the like that use functions similar to those of the above-described text-to-speech system 100 so that a parent or other guardian can grasp the interests of a child such as an infant (in particular, the child's latest interests and shifts in interest due to growth or environmental changes).

近年、共働きの家庭が増加し、子供を保育園等に預けることが増えたため、保護者は、子供の日中の行動を観察することが困難になっており、子供が何に興味や関心を持ち始めたか、あるいは持っているかを把握しづらくなっている。保護者は日々少しずつ生じる子供の成長や変化を適切に捉えられていない。 In recent years, the number of dual-income households has increased and children are more often left at daycare centers and similar facilities. As a result, it has become difficult for parents to observe their children's daytime behavior and to grasp what their children have begun to take an interest in, or are currently interested in. Parents are unable to properly capture the growth and changes in their children that occur little by little every day.

また、幼児教育での知的好奇心を伸ばすアプローチでは、多くの子供に適合させるために画一的なアプローチや大雑把なアプローチが多く、個人個人の知的好奇心や興味関心、嗜好の変遷などが考慮されていない。以下の実施形態では、これらの課題を解決すべく、リアルタイムな興味関心・嗜好の状態や、子供の成長や環境変化なども考慮して、個人個人に即した興味関心を伸ばす支援について説明する。 Furthermore, approaches to fostering intellectual curiosity in early childhood education are often one-size-fits-all or broad-based in order to suit many children, and do not take into account the changes in each individual's intellectual curiosity, interests, and preferences. In order to solve these issues, the following embodiment describes support for fostering interests that are tailored to each individual, taking into account the real-time state of interests and preferences, as well as the child's growth and changes in the environment.

図１０は第４実施形態の情報処理システムの構成の一例を示す図である。情報処理システムは、撮像装置５０、端末装置１００、及び情報処理装置としてのサーバ４００を備える。撮像装置５０は、第１実施形態の場合と同様の構成を備えるが、スピーカ５７を撮像装置５０内に組み込んである点で相違する。撮像装置５０は、朗読モード、又は効果音再生モードで撮像対象を撮像することができる。朗読モード、又は効果音再生モードは、スイッチやボタン等によって手動で切り替えてもよく、あるいは、サーバ４００の解析部４０３による解析結果によって自動で切り替えてもよい。撮像装置５０は、幼児や子供が手に持って遊ぶ虫めがねのようなデバイスであり、端末装置１００は、保護者が携帯する端末である。 Figure 10 is a diagram showing an example of the configuration of an information processing system according to the fourth embodiment. The information processing system includes an imaging device 50, a terminal device 100, and a server 400 as an information processing device. The imaging device 50 has the same configuration as in the first embodiment, but differs in that a speaker 57 is built into the imaging device 50. The imaging device 50 can capture an image of an imaging target in a reading mode or a sound effect playback mode. The reading mode or sound effect playback mode may be switched manually using a switch or button, or may be switched automatically based on the analysis results of the analysis unit 403 of the server 400. The imaging device 50 is a magnifying glass-like device that infants and children play with, and the terminal device 100 is a terminal carried by a parent or guardian.

端末装置１００は、第３実施形態の本体装置３０と同様の構成を備えるが、表示部１０４、操作部１０５を備える点で相違する。制御部１０１、第１通信部１０２、第２通信部１０３、及び記憶部１０６は、第３実施形態の本体装置３０の制御部３６、第１通信部３１、第２通信部３２、及び記憶部３３と同様である。端末装置１００は、例えば、スマートフォン、タブレット端末等で構成することができる。表示部１０４は、液晶ディスプレイ又は有機ＥＬ（Electro Luminescence）ディスプレイで構成することができる。操作部１０５は、タッチパネル等で構成され、表示部１０４上で文字の入力操作、表示部１０４に表示されたアイコン、画像又は文字等に対する操作を行うようにしてもよい。 The terminal device 100 has a configuration similar to that of the main device 30 of the third embodiment, but differs in that it includes a display unit 104 and an operation unit 105. The control unit 101, first communication unit 102, second communication unit 103, and memory unit 106 are similar to the control unit 36, first communication unit 31, second communication unit 32, and memory unit 33 of the main device 30 of the third embodiment. The terminal device 100 may be configured as, for example, a smartphone, a tablet terminal, or the like. The display unit 104 may be configured as a liquid crystal display or an organic EL (Electro Luminescence) display. The operation unit 105 may be configured as a touch panel or the like, and may be used to input characters on the display unit 104 and to operate icons, images, or characters displayed on the display unit 104.

サーバ４００は、サーバ４００全体を制御する制御部４０１、通信部４０２、解析部４０３、順序推定部４０６、音声合成部４０７、音再生部４０８、分析部４０９、及び記憶部４１０を備える。解析部４０３は、文字列解析部４０４、及び図解析部４０５を備える。記憶部４１０は、音声データリスト４１１を記憶している。制御部４０１、通信部４０２、解析部４０３、文字列解析部４０４、図解析部４０５、順序推定部４０６、音声合成部４０７、及び記憶部４１０は、第３実施形態のサーバ３００の制御部３０１、通信部３０２、解析部３０３、文字列解析部３０４、図解析部３０５、順序推定部３０７、音声合成部３０６、及び記憶部３０８と同様である。解析部４０３の解析結果が、撮像対象が図であるか文字列であるかに応じて、効果音再生モードか、あるいは朗読モードかを自動で切り替えてもよい。また、音再生部４０８は、制御部１１、３０１などの音再生機能と同様である。分析部４０９の詳細は後述する。 The server 400 includes a control unit 401 that controls the entire server 400, a communication unit 402, an analysis unit 403, an order estimation unit 406, a voice synthesis unit 407, a sound playback unit 408, an analysis unit 409, and a memory unit 410. The analysis unit 403 includes a character string analysis unit 404 and a figure analysis unit 405. The memory unit 410 stores a voice data list 411. The control unit 401, communication unit 402, analysis unit 403, character string analysis unit 404, figure analysis unit 405, order estimation unit 406, voice synthesis unit 407, and memory unit 410 are similar to the control unit 301, communication unit 302, analysis unit 303, character string analysis unit 304, figure analysis unit 305, order estimation unit 307, voice synthesis unit 306, and memory unit 308 of the server 300 of the third embodiment. The mode may be switched automatically between the sound effect playback mode and the reading mode depending on whether the analysis result of the analysis unit 403 indicates that the imaged object is a figure or a character string. The sound playback unit 408 is similar to the sound playback function of the control units 11, 301, and the like. Details of the analysis unit 409 will be described later.

図１１は第４実施形態の情報処理システムの処理手順の一例を示す図である。撮像装置５０は、撮像対象を撮像し（Ｓ１０１）、撮像して得られた画像データを端末装置１００へ送信する（Ｓ１０２）。端末装置１００は、画像データを受信し（Ｓ１０３）、受信した画像データをサーバ４００へ送信する（Ｓ１０４）。 Figure 11 is a diagram showing an example of the processing procedure of the information processing system of the fourth embodiment. The imaging device 50 captures an image of the object to be imaged (S101) and transmits the image data obtained by the image capture to the terminal device 100 (S102). The terminal device 100 receives the image data (S103) and transmits the received image data to the server 400 (S104).

サーバ４００は、画像データを受信し（Ｓ１０５）、受信した画像データに基づいて撮像対象を解析する（Ｓ１０６）。サーバ４００は、撮像対象が図であるか文字列であるかに応じて、図の内容に関連する音又は文字列を読み上げる音声を生成する（Ｓ１０７）。図の内容に関連する音及び文字列を読み上げる音声を纏めて「音声」と称してもよい。 The server 400 receives the image data (S105) and analyzes the imaged object based on the received image data (S106). Depending on whether the imaged object is a figure or a string of characters, the server 400 generates a sound related to the content of the figure or a voice reading out the string of characters (S107). The sound related to the content of the figure and the voice reading out the string of characters may be collectively referred to as "voice."

サーバ４００は、生成した音又は音声を端末装置１００へ送信する（Ｓ１０８）。端末装置１００は、音又は音声を受信し（Ｓ１０９）、受信した音又は音声を撮像装置５０へ送信する（Ｓ１１０）。撮像装置５０は、音又は音声を受信し（Ｓ１１１）、受信した音又は音声を出力する（Ｓ１１２）。 The server 400 transmits the generated sound or audio to the terminal device 100 (S108). The terminal device 100 receives the sound or audio (S109) and transmits the received sound or audio to the imaging device 50 (S110). The imaging device 50 receives the sound or audio (S111) and outputs the received sound or audio (S112).

サーバ４００は、受信した画像データ、解析結果を記憶部４１０に記録し（Ｓ１１３）、音の再生又は音声の読み上げ回数を更新する（Ｓ１１４）。幼児や子供などのユーザが、撮像装置５０を持って撮像対象を撮像する都度、図１１に示す処理が繰り返され、画像データ、解析結果、音の再生又は音声の読み上げ回数などの情報を収集することができる。 The server 400 records the received image data and analysis results in the storage unit 410 (S113) and updates the number of times the sound is played or the voice is read aloud (S114). Each time a user such as an infant or child takes an image of an object with the imaging device 50, the process shown in FIG. 11 is repeated, and information such as the image data, analysis results, and number of times the sound is played or the voice is read aloud can be collected.
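The recording of steps S113 and S114 can be pictured as a small life log that stores one entry per captured image and keeps a per-mode usage counter. The following Python sketch is only an illustration: the class `LifeLog`, its field names, and the mode labels `"reading"` and `"sound_effect"` are hypothetical, and an in-memory list stands in for the memory unit 410.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical mapping from the analysis result to the mode that was used.
MODE_BY_KIND: Dict[str, str] = {"text": "reading", "figure": "sound_effect"}


@dataclass
class LifeLog:
    entries: List[dict] = field(default_factory=list)       # S113: image data and analysis results
    mode_counts: Counter = field(default_factory=Counter)   # S114: usage counts per mode

    def record(self, image_id: str, kind: str, label: str, when: datetime) -> Optional[str]:
        """Store one capture and update the usage count of the mode it triggered."""
        mode = MODE_BY_KIND.get(kind)  # None when the target is neither text nor a figure
        self.entries.append({"image_id": image_id, "kind": kind, "label": label, "when": when})
        if mode is not None:
            self.mode_counts[mode] += 1
        return mode
```

Calling `record` once per capture accumulates the data that the later analysis functions read: the labels feed the category counts, and `mode_counts` feeds the activity type analysis.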

制御部４０１は、収集部としての機能を有し、通信部４０２を介して、撮像対象を撮像して得られた画像データを収集する。分析部４０９は、収集した画像データに基づいて、撮像対象を撮像したユーザ（幼児や子供）の興味関心を分析する。制御部４０１は、提供部としての機能を有し、分析部４０９の分析結果を提供することができる。 The control unit 401 functions as a collection unit and collects image data obtained by capturing images of the imaging target via the communication unit 402. The analysis unit 409 analyzes the interests of the user (infant or child) who captured the image of the imaging target based on the collected image data. The control unit 401 functions as a provision unit and can provide the analysis results of the analysis unit 409.

このように、幼児や子供が興味や関心を持って撮像した撮像対象の画像データをライフログの一つとして収集し、収集したライフログに基づいて、幼児や子供の日々の興味関心を分析し、分析結果を保護者にフィードバックすることにより、保護者は、幼児や子供の興味関心に即したフォローや後押しが可能となる。例えば、保護者は、子供が興味や関心を示す分野に関連するグッズを買い与えることや、子供が興味や関心を示す場所や施設などに連れて行くことができる。 In this way, image data of subjects captured by infants and children that interest them can be collected as part of a life log, and the infant's or child's daily interests can be analyzed based on the collected life log. The analysis results can then be fed back to parents, allowing parents to follow up and support their infants and children in accordance with their interests. For example, parents can buy their children goods related to areas that interest them, or take them to places or facilities that interest them.

次に、分析部４０９による分析処理の詳細について説明する。分析部４０９は、（１）興味関心分析機能、（２）興味関心タイプ分析機能、（３）活動タイプ分析機能、及び（４）好きな色分析機能などの各機能を備える。分析対象となるライフログは、撮像装置５０で撮像されて収集された画像データ、及び朗読モードと効果音再生モードそれぞれの使用回数とすることができる。以下、各分析機能について説明する。 Next, the analysis process performed by the analysis unit 409 will be described in detail. The analysis unit 409 has various functions, such as (1) an interest analysis function, (2) an interest type analysis function, (3) an activity type analysis function, and (4) a favorite color analysis function. The life log to be analyzed can be image data captured and collected by the imaging device 50, and the number of times the reading mode and sound effect playback mode are used. Each analysis function will be described below.

図１２は興味関心分析機能の処理手順の一例を示す図である。以下では、便宜上、処理の主体を制御部４０１として説明する。制御部４０１は、画像データを収集し（Ｓ１２１）、第１所定期間に亘って収集したか否かを判定する（Ｓ１２２）。第１所定期間は、例えば、１週間とすることができるが、これに限定されない。第１所定期間に亘って収集していない場合（Ｓ１２２でＮＯ）、制御部４０１は、ステップＳ１２１の処理を続ける。 Figure 12 is a diagram showing an example of the processing procedure for the interest analysis function. For convenience, the following description will be given with the control unit 401 as the subject of processing. The control unit 401 collects image data (S121) and determines whether the data has been collected for a first predetermined period (S122). The first predetermined period can be, for example, one week, but is not limited to this. If the data has not been collected for the first predetermined period (NO in S122), the control unit 401 continues processing in step S121.

第１所定期間に亘って収集した場合（Ｓ１２２でＹＥＳ）、制御部４０１は、撮像対象を分野別に分類する（Ｓ１２３）。具体的には、分析部４０９が撮像対象を分野別に分類する。分析部４０９は、物体検出のための学習モデルを備えてもよい。学習モデルは、例えば、ＨＯＧ（Histogram of Oriented Gradients）、Ｒ－ＣＮＮ（Region-based CNN）、ＦａｓｔＲ－ＣＮＮ、ＲＰＮ（Region Proposal Network）、ＹＯＬＯ（You Only Look Once）、ＳＳＤ（Single Shot Detector）、Transformerなどを含む。分析部４０９によって検出された物体（オブジェクト）を分野（カテゴリ）毎に分類すればよい。分野としては、例えば、電車、くるま、飛行機、花、動物、食べ物、楽器、魚、昆虫、人形など適宜決定することができる。 If the images have been collected over the first predetermined period (YES in S122), the control unit 401 classifies the imaged objects by category (S123). Specifically, the analysis unit 409 classifies the imaged objects by category. The analysis unit 409 may be equipped with a learning model for object detection. Examples of learning models include HOG (Histogram of Oriented Gradients), R-CNN (Region-based CNN), Fast R-CNN, RPN (Region Proposal Network), YOLO (You Only Look Once), SSD (Single Shot Detector), and Transformer. The objects detected by the analysis unit 409 can be classified by category. Categories can be determined as appropriate, for example, trains, cars, airplanes, flowers, animals, food, musical instruments, fish, insects, dolls, etc.

制御部４０１は、分野毎に撮像対象の合計数を算出する（Ｓ１２４）。例えば、１週間の間に子供が「電車」に分類されるオブジェクトを２０個撮像したとすると、「電車」の件数を２０件とする。制御部４０１は、撮像数の多いものを「興味あり」の分野として登録する（Ｓ１２５）。例えば、分野毎に撮像数を算出し、撮像数の多い順に上位５個の分野を「興味あり」の分野として登録する。なお、「興味あり」の分野の数は５個に限定されない。 The control unit 401 calculates the total number of imaged objects for each category (S124). For example, if a child captures 20 objects classified as "trains" over the course of a week, the count for "trains" is set to 20. The control unit 401 registers the categories with the most captures as "Interested" categories (S125). For example, the control unit 401 calculates the number of captures for each category and registers the top five categories, in descending order of capture count, as "Interested" categories. Note that the number of "Interested" categories is not limited to five.

制御部４０１は、直近の第１所定期間（例えば、先週）における分野毎の撮像数と比較して、今回の第１所定期間（例えば、今週）における分野毎の撮像数が増加傾向にある分野を「急上昇」として登録する（Ｓ１２６）。例えば、先週と今週の分野毎の撮像数の差分を算出し、算出した差分が所定の差分閾値以上である分野を「急上昇」の分野として登録することができる。あるいは、算出した差分が最も大きい分野を「急上昇」の分野として登録してもよい。 The control unit 401 compares the number of captures in each category in the current first predetermined period (e.g., this week) with that in the immediately preceding first predetermined period (e.g., last week), and registers any category whose capture count is trending upward as a "Rapidly Rising" category (S126). For example, the difference between last week's and this week's capture counts can be calculated for each category, and any category whose difference is equal to or greater than a predetermined difference threshold can be registered as a "Rapidly Rising" category. Alternatively, the category with the largest calculated difference may be registered as the "Rapidly Rising" category.

制御部４０１は、分野毎の撮像数のうち、最も出現数の多い分野を「マイブーム」として登録する（Ｓ１２７）。例えば、１週間で撮像されたオブジェクトの数が最も多い分野を「マイブーム」とすることができる。 The control unit 401 registers the category with the highest capture count as "My Craze" (S127). For example, the category with the largest number of objects captured in one week can be designated as "My Craze."

制御部４０１は、分析結果（「興味あり」、「急上昇」、「マイブーム」）を端末装置１００へ送信し、端末装置１００は、分析結果を表示する。これにより、制御部４０１は、分析結果を提供し（Ｓ１２８）、処理を終了する。なお、「興味あり」、「急上昇」、及び「マイブーム」の文言は一例であって、これらの文言に限定されるものではない。 The control unit 401 transmits the analysis results ("Interested," "Rapidly Rising," "My Craze") to the terminal device 100, which then displays the analysis results. The control unit 401 then provides the analysis results (S128) and terminates the process. Note that the words "Interested," "Rapidly Rising," and "My Craze" are merely examples, and the present invention is not limited to these words.
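Steps S124 to S127 reduce to counting category labels and comparing two periods. The Python sketch below is one possible reading, offered only as an illustration: the function name `analyze_interests`, the default threshold value of 5, and the alphabetical tie-breaking are assumptions that do not appear in the embodiment, and the category labels are assumed to have been produced beforehand by the object detection of step S123.

```python
from collections import Counter
from typing import Dict, Iterable


def analyze_interests(
    this_week: Iterable[str],   # one category label per object captured this week
    last_week: Iterable[str],   # the same for the immediately preceding period
    top_n: int = 5,             # number of "Interested" categories (not limited to five)
    diff_threshold: int = 5,    # hypothetical difference threshold for "Rapidly Rising"
) -> Dict[str, object]:
    current = Counter(this_week)    # S124: capture count per category
    previous = Counter(last_week)

    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(current.items(), key=lambda kv: (-kv[1], kv[0]))

    interested = ranked[:top_n]                                      # S125
    rapidly_rising = sorted(
        category for category, count in current.items()
        if count - previous[category] >= diff_threshold             # S126
    )
    my_craze = ranked[0][0] if ranked else None                      # S127

    return {"interested": interested, "rapidly_rising": rapidly_rising, "my_craze": my_craze}
```

With this week's counts of 20, 18, 15, 10, and 9 for trains, flowers, cars, food, and animals, and a last-week count of 2 for food, this sketch would report trains as "My Craze" and food as "Rapidly Rising", in line with the example screen of Figure 13.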

上述のように、分析部４０９は、第１解析部としての機能を有し、収集した画像データに基づいて撮像対象の分野を解析することができる。分析部４０９は、第１所定期間の都度収集した解析結果に基づいて、幼児や子供（ユーザ）の興味関心を分析してもよい。 As described above, the analysis unit 409 functions as a first analysis unit and can analyze the field of the image subject based on the collected image data. The analysis unit 409 may also analyze the interests of infants and children (users) based on the analysis results collected each time a first predetermined period of time is reached.

分析部４０９は、解析した分野毎に撮像された撮像対象の数に基づいて、ユーザの興味関心に関する「興味あり」（第１指標）を分析してもよい。これにより、保護者は、子供の「興味の持ち始め」を見逃すことなく、適切なフォローや後押しを子供に与えることが可能となる。 The analysis unit 409 may analyze "Interested" (a first indicator) relating to the user's interests based on the number of imaged objects captured in each analyzed category. This allows parents to give their child appropriate follow-up and encouragement without missing the moment the child begins to take an interest in something.

分析部４０９は、第１所定期間毎の、分野毎に撮像された撮像対象の数の変化に基づいてユーザの興味関心に関する「急上昇」（第２指標）を分析してもよい。また、分析部４０９は、分野毎に撮像された撮像対象の数のうち、撮像対象の数が最多の分野に基づいてユーザの興味関心に関する「マイブーム」（第３指標）を分析してもよい。 The analysis unit 409 may analyze "Rapidly Rising" (a second indicator) relating to the user's interests based on the change, from one first predetermined period to the next, in the number of imaged objects captured in each category. The analysis unit 409 may also analyze "My Craze" (a third indicator) relating to the user's interests based on the category with the largest number of captured objects.

図１３は興味関心分析機能の分析結果の一例を示す図である。図１３に示す「ＯＯちゃんの興味関心分析結果」画面５０１は、端末装置１００の表示部１０４に表示することができる。「ＯＯちゃんの興味関心分析結果」画面５０１は、例えば、「日付が正しくありません。」などのエラーメッセージを表示するメッセージ表示エリア５０２、今週の１週間に撮影した対象物の分野別の比率を表示する表示エリア５０３、「興味あり」の分野を表示する表示エリア５０５、「マイブーム」の分野を表示する表示エリア５０６、「急上昇」の分野を表示する表示エリア５０７を有する。また、全てのログを見るための「全部のログを見る」アイコン５０４が表示されている。 Figure 13 is a diagram showing an example of the analysis results of the interest analysis function. The "OO-chan's interest analysis results" screen 501 shown in Figure 13 can be displayed on the display unit 104 of the terminal device 100. The screen 501 has, for example, a message display area 502 that displays an error message such as "The date is incorrect," a display area 503 that displays the ratio by category of the objects photographed during this week, a display area 505 that displays the "Interested" categories, a display area 506 that displays the "My Craze" category, and a display area 507 that displays the "Rapidly Rising" category. In addition, a "View all logs" icon 504 for viewing all logs is displayed.

図１３の例では、「興味あり」の分野として、「電車」、「花」、「くるま」、「食べ物」、「動物」が表示され、それぞれの分野における撮影数として、２０件、１８件、１５件、１０件、９件という数値が表示されている。「マイブーム」では、撮影数が最も多い「電車」の分野の中から、例えば、撮影数が最も多いオブジェクト（図１３の例では、「新幹線」の画像）を表示するとともに、「いまのマイブームは新幹線！」の如く文言を表示する。これにより、保護者は、子供のリアルタイムな興味関心、嗜好の状態を容易に把握できる。「急上昇」では、『「食べ物」に最近興味がでてきたようです』の如く文言を表示する。これにより、保護者は、興味関心の変化や、嗜好の変化を適切に捉えることができ、子供との日々のコミュニケーションや生活（購買活動など）に役立てることができる。 In the example of Figure 13, "Trains," "Flowers," "Cars," "Food," and "Animals" are displayed as the "Interested" categories, and the number of photos taken in each category is displayed as 20, 18, 15, 10, and 9, respectively. In "My Craze," for example, the object with the most photos within the "Trains" category, which has the most photos overall (in the example of Figure 13, an image of a "Bullet Train"), is displayed together with a message such as "My current craze is the Bullet Train!" This allows parents to easily grasp the real-time state of their child's interests and preferences. In "Rapidly Rising," a message such as "It seems they have recently become interested in 'Food'" is displayed. This allows parents to properly capture changes in interests and preferences, which can be useful in daily communication with their children and in daily life (such as purchasing activities).

図１４は興味関心タイプ分析機能の処理手順の一例を示す図である。制御部４０１は、直近の第１所定期間（例えば、先週）における「興味あり」のランキングと、今回の第１所定期間（例えば、今週）における「興味あり」のランキングとを比較し（Ｓ１３１）、上位のランキングに変化があるか否かを判定する（Ｓ１３２）。例えば、図１３に例示したように、「興味あり」のランキングを１位から５位まで表示している場合、上位のランキングは、１位及び２位のランキングとすることができるが、これに限定されるものではない。先週の上位２位までのランキングを、例えば、１位が「動物」、２位が「花」とし、今週の上位２位までのランキングを、例えば、１位が「電車」、２位が「花」とすると、先週から今週にかけて、ランキング１位が「動物」から「電車」に変化しているので、この場合、上位ランキングに変化ありと判定できる。 Figure 14 is a diagram showing an example of the processing procedure for the interest type analysis function. The control unit 401 compares the "interested" rankings for the most recent first predetermined period (e.g., last week) with the "interested" rankings for the current first predetermined period (e.g., this week) (S131) and determines whether there has been a change in the top rankings (S132). For example, as shown in Figure 13, if the "interested" rankings are displayed from 1st to 5th, the top rankings can be, but are not limited to, the first and second rankings. If the top two rankings last week were, for example, "animals" and "flowers," and the top two rankings this week are, for example, "trains" and "flowers," then the top ranking has changed from "animals" to "trains" from last week to this week, and in this case, it can be determined that there has been a change in the top rankings.

上位のランキングに変化がある場合（Ｓ１３２でＹＥＳ）、制御部４０１は、ユーザの興味関心タイプを「好奇心旺盛」タイプに分類し（Ｓ１３３）、分析結果（興味関心タイプ）を端末装置１００に提供し（Ｓ１３４）、処理を終了する。 If there is a change in the top rankings (YES in S132), the control unit 401 classifies the user's interest type as "curious" (S133), provides the analysis results (interest type) to the terminal device 100 (S134), and ends the process.

上位のランキングに変化がない場合（Ｓ１３２でＮＯ）、制御部４０１は、ランキングに変化がないか否かを判定する（Ｓ１３５）。ランキングに変化がない場合（Ｓ１３５でＹＥＳ）、すなわち、先週と今週とで１位から５位までのランキングに変化がない場合、制御部４０１は、ユーザの興味関心タイプを「熟考型博士」タイプに分類し（Ｓ１３６）、ステップＳ１３４の処理を行う。 If there is no change in the top rankings (NO in S132), the control unit 401 determines whether the rankings as a whole are unchanged (S135). If the rankings are unchanged (YES in S135), that is, if the rankings from 1st to 5th place are the same last week and this week, the control unit 401 classifies the user's interest type as the "contemplative doctor" type (S136) and performs the processing of step S134.

ランキングに変化がある場合（Ｓ１３５でＮＯ）、すなわち、上位を除く下位のランキング（例えば、３位から５位までのランキング）に変化がある場合、制御部４０１は、ユーザの興味関心タイプを「中間」タイプに分類し（Ｓ１３７）、ステップＳ１３４の処理を行う。また、ランキングの変化は、上位下位の入れ替わりだけでなく、ランキング全体の変化で判定してもよい。例えば、検出数の多い順の週次ランキングを最上位から所定数（所定数は可変）の順位までのランキング（例えば、ＴＯＰ２０位まで等）のうち、何割が入れ替わったかに応じて判定してもよい。例えば、ランキングが入れ替わったものがＮ割以上の場合には「好奇心旺盛」と判定し、ランキングが入れ替わらなかったものがＮ割以下の場合には「熟考型博士」と判定し、これら以外の場合には「中間」と判定してもよい。Ｎの数値は適宜設定可能である。なお、「好奇心旺盛」、「熟考型博士」、及び「中間」の文言は一例であって、これらの文言に限定されるものではない。 If there is a change in the rankings (NO in S135), that is, if there is a change in the lower rankings excluding the top rankings (e.g., the rankings from 3rd to 5th place), the control unit 401 classifies the user's interest type as the "intermediate" type (S137) and performs the processing of step S134. A change in the rankings may also be determined from the change in the rankings as a whole, not only from swaps among the top or lower positions. For example, for a weekly ranking in descending order of detection count, the determination may be based on what proportion of the positions from the top down to a predetermined rank (the predetermined number is variable; e.g., the top 20) have changed. For example, the user may be determined to be "curious" if N tenths (N × 10%) or more of the positions have changed, "contemplative doctor" if N tenths or less of the positions have not changed, and "intermediate" in all other cases. The value of N can be set as appropriate. Note that the terms "curious," "contemplative doctor," and "intermediate" are merely examples, and the present invention is not limited to these terms.
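The three-way branch of Figure 14 (S132, S135) can be sketched in a few lines of Python. This is an illustrative reading only: the function name `classify_interest_type` and the parameter `top_k` are hypothetical, the rankings are assumed to be given as ordered lists of category names, and the alternative determination based on the proportion of changed positions is not covered here.

```python
from typing import Sequence


def classify_interest_type(
    last_week: Sequence[str],   # ranking of "Interested" categories, 1st place first
    this_week: Sequence[str],
    top_k: int = 2,             # how many places count as the "top rankings" (1st and 2nd in the example)
) -> str:
    # S132: has any of the top-k places changed?
    if list(last_week[:top_k]) != list(this_week[:top_k]):
        return "curious"                    # S133
    # S135: is the whole ranking unchanged?
    if list(last_week) == list(this_week):
        return "contemplative doctor"       # S136
    # Top places unchanged, lower places changed.
    return "intermediate"                   # S137
```

For the example in the text, where 1st place moves from "animals" to "trains" while 2nd place stays "flowers", the sketch returns "curious".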

上述のように、分析部４０９は、特定部としての機能を有し、第１所定期間毎に分析した「興味あり」（第１指標）の変化を特定し、特定した「興味あり」の変化に応じて、ユーザの興味関心に関するタイプを分析してもよい。 As described above, the analysis unit 409 functions as an identification unit, and may identify changes in "interested" (first indicator) analyzed for each first predetermined period, and analyze the user's type of interest based on the identified change in "interested."

図１５は興味関心タイプ分析機能の分析結果の一例を示す図である。図１５に示す「ＯＯちゃんの興味関心分析結果」画面５１１は、端末装置１００の表示部１０４に表示することができる。「ＯＯちゃんの興味関心分析結果」画面５１１は、例えば、エラーメッセージを表示するメッセージ表示エリア５０２、興味関心タイプを表示する表示エリア５１２、今週のログを表示する表示エリア５１４を有する。 Figure 15 is a diagram showing an example of the analysis results of the interest type analysis function. The "OO-chan's interest analysis results" screen 511 shown in Figure 15 can be displayed on the display unit 104 of the terminal device 100. The "OO-chan's interest analysis results" screen 511 has, for example, a message display area 502 that displays error messages, a display area 512 that displays interest types, and a display area 514 that displays this week's log.

図１５の例では、興味関心タイプとして、『ＯＯちゃんは「好奇心旺盛」タイプいろいろなことに興味があります』の如く文言を表示されている。これにより、保護者は、子供の興味関心タイプを把握することができ、子供の興味関心タイプに合わせたフォローや後押しを行うことができる。「写真を選択」アイコン５１３を操作することにより、保護者は、端末装置１００に記録した子供の写真や、他のスマートフォンやＰＣからアップロードした子供の写真の中から、所望の写真を表示することができる。 In the example of Figure 15, the interest type is displayed as "OO-chan is a 'curious' type and is interested in a variety of things." This allows parents to understand their child's interest type and provide follow-up and support that matches their child's interest type. By operating the "Select Photo" icon 513, parents can display a desired photo from among photos of their child stored on the terminal device 100 or photos of their child uploaded from another smartphone or PC.

今週のログとして、「興味あり」分野のランキング、今週撮影した対象物の分野別の比率、「急上昇」などを表示することができる。「詳しく見る」アイコン５１５を操作することにより、「興味あり」分野のランキングをさらに詳しく表示させることができる。また、「先週のログを見る」アイコン５１６を操作することにより、今週のログに代えて、あるいは今週のログとともに、先週のログを表示させることができる。 This week's log can display the ranking of "Interested" categories, the ratio by category of the objects photographed this week, the "Rapidly Rising" category, and the like. By operating the "View details" icon 515, the ranking of "Interested" categories can be displayed in more detail. Additionally, by operating the "View last week's log" icon 516, last week's log can be displayed instead of, or together with, this week's log.

これにより、保護者は、子供の興味関心タイプを適切に捉えることができ、子供との日々のコミュニケーションや生活（購買活動など）に役立てることができる。 This allows parents to accurately grasp their child's interests and can use this information in their daily communication with their children and in their daily lives (such as purchasing activities).

図１６は活動タイプ分析機能の処理手順の一例を示す図である。制御部４０１は、効果音再生機能及び朗読機能の使用回数を記録し（Ｓ１４１）、第３所定期間に亘って記録したか否かを判定する（Ｓ１４２）。効果音再生機能及び朗読機能の使用回数は、効果音再生モード及び朗読モードでの使用回数である。効果音再生モード及び朗読モードは手動又は自動で設定することができる。第３所定期間は、例えば、１か月とすることができるが、これに限定されるものではない。 Figure 16 shows an example of the processing procedure for the activity type analysis function. The control unit 401 records the number of times the sound effects playback function and the reading function are used (S141) and determines whether the recording has been completed over a third predetermined period (S142). The number of times the sound effects playback function and the reading function are used is the number of times they are used in sound effects playback mode and reading mode. The sound effects playback mode and reading mode can be set manually or automatically. The third predetermined period can be, for example, one month, but is not limited to this.

第３所定期間に亘って記録していない場合（Ｓ１４２でＮＯ）、制御部４０１は、ステップＳ１４１の処理を続ける。第３所定期間に亘って記録した場合（Ｓ１４２でＹＥＳ）、制御部４０１は、効果音再生機能の使用回数の割合が全体のＭ割以上であるか否かを判定する（Ｓ１４３）。全体は、効果音再生機能の使用回数と朗読機能の使用回数との合計数である。また、Ｍの数値は可変であり、適宜変更することができる。効果音再生機能の使用回数の割合が全体のＭ割以上である場合（Ｓ１４３でＹＥＳ）、制御部４０１は、ユーザの活動タイプを「探検家タイプ」に分類し（Ｓ１４４）、分析結果（活動タイプ）を端末装置１００に提供し（Ｓ１４５）、処理を終了する。 If recording has not been performed over the third predetermined period (NO in S142), the control unit 401 continues the processing of step S141. If recording has been performed over the third predetermined period (YES in S142), the control unit 401 determines whether the number of times the sound effect playback function has been used accounts for M tenths (M × 10%) or more of the total (S143). The total is the sum of the number of times the sound effect playback function has been used and the number of times the reading function has been used. The value of M is variable and can be changed as appropriate. If the sound effect playback function accounts for M tenths or more of the total (YES in S143), the control unit 401 classifies the user's activity type as the "explorer type" (S144), provides the analysis result (activity type) to the terminal device 100 (S145), and ends the process.

効果音再生機能の使用回数の割合が全体のＭ割以上でない場合（Ｓ１４３でＮＯ）、制御部４０１は、朗読機能の使用回数の割合が全体のＭ割以上であるか否かを判定する（Ｓ１４６）。朗読機能の使用回数の割合が全体のＭ割以上である場合（Ｓ１４６でＹＥＳ）、制御部４０１は、ユーザの活動タイプを「読書家タイプ」に分類し（Ｓ１４７）、ステップＳ１４５の処理を行う。朗読機能の使用回数の割合が全体のＭ割以上でない場合（Ｓ１４６でＮＯ）、制御部４０１は、ユーザの活動タイプを「興味津々タイプ」に分類し（Ｓ１４８）、ステップＳ１４５の処理を行う。 If the share of uses of the sound effects playback function is not equal to or greater than M tenths of the total (NO in S143), the control unit 401 determines whether the share of uses of the reading function is equal to or greater than M tenths of the total (S146). If the share of uses of the reading function is equal to or greater than M tenths of the total (YES in S146), the control unit 401 classifies the user's activity type as the "bookworm type" (S147) and performs the processing of step S145. If the share of uses of the reading function is not equal to or greater than M tenths of the total (NO in S146), the control unit 401 classifies the user's activity type as the "very interested type" (S148) and performs the processing of step S145.
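The branching in steps S143 to S148 can be sketched as a small function. This is only an illustrative sketch, not the implementation in the source: the function name, the English type labels, the default threshold of 7 tenths, and the handling of a period with no recorded use are all assumptions. The source states only that M is expressed in tenths (割) and can be changed as appropriate.

```python
def classify_activity_type(sound_effect_uses: int, reading_uses: int, m: int = 7) -> str:
    """Classify a user's activity type from one period of usage counts (S143-S148).

    m is the threshold in tenths (the source's "M割"). The default of 7 is an
    assumption; the source only says that M can be changed as appropriate.
    """
    total = sound_effect_uses + reading_uses
    if total == 0:
        # Not covered by the source: with no recorded use, fall back to the
        # neutral type instead of dividing by zero.
        return "very interested"
    # Compare in integers (uses * 10 >= m * total) to avoid float rounding.
    if sound_effect_uses * 10 >= m * total:   # S143 YES -> S144
        return "explorer"
    if reading_uses * 10 >= m * total:        # S146 YES -> S147
        return "bookworm"
    return "very interested"                  # S146 NO -> S148


if __name__ == "__main__":
    print(classify_activity_type(8, 2))   # sound effects dominate
    print(classify_activity_type(2, 8))   # reading dominates
    print(classify_activity_type(5, 5))   # neither reaches the threshold
```

Because S143 is evaluated before S146, a threshold of 5 tenths or lower can be met by both functions at once, in which case this sketch returns the explorer type.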

上述のように、解析部４０３は、第３解析部としての機能を有し、収集した画像データに基づいて撮像対象が文字列であるか図であるかを解析することができる。音声合成部４０７は、朗読部としての機能を有し、撮像対象が文字列であると解析された場合、文字列を読み上げることができる。音再生部４０８は、再生部としての機能を有し、撮像対象が図であると解析した場合、当該図の内容に関連する音を再生することができる。分析部４０９は、第３所定期間（例えば、１か月など）に亘る、音声合成部４０７で読み上げた回数、及び音再生部４０８で再生した回数に基づいて、ユーザの活動タイプを分析してもよい。 As described above, the analysis unit 403 functions as a third analysis unit and can analyze, based on the collected image data, whether the imaged object is a character string or a figure. The voice synthesis unit 407 functions as a reading unit and can read the character string aloud when the imaged object is analyzed to be a character string. The sound playback unit 408 functions as a playback unit and can play a sound related to the content of the figure when the imaged object is analyzed to be a figure. The analysis unit 409 may analyze the user's activity type based on the number of times the voice synthesis unit 407 has read text aloud and the number of times the sound playback unit 408 has played a sound over the third predetermined period (e.g., one month).

図１７は活動タイプ分析機能の分析結果の一例を示す図である。図１７に示す「ＯＯちゃんの興味関心分析結果」画面５２１は、端末装置１００の表示部１０４に表示することができる。「ＯＯちゃんの興味関心分析結果」画面５２１は、例えば、エラーメッセージを表示するメッセージ表示エリア５０２、活動タイプを表示する表示エリア５２２、今週のログを表示する表示エリア５１４を有する。 Figure 17 is a diagram showing an example of the analysis results of the activity type analysis function. The "OO-chan's Interests Analysis Results" screen 521 shown in Figure 17 can be displayed on the display unit 104 of the terminal device 100. The "OO-chan's Interests Analysis Results" screen 521 has, for example, a message display area 502 that displays error messages, a display area 522 that displays activity types, and a display area 514 that displays this week's log.

図１７の例では、活動タイプとして、『ＯＯちゃんは「探検家」タイプいろいろなものを探して遊ぶのが好きです』の如く文言を表示されている。これにより、保護者は、子供の活動タイプを把握することができ、子供の活動タイプに合わせたフォローや後押しを行うことができる。「写真を選択」アイコン５１３を操作することにより、保護者は、端末装置１００に記録した子供の写真や、他のスマートフォンやＰＣからアップロードした子供の写真の中から、所望の写真を表示することができる。 In the example of Figure 17, the activity type is displayed as "OO-chan is an 'explorer' type who likes to play by exploring various things." This allows parents to understand their child's activity type and provide follow-up and support that matches their child's activity type. By operating the "Select Photo" icon 513, parents can display a desired photo from among photos of their child recorded on the terminal device 100 or photos of their child uploaded from another smartphone or PC.

今週のログは、図１５の場合と同様であるので、説明は省略する。このように、保護者は、子供の活動タイプを適切に捉えることができ、子供との日々のコミュニケーションや生活（購買活動など）に役立てることができる。 This week's log is the same as in Figure 15, so a detailed explanation will be omitted. In this way, parents can accurately grasp their child's activity type, which can be useful in daily communication with their child and in daily life (such as purchasing activities).

図１８は好きな色分析機能の処理手順の一例を示す図である。制御部４０１は、画像データを収集し（Ｓ１５１）、第２所定期間に亘って収集したか否かを判定する（Ｓ１５２）。第２所定期間は、例えば、１週間とすることができるが、これに限定されない。第２所定期間に亘って収集していない場合（Ｓ１５２でＮＯ）、制御部４０１は、ステップＳ１５１の処理を続ける。 Figure 18 shows an example of the processing procedure for the favorite color analysis function. The control unit 401 collects image data (S151) and determines whether the data has been collected for a second predetermined period (S152). The second predetermined period can be, for example, one week, but is not limited to this. If the data has not been collected for the second predetermined period (NO in S152), the control unit 401 continues processing in step S151.

第２所定期間に亘って収集した場合（Ｓ１５２でＹＥＳ）、制御部４０１は、撮像対象に含まれる色を分類する（Ｓ１５３）。具体的には、分析部４０９は、色分析機能を有する学習モデルを備え、画像内の領域の色や画像内で使われている色を検出することができる。分析部４０９は、頻出色別に色を分類することができる。 If the data has been collected over a second predetermined period (YES in S152), the control unit 401 classifies the colors contained in the imaged subject (S153). Specifically, the analysis unit 409 is equipped with a learning model with a color analysis function, and can detect the colors of areas within the image and the colors used within the image. The analysis unit 409 can classify colors by frequently occurring colors.

制御部４０１は、色毎に撮像対象の合計数を算出する（Ｓ１５４）。すなわち、分類した色ごとに検出されたオブジェクトの合計数を算出すればよい。例えば、１週間の間に子供が黄色に分類されるオブジェクトを２０個撮像したとすると、「黄色」の件数を２０件とする。制御部４０１は、撮像数の多いものを「好きな色」として登録する（Ｓ１５５）。 The control unit 401 calculates the total number of objects captured for each color (S154). That is, the total number of objects detected for each classified color can be calculated. For example, if a child captures 20 images of objects classified as yellow over the course of a week, the number of "yellow" objects is set to 20. The control unit 401 registers the color captured the most as the "favorite color" (S155).

制御部４０１は、直近の第２所定期間（例えば、先週）における色毎の撮像数と比較して、今回の第２所定期間（例えば、今週）における色毎の撮像数が増加傾向にある色を「急上昇」として登録する（Ｓ１５６）。例えば、先週と今週の色毎の撮像数の差分を算出し、算出した差分が所定の差分閾値以上である色を「急上昇」の色として登録することができる。あるいは、算出した差分が最も大きい色を「急上昇」の色として登録してもよい。 The control unit 401 compares the per-color capture counts for the current second predetermined period (e.g., this week) with those for the immediately preceding second predetermined period (e.g., last week), and registers any color whose count is trending upward as a "rapidly rising" color (S156). For example, the difference between last week's and this week's capture count can be calculated for each color, and colors whose difference is equal to or greater than a predetermined difference threshold can be registered as "rapidly rising" colors. Alternatively, the color with the largest calculated difference may be registered as the "rapidly rising" color.

制御部４０１は、分析結果（「好きな色」、「急上昇」）を端末装置１００に提供し（Ｓ１５７）、処理を終了する。なお、「好きな色」、及び「急上昇」の文言は一例であって、これらの文言に限定されるものではない。 The control unit 401 provides the analysis results ("Favorite Color", "Rapidly Rising") to the terminal device 100 (S157), and ends the process. Note that the words "Favorite Color" and "Rapidly Rising" are merely examples and are not limited to these words.
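Steps S154 to S156 amount to counting captured objects per color and comparing two consecutive periods. The sketch below is a minimal illustration under stated assumptions: the function name, the input format (one color label per captured object, as would come out of the color classification in S153), the returned dictionary keys, and the default difference threshold of 3 are all hypothetical and do not appear in the source.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def analyze_favorite_colors(
    this_period: Iterable[str],
    previous_period: Iterable[str],
    diff_threshold: int = 3,  # assumed value; the source only says "predetermined"
) -> Dict[str, object]:
    """Count objects per color and derive "favorite" and "rapidly rising" colors."""
    current = Counter(this_period)        # S154: total per color, this period
    previous = Counter(previous_period)   # same totals for the preceding period

    # Colors ordered from most to least captured.
    ranking: List[str] = [color for color, _ in current.most_common()]
    favorite: Optional[str] = ranking[0] if ranking else None   # S155

    # S156: colors whose count grew by at least the threshold.
    # Counter returns 0 for colors that were absent in the previous period.
    rising = [c for c in ranking if current[c] - previous[c] >= diff_threshold]

    return {"counts": dict(current), "ranking": ranking,
            "favorite": favorite, "rising": rising}


if __name__ == "__main__":
    # Made-up sample data, one label per captured object.
    this_week = ["yellow"] * 20 + ["red"] * 18 + ["green"] * 7
    last_week = ["yellow"] * 19 + ["red"] * 18 + ["green"] * 2
    print(analyze_favorite_colors(this_week, last_week))
```

The source also allows registering only the color with the largest difference as "rapidly rising"; that variant would replace the threshold filter with a single maximum over the per-color differences.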

上述のように、分析部４０９は、第２解析部としての機能を有し、収集した画像データに基づいて撮像対象に含まれる色を解析する。分析部４０９は、第２所定期間の都度収集した解析結果に基づいて、ユーザの色に関する興味関心を分析してもよい。 As described above, the analysis unit 409 functions as a second analysis unit and analyzes the colors contained in the image capture subject based on the collected image data. The analysis unit 409 may analyze the user's color interests based on the analysis results collected each time a second predetermined period of time is reached.

図１９は好きな色分析機能の分析結果の一例を示す図である。図１９に示す「ＯＯちゃんの色の好み分析結果」画面５３１は、端末装置１００の表示部１０４に表示することができる。「ＯＯちゃんの色の好み分析結果」画面５３１は、例えば、エラーメッセージを表示するメッセージ表示エリア５０２、好きな色の画像を表示する表示エリア５３２、好きな色を表示する表示エリア５３３、「急上昇」の色を表示する表示エリア５３４を有する。 Figure 19 is a diagram showing an example of the analysis results of the favorite color analysis function. The "OO-chan's Color Preference Analysis Results" screen 531 shown in Figure 19 can be displayed on the display unit 104 of the terminal device 100. The "OO-chan's Color Preference Analysis Results" screen 531 has, for example, a message display area 502 that displays error messages, a display area 532 that displays images in the favorite color, a display area 533 that displays the favorite colors, and a display area 534 that displays the "rapidly rising" color.

表示エリア５３２には、『ＯＯちゃんは特に「黄色」が好きみたいです！』の如く文言を、好きな色（黄色）で描かれた画像１～４が表示されている。画像１～４は、一定時間経過の都度、ライフログの中から別の画像に切り替えて表示してもよい。好きな色の順位（図１９の例では、１位が「きいろ」、２位が「あか」、３位が「あお」）が表示されている。 Display area 532 displays images 1-4, each drawn in the user's favorite color (yellow), with the phrase "OO-chan seems to particularly like yellow!". Images 1-4 may be switched to a different image from the life log after a certain period of time has passed. The ranking of favorite colors (in the example of Figure 19, first place is "yellow," second place is "red," and third place is "blue") is also displayed.

表示エリア５３３には、好きな色の順に、色ごとに検出されたオブジェクトの数が表示されている。図１９の例では、「きいろ」が２０件、「あか」が１８件、「あお」が１５件、「オレンジ」が１０件、「緑」が７件という数値が表示されている。「緑」は、先週の分析結果と今週の分析結果と比較して、今週の撮影数が増加傾向にある色であり、「急上昇！」の文言が付与されている。 Display area 533 displays the number of objects detected for each color, in order of favorite color. In the example of Figure 19, the following numbers are displayed: "yellow" 20, "red" 18, "blue" 15, "orange" 10, and "green" 7. "Green" is a color that has shown an increasing number of photographs this week when comparing last week's analysis results with this week's analysis results, and is labeled "Rapidly Rising!"

表示エリア５３４には、『「緑」に最近興味がでてきたようです』の如く文言が表示され、緑で描かれた画像１～２が表示されている。これにより、保護者は、リアルタイムに子供が興味を持ち始めた色、好きな色を容易に把握できる。また、保護者は、興味のある色の変化を適切に捉えることができ、子供との日々のコミュニケーションや生活（購買活動など）に役立てることができる。前述の例では、図１５に示した興味関心タイプ、図１７に示した活動タイプ、及び図１９に示した好みの色分析を、便宜上それぞれ個別の図を用いて説明したが、これらは一例であって、興味関心タイプ、活動タイプ、及び好みの色分析は、同一画面上で同時に表示することができる。 Display area 534 displays a message such as "It seems he has recently become interested in 'green,'" along with images 1 and 2 drawn in green. This allows parents to easily understand in real time which colors their child has become interested in and which colors they like. Parents can also accurately grasp changes in colors of interest, which can be useful in daily communication with their children and in daily life (such as purchasing activities). In the above example, the interest types shown in Figure 15, the activity types shown in Figure 17, and the preferred color analysis shown in Figure 19 were explained using separate figures for convenience, but these are just examples, and the interest types, activity types, and preferred color analysis can be displayed simultaneously on the same screen.

（第５実施形態） Fifth Embodiment
第４実施形態では、サーバ４００に分析機能を設ける構成であったが、これに限定されるものではない。第５実施形態では、端末装置に分析機能を設ける構成について説明する。 In the fourth embodiment, the analysis function is provided in the server 400, but the present invention is not limited to this. In the fifth embodiment, a configuration in which the analysis function is provided in a terminal device will be described.

図２０は第５実施形態の情報処理システムの構成の一例を示す図である。情報処理システムは、撮像装置５０、及び情報処理装置としての端末装置１５０を備える。撮像装置５０は、第４実施形態の場合と同様である。端末装置１５０は、第４実施形態の場合と比較して、解析部１５５、文字列解析部１５６、図解析部１５７、順序推定部１５８、音声合成部１５９、音再生部１６０、分析部１６１、音声データリスト１６３、コンピュータプログラム１６４を備える点で相違する。解析部１５５、文字列解析部１５６、図解析部１５７、順序推定部１５８、音声合成部１５９、音再生部１６０、分析部１６１、音声データリスト１６３は、それぞれ第４実施形態のサーバ４００が具備している解析部４０３、文字列解析部４０４、図解析部４０５、順序推定部４０６、音声合成部４０７、音再生部４０８、分析部４０９、音声データリスト４１１と同様である。コンピュータプログラム１６４は、制御部１５１によって実行されることにより、解析部１５５、文字列解析部１５６、図解析部１５７、順序推定部１５８、音声合成部１５９、音再生部１６０、及び分析部１６１の全部又は一部の機能を実現することができる。 Figure 20 is a diagram showing an example of the configuration of the information processing system of the fifth embodiment. The information processing system includes an imaging device 50 and a terminal device 150 serving as an information processing device. The imaging device 50 is the same as in the fourth embodiment. The terminal device 150 differs from that of the fourth embodiment in that it includes an analysis unit 155, a character string analysis unit 156, a figure analysis unit 157, an order estimation unit 158, a voice synthesis unit 159, a sound playback unit 160, an analysis unit 161, an audio data list 163, and a computer program 164. The analysis unit 155, character string analysis unit 156, figure analysis unit 157, order estimation unit 158, voice synthesis unit 159, sound playback unit 160, analysis unit 161, and audio data list 163 are the same as the analysis unit 403, character string analysis unit 404, figure analysis unit 405, order estimation unit 406, voice synthesis unit 407, sound playback unit 408, analysis unit 409, and audio data list 411, respectively, provided in the server 400 of the fourth embodiment. When executed by the control unit 151, the computer program 164 can realize all or part of the functions of the analysis unit 155, character string analysis unit 156, figure analysis unit 157, order estimation unit 158, voice synthesis unit 159, sound playback unit 160, and analysis unit 161.

図２１は第５実施形態の情報処理システムの処理手順の一例を示す図である。撮像装置５０は、撮像対象を撮像し（Ｓ１６１）、撮像して得られた画像データを端末装置１５０へ送信する（Ｓ１６２）。端末装置１５０は、画像データを受信し（Ｓ１６３）、受信した画像データに基づいて撮像対象を解析する（Ｓ１６４）。端末装置１５０は、撮像対象が図であるか文字列であるかに応じて、図の内容に関連する音又は文字列を読み上げる音声を生成する（Ｓ１６５）。 Figure 21 is a diagram showing an example of the processing procedure of the information processing system of the fifth embodiment. The imaging device 50 captures an image of the imaging target (S161) and transmits the image data obtained by capturing the image to the terminal device 150 (S162). The terminal device 150 receives the image data (S163) and analyzes the imaging target based on the received image data (S164). The terminal device 150 generates a sound related to the content of the image or a voice that reads out the text, depending on whether the imaging target is a picture or a string of characters (S165).

端末装置１５０は、生成した音又は音声を撮像装置５０へ送信する（Ｓ１６６）。撮像装置５０は、音又は音声を受信し（Ｓ１６７）、受信した音又は音声を出力する（Ｓ１６８）。端末装置１５０は、受信した画像データ、解析結果を記憶部１６２に記録し（Ｓ１６９）、音の再生又は音声の読み上げ回数を更新する（Ｓ１７０）。幼児や子供などのユーザが、撮像装置５０を持って撮像対象を撮像する都度、図２１に示す処理が繰り返され、画像データ、解析結果、音の再生又は音声の読み上げ回数などの情報を収集することができる。なお、第５実施形態においても、図１２～図１９の場合と同様の処理が行われるので説明は省略する。 The terminal device 150 transmits the generated sound or voice to the imaging device 50 (S166). The imaging device 50 receives the sound or voice (S167) and outputs the received sound or voice (S168). The terminal device 150 records the received image data and analysis results in the storage unit 162 (S169) and updates the number of times the sound is played or the voice is read aloud (S170). Each time a user, such as an infant or child, captures an image of an object with the imaging device 50, the process shown in FIG. 21 is repeated, and information such as the image data, analysis results, and number of times the sound is played or the voice is read aloud can be collected. Note that the same processes as those in FIGS. 12 to 19 are performed in the fifth embodiment, so a description thereof will be omitted.
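The terminal-side part of this flow (S163 to S170) can be pictured as a single handler that analyzes each received image, chooses between reading aloud and sound playback, and keeps the log and counters that the later analyses rely on. The sketch below is a rough illustration only: the class name, the injected `analyze` callable, and the string stand-ins for generated audio are hypothetical, and real image analysis, voice synthesis, and communication with the imaging device are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TerminalSketch:
    """Hypothetical stand-in for the terminal device 150 (not the source's design)."""

    # analyze(image) -> (kind, content), where kind is "text" or "figure".
    analyze: Callable[[bytes], Tuple[str, str]]
    log: List[Tuple[bytes, str, str]] = field(default_factory=list)
    read_count: int = 0   # times a character string was read aloud
    play_count: int = 0   # times a figure-related sound was played

    def handle_image(self, image: bytes) -> str:
        kind, content = self.analyze(image)            # S164: analyze the target
        if kind == "text":
            output = f"speech:{content}"               # S165: placeholder for synthesized voice
            self.read_count += 1                       # S170: update reading count
        else:
            output = f"sound:{content}"                # S165: placeholder for a related sound
            self.play_count += 1                       # S170: update playback count
        self.log.append((image, kind, content))        # S169: record image and result
        return output                                  # S166: value sent back to the imaging device


if __name__ == "__main__":
    def fake_analyze(image: bytes) -> Tuple[str, str]:
        # Toy analyzer used only for this demonstration.
        return ("text", "hello") if image == b"page" else ("figure", "lion")

    terminal = TerminalSketch(analyze=fake_analyze)
    print(terminal.handle_image(b"page"))
    print(terminal.handle_image(b"picture"))
    print(terminal.read_count, terminal.play_count)
```

The two counters are the inputs that an activity type analysis such as the one in Figure 16 would consume at the end of each period.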

制御部１５１は、収集部としての機能を有し、通信部１５２を介して、撮像対象を撮像して得られた画像データを収集する。分析部１６１は、収集した画像データに基づいて、撮像対象を撮像したユーザ（幼児や子供）の興味関心を分析する。制御部１５１は、提供部としての機能を有し、分析部１６１の分析結果を提供することができる。 The control unit 151 functions as a collection unit and collects image data obtained by capturing images of the imaging target via the communication unit 152. The analysis unit 161 analyzes the interests of the user (infant or child) who captured the image of the imaging target based on the collected image data. The control unit 151 functions as a provision unit and can provide the analysis results of the analysis unit 161.

また、端末装置１５０上で動作するコンピュータプログラム１６４は、コンピュータに、撮像対象を撮像して得られた画像データを収集し、収集した画像データに基づいて、撮像対象を撮像したユーザの興味関心を分析し、分析結果を提供する、処理を実行させる。 In addition, the computer program 164 running on the terminal device 150 causes the computer to execute processing to collect image data obtained by capturing images of the subject, analyze the interests of the user who captured the image based on the collected image data, and provide the analysis results.

（第６実施形態） Sixth Embodiment
第６実施形態では、撮像装置に分析機能を設ける構成について説明する。 In the sixth embodiment, a configuration in which an analysis function is provided in an imaging device will be described.

図２２は第６実施形態の情報処理システムの構成の一例を示す図である。情報処理システムは、撮像装置５０、及び端末装置１００を備える。端末装置１００は、第４実施形態の場合と同様である。撮像装置５０は、第４実施形態の場合と比較して、解析部７１、文字列解析部７２、図解析部７３、順序推定部６５、音声合成部６６、音再生部６７、分析部６８、音声データリスト７０を備える点で相違する。解析部７１、文字列解析部７２、図解析部７３、順序推定部６５、音声合成部６６、音再生部６７、分析部６８、音声データリスト７０は、それぞれ第４実施形態のサーバ４００が具備している解析部４０３、文字列解析部４０４、図解析部４０５、順序推定部４０６、音声合成部４０７、音再生部４０８、分析部４０９、音声データリスト４１１と同様である。 Figure 22 is a diagram showing an example of the configuration of the information processing system of the sixth embodiment. The information processing system includes an imaging device 50 and a terminal device 100. The terminal device 100 is the same as in the fourth embodiment. The imaging device 50 differs from that of the fourth embodiment in that it includes an analysis unit 71, a character string analysis unit 72, a figure analysis unit 73, an order estimation unit 65, a voice synthesis unit 66, a sound playback unit 67, an analysis unit 68, and an audio data list 70. The analysis unit 71, character string analysis unit 72, figure analysis unit 73, order estimation unit 65, voice synthesis unit 66, sound playback unit 67, analysis unit 68, and audio data list 70 are the same as the analysis unit 403, character string analysis unit 404, figure analysis unit 405, order estimation unit 406, voice synthesis unit 407, sound playback unit 408, analysis unit 409, and audio data list 411, respectively, provided in the server 400 of the fourth embodiment.

図２３は第６実施形態の撮像装置５０の処理手順の一例を示す図である。撮像装置５０は、撮像対象を撮像し（Ｓ１８１）、撮像して得られた画像データに基づいて撮像対象を解析する（Ｓ１８２）。撮像装置５０は、撮像対象が図であるか文字列であるかに応じて、図の内容に関連する音又は文字列を読み上げる音声を生成する（Ｓ１８３）。 Figure 23 is a diagram showing an example of the processing procedure of the imaging device 50 of the sixth embodiment. The imaging device 50 captures an image of the imaging target (S181) and analyzes the imaging target based on the image data obtained by capturing the image (S182). Depending on whether the imaging target is a figure or a string of characters, the imaging device 50 generates a sound related to the content of the figure or a voice that reads out the string of characters (S183).

撮像装置５０は、生成した音又は音声を出力する（Ｓ１８４）。撮像装置５０は、画像データ、解析結果を記憶部６９に記録し（Ｓ１８５）、音の再生又は音声の読み上げ回数を更新する（Ｓ１８６）。幼児や子供などのユーザが、撮像装置５０を持って撮像対象を撮像する都度、図２３に示す処理が繰り返され、画像データ、解析結果、音の再生又は音声の読み上げ回数などの情報を収集することができる。なお、第６実施形態においても、図１２～図１９の場合と同様の処理が行われるので説明は省略する。 The imaging device 50 outputs the generated sound or voice (S184). The imaging device 50 records the image data and analysis results in the storage unit 69 (S185), and updates the number of times the sound is played or the voice is read aloud (S186). Each time a user, such as an infant or child, takes an image of an object with the imaging device 50, the process shown in FIG. 23 is repeated, and information such as the image data, analysis results, and number of times the sound is played or the voice is read aloud can be collected. Note that the same processes as those in FIGS. 12 to 19 are performed in the sixth embodiment, so a description thereof will be omitted.

撮像装置５０は、把持部６２、撮像対象を覗き込むための窓部６１、窓部６１を介して撮像対象が覗き込まれた状態で撮像対象を撮像可能な撮像部５１、撮像部５１で撮像して収集された画像データに基づいて、撮像対象を撮像したユーザの興味関心を分析する分析部６８、及び分析部６８の分析結果を提供する提供部としての制御部１１を備える。 The imaging device 50 includes a gripping unit 62, a window 61 for looking into the imaging target, an imaging unit 51 capable of imaging the imaging target while the imaging target is being looked into through the window 61, an analysis unit 68 that analyzes the interests of the user who captured the imaging target based on image data captured and collected by the imaging unit 51, and a control unit 11 that serves as a providing unit that provides the analysis results of the analysis unit 68.

第４実施形態～第６実施形態では、分析機能を、サーバ４００、端末装置１５０、あるいは撮像装置５０のいずれかに備える構成であったが、分析機能をサーバ４００、端末装置１５０、及び撮像装置５０の少なくとも２つで分散して備える構成でもよい。 In the fourth to sixth embodiments, the analysis function was provided in either the server 400, the terminal device 150, or the imaging device 50, but the analysis function may be distributed among at least two of the server 400, the terminal device 150, and the imaging device 50.

図２４は興味関心分析結果の推移の一例を示す図である。幼児や子供などのユーザの興味関心分析結果を、ユーザの成長に合わせて収集することにより、個々のユーザの興味関心分析結果を推移で把握することができる。図２４の例では、「マイブーム」の分野（例えば、分野Ａ１～Ａ５）の変遷と、「急上昇」の分野（例えば、分野Ｂ１～Ｂ５）の変遷が、ユーザの年齢と共に表示されている。分野Ａ１～Ａ５、Ｂ１～Ｂ５は、共有の分野が含まれていてもよい。図２４に示すようなデータを、多数のユーザについても収集することにより、ビッグデータとして活用することが可能となる。また、個々のユーザの興味関心が今後どのように変化・推移するかを予測するようにしてもよい。 Figure 24 is a diagram showing an example of how interest analysis results change over time. By collecting the interest analysis results of users such as infants and children as they grow, the results for each individual user can be tracked over time. In the example of Figure 24, the changes in the "My Craze" categories (e.g., categories A1 to A5) and the changes in the "Rapidly Rising" categories (e.g., categories B1 to B5) are displayed along with the user's age. Categories A1 to A5 and B1 to B5 may have categories in common. By collecting data such as that shown in Figure 24 for a large number of users, it can be utilized as big data. How an individual user's interests will change in the future may also be predicted.

図２５は年代別・地域別・時系列での興味関心分析結果の一例を示す図である。多数の子供たちの分析結果を収集し、年代別・地域別・時系列で興味関心の傾向の違いを把握することができる。これにより、年代別、地域別、保育園別などの任意のグループ（ユーザ群）ごとの時系列推移及び将来の予測を把握することができる。図２５Ａは、年代Ａ１、Ａ２、Ａ３、…別に興味関心分野を分析した結果を示す。便宜上、興味関心分野をＣ１～Ｃ５としているが、分野の数はこれに限定されない。それぞれの年代での興味関心分野を、レーダチャートのような図で表してもよい。図２５Ｂは、地域Ｌ１、Ｌ２、Ｌ３、…別に興味関心分野を分析した結果を示す。便宜上、興味関心分野をＣ１～Ｃ５としているが、分野の数はこれに限定されない。それぞれの地域での興味関心分野を、レーダチャートのような図で表してもよい。図２５Ｃは、所定の基準時点（例えば、特定の年齢など）からの経過時間別に興味関心分野を分析した結果を示す。便宜上、興味関心分野をＣ１～Ｃ５としているが、分野の数はこれに限定されない。それぞれの経過時点での興味関心分野を、レーダチャートのような図で表してもよい。上述のように、分析部４０９は、収集した画像データに基づいて、年代別及び地域別の少なくとも１つで区分されたユーザ群の興味関心の時間的推移（例えば、任意のグループ毎の時系列推移、あるいは将来予測を含む）を分析してもよい。 Figure 25 is a diagram showing an example of interest analysis results by age group, by region, and over time. By collecting the analysis results of a large number of children, differences in interest trends by age group, by region, and over time can be understood. This makes it possible to grasp the time-series transition and future predictions for any group (user group), such as by age group, region, or nursery school. Figure 25A shows the results of analyzing fields of interest by age group (A1, A2, A3, ...). For convenience, the fields of interest are labeled C1 to C5, but the number of fields is not limited to this. The fields of interest for each age group may be represented in a diagram such as a radar chart. Figure 25B shows the results of analyzing fields of interest by region (L1, L2, L3, ...). For convenience, the fields of interest are labeled C1 to C5, but the number of fields is not limited to this. The fields of interest for each region may be represented in a diagram such as a radar chart. Figure 25C shows the results of analyzing fields of interest by time elapsed since a predetermined reference point (e.g., a specific age). For convenience, the fields of interest are labeled C1 to C5, but the number of fields is not limited to this. The fields of interest at each elapsed time may be represented in a diagram such as a radar chart. As described above, the analysis unit 409 may analyze, based on the collected image data, the temporal changes in the interests of a group of users divided by at least one of age group and region (including, for example, time-series changes for any group or future predictions).
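The per-group breakdowns of Figures 25A and 25B can be pictured as a group-by over individual analysis records. The following is a minimal sketch, assuming each record is a dictionary with hypothetical keys `age_group`, `region`, and `field`; the source does not specify a record format, and the labels in the demonstration are made up to match the A, L, and C placeholders in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def interests_by_group(records: Iterable[Mapping[str, str]], key: str) -> Dict[str, Dict[str, int]]:
    """Count captured objects per field of interest within each group.

    key selects the grouping attribute, e.g. "age_group" or "region".
    """
    grouped: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        grouped[record[key]][record["field"]] += 1
    # Convert to plain dictionaries so the result is easy to print or serialize.
    return {group: dict(counts) for group, counts in grouped.items()}


if __name__ == "__main__":
    sample = [
        {"age_group": "A1", "region": "L1", "field": "C1"},
        {"age_group": "A1", "region": "L2", "field": "C1"},
        {"age_group": "A1", "region": "L1", "field": "C2"},
        {"age_group": "A2", "region": "L1", "field": "C3"},
    ]
    print(interests_by_group(sample, "age_group"))
    print(interests_by_group(sample, "region"))
```

Each inner dictionary holds the values that a radar chart for that group would plot, one axis per field of interest. Adding a period attribute to the records and grouping on it in the same way would give the time-series view of Figure 25C.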

本実施の形態の音声読み上げシステムは、撮像装置と、撮像対象が記載された対象物を載置可能な載置面を有する本体装置とを備え、前記撮像装置は、把持部と、撮像対象を覗き込むための窓部と、前記窓部を介して撮像対象が覗き込まれた状態で前記撮像対象を撮像可能な撮像部と、前記撮像部で撮像して得られた画像データを前記本体装置へ送信する送信部とを備え、前記本体装置は、前記画像データを受信する受信部と、前記受信部で受信した画像データを解析する解析部と、前記解析部の解析結果に基づいて音声を出力する出力部とを備える。 The text-to-speech system of this embodiment comprises an imaging device and a main unit having a mounting surface on which an object to be imaged can be placed, the imaging device comprising a gripping unit, a window for looking into the object, an imaging unit capable of imaging the object while looking into the object through the window, and a transmitting unit for transmitting image data captured by the imaging unit to the main unit, the main unit comprising a receiving unit for receiving the image data, an analyzing unit for analyzing the image data received by the receiving unit, and an output unit for outputting audio based on the analysis results of the analyzing unit.

本実施の形態の音声読み上げシステムにおいて、前記載置面は、平面視で矩形状をなし、前記載置面の中央部を間にした１対の縁辺部それぞれから前記中央部に向かって高さが小さくなるように傾斜している。 In the text-to-speech system of this embodiment, the placement surface is rectangular in plan view and is inclined so that the height decreases from each of a pair of edges that sandwich the center of the placement surface toward the center.

本実施の形態の音声読み上げシステムにおいて、前記本体装置は、前記載置面の傾斜する方向に沿って前記載置面の他の１対の縁辺部に前記対象物の移動を規制する規制部を備える。 In the text-to-speech system of this embodiment, the main device is equipped with a restricting unit that restricts movement of the object on another pair of edges of the placement surface along the inclined direction of the placement surface.

本実施の形態の音声読み上げシステムにおいて、前記本体装置は、前記撮像装置を収容するための収容部を前記載置面に形成している。 In the text-to-speech system of this embodiment, the main unit has a storage section formed on the placement surface for storing the imaging device.

本実施の形態の音声読み上げシステムにおいて、前記撮像装置は、前記対象物までの距離を検出する検出部と、前記検出部で検出した距離に応じて前記窓部を介した視野内の撮像対象が撮像可能となるように前記撮像部の撮像範囲を補正する補正部とを備える。 In the text-to-speech system of this embodiment, the imaging device includes a detection unit that detects the distance to the object, and a correction unit that corrects the imaging range of the imaging unit so that the object within the field of view through the window can be imaged in accordance with the distance detected by the detection unit.

本実施の形態の音声読み上げシステムにおいて、前記解析部は、前記受信部で受信した画像データに基づいて撮像対象が文字列であるか図であるかを解析し、前記出力部は、前記解析部で撮像対象が文字列であると解析した場合、前記文字列を読み上げる音声を出力する。 In the text-to-speech system of this embodiment, the analysis unit analyzes whether the imaged object is a character string or a figure based on the image data received by the receiving unit, and if the analysis unit analyzes that the imaged object is a character string, the output unit outputs a voice that reads out the character string.

本実施の形態の音声読み上げシステムにおいて、前記出力部は、前記解析部で撮像対象が図であると解析した場合、前記図の内容に関連する音を出力する。 In the text-to-speech system of this embodiment, when the analysis unit analyzes that the imaged object is a diagram, the output unit outputs a sound related to the content of the diagram.

本実施の形態の音声読み上げシステムは、前記解析部で撮像対象が文字列であると解析した場合、前記文字列に対して意味解析を行って感情指標を算出する感情指標算出部を備え、前記出力部は、前記感情指標算出部で算出した感情指標に応じた背景音楽を出力する。 The text-to-speech system of this embodiment includes an emotional index calculation unit that, when the analysis unit analyzes that the imaged object is a character string, performs a semantic analysis of the character string to calculate an emotional index, and the output unit outputs background music according to the emotional index calculated by the emotional index calculation unit.

本実施の形態の音声読み上げシステムは、前記解析部で撮像対象が文字列であると解析した場合、前記文字列の配列に基づいて前記文字列の読み上げ順序を推定する読み上げ順序推定部を備える。 The text-to-speech system of this embodiment includes a reading order estimation unit that, when the analysis unit analyzes that the imaged object is a string of characters, estimates the reading order of the string of characters based on the arrangement of the string of characters.
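This passage says only that the reading order is estimated from the arrangement of the character strings; it does not give the method. As one possible heuristic, and purely as an assumption, the sketch below treats text blocks that are mostly taller than wide as vertical writing (columns read right to left, each column top to bottom) and otherwise reads rows top to bottom, left to right. The bounding-box tuple format and the function name are hypothetical.

```python
from typing import List, Tuple

# (x, y, width, height, text), with the origin at the top-left of the image.
Box = Tuple[float, float, float, float, str]


def estimate_reading_order(boxes: List[Box]) -> List[str]:
    """Return the block texts in an estimated reading order (heuristic sketch)."""
    if not boxes:
        return []
    # Count blocks that are taller than wide as evidence of vertical writing.
    vertical_votes = sum(1 for (_, _, w, h, _) in boxes if h > w)
    if vertical_votes * 2 > len(boxes):
        # Vertical writing: rightmost column first, then top to bottom.
        ordered = sorted(boxes, key=lambda b: (-b[0], b[1]))
    else:
        # Horizontal writing: top row first, then left to right.
        ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return [text for (_, _, _, _, text) in ordered]


if __name__ == "__main__":
    rows = [(0, 20, 100, 10, "second line"), (0, 0, 100, 10, "first line")]
    columns = [(0, 0, 10, 100, "left column"), (50, 0, 10, 100, "right column")]
    print(estimate_reading_order(rows))
    print(estimate_reading_order(columns))
```

A page that mixes vertical and horizontal text, or text wrapped around illustrations, would need a more careful layout analysis than this majority vote.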

本実施の形態の音声読み上げシステムは、予め録音された音声の音声特徴量を用いて音声を合成する音声合成部を備え、前記出力部は、前記解析部で撮像対象が文字列であると解析した場合、前記文字列に基づいて前記音声合成部が合成した音声を出力する。 The text-to-speech system of this embodiment includes a speech synthesis unit that synthesizes speech using speech features of pre-recorded speech, and when the analysis unit analyzes that the imaged object is a character string, the output unit outputs the speech synthesized by the speech synthesis unit based on the character string.

本実施の形態の音声読み上げシステムは、予め録音された音声の素片を連結して音声を合成する音声合成部を備え、前記出力部は、前記解析部で撮像対象が文字列であると解析した場合、前記文字列に基づいて前記音声合成部が合成した音声を出力する。 The text-to-speech system of this embodiment includes a speech synthesis unit that synthesizes speech by concatenating pre-recorded speech fragments, and when the analysis unit analyzes that the imaged object is a character string, the output unit outputs the speech synthesized by the speech synthesis unit based on the character string.

本実施の形態の音声読み上げ装置は、把持部と、撮像対象を覗き込むための窓部と、前記窓部を介して撮像対象が覗き込まれた状態で前記撮像対象を撮像可能な撮像部と、前記撮像部で撮像して得られた画像データを解析する解析部と、前記解析部の解析結果に基づいて音声を出力する出力部とを備える。 The text-to-speech device of this embodiment includes a gripping unit, a window for looking into the image capture target, an imaging unit capable of imaging the image capture target while the image capture target is being looked into through the window, an analysis unit that analyzes image data captured by the imaging unit, and an output unit that outputs audio based on the analysis results of the analysis unit.

本実施の形態の音声読み上げ装置は、撮像対象が記載された対象物までの距離を検出する検出部と、前記検出部で検出した距離に応じて前記窓部を介した視野内の撮像対象が撮像可能となるように前記撮像部の撮像範囲を補正する補正部とを備える。 The text-to-speech device of this embodiment includes a detection unit that detects the distance to the object on which the image capture target is written, and a correction unit that corrects the imaging range of the imaging unit based on the distance detected by the detection unit so that the image capture target within the field of view through the window can be captured.

本実施の形態の情報処理装置は、撮像対象を撮像して得られた画像データを収集する収集部と、前記収集部で収集した画像データに基づいて、前記撮像対象を撮像したユーザの興味関心を分析する分析部と、前記分析部の分析結果を提供する提供部とを備える。 The information processing device of this embodiment includes a collection unit that collects image data obtained by capturing an image of an object, an analysis unit that analyzes the interests of the user who captured the image of the object based on the image data collected by the collection unit, and a provision unit that provides the analysis results of the analysis unit.

本実施の形態の情報処理装置は、前記収集部で収集した画像データに基づいて撮像対象の分野を解析する第１解析部を備え、前記分析部は、第１所定期間の都度収集した前記第１解析部の解析結果に基づいて、前記ユーザの興味関心を分析する。 The information processing device of this embodiment includes a first analysis unit that analyzes the field of the image subject based on the image data collected by the collection unit, and the analysis unit analyzes the user's interests based on the analysis results of the first analysis unit collected each time a first predetermined period of time is reached.

本実施の形態の情報処理装置において、前記分析部は、前記第１解析部で解析した分野毎に撮像された撮像対象の数に基づいて前記ユーザの興味関心に関する第１指標を分析する。 In the information processing device of this embodiment, the analysis unit analyzes a first index related to the user's interests based on the number of subjects captured for each field analyzed by the first analysis unit.

本実施の形態の情報処理装置は、前記第１所定期間毎に前記分析部で分析した前記第１指標の変化を特定する特定部を備え、前記分析部は、前記特定部で特定した前記第１指標の変化に応じて、前記ユーザの興味関心に関するタイプを分析する。 The information processing device of this embodiment includes an identification unit that identifies changes in the first indicator analyzed by the analysis unit for each first predetermined period, and the analysis unit analyzes the user's type of interests in accordance with the changes in the first indicator identified by the identification unit.

本実施の形態の情報処理装置において、前記分析部は、前記第１所定期間毎の、前記第１解析部で解析した分野毎に撮像された撮像対象の数の変化に基づいて前記ユーザの興味関心に関する第２指標を分析する。 In the information processing device of this embodiment, the analysis unit analyzes a second index related to the user's interests based on changes in the number of subjects captured for each field analyzed by the first analysis unit for each first predetermined period.

本実施の形態の情報処理装置において、前記分析部は、前記第１解析部で解析した分野毎に撮像された撮像対象の数のうち、撮像対象の数が最多の分野に基づいて前記ユーザの興味関心に関する第３指標を分析する。 In the information processing device of this embodiment, the analysis unit analyzes the third index related to the user's interests based on the field with the largest number of imaging subjects among the number of imaging subjects captured for each field analyzed by the first analysis unit.
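The first, second, and third indices described above all derive from per-field capture counts. The sketch below shows one way to compute them for two consecutive first predetermined periods. It is an illustration under assumptions: the function name, the input format (one field label per captured object), and the field names in the demonstration are hypothetical, and the source does not define the indices more precisely than "based on" the counts, their change, and the most-captured field.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def interest_indices(
    current_fields: Iterable[str],
    previous_fields: Iterable[str],
) -> Tuple[Dict[str, int], Dict[str, int], Optional[str]]:
    """Compute count-based interest indices for one period.

    Returns (first_index, second_index, third_index):
      first_index  - number of captured objects per field in the current period
      second_index - change in that number versus the previous period
      third_index  - the field with the largest number in the current period
    """
    current = Counter(current_fields)
    previous = Counter(previous_fields)

    first_index = dict(current)
    # Include fields seen in either period; missing counts are treated as 0.
    all_fields = set(current) | set(previous)
    second_index = {f: current[f] - previous[f] for f in all_fields}
    third_index = max(current, key=current.get) if current else None
    return first_index, second_index, third_index


if __name__ == "__main__":
    # Made-up field labels, one per captured object.
    this_period = ["animals"] * 5 + ["vehicles"] * 3
    last_period = ["animals"] * 4 + ["food"] * 2
    print(interest_indices(this_period, last_period))
```

When two fields tie for the largest count, `max` returns the one that was seen first in the current period; a real system would need an explicit tie-breaking rule.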

本実施の形態の情報処理装置は、前記収集部で収集した画像データに基づいて撮像対象に含まれる色を解析する第２解析部を備え、前記分析部は、第２所定期間の都度収集した前記第２解析部の解析結果に基づいて、前記ユーザの色に関する興味関心を分析する。 The information processing device of this embodiment includes a second analysis unit that analyzes the colors contained in the image subject based on the image data collected by the collection unit, and the analysis unit analyzes the user's color-related interests based on the analysis results of the second analysis unit collected each time a second predetermined period of time is reached.

The information processing device of this embodiment includes a third analysis unit that analyzes, based on the image data collected by the collection unit, whether the imaging target is a character string or a figure; a reading unit that reads the character string aloud when the third analysis unit determines that the imaging target is a character string; and a playback unit that plays a sound related to the content of the figure when the third analysis unit determines that the imaging target is a figure. The analysis unit analyzes the user's activity type based on the number of times the reading unit read aloud and the number of times the playback unit played a sound over a third predetermined period.
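
The activity types themselves are not enumerated here. For illustration only, the sketch below assumes three hypothetical types decided by the share of read-aloud events among all events in the period. The labels, the function name `activity_type`, and the `margin` threshold are assumptions rather than values from the specification.

```python
def activity_type(read_count, playback_count, margin=0.2):
    """Classify a user from read-aloud and sound-playback counts in one period.

    `margin` is an assumed tolerance around an even 50/50 split.
    """
    total = read_count + playback_count
    if total == 0:
        return "no activity"
    read_share = read_count / total
    if read_share >= 0.5 + margin:
        return "reading-oriented"  # mostly captured character strings
    if read_share <= 0.5 - margin:
        return "sound-oriented"  # mostly captured figures
    return "balanced"


label = activity_type(read_count=8, playback_count=2)
```

With eight read-aloud events and two playbacks, `label` is `"reading-oriented"`.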

The information processing device of this embodiment includes an output unit that outputs audio based on the analysis results of the third analysis unit.

In the information processing device of this embodiment, the analysis unit analyzes, based on the image data collected by the collection unit, how the interests of user groups segmented by at least one of age group and region change over time.
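
As one hedged illustration of such a cohort analysis, capture records could be grouped by age group, region, and period, and the most-captured field tracked from period to period. The record layout, the function names, and the sample values are hypothetical. The sketch segments by both age group and region, although the text requires only at least one of them.

```python
from collections import Counter, defaultdict


def cohort_trend(records):
    """Count captured fields per (age_group, region, period).

    `records` is an iterable of (period, age_group, region, field) tuples.
    """
    trend = defaultdict(Counter)
    for period, age_group, region, field in records:
        trend[(age_group, region, period)][field] += 1
    return trend


def top_field_by_period(trend, age_group, region):
    """Most-captured field in each period for one cohort, in period order."""
    result = {}
    for (a, r, period), counts in sorted(trend.items()):
        if a == age_group and r == region:
            result[period] = counts.most_common(1)[0][0]
    return result


# Hypothetical capture records.
records = [
    ("2021-10", "3-5", "Kanto", "animals"),
    ("2021-10", "3-5", "Kanto", "animals"),
    ("2021-10", "3-5", "Kanto", "vehicles"),
    ("2021-11", "3-5", "Kanto", "vehicles"),
    ("2021-11", "3-5", "Kanto", "vehicles"),
    ("2021-11", "6-8", "Kansai", "plants"),
]
top = top_field_by_period(cohort_trend(records), "3-5", "Kanto")
```

For the hypothetical cohort above, `top` shows the leading field moving from animals in 2021-10 to vehicles in 2021-11.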

The imaging device of this embodiment includes a gripping portion, a window for looking into an imaging target, an imaging unit capable of imaging the imaging target while it is viewed through the window, an analysis unit that analyzes the interests of the user who imaged the imaging target based on image data captured and collected by the imaging unit, and a provision unit that provides the analysis results of the analysis unit.

The computer program of this embodiment causes a computer to execute processing that collects image data obtained by imaging an imaging target, analyzes the interests of the user who imaged the imaging target based on the collected image data, and provides the analysis results.

REFERENCE SIGNS LIST
1 Home network
2 Communication network
10 Main unit
11, 201 Control unit
12 Communication unit
13, 205 Analysis unit
131, 206 Character string analysis unit
132, 207 Figure analysis unit
14, 208 Speech synthesis unit
15, 209 Order estimation unit
16, 210 Storage unit
161, 211 BGM list
162, 212 Voice data list
17, 213 Microphone
18, 214 Speaker
19, 215 Emotion index calculation unit
21 Placement surface
22 Edge portion
23 Center portion
24 Restricting portion
25 Housing section
26 Indicator light
30 Main unit
31 First communication unit
32 Second communication unit
33 Storage unit
331 BGM list
34 Microphone
35 Speaker
36 Control unit
50 Imaging device
51, 202 Imaging unit
52, 203 Distance sensor
53, 204 Correction unit
54 Memory
55 Processor
56 Communication unit
57 Speaker
69 Storage unit
70 Voice data list
100, 150 Terminal device
101, 151 Control unit
102 First communication unit
103 Second communication unit
152 Communication unit
104, 153 Display unit
105, 154 Operation unit
106, 162 Storage unit
163 Voice data list
164 Computer program
300 Server
301 Control unit
302 Communication unit
303 Analysis unit
304 Character string analysis unit
305 Figure analysis unit
306 Speech synthesis unit
307 Order estimation unit
308 Storage unit
309 Voice data list
310 Emotion index calculation unit
400 Server
401 Control unit
402 Communication unit
403, 155, 71 Analysis unit
404, 156, 72 Character string analysis unit
405, 157, 73 Figure analysis unit
406, 158, 65 Order estimation unit
407, 159, 66 Speech synthesis unit
408, 160, 67 Sound playback unit
409, 161, 68 Analysis unit
410 Storage unit
411 Voice data list

Claims

A text-to-speech system comprising an imaging device and a main body device having a placement surface on which an object bearing an imaging target can be placed,
wherein the imaging device includes:
a gripping portion;
a window for looking into the imaging target;
an imaging unit capable of imaging the imaging target while the imaging target is viewed through the window; and
a transmission unit that transmits image data captured by the imaging unit to the main body device,
the main body device includes:
a receiving unit that receives the image data;
an analysis unit that analyzes the image data received by the receiving unit;
an output unit that outputs audio based on the analysis result of the analysis unit; and
an emotion index calculation unit that, when the analysis unit determines from the image data that the imaging target is a character string, calculates an emotion index by performing semantic analysis on the character string, and
the output unit outputs background music according to the emotion index calculated by the emotion index calculation unit.

The text-to-speech system according to claim 1,
wherein the placement surface is rectangular in plan view, and
is inclined so that its height decreases toward the center portion of the placement surface from each of a pair of edge portions on opposite sides of the center portion.

The text-to-speech system according to claim 1 or 2,
wherein the main body device includes a restricting portion that restricts movement of the object, provided at another pair of edge portions of the placement surface that extend along the direction in which the placement surface is inclined.

The text-to-speech system according to any one of claims 1 to 3,
wherein a housing section for housing the imaging device is formed in the placement surface of the main body device.

The text-to-speech system according to any one of claims 1 to 4,
wherein the imaging device includes:
a detection unit that detects the distance to the object; and
a correction unit that, according to the distance detected by the detection unit, corrects the imaging range of the imaging unit so that the imaging target within the field of view through the window can be imaged.

The text-to-speech system according to any one of claims 1 to 5,
wherein the analysis unit analyzes whether the imaging target is a character string or a figure based on the image data received by the receiving unit, and
the output unit outputs speech reading the character string aloud when the analysis unit determines that the imaging target is a character string.

The text-to-speech system according to claim 6,
wherein the output unit outputs a sound related to the content of the figure when the analysis unit determines that the imaging target is a figure.

The text-to-speech system according to claim 6 or 7,
further comprising a reading order estimation unit that, when the analysis unit determines that the imaging target is a character string, estimates the reading order of the character string based on the arrangement of the character string.

The text-to-speech system according to any one of claims 6 to 8,
further comprising a speech synthesis unit that synthesizes speech using speech features of pre-recorded speech,
wherein the output unit outputs speech synthesized by the speech synthesis unit based on the character string when the analysis unit determines that the imaging target is a character string.

The text-to-speech system according to any one of claims 6 to 8,
further comprising a speech synthesis unit that synthesizes speech by concatenating pre-recorded speech fragments,
wherein the output unit outputs speech synthesized by the speech synthesis unit based on the character string when the analysis unit determines that the imaging target is a character string.

A text-to-speech device comprising:
a gripping portion;
a window for looking into an imaging target;
an imaging unit capable of imaging the imaging target while the imaging target is viewed through the window;
an analysis unit that analyzes image data captured by the imaging unit;
an output unit that outputs audio based on the analysis result of the analysis unit; and
an emotion index calculation unit that, when the analysis unit determines from the image data that the imaging target is a character string, calculates an emotion index by performing semantic analysis on the character string,
wherein the output unit outputs background music according to the emotion index calculated by the emotion index calculation unit.

The text-to-speech device according to claim 11, further comprising:
a detection unit that detects the distance to an object bearing the imaging target; and
a correction unit that, according to the distance detected by the detection unit, corrects the imaging range of the imaging unit so that the imaging target within the field of view through the window can be imaged.