JP7353057B2

JP7353057B2 - Recording systems and programs

Info

Publication number: JP7353057B2
Application number: JP2019067081A
Authority: JP
Inventors: 遼馬服部; 勇気櫻井; 遥香松本
Original assignee: Tokyo Gas Co Ltd
Current assignee: Tokyo Gas Co Ltd
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2023-09-29
Anticipated expiration: 2039-03-29
Also published as: JP2020166602A

Description

The present invention relates to a recording system and a program.

Captured images are sometimes sorted and then arranged, for example in chronological order, so that they can be edited into an album.

The electronic album editing device described in Patent Document 1 includes: an image/audio data I/F means that imports image data and audio data corresponding to the image data; a recording means that manages and stores the image data and the audio data; an audio data conversion means that converts the audio data into text data; a search/sort means that searches and sorts the image data based on the text data; and an album creation means that creates an album by arranging images based on the sort results.

The album creation device described in Patent Document 2 includes an extraction means and a display means. The extraction means extracts one or more items of image data from a plurality of stored image data, using as search conditions the range of dates and times at which the image data was created, the range of places where the image data was captured, a person appearing in the image or a person whose speaking voice is included in the audio accompanying the image data, or a combination of these. The display means displays the extracted image data. The document also describes means for automatically generating these search conditions and means for interacting with the user so that the user can enter search conditions.

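Japanese Unexamined Patent Application Publication No. 2003-111009 (JP2003-111009)
Japanese Unexamined Patent Application Publication No. 2005-107867 (JP2005-107867)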

For example, a user may select suitable images from among the captured images and arrange them to create an album of images and audio. Such work, however, generally takes a great deal of time and effort and tends to place an excessive burden on the user.
An object of the present invention is to provide a recording system and the like that can reduce the burden on the user when compiling captured images and acquired audio into an album.

Thus, according to the present invention, there is provided a recording system comprising: a storage means that acquires and stores images in which a person is photographed and audio related to the images; a selection means that accepts input of a shooting scene from a user and selects a word registered in advance in association with the accepted shooting scene; a determination means that detects the selected word in the audio stored in the storage means; and an output means that selects images corresponding to the selected word and outputs the selected images. The output means selects images of a predetermined period that includes the image captured when the detected word was uttered; it identifies the person who uttered the detected word and selects, as that period, images from the period during which the identified person is photographed.

Here, the output means can arrange the selected images and the audio corresponding to the selected images in chronological order to form an album. In this case, images and audio that are more closely related can be brought together in one place.

Furthermore, according to the present invention, there is provided a program that causes a computer to implement: a storage function that acquires and stores images in which a person is photographed and audio related to the images; a selection function that accepts input of a shooting scene from a user and selects a word registered in advance in association with the accepted shooting scene; a determination function that detects the selected word in the audio stored by the storage function; and an output function that selects images corresponding to the selected word and outputs the selected images. The output function selects images of a predetermined period that includes the image captured when the detected word was uttered; it identifies the person who uttered the detected word and selects, as that period, images from the period during which the identified person is photographed.

According to the present invention, it is possible to provide a recording system and the like that can reduce the burden on the user when compiling captured images and acquired audio into an album.

FIG. 1 is a diagram showing a configuration example of the recording system in this embodiment.
FIG. 2 is a diagram showing an example of the schematic operation of the recording system.
FIG. 3 is a block diagram showing an example of the functional configuration of the recording system in the first embodiment.
FIG. 4 is a flowchart explaining the operation of the recording system in the first embodiment.
FIG. 5 is a diagram showing an example of a screen on which a user registers a word on the mobile terminal.
FIGS. 6(a) and 6(b) are diagrams showing the periods for which the selection unit selects images.
FIG. 7 is a flowchart explaining the operation of the recording system in the second embodiment.
FIGS. 8(a) and 8(b) are diagrams showing the data structure of the data saved in the storage unit in step 205 of FIG. 7.
FIG. 9 is a block diagram showing an example of the functional configuration of the recording system in the third embodiment.
FIG. 10 is a flowchart explaining the operation of the recording system in the third embodiment.
FIG. 11 is a diagram showing an example of a screen for registering a shooting scene on the mobile terminal.
FIG. 12 is a diagram showing the data structure of the data saved in the storage unit in step 306 of FIG. 10.
FIG. 13 is a diagram showing the robot used in the third embodiment.
FIG. 14 is a block diagram showing an example of the functional configuration of the recording system when a robot is used.
FIG. 15 is a flowchart explaining the operation of the recording system in the fourth embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

<Description of the entire recording system 1>
FIG. 1 is a diagram showing a configuration example of the recording system 1 in this embodiment.
As shown in the figure, the recording system 1 of this embodiment is configured by connecting a camera 10, mobile terminals 20a and 20b, and a management server 40 via a network 70, a network 80, and an access point 90. Although FIG. 1 shows only one camera 10, any number of cameras may be used. Hereinafter, when the mobile terminal 20a and the mobile terminal 20b need not be distinguished, they may be referred to simply as the "mobile terminal 20."

The camera 10 is a photographing device that captures moving images or still images. The camera 10 includes an optical system that converges light from an object and an image sensor that detects the light converged by the optical system. The optical system consists of a single lens or a combination of lenses. For example, a twin lens is used in which two hemispherical lenses are combined with their spherical sides facing each other. The image sensor is configured as an array of CCD (Charge Coupled Device) elements, CMOS (Complementary Metal Oxide Semiconductor) elements, or the like.
The camera 10 not only captures images but also acquires the sound around it. To that end, the camera 10 includes, for example, a microphone and converts the sound picked up by the microphone into an electrical signal (audio signal). The electrical signal (audio signal) may also be amplified at this point. Various existing microphone types, such as dynamic and condenser types, can be used. To capture sound from all around the camera 10, an omnidirectional MEMS (Micro Electro Mechanical Systems) microphone is preferable.
The camera 10 is also a network camera that connects to the access point 90 by wireless communication, and it can transmit the captured image information and the acquired sound information to the management server 40 via the access point 90, the network 70, and the network 80. The camera 10 may instead connect to the network 70 over a wired communication line.

The mobile terminal 20 is, for example, a mobile device such as a mobile computer, a mobile phone, a smartphone, or a tablet. The mobile terminals 20a and 20b connect to the access point 90 for wireless communication, and through the access point 90 they connect to the network 70.

The management server 40 is a server computer that manages the recording system 1 as a whole. As described in detail later, the management server 40 detects a predetermined word based on the images captured and the audio acquired by the camera 10, and creates an album of the images and audio corresponding to that word.

The mobile terminal 20 and the management server 40 each include a CPU (Central Processing Unit) as a calculation means, a main memory as a storage means, and storage such as an HDD (Hard Disk Drive). The CPU executes various software such as an OS (basic software) and application programs (application software). The main memory is a storage area that holds the software and the data used to execute it, and the storage is a storage area that holds input data for the software, output data from the software, and the like.
The mobile terminal 20 and the management server 40 each further include a communication interface (hereinafter, "communication I/F") for communicating with the outside, a display mechanism consisting of a video memory, a display, and the like, and an input mechanism such as input buttons, a touch panel, or a keyboard.

The network 70 is a communication means used for information communication among the camera 10, the mobile terminal 20, and the management server 40, and is, for example, the Internet.
Like the network 70, the network 80 is a communication means used for information communication among the camera 10, the mobile terminal 20, and the management server 40, and is, for example, a LAN (Local Area Network).

The access point 90 is a device that performs wireless communication over a wireless communication line. The access point 90 mediates the exchange of information among the camera 10, the mobile terminal 20, and the management server 40 over the network 70 and the network 80.
The wireless communication lines that can be used include mobile phone lines, PHS (Personal Handy-phone System) lines, Wi-Fi (Wireless Fidelity), Bluetooth (registered trademark), ZigBee, and UWB (Ultra Wideband).

<Schematic description of the operation of recording system 1>
FIG. 2 is a diagram showing an example of the schematic operation of the recording system 1.
First, user A registers a predetermined word using the mobile terminal 20a. User A does this by entering the word and sending it from the mobile terminal 20a to the management server 40 via the access point 90, the network 70, and the network 80 (1A).
The management server 40 then stores this word in storage such as an HDD, which completes the registration (1B).

Next, the camera 10 captures images and acquires sound, and transmits the captured images and the acquired sound to the management server 40 as transmission information (1C). The transmission information is sent to the management server 40 via the access point 90, the network 70, and the network 80.
The management server 40 stores the captured images and the acquired sound in storage such as an HDD (1D).

The management server 40 then reads the registered word from the storage (1E). It analyzes the acquired sound and determines whether the registered word is contained in it as speech (1F).
If the registered word is contained as speech, the management server 40 selects the images corresponding to this word and compiles them, together with the audio, into an album (1G). The album brings together the images in which the specific word occurs, extracted along with their audio. The order of the images and audio in the album is not particularly limited and can take various forms, such as chronological order or order of duration.
The images and audio compiled into the album are stored in storage such as an HDD (1H).
When user B operates the mobile terminal 20b, the images and audio compiled into the album are read from the storage (1I) and sent from the management server 40 to the mobile terminal 20b (1J). As a result, user B can view the album.
User A and user B have, for example, a given personal relationship, such as parent and child or friends. User A and user B may also be the same person; that is, the person who registered the word may be the one who views the album.

<Detailed description of recording system 1>
[First embodiment]
Next, the detailed functional configuration and operation of the recording system 1 of this embodiment will be described.
The first embodiment of the recording system 1 is described first. In the first embodiment, the user registers a single word in the management server 40, and the management server 40 creates an album based on this word.

FIG. 3 is a block diagram showing an example of the functional configuration of the recording system 1 in the first embodiment.
Of the various functions of the recording system 1, only those related to this embodiment are selected and illustrated here.
The camera 10 includes a photographing unit 11 that photographs a person, a sound acquisition unit 12 that acquires sound, and a transmitting unit 13 that transmits information on the captured images.
The photographing unit 11 is a functional unit that photographs a person and consists of the optical system and the image sensor described above.
The sound acquisition unit 12 is a functional unit that acquires the sound around the camera 10, including human voices, and corresponds to the microphone described above.
The transmitting unit 13 transmits the captured images and the acquired sound to the management server 40 as transmission information. The transmitting unit 13 is, for example, a communication I/F, and sends the transmission information to the management server 40 via the access point 90, the network 70, and the network 80.

In the recording system 1 of this embodiment, the mobile terminal 20a and the mobile terminal 20b have the same functional configuration. Each includes a transmitting/receiving unit 21 that sends and receives transmission information, a display unit 22 that displays images, an input unit 23 for entering information, and a voice acquisition unit 24 that acquires uttered speech.

The transmitting/receiving unit 21 sends and receives the relevant information when image information is exchanged or a word is registered. The transmitting/receiving unit 21 is, for example, a communication I/F, and exchanges information with the management server 40 via the access point 90, the network 70, and the network 80.

The display unit 22 displays images. The display unit 22 is, for example, a touch panel. In this case, the display unit 22 includes a display on which various information is shown and a position detection sheet that detects the position touched by a finger, a stylus pen, or the like. Any means may be used to detect the touched position, such as a resistive film method, which detects the position from the pressure of the contact, or a capacitive method, which detects it from the static electricity of the touching object.

The input unit 23 is an input mechanism with which the user of the mobile terminal 20 performs predetermined operations.
One example is the touch panel described above. In this case, the touch panel serves as both the display unit 22 and the input unit 23. That is, it displays a given image, and by touching the displayed screen the user can start and close applications running on the mobile terminal 20 and operate them. The input unit 23 is not limited to this and may consist of a keyboard, a mouse, or the like.

The voice acquisition unit 24 acquires the user's uttered speech. The voice acquisition unit 24 is, for example, a microphone. The microphone used as the voice acquisition unit 24 is not particularly limited, and various existing microphones can be used.

The management server 40 includes a transmitting/receiving unit 41 that communicates with the outside, a storage unit 42 that stores image and sound information, a determination unit 43 that detects the registered word in the sound, and a selection unit 44 that selects images.

The transmitting/receiving unit 41 communicates with the camera 10 and the mobile terminal 20 and exchanges predetermined information with them. The transmitting/receiving unit 41 receives the images captured and the sound acquired by the camera 10. It also accepts word registrations from the mobile terminal 20a and outputs album information to the mobile terminal 20b.
The storage unit 42 is an example of a storage means. It acquires and stores, as images captured by the camera 10, images in which a person is photographed, together with audio related to those images. It also stores the words registered by the user and the album information described later.

The determination unit 43 is an example of a determination means and detects a predetermined word in the audio stored in the storage unit 42. As described in detail later, the determination unit 43 determines, for example, whether a word registered in advance by the user is present in the sound.

The selection unit 44 is an example of an output means. It selects images corresponding to the predetermined word and outputs the selected images. The output images are stored in the storage unit 42 as an album and can then be viewed on the mobile terminal 20 through user operation.

The transmitting/receiving unit 41 is, for example, a communication I/F. The storage unit 42 is, for example, storage. The functions of the determination unit 43 and the selection unit 44 can be implemented by, for example, a CPU.

Next, the operation of the recording system 1 in the first embodiment will be described in more detail.
FIG. 4 is a flowchart explaining the operation of the recording system 1 in the first embodiment.
First, the user who owns the mobile terminal 20a registers a word as the keyword for creating an album. In this case, the user, for example, launches a dedicated application and enters the word with the input unit 23 while looking at the display unit 22 of the mobile terminal 20a (step 101). The word may instead be entered by voice using the voice acquisition unit 24. In the case shown in FIG. 2, this user is user A, who owns the mobile terminal 20a.
The entered word is transmitted from the transmitting/receiving unit 21 of the mobile terminal 20a to the management server 40. The transmitting/receiving unit 41 of the management server 40 receives the word and stores it in the storage unit 42 (step 102).

FIG. 5 is a diagram showing an example of a screen on which the user registers a word on the mobile terminal 20a.
In this case, a message 221 reading "Please enter the keyword needed to create the electronic album." is displayed in the upper part of the display unit 22. This message asks the user to enter a keyword as the word.

A keyword input field 222 is displayed below the message 221, and an album name input field 223 is displayed below that. The keyword input field 222 is where the user enters the word used to create the album. In this example, "Congratulations" has been entered in the keyword input field 222. The album name input field 223 is where the user enters the name of the album to be created, that is, the name of the album created from the word "Congratulations." In this example, "Birthday Party" has been entered in the album name input field 223.
When the OK button 224 displayed at the bottom of the display unit 22 is pressed on this screen, the entered contents are transmitted to the management server 40. When the cancel button 225 is pressed, the display returns, for example, to the screen that was shown before this one.
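As an illustration of what the registration screen might send when the OK button 224 is pressed, the following is a minimal Python sketch. The patent does not specify a payload format; the class `KeywordRegistration`, the functions `build_registration` and `to_json`, the `user_id` field, and the use of JSON are all assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class KeywordRegistration:
    """Hypothetical registration record built from the screen of FIG. 5."""
    user_id: str     # assumed identifier of the registering user
    keyword: str     # contents of keyword input field 222
    album_name: str  # contents of album name input field 223


def build_registration(user_id: str, keyword: str, album_name: str) -> KeywordRegistration:
    """Validate the two input fields and build the record to be sent."""
    keyword = keyword.strip()
    album_name = album_name.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    if not album_name:
        raise ValueError("album name must not be empty")
    return KeywordRegistration(user_id, keyword, album_name)


def to_json(registration: KeywordRegistration) -> str:
    """Serialize the record; JSON is an assumed wire format."""
    return json.dumps(asdict(registration), ensure_ascii=False)
```

For the example in FIG. 5, `build_registration("userA", "Congratulations", "Birthday Party")` would produce the record that the server stores in step 102.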

Returning to FIG. 4, the photographing unit 11 of the camera 10 captures images of a predetermined shooting range while the sound acquisition unit 12 acquires the sound around the camera 10 (step 103). The transmitting unit 13 then transmits the captured images and the acquired sound to the management server 40 as transmission information (step 104). The transmission information may include the time at which the images and sound were acquired, the user ID of user A, and the like.
The transmitting/receiving unit 41 of the management server 40 receives the transmission information and stores it in the storage unit 42 (step 105).
Next, the determination unit 43 analyzes the sound information in the received transmission information and determines whether it contains speech corresponding to the word registered in step 102 (step 106).
If it does not (No in step 106), the process returns to step 103.
If it does (Yes in step 106), the selection unit 44 extracts the images together with the audio corresponding to the word (step 107). It then arranges the extracted images, compiles them into an album, and outputs the album to the storage unit 42 (step 108). The speech may be that of any person. It need not even be a person's; it may be the voice of a robot or a pet.
The storage unit 42 stores the images and audio that have been output as an album (step 109).
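The determination in step 106 can be sketched as follows. The patent does not say how the sound is analyzed; this example assumes that some speech recognizer has already turned the audio into timestamped text segments, and it only shows the keyword lookup over those segments. `TranscriptSegment` and `find_keyword_times` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptSegment:
    """Assumed output of a speech recognizer: one utterance with its timing."""
    start: float  # seconds from the start of the recording
    end: float
    text: str


def find_keyword_times(segments: List[TranscriptSegment], keyword: str) -> List[float]:
    """Return the start time of every segment whose text contains the keyword.

    An empty result corresponds to "No" in step 106; a non-empty result
    corresponds to "Yes" and gives candidate values for the utterance time.
    """
    needle = keyword.casefold()
    return [seg.start for seg in segments if needle in seg.text.casefold()]
```

Plain substring matching is a simplification chosen for this sketch; a real system would likely need to handle recognition errors and word boundaries.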

このとき、選別部４４は、判別された言葉を発したときの画像を含む予め定められた期間の画像を選別する。
図６（ａ）～（ｂ）は、選別部４４が、画像を選択する期間について示した図である。
このうち、図６（ａ）は、選別部４４が、画像を選別する期間の第１の例を示した図である。
ここでは、選別部４４は、判別された言葉を発した人物を特定し、期間として、特定した人物が撮影されている期間の画像を選別する。つまり、選別部４４は、上記音声を発した人物を画像から特定し、この人物が撮影されている期間にて画像および音声を抽出する。図６（ａ）では、音声を発した時間を時間ｔ０で表し、選別部４４が、時間ｔ０を挟み、この期間として、期間ｔａを決定したことを示している。
また、図６（ｂ）は、選別部４４が、画像を選別する期間の第２の例を示した図である。
ここでは、選別部４４は、期間として、判別された言葉を発したときの画像が撮影された時間を基準として予め定められた時間範囲の画像を選別する。つまり、選別部４４は、上記音声を発した時間を時間ｔ０とすると、この時間ｔ０を挟み、この期間として、期間ｔｂを決定したことを示している。期間ｔｂは、時間ｔ０より以前の期間ｔ１と、以後の期間ｔ２とからなる。この期間ｔ１、ｔ２は、予め定められている。なお、期間ｔ１、ｔ２は、管理サーバ４０側で決めてもよく、携帯端末２０ａを所有するユーザが入力し、管理サーバ４０に送信することで決めてもよい。
なお、図６（ａ）～（ｂ）の場合は、選別部４４は、時間ｔ０を挟み期間の決定を行ったが、時間ｔ０を挟まず、時間ｔ０が起点または終点となる期間を設定してもよい。 At this time, the selection unit 44 selects images of a predetermined period including images when the identified words are uttered.
FIGS. 6(a) and 6(b) are diagrams showing the periods for which the selection unit 44 selects images.
Of these, FIG. 6(a) shows a first example of the period for which the selection unit 44 selects images.
Here, the selection unit 44 identifies the person who uttered the identified word and selects, as the period, the images from the period during which that person is being photographed. That is, the selection unit 44 identifies the speaker in the image and extracts the images and audio for the period during which this person is being photographed. In FIG. 6(a), the time at which the voice was uttered is denoted as time t0, and the selection unit 44 has determined a period ta that straddles time t0.
FIG. 6(b) shows a second example of the period for which the selection unit 44 selects images.
Here, the selection unit 44 selects, as the period, images within a predetermined time range referenced to the time at which the image was captured when the identified word was uttered. That is, taking the time at which the voice was uttered as time t0, the selection unit 44 determines a period tb that straddles time t0. The period tb consists of a period t1 before time t0 and a period t2 after time t0. The periods t1 and t2 are predetermined. They may be determined on the management server 40 side, or they may be input by the user who owns the mobile terminal 20a and transmitted to the management server 40.
In FIGS. 6(a) and 6(b), the selection unit 44 determines a period that straddles time t0; however, it may instead set a period that does not straddle time t0 and has time t0 as its start point or end point.
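As a non-limiting illustration, the two ways of determining the period described for FIGS. 6(a) and 6(b) might be sketched as follows. The names `Period`, `fixed_window`, and `person_window` are hypothetical, times are assumed to be seconds from the start of the recording, and the spans during which the speaker is visible are assumed to be supplied by a person-recognition step that is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    start: float  # seconds from the start of the recording (assumed unit)
    end: float


def fixed_window(t0, t1, t2, duration):
    """FIG. 6(b): period tb made of t1 before t0 and t2 after t0, clamped to the recording."""
    return Period(max(0.0, t0 - t1), min(duration, t0 + t2))


def person_window(t0, visible_spans):
    """FIG. 6(a): the span containing t0 in which the speaker is being photographed.

    visible_spans is a list of (start, end) pairs; None is returned when no span contains t0.
    """
    for start, end in visible_spans:
        if start <= t0 <= end:
            return Period(start, end)
    return None
```

Setting `t1` or `t2` to zero in `fixed_window` gives the variant in which time t0 is the start point or end point of the period.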

また、抽出した画像および音声は、他の期間において抽出した画像および音声と結合し、１つのアルバムとする。これは、選別部４４により行うことができる。この場合、選別部４４は、選別した画像および選別した画像に対応する音声を、例えば、時系列順に並べ、アルバム化する。これにより関連性がより高い画像および音声を１つにまとめることができる。ただし、別のアルバムとすることを妨げるものではない。例えば、送信情報が取得された日付が異なる場合は、関連性がより低いと考えられるため、別のアルバムとするようにしてもよい。 Furthermore, the extracted images and audio are combined with images and audio extracted during other periods to form one album. This can be done by the sorting section 44. In this case, the sorting unit 44 arranges the selected images and the sounds corresponding to the selected images, for example, in chronological order, and creates an album. This allows images and sounds that are more closely related to each other to be combined into one. However, this does not preclude making it a separate album. For example, if the transmission information was acquired on different dates, it is considered that the relevance is lower, so the information may be placed in a separate album.
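As a non-limiting illustration, combining extracted clips into albums in chronological order, with clips from different dates placed in separate albums, might be sketched as follows. The `Clip` type and `build_albums` function are hypothetical, and the start time of each clip is assumed to stand in for the date on which the transmission information was acquired.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Clip:
    start: datetime
    end: datetime
    source: str  # hypothetical identifier of the recorded image and audio


def build_albums(clips):
    """Group clips by calendar date and order each album chronologically."""
    by_date = defaultdict(list)
    for clip in clips:
        by_date[clip.start.date()].append(clip)
    return {day: sorted(items, key=lambda c: c.start) for day, items in by_date.items()}
```

Grouping by date is only one possible criterion; the description also allows all extracted clips to be combined into a single album.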

再び図４に戻り、次に、送受信部４１は、ユーザからアルバムの閲覧の要求があったか否かを判断する（ステップ１１０）。このユーザは、図２に示した場合では、携帯端末２０ｂを所有するユーザＢである。この場合、ユーザＢは、例えば、携帯端末２０ｂにて専用アプリを起動する。そして、専用アプリにて、携帯端末２０ｂの表示部２２を見ながら、入力部２３を使用して、アルバムの閲覧の要求を行う。
その結果、アルバムの閲覧の要求がなかった場合（ステップ１１０でＮｏ）、ステップ１０３に戻る。
対して、アルバムの閲覧の要求があった場合（ステップ１１０でＹｅｓ）、送受信部４１は、記憶部４２から対応するアルバムの情報を読み出し、携帯端末２０ｂに送信する（ステップ１１１）。 Returning to FIG. 4 again, next, the transmitter/receiver 41 determines whether or not there is a request from the user to view the album (step 110). In the case shown in FIG. 2, this user is user B who owns the mobile terminal 20b. In this case, user B starts a dedicated application on the mobile terminal 20b, for example. Then, using the dedicated application, the user uses the input unit 23 while looking at the display unit 22 of the mobile terminal 20b to request viewing of the album.
As a result, if there is no request to view the album (No in step 110), the process returns to step 103.
On the other hand, if there is a request to view an album (Yes at step 110), the transmitting/receiving section 41 reads out the information of the corresponding album from the storage section 42 and transmits it to the mobile terminal 20b (step 111).

アルバムの情報は、携帯端末２０ｂの送受信部２１が受け取り、アルバムの内容が、表示部２２に表示される（ステップ１１２）。これにより、ユーザＢは、アルバムを閲覧することができる。 The album information is received by the transmitting/receiving section 21 of the mobile terminal 20b, and the contents of the album are displayed on the display section 22 (step 112). Thereby, user B can view the album.

［第２の実施形態］
次に、記録システム１の第２の実施形態について説明する。第２の実施形態では、ユーザが、言葉として複数の言葉を管理サーバ４０に登録し、管理サーバ４０は、複数の言葉に基づきアルバムを作成する。 [Second embodiment]
Next, a second embodiment of the recording system 1 will be described. In the second embodiment, a user registers a plurality of words as words in the management server 40, and the management server 40 creates an album based on the plurality of words.

第２の実施形態における記録システム１の機能構成例は、図３に示した第１の実施形態と同様である。よって、図３に示した機能構成例を使用して、第２の実施形態における記録システム１の動作について説明を行う。
図７は、第２の実施形態における記録システム１の動作について説明したフローチャートである。
ステップ２０１～ステップ２０５は、図４に示したステップ１０１～ステップ１０５と同様である。ただし、第２の実施形態では、ユーザは、図５に例示した画面から複数回言葉の登録を行う。 The functional configuration example of the recording system 1 in the second embodiment is the same as that in the first embodiment shown in FIG. 3. Therefore, the operation of the recording system 1 in the second embodiment will be explained using the functional configuration example shown in FIG. 3.
FIG. 7 is a flowchart explaining the operation of the recording system 1 in the second embodiment.
Steps 201 to 205 are similar to steps 101 to 105 shown in FIG. 4. However, in the second embodiment, the user registers words multiple times from the screen illustrated in FIG. 5.

図８（ａ）～（ｂ）は、図７のステップ２０５において、記憶部４２で保存するデータのデータ構造について示した図である。
図８（ａ）に図示するように、本データ構造は、「言葉」、「アルバム名」、「グループ」の各データを含む。このうち、「言葉」および「アルバム名」は、図５の例において、ユーザが入力したキーワードおよびアルバム名である。そして、本実施の形態では、ユーザは、複数の言葉を登録するため、言葉およびアルバム名は、複数個となっている。ここでは、「言葉」として、「おめでとう」、「誕生日」、「●●ちゃん」、「あけまして」、および「新年」が、登録され、それぞれの言葉について、「誕生会」および「新年会」それぞれのアルバム名が対応する場合を示している。なお、「●●ちゃん」は、例えば、子供の名前である。 FIGS. 8(a) to 8(b) are diagrams showing the data structure of data stored in the storage unit 42 in step 205 of FIG. 7.
As illustrated in FIG. 8(a), this data structure includes the data items "word", "album name", and "group". Of these, "word" and "album name" are the keyword and album name input by the user in the example of FIG. 5. In this embodiment, since the user registers multiple words, there are multiple words and album names. Here, "Congratulations", "Birthday", "●●-chan", "Happy New Year", and "New Year" are registered as "words", and each word is associated with one of the album names "Birthday party" and "New Year's party". Note that "●●-chan" is, for example, a child's name.

また、「グループ」は、アルバム名に対し、設定される。図８（ａ）の例では、それぞれのアルバム名毎に、グループが１つずつ設定される。ここでは、「誕生会」のアルバム名にグループとして、「１」が設定されている。また、「新年会」のアルバム名にグループとして、「２」が設定されている。グループは、複数の言葉をグループ化したときに付けられる名称であり、このグループが同じであれば、言葉が異なっても同じグループとなる。その結果、言葉が異なっても同じアルバムとなる。図８（ａ）の例では、アルバムは、グループが１の場合と、グループが２の場合とで、２種類作成される。 Furthermore, "group" is set for the album name. In the example of FIG. 8A, one group is set for each album name. Here, "1" is set as a group in the album name of "Birthday Party." Further, "2" is set as a group in the album name of "New Year Party". A group is a name given when a plurality of words are grouped together, and if the groups are the same, they are the same group even if the words are different. The result is the same album even though the words are different. In the example shown in FIG. 8A, two types of albums are created: one for group 1 and one for group 2.

図８（ａ）の例では、それぞれのアルバム名毎に、グループが１つずつ設定されたがこれに限られるものではない。図８（ｂ）の例では、「誕生会」のアルバム名にグループとして、「１」または「２」が設定されている。また、「新年会」のアルバム名にグループとして、「３」が設定されている。この場合、言葉として、「おめでとう」および「誕生日」に対応するグループは、「１」であり、これらの言葉を基に作成されるアルバムは、同じとなる。ただし、言葉として、「●●ちゃん」に対応するグループは、「２」であり、この言葉を基に作成されるアルバムは、別となる。そして、言葉として、「あけまして」および「新年」に対応するグループは、「３」であり、この言葉を基に作成されるアルバムは、さらに別となる。図８（ｂ）の例では、アルバムは、グループが１、２、３のそれぞれの場合について、アルバムが３種類作成される。
グループについては、図５に示したような入力画面において、キーワードおよびアルバム名とともに、ユーザに入力させることで設定することができる。 In the example of FIG. 8(a), one group is set for each album name, but the invention is not limited to this. In the example of FIG. 8(b), "1" or "2" is set as the group for the album name "Birthday party", and "3" is set as the group for the album name "New Year's party". In this case, the group corresponding to the words "Congratulations" and "Birthday" is "1", and these words produce the same album. However, the group corresponding to the word "●●-chan" is "2", and this word produces a separate album. The group corresponding to the words "Happy New Year" and "New Year" is "3", and these words produce yet another album. In the example of FIG. 8(b), three albums are created, one for each of groups 1, 2, and 3.
Groups can be set by having the user input them, together with the keyword and album name, on an input screen such as the one shown in FIG. 5.
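As a non-limiting illustration, the data structure of FIGS. 8(a) and 8(b) might be represented as follows. The `Registration` type and `words_by_group` function are hypothetical names, and the rows are English renderings of the example words and album names given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    word: str
    album_name: str
    group: int


# Rows following FIG. 8(a): one group per album name.
FIG_8A = [
    Registration("Congratulations", "Birthday party", 1),
    Registration("Birthday", "Birthday party", 1),
    Registration("●●-chan", "Birthday party", 1),
    Registration("Happy New Year", "New Year's party", 2),
    Registration("New Year", "New Year's party", 2),
]

# Rows following FIG. 8(b): the child's name is given its own group.
FIG_8B = [
    Registration("Congratulations", "Birthday party", 1),
    Registration("Birthday", "Birthday party", 1),
    Registration("●●-chan", "Birthday party", 2),
    Registration("Happy New Year", "New Year's party", 3),
    Registration("New Year", "New Year's party", 3),
]


def words_by_group(rows):
    """Collect the registered words under their group; one album is created per group."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row.group, []).append(row.word)
    return grouped
```

With these rows, `words_by_group(FIG_8A)` has two keys and `words_by_group(FIG_8B)` has three, matching the two and three albums described for the respective figures.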

図７に戻り、判別部４３は、予め定められた言葉として複数種の言葉を判別する。つまり、登録した複数種の言葉のうち何れかの言葉に対応する音声が含まれているか否かを判別する（ステップ２０６）。
その結果、何れの言葉も含まれていない場合（ステップ２０６でＮｏ）、ステップ２０３に戻る。
対して、複数種の言葉のいずれかが含まれている場合（ステップ２０６でＹｅｓ）、選別部４４は、言葉に対応する音声とともに、画像の抽出を行う（ステップ２０７）。
さらに、選別部４４は、記憶部４２を参照し、含まれる言葉に対応するグループを取得する（ステップ２０８）。
そして、グループ毎にアルバム化を行い、記憶部４２に出力する（ステップ２０９）。即ち、第２の実施形態では、選別部４４は、複数種の言葉に対し設定されたグループ毎に画像を選別および出力する。
以後のステップ２１０～ステップ２１３は、図４のステップ１０９～ステップ１１２と同じである。 Returning to FIG. 7, the discrimination unit 43 discriminates a plurality of types of words as predetermined words. That is, it is determined whether or not a voice corresponding to any one of the registered plural types of words is included (step 206).
As a result, if no words are included (No in step 206), the process returns to step 203.
On the other hand, if any of a plurality of types of words is included (Yes in step 206), the selection unit 44 extracts the image along with the audio corresponding to the word (step 207).
Further, the selection unit 44 refers to the storage unit 42 and obtains a group corresponding to the included word (step 208).
Then, an album is created for each group and output to the storage unit 42 (step 209). That is, in the second embodiment, the sorting unit 44 sorts and outputs images for each group set for a plurality of types of words.
Subsequent steps 210 to 213 are the same as steps 109 to 112 in FIG. 4.
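As a non-limiting illustration, steps 206 to 209 might be sketched as follows. The function names are hypothetical, the table follows the groups of FIG. 8(b), and each clip is assumed to arrive with a text transcript produced by a speech-recognition step that is outside this sketch. Matching by lowercase substring is a simplification chosen for the example.

```python
from collections import defaultdict

# Word-to-group table following FIG. 8(b); English renderings are used for readability.
WORD_TO_GROUP = {
    "congratulations": 1,
    "birthday": 1,
    "●●-chan": 2,
    "happy new year": 3,
    "new year": 3,
}


def groups_in(transcript, word_to_group=WORD_TO_GROUP):
    """Steps 206 and 208: return the groups of all registered words found in the transcript."""
    text = transcript.lower()
    return {group for word, group in word_to_group.items() if word in text}


def build_group_albums(clips, word_to_group=WORD_TO_GROUP):
    """Steps 207 and 209: collect clip identifiers into one album per group.

    clips is an iterable of (clip_id, transcript) pairs.
    """
    albums = defaultdict(list)
    for clip_id, transcript in clips:
        for group in sorted(groups_in(transcript, word_to_group)):
            albums[group].append(clip_id)
    return dict(albums)
```

A clip whose transcript contains no registered word yields an empty set from `groups_in`, which corresponds to the "No" branch of step 206.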

［第３の実施形態］
次に、記録システム１の第３の実施形態について説明する。第３の実施形態では、ユーザではなく、管理サーバ４０が言葉を選択し、この言葉に基づきアルバムを作成する。 [Third embodiment]
Next, a third embodiment of the recording system 1 will be described. In the third embodiment, the management server 40 rather than the user selects a word and creates an album based on this word.

図９は、第３の実施形態における記録システム１の機能構成例を示したブロック図である。
図示する記録システム１の機能構成例は、図３に示した記録システム１の機能構成例と比較して、管理サーバ４０に選択部４５が加わる点で異なり、他は同様である。よって、以下、第１の実施形態と異なる点を中心に説明を行う。 FIG. 9 is a block diagram showing an example of the functional configuration of the recording system 1 in the third embodiment.
The illustrated functional configuration example of the recording system 1 is different from the functional configuration example of the recording system 1 illustrated in FIG. 3 in that a selection unit 45 is added to the management server 40, and the other points are the same. Therefore, the following description will focus on the differences from the first embodiment.

選択部４５は、選択手段の一例であり、予め定められた言葉を、撮影する対象の状態に応じて選択する。撮影する対象の状態とは、カメラ１０で撮影する対象となる撮影シーンとして、どのような状態であるかを意味する。例えば、撮影シーンとしては、誕生会、新年会、クリスマス会、結婚式等のパーティや、会議、集会等のイベントなどが挙げられる。そして、管理サーバ４０の記憶部４２に、この撮影シーンに対応する言葉を予め登録しておき、選択部４５は、撮影シーンに応じ、記憶部４２に記憶されている言葉を読み出し、判別部４３で判別を行う言葉とする。 The selection unit 45 is an example of a selection means, and selects the predetermined words according to the state of the object to be photographed. The state of the object to be photographed means what kind of shooting scene the camera 10 is capturing. Examples of shooting scenes include parties such as birthday parties, New Year's parties, Christmas parties, and weddings, and events such as meetings and gatherings. Words corresponding to each shooting scene are registered in advance in the storage unit 42 of the management server 40, and the selection unit 45 reads out the words stored in the storage unit 42 according to the shooting scene and sets them as the words to be discriminated by the discrimination unit 43.

次に、第３の実施形態における記録システム１の動作について説明を行う。
図１０は、第３の実施形態における記録システム１の動作について説明したフローチャートである。
まず、携帯端末２０ａを所有するユーザが、アルバムを作成するための撮影シーンの入力を行う（ステップ３０１）。この場合、ユーザは、例えば、専用アプリを使用することで撮影シーンの入力を行う。
入力された撮影シーンの情報は、送信情報として、携帯端末２０ａの送受信部２１から、管理サーバ４０に送信される。そして、管理サーバ４０の送受信部４１が、送信情報を受信し、選択部４５が、撮影シーンに対応する言葉を選択する（ステップ３０２）。さらに、記憶部４２は、選択された言葉の情報を記憶する（ステップ３０３）。 Next, the operation of the recording system 1 in the third embodiment will be explained.
FIG. 10 is a flowchart explaining the operation of the recording system 1 in the third embodiment.
First, a user who owns the mobile terminal 20a inputs a shooting scene for creating an album (step 301). In this case, the user inputs the shooting scene by using a dedicated application, for example.
The input shooting scene information is transmitted as transmission information from the transmission/reception unit 21 of the mobile terminal 20a to the management server 40. Then, the transmitting/receiving unit 41 of the management server 40 receives the transmitted information, and the selecting unit 45 selects words corresponding to the photographed scene (step 302). Furthermore, the storage unit 42 stores information on the selected word (step 303).

図１１は、携帯端末２０ａにおいて、撮影シーンの登録を行う画面の例を示した図である。
この場合、表示部２２の上側には、「電子アルバムを作成するときの撮影シーンを選択してください。」のメッセージ２２１が表示されている。これは、ユーザに、撮影シーンの入力を求めるメッセージである。 FIG. 11 is a diagram showing an example of a screen for registering a shooting scene on the mobile terminal 20a.
In this case, a message 221 is displayed on the upper side of the display section 22 that reads, "Please select a shooting scene when creating an electronic album." This is a message requesting the user to input a shooting scene.

そして、このメッセージ２２１の下側に、撮影シーン入力欄２２６が表示され、さらにその下側にアルバム名入力欄２２３が表示される。撮影シーン入力欄２２６は、例えば、プルダウンメニューになっており、表示される種々の撮影シーンの中から、選択できるようになっている。この場合、撮影シーン入力欄２２６では、「パーティ」が選択されたことを示している。さらに、アルバム名入力欄２２３は、作成されるアルバムの名称を入力する欄である。この場合、アルバム名入力欄２２３には、「誕生会」が入力されている。
そして、この画面から表示部２２の最下段に表示されるＯＫボタン２２４を押下すると、入力した内容が、管理サーバ４０に送信される。また、キャンセルボタン２２５を押下すると、例えば、この画面より前に表示されていた画面に戻る。 A shooting scene input field 226 is displayed below this message 221, and an album name input field 223 is further displayed below this. The photographic scene input field 226 is, for example, a pull-down menu, and allows selection from among various displayed photographic scenes. In this case, the shooting scene input field 226 indicates that "party" has been selected. Further, the album name input field 223 is a field for inputting the name of the album to be created. In this case, “Birthday Party” is input in the album name input field 223.
Then, when the OK button 224 displayed at the bottom of the display section 22 is pressed from this screen, the entered contents are transmitted to the management server 40. Furthermore, when the cancel button 225 is pressed, the screen returns to the screen that was displayed before this screen, for example.

図１０に戻り、ステップ３０４～ステップ３１３は、図４に示したステップ１０３～ステップ１１２と同様である。 Returning to FIG. 10, steps 304 to 313 are similar to steps 103 to 112 shown in FIG. 4.

図１２は、図１０のステップ３０６において、記憶部４２で保存するデータのデータ構造について示した図である。
図１２に図示するように、本データ構造は、「撮影シーン」、「アルバム名」、「言葉」、「グループ」の各データを含む。このうち、「撮影シーン」および「アルバム名」は、図１１において、ユーザが入力した撮影シーンおよびアルバム名である。そして、本実施の形態では、撮影シーンに応じて選択部４５が選択した言葉が登録される。ここでは、撮影シーンが、「パーティ」の場合、「言葉」として、「おめでとう」、「誕生日」、「結婚」が、登録されたことを示している。また、撮影シーンが、「新年会」の場合、「言葉」として、「あけまして」、「新年」が、登録されたことを示している。 FIG. 12 is a diagram showing the data structure of the data stored in the storage unit 42 in step 306 of FIG. 10.
As shown in FIG. 12, this data structure includes the data items "shooting scene", "album name", "word", and "group". Of these, "shooting scene" and "album name" are the shooting scene and album name input by the user in FIG. 11. In this embodiment, the words selected by the selection unit 45 according to the shooting scene are registered. Here, when the shooting scene is "Party", "Congratulations", "Birthday", and "Marriage" are registered as "words". When the shooting scene is "New Year's party", "Happy New Year" and "New Year" are registered as "words".
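As a non-limiting illustration, the selection of words by the selection unit 45 according to the shooting scene might be sketched as a table lookup. The names `SCENE_WORDS` and `select_words` are hypothetical, the entries are English renderings of the example of FIG. 12, and returning an empty list for an unregistered scene is an assumption of this sketch.

```python
# Scene-to-word table following the example of FIG. 12.
SCENE_WORDS = {
    "Party": ["Congratulations", "Birthday", "Marriage"],
    "New Year's party": ["Happy New Year", "New Year"],
}


def select_words(scene, table=SCENE_WORDS):
    """Return a copy of the pre-registered words for a scene (empty if the scene is unknown)."""
    return list(table.get(scene, []))
```

The returned words would then be handed to the discrimination step in place of words typed in by the user.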

また、「グループ」は、第２の実施形態の場合と同様である。この例では、それぞれのアルバム名毎に、グループが１つずつ設定される。ここでは、「誕生会」のアルバム名にグループとして、「１」が設定されている。また、「新年会」のアルバム名にグループとして、「２」が設定されている。 Furthermore, the "group" is the same as in the second embodiment. In this example, one group is set for each album name. Here, "1" is set as a group in the album name of "Birthday Party." Further, "2" is set as a group in the album name of "New Year Party".

なお、上述した例では、選択部４５は、ユーザが選んだ撮影シーンに応じて言葉を選択する処理を行ったが、選択部４５は、自装置の周辺の状況を認識し、認識した状況に応じて言葉を選択する処理を行ってもよい。つまり、カメラ１０にて撮影した画像や取得した音声から、撮影する対象の状態を推測し、パーティが行われている、会議が行われているなどの状況を認識する。そして、この状況に応じた言葉を選択し、この言葉を基にしてアルバムを作成する。 In the above example, the selection unit 45 selects words according to the shooting scene chosen by the user; however, the selection unit 45 may instead recognize the situation around its own device and select words according to the recognized situation. That is, the state of the object to be photographed is inferred from the images captured and the audio acquired by the camera 10, and a situation such as a party or a meeting being held is recognized. Words suited to this situation are then selected, and an album is created based on these words.

［第４の実施形態］
次に、記録システム１の第４の実施形態について説明する。第４の実施形態では、カメラ１０および管理サーバ４０の双方の機能を、住居等に置かれたロボットを使用することで行う。 [Fourth embodiment]
Next, a fourth embodiment of the recording system 1 will be described. In the fourth embodiment, the functions of both the camera 10 and the management server 40 are performed by using a robot placed in a residence or the like.

図１３は、第３の実施形態で使用するロボット１００について示した図である。また、図１４は、ロボット１００を用いた場合の記録システム１の機能構成例を示したブロック図である。
図１３に示したロボット１００は、歩行等を行うことで移動する機能を有する移動式としてもよいが、移動しない非移動式としてもよい。
ロボット１００は、図３に示したカメラ１０と同様の機能を有する。つまり、撮影部１１に対応する撮影部１１０と、音取得部１２に対応するマイクロフォン１２０とを備える。なお、送信部１３は、送受信部４１と統合している。
さらに、ロボット１００は、送受信部４１に対応する通信アンテナ１３０と、記憶部４２、判別部４３および選別部４４に対応する制御部１４０とを有する。
このロボット１００により、第１の実施形態と同様のことを行うことができる。 FIG. 13 is a diagram showing the robot 100 used in the fourth embodiment. FIG. 14 is a block diagram showing an example of the functional configuration of the recording system 1 when the robot 100 is used.
The robot 100 shown in FIG. 13 may be a mobile type that has the function of moving by walking or the like, or may be a non-mobile type that does not move.
The robot 100 has the same functions as the camera 10 shown in FIG. 3. That is, it includes a photographing section 110 corresponding to the photographing section 11 and a microphone 120 corresponding to the sound acquisition section 12. Note that the transmitter 13 is integrated with the transmitter/receiver 41.
Further, the robot 100 includes a communication antenna 130 corresponding to the transmitting/receiving section 41 and a control section 140 corresponding to the storage section 42, the determining section 43, and the sorting section 44.
This robot 100 can do the same things as in the first embodiment.

図１５は、第４の実施形態における記録システム１の動作について説明したフローチャートである。
ステップ４０１～ステップ４０３は、図４のステップ１０１～ステップ１０３と同様である。
以後、第４の実施形態では、画像や音の情報を管理サーバ４０に送信する必要がないため、ステップ４０３において取得した画像および音を記憶部４２に記憶する（ステップ４０４）。
そして、ステップ４０５～ステップ４１１は、図４のステップ１０６～ステップ１１２と同様である。 FIG. 15 is a flowchart explaining the operation of the recording system 1 in the fourth embodiment.
Steps 401 to 403 are similar to steps 101 to 103 in FIG. 4.
Thereafter, in the fourth embodiment, since there is no need to transmit image and sound information to the management server 40, the image and sound acquired in step 403 are stored in the storage unit 42 (step 404).
Steps 405 to 411 are the same as steps 106 to 112 in FIG. 4.

以上詳述した形態によれば、カメラ１０により取得した音の中から予め登録された言葉を判別し、この言葉が存在するときの画像および音声を抽出して、アルバムを作成する。よって、カメラ１０が取得した画像および音声を基にしてアルバムを作成することが、いわば自動的にできる。よって、ユーザが多くの時間と労力を必要とせずに、ユーザに対する負担を軽減して、カメラ１０が撮影した画像や取得した音声のアルバム化を図ることができる。 According to the embodiment described in detail above, a pre-registered word is identified from among the sounds acquired by the camera 10, images and sounds when the word is present are extracted, and an album is created. Therefore, it is possible to automatically create an album based on the images and sounds acquired by the camera 10. Therefore, it is possible to create an album of images taken by the camera 10 and sounds acquired, without requiring much time and effort on the part of the user, and with a reduced burden on the user.

また、以上詳述した第１の実施形態～第３の実施形態では、記録システム１は、カメラ１０と、携帯端末２０ａ、２０ｂと、管理サーバ４０とが、ネットワーク７０、ネットワーク８０、アクセスポイント９０を介して接続されることにより構成されていたが、管理サーバ４０だけでも記録システムであるとして扱うことができる。
また、第４の実施形態では、記録システム１は、ロボット１００と、携帯端末２０ａ、２０ｂとが、ネットワーク７０、ネットワーク８０、アクセスポイント９０を介して接続されることにより構成されていたが、ロボット１００だけでも記録システムであるとして扱うことができる。 Further, in the first to third embodiments described in detail above, the recording system 1 is configured by connecting the camera 10, the mobile terminals 20a and 20b, and the management server 40 via the network 70, the network 80, and the access point 90. However, the management server 40 alone can also be treated as a recording system.
Further, in the fourth embodiment, the recording system 1 is configured by connecting the robot 100 and the mobile terminals 20a and 20b via the network 70, the network 80, and the access point 90. However, the robot 100 alone can also be treated as a recording system.

また、以上詳述した例において、カメラ１０を複数台用いる場合は、何れのカメラ１０の画像や音声を選択するかを決定するようにしてもよい。選択を行う方法としては、例えば、音声を発した人物の顔が撮影されている画像を選択する、音声が最も大きく取得されている画像を選択する、などが考えられる。
さらに、以上詳述した例では、アルバム化を図る画像は、動画である場合について説明したが、静止画であってもよい。なお、静止画の場合、アルバムの中に音声は入れなくてもよい。 Furthermore, in the example detailed above, when a plurality of cameras 10 are used, it may be determined which camera 10's image or audio is to be selected. Possible selection methods include, for example, selecting an image in which the face of the person who made the sound is photographed, or selecting an image in which the loudest sound is captured.
Further, in the example detailed above, the images to be put into an album are moving images, but they may be still images. Note that in the case of still images, it is not necessary to include audio in the album.
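As a non-limiting illustration, choosing among a plurality of cameras 10 might be sketched as follows. The `CameraClip` type and `choose_camera` function are hypothetical. The description lists the two criteria as alternatives; combining them so that a visible speaker's face takes priority and loudness breaks ties is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraClip:
    camera_id: str
    speaker_face_visible: bool  # assumed output of a face-recognition step
    loudness_db: float          # assumed measured level of the captured voice


def choose_camera(clips):
    """Prefer a clip showing the speaker's face; among equals, prefer the loudest voice."""
    return max(clips, key=lambda c: (c.speaker_face_visible, c.loudness_db))
```

Because `True` compares greater than `False`, a clip showing the speaker's face is chosen over a louder clip that does not show it.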

＜プログラムの説明＞
ここで、以上説明を行った本実施の形態における管理サーバ４０やロボット１００が行う処理は、例えば、アプリケーションソフトウェア等のプログラムとして用意される。そして、この処理は、ソフトウェアとハードウェア資源とが協働することにより実現される。即ち、管理サーバ４０やロボット１００に設けられたコンピュータ内部の図示しないＣＰＵが、上述した各機能を実現するプログラムを実行し、これらの各機能を実現させる。 <Program description>
Here, the processes performed by the management server 40 and the robot 100 in this embodiment described above are prepared, for example, as a program such as application software. This processing is realized by cooperation between software and hardware resources. That is, a CPU (not shown) inside a computer provided in the management server 40 or the robot 100 executes a program for realizing each of the above-described functions, thereby realizing each of these functions.

よって、本実施の形態で、管理サーバ４０やロボット１００が行う処理は、コンピュータに、人物が撮影されている画像と、画像に関連した音声と、を取得し、記憶する記憶機能と、記憶機能に記憶された音声から、予め定められた言葉を判別する判別機能と、予め定められた言葉に対応する画像を選別し、選別した画像を出力する出力機能と、を実現させるためのプログラムとして捉えることもできる。 Therefore, the processing performed by the management server 40 and the robot 100 in this embodiment can also be regarded as a program for causing a computer to realize: a storage function of acquiring and storing an image in which a person is photographed and audio related to the image; a discrimination function of identifying a predetermined word from the audio stored by the storage function; and an output function of selecting images corresponding to the predetermined word and outputting the selected images.
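As a non-limiting illustration, the storage function, discrimination function, and output function might be arranged as follows. The `Recorder` class and its method names are hypothetical, and the audio is assumed to have already been converted to a text transcript by a speech-recognition step that is outside this sketch.

```python
class Recorder:
    """Minimal sketch of the three functions; not the actual implementation."""

    def __init__(self, words):
        self.words = list(words)  # predetermined words
        self.records = []         # stored (image_id, transcript) pairs

    def store(self, image_id, transcript):
        """Storage function: keep an image identifier with its related audio transcript."""
        self.records.append((image_id, transcript))

    def discriminate(self):
        """Discrimination function: indices of records whose transcript contains a word."""
        return [
            index
            for index, (_, transcript) in enumerate(self.records)
            if any(word in transcript for word in self.words)
        ]

    def output(self):
        """Output function: identifiers of the images selected by the discrimination."""
        return [self.records[index][0] for index in self.discriminate()]
```

For example, after storing one record whose transcript contains a registered word and one that does not, `output()` returns only the identifier of the first record.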

なお、本実施の形態を実現するプログラムは、通信手段により提供することはもちろんＣＤ－ＲＯＭ等の記録媒体に格納して提供することも可能である。 It should be noted that the program for implementing the present embodiment can of course be provided by communication means, and can also be provided by being stored in a recording medium such as a CD-ROM.

以上、本実施の形態について説明したが、本発明の技術的範囲は上記実施の形態に記載の範囲には限定されない。上記実施の形態に、種々の変更または改良を加えたものも、本発明の技術的範囲に含まれることは、特許請求の範囲の記載から明らかである。 Although the present embodiment has been described above, the technical scope of the present invention is not limited to the scope described in the above embodiment. It is clear from the claims that various changes or improvements made to the above embodiments are also included within the technical scope of the present invention.

１…記録システム、１０…カメラ、２０、２０ａ、２０ｂ…携帯端末、４０…管理サーバ、４１…送受信部、４２…記憶部、４３…判別部、４４…選別部、４５…選択部 DESCRIPTION OF SYMBOLS 1... Recording system, 10... Camera, 20, 20a, 20b... Mobile terminal, 40... Management server, 41... Transmission/reception part, 42... Storage part, 43... Discrimination part, 44... Selection part, 45... Selection part

Claims

1. A recording system comprising:
storage means for acquiring and storing an image in which a person is photographed and audio related to the image;
selection means for receiving input of a shooting scene from a user and selecting words that are registered in advance in association with the received shooting scene;
discrimination means for identifying the selected words from the audio stored in the storage means; and
output means for selecting images corresponding to the selected words and outputting the selected images,
wherein the output means selects images of a predetermined period including the image captured when the identified word was uttered, identifies the person who uttered the identified word, and selects, as the period, images of the period during which the identified person is photographed.

2. The recording system according to claim 1, wherein the output means arranges the selected images and the audio corresponding to the selected images in chronological order and creates an album.

3. A program for causing a computer to realize:
a storage function of acquiring and storing an image in which a person is photographed and audio related to the image;
a selection function of receiving input of a shooting scene from a user and selecting words that are registered in advance in association with the received shooting scene;
a discrimination function of identifying the selected words from the audio stored by the storage function; and
an output function of selecting images corresponding to the selected words and outputting the selected images,
wherein the output function selects images of a predetermined period including the image captured when the identified word was uttered, identifies the person who uttered the identified word, and selects, as the period, images of the period during which the identified person is photographed.