JP7801976B2

JP7801976B2 - Subject imaging device, subject imaging method, and program

Info

Publication number: JP7801976B2
Application number: JP2022141651A
Authority: JP
Inventors: 奈翁美笹谷
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-09-06
Filing date: 2022-09-06
Publication date: 2026-01-19
Anticipated expiration: 2042-09-06
Also published as: JP2024037028A

Description

The present disclosure relates to a subject imaging device, a subject imaging method, and a program.

When photographing a subject such as an animal, there is a need to attract the animal's attention and photograph it in a good state. However, because animals move freely regardless of human intent, it has been difficult to keep an animal in a state suitable for photography. Even when the subject is a human, some subjects, such as infants, have been difficult to keep in a state suitable for photography.

For example, Patent Document 1 below discloses a photography support device that assists a photographer in photographing living organisms. The device includes: an organism-related information database that stores information about organisms, including their habitat locations; a location information acquisition means that acquires the location of the photographer; an organism-related information extraction unit that extracts information about organisms living around the photographer's current location, as indicated by the acquired location information; and an organism-related information transmission means that transmits the extracted information.

Patent Document 1: JP 2012-199885 A

However, although the photography support device described in Patent Document 1 can identify where organisms live, it cannot put the subject in a state suitable for photography.

In view of the above problem, the present disclosure aims to provide a subject imaging device, a subject imaging method, and a program that can put a subject such as a person or an animal in a state suitable for photography and then capture it.

To solve the above problem and achieve this aim, the subject imaging device according to the present disclosure comprises: an imaging unit that captures video of a subject, including people and animals; a discrimination unit that discriminates the type of the subject based on the video captured by the imaging unit; a generation unit that generates sound information prompting a reaction from the subject, varying the sound according to the type of subject discriminated by the discrimination unit; an output unit that outputs the sound information generated by the generation unit; an acquisition unit that acquires video of the subject at the time the sound information is output; a determination unit that determines the subject's reaction to the output sound information based on the video of the subject at the time the sound information is output; and a learning unit that learns the relationship between the features of the output sound information and the subject's reaction. The generation unit generates the sound information by varying the sound based on the relationship between the features of the output sound information and the subject's reaction.

According to one aspect of the embodiment, it is possible to provide a subject imaging device, a subject imaging method, and a program that can put a subject such as a person or an animal in a state suitable for photography and then capture it.

FIG. 1 is a diagram illustrating an example of information processing according to the embodiment.
FIG. 2 is a diagram illustrating an example of the configuration of the subject imaging device according to the embodiment.
FIG. 3 is a flowchart showing the flow of information processing according to the embodiment.
FIG. 4 is a hardware configuration diagram showing an example of a computer that realizes the functions of the subject imaging device.

Modes for carrying out the subject imaging device, subject imaging method, and program according to the present application (hereinafter referred to as "embodiments") are described in detail below with reference to the drawings. Note that the subject imaging device, subject imaging method, and program according to the present application are not limited by these embodiments.

(Embodiment)
1. Information Processing According to the Embodiment
1-1. Example of Information Processing According to the Embodiment
First, an example of information processing according to the embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of information processing according to the embodiment. An overview of the processing is given first, followed by a detailed description of each step.

FIG. 1 shows a process in which the subject imaging device 10 discriminates the type of an animal A1 serving as the subject, generates sound information suited to the discriminated type, emits the generated sound toward the animal A1, and captures the animal A1 as it reacts to the sound. Each step of this example is described in detail below with reference to FIG. 1.

First, the subject imaging device 10 acquires imaging data including video of the subject (step S1). For example, the subject imaging device 10 acquires imaging data including video of a subject, such as a person or an animal, captured by the imaging unit 13 (described later) by reading it from the storage unit 12 (described later).

Next, the subject imaging device 10 discriminates the type of the subject based on the imaging data (step S2). For example, the subject imaging device 10 performs object detection processing, that is, it detects objects in the imaging data and classifies the detected objects, and thereby discriminates the type of the subject.

Next, the subject imaging device 10 generates sound information that prompts a reaction from the subject, varying the sound according to the type of subject (step S3). For example, if the subject is a child, the subject imaging device 10 generates sound information that can be expressed with onomatopoeia, such as the "pii-pii" or "kyuu-kyuu" squeak of a toy, or sound information including the sounds of trains and cars or the voices of anime characters. If the subject is a dog or a cat, the subject imaging device 10 generates sound information such as the rustling ("kasa-kasa") of a toy or plastic bag, or sound information including the owner's voice, birdsong, insect sounds, animal calls, the sound of a can being opened, the sound of a bag being opened, and the sound of objects rubbing together.

Next, the subject imaging device 10 outputs the generated sound information (step S4). For example, the subject imaging device 10 outputs the generated sound information as audio from a speaker toward the subject.

Next, the subject imaging device 10 acquires video of the subject at the time the sound information is output (step S5). For example, the subject imaging device 10 outputs the sound information toward the subject, captures video of the subject as it reacts to the sound, and acquires the captured video.

In this way, emitting a sound puts the subject in a state suitable for photography, and the subject can be captured in that state. It is therefore possible to provide a subject imaging device 10 that can put a subject such as a person or an animal in a state suitable for photography and then capture it.

1-2. Other Examples of Information Processing According to the Embodiment
In addition to the type of subject, the subject imaging device 10 also varies the sound according to context, including the subject's mood, the time, the place, the surrounding conditions, and the presence or absence of noise, when generating sound information.

This processing is described step by step. First, the subject imaging device 10 executes steps S1 and S2 shown in FIG. 1. These steps are the same as described above, so their description is omitted.

Next, the subject imaging device 10 generates sound information that prompts a reaction from the subject, varying the sound according to the type of subject (step S3). Here, in addition to the type of subject, the subject imaging device 10 also varies the sound according to context, including the subject's mood, the time, the place, the surrounding conditions, and the presence or absence of noise. The "variation" of a sound refers to the content of the sound, including its volume, tone, length, and type. For example, if the place is a "park" and the surrounding condition is "children are playing," the subject imaging device 10 generates sound information with a medium volume and a bright tone so as not to spoil the cheerful atmosphere.

Next, the subject imaging device 10 executes steps S4 and S5 shown in FIG. 1. These steps are the same as described above, so their description is omitted.

In this way, the output sound can be changed according to context, such as the surrounding conditions. The device can therefore take the surroundings into account when emitting a sound to put the subject in a state suitable for photography, and capture the subject in that state.
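The context-dependent adjustment of step S3 can be pictured with a small sketch. This is an illustration under stated assumptions, not the disclosed implementation: the `SoundVariation` fields mirror the volume, tone, length, and type named in the text, but the numeric volume scale, the function name `adjust_for_context`, and the rule that raises volume in noisy surroundings are all hypothetical. Only the park example comes from the description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SoundVariation:
    """Hypothetical container for the sound 'variation' named in the text."""
    kind: str          # type of sound, e.g. "toy squeak"
    volume: float      # assumed scale: 0.0 (silent) to 1.0 (maximum)
    tone: str          # e.g. "bright", "neutral"
    duration_s: float  # length of the sound in seconds


def adjust_for_context(base: SoundVariation, place: str,
                       surroundings: str, noisy: bool = False) -> SoundVariation:
    """Return a copy of `base` adapted to the shooting context."""
    # Example from the text: in a park where children are playing,
    # use a medium volume and a bright tone.
    if place == "park" and surroundings == "children playing":
        return replace(base, volume=0.5, tone="bright")
    # Assumed rule (not in the text): raise the volume a little over noise.
    if noisy:
        return replace(base, volume=min(1.0, base.volume + 0.2))
    return base
```

A frozen dataclass is used so that each adjustment yields a new value and the original variation remains available for later comparison.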

2. Configuration of the Subject Imaging Device
Next, the configuration of the subject imaging device according to the embodiment is described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the configuration of the subject imaging device according to the embodiment. As shown in FIG. 2, the subject imaging device 10 includes a communication unit 11, a storage unit 12, an imaging unit 13, a sound output unit 14, and a control unit 15.

(Regarding the communication unit 11)
The communication unit 11 connects the inside and outside of the subject imaging device 10 so that they can communicate with each other, and sends and receives information between them. The communication unit 11 may be realized by, for example, a NIC (Network Interface Card), a wireless LAN (Local Area Network) card, a Bluetooth (registered trademark) module, a Wi-Fi (registered trademark) module, an antenna, or the like.

(Regarding the storage unit 12)
The storage unit 12 is a storage device that stores various types of information. The storage unit 12 includes a main storage device and an auxiliary storage device. The main storage device may be realized by a semiconductor memory element such as RAM (Random Access Memory), ROM (Read Only Memory), or flash memory. The auxiliary storage device may be realized by, for example, a hard disk, an SSD (Solid State Drive), an optical disk, or the like.

(Regarding the imaging unit 13)
The imaging unit 13 captures video of a subject such as a person or an animal. The imaging unit 13 is, for example, a camera, and may be a combination of multiple cameras or an omnidirectional camera that captures a full 360 degrees. After capturing video of a subject such as a person or an animal, the imaging unit 13 stores the imaging data in the storage unit 12.

A camera includes an optical element and an image sensor. The optical element is a component of the optical system, such as a lens, mirror, prism, or filter. The image sensor converts light entering through the optical element into an image signal, which is an electrical signal. The image sensor is, for example, a CCD (Charge Coupled Device) sensor or a CMOS (Complementary Metal Oxide Semiconductor) sensor.

(Regarding the sound output unit 14)
The sound output unit 14 outputs sound information toward a subject such as a person or an animal. The sound output unit 14 may be realized by, for example, a speaker. A speaker converts an electrical signal into sound by means of a diaphragm. That is, based on a control command given as an electrical signal, the sound output unit 14 vibrates the diaphragm at a predetermined amplitude and frequency, which vibrates the air in contact with the diaphragm and outputs sound.
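To make "a predetermined amplitude and frequency" concrete, the sketch below computes the sample values of a pure tone that could drive such a diaphragm. It is only an illustration: the function name `sine_samples`, the 8000 Hz default sample rate, and the choice of a single sine wave are assumptions, and the disclosure does not specify how the drive signal is produced.

```python
import math


def sine_samples(amplitude: float, frequency_hz: float,
                 duration_s: float, sample_rate: int = 8000) -> list[float]:
    """Return samples of amplitude * sin(2*pi*f*t) at the given sample rate."""
    n = round(duration_s * sample_rate)  # number of samples to generate
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        for i in range(n)
    ]
```

Doubling `amplitude` makes the tone louder without changing its pitch, while doubling `frequency_hz` raises the pitch by one octave; these correspond to the volume and tone aspects of the sound variation described earlier.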

(Regarding the control unit 15)
The control unit 15 is realized by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs stored in the storage device of the subject imaging device 10, using RAM as a work area. The control unit 15 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 15 has an acquisition unit 151, a discrimination unit 152, a determination unit 153, a generation unit 154, a learning unit 155, and an output unit 156. The control unit 15 realizes these units and executes their processing by reading programs from the storage unit 12 and executing them. The control unit 15 may execute this processing on a single CPU, or may include multiple CPUs and execute the processing in parallel. Each unit is described in turn below.

(Regarding the acquisition unit 151)
The acquisition unit 151 acquires imaging data including video of a subject such as a person or an animal. That is, the acquisition unit 151 reads from the storage unit 12 the imaging data, captured by the imaging unit 13, that includes video of the subject.

The acquisition unit 151 also acquires video of the subject at the time the sound information is output. For example, the acquisition unit 151 reads from the storage unit 12 the video of the subject that the imaging unit 13 captured when the sound information was output.

(Regarding the discrimination unit 152)
The discrimination unit 152 discriminates the type of animal based on the imaging data from the imaging unit 13. The discrimination unit 152 may perform object detection processing, that is, detect objects in the imaging data and classify the detected objects, and thereby discriminate the type of animal from the imaging data. The discrimination unit 152 may be realized by using, for example, semantic segmentation, which is a technique that classifies the pixels of the imaging data into object classes.

(Regarding the determination unit 153)
The determination unit 153 determines the subject's reaction to the output sound information based on the video of the subject at the time the sound information is output. For example, the determination unit 153 may determine the reaction by inputting the video of the subject into a trained model that has learned the relationship between video of a subject and a label indicating the subject's reaction. A CNN (Convolutional Neural Network) or the like may be used as the model in this case. The labels indicating the subject's reaction may be set arbitrarily; examples include "good," "normal," and "no reaction."

(Regarding the generation unit 154)
The generation unit 154 generates sound information that prompts a reaction from the subject, varying the sound according to the type of subject discriminated by the discrimination unit 152. For example, if the subject is a child, the generation unit 154 generates sound information that can be expressed with onomatopoeia, such as the "pii-pii" or "kyuu-kyuu" squeak of a toy, or sound information including the sounds of trains and cars or the voices of anime characters. If the subject is a dog or a cat, the generation unit 154 generates sound information such as the rustling ("kasa-kasa") of a toy or plastic bag, or sound information including the owner's voice, birdsong, insect sounds, animal calls, the sound of a can being opened, the sound of a bag being opened, and the sound of objects rubbing together.
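One simple way to picture the type-dependent choice of sound is a lookup table keyed by subject type. The sketch below is an assumption made for illustration: the names `SOUND_CATALOG` and `candidate_sounds`, the type labels, and the empty fallback for unknown types are hypothetical, and the entries merely paraphrase the examples given in the text.

```python
# Sounds the text lists for dogs and cats (paraphrased).
_PET_SOUNDS = [
    "toy squeak", "plastic bag rustle", "owner's voice", "birdsong",
    "insect sounds", "animal calls", "can opening", "bag opening",
    "objects rubbing",
]

# Hypothetical catalog: subject type -> sounds likely to draw a reaction.
SOUND_CATALOG = {
    "child": ["toy squeak", "train", "car", "anime character voice"],
    "dog": _PET_SOUNDS,
    "cat": _PET_SOUNDS,
}


def candidate_sounds(subject_type: str) -> list[str]:
    """Return a copy of the candidate sounds for a type (empty if unknown)."""
    return list(SOUND_CATALOG.get(subject_type, []))
```

For instance, `candidate_sounds("child")` yields the four child-oriented sounds, while an unlisted type such as `"fish"` yields an empty list, leaving the caller to decide on a default.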

In addition to the type of subject, the generation unit 154 also varies the sound according to context, including the subject's mood, the time, the place, the surrounding conditions, and the presence or absence of noise, when generating sound information. The "variation" of a sound refers to the content of the sound, including its volume, tone, length, and type. For example, if the place is a "park" and the surrounding condition is "children are playing," the generation unit 154 generates sound information with a medium volume and a bright tone so as not to spoil the cheerful atmosphere. The generation unit 154 may also use a trained model trained by the learning unit 155 (described later) to vary the sound so that the subject's reaction becomes "good."

(Regarding the learning unit 155)
The learning unit 155 trains a model on the relationship between the features of the output sound information and the subject's reaction. That is, the learning unit 155 trains the model using the output sound information data and the labels of the subject's reaction as training data. A deep neural network or the like may be used as the model in this case. Once the model has learned the relationship between the sound information data and the subject's reaction, the learning unit 155 stores the trained model in the storage unit 12.
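The feedback loop between sound features and reactions can be illustrated without a neural network. The sketch below deliberately swaps in a much simpler technique, a running average of reaction scores per (subject type, sound) pair, in place of the deep neural network mentioned in the text. The class name `ReactionLearner`, the numeric scores assigned to the labels, and the neutral default of 0.5 for untried sounds are all assumptions.

```python
from collections import defaultdict

# Assumed numeric scores for the example labels given in the text.
REACTION_SCORE = {"good": 1.0, "normal": 0.5, "no reaction": 0.0}


class ReactionLearner:
    """Tracks the mean reaction score per (subject type, sound) pair."""

    def __init__(self) -> None:
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, subject_type: str, sound: str, reaction: str) -> None:
        """Store one observed reaction label for a sound that was played."""
        key = (subject_type, sound)
        self._total[key] += REACTION_SCORE[reaction]
        self._count[key] += 1

    def mean_score(self, subject_type: str, sound: str) -> float:
        """Mean score so far; untried sounds get a neutral 0.5 (assumed)."""
        key = (subject_type, sound)
        count = self._count.get(key, 0)
        return self._total.get(key, 0.0) / count if count else 0.5

    def best_sound(self, subject_type: str, candidates: list[str]) -> str:
        """Pick the candidate with the highest mean score (first wins ties)."""
        return max(candidates, key=lambda s: self.mean_score(subject_type, s))
```

A generation step could call `best_sound` to favor sounds that previously drew a "good" reaction, which is the role the text assigns to the trained model.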

(Regarding the output unit 156)
The output unit 156 outputs the sound information generated by the generation unit 154. That is, the output unit 156 outputs the sound information by giving the sound output unit 14 a control command to output the sound information generated by the generation unit 154. The output unit 156 is not limited to commanding the sound output unit 14; it may also, via the communication unit 11, give a control command to an external device capable of outputting sound, causing that device to output the sound information.

3. Information Processing Flow
Next, the procedure of information processing by the subject imaging device 10 according to the embodiment is described with reference to FIG. 3. FIG. 3 is a flowchart illustrating an example of information processing according to the embodiment. First, the subject imaging device 10 captures video of a subject, including a person or an animal (step S101). The subject imaging device 10 then discriminates the type of the subject based on the captured video (step S102). Next, it generates sound information that prompts a reaction from the subject, varying the sound according to the discriminated type (step S103), and outputs the generated sound information (step S104). It then acquires video of the subject at the time the sound information is output (step S105) and, based on that video, determines the subject's reaction to the output sound information (step S106). Finally, the subject imaging device 10 learns the relationship between the features of the output sound information and the subject's reaction (step S107).
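Steps S101 to S107 can be read as one pass of a loop. The sketch below wires the steps together with caller-supplied functions; it is a structural illustration only. The name `run_cycle` and the six callable parameters are hypothetical stand-ins for the imaging, discrimination, generation, output, determination, and learning units, whose internals are not shown.

```python
from typing import Any, Callable


def run_cycle(
    capture: Callable[[], Any],
    classify: Callable[[Any], str],
    generate: Callable[[str], str],
    play: Callable[[str], None],
    judge: Callable[[Any], str],
    learn: Callable[[str, str, str], None],
) -> str:
    """Run one pass of the flow and return the reaction label."""
    frame = capture()                     # S101: capture video of the subject
    subject_type = classify(frame)        # S102: discriminate the subject type
    sound = generate(subject_type)        # S103: generate sound for that type
    play(sound)                           # S104: output the sound
    reaction_frame = capture()            # S105: video at the time of output
    reaction = judge(reaction_frame)      # S106: determine the reaction
    learn(subject_type, sound, reaction)  # S107: learn sound/reaction relation
    return reaction
```

Passing the units in as functions keeps the sketch independent of any particular camera, model, or speaker, so each can be replaced by a stub when trying the flow out.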

4. Hardware Configuration
The subject imaging device 10 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 4. FIG. 4 is a hardware configuration diagram showing an example of a computer that realizes the functions of the subject imaging device. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic unit 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected by a bus 1090.

The arithmetic unit 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as RAM, that temporarily stores data used by the arithmetic unit 1030 for various calculations. The secondary storage device 1050 is a storage device that stores data used by the arithmetic unit 1030 for various calculations as well as various databases, and is realized by ROM (Read Only Memory), an HDD (Hard Disk Drive), flash memory, or the like.

The output IF 1060 is an interface for sending information to be output to the output device 1010, such as a monitor or printer, that outputs various types of information. It is realized by, for example, a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020, such as a mouse, keyboard, or scanner, and is realized by, for example, USB.

The input device 1020 may be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), DVD (Digital Versatile Disc), or PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory. The input device 1020 may also be an external storage medium such as a USB memory.

The network IF 1080 receives data from other devices via a network N and sends it to the arithmetic unit 1030, and also sends data generated by the arithmetic unit 1030 to other devices via the network N.

The arithmetic unit 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic unit 1030 loads a program from the input device 1020 or the secondary storage device 1050 into the primary storage device 1040 and executes the loaded program.

For example, when the computer 1000 functions as the subject imaging device 10, the arithmetic unit 1030 of the computer 1000 realizes the functions of the control unit 15 of the subject imaging device 10 by executing a program loaded into the primary storage device 1040.

5. Configuration and Effects
The subject imaging device according to the present disclosure includes: an imaging unit that captures video of a subject, including people and animals; a discrimination unit that discriminates the type of the subject based on the video captured by the imaging unit; a generation unit that generates sound information prompting a reaction from the subject, varying the sound according to the type of subject discriminated by the discrimination unit; an output unit that outputs the sound information generated by the generation unit; an acquisition unit that acquires video of the subject at the time the sound information is output; a determination unit that determines the subject's reaction to the output sound information based on the video of the subject at the time the sound information is output; and a learning unit that learns the relationship between the features of the output sound information and the subject's reaction. The generation unit generates the sound information by varying the sound based on the relationship between the features of the output sound information and the subject's reaction.

With this configuration, emitting a sound brings the subject into a state suitable for photography, and the subject can then be photographed in that state. It is therefore possible to provide a subject imaging device 10 that can bring subjects such as people and animals into a state suitable for photography and photograph them.
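The interplay between the generation unit and the learning unit described above can be illustrated with a minimal Python sketch. The disclosure does not specify a learning algorithm, so the epsilon-greedy selection used here is an assumption, and the class name `SoundVariationLearner`, its methods, and the variation labels are all hypothetical names introduced only for illustration.

```python
import random
from collections import defaultdict


class SoundVariationLearner:
    """Hypothetical stand-in for the generation and learning units.

    Tracks, per subject type, how often each sound variation drew a reaction.
    The epsilon-greedy rule is an assumption, not the disclosed method.
    """

    def __init__(self, variations, epsilon=0.1, seed=None):
        self.variations = list(variations)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (subject_type, variation) -> [reaction_count, trial_count]
        self.stats = defaultdict(lambda: [0, 0])

    def success_rate(self, subject_type, variation):
        reacted, trials = self.stats[(subject_type, variation)]
        return reacted / trials if trials else 0.0

    def generate(self, subject_type):
        """Pick a sound variation for the discriminated subject type."""
        if self.rng.random() < self.epsilon:
            # Occasionally try a random variation to keep exploring.
            return self.rng.choice(self.variations)
        # Otherwise reuse the variation with the best observed reaction rate.
        return max(self.variations,
                   key=lambda v: self.success_rate(subject_type, v))

    def learn(self, subject_type, variation, reacted):
        """Record whether the subject reacted to the output variation."""
        entry = self.stats[(subject_type, variation)]
        entry[0] += int(reacted)
        entry[1] += 1


learner = SoundVariationLearner(["whistle", "squeak", "bell"], epsilon=0.0)
learner.learn("dog", "squeak", reacted=True)
learner.learn("dog", "whistle", reacted=False)
choice = learner.generate("dog")
```

In this sketch the recorded reactions feed back into the next call to `generate`, which mirrors the stated relationship between the characteristics of the output sound information and the subject's reaction. A real device would derive `reacted` from the determination unit's analysis of the video.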

In addition to the type of subject, the generation unit of the subject imaging device according to the present disclosure generates the sound information by changing the sound variation according to context, including the subject's mood, the time, the place, the surrounding circumstances, and the presence or absence of noise.

With this configuration, the output sound can be changed according to context such as the surrounding circumstances. It is therefore possible, while taking the surroundings into consideration, to emit a sound that brings the subject into a state suitable for photography and to photograph the subject in that state.
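As a rough illustration of context-dependent generation, the following Python sketch adjusts the output volume from context fields. The disclosure lists the context items (mood, time, place, surrounding circumstances, noise) but not how they influence the sound, so every rule, threshold, field name, and label below is a hypothetical assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    mood: str    # e.g. "calm" or "excited" (labels are assumptions)
    hour: int    # local time of day, 0-23
    place: str   # e.g. "indoor" or "park" (labels are assumptions)
    noisy: bool  # whether ambient noise is present


@dataclass(frozen=True)
class SoundInfo:
    variation: str
    volume: int  # percent of maximum, 0-100


def apply_context(variation: str, ctx: Context, base_volume: int = 50) -> SoundInfo:
    """Adjust volume for the context; all rules here are illustrative only."""
    volume = base_volume
    if ctx.noisy:
        volume += 30  # be heard over ambient noise
    if ctx.mood == "excited":
        volume -= 10  # avoid agitating an already excited subject
    if ctx.place == "indoor" or ctx.hour >= 21 or ctx.hour < 7:
        volume = min(volume, 40)  # stay considerate of the surroundings
    volume = max(0, min(100, volume))
    return SoundInfo(variation, volume)


daytime = apply_context("squeak", Context("calm", 14, "park", True))
night = apply_context("squeak", Context("calm", 22, "park", True))
```

The cap applied last reflects the stated aim of taking the surroundings into consideration: a quiet place or late hour limits the volume even when noise would otherwise raise it.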

The subject imaging method according to the present disclosure includes the steps of: capturing video of a subject, including people and animals; discriminating the type of the subject based on the captured video of the subject; generating sound information that prompts a reaction from the subject, changing the sound variation according to the discriminated type of the subject; outputting the generated sound information; acquiring video of the subject when the sound information is output; determining the reaction of the subject to the output sound information based on the video of the subject when the sound information is output; and learning the relationship between the characteristics of the output sound information and the reaction of the subject. In the generating step, the sound information is generated by changing the sound variation based on the relationship between the characteristics of the output sound information and the reaction of the subject.

With this configuration, emitting a sound brings the subject into a state suitable for photography, and the subject can then be photographed in that state. It is therefore possible to provide a subject imaging method that can bring subjects such as people and animals into a state suitable for photography and photograph them.
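The order of the method steps can be sketched as a single cycle in Python. Each step is passed in as a callable so that no particular camera, classifier, or speaker API is implied; the function name `run_cycle`, its parameters, and the use of a plain list as the learned history are hypothetical simplifications, not the disclosed implementation.

```python
from typing import Any, Callable, List, Tuple

History = List[Tuple[str, str, str]]  # (subject_type, sound, reaction)


def run_cycle(
    capture: Callable[[], Any],
    discriminate: Callable[[Any], str],
    generate: Callable[[str, History], str],
    output: Callable[[str], None],
    determine: Callable[[Any], str],
    history: History,
) -> str:
    """Run one pass through the steps of the method, in order."""
    frame = capture()                        # capture video of the subject
    subject_type = discriminate(frame)       # discriminate the subject type
    sound = generate(subject_type, history)  # generate sound information
    output(sound)                            # output the sound information
    frame_after = capture()                  # acquire video at output time
    reaction = determine(frame_after)        # determine the reaction
    # "Learning" is reduced here to recording the outcome so that the next
    # call to generate() can consult it; a real system would fit a model.
    history.append((subject_type, sound, reaction))
    return reaction


frames = iter(["frame0", "frame1"])
played: List[str] = []
history: History = []
result = run_cycle(
    capture=lambda: next(frames),
    discriminate=lambda frame: "dog",
    generate=lambda subject_type, hist: "squeak",
    output=played.append,
    determine=lambda frame: "looked" if frame == "frame1" else "none",
    history=history,
)
```

Passing `history` into `generate` is what lets the generating step depend on the learned relationship between past sounds and reactions.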

The program according to the present disclosure causes a computer to execute the steps of: capturing video of a subject, including people and animals; discriminating the type of the subject based on the captured video of the subject; generating sound information that prompts a reaction from the subject, changing the sound variation according to the discriminated type of the subject; outputting the generated sound information; acquiring video of the subject when the sound information is output; determining the reaction of the subject to the output sound information based on the video of the subject when the sound information is output; and learning the relationship between the characteristics of the output sound information and the reaction of the subject. In the generating step, the sound information is generated by changing the sound variation based on the relationship between the characteristics of the output sound information and the reaction of the subject.

With this configuration, emitting a sound brings the subject into a state suitable for photography, and the subject can then be photographed in that state. It is therefore possible to provide a program that can bring subjects such as people and animals into a state suitable for photography and photograph them.

The embodiments of the present application have been described above in detail with reference to the drawings, but they are merely examples. The present invention can be implemented in other forms that incorporate various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the Disclosure of the Invention section.

Furthermore, the term "unit" (section, module, unit) used above can be read as "means" or "circuit". For example, the acquisition unit 151 can be read as acquisition means or an acquisition circuit.

REFERENCE SIGNS LIST
10 Subject imaging device
11 Communication unit
12 Storage unit
13 Imaging unit
14 Sound output unit
15 Control unit
151 Acquisition unit
152 Discrimination unit
153 Determination unit
154 Generation unit
155 Learning unit
156 Output unit

Claims

A subject imaging device comprising:
an imaging unit that captures video of a subject, including people and animals;
a discrimination unit that discriminates the type of the subject based on the video of the subject captured by the imaging unit;
a generation unit that generates sound information that prompts a reaction from the subject, changing the sound variation according to the type of the subject discriminated by the discrimination unit;
an output unit that outputs the sound information generated by the generation unit;
an acquisition unit that acquires video of the subject when the sound information is output;
a determination unit that determines the reaction of the subject to the output sound information based on the video of the subject when the sound information is output; and
a learning unit that learns the relationship between the characteristics of the output sound information and the reaction of the subject,
wherein the generation unit generates the sound information by changing the sound variation based on the relationship between the characteristics of the output sound information and the reaction of the subject.

The subject imaging device according to claim 1,
wherein the generation unit generates the sound information by changing the sound variation according to, in addition to the type of the subject, context including the subject's mood, the time, the place, the surrounding circumstances, and the presence or absence of noise.

A subject imaging method comprising the steps of:
capturing video of a subject, including people and animals;
discriminating the type of the subject based on the captured video of the subject;
generating sound information that prompts a reaction from the subject, changing the sound variation according to the discriminated type of the subject;
outputting the generated sound information;
acquiring video of the subject when the sound information is output;
determining the reaction of the subject to the output sound information based on the video of the subject when the sound information is output; and
learning the relationship between the characteristics of the output sound information and the reaction of the subject,
wherein, in the generating step, the sound information is generated by changing the sound variation based on the relationship between the characteristics of the output sound information and the reaction of the subject.

A program that causes a computer to execute the steps of:
capturing video of a subject, including people and animals;
discriminating the type of the subject based on the captured video of the subject;
generating sound information that prompts a reaction from the subject, changing the sound variation according to the discriminated type of the subject;
outputting the generated sound information;
acquiring video of the subject when the sound information is output;
determining the reaction of the subject to the output sound information based on the video of the subject when the sound information is output; and
learning the relationship between the characteristics of the output sound information and the reaction of the subject,
wherein, in the generating step, the sound information is generated by changing the sound variation based on the relationship between the characteristics of the output sound information and the reaction of the subject.