JP7729466B2

JP7729466B2 - Image classification device, image classification method, and program

Info

Publication number: JP7729466B2
Application number: JP2024507433A
Authority: JP
Inventors: 尊裕中川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2022-03-18
Filing date: 2022-03-18
Publication date: 2025-08-26
Anticipated expiration: 2042-03-18
Also published as: JPWO2023175931A1; WO2023175931A1

Description

This disclosure relates to technology for classifying captured images.

Photographs and videos of companion animals (hereinafter referred to as "pets") and other subjects can amount to a huge volume. Such a collection may include photographs and videos that do not suit the owner's preferences, for example ones in which the pet is facing away. Sorting through this many photographs and videos to pick out favorites is time-consuming for pet owners. For example, Patent Document 1 describes a device that identifies and classifies the motion of a subject from multiple pieces of image data capturing that subject.

Patent Document 1: Japanese Patent Application Laid-Open No. 2005-267604

However, even with the technique of Patent Document 1, it is difficult to classify the photographs and videos that pet owners prefer.

One object of the present disclosure is to provide an image classification device capable of classifying, from among a plurality of images, the images that a user prefers.

In order to solve the above problem, in one aspect of the present disclosure, an image classification device includes:
an image acquisition means for acquiring an image in which a target subject appears;
a condition determination means for determining whether an occurrence condition of a predetermined state is satisfied, the occurrence condition being a condition under which the predetermined state of the target subject is estimated to have been photographed;
an image classification means for classifying, from the image acquired by the image acquisition means and the determination result of the condition determination means, images that show the predetermined state of the target subject, using a model that has machine-learned the relationship among an image in which the target subject appears, a determination result as to whether the occurrence condition of the predetermined state is satisfied, and the predetermined state of the target subject; and
an output means for outputting the image and the result of the classification,
wherein the condition determination means determines whether the occurrence condition of the predetermined state is satisfied based on the heart rate of a photographer of the target subject.

In another aspect of the present invention, an image classification method executed by a computer comprises:
performing an image acquisition process of acquiring an image in which a target subject appears;
performing a condition determination process of determining whether an occurrence condition of a predetermined state is satisfied, the occurrence condition being a condition under which the predetermined state of the target subject is estimated to have been photographed;
performing an image classification process of classifying, from the image acquired by the image acquisition process and the determination result of the condition determination process, images that show the predetermined state of the target subject, using a model that has machine-learned the relationship among an image in which the target subject appears, a determination result as to whether the occurrence condition of the predetermined state is satisfied, and the predetermined state of the target subject; and
performing an output process of outputting the image and the result of the classification,
wherein the condition determination process determines whether the occurrence condition of the predetermined state is satisfied based on the heart rate of a photographer of the target subject.

In yet another aspect of the present invention, a program causes a computer to execute processing comprising:
performing an image acquisition process of acquiring an image in which a target subject appears;
performing a condition determination process of determining whether an occurrence condition of a predetermined state is satisfied, the occurrence condition being a condition under which the predetermined state of the target subject is estimated to have been photographed;
performing an image classification process of classifying, from the image acquired by the image acquisition process and the determination result of the condition determination process, images that show the predetermined state of the target subject, using a model that has machine-learned the relationship among an image in which the target subject appears, a determination result as to whether the occurrence condition of the predetermined state is satisfied, and the predetermined state of the target subject; and
performing an output process of outputting the image and the result of the classification,
wherein the condition determination process determines whether the occurrence condition of the predetermined state is satisfied based on the heart rate of a photographer of the target subject.

According to the present disclosure, it is possible to classify, from among a plurality of images, the images that a user prefers.

FIG. 1 shows the overall configuration of an image classification system according to a first embodiment.
FIG. 2 is a block diagram showing the configuration of a server and a user terminal.
FIG. 3 is a block diagram showing the functional configuration of the server.
FIG. 4 is a block diagram showing the functional configuration of a learning device.
FIG. 5 is a flowchart of processing by the image classification system.
FIG. 6 is a block diagram showing the functional configuration of Modification 1 of the first embodiment.
FIG. 7 is a block diagram showing the functional configuration of an information processing apparatus according to a second embodiment.
FIG. 8 is a flowchart of processing performed by the information processing apparatus according to the second embodiment.

First Embodiment
[Overall configuration]
FIG. 1 shows the overall configuration of an image classification system to which the image classification device according to the present disclosure is applied. The image classification system 1 includes a server 200 and a user terminal 300 used by a pet owner. The server 200 is an example of the image classification device. The server 200 and the owner's user terminal 300 can communicate wirelessly.

As a basic operation, the server 200 acquires images showing a predetermined state of the pet based on video transmitted from the owner's user terminal 300. Specifically, when playing with the pet P, for example, the owner puts the user terminal 300 into continuous recording mode and records video. The user terminal 300 then transmits the recorded video (hereinafter also referred to as the "captured video") to the server 200. The server 200 extracts a still image for each frame of the captured video and, through AI (Artificial Intelligence) image analysis, classifies whether each image shows the predetermined state of the pet. Here, an image showing the predetermined state of the pet (hereinafter also referred to as a "GOOD shot") is an image of the pet that the owner finds appealing, such as an image showing the pet's face, the pet jumping, or the pet playing. The server 200 then attaches a classification result indicating whether or not it is a GOOD shot to each still image extracted from the captured video (hereinafter also referred to as an "extracted image"), associates the image with the owner, and stores it in a database.
The owner then accesses the server 200 from the user terminal 300 or another terminal and views only the GOOD shots, for example as a slideshow. This allows the owner to obtain images of the pet without missing a photo opportunity. Furthermore, by using smart glasses as the user terminal 300, the owner can capture GOOD shots while interacting with the pet. Instead of smart glasses, other eyeglass-type wearable terminals such as AR (Augmented Reality), MR (Mixed Reality), or VR (Virtual Reality) glasses may be used.

Images classified as GOOD shots are not limited to still images and may also be video. In this case, the server 200 extracts video clips from the captured video of the user terminal 300 at predetermined time intervals. The server 200 then classifies whether each clip contains a GOOD shot and stores the extracted clip (likewise referred to as an "extracted image") with the classification result attached.
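The text does not say how the predetermined time intervals are chosen. As a minimal sketch, assuming the captured video is simply cut into back-to-back clips of one fixed length (the function name and the use of integer milliseconds are illustrative assumptions), the clip boundaries can be computed as follows:

```python
from typing import List, Tuple


def split_into_clips(duration_ms: int, interval_ms: int) -> List[Tuple[int, int]]:
    """Cut a captured video into consecutive (start_ms, end_ms) clips.

    Each clip is interval_ms long except possibly the last, which is
    truncated at the end of the video. Integer milliseconds avoid
    floating-point drift when many clips are generated.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return [
        (start, min(start + interval_ms, duration_ms))
        for start in range(0, duration_ms, interval_ms)
    ]


# A 12-second recording cut at a hypothetical 5-second interval.
print(split_into_clips(12_000, 5_000))
# [(0, 5000), (5000, 10000), (10000, 12000)]
```

Each (start, end) pair would then be handed to the classifier as one extracted image in the video case.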

[Server]
FIG. 2(A) is a block diagram showing the configuration of the server 200. The server 200 mainly includes a communication unit 211, a processor 212, a memory 213, a recording medium 214, and a database (DB) 215.

The communication unit 211 sends and receives data to and from external devices. Specifically, the communication unit 211 sends and receives information to and from the owner's user terminal 300.

The processor 212 is a computer such as a CPU (Central Processing Unit) and controls the entire server 200 by executing a program prepared in advance. The processor 212 may also be a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or the like.

The memory 213 is composed of ROM (Read Only Memory), RAM (Random Access Memory), and the like. The memory 213 is also used as working memory while the processor 212 executes various processes. Under the control of the processor 212, the memory 213 also temporarily stores the series of videos captured by the user terminal 300. These videos are stored in the memory 213 in association with, for example, the owner's identification information and timestamp information.

The recording medium 214 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the server 200. The recording medium 214 records the various programs executed by the processor 212.

The database (DB) 215 stores extracted images to which a classification result indicating whether they are GOOD shots has been attached. The DB 215 may include an external storage device such as a hard disk connected to or built into the server 200, or a removable storage medium such as a flash memory. Instead of providing the DB 215 in the server 200, the DB 215 may be provided on an external server, and the extracted images with their classification results may be stored on that server via communication.

The server 200 may also include an input unit such as a keyboard and mouse through which an administrator or the like gives instructions and input, and a display unit such as a liquid crystal display.

[User terminal]
FIG. 2(B) is a block diagram showing the internal configuration of the user terminal 300 used by the pet owner. The user terminal 300 is a terminal device such as smart glasses or a smartphone. The user terminal 300 includes a communication unit 311, a processor 312, a memory 313, a display unit 314, a camera 315, and a microphone 316.

The communication unit 311 sends and receives data to and from external devices. Specifically, the communication unit 311 sends and receives information to and from the server 200.

The processor 312 is a computer such as a CPU and controls the entire user terminal 300 by executing a program prepared in advance. The processor 312 may also be a GPU, FPGA, DSP, ASIC, or the like. By executing a program prepared in advance, the processor 312 transmits the video captured by the camera 315 to the server 200.

The memory 313 is composed of ROM, RAM, and the like. The memory 313 stores the various programs executed by the processor 312 and is also used as working memory while the processor 312 executes various processes. Video captured by the camera 315 is stored in the memory 313 and then transmitted to the server 200. The display unit 314 is, for example, a liquid crystal display, and displays video captured by the camera 315, extracted GOOD-shot images stored on the server 200, and the like.

The camera 315 includes a camera that captures the user's field of view (also called an "out-camera") and a camera that captures the user's eyeballs (also called an "eye camera"). The out-camera is mounted on the outside of the user terminal 300. It captures the user's field of view, including subjects such as the pet, and the captured images are transmitted to the server 200, which thereby acquires images of subjects such as the pet. The eye camera is mounted on the inside of the user terminal 300 so as to capture the user's eyeballs, and transmits those images to the processor 312. Based on the eye images, the processor 312 detects movements of the user's line of sight and the like. This allows the user terminal 300 to acquire information such as the user's gaze direction.

The microphone 316 collects the user's voice and surrounding sounds and transmits them to the server 200. Based on the user's voice or the pet's vocalizations, for example, the server 200 can infer that the user has uttered a specific word or has given the pet an instruction or command.

[Functional configuration]
FIG. 3 is a block diagram showing the functional configuration of the server 200. Functionally, the server 200 includes an image acquisition unit 411 and an image classification unit 412.

Video captured by the user terminal 300 is input to the server 200, specifically to the image acquisition unit 411. The image acquisition unit 411 extracts still images or video clips from the captured video as extracted images and outputs them to the image classification unit 412.

The image classification unit 412 classifies whether each extracted image received from the image acquisition unit 411 is a GOOD shot, using an image recognition model or the like prepared in advance. This image recognition model is a machine learning model trained in advance to classify whether an image is a GOOD shot, and is hereinafter also referred to as the "image classification model." If the image classification model classifies an extracted image as a GOOD shot, the image classification unit 412 attaches additional information to the image indicating that it is a GOOD shot. If instead the model classifies the image as not a GOOD shot, that is, as a BAD shot, the image classification unit 412 attaches additional information indicating that it is a BAD shot. A BAD shot is any image other than a GOOD shot, such as an image that does not show the pet's face. The image classification unit 412 outputs the extracted images with the attached additional information to the DB 215.

[Training the image classification model]
Next, training of the image classification model used by the image classification unit 412 will be described. The image classification model is generated by so-called supervised learning. FIG. 4 is a block diagram showing the training method of the image classification model, which involves learning data 511 and a learning device 512.

The learning data 511 is image data labeled in advance as to whether each image is a GOOD shot (hereinafter also referred to as "teacher data"). Labels are assigned based on criteria such as whether a predetermined part of the pet is captured and whether the pet is performing a predetermined action. A predetermined part of the pet refers to, for example, the pet's face: an image showing the pet's face is labeled as a GOOD shot, whereas an image that does not show the pet, an image in which the pet is facing away, or an image showing only the pet's torso or legs is labeled as a BAD shot. A predetermined action refers to, for example, an eye-catching action by the pet: an image in which the pet is jumping or holding an object in its mouth is labeled as a GOOD shot.
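The labeling criteria above can be read as a simple rule. The sketch below is illustrative only: the `Annotation` fields are hypothetical, and the priority order (a pet facing away is labeled BAD even if it is jumping) is an assumption, since the text does not say how conflicting criteria are resolved.

```python
from dataclasses import dataclass

GOOD, BAD = 1, 0


@dataclass(frozen=True)
class Annotation:
    """Hypothetical per-image annotations a human labeler might record."""
    pet_visible: bool
    face_visible: bool          # the predetermined body part: the pet's face
    facing_away: bool
    eye_catching_action: bool   # e.g. jumping, holding an object in its mouth


def label(a: Annotation) -> int:
    """Apply the labeling criteria in a fixed (assumed) priority order."""
    if not a.pet_visible or a.facing_away:
        return BAD   # no pet in frame, or the pet is facing away
    if a.face_visible or a.eye_catching_action:
        return GOOD  # face captured, or an eye-catching action
    return BAD       # e.g. only the torso or legs are visible


print(label(Annotation(pet_visible=True, face_visible=True,
                       facing_away=False, eye_catching_action=False)))
# 1
```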

Alternatively, the pet owner may sort a number of images of the pet into GOOD shots and non-GOOD shots, and the images labeled with the results may be used as teacher data. This makes it possible to generate an image classification model that classifies images in closer accordance with the owner's preferences.

Images of animals posted by pet owners or third parties on an SNS (Social Networking Service) may also be collected and used as teacher data. In this case, the images posted on the SNS are labeled as GOOD shots. This increases the amount of teacher data, making it possible to generate a more accurate image classification model.
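A minimal sketch of assembling teacher data from the two sources just described, assuming images are identified by string IDs (hypothetical) and that the owner's own label takes precedence over the default GOOD label given to SNS images when the same image appears in both (the text does not address this case):

```python
from typing import Dict, Iterable, List, Tuple

GOOD, BAD = 1, 0


def build_training_set(
    owner_labels: Dict[str, int],
    sns_image_ids: Iterable[str],
) -> List[Tuple[str, int]]:
    """Merge owner-labeled images with SNS-sourced images.

    owner_labels maps image IDs to GOOD/BAD as sorted by the owner.
    Every SNS image is labeled GOOD unless the owner already labeled it.
    """
    merged = dict(owner_labels)
    for image_id in sns_image_ids:
        merged.setdefault(image_id, GOOD)  # owner's label wins on conflict
    return sorted(merged.items())


print(build_training_set({"img_a": GOOD, "img_b": BAD}, ["img_b", "sns_1"]))
# [('img_a', 1), ('img_b', 0), ('sns_1', 1)]
```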

The learning device 512 learns the patterns of GOOD shots from the learning data 511 and outputs the image classification model as a trained model. The result is an image classification model that has learned the relationship between images containing a pet and the states of the pet that correspond to a GOOD shot.
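The text does not specify the model architecture or training algorithm used by the learning device 512. Purely to illustrate supervised learning from GOOD/BAD labels, the sketch below swaps in plain logistic regression trained by batch gradient descent on hypothetical two-dimensional feature vectors; it is a stand-in, not the disclosed model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss.

    ys are 1 for GOOD-shot labels and 0 for BAD-shot labels.
    """
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def good_shot_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability-like GOOD-shot score for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per image: [face_confidence, motion_level].
xs = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.2], [0.2, 0.1]]
ys = [1, 1, 0, 0]
w, b = train_logistic(xs, ys)
print(good_shot_score(w, b, [0.85, 0.4]))
```

A practical system would typically learn directly from image pixels rather than from hand-picked features; the loop above only shows how labeled examples shape the scoring function.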

[Classification by the image classification model]
The image classification unit 412 uses the image classification model to estimate whether an image is a GOOD shot. Specifically, the image classification model calculates, for an input image, a score indicating the probability that the image is a GOOD shot (referred to as the "GOOD shot score") and a score indicating the probability that it is a BAD shot (referred to as the "BAD shot score"). The model calculates these scores so that, for example, the GOOD shot score and the BAD shot score sum to 1. The model then compares each score with a predetermined threshold TH and adopts the class whose score exceeds TH as the classification result. For example, suppose that for a certain image the model calculates a GOOD shot score of 0.8 and a BAD shot score of 0.2. If the threshold TH is 0.5, the model estimates that the image is a GOOD shot.
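The score computation and threshold comparison can be sketched as below, assuming the model emits one raw logit per class and a softmax turns them into GOOD and BAD shot scores that sum to 1. The softmax is an assumption (the text only says the scores sum to 1), as is the fallback to BAD when no score exceeds TH.

```python
import math
from typing import Tuple


def two_class_scores(logit_good: float, logit_bad: float) -> Tuple[float, float]:
    """Softmax over two logits, so GOOD score + BAD score == 1."""
    m = max(logit_good, logit_bad)  # subtract the max for numerical stability
    e_good = math.exp(logit_good - m)
    e_bad = math.exp(logit_bad - m)
    total = e_good + e_bad
    return e_good / total, e_bad / total


def classify(good_score: float, bad_score: float, th: float = 0.5) -> str:
    """Adopt GOOD only if its score exceeds TH and is not below the BAD score.

    If neither score exceeds TH (possible when TH > 0.5, or on an exact
    0.5/0.5 tie when TH = 0.5), this sketch falls back to BAD.
    """
    if good_score > th and good_score >= bad_score:
        return "GOOD"
    return "BAD"


print(classify(0.8, 0.2, th=0.5))  # GOOD (the worked example in the text)
```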

[Image classification process]
Next, the image classification process that performs the image classification described above will be explained. FIG. 5 is a flowchart of the image classification process performed by the server 200. This process is realized by the processor 212 shown in FIG. 2 executing a program prepared in advance and operating as each element shown in FIG. 3.

First, the image acquisition unit 411 acquires the captured video from the user terminal 300 and acquires images (still images or video clips) from it (step S11). Next, the image classification unit 412 classifies whether each image acquired by the image acquisition unit 411 is a GOOD shot (step S12). Specifically, the image classification unit 412 calculates a score indicating the probability that the image is a GOOD shot and a score indicating the probability that it is a BAD shot, compares each score with the threshold TH, and classifies the image as either a GOOD shot or a BAD shot.

Next, the image classification unit 412 attaches the classification results to the images acquired by the image acquisition unit 411 and stores them in the database (DB) 215 (step S13). For example, the image classification unit 412 attaches a flag such as "1" to images classified as GOOD shots and "0" to images classified as BAD shots, and stores them in the DB 215. The image classification process then ends.
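Steps S11 to S13 can be sketched end to end with Python's built-in `sqlite3` module standing in for the DB 215. The table layout, the hypothetical `good_score_fn` callback standing in for the image classification model, and comparing only the GOOD shot score with TH (equivalent to the two-score comparison when the scores sum to 1 and TH is 0.5, apart from an exact tie) are all assumptions for illustration.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple


def run_classification(
    conn: sqlite3.Connection,
    owner_id: str,
    images: Iterable[Tuple[str, bytes]],
    good_score_fn: Callable[[bytes], float],
    th: float = 0.5,
) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_images ("
        "owner_id TEXT, image_id TEXT, data BLOB, good_flag INTEGER)"
    )
    for image_id, data in images:                    # S11: acquired images
        flag = 1 if good_score_fn(data) > th else 0  # S12: 1 = GOOD, 0 = BAD
        conn.execute(                                # S13: store with flag
            "INSERT INTO extracted_images VALUES (?, ?, ?, ?)",
            (owner_id, image_id, data, flag),
        )
    conn.commit()


def good_shots(conn: sqlite3.Connection, owner_id: str) -> List[str]:
    """Return the IDs of one owner's images flagged as GOOD shots."""
    rows = conn.execute(
        "SELECT image_id FROM extracted_images "
        "WHERE owner_id = ? AND good_flag = 1 ORDER BY image_id",
        (owner_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
run_classification(
    conn,
    "owner-1",
    [("frame_0001", b"\x01"), ("frame_0002", b"\x00")],
    good_score_fn=lambda data: 0.9 if data == b"\x01" else 0.1,  # stand-in model
)
print(good_shots(conn, "owner-1"))  # ['frame_0001']
```

The `good_shots` query corresponds to the owner later retrieving only the GOOD shots for viewing or download.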

In this way, GOOD-shot images matching the user's preferences are extracted from the large number of images the user has captured and accumulated in the DB 215 of the server 200. The user can access the server 200 and view the GOOD-shot images stored in the DB 215, and can also download them from the server 200 and save them on a terminal device such as the user terminal 300.

[Modifications]
Next, modifications of the first embodiment will be described. The following modifications may be applied to the first embodiment in any suitable combination.
(Modification 1)
In the first embodiment above, the server 200 classifies images based on the extracted images taken from the captured video. In addition, the server 200 may determine whether a predetermined state occurrence condition is satisfied and use the determination result when classifying the images. The predetermined state occurrence condition is a condition under which a GOOD shot is estimated to have been captured, and is hereinafter also referred to as the "GOOD shot occurrence condition." The GOOD shot occurrence condition is determined based on, for example, biometric information or behavioral information of the photographer.

Specifically, FIG. 6 shows the functional configuration of a server 200a according to Modification 1. As illustrated, in Modification 1 the server 200a is provided with a condition determination unit 413. The condition determination unit 413 acquires the photographer's biometric information and the like, together with timestamps, from the user terminal 300. Using a model trained in advance, the condition determination unit 413 then determines whether the photographer's biometric information and the like satisfy predetermined conditions, and outputs the determination result to the image classification unit 412.

The photographer's biometric information includes gaze, voice, heart rate, and the like, and is acquired by the user terminal 300. The user terminal 300 may acquire it from a camera, microphone, sensor, or the like built into the user terminal 300, or from an external device via wireless communication such as Bluetooth (registered trademark) or Wi-Fi (registered trademark). The predetermined conditions include, for example, the photographer directing their gaze at the pet, the photographer speaking at a volume at or above a predetermined threshold, the photographer uttering a predetermined phrase such as "Nice!", or the photographer's heart rate reaching or exceeding a predetermined threshold. When the photographer's biometric information satisfies such a condition, it is estimated that a GOOD shot was likely captured at that time and at the times just before and after it. Accordingly, the condition determination unit 413 may determine that the predetermined condition is satisfied not only at the time the biometric information met it but also at times before and after, and output that determination result to the image classification unit 412.
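The heart-rate condition, including the option of also treating times before and after the trigger as satisfying it, can be sketched as a threshold rule. The text describes the condition determination unit 413 as using a pre-trained model; the fixed threshold and symmetric time margin here are simplifying assumptions, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def heart_rate_condition(
    samples: Sequence[Tuple[float, float]],
    frame_times: Sequence[float],
    hr_threshold_bpm: float,
    margin_s: float,
) -> List[bool]:
    """Flag each frame whose timestamp lies within +/- margin_s of a
    heart-rate sample at or above hr_threshold_bpm.

    samples: (timestamp_s, heart_rate_bpm) pairs from the user terminal.
    frame_times: timestamps (s) of the extracted images.
    """
    trigger_times = [t for t, bpm in samples if bpm >= hr_threshold_bpm]
    return [
        any(abs(ft - tt) <= margin_s for tt in trigger_times)
        for ft in frame_times
    ]


flags = heart_rate_condition(
    samples=[(0.0, 80.0), (1.0, 85.0), (2.0, 120.0), (3.0, 90.0)],
    frame_times=[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0],
    hr_threshold_bpm=110.0,
    margin_s=1.0,
)
print(flags)
# [False, False, True, True, True, True, True, False]
```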

The condition determination unit 413 may also determine whether the GOOD shot occurrence condition is satisfied based on behavioral information about the photographer and the pet. For example, when the photographer gives a signal and the pet acts on it, or when the photographer gives an instruction or command and the pet follows it, the condition determination unit 413 determines that the GOOD shot occurrence condition is satisfied and outputs the determination result to the image classification unit 412. The behavioral information about the photographer and the pet may be obtained from a microphone, sensor, or the like mounted on the user terminal 300, or from the video captured by the user terminal 300.
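A minimal sketch of the behavioral condition, assuming commands (recognized from the photographer's speech) and pet actions (recognized from video or sensors) are already available as timestamped labels, and that the pet must respond within a fixed delay. The command-to-action mapping, the delay, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def commands_followed(
    commands: Sequence[Tuple[float, str]],
    actions: Sequence[Tuple[float, str]],
    expected_action: Dict[str, str],
    max_delay_s: float,
) -> List[Tuple[float, float]]:
    """Return (command_time, action_time) pairs where the pet performed the
    action expected for a command within max_delay_s after it was given.

    commands: (timestamp_s, command) recognized from the photographer.
    actions: (timestamp_s, action) recognized for the pet, sorted by time.
    """
    hits = []
    for cmd_time, cmd in commands:
        wanted = expected_action.get(cmd)
        if wanted is None:
            continue  # utterance is not a known command
        for act_time, act in actions:
            if act == wanted and 0.0 <= act_time - cmd_time <= max_delay_s:
                hits.append((cmd_time, act_time))
                break  # keep only the earliest matching response
    return hits


print(commands_followed(
    commands=[(1.0, "jump!"), (5.0, "sit!")],
    actions=[(1.8, "jump"), (9.0, "sit")],
    expected_action={"jump!": "jump", "sit!": "sit"},
    max_delay_s=2.0,
))
# [(1.0, 1.8)]
```

Each returned pair marks a moment at which the GOOD shot occurrence condition would be treated as satisfied.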

The image classification unit 412 classifies whether the extracted image input from the image acquisition unit 411 is a GOOD shot, based on that image and the determination result input from the condition determination unit 413. In this case, the image classification model used by the image classification unit 412 is a model trained in advance to estimate, from an extracted image and a determination result, whether the extracted image is a GOOD shot.
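The disclosure does not specify the model architecture. As a stand-in, the sketch below uses a logistic-regression score in which the condition-determination result enters as one extra binary input alongside image features, which are assumed to come from some pretrained image encoder. All weights are placeholders that would be learned during training.

```python
import math
from typing import Sequence


def good_shot_probability(image_features: Sequence[float],
                          condition_met: bool,
                          feature_weights: Sequence[float],
                          condition_weight: float,
                          bias: float) -> float:
    """Logistic score over image features plus the condition flag (a stand-in model)."""
    if len(image_features) != len(feature_weights):
        raise ValueError("feature and weight lengths differ")
    z = bias + condition_weight * (1.0 if condition_met else 0.0)
    z += sum(x * w for x, w in zip(image_features, feature_weights))
    return 1.0 / (1.0 + math.exp(-z))


def is_good_shot(image_features: Sequence[float],
                 condition_met: bool,
                 feature_weights: Sequence[float],
                 condition_weight: float,
                 bias: float,
                 threshold: float = 0.5) -> bool:
    """Binary GOOD/BAD decision from the probability above."""
    p = good_shot_probability(image_features, condition_met,
                              feature_weights, condition_weight, bias)
    return p >= threshold
```

With a positive `condition_weight`, the same image scores higher when the condition determination unit 413 reports that the occurrence condition was met.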

By thus taking the photographer's biometric and behavioral information into account when classifying GOOD shots, images of the pet captured at moments the photographer finds appealing can be obtained with high accuracy.

(Variation 2)
Training data for retraining the image classification model may be created from the GOOD shots classified in the first embodiment. Specifically, the pet owner judges whether each GOOD shot classified by the server 200 is needed. The server 200 keeps the GOOD label for images the owner judges necessary and changes the label to BAD for images the owner judges unnecessary. The server 200 then retrains the image classification model using the image data of these GOOD shots and BAD shots as training data. This enables the server 200 to classify GOOD shots that better match the pet owner's preferences.
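The relabeling step can be sketched as follows, assuming the owner's judgments arrive as a mapping from image ID to "needed / not needed". Dropping images the owner has not reviewed is an assumption; the disclosure does not say how unreviewed images are treated. The retraining itself is left to whatever routine produced the original model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Example:
    image_id: str
    label: str  # "GOOD" or "BAD"


def build_retraining_set(predicted_good_ids: Sequence[str],
                         owner_keeps: Dict[str, bool]) -> List[Example]:
    """Turn owner feedback on predicted GOOD shots into labeled training data.

    owner_keeps maps image_id -> True (needed) or False (not needed).
    Images without feedback are skipped (an assumption, see above).
    """
    dataset: List[Example] = []
    for image_id in predicted_good_ids:
        if image_id not in owner_keeps:
            continue
        label = "GOOD" if owner_keeps[image_id] else "BAD"
        dataset.append(Example(image_id, label))
    return dataset
```

Keeping the BAD examples, rather than discarding them, is what lets retraining learn which predicted GOOD shots the owner actually rejects.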

(Variation 3)
In the first embodiment described above, the user terminal 300 keeps the camera in continuous recording mode and transmits the captured video to the server 200. Alternatively, the user terminal 300 may start recording when the subject appears in the camera view, stop recording when the subject is no longer in view, and transmit the video captured between the start and end of recording to the server 200. Specifically, the user terminal 300 captures the camera image at predetermined intervals and transmits it to the server 200. The server 200 determines whether the pet appears in the camera view of the user terminal 300 based on a pre-built image recognition model or the like. When the pet appears, the server 200 switches the user terminal 300 to recording mode and recording starts. When the pet subsequently disappears from view, the server 200 ends the recording mode of the user terminal 300. This reduces the amount of video data transmitted from the user terminal 300 to the server 200.
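The start/stop behavior amounts to a small state machine driven by the periodic detection results, whichever side (server 200 or user terminal 300) runs the detector. In this hypothetical sketch, `miss_limit` is an added hysteresis parameter not found in the disclosure: with `miss_limit=1` the controller stops on the first capture without the pet, as described above, while larger values avoid splitting a recording when the detector briefly misses the pet.

```python
from typing import List, Optional, Tuple


class RecordingController:
    """Starts recording when the pet is detected and stops after
    `miss_limit` consecutive captures without the pet."""

    def __init__(self, miss_limit: int = 1) -> None:
        self.miss_limit = miss_limit
        self.recording = False
        self.misses = 0
        self._start: Optional[int] = None
        # Closed segments as (start_capture_index, stop_capture_index).
        self.segments: List[Tuple[int, int]] = []

    def on_capture(self, index: int, pet_visible: bool) -> None:
        """Process one periodic capture and its detection result."""
        if pet_visible:
            self.misses = 0
            if not self.recording:
                self.recording = True  # switch to recording mode
                self._start = index
        elif self.recording:
            self.misses += 1
            if self.misses >= self.miss_limit:
                self.recording = False  # end recording mode
                self.segments.append((self._start, index))
                self.misses = 0
```

Only the video between each start and stop index would then be sent to the server 200.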

Whether the pet appears in the camera view may instead be determined by the user terminal 300 itself. In this case, the user terminal 300 uses a pre-built image recognition model or the like to make the determination, and may control the start and end of recording according to the result.

(Variation 4)
In the first embodiment described above, the server 200 classifies GOOD shots based on video of a pet, but the subject is not limited to a pet and may be another subject for which photo opportunities are often missed, such as a child.

(Variation 5)
In the first embodiment described above, information acquired by the user terminal 300 is basically transmitted as is to the server 200, and the server 200 classifies GOOD shots based on the received information. Instead, the user terminal 300 may perform the processing for classifying GOOD shots and transmit the processing results to the server 200. Alternatively, the server 200 may be omitted, and the user terminal 300 may both perform the classification processing and store its results. This reduces the communication load from the user terminal 300 to the server 200 and the processing load on the server 200. In these cases, the user terminal 300 is an example of the image classification device.

<Second Embodiment>
Figure 7 is a block diagram showing the functional configuration of an image classification device 50 according to the second embodiment. The image classification device 50 includes an image acquisition means 51, an image classification means 52, and an output means 53.

Figure 8 is a flowchart of the processing performed by the image classification device 50. The image acquisition means 51 acquires images showing a target subject (step S51). Using a model that has machine-learned the relationship between images showing the target subject and a predetermined state of the target subject, the image classification means 52 classifies, from among the acquired images, the images showing the target subject in the predetermined state (step S52). The output means 53 outputs the images and the classification results (step S53).
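Steps S51 to S53 map onto a simple loop. In this sketch the trained model is abstracted as any callable returning a score in [0, 1]; the callable, the 0.5 threshold, and the `(image, label)` output format are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")


def classify_images(images: Iterable[Image],
                    model: Callable[[Image], float],
                    threshold: float = 0.5) -> List[Tuple[Image, bool]]:
    """Pair each acquired image with whether it shows the predetermined state."""
    results: List[Tuple[Image, bool]] = []
    for image in images:                       # S51: acquired images
        in_state = model(image) >= threshold   # S52: model-based classification
        results.append((image, in_state))      # S53: output image with its result
    return results
```

Usage with a toy scoring function: `classify_images(["a", "bb", "ccc"], lambda s: len(s) / 3)` flags only the two longer strings.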

The image classification device 50 of the second embodiment makes it possible to easily classify the images a user prefers.

Some or all of the above embodiments may also be described as in the following appendices, but are not limited thereto.

(Appendix 1)
An image classification device comprising:
an image acquisition means for acquiring images showing a target subject;
an image classification means for classifying, from the images, images showing a predetermined state of the target subject, using a model that has machine-learned the relationship between images showing the target subject and the predetermined state of the target subject; and
an output means for outputting the images and the classification results.

(Appendix 2)
The image classification device according to Appendix 1, wherein the image classification means classifies images showing a predetermined part of the target subject.

(Appendix 3)
The image classification device according to Appendix 1 or 2, wherein the image classification means classifies images in which the target subject is performing a predetermined action.

(Appendix 4)
The image classification device according to any one of Appendices 1 to 3, further comprising a condition determination means for determining whether an occurrence condition of the predetermined state is satisfied,
wherein the image classification means classifies the images based on the images and the determination result for the occurrence condition.

(Appendix 5)
The image classification device according to Appendix 4, wherein the condition determination means determines whether the occurrence condition is satisfied based on the gaze direction of the photographer of the target subject.

(Appendix 6)
The image classification device according to Appendix 4 or 5, wherein the condition determination means determines whether the occurrence condition is satisfied based on the heart rate of the photographer of the target subject.

(Appendix 7)
The image classification device according to any one of Appendices 4 to 6, wherein the condition determination means determines whether the occurrence condition is satisfied based on the voice of the photographer of the target subject.

(Appendix 8)
The image classification device according to any one of Appendices 4 to 7, wherein the condition determination means detects the voice of the photographer and uses, as the occurrence condition, the target subject acting in response to the photographer's voice.

(Appendix 9)
The image classification device according to any one of Appendices 1 to 8, wherein the image acquisition means starts acquiring images showing the target subject when the target subject appears in the camera view of a terminal device, and stops acquiring them when the target subject no longer appears in the camera view of the terminal device.

(Appendix 10)
The image classification device according to any one of Appendices 1 to 9, further comprising a learning means for retraining the model using, as training data, those images among the results output by the output means for which a user has judged whether they are needed.

(Appendix 11)
An image classification method comprising:
acquiring images showing a target subject;
classifying, from the images, images showing a predetermined state of the target subject, using a model that has machine-learned the relationship between images showing the target subject and the predetermined state of the target subject; and
outputting the images and the classification results.

(Appendix 12)
A recording medium storing a program that causes a computer to execute processing comprising:
acquiring images showing a target subject;
classifying, from the images, images showing a predetermined state of the target subject, using a model that has machine-learned the relationship between images showing the target subject and the predetermined state of the target subject; and
outputting the images and the classification results.

The present disclosure has been described above with reference to embodiments and examples, but it is not limited to those embodiments and examples. Various modifications that a person skilled in the art can understand may be made to the configuration and details of the present disclosure within its scope.

200 Server
215 Database (DB)
300 User terminal
411 Image acquisition unit
412 Image classification unit
413 Condition determination unit
511 Learning data
512 Learning device

Claims

1. An image classification device comprising:
an image acquisition means for acquiring an image showing a target subject;
a condition determination means for determining whether an occurrence condition of a predetermined state of the target subject is satisfied, the occurrence condition being a condition under which the predetermined state of the target subject is estimated to have been photographed;
an image classification means for classifying, from the image acquired by the image acquisition means and the determination result of the condition determination means, images showing the predetermined state of the target subject, using a model that has machine-learned the relationship among an image showing the target subject, a determination result as to whether the occurrence condition of the predetermined state is satisfied, and the predetermined state of the target subject; and
an output means for outputting the image and the classification result,
wherein the condition determination means determines whether the occurrence condition of the predetermined state is satisfied based on the heart rate of a photographer of the target subject.

2. The image classification device according to claim 1, wherein the image classification means classifies images showing a specified part of the target subject.

3. The image classification device according to claim 1 or 2, wherein the image classification means classifies images in which the target subject is performing a predetermined action.

4. The image classification device according to claim 1, wherein the condition determination means determines whether the occurrence condition is satisfied based on a gaze direction of the photographer of the target subject.

5. The image classification device according to claim 1, wherein the condition determination means determines whether the occurrence condition is satisfied based on a voice of the photographer of the target subject.

6. The image classification device according to claim 1, wherein the condition determination means detects a voice of the photographer and uses, as the occurrence condition, the target subject taking an action in response to the voice of the photographer.

7. A computer-implemented image classification method comprising:
performing an image acquisition process of acquiring an image showing a target subject;
performing a condition determination process of determining whether an occurrence condition of a predetermined state of the target subject is satisfied, the occurrence condition being a condition under which the predetermined state of the target subject is estimated to have been photographed;
performing an image classification process of classifying, from the image acquired by the image acquisition process and the determination result of the condition determination process, images showing the predetermined state of the target subject, using a model that has machine-learned the relationship among an image showing the target subject, a determination result as to whether the occurrence condition of the predetermined state is satisfied, and the predetermined state of the target subject; and
performing an output process of outputting the image and the classification result,
wherein the condition determination process determines whether the occurrence condition of the predetermined state is satisfied based on the heart rate of a photographer of the target subject.

8. A program that causes a computer to execute:
an image acquisition process of acquiring an image showing a target subject;
a condition determination process of determining whether an occurrence condition of a predetermined state of the target subject is satisfied, the occurrence condition being a condition under which the predetermined state of the target subject is estimated to have been photographed;
an image classification process of classifying, from the image acquired by the image acquisition process and the determination result of the condition determination process, images showing the predetermined state of the target subject, using a model that has machine-learned the relationship among an image showing the target subject, a determination result as to whether the occurrence condition of the predetermined state is satisfied, and the predetermined state of the target subject; and
an output process of outputting the image and the classification result,
wherein the condition determination process determines whether the occurrence condition of the predetermined state is satisfied based on the heart rate of a person photographing the target subject.