JP7766451B2

JP7766451B2 - Object detection device

Info

Publication number: JP7766451B2
Application number: JP2021161727A
Authority: JP
Inventors: 成典田中; 健二中村; 雄平山本; 文渊姜; 丈司鳴尾; ちひろ田中; 一磨坂本; 英一政木; 豊松林; 恭仁新名; 貴之山田
Original assignee: Intelligent Style Co Ltd
Current assignee: Intelligent Style Co Ltd
Priority date: 2021-09-30
Filing date: 2021-09-30
Publication date: 2025-11-10
Anticipated expiration: 2041-09-30
Also published as: JP2023051195A

Description

Article 30, Paragraph 2 of the Patent Act applies: presented at the 46th Civil Engineering Informatics Symposium (held online) on September 28, 2021.

Article 30, Paragraph 2 of the Patent Act applies: published on September 22, 2021 in the proceedings of the 46th Civil Engineering Informatics Symposium (website).

This invention relates to technology for extracting objects from images.

Objects such as people are extracted from images by training a neural network such as a CNN (convolutional neural network).

For example, an object detection model such as YOLO (You Only Look Once) can recognize an object shown in a captured image and output it enclosed in a bounding box.

Using such an object detection model, players can be extracted from images of players moving around a field, or cars from images of cars passing along a public road, to generate data for analyzing the movements of the players or cars.

Japanese Unexamined Patent Application Publication No. 2020-160804

However, for such models to operate accurately, they must be trained sufficiently; that is, a large amount of training data must be prepared, which is a problem.

Object detection models such as the one above estimate whether what appears in an image is background or an object. A model trained on images with a particular background can extract objects with high accuracy from images in which the objects appear against that background. For images in which the objects appear against a different background, however, high accuracy could not be expected. High-accuracy extraction required generating training data from images with the objects against the different background and training the model on it, which was laborious.

SUMMARY OF THE INVENTION
An object of the present invention is to solve the above problems and to provide an object detection device whose model is easy to train.

Independently applicable features of this invention are listed below.

(1)(2)(3)(4) The new model generation device according to the present invention generates, from an existing object extraction model trained to extract objects from object images having at least one object in a background, a new object extraction model capable of extracting objects from different-background object images, each having at least one object in a background different from that background. The device comprises: an extracted object acquisition means that gives the different-background object image to the existing object extraction model and acquires information on the extracted objects; a training data generation means that generates training data based on the different-background object image and the information on the objects extracted by the extracted object acquisition means; and an additional training means that additionally trains the existing object extraction model on the generated training data to generate the new object extraction model.

Therefore, a model that can accurately extract objects captured against a desired background can be created easily.

(5) The new model generation device according to the present invention is characterized in that the new model generation device or the training data generation device is configured as a server device accessible from a terminal device via a network.

Therefore, a new model can be obtained by accessing the server device from a terminal device.

(6) In the new model generation device according to the present invention, the training data generation means comprises: an acquisition means that acquires the different-background object image and the information on the objects extracted by the extracted object acquisition means; a non-background extraction means that extracts non-background parts from the different-background object image by background subtraction; and a false detection handling means that excludes from the training data the information on any object extracted by the extracted object acquisition means that has no corresponding non-background part.

Therefore, more appropriate training data can be generated by compensating for false detections made by the existing model.

(7) In the new model generation device according to the present invention, the training data generation means comprises: an acquisition means that acquires the different-background object image and the information on the objects extracted by the extracted object acquisition means; a non-background extraction means that extracts non-background parts from the different-background object image by background subtraction; and a missed detection handling means that, if there is a region corresponding to a non-background part in which the extracted object acquisition means extracted no object, makes that region of the different-background object image featureless and uses the resulting image as training data.

(8) The new model generation device according to the present invention is characterized in that the object is a moving object including any one of an athlete, a pedestrian, a car, an animal, and a ball.

Therefore, a new model can be generated with any of these as the object.

(9)-(12) The new model generation system according to the present invention comprises a new model generation server device and a terminal device configured to communicate with the server device. Based on an existing object extraction model trained to extract objects from object images having at least one object in a background, the system can extract objects from different-background object images, each having at least one object in a background different from that background.
The new model generation server device comprises: a server-side receiving means that receives the different-background object image and at least the weight parameters of the existing object extraction model, both transmitted from the terminal device; an extracted object acquisition means that gives the different-background object image to the existing object extraction model and acquires information on the extracted objects; a training data generation means that generates training data based on the different-background object image and the information on the objects extracted by the extracted object acquisition means; an additional training means that uses the training data to additionally train at least the weight parameters of the existing object extraction model, generating at least the weight parameters of a new object extraction model; and a server-side transmitting means that transmits at least the weight parameters of the generated new object extraction model to the terminal device.
The terminal device comprises: an acquisition means that acquires at least the weight parameters of the recorded existing object extraction model; a terminal-side transmitting means that transmits the different-background object image and at least the weight parameters of the existing object extraction model to the new model generation server device; a terminal-side receiving means that receives at least the weight parameters of the new object extraction model transmitted from the server device; and a new model generation means that generates the new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received ones.
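The division of work in (9)-(12) can be illustrated with a minimal in-process sketch. It assumes, hypothetically, that weights are a flat name-to-value dictionary and that the server's pseudo-labeling, training data generation and additional training are all folded into an injected `fine_tune` callable; no network transport is shown, and the names `server_generate` and `Terminal` are illustrative only.

```python
from typing import Callable, Dict, List

# Hypothetical representation: parameter name -> value. A real model would hold
# tensors, but a flat dict is enough to show who sends what to whom.
Weights = Dict[str, float]
FineTune = Callable[[Weights, List[str]], Weights]


def server_generate(images: List[str], existing: Weights,
                    fine_tune: FineTune) -> Weights:
    """Server side: receive images and existing weights, return new weights."""
    # Pseudo-labeling, training data generation and additional training are
    # folded into the injected fine_tune callable in this sketch. A copy is
    # passed so the received weights are never modified in place.
    return fine_tune(dict(existing), images)


class Terminal:
    """Terminal side: holds the recorded weights of the existing model."""

    def __init__(self, weights: Weights) -> None:
        self.weights = weights

    def request_new_model(self, images: List[str],
                          server: Callable[[List[str], Weights], Weights]) -> None:
        # Send images + existing weights, receive the new weights.
        received = server(images, self.weights)
        # Replace the existing weights with the received ones -> new model.
        self.weights = {**self.weights, **received}
```

Only weights travel in each direction, so the terminal keeps the model architecture and the server needs no per-user model storage.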

(13)(14) The object extraction device according to the present invention extracts objects from an object video in which the objects are captured. It comprises: an extracted object acquisition means that gives a portion of the object video to an existing object extraction model trained to extract objects from object images and acquires information on the extracted objects; a training data generation means that generates training data based on that portion of the object video and the information on the objects extracted by the extracted object acquisition means; an additional training means that additionally trains the existing object extraction model on the generated training data to generate a new object extraction model; and an object extraction means that uses the new object extraction model to extract objects from the part of the object video other than that portion.

Therefore, training data can be generated automatically to form a new object extraction model suited to the video from which objects are to be extracted, and the objects can be extracted accurately.

(15)(16) The new model generation system according to the present invention comprises a new model generation server device and a terminal device configured to communicate with the server device. Based on an existing object extraction model trained to extract objects from object images having at least one object in a background, the system can extract objects from different-background object images, each having at least one object in a background different from that background.
The new model generation server device comprises: a server-side receiving means that receives the different-background object image transmitted from the terminal device; an additional training data generation means that generates additional training data based on the received different-background object image; an additional training means that additionally trains the existing object extraction model with the additional training data to generate a new object extraction model; a server-side transmitting means that transmits the generated new object extraction model or its parameters to the terminal device; and a model updating means that records the new object extraction model as the existing object extraction model.
The terminal device comprises: a terminal-side transmitting means that transmits the different-background object image to the new model generation server device; and a terminal-side receiving means that receives the new object extraction model or its parameters transmitted from the server device.

Therefore, the existing model stored on the server device is updated, and becomes more accurate, each time it is used, and new models generated from it become correspondingly more accurate.

In the embodiments, step S3 corresponds to the "extracted object acquisition means."

In the embodiments, step S10 corresponds to the "training data generation means."

In the embodiments, step S21 corresponds to the "server-side receiving means."

In the embodiments, step S11 corresponds to the "additional training means."

In the embodiments, step S23 corresponds to the "server-side transmitting means."

In the embodiments, step S54 corresponds to the "terminal-side receiving means."

In the embodiments, step S55 corresponds to the "new model generation means."

In the embodiments, step S25 corresponds to the "model updating means."

The term "device" covers not only a single computer but also multiple computers connected via a network or the like. Therefore, if the means of the present invention (or some of them) are distributed across multiple computers, those computers together constitute the device.

The term "program" covers not only programs directly executable by a CPU but also programs in source form, compressed programs, encrypted programs, and the like.

FIG. 1 is a functional configuration diagram of a new model generation device according to an embodiment of the present invention.
FIG. 2 shows the hardware configuration of the new model generation device.
FIG. 3 shows the architecture of the YOLO model.
FIG. 4 is a flowchart of the new model generation program.
FIG. 5 is a flowchart of the new model generation program.
FIG. 6 shows an example of frame images used to train the existing model.
FIG. 7 is an example of a different-background frame image (different-background object image).
FIG. 8 is an image in which players have been extracted by giving the different-background frame image of FIG. 7 to the existing model.
FIG. 9 is a background subtraction image of the image in FIG. 7.
FIG. 10 is an image in which the regions of players that the existing model failed to extract have been made featureless (blacked out).
FIG. 11 is a functional configuration diagram of a new model generation system according to a second embodiment.
FIG. 12 shows the system configuration of the new model generation system.
FIG. 13 shows the hardware configuration of a new parameter generation server device.
FIG. 14 is a flowchart of a terminal program and a server program.
FIG. 15 is a flowchart of a terminal program and a server program.
FIG. 16 is a functional configuration diagram of an object extraction device according to a third embodiment.
FIG. 17 is a flowchart of an object extraction program.
FIG. 18 is a flowchart of an object extraction program.
FIG. 19 is a functional configuration diagram of a new model generation system according to a fourth embodiment.
FIG. 20 shows the system configuration of the new model generation system.
FIG. 21 is a flowchart of a terminal program and a server program.
FIG. 22 is a flowchart of a terminal program and a server program.

1. First Embodiment
1.1 Functional Configuration
FIG. 1 shows the functional configuration of a new model generation device according to an embodiment of the present invention. The existing object extraction model 4 is a neural network model trained to extract objects from object images having one or more objects in a background.

The new model generation device of this embodiment aims to obtain, from this existing object extraction model, a new object extraction model that can accurately extract objects even from object images with different backgrounds.

The extracted object acquisition means 2 acquires an object image with a different background (a different-background object image) and gives it to the existing object extraction model 4, which extracts the objects; the extracted object acquisition means 2 thereby obtains information on the extracted objects.

The training data generation means 6 generates training data based on the different-background object image and the object information. The additional training means 6 additionally trains the existing object extraction model 4 on the generated training data.

In this way, a new object extraction model that can accurately extract objects even from different-background object images can be generated without creating training data manually.

1.2 Hardware Configuration
FIG. 2 shows the hardware configuration of the new model generation device. A memory 12, a display 14, an SSD 16, a DVD-ROM drive 18, a keyboard/mouse 20, and a communication circuit 22 are connected to a CPU 10.

The communication circuit 22 is a circuit for connecting to the Internet. The SSD 16 stores an operating system 30 and a new model generation program 32, which performs its functions in cooperation with the operating system 30. These programs were recorded on a DVD-ROM 36 and installed on the SSD 16 via the DVD-ROM drive 18. The SSD 16 also stores an existing object extraction model 34.

1.3 New Model Generation Process
In this embodiment, YOLO is used as the object detection model. FIG. 3 shows the YOLO architecture. After multiple convolutional and max-pooling layers, fully connected layers output rectangle information indicating the region of each detected object (for example, the coordinates of its upper-left and lower-right corners), the object's label (object type), and the confidence of that label.

In this embodiment, players are the objects to be detected. For YOLO, a model pretrained to distinguish and detect various objects, including people, is available. Using this pretrained model as a base, additional training is performed on training data such as images of players against various field backgrounds and images of players captured from various angles.
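The three outputs described above (rectangle, label, confidence) fit naturally in a small record. A minimal sketch, assuming corner coordinates in pixels with Y increasing downward, a detector that names the human class `person` (as COCO-trained detectors do), and a hypothetical confidence threshold of 0.5; `Detection` and `keep_people` are illustrative names, not part of YOLO.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    x1: float          # upper-left x (pixels)
    y1: float          # upper-left y
    x2: float          # lower-right x
    y2: float          # lower-right y (image Y grows downward)
    label: str         # object type
    confidence: float  # confidence of the label, 0..1

    def bottom_midpoint(self) -> Tuple[float, float]:
        """Midpoint of the bottom edge, roughly where a person touches the ground."""
        return ((self.x1 + self.x2) / 2.0, self.y2)


def keep_people(dets: Iterable[Detection], min_conf: float = 0.5) -> List[Detection]:
    """A pretrained detector reports many classes; keep confident 'person' boxes."""
    return [d for d in dets if d.label == "person" and d.confidence >= min_conf]
```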

FIG. 6 shows an example of the training data used to train the existing model. As shown in FIG. 6A, frame images are sampled at a predetermined interval from video of players on a field (for example, one frame in every 10), and the region of each player is manually enclosed in a bounding box. This records each frame image together with the coordinates (upper-left and lower-right corners) of the bounding boxes indicating the positions of players, coaches, and so on; these records are the training data. The above process is performed on a large number of frames to obtain the training data.

In addition, as shown in FIG. 6B, frame images from different fields are also prepared as training data.

The YOLO model is trained on the training data shown in FIGS. 6A and 6B, yielding a basic trained model for player detection (the existing player detection model). Alternatively, an untrained YOLO model may be trained on the above training data instead of starting from a pretrained YOLO model.

FIG. 4 shows a flowchart of the new model generation program. The CPU 10 acquires a sequence of images (frame images) of players captured against a background different from the one used in the training described above (step S1). Here, "captured against a different background" refers, for example, to images of players captured on a different field, or to images of players captured on the same field from a different camera angle.

For example, frame images such as the one shown in FIG. 7 are used. In this embodiment, frame images are sampled from the captured video data (for example, one frame in every 10), and the CPU 10 acquires them. The frame images are taken from a video a few minutes long.

Next, the CPU 10 takes the first of the frame images as the target frame image and performs the following processing. Using the existing model 34 trained for player detection, it detects players in the target frame image. This yields the coordinates (upper-left and lower-right corners) of the bounding boxes surrounding players, coaches, and so on in the frame image.

The coordinates of the midpoint of the bottom edge of each bounding box are computed, and it is determined whether this point lies inside the line 45 enclosing the field in the image. If it lies inside, the person can be judged to be a player on the field. If it lies outside, the person can be judged to be a substitute, a coach, or the like, and that bounding box is deleted.

Displayed as an image, the result is a frame image in which the detected players are enclosed in bounding boxes, as shown in FIG. 8. The line 45 enclosing the field is specified in advance by the operator with the mouse while the frame image is shown on the display.
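The inside/outside test can be done with the standard even-odd (ray casting) rule. A minimal sketch, assuming the operator's line 45 is stored as a closed polygon of vertices in image coordinates and boxes are `(x1, y1, x2, y2)` tuples; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): upper-left, lower-right


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count crossings of a horizontal ray cast from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        xi, yi = polygon[i]
        xj, yj = polygon[(i + 1) % n]
        # Does edge (i, j) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that line (yi != yj here)
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
    return inside


def players_on_field(boxes: Sequence[Box], field: Sequence[Point]) -> List[Box]:
    """Keep boxes whose bottom-edge midpoint lies inside the field outline."""
    return [b for b in boxes
            if point_in_polygon(((b[0] + b[2]) / 2.0, b[3]), field)]
```

Using the bottom-edge midpoint rather than the box center matters on the touchline: a coach standing just outside the line can have a box that overlaps the field while the feet remain outside.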

Next, the CPU 10 extracts images of objects other than the background from the target frame image by background subtraction (step S4). If a background captured at a time with no shadows were used to apply background subtraction to an image captured at a time with shadows, the shadows would be extracted as object images. Therefore, this embodiment obtains the background subtraction image with a dynamic background subtraction method based on a Gaussian mixture model (MOG2).

The dynamic background subtraction method calculates a representative value for each pixel over a predetermined number of frames before and after the target frame (for example, 100 frames on each side) and generates the background image from these representative values. Thus, during a period with shadows, the background image includes the shadows, and the shadows themselves are not extracted as objects.
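The representative-value idea can be illustrated with a per-pixel temporal median over a window centered on the target frame. This is a simplified stand-in, not OpenCV's MOG2 (which models each pixel as a mixture of Gaussians updated from past frames); the window half-width default of 100 follows the example above, while the difference threshold of 30 is a hypothetical value, and frames are tiny grayscale lists for readability.

```python
from statistics import median
from typing import List, Sequence

Frame = List[List[int]]  # grayscale image as rows of pixel values


def window_background(frames: Sequence[Frame], t: int, half_width: int = 100) -> Frame:
    """Background for frame t: per-pixel median over frames t-half_width..t+half_width."""
    lo = max(0, t - half_width)
    hi = min(len(frames), t + half_width + 1)
    window = frames[lo:hi]
    rows, cols = len(frames[t]), len(frames[t][0])
    return [[median(f[r][c] for f in window) for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frames: Sequence[Frame], t: int, half_width: int = 100,
                    threshold: int = 30) -> List[List[bool]]:
    """True where frame t differs from its windowed background by more than threshold."""
    bg = window_background(frames, t, half_width)
    return [[abs(frames[t][r][c] - bg[r][c]) > threshold
             for c in range(len(bg[0]))]
            for r in range(len(bg))]
```

A shadow that persists through most of the window becomes part of the median background and is not flagged, while a player passing through a pixel for only a few frames is.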

The object images obtained in this way include small noise objects that are not players. Therefore, in this embodiment, a standard player height H (length along the Y axis) is set in advance according to the position in the Y direction shown in FIG. 9 (players farther away appear smaller), and any object whose height is one-third of the standard height H or less is judged not to be a player and deleted.
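The size filter needs a standard height H that depends on vertical position. The text says only that H is set according to Y; the sketch below assumes, hypothetically, that H varies linearly between a calibration value at the top of the image (`h_top`) and one at the bottom (`h_bottom`), and evaluates H at each blob's lower edge.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) around a foreground blob


def standard_height(y: float, image_height: float, h_top: float, h_bottom: float) -> float:
    """Hypothetical linear model: players near the top (far side) appear smaller."""
    return h_top + (h_bottom - h_top) * (y / image_height)


def drop_small_blobs(blobs: Sequence[Box], image_height: float,
                     h_top: float, h_bottom: float) -> List[Box]:
    """Delete blobs whose height is one third of the standard height H or less."""
    kept = []
    for x1, y1, x2, y2 in blobs:
        h_std = standard_height(y2, image_height, h_top, h_bottom)  # H at the feet
        if (y2 - y1) > h_std / 3.0:
            kept.append((x1, y1, x2, y2))
    return kept
```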

In addition, a rectangle enclosing each object is generated, and only objects whose rectangle has its bottom-edge midpoint inside the line 54 enclosing the field are retained. This yields images of only the players on the field. FIG. 9 shows the background subtraction image for the target frame image of FIG. 7.

Next, the CPU 10 selects any location where the existing player detection model produced a player bounding box but the background subtraction image contains no object at the corresponding position (step S5). For example, in the player detection image of FIG. 8, a player is detected in bounding box 40, but in the background subtraction image of FIG. 9, no object appears at the corresponding position. The CPU 10 therefore selects bounding box 40.

The CPU 10 then deletes the selected bounding box 40 (step S6); that is, the detection as a player is treated as never having occurred. An object detected as a player that does not appear in the background subtraction image was very likely detected in error, so it should not be used as training data.

In the example of FIG. 8, only one bounding box 40 qualifies; if several bounding boxes qualify, all of them are deleted.
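Steps S5 and S6 amount to matching detector boxes against foreground blobs. The text says only that there is "no object at the corresponding position"; the sketch below assumes that means the box matches no foreground blob with an intersection over union (IoU) of at least a hypothetical threshold of 0.3.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def drop_false_detections(detections: Sequence[Box], blobs: Sequence[Box],
                          min_iou: float = 0.3) -> List[Box]:
    """Steps S5/S6: delete every detection that matches no foreground blob."""
    return [d for d in detections
            if any(iou(d, b) >= min_iou for b in blobs)]
```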

Next, the CPU 10 selects any location where the background subtraction image of FIG. 9 contains an object but the player detection image of FIG. 8 has no bounding box (step S7). For example, object (player) 50 appears in the background subtraction image of FIG. 9, but no bounding box appears at the corresponding position in the player detection image of FIG. 8. The CPU 10 therefore selects player 50.

As shown in FIG. 10, the CPU 10 sets a player area 65 enclosing player 50 in the target frame image (a rectangle here, but any other shape that encloses the player may be used) and blacks out this area (step S8); that is, the area is treated as containing no player. Since the object (player) appears in the background subtraction image yet was not detected as a player, its image is very likely unsuitable for detection. The area is therefore blacked out to remove the object's image from the target frame image; blacking it out eliminates the image features of that area.

Instead of black, the area may be filled with a single predetermined color. Alternatively, a background image may be inserted into the object (player) area.

図１０の例では、１箇所の選手領域６５のみが該当したが、複数箇所の領域が該当する場合には、これら全ての領域を黒塗りする。 In the example of Figure 10, only one player area 65 is applicable, but if multiple areas are applicable, all of these areas will be painted black.

以上のようにして、ステップＳ７、Ｓ８の処理を経た対象フレーム画像に、選手として検出されたバウンダリーボックス（ステップＳ６で削除されたものは除く）の座標情報を付加して、学習データとする。 In this way, the coordinate information of the boundary boxes detected as players (excluding those deleted in step S6) is added to the target frame image that has been processed in steps S7 and S8, and this is used as learning data.
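The source does not state how the box coordinates are attached to each frame. One common choice when training YOLO is a text label file with one line per object, `class x_center y_center width height`, normalized by the image size. The sketch below assumes that format and a single class id 0 for players.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def to_yolo_lines(boxes: List[Box], img_w: int, img_h: int, class_id: int = 0) -> List[str]:
    """Convert corner-format boxes to normalized YOLO label lines."""
    lines = []
    for x1, y1, x2, y2 in boxes:
        cx = (x1 + x2) / 2 / img_w   # box center, as a fraction of image width
        cy = (y1 + y2) / 2 / img_h   # box center, as a fraction of image height
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

For a 200 x 100 image, the box (0, 0, 100, 50) becomes the line `0 0.250000 0.250000 0.500000 0.500000`.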

上記のように、対象フレーム画像に基づいて学習データを生成すると、ＣＰＵ１０は、次のフレームを対象フレームとして、ステップＳ３～Ｓ８の処理を繰り返す（ステップＳ２、Ｓ９）。全てのフレーム画像について処理を終えると、追加学習のための学習データが得られることになる（ステップＳ１０）。 Once learning data is generated based on the target frame image as described above, the CPU 10 repeats steps S3 to S8 with the next frame as the target frame (steps S2 and S9). When processing has been completed for all frame images, learning data for additional learning will be obtained (step S10).
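The per-frame loop of steps S2 to S10 can be summarized as follows. Each step is passed in as a function, so the names `detect`, `foreground`, `drop_unsupported`, `find_missed`, and `mask` are hypothetical placeholders for the processing described above, not names from the source.

```python
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def build_training_set(
    frames: Iterable[Any],
    detect: Callable[[Any], List[Box]],
    foreground: Callable[[Any], Any],
    drop_unsupported: Callable[[List[Box], Any], List[Box]],
    find_missed: Callable[[Any, List[Box]], List[Box]],
    mask: Callable[[Any, Box], Any],
) -> List[Tuple[Any, List[Box]]]:
    """Return one (masked image, kept boxes) training sample per frame."""
    samples = []
    for frame in frames:                         # S2/S9: next target frame
        dets = detect(frame)                     # S3: existing-model detection
        fg = foreground(frame)                   # S4: background subtraction
        dets = drop_unsupported(dets, fg)        # S5/S6: delete unsupported boxes
        image = frame
        for region in find_missed(fg, dets):     # S7: objects the model missed
            image = mask(image, region)          # S8: black out each such region
        samples.append((image, dets))
    return samples                               # S10: data for additional training (S11)
```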

生成した学習データは、既存モデルの学習データの背景には含まれていなかった、新たな背景を有する。したがって、この学習データによって、既存モデルを追加学習することで、新たな背景についても精度よく判定を行うことのできるモデルを得ることができる。 The generated training data contains new backgrounds that were not included in the training data of the existing model. Therefore, by additionally training the existing model with this training data, it is possible to obtain a model that makes accurate determinations on images with the new backgrounds as well.

ＣＰＵ１０は、このようにして生成した学習データを用いて、既存モデル３４を追加学習する（ステップＳ１１）。これにより、新たな背景にも対応して精度よく選手を検出できる検出モデルを得ることができる。
The CPU 10 uses the training data generated in this way to additionally train the existing model 34 (step S11), thereby obtaining a detection model that can accurately detect players even against new backgrounds.

1.4その他 1.4 Other
(1)上記実施形態では、選手のみを検出対象としている。しかし、選手に代えてあるいは選手に加えてボール、コーチ、控え選手などを検出対象としてもよい。 (1) In the above embodiment, only players are detected. However, instead of or in addition to players, balls, coaches, substitute players, etc. may also be detected.

また、選手だけでなく歩行者などの人物や自動車などの移動物全般に適用することができる。 The invention can also be applied not only to players but to moving objects in general, such as people (e.g., pedestrians) and vehicles (e.g., cars).

(2)上記実施形態では、競技エリア内の人物（選手）を検出対象としている。しかし、競技エリア外の人物（選手）も検出対象としてもよい。 (2) In the above embodiment, people (players) within the competition area are the detection targets. However, people (players) outside the competition area may also be the detection targets.

(3)上記実施形態では、生成した学習データを用いて所定の背景にて学習された既存モデルを追加学習している。しかし、生成した学習データを用いて未学習のモデル（あるいは、ＹＯＬＯにて提供されている学習済モデル）を学習するようにしてもよい。 (3) In the above embodiment, the generated training data is used to additionally train an existing model that was trained against a predetermined background. However, the generated training data may instead be used to train an untrained model (or a pretrained model provided with YOLO).

(4)上記実施形態では、フレーム画像を用いて学習データを生成している。しかし、フレーム画像以外の静止画像を用いて学習データを生成するようにしてもよい。 (4) In the above embodiment, training data is generated using frame images. However, training data may also be generated using still images other than frame images.

(5)上記実施形態では、ステップＳ８において、対象フレーム画像の選手を黒塗りするようにしている。しかし、対象フレーム画像はそのままにして、当該選手を囲うようにバウンダリーボックスを追加するようにしてもよい。 (5) In the above embodiment, in step S8, the player in the target frame image is painted black. However, it is also possible to leave the target frame image as is and add a boundary box to surround the player.

(6)この実施形態では、YOLO(You Only Look Once)による物体検出のモデルを用いて、撮像画像から対象物（選手）の位置と領域を認識し、バウンダリーボックスにて対象物を囲って出力するようにしている。なお、YOLO以外の、SSD(Single Shot MultiBox Detector)、R-CNN(Regions with CNN features)などの物体検出(Object Detection)のアルゴリズムを用いてもよい。また、Mask R-CNNやYOLACTなどのインスタンス・セグメンテーション(Instance Segmentation)を用いてもよい。 (6) In this embodiment, a YOLO (You Only Look Once) object detection model is used to recognize the position and area of the object (player) in the captured image and to output the object surrounded by a boundary box. Object detection algorithms other than YOLO, such as SSD (Single Shot MultiBox Detector) and R-CNN (Regions with CNN features), may also be used, as may instance segmentation methods such as Mask R-CNN and YOLACT.

(7)上記変形例は、その本質に反しない限り、他の実施形態においても適用することができる。
(7) The above modifications can be applied to other embodiments as long as they do not contradict the essence of the embodiments.

２．第２の実施形態 2. Second Embodiment
2.1機能構成 2.1 Functional Configuration
図１１に、第２の実施形態による新規モデル生成システムの機能構成を示す。このシステムは、端末装置８０、パラメータ生成サーバ装置６０を備えている。 Figure 11 shows the functional configuration of the new model generation system according to the second embodiment. This system includes a terminal device 80 and a parameter generation server device 60.

端末装置８０には、所定の背景中に少なくとも一つの対象物を有する対象物画像に基づいて学習された、対象物抽出既存モデル８４が記録されている。この実施形態では、第１の実施形態と同じようにＹＯＬＯをモデルとして用いている。したがって、対象物抽出既存モデル８４は、ＹＯＬＯプログラム、ハイパーパラメーター（コンボリューションをいくつ用意するか、ニューラルネットワークの層の数、正規化の係数など）、各ノード間を結合する重みのパラメータを備えて構築されている。 The terminal device 80 stores an existing object extraction model 84 trained on object images each containing at least one object against a predetermined background. In this embodiment, YOLO is used as the model, as in the first embodiment. The existing object extraction model 84 is therefore made up of the YOLO program, hyperparameters (such as the number of convolutions, the number of neural network layers, and normalization coefficients), and the weight parameters of the connections between nodes.

このシステムは、端末装置８０に記録されている対象物抽出既存モデル８４を、新たな背景の画像（異背景対象物画像）に対応して対象物を検出するように追加学習するシステムである。 This system additionally trains the existing object extraction model 84 stored in the terminal device 80 so that it can detect objects in images with a new background (different background object images).

端末装置８０の取得手段８２は、対象物抽出既存モデル８４の重みパラメータを取得する。端末側送信手段８６は、この重みパラメータおよび異背景対象物画像８８をサーバ装置６０に送信する。この実施形態では、ハイパーパラメータは予め固定されたものを用いるようにしているので、サーバ装置６０には送信しない。 The acquisition means 82 of the terminal device 80 acquires the weight parameters of the existing object extraction model 84. The terminal-side transmission means 86 transmits these weight parameters and the different background object image 88 to the server device 60. In this embodiment, the hyperparameters are fixed in advance, so they are not transmitted to the server device 60.
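The source does not define the message format between the terminal device 80 and the server device 60. Purely as an illustration, the hypothetical payload below bundles the weight parameters and the encoded different background images in JSON and leaves out the hyperparameters, which are fixed in advance in this embodiment. A practical system would more likely send the weights as a binary file.

```python
import base64
import json
from typing import Dict, List, Tuple


def build_request(weights: Dict[str, List[float]], images: List[bytes]) -> str:
    """Hypothetical payload from the terminal-side transmitting means 86.
    Hyperparameters are omitted because they are fixed in advance."""
    return json.dumps({
        "weights": weights,
        "images": [base64.b64encode(b).decode("ascii") for b in images],
    })


def parse_request(payload: str) -> Tuple[Dict[str, List[float]], List[bytes]]:
    """Server side (receiving means 62): recover the weights and image bytes."""
    msg = json.loads(payload)
    return msg["weights"], [base64.b64decode(s) for s in msg["images"]]
```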

サーバ側受信手段６２は、これを受信する。抽出対象物取得手段６４は、受信した重みパラメータに基づいて対象物既存モデル７０を構築し、これに受信した異背景対象物画像８８を与えて、対象物を検出させる。抽出対象物取得手段６４は、検出された対象物の情報（バウンダリーボックスの座標）を取得する。 The server-side receiving means 62 receives them. The extracted object acquisition means 64 builds an existing object model 70 from the received weight parameters and inputs the received different background object image 88 into it to detect objects. The extracted object acquisition means 64 then acquires information about the detected objects (boundary box coordinates).

学習データ生成手段６６は、端末装置からの異背景対象物画像８８と検出された対象物の情報とに基づいて、学習データを生成する。追加学習手段６８は、対象物既存モデル７０を、生成した学習データにより追加学習し、新規重みパラメータ７２を得る。 The training data generation means 66 generates training data based on the different background object image 88 from the terminal device and the information about the detected objects. The additional training means 68 additionally trains the existing object model 70 with the generated training data to obtain new weight parameters 72.

サーバ側送信手段７４は、新規重みパラメータ７２を端末装置に送信する。端末側受信手段９４は、新規重みパラメータ７２を受信する。新規モデル生成手段９０は、対象物抽出既存モデル８４の重みパラメータを、受信した新規重みパラメータに置き換えて、対象物抽出新規モデル９２を生成する。 The server-side transmitting means 74 transmits the new weight parameters 72 to the terminal device. The terminal-side receiving means 94 receives the new weight parameters 72. The new model generating means 90 replaces the weight parameters of the existing object extraction model 84 with the received new weight parameters to generate a new object extraction model 92.

以上のようにして、端末装置８０の側から重みパラメータと異背景対象物画像をサーバ装置６０に送信することで、新規重みパラメータを得て、対象物抽出新規モデル９２を生成することができる。
In this manner, by transmitting the weight parameters and the different background object image from the terminal device 80 to the server device 60, new weight parameters can be obtained and a new object extraction model 92 can be generated.

2.2システム構成・ハードウエア構成 2.2 System Configuration and Hardware Configuration
図１２に、新規モデル生成システムのシステム構成を示す。パラメータ生成サーバ装置６０は、インターネット上のサーバ装置として構築されている。端末装置８０ａ、８０ｂ・・・８０ｘは、インターネットを介してパラメータ生成サーバ装置６０と通信可能である。 Fig. 12 shows the system configuration of the new model generation system. The parameter generation server device 60 is constructed as a server device on the Internet. Terminal devices 80a, 80b, ..., 80x can communicate with the parameter generation server device 60 via the Internet.

端末装置８０のハードウエア構成は、第１の実施形態において示した図２と同様である。ただし、ＳＳＤ１６には、新規モデル生成プログラム３２に代えて、端末プログラムが記録されている。 The hardware configuration of the terminal device 80 is the same as that shown in Figure 2 in the first embodiment. However, a terminal program is stored on the SSD 16 instead of the new model generation program 32.

図１３に、パラメータ生成サーバ装置６０のハードウエア構成を示す。ＣＰＵ１１０には、メモリ１１２、ＳＳＤ１１４、ＤＶＤ－ＲＯＭドライブ１１６、通信回路１１８が接続されている。通信回路１１８は、インターネットに接続するための回路である。 Figure 13 shows the hardware configuration of the parameter generation server device 60. The CPU 110 is connected to a memory 112, an SSD 114, a DVD-ROM drive 116, and a communication circuit 118. The communication circuit 118 is a circuit for connecting to the Internet.

ＳＳＤ１１４には、オペレーティングシステム１２０、サーバプログラム１２２、対象物抽出未学習モデル１２４が記録されている。これらプログラムは、ＤＶＤ－ＲＯＭ１２６に記録されていたものを、ＤＶＤ－ＲＯＭドライブ１１６を介してＳＳＤ１１４にインストールしたものである。なお、インターネットを介して他のサーバ装置からインストールしてもよい。
The SSD 114 stores an operating system 120, a server program 122, and an untrained object extraction model 124. These were recorded on a DVD-ROM 126 and installed on the SSD 114 via the DVD-ROM drive 116. They may also be installed from another server device via the Internet.

2.3新規モデル生成処理 2.3 New Model Generation Process
図１４、図１５に、サーバプログラム１２２、端末プログラムのフローチャートを示す。 FIGS. 14 and 15 show flowcharts of the server program 122 and the terminal program.

端末装置８０のＳＳＤ１６には、ＹＯＬＯなどによって構築された対象物抽出既存モデル３４が記録されている。この対象物抽出既存モデル３４は、第１の実施形態と同じように、様々な背景にて撮像された選手の画像（たとえば図６Ａや図６Ｂのような画像）にて学習されたものである。 The SSD 16 of the terminal device 80 stores an existing object extraction model 34 constructed by YOLO or similar. As with the first embodiment, this existing object extraction model 34 has been trained using images of players captured against various backgrounds (for example, images such as those shown in Figures 6A and 6B).

端末装置８０のＣＰＵ１０（以下、端末装置８０と省略することがある）は、上記の学習に用いた背景とは異なる背景にて撮像された選手の連続画像（フレーム画像）を取得する（ステップＳ５１）。ここで、「異なる背景にて撮像された」とは、異なるフィールドで撮像された選手の画像や、同じフィールドで撮像され異なる撮像アングルにて撮像された選手の画像などをいうものである。 The CPU 10 of the terminal device 80 (hereinafter sometimes referred to as the terminal device 80) acquires a series of images (frame images) of a player captured against a background different from the background used in the learning (step S51). Here, "captured against a different background" refers to images of a player captured on a different field, or images of a player captured on the same field but from a different angle, etc.

たとえば、図７に示すようなフレーム画像を用いる。この実施形態においては、撮像した動画データからフレーム画像を間引いて抽出し（たとえば１０枚ごとに１枚のフレーム画像を取り出す）、端末装置８０はこれらを取得する。この実施形態では、数分程度の動画からフレーム画像を取り出すようにしている。 For example, frame images such as those shown in Figure 7 are used. In this embodiment, frame images are thinned out from the captured video data (for example, one frame out of every 10 is taken), and the terminal device 80 acquires them. The frames are taken from a video a few minutes long.
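Thinning frames as described above amounts to keeping every `step`-th frame. The generator below is a minimal sketch; in practice the frames would come from a video reader such as OpenCV's `cv2.VideoCapture`, which is an assumption here, not something the source names.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def subsample(frames: Iterable[T], step: int = 10) -> Iterator[T]:
    """Yield one frame out of every `step` frames (frames 0, step, 2*step, ...)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    for i, frame in enumerate(frames):
        if i % step == 0:
            yield frame
```

For example, a 3-minute clip at an assumed 30 fps has 5,400 frames, of which 540 are kept with `step=10`.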

次に、端末装置８０は、ＳＳＤ１６に記録されている学習済の対象物抽出既存モデル３４の重みパラメータを取得する（ステップＳ５２）。続いて、端末装置８０は、ステップＳ５１において取得した異背景のフレーム画像群と、ステップＳ５２において取得した重みパラメータを、パラメータ生成サーバ装置６０に送信する（ステップＳ５３）。 Next, the terminal device 80 acquires the weight parameters of the trained existing object extraction model 34 recorded on the SSD 16 (step S52). The terminal device 80 then transmits the group of different background frame images acquired in step S51 and the weight parameters acquired in step S52 to the parameter generation server device 60 (step S53).

パラメータ生成サーバ装置６０のＣＰＵ１１０（以下、パラメータ生成サーバ装置６０と省略することがある）は、異背景のフレーム画像群および既存モデルの重みパラメータを受信する（ステップＳ２１）。続いて、パラメータ生成サーバ装置６０は、ＳＳＤ１１４に記録されている未学習の対象物抽出モデル１２４を読み出して、その未学習パラメータを、受信した既存モデルの重みパラメータで置き換えて、対象物抽出既存モデルを生成する（ステップＳ２２）。 The CPU 110 of the parameter generation server device 60 (hereinafter sometimes simply the parameter generation server device 60) receives the group of different background frame images and the weight parameters of the existing model (step S21). Next, the parameter generation server device 60 reads the untrained object extraction model 124 recorded on the SSD 114 and replaces its untrained parameters with the received weight parameters of the existing model, thereby generating an existing object extraction model (step S22).
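Step S22 (and, on the terminal side, step S55) replaces one set of parameters with another of the same structure. The sketch below models parameters as a hypothetical dictionary from layer name to a flat list of values and rejects mismatched names or sizes, which is possible here because the hyperparameters, and therefore the layer shapes, are fixed in advance. In PyTorch the analogous call would be `model.load_state_dict(...)`, which by default also raises an error on missing or unexpected keys and on shape mismatches.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical: layer name -> flattened weights


def load_weights(untrained: Params, received: Params) -> Params:
    """Return a model whose parameters are the received ones, after checking
    that they fit the untrained model's fixed architecture."""
    if untrained.keys() != received.keys():
        raise ValueError("parameter names differ")
    for name, values in received.items():
        if len(values) != len(untrained[name]):
            raise ValueError(f"size mismatch for {name}")
    return {name: list(values) for name, values in received.items()}  # copy the lists
```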

次に、パラメータ生成サーバ装置６０は、受信した異背景フレーム画像のうちの先頭の１枚を対象フレーム画像とし、以下の処理を行う。ステップＳ２２にて生成した対象物抽出既存モデルを用いて、対象フレーム画像について、選手の検出を行わせる（ステップＳ３）。これにより、フレーム画像上における、選手やコーチなどを取り囲むバウンダリーボックスの座標（左上と右下の点の座標）を得ることができる。第１の実施形態と同じようにして、フィールド内の選手のバウンダリーボックスのみを残す。これを画像として表示すると、図８に示すように、検出した選手をバウンダリーボックスで囲ったフレーム画像となる。 Next, the parameter generation server device 60 selects the first of the received different background frame images as the target frame image and performs the following process. Using the existing object extraction model generated in step S22, players are detected for the target frame image (step S3). This allows the coordinates (coordinates of the upper left and lower right points) of the boundary boxes surrounding players, coaches, etc. on the frame image to be obtained. In the same way as in the first embodiment, only the boundary boxes of the players on the field are left. When this is displayed as an image, the frame image will have the detected players surrounded by boundary boxes, as shown in Figure 8.
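The criterion the first embodiment uses to keep only players on the field is not reproduced in this section. One plausible approach, assumed here, is to keep a box only if its bottom-center point (roughly the player's feet) falls inside a polygon outlining the field in image coordinates, tested with the standard ray-casting rule.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]


def inside(pt: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting point-in-polygon test: count edge crossings to the right of pt."""
    x, y = pt
    hit = False
    n = len(poly)
    for i in range(n):
        (xa, ya), (xb, yb) = poly[i], poly[(i + 1) % n]
        if (ya > y) != (yb > y):  # edge straddles the horizontal line through pt
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                hit = not hit
    return hit


def on_field(boxes: List[Box], field: Sequence[Point]) -> List[Box]:
    """Keep boxes whose bottom-center point lies inside the field polygon."""
    return [b for b in boxes if inside(((b[0] + b[2]) / 2, b[3]), field)]
```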

次に、パラメータ生成サーバ装置６０は、対象フレーム画像について、背景差分法により背景以外の物体画像を抽出する（ステップＳ４）。第１の実施形態と同じように、混合正規分布による動的背景差分法（ＭＯＧ２）を用いて背景差分画像を得るようにしている。また、Ｙ軸方向の長さによって小さなノイズを削除する方法も第１の実施形態と同じである。さらに、第１の実施形態と同じようにして、図９に示すように、フィールド内の選手の画像のみを得るようにしている。 Next, the parameter generation server device 60 extracts images of objects other than the background from the target frame image by background subtraction (step S4). As in the first embodiment, the background difference image is obtained with a dynamic background subtraction method based on a Gaussian mixture model (MOG2). Small noise is removed based on length in the Y-axis direction, also as in the first embodiment. Likewise, only images of the players on the field are obtained, as shown in Figure 9.
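The noise removal by Y-axis length can be illustrated on a binary foreground mask: group foreground pixels into 4-connected regions and discard regions whose vertical extent is below a threshold. The sketch assumes the mask is a list of rows of 0/1 values and that `min_height` is a tuning parameter. With OpenCV, the mask would typically come from `cv2.createBackgroundSubtractorMOG2()` (which by default marks shadow pixels with a gray value, so the mask would need thresholding first) and the regions from `cv2.connectedComponentsWithStats`.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive


def blobs(mask: List[List[int]], min_height: int) -> List[Box]:
    """Bounding boxes of 4-connected foreground regions whose vertical
    extent is at least min_height pixels; shorter regions are treated as noise."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            q = deque([(sx, sy)])  # breadth-first flood fill from this seed
            x1, y1, x2, y2 = sx, sy, sx + 1, sy + 1
            while q:
                x, y = q.popleft()
                x1, y1 = min(x1, x), min(y1, y)
                x2, y2 = max(x2, x + 1), max(y2, y + 1)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        q.append((nx, ny))
            if y2 - y1 >= min_height:  # Y-axis length filter
                out.append((x1, y1, x2, y2))
    return out
```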

次に、パラメータ生成サーバ装置６０は、既存の選手検出モデルにて検出された選手のバウンダリーボックスがあるにも拘わらず、対応する位置に、差分として検出された物体画像がない箇所を選択する（ステップＳ５）。たとえば、図８の選手検出画像においては、バウンダリーボックス４０として選手が検出されているが、図９の背景差分画像においては、対応する位置に物体が現れていない。したがって、パラメータ生成サーバ装置６０は、このバウンダリーボックス４０を選択することになる。 Next, the parameter generation server device 60 selects a location where there is a boundary box of a player detected using an existing player detection model, but where there is no object image detected as a difference at the corresponding position (step S5). For example, in the player detection image of Figure 8, a player is detected as boundary box 40, but in the background difference image of Figure 9, no object appears at the corresponding position. Therefore, the parameter generation server device 60 selects this boundary box 40.

続いて、パラメータ生成サーバ装置６０は、選択したバウンダリーボックス４０を削除する（ステップＳ６）。すなわち、選手としての検出がなかったものとする。背景差分画像に物体が現れていないにも拘わらず、選手として検出するということは、誤って検出した可能性が高い。したがって、これを学習データとして採用することは好ましくないためである。 The parameter generation server device 60 then deletes the selected boundary box 40 (step S6). In other words, it is considered that it was not detected as a player. If an object is detected as a player even though it does not appear in the background subtraction image, it is highly likely that it was detected incorrectly. Therefore, it is not desirable to use this as training data.
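For steps S5 and S6, one way to decide that a detection box has no corresponding object in the background difference image is to measure the fraction of foreground pixels inside the box. The sketch below assumes that criterion and a hypothetical `min_ratio` threshold; the source only states that boxes without a corresponding difference object are deleted.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive


def fg_ratio(mask: List[List[int]], box: Box) -> float:
    """Fraction of foreground pixels inside box, after clipping it to the mask."""
    h, w = len(mask), len(mask[0])
    x1, y1 = max(0, box[0]), max(0, box[1])
    x2, y2 = min(w, box[2]), min(h, box[3])
    area = max(0, x2 - x1) * max(0, y2 - y1)
    if area == 0:
        return 0.0
    count = sum(1 for y in range(y1, y2) for x in range(x1, x2) if mask[y][x])
    return count / area


def drop_unsupported(dets: List[Box], mask: List[List[int]], min_ratio: float = 0.1) -> List[Box]:
    """Steps S5/S6 sketch: delete detections with too little foreground support."""
    return [d for d in dets if fg_ratio(mask, d) >= min_ratio]
```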

次に、パラメータ生成サーバ装置６０は、図９の背景差分画像に物体があるにも拘わらず、図８の選手検出画像においてバウンダリーボックスが現れていない箇所を選択する（ステップＳ７）。たとえば、図９の背景差分画像においては物体（選手）５０が現れているが、図８の選手検出画像においては対応する位置にバウンダリーボックスが現れていない。したがって、パラメータ生成サーバ装置６０は、選手５０を選択することになる。 Next, the parameter generation server device 60 selects a location where an object is present in the background difference image of Figure 9 but no boundary box appears in the player detection image of Figure 8 (step S7). For example, object (player) 50 appears in the background difference image of Figure 9, but no boundary box appears in the corresponding position in the player detection image of Figure 8. Therefore, the parameter generation server device 60 selects player 50.

パラメータ生成サーバ装置６０は、図１０に示すように、対象フレーム画像において選手５０を囲う選手領域６５を設定し、この領域を黒塗りする（ステップＳ８）。すなわち、その領域には選手はいなかったものとする。背景差分画像において物体（選手）が現れているにも拘わらずこれを選手として検出できなかったのであるから、当該物体（選手）の画像が検出には適さない画像である可能性が高い。したがって、当該物体（選手）の画像を対象フレーム画像からなくすために、上記のように黒塗りにする。黒塗りにすることによって、その領域における画像的な特徴を無くすようにしている。 As shown in Figure 10, the parameter generation server device 60 sets a player area 65 surrounding the player 50 in the target frame image and paints this area black (step S8). In other words, it is assumed that there was no player in that area. Because an object (player) appeared in the background difference image but could not be detected as a player, it is highly likely that the image of the object (player) is not suitable for detection. Therefore, in order to remove the image of the object (player) from the target frame image, it is painted black as described above. Painting it black eliminates the image characteristics of that area.

上記のように、対象フレーム画像に基づいて学習データを生成すると、パラメータ生成サーバ装置６０は、次のフレームを対象フレームとして、ステップＳ３～Ｓ８の処理を繰り返す（ステップＳ２、Ｓ９）。全てのフレーム画像について処理を終えると、追加学習のための学習データが得られることになる（ステップＳ１０）。 Once learning data is generated based on the target frame image as described above, the parameter generation server device 60 repeats steps S3 to S8 with the next frame as the target frame (steps S2 and S9). Once processing has been completed for all frame images, learning data for additional learning will be obtained (step S10).

パラメータ生成サーバ装置６０は、このようにして生成した学習データを用いて、ステップＳ２２にて生成した既存モデルを追加学習する（ステップＳ１１）。これにより、新たな背景にも対応して精度よく選手を検出できる検出モデルを得ることができる。 The parameter generation server device 60 uses the training data generated in this way to additionally train the existing model generated in step S22 (step S11). This makes it possible to obtain a detection model that can accurately detect players even against new backgrounds.

パラメータ生成サーバ装置６０は、このようにして追加学習したモデルの重みパラメータを取得し、端末装置８０に送信する（ステップＳ２３）。 The parameter generation server device 60 obtains the weight parameters of the model that has been additionally trained in this way and transmits them to the terminal device 80 (step S23).

端末装置８０は、パラメータ生成サーバ装置６０から送信されてきた重みパラメータを受信する（ステップＳ５４）。端末装置８０は、ＳＳＤ１６に記録されている対象物抽出既存モデル３４のパラメータを、受信した重みパラメータにて置き換えて、対象物抽出新規モデルを生成する（ステップＳ５５）。上記のようにして、端末装置８０から、既存モデルの重みパラメータ、異背景対象物画像を、パラメータ生成サーバ装置６０に送信することで、追加学習されたパラメータを得て、対象物抽出新規モデルを生成することができる。
The terminal device 80 receives the weight parameters transmitted from the parameter generation server device 60 (step S54). The terminal device 80 replaces the parameters of the existing object extraction model 34 recorded on the SSD 16 with the received weight parameters to generate a new object extraction model (step S55). By transmitting the weight parameters of the existing model and the different background object images from the terminal device 80 to the parameter generation server device 60 in this manner, the terminal can obtain additionally trained parameters and generate a new object extraction model.

2.4その他 2.4 Other
(1)上記実施形態では、既存モデルの重みパラメータを、端末装置８０からサーバ装置６０に送信するようにしている。しかし、重みパラメータだけでなく、ハイパーパラメータも送信するようにしてもよい。この場合、パラメータ生成サーバ装置６０は、重みパラメータだけでなくハイパーパラメータも追加学習し、端末装置８０に送信する。端末装置８０は、これを受けて、重みパラメータ、ハイパーパラメータに基づいて、対象物抽出新規モデルを生成する。 (1) In the above embodiment, the weight parameters of the existing model are transmitted from the terminal device 80 to the server device 60. However, the hyperparameters may be transmitted as well. In this case, the parameter generation server device 60 additionally learns the hyperparameters along with the weight parameters and transmits both to the terminal device 80, which then generates a new object extraction model based on the received weight parameters and hyperparameters.

(2)上記実施形態では、端末装置８０からサーバ装置６０に対して、重みパラメータを送信している。しかし、対象物抽出既存モデル全体をサーバ装置６０に送信し、サーバ装置６０において追加学習をして、対象物抽出新規モデルを返信してもらうようにしてもよい。 (2) In the above embodiment, weight parameters are transmitted from the terminal device 80 to the server device 60. However, the entire existing object extraction model may be transmitted to the server device 60, which may then perform additional learning and return a new object extraction model.

(3)上記変形例は、その本質に反しない限り、他の実施形態においても適用することができる。
(3) The above modifications can be applied to other embodiments as long as they do not contradict the essence of the embodiments.

３．第３の実施形態 3. Third Embodiment
3.1機能構成 3.1 Functional Configuration
図１６に、第３の実施形態による対象物抽出装置の機能構成を示す。 FIG. 16 shows the functional configuration of the object extraction device according to the third embodiment.

対象物抽出既存モデル４は、対象物動画を構成する各対象物画像（静止画）から、対象物を抽出するように学習されたニューラル・ネットワーク・モデルである。 The existing object extraction model 4 is a neural network model trained to extract objects from each object image (still image) that makes up an object video.

抽出対象物取得手段２は、対象物動画（対象物抽出既存モデル４の学習に用いた対象物動画とは異なるもの）の一部（たとえば開始数百フレームの動画）を取得する。これを、対象物抽出既存モデル４に与えて、対象物を抽出させて、抽出された対象物情報を得る。 The extracted object acquisition means 2 acquires a portion (e.g., the first few hundred frames) of the object video (different from the object video used to train the object extraction existing model 4). This is provided to the object extraction existing model 4 to extract the object and obtain extracted object information.

学習データ生成手段６は、対象物動画と対象物情報に基づいて、学習データを生成する。追加学習手段６は、生成された学習データに基づいて、対象物抽出既存モデル４を追加学習する。対象物抽出手段７は、対象物動画を対象物抽出新規モデル９に与えて、対象物動画から対象物を抽出する。 The training data generation means 6 generates training data based on the object video and object information. The additional training means 6 additionally trains the object extraction existing model 4 based on the generated training data. The object extraction means 7 provides the object video to the object extraction new model 9 to extract the object from the object video.

このように、学習データを人手によって作成することなく、対象物を抽出しようとする対象物動画についても精度良く対象物を抽出することのできる対象物抽出新規モデル９を得て、対象物を抽出することができる。
In this way, without manually creating training data, it is possible to obtain a new object extraction model 9 that accurately extracts objects even from the object video being processed, and to extract the objects with it.

3.2ハードウエア構成 3.2 Hardware Configuration
ハードウエア構成は、第１の実施形態と同様である。ただし、ＳＳＤ１６には、新規モデル生成プログラム３２に代えて、対象物抽出プログラムが記録されている。
The hardware configuration is the same as that of the first embodiment, except that the SSD 16 stores an object extraction program instead of the new model generation program 32.

3.3対象物抽出処理 3.3 Object Extraction Process
図１７、図１８に、対象物抽出プログラムのフローチャートを示す。ＣＰＵ１０は、選手を抽出しようとする動画の開始から所定フレーム（たとえば１０００フレーム）を取得する（ステップＳ１）。選手を抽出しようとする動画は、当然ではあるが、既存モデル３４を学習する際に用いた動画とは異なる動画である（異なる背景の場合や背景が同じで選手が異なる場合などを含む）。 FIGS. 17 and 18 show flowcharts of the object extraction program. The CPU 10 acquires a predetermined number of frames (for example, 1000 frames) from the start of the video from which players are to be extracted (step S1). This video is, naturally, different from the video used to train the existing model 34 (including cases where the background differs, or where the background is the same but the players differ).

次に、ＣＰＵ１０は、対象フレーム画像について、背景差分法により背景以外の物体画像を抽出する（ステップＳ４）。この処理は、第1の実施形態と同じである。 Next, the CPU 10 extracts object images other than the background from the target frame image using background subtraction (step S4). This process is the same as in the first embodiment.

上記のように、対象フレーム画像に基づいて学習データを生成すると、ＣＰＵ１０は、次のフレームを対象フレームとして、ステップＳ３～Ｓ８の処理を繰り返す（ステップＳ２、Ｓ９）。取得した全てのフレーム画像について処理を終えると、追加学習のための学習データが得られることになる（ステップＳ１０）。 Once learning data is generated based on the target frame image as described above, the CPU 10 repeats steps S3 to S8 with the next frame as the target frame (steps S2 and S9). Once processing has been completed for all acquired frame images, learning data for additional learning will be obtained (step S10).

生成した学習データは、既存モデルの学習データとは異なり、これから選手を抽出しようとする動画の一部を用いて生成したものである。したがって、この学習データによって、既存モデルを追加学習することで、新たな動画についても精度よく判定を行うことのできる新規モデルを得ることができる。 The generated training data differs from the training data of existing models in that it is generated using a portion of the video from which players are to be extracted. Therefore, by additionally training the existing model using this training data, it is possible to obtain a new model that can make accurate judgments on new videos as well.

ＣＰＵ１０は、このようにして生成した学習データを用いて、既存モデル３４を追加学習する（ステップＳ１１）。これにより、新たな動画にも対応して精度よく選手を検出できる新規モデルを得ることができる。 The CPU 10 uses the training data generated in this way to additionally train the existing model 34 (step S11). This makes it possible to obtain a new model that can accurately detect players in new videos as well.

次に、ＣＰＵ１０は、動画の全てのフレームを取得する（ステップＳ１２）。次に、上記で生成した新規モデルを用いて、動画の全フレームについて選手検出を行う（ステップＳ１４）。これにより、選手の追跡などを行うことができる。 Next, the CPU 10 acquires all frames of the video (step S12). Next, using the new model generated above, player detection is performed for all frames of the video (step S14). This makes it possible to track players, etc.
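The overall flow of the third embodiment can be sketched as follows. The training data generation (steps S2 to S10) and the additional training (step S11) are passed in as hypothetical functions; `n_head` corresponds to the predetermined number of leading frames acquired in step S1.

```python
from typing import Any, Callable, List, Sequence


def extract_from_video(
    frames: Sequence[Any],
    n_head: int,
    make_training_data: Callable[[Sequence[Any]], list],
    fine_tune: Callable[[list], Callable[[Any], list]],
) -> List[list]:
    """Adapt the detector to this video using its own leading frames, then
    run the adapted detector on every frame."""
    head = frames[:n_head]                   # S1: leading frames, e.g. 1000
    data = make_training_data(head)          # S2-S10: automatically generated training data
    new_model = fine_tune(data)              # S11: additional training of the existing model
    return [new_model(f) for f in frames]    # S12-S14: detect players in every frame
```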

3.4その他 3.4 Other
(1)上記実施形態では、動画の最初の部分のフレームを用いて学習データを生成している。しかし、動画の他の一部を用いて学習データを生成するようにしてもよい。 (1) In the above embodiment, the training data is generated using frames from the beginning of the video. However, the training data may be generated using other parts of the video.

(2)上記実施形態では、動画の全フレームについて新規モデルによって選手検出を行うようにしている。しかし、動画の一部については既存モデルによる選手検出結果（学習データ作成の際の検出結果）を用いて、他の部分については新規モデルによる選手検出結果を用いるようにしてもよい。 (2) In the above embodiment, player detection is performed using a new model for all frames of the video. However, it is also possible to use player detection results using an existing model (detection results obtained when creating training data) for part of the video, and player detection results using a new model for other parts.
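Variation (2) can be sketched by reusing the detections already obtained with the existing model for the leading frames and applying the new model only to the remaining frames. The function below is a minimal illustration with a hypothetical `new_model` callable and assumes the reused results cover a prefix of the video.

```python
from typing import Any, Callable, List, Sequence


def merge_results(
    head_results: List[list],
    new_model: Callable[[Any], list],
    frames: Sequence[Any],
) -> List[list]:
    """Keep existing-model detections for the leading frames already processed
    during training data creation; run the new model on the remaining frames."""
    n = len(head_results)
    return list(head_results) + [new_model(f) for f in frames[n:]]
```

This avoids running detection twice on the leading frames, at the cost of using the existing model's results for them.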

４．第４の実施形態 4. Fourth Embodiment
4.1機能構成 4.1 Functional Configuration
図１９に、第４の実施形態による新規モデル生成システムの機能構成を示す。端末側送信手段８６は、対象物画像（先の実施形態における異背景対象物画像に対応する）を、新規モデル生成サーバ装置９０に送信する。サーバ側受信手段６２はこれを受信する。学習データ生成手段６６は、この対象物画像に基づいて追加学習のための学習データを生成する。追加学習手段６８は、対象物抽出既存モデル７０を、この学習データによって追加学習し、対象物抽出新規モデル７６を得る。 Fig. 19 shows the functional configuration of a new model generation system according to the fourth embodiment. Terminal-side transmitting means 86 transmits an object image (corresponding to the different background object image in the previous embodiment) to a new model generation server device 90. Server-side receiving means 62 receives it. Training data generating means 66 generates training data for additional training based on this object image. Additional training means 68 additionally trains the existing object extraction model 70 with this training data, thereby obtaining a new object extraction model 76.

サーバ側送信手段７４は、生成された対象物抽出新規モデル７６を、端末装置８０に送信する。端末側受信手段９４は、この対象物抽出新規モデル７６を受信する。 The server-side transmitting means 74 transmits the generated new object extraction model 76 to the terminal device 80. The terminal-side receiving means 94 receives this new object extraction model 76.

以上のようにして、任意の背景を持つ対象物画像について、精度よく対象物を抽出できるモデル７６を得ることができる。 In this way, a model 76 can be obtained that can accurately extract an object from an object image with any background.

なお、更新手段７８は、生成された対象物抽出新規モデル７６を、対象物抽出既存モデル７０として更新する。これにより、次に、対象物画像が送られてきたときには、追加学習された対象物抽出モデルに基づいて、さらなる追加学習を行うことができる。よって、新規モデル生成サーバ装置９０の利用が増えるとともに、ベースとなる対象物抽出既存モデル７０の精度が高くなり、これを追加学習したモデルの精度も高くなる。
The update means 78 replaces the existing object extraction model 70 with the generated new object extraction model 76. As a result, the next time an object image is sent, further additional training can start from the additionally trained model. Therefore, as use of the new model generation server device 90 increases, the accuracy of the base existing object extraction model 70 increases, and so does the accuracy of models additionally trained from it.
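The behavior of the update means 78 can be sketched as a server object that always fine-tunes from its current base model and then adopts the result as the new base. The `fine_tune` function stands in for the training data generation means 66 and the additional training means 68 and is a hypothetical placeholder.

```python
from typing import Callable, Generic, List, TypeVar

M = TypeVar("M")  # whatever type represents a model


class ModelServer(Generic[M]):
    """Fourth-embodiment sketch: each request fine-tunes the current base
    model, and the result becomes the base for the next request."""

    def __init__(self, base: M, fine_tune: Callable[[M, List], M]):
        self.base = base
        self._fine_tune = fine_tune

    def handle(self, images: List) -> M:
        new_model = self._fine_tune(self.base, images)  # means 66/68: build data, train
        self.base = new_model                            # update means 78
        return new_model                                 # sent back via means 74
```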

4.2システム構成・ハードウエア構成 4.2 System Configuration and Hardware Configuration
図２０に、新規モデル生成システムのシステム構成を示す。新規モデル生成サーバ装置９０は、インターネット上のサーバ装置として構築されている。端末装置８０ａ、８０ｂ・・・８０ｘは、インターネットを介して新規モデル生成サーバ装置９０と通信可能である。 Figure 20 shows the system configuration of the new model generation system. The new model generation server device 90 is constructed as a server device on the Internet. Terminal devices 80a, 80b, ..., 80x can communicate with the new model generation server device 90 via the Internet.

新規モデル生成サーバ装置９０のハードウエア構成は、第２の実施形態において示した図１３と同様である。ただし、ＳＳＤ１１４には、対象物抽出未学習モデル１２４に代えて、対象物抽出既存モデル７６が記録されている。
The hardware configuration of the new model generation server device 90 is the same as that shown in Fig. 13 for the second embodiment, except that the SSD 114 stores the existing object extraction model 76 instead of the untrained object extraction model 124.

4.3新規モデル生成処理 4.3 New Model Generation Process
図２１、図２２に、端末プログラムのフローチャートと、サーバプログラムのフローチャートを示す。 FIGS. 21 and 22 show a flowchart of the terminal program and a flowchart of the server program.

新規モデル生成サーバ装置９０のＳＳＤ１１４には、ＹＯＬＯなどによって構築された対象物抽出既存モデル３４が記録されている。この対象物抽出既存モデル３４は、第１の実施形態と同じように、様々な背景にて撮像された選手の画像（たとえば図６Ａや図６Ｂのような画像）にて学習されたものである。 The SSD 114 of the new model generation server device 90 stores an existing object extraction model 34 constructed by YOLO or similar. As with the first embodiment, this existing object extraction model 34 has been trained using images of players captured against various backgrounds (for example, images like those in Figures 6A and 6B).

端末装置８０は、上記の学習に用いた背景とは異なる背景にて撮像された選手の連続画像（フレーム画像）を取得する（ステップＳ５１）。 The terminal device 80 acquires a series of images (frame images) of the player captured against a background different from the background used for the learning (step S51).

端末装置８０は、操作者の操作を受けて、この異背景対象物画像（フレーム画像群）を新規モデル生成サーバ装置９０に送信する（ステップＳ５３）。新規モデル生成サーバ装置９０は、このフレーム画像群を受信する（ステップＳ２１）。 In response to an operation by the operator, the terminal device 80 transmits this different background object image (frame image group) to the new model generation server device 90 (step S53). The new model generation server device 90 receives this frame image group (step S21).

次に、新規モデル生成サーバ装置９０は、受信した異背景フレーム画像のうちの先頭の１枚を対象フレーム画像とし、以下の処理を行う。記録されている対象物抽出既存モデルを用いて、対象フレーム画像について、選手の検出を行わせる（ステップＳ３）。これにより、フレーム画像上における、選手やコーチなどを取り囲むバウンダリーボックスの座標（左上と右下の点の座標）を得ることができる。第１の実施形態と同じようにして、フィールド内の選手のバウンダリーボックスのみを残す。これを画像として表示すると、図８に示すように、検出した選手をバウンダリーボックスで囲ったフレーム画像となる。 Next, the new model generation server device 90 selects the first of the received different background frame images as the target frame image and performs the following process. Using the recorded existing object extraction model, players are detected for the target frame image (step S3). This allows the coordinates (coordinates of the upper left and lower right points) of the boundary boxes surrounding players, coaches, etc. on the frame image to be obtained. In the same way as in the first embodiment, only the boundary boxes of the players on the field are left. When this is displayed as an image, the frame image will have the detected players surrounded by boundary boxes, as shown in Figure 8.

次に、新規モデル生成サーバ装置９０は、対象フレーム画像について、背景差分法により背景以外の物体画像を抽出する（ステップＳ４）。第１の実施形態と同じように、混合正規分布による動的背景差分法（ＭＯＧ２）を用いて背景差分画像を得るようにしている。また、Ｙ軸方向の長さによって小さなノイズを削除する方法も第１の実施形態と同じである。さらに、第１の実施形態と同じようにして、図９に示すように、フィールド内の選手の画像のみを得るようにしている。 Next, the new model generation server device 90 extracts images of objects other than the background from the target frame image by background subtraction (step S4). As in the first embodiment, the background difference image is obtained with a dynamic background subtraction method based on a Gaussian mixture model (MOG2). Small noise is removed based on length in the Y-axis direction, also as in the first embodiment. Likewise, only images of the players on the field are obtained, as shown in Figure 9.

Next, the new model generation server device 90 selects locations where the existing player detection model has detected a player's boundary box but no object image has been detected as a difference at the corresponding position (step S5). For example, in the player detection image of Figure 8 a player is detected as boundary box 40, whereas no object appears at the corresponding position in the background difference image of Figure 9. The new model generation server device 90 therefore selects boundary box 40.

The new model generation server device 90 then deletes the selected boundary box 40, treating it as a false detection (step S6).
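Steps S5 and S6 require a test for whether a detection has an object "at the corresponding position" in the background difference image. The source does not define that test. The sketch below is a minimal illustration that assumes a detection is supported when some foreground blob covers at least a fraction `min_overlap` of its area. The function names and the 0.1 default are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates


def intersection_area(a: Box, b: Box) -> int:
    """Number of pixels shared by two inclusive boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(w, 0) * max(h, 0)


def box_area(b: Box) -> int:
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def drop_false_detections(detections: List[Box], blobs: List[Box],
                          min_overlap: float = 0.1) -> List[Box]:
    """Steps S5-S6: keep a detection only if a foreground blob covers at least
    min_overlap of its area; unsupported detections are discarded as false detections."""
    return [d for d in detections
            if any(intersection_area(d, b) >= min_overlap * box_area(d) for b in blobs)]
```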

Next, the new model generation server device 90 selects locations where an object is present in the background difference image of Figure 9 but no boundary box appears in the player detection image of Figure 8 (step S7). For example, object (player) 50 appears in the background difference image of Figure 9, but there is no boundary box at the corresponding position in the player detection image of Figure 8. The new model generation server device 90 therefore selects player 50.

As shown in Figure 10, the new model generation server device 90 sets a player area 65 enclosing player 50 in the target frame image and paints this area black (step S8). In other words, the area is treated as if no player were present there, leaving a featureless region in the training image.
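Steps S7 and S8 mirror the previous check in the opposite direction. A foreground blob that no detection covers is treated as a missed player, and its area is painted black. The sketch below is a minimal illustration that uses the same assumed overlap rule as before. It represents the image as a grayscale list of rows. The optional `margin` stands in for the enclosing player area 65, and its size is an assumption. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates
Image = List[List[int]]          # grayscale image as rows of pixel values


def intersection_area(a: Box, b: Box) -> int:
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(w, 0) * max(h, 0)


def box_area(b: Box) -> int:
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def find_missed_blobs(blobs: List[Box], detections: List[Box],
                      min_overlap: float = 0.1) -> List[Box]:
    """Step S7: foreground blobs that no detection box covers sufficiently."""
    return [b for b in blobs
            if not any(intersection_area(b, d) >= min_overlap * box_area(b) for d in detections)]


def black_out(image: Image, regions: List[Box], margin: int = 0) -> Image:
    """Step S8: return a copy of the image in which each region, grown by margin and
    clipped to the image bounds, is painted black (pixel value 0)."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for x1, y1, x2, y2 in regions:
        for y in range(max(y1 - margin, 0), min(y2 + margin, h - 1) + 1):
            for x in range(max(x1 - margin, 0), min(x2 + margin, w - 1) + 1):
                out[y][x] = 0
    return out
```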

Once training data has been generated from the target frame image in this way, the new model generation server device 90 takes the next frame as the target frame and repeats steps S3 to S8 (steps S2 and S9). When all frame images have been processed, training data for additional training has been obtained (step S10).
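The per-frame loop of steps S2 to S10 can be sketched as follows. This is a simplified, dependency-free illustration under stated assumptions. The existing model (`detect`) and the background-subtraction stage (`foreground`) are passed in as hypothetical callables that return boxes. "Corresponding position" is simplified to "the two boxes share at least one pixel".

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates
Image = List[List[int]]          # grayscale image as rows of pixel values


def overlaps(a: Box, b: Box) -> bool:
    """Simplified correspondence test: the boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_training_data(frames: List[Image],
                        detect: Callable[[Image], List[Box]],
                        foreground: Callable[[Image], List[Box]]
                        ) -> List[Tuple[Image, List[Box]]]:
    samples = []
    for frame in frames:                      # S2/S9: each frame in turn is the target frame
        detections = detect(frame)            # S3: boxes from the existing model
        blobs = foreground(frame)             # S4: blobs from background subtraction
        # S5-S6: drop detections with no corresponding foreground blob
        labels = [d for d in detections if any(overlaps(d, b) for b in blobs)]
        # S7-S8: paint black every foreground blob that no detection covers
        image = [row[:] for row in frame]
        for x1, y1, x2, y2 in (b for b in blobs if not any(overlaps(b, d) for d in detections)):
            for y in range(y1, y2 + 1):
                for x in range(x1, x2 + 1):
                    image[y][x] = 0
        samples.append((image, labels))
    return samples                            # S10: training data for additional training
```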

The new model generation server device 90 performs additional training of the recorded existing model using this training data (step S11). This yields a detection model that can detect players accurately even against the new background.
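How the training data is handed to the detector in step S11 depends on the framework, and the source does not specify it. If the existing model is a YOLO-style detector (the description names YOLO only as an example), each image is commonly paired with a text file. That file has one line per box: a class index followed by the box center x, center y, width, and height, each normalized by the image size. A minimal sketch of that conversion follows, assuming a single "player" class with index 0. `to_yolo_lines` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner coordinates in pixels


def to_yolo_lines(boxes: List[Box], img_w: int, img_h: int, class_id: int = 0) -> List[str]:
    """Convert corner boxes to 'class cx cy w h' lines normalized to [0, 1]."""
    lines = []
    for x1, y1, x2, y2 in boxes:
        cx = (x1 + x2) / 2 / img_w
        cy = (y1 + y2) / 2 / img_h
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Blacked-out regions need no label lines, since no player box remains there.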

The new model generation server device 90 transmits the new model obtained through this additional training of the existing model to the terminal device 80 (step S23).

The terminal device 80 receives the new model transmitted from the new model generation server device 90 (step S54) and records it on the SSD 16.

In this way, for object images having a desired background, a new model can be obtained through additional training based on an existing model.

4.4 Other
(1) In the above embodiment, the generated new model is transmitted to the terminal device 80. However, if a base model (untrained or already trained) exists on the terminal device 80 side, only the weight parameters of the new model may be transmitted, because the terminal device 80 can then construct the new model itself.

(2) In the above embodiment, the weight parameters are the target of additional learning. However, the hyperparameters may also be adjusted through additional learning.

(3) In the above embodiment, the training data for additional learning is generated, and the additional learning is performed, in the same way as in the first and second embodiments. However, other methods of additional learning may be used. For example, training data generated on the terminal device 80 may be transmitted to the server device 90, which then performs the additional learning.

(4) The above modifications can also be applied to the other embodiments, as long as doing so does not contradict their essence.

Claims

1. A new model generation device that generates, from an existing object extraction model trained to extract an object from an object image having at least one object against a background, a new object extraction model capable of extracting an object from a different background object image having at least one object against a background different from that background, the new model generation device comprising:
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means; and
additional learning means for performing additional learning of the existing object extraction model based on the generated learning data to generate the new object extraction model,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

2. A new model generation program for causing a computer to function as a new model generation device that generates, from an existing object extraction model trained to extract an object from an object image having at least one object against a background, a new object extraction model capable of extracting an object from a different background object image having at least one object against a background different from that background, the program causing the computer to function as:
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means; and
additional learning means for performing additional learning of the existing object extraction model based on the generated learning data to generate the new object extraction model,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

3. A training data generation device that generates training data for additional learning of an existing object extraction model trained to extract an object from an object image having at least one object against a background, the training data generation device comprising:
extracted object acquisition means for providing, to the existing object extraction model, a different background object image having at least one object against a background different from the background of the object images used to train the existing object extraction model, and acquiring information on the extracted object; and
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

4. A training data generation program for causing a computer to function as a training data generation device that generates training data for additional learning of an existing object extraction model trained to extract an object from an object image having at least one object against a background, the program causing the computer to function as:
extracted object acquisition means for providing, to the existing object extraction model, a different background object image having at least one object against a background different from the background of the object images used to train the existing object extraction model, and acquiring information on the extracted object; and
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

5. A new model generation device that generates, from an existing object extraction model trained to extract an object from an object image having at least one object against a background, a new object extraction model capable of extracting an object from a different background object image having at least one object against a background different from that background, the new model generation device comprising:
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means; and
additional learning means for performing additional learning of the existing object extraction model based on the generated learning data to generate the new object extraction model,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the resulting image as learning data.

6. A new model generation program for causing a computer to function as a new model generation device that generates, from an existing object extraction model trained to extract an object from an object image having at least one object against a background, a new object extraction model capable of extracting an object from a different background object image having at least one object against a background different from that background, the program causing the computer to function as:
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means; and
additional learning means for performing additional learning of the existing object extraction model based on the generated learning data to generate the new object extraction model,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the resulting image as learning data.

7. A training data generation device that generates training data for additional learning of an existing object extraction model trained to extract an object from an object image having at least one object against a background, the training data generation device comprising:
extracted object acquisition means for providing, to the existing object extraction model, a different background object image having at least one object against a background different from the background of the object images used to train the existing object extraction model, and acquiring information on the extracted object; and
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the resulting image as learning data.

8. A training data generation program for causing a computer to function as a training data generation device that generates training data for additional learning of an existing object extraction model trained to extract an object from an object image having at least one object against a background, the program causing the computer to function as:
extracted object acquisition means for providing, to the existing object extraction model, a different background object image having at least one object against a background different from the background of the object images used to train the existing object extraction model, and acquiring information on the extracted object; and
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the resulting image as learning data.

9. The new model generation device or training data generation device according to any one of claims 1, 3, 5, and 7, wherein the new model generation device or the training data generation device is configured as a server device accessible from a terminal device via a network.

10. The device or program according to any one of claims 1 to 9, wherein the object is a moving object including any one of an athlete, a pedestrian, a car, an animal, and a ball.

11. A new model generation system comprising a new model generation server device and a terminal device configured to communicate with the new model generation server device, the system enabling, based on an existing object extraction model trained to extract an object from an object image having at least one object against a background, extraction of an object from a different background object image having at least one object against a background different from that background, wherein:
the new model generation server device comprises:
server-side receiving means for receiving the different background object image and at least the weight parameters of the existing object extraction model transmitted from the terminal device;
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means;
additional learning means for performing additional learning of at least the weight parameters of the existing object extraction model using the learning data to generate at least the weight parameters of a new object extraction model; and
server-side transmitting means for transmitting at least the weight parameters of the generated new object extraction model to the terminal device;
the terminal device comprises:
acquisition means for acquiring at least the weight parameters of the recorded existing object extraction model;
terminal-side transmitting means for transmitting the different background object image and at least the weight parameters of the existing object extraction model to the new model generation server device;
terminal-side receiving means for receiving at least the weight parameters of the new object extraction model transmitted from the server device; and
new model generation means for generating the new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received weight parameters; and
the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

12. A parameter generation server device that generates, from an existing object extraction model trained to extract an object from an object image having at least one object against a background, at least the weight parameters of a new object extraction model capable of extracting an object from a different background object image having at least one object against a background different from that background, the parameter generation server device comprising:
server-side receiving means for receiving the different background object image and at least the weight parameters of the existing object extraction model transmitted from a terminal device;
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means;
additional learning means for performing additional learning of at least the weight parameters of the existing object extraction model using the learning data to generate at least the weight parameters of the new object extraction model; and
server-side transmitting means for transmitting at least the weight parameters of the generated new object extraction model to the terminal device,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

13. A terminal device constituting a new model generation system that comprises a new model generation server device and the terminal device, which is configured to communicate with the new model generation server device, the system enabling, based on an existing object extraction model trained to extract an object from an object image having at least one object against a background, extraction of an object from a different background object image having at least one object against a background different from that background, the terminal device comprising:
acquisition means for acquiring at least the weight parameters of the recorded existing object extraction model;
terminal-side transmitting means for transmitting the different background object image and at least the weight parameters of the existing object extraction model to the new model generation server device;
terminal-side receiving means for receiving at least the weight parameters of a new object extraction model transmitted from the new model generation server device, which has performed additional learning of at least the weight parameters of the existing object extraction model using learning data generated based on information on the object extracted from the different background object image by the existing object extraction model; and
new model generation means for generating the new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received weight parameters,
wherein the new model generation server device comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

14. A terminal program for causing a computer to function as the terminal device constituting a new model generation system that comprises a new model generation server device and a terminal device configured to communicate with the new model generation server device, the system enabling, based on an existing object extraction model trained to extract an object from an object image having at least one object against a background, extraction of an object from a different background object image having at least one object against a background different from that background, the terminal program causing the computer to function as:
acquisition means for acquiring at least the weight parameters of the recorded existing object extraction model;
terminal-side transmitting means for transmitting the different background object image and at least the weight parameters of the existing object extraction model to the new model generation server device;
terminal-side receiving means for receiving at least the weight parameters of a new object extraction model transmitted from the new model generation server device, which has performed additional learning of at least the weight parameters of the existing object extraction model using learning data generated based on information on the object extracted from the different background object image by the existing object extraction model; and
new model generation means for generating the new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received weight parameters,
wherein the new model generation server device comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
false detection handling means for excluding from the learning data, among the objects extracted by the extracted object acquisition means, information on any object that has no corresponding non-background portion.

15. A new model generation system comprising a new model generation server device and a terminal device configured to communicate with the new model generation server device, the system enabling, based on an existing object extraction model trained to extract an object from an object image having at least one object against a background, extraction of an object from a different background object image having at least one object against a background different from that background, wherein:
the new model generation server device comprises:
server-side receiving means for receiving the different background object image and at least the weight parameters of the existing object extraction model transmitted from the terminal device;
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means;
additional learning means for performing additional learning of at least the weight parameters of the existing object extraction model using the learning data to generate at least the weight parameters of a new object extraction model; and
server-side transmitting means for transmitting at least the weight parameters of the generated new object extraction model to the terminal device;
the terminal device comprises:
acquisition means for acquiring at least the weight parameters of the recorded existing object extraction model;
terminal-side transmitting means for transmitting the different background object image and at least the weight parameters of the existing object extraction model to the new model generation server device;
terminal-side receiving means for receiving at least the weight parameters of the new object extraction model transmitted from the server device; and
new model generation means for generating the new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received weight parameters; and
the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the resulting image as learning data.

16. A parameter generation server device that generates, from an existing object extraction model trained to extract an object from an object image having at least one object against a background, at least the weight parameters of a new object extraction model capable of extracting an object from a different background object image having at least one object against a background different from that background, the parameter generation server device comprising:
server-side receiving means for receiving the different background object image and at least the weight parameters of the existing object extraction model transmitted from a terminal device;
extracted object acquisition means for providing the different background object image to the existing object extraction model and acquiring information on the extracted object;
learning data generation means for generating learning data based on the different background object image and the information on the object extracted by the extracted object acquisition means;
additional learning means for performing additional learning of at least the weight parameters of the existing object extraction model using the learning data to generate at least the weight parameters of the new object extraction model; and
server-side transmitting means for transmitting at least the weight parameters of the generated new object extraction model to the terminal device,
wherein the learning data generation means comprises:
acquisition means for acquiring the different background object image and the information on the object extracted by the extracted object acquisition means;
non-background extraction means for extracting, by a background subtraction method, a non-background portion that is not background from the different background object image; and
detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the resulting image as learning data.

A terminal device constituting a new model generation system that includes a new model generation server device and a terminal device configured to communicate with the new model generation server device, the system being capable of extracting, based on an existing object extraction model trained to extract an object from an object image having at least one object in a background, an object from a different background object image having at least one object in a background different from that background, the terminal device comprising:
an acquisition means for acquiring at least the weight parameters of the recorded existing object extraction model;
a terminal-side transmitting means for transmitting the different background object image and at least the weight parameters of the existing object extraction model to the new model generation server device;
a terminal-side receiving means for receiving at least the weight parameters of a new object extraction model transmitted from the new model generation server device, which generates them by additionally training at least the weight parameters of the existing object extraction model with learning data generated based on information on the objects extracted from the different background object image by the existing object extraction model;
a new model generating means for generating a new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received weight parameters;
In the terminal device comprising the above means, the new model generation server device comprises:
an acquisition means for acquiring the different background object image and the information on the objects extracted by the extracted object acquisition means;
a non-background extraction means for extracting, by a background subtraction method, the non-background portion of the different background object image, that is, the portion that is not background;
a detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the processed image as learning data.
A terminal device characterized by the above configuration.
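On the terminal side, the claimed means amount to sending weights out, receiving updated weights back, and swapping them into the existing model. A minimal sketch follows, assuming weights are held as a mapping from layer name to a flat list of floats and exchanged as JSON; the class and function names are hypothetical, the network transport is omitted, and the server's additional training is simulated by a fixed reply.

```python
# Hypothetical terminal-side sketch; the JSON wire format and class are assumptions.
import json
from typing import Dict, List

Weights = Dict[str, List[float]]   # layer name -> flattened weight values


class ObjectExtractionModel:
    """Stand-in for a detector; only its weight parameters matter here."""

    def __init__(self, weights: Weights):
        self.weights = weights


def pack_weights(weights: Weights) -> bytes:
    """Serialize weights for transmission (terminal-side transmitting means)."""
    return json.dumps(weights, sort_keys=True).encode("utf-8")


def unpack_weights(payload: bytes) -> Weights:
    """Decode weights returned by the server (terminal-side receiving means)."""
    return json.loads(payload.decode("utf-8"))


def generate_new_model(existing: ObjectExtractionModel,
                       received: Weights) -> ObjectExtractionModel:
    """Keep the existing architecture and replace the received weights.

    Layers absent from `received` keep their existing values; a layer name or
    size the terminal model does not recognize is rejected.
    """
    for name, values in received.items():
        if name not in existing.weights or len(values) != len(existing.weights[name]):
            raise ValueError(f"incompatible weights for layer {name!r}")
    merged = dict(existing.weights)   # shallow copy: the existing model is unchanged
    merged.update(received)
    return ObjectExtractionModel(merged)


# Usage: the server's additional training is simulated by a fixed reply.
existing = ObjectExtractionModel({"backbone": [0.1, 0.2], "head": [0.5]})
request = pack_weights(existing.weights)    # would be sent along with the images
reply = pack_weights({"head": [0.7]})       # simulated server response
new_model = generate_new_model(existing, unpack_weights(reply))
```

Merging only the returned layers is one design choice among several; the size check guards against loading weights that were produced for a different architecture.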

A terminal program for realizing, by a computer, the terminal device constituting a new model generation system that includes a new model generation server device and a terminal device configured to communicate with the new model generation server device, the system being capable of extracting, based on an existing object extraction model trained to extract an object from an object image having at least one object in a background, an object from a different background object image having at least one object in a background different from that background, the terminal program causing the computer to function as:
an acquisition means for acquiring at least the weight parameters of the recorded existing object extraction model;
a terminal-side transmitting means for transmitting the different background object image and at least the weight parameters of the existing object extraction model to the new model generation server device;
a terminal-side receiving means for receiving at least the weight parameters of a new object extraction model transmitted from the new model generation server device, which generates them by additionally training at least the weight parameters of the existing object extraction model with learning data generated based on information on the objects extracted from the different background object image by the existing object extraction model;
a new model generating means for generating a new object extraction model by replacing at least the weight parameters of the existing object extraction model with the received weight parameters;
wherein the new model generation server device comprises:
an acquisition means for acquiring the different background object image and the information on the objects extracted by the extracted object acquisition means;
a non-background extraction means for extracting, by a background subtraction method, the non-background portion of the different background object image, that is, the portion that is not background;
a detection omission response means for, when the non-background portion contains an area from which the extracted object acquisition means has extracted no object, processing that area of the different background object image into a featureless image and using the processed image as learning data.
A terminal program characterized by the above configuration.
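The claims leave open how the information on extracted objects becomes learning data. Because YOLO is cited as an example detector, one plausible encoding is YOLO-style label text: one line per object giving the class index, then the box center, width, and height normalized by the image size. The sketch below assumes that format, a (class_id, confidence, x0, y0, x1, y1) detection tuple, and a 0.5 confidence cut-off; all three are illustrative assumptions, not details taken from the patent, and the function name is hypothetical.

```python
# Hypothetical sketch: turning extracted objects into YOLO-style label lines.
from typing import List, Tuple

# (class_id, confidence, x0, y0, x1, y1) with pixel coordinates
Detection = Tuple[int, float, float, float, float, float]


def to_yolo_labels(detections: List[Detection], img_w: int, img_h: int,
                   min_confidence: float = 0.5) -> List[str]:
    """Return one 'class cx cy w h' line per kept detection, normalized to 0-1."""
    lines = []
    for cls, conf, x0, y0, x1, y1 in detections:
        if conf < min_confidence:        # drop low-confidence pseudo-labels
            continue
        cx = (x0 + x1) / 2 / img_w       # box center, as a fraction of image width
        cy = (y0 + y1) / 2 / img_h       # box center, as a fraction of image height
        w = (x1 - x0) / img_w
        h = (y1 - y0) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Usage: a 400x200 image with one confident and one weak detection.
labels = to_yolo_labels([(0, 0.9, 100, 50, 200, 150),
                         (0, 0.2, 10, 10, 20, 20)], img_w=400, img_h=200)
```

Only the confident detection survives, giving the single line `0 0.375000 0.500000 0.250000 0.500000`. Filtering by confidence is a common safeguard when a model's own outputs are reused as training labels, but the threshold would need tuning in practice.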