JP7464509B2

JP7464509B2 - OBJECT DETECTION DEVICE, OBJECT DETECTION SYSTEM AND OBJECT DETECTION METHOD

Info

Publication number: JP7464509B2
Application number: JP2020194858A
Authority: JP
Inventors: 紫薇 ▲とう▼; 全孔; 直人秋良; 智明吉永
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2024-04-09
Anticipated expiration: 2040-11-25
Also published as: JP2022083513A

Description

The present disclosure relates to an object detection device, an object detection system, and an object detection method.

In recent years, with the advance of IT, large numbers of sensors have been deployed throughout society and an extremely large amount of data has been accumulated. Against this background, various ways of utilizing accumulated image data are being considered. In particular, as the volume of visual content such as photographs, videos, and images increases, there is a growing need to freely identify objects in that content and to accurately detect their categories and positions.

One known means of object detection is the so-called deep neural network (DNN). With the development of DNNs, object detection can now be used in a variety of settings, such as X-ray image analysis.

However, achieving highly accurate object detection with a DNN requires a large amount of labeled training data. Depending on the application, such labeled training data may be difficult to obtain or may require enormous cost and effort. As a result, a large amount of training data is obtained for domains in which data is relatively easy to obtain, whereas only a small amount is obtained for domains in which data is difficult to obtain, and object detection accuracy is limited in domains with little training data.

As a means of addressing the above problem by training a DNN using labeled training data from an easily available domain, there is, for example, JP 2019-032821 A (Patent Document 1).

Patent Document 1 describes a technology that "provides a method for reducing the need for costly, tedious, and error-prone manual labeling of training data. The method uses an image captured by a target camera as a style target image to train a style transfer network that performs style transfer to convert any photo-realistic input image into a transformed image. The transformed image has the content of the input image, maintains the photo-realistic quality of the input image, and is of a style that matches the style of the style target image. The trained style transfer network is used to convert training images of an original training dataset into transformed training images, and each transformed training image is labeled with the training label of a corresponding training image in the original training dataset to create an augmented training dataset, and the augmented training dataset is used to train a deep neural network (DNN) to perform a specific task."

JP 2019-032821 A

Patent Document 1 describes using a style transfer network to convert labeled images corresponding to a specific source domain into a target domain (the "target style" described in Patent Document 1), thereby obtaining labeled images corresponding to the target domain, and training a DNN using the image data obtained in this way.

However, Patent Document 1 presupposes that a good approximation can be achieved between the pseudo target domain image converted to the target domain (the "style target image" described in Patent Document 1) and the actual target domain. For images such as X-ray images, where the difference between the source domain and the target domain is large (that is, a so-called "domain gap" exists), object detection accuracy on target domain images remains limited even if a DNN is trained using the labels of pseudo target domain images converted to the target domain.

Therefore, the present disclosure utilizes labeled training data from an easily available domain and trains a DNN after narrowing the domain gap between the source domain and the target domain. The present disclosure thereby aims to provide a highly accurate object detection means even for images with a large difference between the source domain and the target domain, such as X-ray images.

In order to solve the above problem, one representative object detection device of the present disclosure is an object detection device for detecting objects in X-ray images, and includes: an image input unit that accepts an input image set including a source domain image corresponding to a source domain and a target domain image corresponding to a target domain; a domain conversion unit that performs domain conversion processing on the input image set to generate a converted image set including a pseudo target domain image obtained by converting the source domain image to the target domain and a pseudo source domain image obtained by converting the target domain image to the source domain; a pair generation unit that generates image pairs from the input image set and the converted image set; a feature extraction unit that extracts a feature map for each image included in the image pairs; a detection prediction unit that generates, based on the feature maps, a prediction result indicating the category and position of an object in each image included in the image pairs; and an object detection unit that analyzes a given X-ray image to generate a detection result indicating the category and position of an object in the X-ray image.

According to the present disclosure, by utilizing labeled training data from an easily available domain and training a DNN after narrowing the domain gap between the source domain and the target domain, it is possible to provide a highly accurate object detection means even for images with a large difference between the source domain and the target domain, such as X-ray images.
Problems, configurations, and effects other than those described above will become apparent from the following description of the mode for carrying out the invention.

FIG. 1 is a diagram illustrating a computer system for implementing an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating an example of a configuration of an object detection system according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating an example of an X-ray image according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a logical configuration of an object detection learning unit in the object detection device according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating an example of a domain conversion process performed by the domain conversion unit according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating an example of an image pair generation process performed by the image pair generation unit according to an embodiment of the present disclosure.
FIG. 7 is a diagram illustrating an example of domain gap reduction according to an embodiment of the present disclosure.
FIG. 8 is a diagram illustrating an example of a method for training the object detection unit according to an embodiment of the present disclosure.
FIG. 9 is a diagram illustrating an example of an object detection process according to an embodiment of the present disclosure.

Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings. Note that the present disclosure is not limited to this embodiment. In the drawings, identical parts are denoted by the same reference numerals.

As described above, a large amount of labeled training data is required to train a DNN, and depending on the application, such labeled training data may be difficult to obtain or may require enormous cost and effort. As a result, a large amount of training data is obtained for domains in which data is relatively easy to obtain, whereas only a small amount is obtained for domains in which data is difficult to obtain, and object detection accuracy is limited in domains with little training data.
It would therefore be desirable to have a means of training a DNN that can generate highly accurate object detection results for an arbitrary target domain using only labeled training data from an easily available domain (hereinafter referred to as the "source domain"), without collecting training data for the target domain, which is difficult to obtain.

Here, "domain" refers to a set of parameters that define how an image is displayed. For example, display settings such as the color, sharpness, resolution, brightness, and contrast of an image are parameters that define the domain of the image. In other words, a domain refers to a collection of data.
When these display-setting parameters differ significantly between a source domain image corresponding to the source domain and a target domain image corresponding to the target domain, a so-called "domain gap" exists between the source domain and the target domain.
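As a rough illustration of the "domain" and "domain gap" notions above, the following minimal sketch summarizes an image by two display parameters (brightness and contrast) and measures how far apart two such summaries are. The names `DomainStats`, `describe`, and `domain_gap`, the choice of two parameters, and the Euclidean distance are all assumptions made for this example; the disclosure does not define a specific gap metric.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence


@dataclass(frozen=True)
class DomainStats:
    """Crude display-parameter summary of one image (hypothetical helper)."""
    brightness: float  # mean pixel intensity
    contrast: float    # population standard deviation of pixel intensity


def describe(pixels: Sequence[float]) -> DomainStats:
    """Summarize a flat sequence of grayscale intensities in [0, 1]."""
    return DomainStats(brightness=mean(pixels), contrast=pstdev(pixels))


def domain_gap(a: DomainStats, b: DomainStats) -> float:
    """Euclidean distance between two summaries (illustrative metric only)."""
    return ((a.brightness - b.brightness) ** 2
            + (a.contrast - b.contrast) ** 2) ** 0.5


# The same scene rendered under two hypothetical display settings.
scene = [0.1, 0.2, 0.4, 0.8]
source_pixels = list(scene)                               # source-domain rendering
target_pixels = [min(1.0, 0.5 * p + 0.4) for p in scene]  # brighter, lower contrast

source_stats = describe(source_pixels)
target_stats = describe(target_pixels)
gap = domain_gap(source_stats, target_stats)
```

Here `gap` is strictly positive because the two renderings differ in both brightness and contrast, while comparing a summary with itself gives zero. A real system would compare far richer statistics, typically learned features rather than hand-picked parameters.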

When such a domain gap exists, even if a DNN for object detection is trained with labeled data from the source domain, it cannot generate highly accurate object detection results for images in the target domain, which is an unlabeled domain.

In view of the above problem, in an embodiment of the present disclosure, a DNN for object detection is trained using, in addition to an input image set including labeled source domain images and unlabeled target domain images, a converted image set including pseudo target domain images obtained by converting source domain images to the target domain and pseudo source domain images obtained by converting target domain images to the source domain. As a result, the embodiment can narrow the domain gap between the source domain and the target domain and generate highly accurate object detection results for images in the target domain as well.
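The composition of the training data described above can be sketched as follows. This is only an illustration under stated assumptions: the converter functions `brighten` and `darken` are trivial placeholders for the learned domain conversion, the pairing rule (each original image with its own converted counterpart) is assumed, and carrying the source labels over to the pseudo target image is likewise an assumption rather than something stated in this passage. All names (`Sample`, `build_pairs`, the "knife" category) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[float]                              # flat grayscale pixels (stand-in)
Label = Tuple[str, Tuple[int, int, int, int]]    # (category, (x, y, width, height))


@dataclass
class Sample:
    image: Image
    domain: str                     # "source", "target", "pseudo_source", "pseudo_target"
    labels: Optional[List[Label]]   # None for unlabeled images


def build_pairs(
    source: List[Sample],
    target: List[Sample],
    to_target: Callable[[Image], Image],
    to_source: Callable[[Image], Image],
) -> List[Tuple[Sample, Sample]]:
    """Pair every input image with its converted counterpart."""
    pairs: List[Tuple[Sample, Sample]] = []
    for s in source:
        # Assumption: the converted image keeps the labels of its original.
        pairs.append((s, Sample(to_target(s.image), "pseudo_target", s.labels)))
    for t in target:
        # Target images are unlabeled, so their converted versions are too.
        pairs.append((t, Sample(to_source(t.image), "pseudo_source", None)))
    return pairs


def brighten(img: Image) -> Image:
    """Placeholder for source-to-target conversion (not a learned model)."""
    return [min(1.0, p + 0.2) for p in img]


def darken(img: Image) -> Image:
    """Placeholder for target-to-source conversion (not a learned model)."""
    return [max(0.0, p - 0.2) for p in img]


source_set = [Sample([0.1, 0.5], "source", [("knife", (10, 20, 30, 40))])]
target_set = [Sample([0.6, 0.9], "target", None)]
pairs = build_pairs(source_set, target_set, to_target=brighten, to_source=darken)
```

Keeping the original and the converted image together as a pair is what would let a later training step compare the two versions of the same content across domains.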

Next, a computer system 300 for implementing an embodiment of the present disclosure will be described with reference to FIG. 1. The mechanisms and devices of the various embodiments disclosed herein may be applied to any suitable computing system. The main components of the computer system 300 include one or more processors 302, a memory 304, a terminal interface 312, a storage interface 314, an I/O (input/output) device interface 316, and a network interface 318. These components may be interconnected via a memory bus 306, an I/O bus 308, a bus interface unit 309, and an I/O bus interface unit 310.

The computer system 300 may include one or more general-purpose programmable central processing units (CPUs) 302A and 302B, collectively referred to as the processor 302. In some embodiments, the computer system 300 may include multiple processors, and in other embodiments, the computer system 300 may be a single-CPU system. Each processor 302 executes instructions stored in the memory 304 and may include an on-board cache.

In some embodiments, the memory 304 may include a random access semiconductor memory, storage device, or storage medium (either volatile or non-volatile) for storing data and programs. The memory 304 may store all or part of the programs, modules, and data structures that implement the functions described herein. For example, the memory 304 may store an object detection application 350. In some embodiments, the object detection application 350 may include instructions or descriptions that execute the functions described below on the processor 302.

In some embodiments, the object detection application 350 may be implemented in hardware via semiconductor devices, chips, logic gates, circuits, circuit cards, and/or other physical hardware devices instead of, or in addition to, a processor-based system. In some embodiments, the object detection application 350 may include data other than instructions or descriptions. In some embodiments, a camera, sensor, or other data input device (not shown) may be provided so as to communicate directly with the bus interface unit 309, the processor 302, or other hardware of the computer system 300.

The computer system 300 may include a bus interface unit 309 that handles communication among the processor 302, the memory 304, a display system 324, and the I/O bus interface unit 310. The I/O bus interface unit 310 may be coupled to the I/O bus 308 for transferring data to and from various I/O units. The I/O bus interface unit 310 may communicate via the I/O bus 308 with the multiple I/O interface units 312, 314, 316, and 318, also known as I/O processors (IOPs) or I/O adapters (IOAs).

The display system 324 may include a display controller, a display memory, or both. The display controller may provide video data, audio data, or both to a display device 326. The computer system 300 may also include devices such as one or more sensors configured to collect data and provide the data to the processor 302.

For example, the computer system 300 may include biometric sensors that collect heart rate data, stress level data, and the like; environmental sensors that collect humidity data, temperature data, pressure data, and the like; and motion sensors that collect acceleration data, movement data, and the like. Other types of sensors may also be used. The display system 324 may be connected to a display device 326 such as a standalone display screen, a television, a tablet, or a handheld device.

The I/O interface units provide the ability to communicate with various storage or I/O devices. For example, the terminal interface unit 312 allows attachment of user I/O devices 320, such as user output devices (for example, a video display device or a speaker television) and user input devices (for example, a keyboard, mouse, keypad, touchpad, trackball, buttons, light pen, or other pointing device). By operating a user input device through a user interface, a user may enter data and instructions into the user I/O devices 320 and the computer system 300 and receive output data from the computer system 300. The user interface may, for example, be displayed on a display device, played through a speaker, or printed by a printer via the user I/O devices 320.

The storage interface 314 allows attachment of one or more disk drives or direct access storage devices 322 (typically magnetic disk drive storage devices, but possibly an array of disk drives or other storage devices configured to appear as a single disk drive). In some embodiments, the storage device 322 may be implemented as any secondary storage device. The contents of the memory 304 may be stored in the storage device 322 and read from the storage device 322 as needed. The I/O device interface 316 may provide an interface to other I/O devices such as printers and fax machines. The network interface 318 may provide a communication path so that the computer system 300 and other devices can communicate with each other. This communication path may be, for example, a network 330.

In some embodiments, the computer system 300 may be a device that has no direct user interface and receives requests from other computer systems (clients), such as a multi-user mainframe computer system, a single-user system, or a server computer. In other embodiments, the computer system 300 may be a desktop computer, a portable computer, a laptop, a tablet computer, a pocket computer, a telephone, a smartphone, or any other suitable electronic device.

Next, an object detection system according to an embodiment of the present disclosure will be described with reference to FIG. 2.

FIG. 2 is a diagram illustrating an example of the configuration of an object detection system 200 according to an embodiment of the present disclosure. As shown in FIG. 2, the object detection system 200 mainly comprises an X-ray device 211, a communication network 202, and an object detection device 201. The X-ray device 211 and the object detection device 201 are connected via the communication network 202.

The communication network 202 may include, for example, a local area network (LAN), a wide area network (WAN), a satellite network, a cable network, a Wi-Fi network, or any combination thereof. The connection between the X-ray device 211 and the object detection device 201 may be wired or wireless.

The X-ray device 211 is a device that captures X-ray images. The X-ray device 211 includes, for example, an X-ray generator that emits X-rays and an X-ray detector that detects and analyzes the reflected X-rays. The type of the X-ray device 211 is not particularly limited; it may be a baggage X-ray device, a backscatter X-ray inspection device, or a medical X-ray device. The X-ray device 211 is configured to capture an X-ray image of a given subject and transmit it to the object detection device 201 via the communication network 202.
Note that although FIG. 2 shows, as an example, a configuration that includes the X-ray device 211 for processing X-ray images, the present disclosure is not limited thereto, and the X-ray device 211 may instead be, for example, any camera, sensor, or other device that provides an input image to be subjected to object detection.

The object detection device 201 is a computing device for detecting the category and position of objects in, for example, an X-ray image transmitted from the X-ray device 211. The object detection device 201 may be configured as, for example, a desktop computer, a server computer, a laptop computer, a tablet computer, a workstation, a mobile terminal, or another type of computing device.

As shown in FIG. 2, the object detection device 201 includes a processor 203 for executing instructions stored in a memory 207, an I/O interface 204 for controlling communication between internal and external devices of the object detection device 201, a network interface 205 for controlling communication via the communication network 202, a user I/O interface 206 for receiving input from a user, the memory 207, which stores functional units for executing the respective functions of the object detection means according to an embodiment of the present disclosure, and a bus 212 for controlling bidirectional communication among these components.

As also shown in FIG. 2, the memory 207 of the object detection device 201 includes: an image input unit 208 that receives an input image set including a source domain image corresponding to a source domain and a target domain image corresponding to a target domain, as well as X-ray images and the like transmitted from the X-ray device; a domain conversion unit 210 that performs domain conversion processing on the input image set to generate a converted image set including a pseudo target domain image obtained by converting the source domain image to the target domain and a pseudo source domain image obtained by converting the target domain image to the source domain; a domain conversion learning unit 215 that trains the domain conversion unit 210 to generate highly accurate pseudo target domain images and pseudo source domain images; an object detection learning unit 225 that trains the object detection unit 220 to improve detection accuracy; the object detection unit 220, which, once trained, detects the category and position of objects in X-ray images transmitted from the X-ray device 211; a source domain image storage unit 230 for storing source domain images; a target domain image storage unit 235 for storing target domain images; a pseudo source domain image storage unit 240 for storing pseudo source domain images; and a pseudo target domain image storage unit 245 for storing pseudo target domain images.

The object detection unit 220 according to an embodiment of the present disclosure may be configured as a deep neural network. As described below, the deep neural network serving as the object detection unit 220 is trained using, in addition to an input image set including labeled source domain images and unlabeled target domain images, a converted image set including pseudo target domain images obtained by converting source domain images to the target domain and pseudo source domain images obtained by converting target domain images to the source domain. This makes it possible to narrow the domain gap between the source domain and the target domain and to generate highly accurate object detection results for images in the target domain as well.

Each functional unit included in the object detection device 201 may be a software module constituting the object detection application 350 in the computer system 300 shown in FIG. 1, or may be an independent dedicated hardware device. The functional units may be implemented in a single computing environment or in a distributed computing environment.

According to the object detection system 200 described above, by utilizing labeled training data from an easily available domain and training the DNN after narrowing the domain gap between the source domain and the target domain, it is possible to detect the category and position of objects in an X-ray image received from an external device such as the X-ray device 211 via the communication network 202.

Next, an X-ray image according to an embodiment of the present disclosure will be described with reference to FIG. 3.

FIG. 3 is a diagram illustrating an example of an X-ray image according to an embodiment of the present disclosure.
As described above, an object detection device according to an embodiment of the present disclosure (e.g., the object detection device 201 shown in FIG. 2) receives as input a source domain image, which is a labeled image corresponding to a source domain (e.g., a domain rich in labeled training data), and a target domain image, which is an unlabeled image corresponding to a target domain (e.g., a domain with little labeled training data). As described below, these source domain images and target domain images form part of the training data used to train the object detection device.

FIG. 3 illustrates an example of a source domain image 361 and a target domain image 362 according to an embodiment of the present disclosure. Because the source domain image 361 is a labeled image, metadata indicating the category and position of each object in the source domain image 361 is attached to it.
The target domain image 362, on the other hand, is an unlabeled image, so the category and position of each object in the target domain image 362 are unknown.
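One way to picture the attached metadata is the minimal sketch below, which stores a category and a bounding box per object and serializes the record to JSON. The field names, the bounding-box representation of "position", the JSON format, and the "scissors" category are assumptions chosen for illustration; the disclosure does not specify a metadata format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ObjectLabel:
    category: str   # object category (hypothetical example value below)
    x: int          # left edge of the bounding box, in pixels
    y: int          # top edge of the bounding box, in pixels
    width: int
    height: int


@dataclass
class AnnotatedImage:
    image_id: str
    domain: str                           # "source" or "target"
    objects: Optional[List[ObjectLabel]]  # None means the image is unlabeled

    def to_json(self) -> str:
        """Serialize the image record, including nested labels, to JSON."""
        return json.dumps(asdict(self), sort_keys=True)


# Labeled source domain image and unlabeled target domain image.
source_361 = AnnotatedImage("361", "source", [ObjectLabel("scissors", 40, 25, 120, 60)])
target_362 = AnnotatedImage("362", "target", None)

source_record = json.loads(source_361.to_json())
target_record = json.loads(target_362.to_json())
```

Using `None` rather than an empty list for the unlabeled image keeps "no labels available" distinct from "labeled, but no objects present".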

ソースドメイン画像３６１及びターゲットドメイン画像３６２は、例えば、異なるＸ線装置によって撮影された、又は、同一のＸ線装置で異なる撮影設定で撮影されたため、ドメインが異なるＸ線画像となっている。このため、ソースドメイン画像３６１とターゲットドメイン画像３６２とで、色、鮮鋭度（シャープネス）等、様々な表示設定が相違しており、ソースドメインとターゲットドメインとの間でいわゆる「ドメインギャップ」が存在する。 The source domain image 361 and the target domain image 362 are X-ray images with different domains because, for example, they were taken by different X-ray devices or taken by the same X-ray device with different shooting settings. Therefore, various display settings such as color and sharpness are different between the source domain image 361 and the target domain image 362, and a so-called "domain gap" exists between the source domain and the target domain.

このようなドメインギャップが存在すると、例えばオブジェクト検出用のＤＮＮがソースドメインのラベル付きデータによって訓練されたとしても、ラベル無しのドメインであるターゲットドメインの画像に対しては高精度のオブジェクト検出結果を生成することができない。
そこで、後述するように、本開示では、ソースドメインとターゲットドメインとの距離を短縮し、ドメインギャップを縮小することで、ラベル無しのドメインの画像に対しても、高精度のオブジェクト検出結果を生成することが可能となる。 When such a domain gap exists, for example, even if a DNN for object detection is trained with labeled data in the source domain, it cannot generate highly accurate object detection results for images in the target domain, which is an unlabeled domain.
Therefore, as described below, in the present disclosure, by shortening the distance between the source domain and the target domain and reducing the domain gap, it becomes possible to generate highly accurate object detection results even for images in an unlabeled domain.

次に、図４を参照して、本開示の実施形態に係るオブジェクト検出装置におけるオブジェクト検出学習部の論理構成について説明する。 Next, with reference to FIG. 4, the logical configuration of the object detection learning unit in the object detection device according to an embodiment of the present disclosure will be described.

図４は、本開示の実施形態に係るオブジェクト検出装置２０１におけるオブジェクト検出学習部２２５の論理構成を示す図である。
上述したように、本開示の実施形態に係るオブジェクト検出装置２０１におけるオブジェクト検出学習部２２５は、検出精度を向上させるようにオブジェクト検出部（例えば、図２に示すオブジェクト検出部２２０）のパラメータを調整することでオブジェクト検出部を訓練するための機能部である。 FIG. 4 is a diagram illustrating a logical configuration of the object detection learning unit 225 in the object detection device 201 according to an embodiment of the present disclosure.
As described above, the object detection learning unit 225 in the object detection device 201 according to an embodiment of the present disclosure is a functional unit for training the object detection unit (e.g., the object detection unit 220 shown in FIG. 2) by adjusting parameters of the object detection unit so as to improve detection accuracy.

図４に示すように、オブジェクト検出学習部２２５は、ペア生成部３６５、特徴抽出部３６８、画像乖離度計算部３６９、適応損失計算部３７０、検出予測部３７１、検出損失計算部３７２、及びパラメータ更新部３７３を含む。 As shown in FIG. 4, the object detection learning unit 225 includes a pair generation unit 365, a feature extraction unit 368, an image disparity calculation unit 369, an adaptation loss calculation unit 370, a detection prediction unit 371, a detection loss calculation unit 372, and a parameter update unit 373.

まず、ペア生成部３６５は、ソースドメイン画像３６１と、ターゲットドメイン画像３６２とを含む入力画像セットと、当該入力画像セットに対するドメイン変換処理を施すことによって得られる、疑似ソースドメイン画像３６３と、疑似ターゲットドメイン画像３６４とを含む変換画像セットとを入力する。図３を参照して説明したように、ソースドメイン画像３６１は、所定のソースドメインに対応するラベル付き画像であり、ターゲットドメイン画像３６２は、ソースドメインと異なるドメインであるターゲットドメインに対応するラベル無し画像である。
なお、図４では、説明の便宜上、ペア生成部３６５は、ソースドメイン画像３６１と、ターゲットドメイン画像３６２、疑似ソースドメイン画像３６３、及び疑似ターゲットドメイン画像３６４を１つずつ入力する場合を一例として示しているが、本開示はこれに限定されない。実際には、ペア生成部３６５は、ソースドメイン画像３６１と、ターゲットドメイン画像３６２、疑似ソースドメイン画像３６３、及び疑似ターゲットドメイン画像３６４のそれぞれについて、複数の画像を含むバッチを入力してもよい。 First, the pair generation unit 365 inputs an input image set including a source domain image 361 and a target domain image 362, and a transformed image set including a pseudo source domain image 363 and a pseudo target domain image 364 obtained by performing a domain transformation process on the input image set. As described with reference to Fig. 3, the source domain image 361 is a labeled image corresponding to a predetermined source domain, and the target domain image 362 is an unlabeled image corresponding to a target domain that is a domain different from the source domain.
Note that, for convenience of explanation, FIG. 4 shows as an example the case where the pair generation unit 365 receives one each of the source domain image 361, the target domain image 362, the pseudo source domain image 363, and the pseudo target domain image 364, but the present disclosure is not limited to this. In practice, the pair generation unit 365 may receive a batch containing multiple images for each of the source domain image 361, the target domain image 362, the pseudo source domain image 363, and the pseudo target domain image 364.

疑似ソースドメイン画像３６３は、上述したドメイン変換部（例えば、図２に示すドメイン変換部２１０）を用いて、ターゲットドメイン画像３６２をソースドメインに変換することで得られた画像である。
なお、疑似ソースドメイン画像３６３は、ラベル無しの画像であるターゲットドメイン画像３６２から変換された画像であるため、ターゲットドメイン画像３６２と同様に、ラベル無しの画像である。 The pseudo source domain image 363 is an image obtained by transforming the target domain image 362 into the source domain using the above-mentioned domain transformation unit (for example, the domain transformation unit 210 shown in FIG. 2).
Note that the pseudo source domain image 363 is an image converted from the target domain image 362, which is an unlabeled image, and is therefore an unlabeled image, just like the target domain image 362.

疑似ターゲットドメイン画像３６４は、上述したドメイン変換部を用いて、ソースドメイン画像３６１をターゲットドメインに変換することで得られた画像である。
なお、疑似ターゲットドメイン画像３６４は、ラベル付きの画像であるソースドメイン画像３６１から変換された画像であるため、ソースドメイン画像３６１と同様に、ラベル付きの画像である。また、本開示では、入力画像セットに対するドメイン変換処理を施すことで得られる変換画像セットにおける各画像を「疑似」と呼ぶのは、実際のソースドメイン及びターゲットドメインと完全には一致しないからである。 The pseudo target domain image 364 is an image obtained by transforming the source domain image 361 into the target domain using the domain transformation unit described above.
Note that the pseudo target domain image 364 is an image transformed from the source domain image 361, which is a labeled image, and is therefore a labeled image, similar to the source domain image 361. Also, in this disclosure, the images in the transformed image set obtained by performing a domain transformation process on the input image set are called "pseudo" because they do not completely match the actual source and target domains.

ソースドメイン画像３６１と、ターゲットドメイン画像３６２と、疑似ソースドメイン画像３６３と、疑似ターゲットドメイン画像３６４とを入力したペア生成部３６５は、画像ペアを生成する。より具体的には、ペア生成部３６５は、入力画像セットに含まれる各画像と、変換画像セットに含まれる各画像とを組み合わせたペアを生成してもよい。また、ペア生成部３６５は、
入力画像セットと、変換画像セットとの中から、撮影内容（オブジェクトのカテゴリーや配置）が所定の類似度基準を満たす画像をポジティブペア３６６とし、入力画像セットと、変換画像セットとの中から、撮影内容が所定の類似度基準を満たさない画像をネガティブペア３６７とする。
なお、画像ペアの詳細については、図６を参照して説明するため、ここではその説明を省略する。 Having received the source domain image 361, the target domain image 362, the pseudo source domain image 363, and the pseudo target domain image 364, the pair generation unit 365 generates image pairs. More specifically, the pair generation unit 365 may generate pairs that combine each image included in the input image set with each image included in the transformed image set.
In addition, from among the input image set and the transformed image set, the pair generation unit 365 designates images whose captured content (object categories and arrangement) satisfies a predetermined similarity criterion as a positive pair 366, and images whose captured content does not satisfy the predetermined similarity criterion as a negative pair 367.
The image pair will be described in detail with reference to FIG. 6, and therefore will not be described here.

特徴抽出部３６８は、ペア生成部によって生成された画像ペアを入力した後、これらの画像ペアに含まれる各画像について、特徴マップを抽出する。ここでは、画像ペアに含まれる各画像の特徴マップを抽出する手段は、例えばいわゆる畳み込みニューラルネットワーク等の既存の手段を用いてもよく、本開示では特に限定されない。特徴抽出部３６８によって作成される各画像の特徴マップは、画像乖離度計算部３６９と、検出予測部３７１とに転送される。 The feature extraction unit 368 receives the image pairs generated by the pair generation unit 365 and then extracts a feature map for each image included in these image pairs. The means for extracting the feature map of each image may be an existing means such as a convolutional neural network, and is not particularly limited in this disclosure. The feature map of each image created by the feature extraction unit 368 is transferred to the image disparity calculation unit 369 and the detection prediction unit 371.

画像乖離度計算部３６９は、各画像ペアに含まれるそれぞれの画像の特徴マップを比較することで、当該画像ペアのそれぞれの画像の特徴分布の乖離度を計算する。ここでの乖離度とは、画像の特徴分布の距離を示す値であり、画像ペアの特徴分布の乖離度が大きい程、上述したドメインギャップが大きいことを示す。
なお、画像ペアの特徴分布の乖離度の計算の詳細については後述するため、ここではその説明を省略する。 The image disparity calculation unit 369 compares the feature maps of the two images in each image pair to calculate the disparity between the feature distributions of those images. The disparity here is a value indicating the distance between the feature distributions of the images; the larger the disparity of the feature distributions of an image pair, the larger the domain gap described above.
The details of calculating the disparity of the feature distributions of an image pair will be described later, and are therefore omitted here.

適応損失計算部３７０は、ペア生成部３６５によって生成される画像ペアの中で、ポジティブペアの乖離度を減算させる第１の適応損失パラメータと、ネガティブペアの乖離度を向上させる第２の適応損失パラメータとを計算する。
なお、適応損失パラメータの計算の詳細については後述するため、ここではその説明を省略する。 Among the image pairs generated by the pair generation unit 365, the adaptation loss calculation unit 370 calculates a first adaptation loss parameter for reducing the disparity of positive pairs and a second adaptation loss parameter for increasing the disparity of negative pairs.
The details of calculating the adaptation loss parameters will be described later, and are therefore omitted here.

検出予測部３７１は、特徴抽出部３６８によって生成された各画像ペアの特徴マップから、各画像におけるオブジェクトのカテゴリー及び位置を予測し、オブジェクトの予測したカテゴリー及び位置を示す予測結果を生成する。ここでの検出予測部３７１として、例えばオブジェクト検出部２２０を構成する深層ニューラルネットワークを訓練前の状態で用いてもよい。 The detection prediction unit 371 predicts the category and position of each object in each image from the feature maps of each image pair generated by the feature extraction unit 368, and generates a prediction result indicating the predicted category and position of the object. For example, the deep neural network constituting the object detection unit 220, in its state before training, may be used as the detection prediction unit 371.

検出損失計算部３７２は、検出予測部３７１によって生成された予測結果と、オブジェクトの実際のカテゴリー及び実際の位置を示すグラウンドトゥルースとを比較することで、検出予測部３７１による検出損失を示す検出損失パラメータを計算する。
なお、検出損失パラメータの計算の詳細については後述するため、ここではその説明を省略する。 The detection loss calculation unit 372 calculates a detection loss parameter indicating the detection loss by the detection prediction unit 371 by comparing the prediction result generated by the detection prediction unit 371 with ground truth indicating the actual category and actual location of the object.
The details of the calculation of the detection loss parameter will be described later, and therefore will not be described here.
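
Although the detailed calculation is deferred, the comparison between a prediction result and the ground truth can be sketched as follows. This is a minimal illustration only: the source does not state which loss functions are used, so the cross-entropy term for the category, the 1 - IoU term for the position, and the names `iou` and `detection_loss` are all assumptions introduced here.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def detection_loss(pred_probs, pred_box, true_class, true_box, eps=1e-12):
    """Hypothetical detection loss: category error plus position error.

    pred_probs: predicted probability per category (sums to 1).
    true_class: index of the ground-truth category.
    """
    # Assumed category term: cross-entropy against the ground-truth category.
    class_loss = -math.log(pred_probs[true_class] + eps)
    # Assumed position term: 1 - IoU between predicted and ground-truth boxes.
    box_loss = 1.0 - iou(pred_box, true_box)
    return class_loss + box_loss


# A confident, well-localized prediction gives a smaller loss than a poor one.
good = detection_loss([0.1, 0.9], (10, 10, 50, 50), 1, (10, 10, 50, 52))
poor = detection_loss([0.6, 0.4], (30, 30, 70, 70), 1, (10, 10, 50, 52))
```

Because the ground truth is needed for both terms, such a loss can only be evaluated for labeled images, that is, the source domain images and the pseudo target domain images derived from them.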

パラメータ更新部３７３は、適応損失計算部３７０によって計算される第１の適応損失パラメータ及び第２の適応損失パラメータと、検出損失計算部３７２によって計算される検出損失パラメータに基づいて、オブジェクト検出部（例えば、図２に示すオブジェクト検出部２２０）を構成する深層ニューラルネットワークのパラメータを調整することで、オブジェクト検出部を訓練する。 The parameter update unit 373 trains the object detection unit by adjusting the parameters of the deep neural network constituting the object detection unit (e.g., the object detection unit 220 shown in FIG. 2) based on the first adaptation loss parameter and the second adaptation loss parameter calculated by the adaptation loss calculation unit 370 and the detection loss parameter calculated by the detection loss calculation unit 372.
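
How the three loss parameters are combined is not specified in this passage. One simple possibility, shown purely as an assumption, is a weighted sum followed by a gradient descent step; the weight `adaptation_weight`, the learning rate, and the function names below are hypothetical.

```python
def total_loss(detection_loss_value, adaptation_loss_values, adaptation_weight=1.0):
    """Assumed combination: detection loss plus weighted adaptation losses."""
    return detection_loss_value + adaptation_weight * sum(adaptation_loss_values)


def sgd_step(params, grads, learning_rate=0.01):
    """One plain gradient descent update; grads are d(total loss)/d(param)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Example: one detection loss and two adaptation loss values.
loss = total_loss(1.0, [0.2, 0.3], adaptation_weight=0.5)
# Example update of two toy parameters with externally computed gradients.
updated = sgd_step([1.0, 2.0], [10.0, -10.0], learning_rate=0.1)
```

In a real deep neural network the gradients would come from automatic differentiation in a training framework; the sketch only shows the direction of the update.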

以上説明したオブジェクト検出学習部２２５の構成によれば、適応損失パラメータと、検出損失パラメータとに基づいてオブジェクト検出部を訓練することで、オブジェクト検出部は、予測結果とグラウンドトゥルースとの差を最小化するように学習され、ターゲットドメインの画像についても、高精度のオブジェクト検出結果を生成することができるようになる。また、このように、入手しやすいドメインのラベル付き学習データのみを用いてオブジェクト検出部を訓練することができるため、入手が困難なターゲットドメインの学習データを収集することが不要となり、深層ニューラルネットワークによるオブジェクト検出手段を導入するコストを抑えることができる。 According to the configuration of the object detection learning unit 225 described above, by training the object detection unit based on the adaptation loss parameter and the detection loss parameter, the object detection unit is trained to minimize the difference between the prediction result and the ground truth, and can generate highly accurate object detection results for images in the target domain. In this way, the object detection unit can be trained using only labeled learning data from easily available domains, eliminating the need to collect learning data from the target domain that is difficult to obtain, and reducing the cost of introducing an object detection means using a deep neural network.

次に、図５を参照して、本開示の実施形態に係るドメイン変換部によるドメイン変換処理について説明する。 Next, referring to FIG. 5, the domain conversion process performed by the domain conversion unit according to an embodiment of the present disclosure will be described.

図５は、本開示の実施形態に係るドメイン変換部２１０によるドメイン変換処理の一例を示す図である。上述したように、本開示の実施形態に係るドメイン変換部２１０は、ソースドメイン（例えば、ラベル付き学習データが豊富なドメイン）に対応するラベル付き画像であるソースドメイン画像５０５をターゲットドメインに変換した疑似ターゲットドメイン画像５１１と、ターゲットドメイン（例えば、ラベル付き学習データが少ないドメイン）に対応するラベル無し画像であるターゲットドメイン画像５０７をソースドメインに変換した疑似ソースドメイン画像５０９とを生成する。 FIG. 5 is a diagram illustrating an example of a domain conversion process by the domain conversion unit 210 according to an embodiment of the present disclosure. As described above, the domain conversion unit 210 according to an embodiment of the present disclosure generates a pseudo target domain image 511 obtained by converting a source domain image 505, which is a labeled image corresponding to a source domain (e.g., a domain with a wealth of labeled learning data), into a target domain, and a pseudo source domain image 509 obtained by converting a target domain image 507, which is an unlabeled image corresponding to a target domain (e.g., a domain with little labeled learning data), into a source domain.

図５は、本開示の実施形態に係るソースドメイン画像５０５、ターゲットドメイン画像５０７、疑似ソースドメイン画像５０９、及び疑似ターゲットドメイン画像５１１の一例をしている。また、図５に示す各画像における三角及び丸は、画像における２種類のカテゴリーのオブジェクト（例えば、水筒と腕時計）を示している。
なお、図５では、説明の便宜上、２種類のカテゴリーのオブジェクトを含む画像を一例として示しているが、本開示はこれに限定されず、任意の数のカテゴリーを含む画像であってもよい。 FIG. 5 shows an example of a source domain image 505, a target domain image 507, a pseudo source domain image 509, and a pseudo target domain image 511 according to an embodiment of the present disclosure. The triangles and circles in each image shown in FIG. 5 represent objects of two categories (e.g., water bottles and wristwatches) in the image.
Note that, for convenience of explanation, FIG. 5 shows an example of an image including objects of two categories, but the present disclosure is not limited to this, and the image may include any number of categories.

ソースドメイン画像５０５とターゲットドメイン画像５０７とは、例えば、異なるＸ線装置によって撮影された、又は、同一のＸ線装置で異なる撮影設定で撮影されたため、ドメインが異なるＸ線画像となっている。このため、ソースドメイン画像５０５とターゲットドメイン画像５０７とで、色、鮮鋭度（シャープネス）等、様々な表示設定が相違しており、ソースドメインとターゲットドメインとの間ではドメインギャップが存在し、それぞれの画像の特徴分布が大きく乖離している。 The source domain image 505 and the target domain image 507 are X-ray images with different domains, for example, because they were taken by different X-ray devices or taken by the same X-ray device with different shooting settings. For this reason, the source domain image 505 and the target domain image 507 differ in various display settings such as color and sharpness, and a domain gap exists between the source domain and the target domain, and the feature distributions of the respective images are significantly different.

このようなドメインギャップが存在すると、例えばオブジェクト検出用のＤＮＮがソースドメインのラベル付きデータによって訓練されたとしても、ラベル無しのドメインであるターゲットドメインの画像に対しては高精度のオブジェクト検出結果を生成することができない。
そこで、本開示では、ドメイン変換部を用いて、それぞれの画像に対するドメイン変換処理を行うことで、ドメインギャップを短縮し、ソースドメインとターゲットドメインとを接近させることができる。 When such a domain gap exists, for example, even if a DNN for object detection is trained with labeled data in the source domain, it cannot generate highly accurate object detection results for images in the target domain, which is an unlabeled domain.
Therefore, in the present disclosure, a domain conversion unit is used to perform domain conversion processing on each image, thereby shortening the domain gap and bringing the source domain and the target domain closer to each other.

本開示に係るドメイン変換処理では、ドメイン変換部は、画像におけるオブジェクトの位置等を変えずに、それぞれの画像の色、明るさ、鮮鋭度（シャープネス）等の表示設定パラメータを、他方の画像のドメインに整合させるように調整する。より具体的には、ドメイン変換部２１０は、ソースドメイン画像５０５の色、明るさ、鮮鋭度等をターゲットドメインに整合させるように調整し、疑似ターゲットドメイン画像５１１を生成する。同様に、ドメイン変換部２１０は、ターゲットドメイン画像５０７の色、明るさ、鮮鋭度等の表示設定パラメータをソースドメインに整合させるように調整し、疑似ソースドメイン画像５０９を生成する。 In the domain conversion process according to the present disclosure, the domain conversion unit adjusts the display setting parameters such as color, brightness, sharpness, etc. of each image to match the domain of the other image without changing the positions of objects in the images. More specifically, the domain conversion unit 210 adjusts the color, brightness, sharpness, etc. of the source domain image 505 to match the target domain, and generates a pseudo target domain image 511. Similarly, the domain conversion unit 210 adjusts the display setting parameters such as color, brightness, sharpness, etc. of the target domain image 507 to match the source domain, and generates a pseudo source domain image 509.
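
The passage above does not say which algorithm performs the domain conversion. As a stand-in that has the stated property (appearance is adjusted while object positions stay fixed), the following sketch matches the brightness and contrast statistics of an image to those of the other domain. The function name `match_domain_statistics` and the use of mean and standard deviation as the "display setting parameters" are assumptions made for illustration, not the method of the disclosure.

```python
from statistics import mean, pstdev


def match_domain_statistics(image, ref_mean, ref_std, eps=1e-8):
    """Shift and scale pixel intensities toward a reference domain.

    image: list of rows of grayscale values in [0, 1].
    ref_mean, ref_std: intensity statistics of the other domain (assumed known).
    Each pixel keeps its position, so object locations are unchanged.
    """
    pixels = [p for row in image for p in row]
    src_mean = mean(pixels)
    src_std = pstdev(pixels)
    scale = ref_std / (src_std + eps)
    return [
        # Clamp to the valid intensity range after the linear adjustment.
        [min(1.0, max(0.0, (p - src_mean) * scale + ref_mean)) for p in row]
        for row in image
    ]


# Convert a toy 2x2 "source" image toward darker, lower-contrast target statistics.
source_image = [[0.2, 0.4], [0.6, 0.8]]
pseudo_target = match_domain_statistics(source_image, ref_mean=0.4, ref_std=0.1)
```

Since the pixel grid is untouched, any labels attached to the source image (object categories and positions) remain valid for the converted image, which is why a pseudo target domain image can inherit the labels of its source domain image.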

このドメイン変換処理によれば、画像間のドメインギャップが短縮される。また、このように画像間のドメインギャップが短縮された画像をオブジェクト検出用のＤＮＮを訓練するために用いることで、例えばターゲットドメインに対応する大量の学習データを収集しなくても、ターゲットドメインの画像にについて高精度のオブジェクト検出結果を生成することが可能となる。 This domain conversion process reduces the domain gap between images. In addition, by using images with a reduced domain gap to train a DNN for object detection, it becomes possible to generate highly accurate object detection results for images in the target domain, for example, without collecting a large amount of training data corresponding to the target domain.

次に、図６を参照して、本開示の実施形態に係る画像ペア生成部による画像ペア生成処理について説明する。 Next, the image pair generation process performed by the image pair generation unit according to an embodiment of the present disclosure will be described with reference to FIG. 6.

図６は、本開示の実施形態に係るペア生成部による画像ペア生成処理の一例を示す図である。上述したように、本開示の実施形態に係る画像ペア生成部（例えば、図４に示す画像ペア生成部３６５）は、ソースドメイン画像と、ターゲットドメイン画像とを含む入力画像セットと、疑似ソースドメイン画像と、疑似ターゲットドメイン画像とを含む変換画像セットとを入力し、入力画像セットに含まれる各画像と、変換画像セットに含まれる各画像とを組み合わせたペアを生成してもよい。 FIG. 6 is a diagram showing an example of an image pair generation process by a pair generation unit according to an embodiment of the present disclosure. As described above, an image pair generation unit according to an embodiment of the present disclosure (e.g., image pair generation unit 365 shown in FIG. 4) may input an input image set including a source domain image and a target domain image, and a transformed image set including a pseudo source domain image and a pseudo target domain image, and generate pairs combining each image included in the input image set with each image included in the transformed image set.

一例として、ソースドメイン画像６０５Ａ、ソースドメイン画像６０５Ｂ、及びソースドメイン画像６０５Ｃとの３つのソースドメイン画像と、これらの３つのソースドメイン画像をターゲットドメインに変換した疑似ターゲットドメイン画像６１０Ａ、疑似ターゲットドメイン画像６１０Ｂ、及び疑似ターゲットドメイン画像６１０Ｃとの３つのターゲットドメイン画像があるとする。この場合、画像ペア生成部は、ソースドメイン画像６０５Ａと疑似ターゲットドメイン画像６１０Ａ、ソースドメイン画像６０５Ａと疑似ターゲットドメイン画像６１０Ｂ、ソースドメイン画像６０５Ａと疑似ターゲットドメイン画像６１０Ｃ、ソースドメイン画像６０５Ｂと疑似ターゲットドメイン画像６１０Ａ、ソースドメイン画像６０５Ｂと疑似ターゲットドメイン画像６１０Ｂ、ソースドメイン画像６０５Ｂと疑似ターゲットドメイン画像６１０Ｃ、ソースドメイン画像６０５Ｃと疑似ターゲットドメイン画像６１０Ａ、ソースドメイン画像６０５Ｃと疑似ターゲットドメイン画像６１０Ｂ、及びソースドメイン画像６０５Ｃと疑似ターゲットドメイン画像６１０Ｃとの９つの画像ペアを生成する。 As an example, suppose there are three source domain images (source domain images 605A, 605B, and 605C) and three pseudo target domain images (pseudo target domain images 610A, 610B, and 610C) obtained by converting these three source domain images into the target domain. In this case, the image pair generation unit combines each source domain image with each pseudo target domain image to generate nine image pairs: 605A with 610A, 605A with 610B, 605A with 610C, 605B with 610A, 605B with 610B, 605B with 610C, 605C with 610A, 605C with 610B, and 605C with 610C.

また、画像ペア生成部は、この９つのペアの中から、撮影内容（オブジェクトの形状や配置）が所定の類似度基準を満たす画像をポジティブペア６１２とし、撮影内容が所定の類似度基準を満たさない画像をネガティブペア６１４とする。ここでの類似度基準とは、ユーザに予め設定されてもよい。また、画像の類似度は、既存の画像類似度アルゴリズムによって判定されてもよく、ここでは特に限定されない。
図５では、ポジティブペア６１２は実線で示され、ネガティブペア６１４は点線で示される。このように、それぞれのソースドメイン画像と、当該ソースドメイン画像から生成された疑似ターゲットドメイン画像とがポジティブペア６１２となり、それ以外の画像の組み合わせはネガティブペア６１４となる。 Furthermore, from among these nine pairs, the image pair generating unit selects images whose photographed contents (shape and arrangement of objects) satisfy a predetermined similarity standard as a positive pair 612, and selects images whose photographed contents do not satisfy the predetermined similarity standard as a negative pair 614. The similarity standard here may be set in advance by the user. Furthermore, the similarity of the images may be determined by an existing image similarity algorithm, and is not particularly limited here.
In FIG. 6, positive pairs 612 are shown with solid lines and negative pairs 614 are shown with dotted lines. Thus, each source domain image and the pseudo target domain image generated from it form a positive pair 612, and all other image combinations form negative pairs 614.

より具体的には、ソースドメイン画像６０５Ａと疑似ターゲットドメイン画像６１０Ａ、ソースドメイン画像６０５Ｂと疑似ターゲットドメイン画像６１０Ｂ、及びソースドメイン画像６０５Ｃと疑似ターゲットドメイン画像６１０Ｃとがポジティブペア６１２となり、ソースドメイン画像６０５Ａと疑似ターゲットドメイン画像６１０Ｂ、ソースドメイン画像６０５Ａと疑似ターゲットドメイン画像６１０Ｃ、ソースドメイン画像６０５Ｂと疑似ターゲットドメイン画像６１０Ａ、ソースドメイン画像６０５Ｂと疑似ターゲットドメイン画像６１０Ｃ、ソースドメイン画像６０５Ｃと疑似ターゲットドメイン画像６１０Ａ、及びソースドメイン画像６０５Ｃと疑似ターゲットドメイン画像６１０Ｂとがネガティブペア６１４となる。 More specifically, the source domain image 605A and the pseudo target domain image 610A, the source domain image 605B and the pseudo target domain image 610B, and the source domain image 605C and the pseudo target domain image 610C form positive pairs 612, while the source domain image 605A and the pseudo target domain image 610B, the source domain image 605A and the pseudo target domain image 610C, the source domain image 605B and the pseudo target domain image 610A, the source domain image 605B and the pseudo target domain image 610C, the source domain image 605C and the pseudo target domain image 610A, and the source domain image 605C and the pseudo target domain image 610B form negative pairs 614.
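
The nine-pair example above can be reproduced with a short sketch. The description leaves the similarity criterion open; here it is replaced by a simple assumption, namely that two images have similar content exactly when one was converted from the other, which is recorded as a hypothetical `origin_key`. The function name `generate_pairs` is likewise illustrative.

```python
from itertools import product


def generate_pairs(sources, pseudo_targets):
    """Split all source/pseudo-target combinations into positive and negative pairs.

    sources, pseudo_targets: lists of (image_id, origin_key) tuples.
    Assumption: equal origin_key means the pseudo image was converted from
    that source image, so the captured content is the same.
    """
    positive, negative = [], []
    for (s_id, s_key), (t_id, t_key) in product(sources, pseudo_targets):
        if s_key == t_key:
            positive.append((s_id, t_id))
        else:
            negative.append((s_id, t_id))
    return positive, negative


sources = [("605A", "A"), ("605B", "B"), ("605C", "C")]
pseudo_targets = [("610A", "A"), ("610B", "B"), ("610C", "C")]
positive_pairs, negative_pairs = generate_pairs(sources, pseudo_targets)
```

With three images per set, this yields three positive pairs and six negative pairs, matching the enumeration above. A content-based similarity measure could be substituted for the `origin_key` comparison without changing the rest of the sketch.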

後述するように、ペア生成部によって生成される画像ペアの中で、ポジティブペア６１２の乖離度を減算させ、ネガティブペア６１４の乖離度を向上させることで、オブジェクトのカテゴリーの識別力（つまり、検出精度）を高めつつ、ドメインギャップを短縮することができる。 As described below, among the image pairs generated by the pair generation unit, reducing the disparity of the positive pairs 612 and increasing the disparity of the negative pairs 614 makes it possible to narrow the domain gap while improving the ability to discriminate object categories (i.e., detection accuracy).

次に、図７を参照して、本開示の実施形態に係るドメインギャップ短縮について説明する。 Next, referring to FIG. 7, we will explain domain gap shortening according to an embodiment of the present disclosure.

図７は、本開示の実施形態に係るドメインギャップ短縮の一例を示す図である。上述したように、本開示の実施形態に係る適応損失計算部（例えば、図３に示す適応損失計算部３７０）と、検出損失計算部（例えば、図３に示す検出損失計算部３７２）とによって生成されるパラメータに基づいてオブジェクト検出部のパラメータを更新することで、異なるドメインに対応する画像のドメインギャップを短縮することができる。 FIG. 7 is a diagram illustrating an example of domain gap shortening according to an embodiment of the present disclosure. As described above, updating the parameters of the object detection unit based on the parameters generated by the adaptation loss calculation unit (e.g., the adaptation loss calculation unit 370 shown in FIG. 4) and the detection loss calculation unit (e.g., the detection loss calculation unit 372 shown in FIG. 4) according to an embodiment of the present disclosure makes it possible to narrow the domain gap between images corresponding to different domains.

図７は、適応損失計算部及び検出損失計算部によるドメインギャップ短縮の一例を示す。上述したように、適応損失計算部は、ペア生成部によって生成される画像ペアの中で、ポジティブペアの乖離度を減算させる第１の適応損失パラメータと、ネガティブペアの乖離度を向上させる第２の適応損失パラメータとを計算し、オブジェクト検出部は、これらのパラメータに基づいて訓練される。
これにより、図７に示すように、ポジティブペアの乖離度を減算させると、ソースドメイン画像５０５及び疑似ターゲットドメイン画像５１１と、ターゲットドメイン画像５０７及び疑似ソースドメイン画像５０９との間で、同一のカテゴリーの特徴分布が互いに接近する。また、ネガティブペアの乖離度を向上させると、ソースドメイン画像５０５及び疑似ターゲットドメイン画像５１１と、ターゲットドメイン画像５０７及び疑似ソースドメイン画像５０９との間で、異なるカテゴリーの特徴分布が更に乖離する。 FIG. 7 shows an example of domain gap shortening by the adaptation loss calculation unit and the detection loss calculation unit. As described above, among the image pairs generated by the pair generation unit, the adaptation loss calculation unit calculates a first adaptation loss parameter for reducing the disparity of positive pairs and a second adaptation loss parameter for increasing the disparity of negative pairs, and the object detection unit is trained based on these parameters.
As a result, as shown in FIG. 7, when the disparity of the positive pairs is reduced, the feature distributions of the same category move closer together between the source domain image 505 and pseudo target domain image 511 on the one hand and the target domain image 507 and pseudo source domain image 509 on the other. Likewise, when the disparity of the negative pairs is increased, the feature distributions of different categories move further apart between those same two groups of images.

また、検出損失計算部は、上述したように、検出予測部によって生成された予測結果と、オブジェクトの実際のカテゴリー及び実際の位置を示すグラウンドトゥルースとを比較することで、検出損失パラメータを計算し、オブジェクト検出部は、このパラメータに基づいて訓練される。これにより、オブジェクト検出部は、異なるカテゴリーをより高精度で認識できるようになる。 As described above, the detection loss calculation unit also calculates a detection loss parameter by comparing the prediction result generated by the detection prediction unit with ground truth indicating the actual category and actual location of the object, and the object detection unit is trained based on this parameter. This allows the object detection unit to recognize different categories with higher accuracy.

このように、適応損失計算部及び検出損失計算部によるパラメータを用いてオブジェクト検出部のパラメータを更新し、画像のドメインギャップを短縮することで、オブジェクト検出部は、ラベル無しのターゲットドメインの画像に対しても、高精度のオブジェクト検出結果を生成することができるようになる。 In this way, by updating the parameters of the object detection unit using the parameters from the adaptation loss calculation unit and the detection loss calculation unit and thereby narrowing the domain gap between images, the object detection unit becomes able to generate highly accurate object detection results even for unlabeled target domain images.

次に、図８を参照して、本開示の実施形態に係るオブジェクト検出部訓練方法について説明する。 Next, with reference to FIG. 8, a method for training an object detection unit according to an embodiment of the present disclosure will be described.

図８は、本開示の実施形態に係るオブジェクト検出部訓練方法８００の一例を示す図である。図８に示すオブジェクト検出部訓練方法８００は、例えば図２に示すオブジェクト検出装置２０１の各機能部によって実行され、オブジェクト検出部を学習させるための方法である。 FIG. 8 is a diagram illustrating an example of an object detection unit training method 800 according to an embodiment of the present disclosure. The object detection unit training method 800 illustrated in FIG. 8 is a method executed by each functional unit of the object detection device 201 illustrated in FIG. 2, for example, to train the object detection unit.

まず、ステップＳ８１０では、画像入力部（例えば、図２に示す画像入力部２０８）は、
ソースドメインに対応するソースドメイン画像とターゲットドメインに対応するターゲットドメイン画像とを含む入力画像セットを受け付ける。ここで、ソースドメイン画像と、ターゲットドメイン画像とは、例えば学習用にユーザに選択された画像であってもよく、過去にオブジェクト検出装置に接続されているＸ線装置から送信された画像の中で、学習用に選択された画像であってもよい。
なお、ソースドメイン画像及びターゲットドメイン画像の詳細については、図３を参照して説明したため、ここではその説明を省略する。 First, in step S810, an image input unit (for example, the image input unit 208 shown in FIG. 2) accepts an input image set including a source domain image corresponding to the source domain and a target domain image corresponding to the target domain.
Here, the source domain image and the target domain image may be, for example, images selected by a user for training, or images selected for training from among images previously transmitted from an X-ray device connected to the object detection device.
The details of the source domain image and the target domain image have been described with reference to FIG. 3, and therefore will not be described here.

次に、ステップＳ８２０では、ドメイン変換部（例えば、図２に示すドメイン変換部２１０）は、ステップＳ８１０で画像入力部によって受け付けられた入力画像セットに対するドメイン変換処理を行い、ソースドメイン画像をターゲットドメインに変換した疑似ターゲットドメイン画像と、ターゲットドメイン画像をソースドメイン画像に変換した疑似ソースドメイン画像とを含む変換画像セットを生成する。
なお、ドメイン変換処理の詳細については、図５を参照して説明したため、ここではその説明を省略する。 Next, in step S820, a domain conversion unit (e.g., the domain conversion unit 210 shown in FIG. 2) performs a domain conversion process on the input image set accepted by the image input unit in step S810, and generates a converted image set including a pseudo target domain image obtained by converting the source domain image to the target domain, and a pseudo source domain image obtained by converting the target domain image to the source domain.
The details of the domain conversion process have been described with reference to FIG. 5, and therefore will not be described here.

次に、ステップＳ８３０では、ペア生成部（例えば、図４に示すペア生成部３６５）は、ソースドメイン画像と、ターゲットドメイン画像と、疑似ソースドメイン画像と、疑似ターゲットドメイン画像とを入力し、画像ペアを生成する。例えば、上述したように、ペア生成部は、入力画像セットと、変換画像セットとの中から、撮影内容（オブジェクトのカテゴリーや配置）が所定の類似度基準を満たす画像をポジティブペアとし、入力画像セットと、変換画像セットとの中から、撮影内容が所定の類似度基準を満たさない画像をネガティブペアとする。
なお、画像ペアを生成する処理の詳細については、図６を参照して説明したため、ここではその説明を省略する。 Next, in step S830, a pair generation unit (for example, the pair generation unit 365 shown in FIG. 4) receives the source domain image, the target domain image, the pseudo source domain image, and the pseudo target domain image, and generates image pairs. For example, as described above, from among the input image set and the transformed image set, the pair generation unit designates images whose captured content (object categories and arrangement) satisfies a predetermined similarity criterion as a positive pair, and images whose captured content does not satisfy the predetermined similarity criterion as a negative pair.
The details of the process for generating image pairs have been described with reference to FIG. 6, and therefore will not be described here.

次に、ステップＳ８４０では、特徴抽出部（例えば、図４に示す特徴抽出部３６８）は、ステップＳ８３０で生成された画像ペアに含まれる各画像について、特徴マップを抽出する。ここでの特徴抽出部は、画像におけるオブジェクトのカテゴリー及び配置に応じて特徴マップを抽出するように、学習された畳み込みニューラルネットワークであってもよい。また、ここでの特徴マップとは、例えば１次元ベクトルであってもよく、２次元又は３次元のマトリックス表現であってもよい。 Next, in step S840, a feature extraction unit (e.g., feature extraction unit 368 shown in FIG. 4) extracts a feature map for each image included in the image pair generated in step S830. The feature extraction unit here may be a convolutional neural network trained to extract feature maps according to the categories and arrangements of objects in the images. The feature map here may be, for example, a one-dimensional vector, or a two- or three-dimensional matrix representation.

次に、ステップＳ８５０では、画像乖離度計算部（例えば、図４に示す画像乖離度計算部３６９）は、ステップＳ８４０で生成された特徴マップを用いて、各画像ペアの乖離度を計算する。ここでの乖離度とは、画像の特徴分布の距離を示す値であり、画像ペアの特徴分布の乖離度が大きい程、上述したドメインギャップが大きいことを示す。ここでの乖離度を計算する手法として、例えば数式１に示すように、多次元空間に投影される２つのベクトル間の角度のコサインを計算するコサイン類似度法を用いてもよい。

ここで、ｕ及びｖは、ソースドメインＳのｉ番目の画像の特徴マップfⁱ _s及び疑似ターゲットドメインＴ’のｉ番目の画像の特徴マップfⁱ _Ｔ’におけるベクトルであってもよい。このように、画像乖離度計算部、ペア生成部によって生成される各ポジティブペア及び各ネガティブペアについて、当該ペアの乖離度を計算する。
例えば、第１の画像に対応する第１の特徴マップ及び第２の画像に対応する第２の特徴マップとを含むポジティブペアがあり、第３の画像に対応する第３の特徴マップ及び第４の画像に対応する第４の特徴マップを含むネガティブペアがある場合、画像乖離度計算部は、第１の特徴マップと第２の特徴マップとの乖離度である第１の乖離度を計算し、第３の特徴マップと第４の特徴マップとの乖離度である第２の乖離度を計算してもよい。 Next, in step S850, an image disparity calculation unit (for example, the image disparity calculation unit 369 shown in FIG. 4) calculates the disparity of each image pair using the feature map generated in step S840. The disparity here is a value indicating the distance of the feature distribution of the images, and the larger the disparity of the feature distribution of the image pair, the larger the domain gap described above. As a method for calculating the disparity here, for example, a cosine similarity method may be used, which calculates the cosine of the angle between two vectors projected into a multidimensional space, as shown in Equation 1.

Here, u and v may be vectors in the feature map f_S^i of the i-th image of the source domain S and the feature map f_{T'}^i of the i-th image of the pseudo target domain T', respectively. In this way, the image disparity calculation unit calculates the disparity of each positive pair and each negative pair generated by the pair generation unit.
For example, if there is a positive pair including a first feature map corresponding to a first image and a second feature map corresponding to a second image, and a negative pair including a third feature map corresponding to a third image and a fourth feature map corresponding to a fourth image, the image disparity calculation unit may calculate a first disparity, which is the disparity between the first feature map and the second feature map, and a second disparity, which is the disparity between the third feature map and the fourth feature map.
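
Equation 1 itself is not reproduced in this text. The sketch below uses the standard definition of cosine similarity, cos(u, v) = (u · v) / (|u| |v|), applied to flattened feature vectors. Because cosine similarity grows as two vectors become more alike, the conversion to a disparity as 1 - cos(u, v) is an assumption added here, as are the function names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_disparity(u, v):
    """Assumed disparity: 0 for identical directions, up to 2 for opposite ones."""
    return 1.0 - cosine_similarity(u, v)


# Toy feature vectors: a positive pair (same content) and a negative pair.
first_disparity = cosine_disparity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
second_disparity = cosine_disparity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A two- or three-dimensional feature map would first be flattened into a one-dimensional vector before being passed to these functions.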

Next, the adaptation loss calculation unit (for example, the adaptation loss calculation unit 370 shown in FIG. 4) calculates a first adaptation loss parameter for reducing the discrepancy between the images included in a positive pair (for example, the first discrepancy), and a second adaptation loss parameter for increasing the discrepancy between the images included in a negative pair (for example, the second discrepancy).

The discrepancy between the images in a positive pair is reduced in order to train the feature extraction unit to extract, from images whose captured content (object categories and arrangement) is similar, domain-invariant features (that is, features that do not change with the domain) that narrow the domain gap. As a result, the feature distributions of the source domain and the pseudo target domain, and those of the target domain and the pseudo source domain, move closer together.
The discrepancy between the images in a negative pair is increased in order to train the feature extraction unit to extract different features from images whose captured content (object categories and arrangement) is not similar. This improves the ability to discriminate between different objects (that is, the detection accuracy).

The adaptation loss parameter here is obtained, for example, from Equation 2 below.
Equation 2 is written for the adaptation loss parameter $L_{adp}^{S,T'}$ of the source domain and the pseudo target domain, but it may also be used to calculate the adaptation loss parameter $L_{adp}^{S',T}$ of the target domain and the pseudo source domain.

In Equation 2, the numerator is the discrepancy of the positive pair (for example, the first discrepancy) and the denominator is the discrepancy of the negative pair (for example, the second discrepancy). Accordingly, reducing the discrepancy of the positive pair and increasing the discrepancy of the negative pair yields a smaller adaptation loss parameter.
The negative-pair discrepancy in the denominator may be the sum of the discrepancies of multiple negative pairs. Using the combined discrepancies of many negative pairs as the denominator of Equation 2 for a single positive pair further improves the ability to discriminate between different objects (that is, the detection accuracy).
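The ratio structure described for Equation 2 can be sketched as follows. Equation 2 itself is not reproduced in this text, so this is only an illustration of the stated structure (positive-pair discrepancy in the numerator, sum of negative-pair discrepancies in the denominator), not the patent's exact formula. The function name `adaptation_loss` and the small constant `eps`, added to avoid division by zero, are assumptions.

```python
def adaptation_loss(positive_discrepancy, negative_discrepancies, eps=1e-8):
    """Ratio of one positive-pair discrepancy to the summed negative-pair discrepancies.

    `eps` is a hypothetical stabilizer that is not mentioned in the source.
    """
    if not negative_discrepancies:
        raise ValueError("at least one negative pair is required")
    if positive_discrepancy < 0 or any(d < 0 for d in negative_discrepancies):
        raise ValueError("discrepancies are expected to be non-negative")
    return positive_discrepancy / (sum(negative_discrepancies) + eps)


# The loss shrinks when the positive pair gets closer ...
loss_far_positive = adaptation_loss(0.4, [0.8])
loss_near_positive = adaptation_loss(0.2, [0.8])

# ... and when more (or larger) negative-pair discrepancies enter the denominator.
loss_many_negatives = adaptation_loss(0.2, [0.8, 0.8])
```

Minimizing a quantity of this form pulls the two feature maps of a positive pair together while pushing those of the negative pairs apart, which matches the behavior described above.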

Next, in step S860, the detection prediction unit (for example, the detection prediction unit 371 shown in FIG. 4) uses the feature maps of each image pair generated in step S840 to predict the category and position of the objects in each image, and generates a prediction result indicating the predicted categories and positions. The detection prediction unit here may be, for example, a Faster R-CNN including a Region Proposal Network (RPN) that generates region proposals for predetermined regions of interest (ROIs), a classifier (classification head) that identifies the category of each ROI, and a box regressor (box regression head) that adjusts the bounding box coordinates of each ROI.

Next, the detection loss calculation unit (for example, the detection loss calculation unit 372 shown in FIG. 4) calculates a detection loss parameter for the detection prediction unit by comparing the prediction result generated by the detection prediction unit with the ground truth indicating the actual category and actual position of each object. The detection loss parameter here may include an RPN loss, a classification loss, and a regression loss, each obtained by calculating the distance between the prediction result and the ground truth.
Note that the detection loss parameter is calculated only for labeled data (that is, source domain images and pseudo target domain images).
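The composition of the detection loss can be sketched as below. The source only says that the RPN, classification, and regression losses are obtained from a distance between prediction and ground truth; the use of cross-entropy for classification and smooth L1 for box regression, the dictionary layout of each sample, and all function names are assumptions for illustration. The RPN loss is passed in as a precomputed number to keep the sketch short.

```python
import math


def classification_loss(class_probs, true_class):
    """Cross-entropy for one ROI (an assumed choice of distance)."""
    return -math.log(max(class_probs[true_class], 1e-12))


def regression_loss(pred_box, true_box):
    """Smooth L1 summed over box coordinates (an assumed choice of distance)."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        diff = abs(p - t)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


def detection_loss(samples):
    """Sum RPN, classification, and regression losses over labeled samples only."""
    total = 0.0
    for sample in samples:
        truth = sample["ground_truth"]
        if truth is None:  # unlabeled image: contributes no detection loss
            continue
        true_class, true_box = truth
        total += sample["rpn_loss"]
        total += classification_loss(sample["class_probs"], true_class)
        total += regression_loss(sample["pred_box"], true_box)
    return total


# Hypothetical samples: one labeled image and one unlabeled image.
samples = [
    {"rpn_loss": 0.0, "class_probs": [0.1, 0.9],
     "pred_box": (0, 0, 10, 10), "ground_truth": (1, (0, 0, 10, 10))},
    {"rpn_loss": 0.3, "class_probs": [0.5, 0.5],
     "pred_box": (1, 1, 5, 5), "ground_truth": None},
]
loss = detection_loss(samples)
```

Skipping samples without ground truth mirrors the note that the detection loss is calculated only for labeled data.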

Next, in step S870, the parameter update unit (for example, the parameter update unit 373 shown in FIG. 4) trains the object detection unit by adjusting its parameters based on the first adaptation loss parameter and the second adaptation loss parameter calculated by the adaptation loss calculation unit and on the detection loss parameter calculated by the detection loss calculation unit.
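One way the parameter update in step S870 might combine the losses is sketched below. The source does not say how the adaptation loss parameters and the detection loss are weighted or which optimizer is used; the weighted sum, the `adaptation_weight` factor, and the plain gradient descent step are all assumptions, and the gradients are taken as given rather than computed.

```python
def total_loss(detection_loss, first_adaptation_loss, second_adaptation_loss,
               adaptation_weight=1.0):
    """Assumed combination: detection loss plus weighted adaptation losses."""
    return detection_loss + adaptation_weight * (
        first_adaptation_loss + second_adaptation_loss
    )


def gradient_step(params, grads, learning_rate=0.01):
    """Plain gradient descent update (hypothetical optimizer choice)."""
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Hypothetical loss values and gradients for two parameters.
combined = total_loss(1.0, 0.2, 0.3, adaptation_weight=0.5)
updated = gradient_step([1.0, -2.0], [0.5, -1.0], learning_rate=0.1)
```

In a real training loop the gradients of the combined loss would come from automatic differentiation in a deep learning framework rather than being supplied by hand.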

According to the object detection unit training method 800 described above, the parameters of the object detection unit are adjusted based on the adaptation loss parameters and the detection loss parameter, so that the object detection unit is trained to minimize the difference between the prediction result and the ground truth. As a result, it can generate highly accurate object detection results even for images in a target domain for which, for example, little labeled training data is available.

Next, the object detection process according to an embodiment of the present disclosure will be described with reference to FIG. 9.

FIG. 9 is a diagram illustrating an example of an object detection process 900 according to an embodiment of the present disclosure. The object detection process 900 shown in FIG. 9 is executed by the functional units of an object detection device trained, for example, by the object detection unit training method 800 shown in FIG. 8, and is a process for detecting the category and position of objects in a given X-ray image.

First, in step S910, the image input unit (for example, the image input unit 208 shown in FIG. 2) accepts a target domain image. Here, the image input unit may, for example, receive an image transmitted from an X-ray device configured to acquire X-ray images in the target domain.

Next, in step S920, the feature extraction unit (for example, the feature extraction unit 368 shown in FIG. 4) extracts a feature map from the target domain image accepted by the image input unit in step S910. As described above, the feature extraction unit here may be a convolutional neural network trained to extract a feature map according to the category and arrangement of the objects in the image. The feature map here may be, for example, a one-dimensional vector, or a two-dimensional or three-dimensional matrix representation.

Next, in step S930, the object detection unit (for example, the object detection unit 220 shown in FIG. 2) detects the category and position of each object in the target domain image based on the feature map extracted in step S920, and generates data indicating the category and position of each object as a detection result.

Next, in step S940, the object detection unit outputs the detection result generated in step S930. Here, the object detection unit may transmit the detection result to a predetermined notification destination (such as the administrator of the X-ray device), for example via a communication network.

According to the object detection process 900 described above, the object detection device can generate highly accurate object detection results even for X-ray images of a target domain for which little labeled training data is available.
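The flow of steps S910 to S940 can be sketched as a small pipeline. This is a structural sketch only: the `Detection` type, the function `run_object_detection`, and the stand-in callables passed to it are hypothetical, and a trained feature extraction unit and object detection unit would take the place of the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    category: str                            # detected object category
    box: Tuple[float, float, float, float]   # position as (x1, y1, x2, y2)


def run_object_detection(
    image,
    extract_features: Callable,
    detect: Callable[..., List[Detection]],
    notify: Callable[[List[Detection]], None],
) -> List[Detection]:
    """Run S920 to S940 on an image accepted in S910."""
    feature_map = extract_features(image)    # S920: extract the feature map
    detections = detect(feature_map)         # S930: detect categories and positions
    notify(detections)                       # S940: output the detection result
    return detections


# Stand-in components for illustration only.
sent = []
result = run_object_detection(
    image=[[0.0, 1.0], [1.0, 0.0]],
    extract_features=lambda img: [sum(row) for row in img],
    detect=lambda fmap: [Detection("example_category", (0.0, 0.0, 1.0, 1.0))],
    notify=sent.append,
)
```

Passing the components in as callables keeps the sketch independent of any particular network architecture or notification channel.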

Although embodiments of the present disclosure have been described above, the present disclosure is not limited to those embodiments, and various modifications are possible without departing from the gist of the present disclosure.

200 Object detection system
201 Object detection device
202 Communication network
203 Processor
204 I/O interface
205 Network interface
206 User I/O interface
207 Memory
208 Image input unit
210 Domain conversion unit
211 X-ray device
215 Domain conversion learning unit
220 Object detection unit
225 Object detection learning unit
230 Source domain image storage unit
235 Target domain image storage unit
240 Pseudo source domain image storage unit
245 Pseudo target domain image storage unit
361 Source domain image
362 Target domain image
363 Pseudo source domain image
364 Pseudo target domain image
365 Pair generation unit
366 Positive pair
367 Negative pair
368 Feature extraction unit
369 Image discrepancy calculation unit
370 Adaptation loss calculation unit
371 Detection prediction unit
372 Detection loss calculation unit
373 Parameter update unit

Claims

1. An object detection device for detecting an object in an X-ray image, comprising:
an image input unit that accepts an input image set including a source domain image corresponding to a source domain and a target domain image corresponding to a target domain;
a domain conversion unit that performs a domain conversion process on the input image set to generate a converted image set including a pseudo target domain image obtained by converting the source domain image into the target domain and a pseudo source domain image obtained by converting the target domain image into the source domain;
a pair generation unit that generates image pairs using the input image set and the converted image set;
a feature extraction unit that extracts a feature map for each image in the image pairs;
a detection prediction unit that generates, based on the feature maps, a prediction result indicating a category and a position of an object in each image of the image pairs; and
an object detection unit that analyzes a given X-ray image to generate a detection result indicating a category and a position of an object in the X-ray image.

2. The object detection device according to claim 1, wherein the pair generation unit:
selects, as a positive pair, a first image and a second image whose captured contents satisfy a predetermined similarity criterion from among the input image set and the converted image set; and
selects, as a negative pair, a third image and a fourth image whose captured contents do not satisfy the predetermined similarity criterion from among the input image set and the converted image set.

3. The object detection device according to claim 2, wherein
the feature extraction unit:
extracts a first feature map for the first image in the positive pair;
extracts a second feature map for the second image in the positive pair;
extracts a third feature map for the third image in the negative pair; and
extracts a fourth feature map for the fourth image in the negative pair, and
the object detection device further comprises an image discrepancy calculation unit that:
calculates a first discrepancy between the first feature map and the second feature map; and
calculates a second discrepancy between the third feature map and the fourth feature map.

4. The object detection device according to claim 3, further comprising an adaptation loss calculation unit that:
calculates a first adaptation loss parameter for reducing the first discrepancy for the first image and the second image included in the positive pair; and
calculates a second adaptation loss parameter for increasing the second discrepancy for the third image and the fourth image included in the negative pair.

5. The object detection device according to claim 4, further comprising:
a detection loss calculation unit that calculates a detection loss parameter indicating a detection loss of the object detection unit by comparing the prediction result with a ground truth indicating an actual category and an actual position of the object; and
a parameter update unit that trains the object detection unit by updating parameters of the object detection unit using the detection loss parameter, the first adaptation loss parameter, and the second adaptation loss parameter.

6. The object detection device according to claim 5, wherein the object detection unit is a deep neural network.

7. An object detection system for detecting an object in an X-ray image, wherein
an object detection device that analyzes the X-ray image and detects an object, and
an X-ray device that captures the X-ray image and transmits the X-ray image to the object detection device
are connected via a communication network, and
the object detection device comprises:
an image input unit that accepts an input image set including a source domain image corresponding to a source domain and a target domain image corresponding to a target domain;
a domain conversion unit that performs a domain conversion process on the input image set to generate a converted image set including a pseudo target domain image obtained by converting the source domain image into the target domain and a pseudo source domain image obtained by converting the target domain image into the source domain;
a pair generation unit that generates image pairs using the input image set and the converted image set;
a feature extraction unit that extracts a feature map for each image in the image pairs;
a detection prediction unit that generates, based on the feature maps, a prediction result indicating a category and a position of an object in each image of the image pairs; and
an object detection unit that analyzes the X-ray image received from the X-ray device to generate a detection result indicating a category and a position of an object in the X-ray image, and transmits the detection result to a predetermined notification destination.

8. An object detection method for detecting an object in an X-ray image, comprising:
accepting an input image set including a source domain image corresponding to a source domain and a target domain image corresponding to a target domain;
performing a domain conversion process on the input image set to generate a converted image set including a pseudo target domain image obtained by converting the source domain image into the target domain and a pseudo source domain image obtained by converting the target domain image into the source domain;
generating image pairs by selecting, as a positive pair, a first image and a second image whose captured contents satisfy a predetermined similarity criterion from among the input image set and the converted image set, and selecting, as a negative pair, a third image and a fourth image whose captured contents do not satisfy the predetermined similarity criterion from among the input image set and the converted image set;
extracting a first feature map for the first image in the positive pair, a second feature map for the second image in the positive pair, a third feature map for the third image in the negative pair, and a fourth feature map for the fourth image in the negative pair;
calculating a first discrepancy between the first feature map and the second feature map;
calculating a second discrepancy between the third feature map and the fourth feature map;
calculating a first adaptation loss parameter for reducing the first discrepancy for the first image and the second image included in the positive pair, and calculating a second adaptation loss parameter for increasing the second discrepancy for the third image and the fourth image included in the negative pair;
generating a prediction result indicating a category and a position of an object in each image of the image pairs based on the first feature map, the second feature map, the third feature map, and the fourth feature map;
calculating a detection loss parameter indicating a detection loss of object detection by comparing the prediction result with a ground truth indicating an actual category and an actual position of the object;
training a deep neural network for object detection by updating parameters of the deep neural network using the detection loss parameter, the first adaptation loss parameter, and the second adaptation loss parameter; and
analyzing a given X-ray image using the trained deep neural network to generate a detection result indicating a category and a position of an object in the X-ray image.