JP7832141B2 - Information processing device, information processing method, and program

Info

Publication number: JP7832141B2
Application number: JP2023043112A
Authority: JP
Inventors: 賢治土井; 智大田中; 雄也大塚; 真也落合; 一浩二宮
Original assignee: Individual
Current assignee: Individual
Priority date: 2023-03-17
Filing date: 2023-03-17
Publication date: 2026-03-17
Anticipated expiration: 2043-03-17
Also published as: JP2024132374A

Description

The present invention relates to an information processing device, an information processing method, and a program.

Annotation techniques that use machine learning to detect a desired object in an image are known (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-46094

However, with conventional techniques, the accuracy of segmenting the object's region is insufficient, and as a result, annotating an image with the detected object is not always effective.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing device, an information processing method, and a program that can perform annotation effectively by accurately segmenting the region of an object in an image.

One aspect of the present invention is an information processing device comprising: a detection unit that detects the region of an object in an image of the ground surface captured from above; an estimation unit that estimates the orientation of the region; and an image processing unit that reshapes the region based on the orientation.

According to one aspect of the present invention, annotation can be performed effectively by accurately segmenting the region of an object in an image.

Figure 1 is a diagram showing an example of the configuration of the information processing device 100 according to the embodiment.
Figure 2 is a flowchart showing the sequence of processing performed by the processing unit 110 according to the embodiment.
Figure 3 is a diagram illustrating a method for detecting the region of a pedestrian crossing using the first machine learning model MDL1.
Figure 4 is a diagram illustrating a method for estimating the orientations using the second machine learning model MDL2.
Figure 5 is a diagram illustrating a method for reshaping the region of a pedestrian crossing.
Figure 6 is a diagram showing an example of an annotated satellite image.
Figure 7 is a diagram showing an example of the hardware configuration of the information processing device 100 according to the embodiment.

Embodiments of the information processing device, information processing method, and program of the present invention are described below with reference to the drawings.

[Overview]
An information processing device according to one aspect of this embodiment detects the region of an object in an image of the ground surface captured from above, and estimates the orientation of that region.

Images of the ground surface captured from above include, for example, satellite images (satellite photographs), aerial images (aerial photographs), and airborne images (airborne photographs).

The object to be detected in an image can be chosen freely according to the type of service or application provided to the user. For example, when satellite or aerial images are provided to users as map data, anything visible in those images can be an object. More specifically, when map data is used to navigate a user to a destination, traffic control installations such as pedestrian crossings, traffic lights, school zones, and railway crossings along the route to the destination may be selected as objects. In the following, the object is assumed to be a "pedestrian crossing" as an example.

When map data such as satellite or aerial images is used for disaster prevention against natural disasters such as landslides and river flooding, things that could cause such disasters (for example, solar panels installed on land cleared of forest) may be selected as objects.

Once the information processing device has detected the region of an object (e.g., a pedestrian crossing) in an image and estimated the orientation of that region, it reshapes the region based on the estimated orientation. Reshaping the region of the object in this way improves segmentation accuracy. As a result, when an object such as a pedestrian crossing is attached to a satellite image or similar image as an annotation, the annotation can be performed effectively.

[Information Processing Device]
Figure 1 is a diagram showing an example of the configuration of the information processing device 100 according to the embodiment. For example, the information processing device 100 is a web server or application server that exchanges information with a user's terminal device.

The user's terminal device is a computer device with communication and display functions, such as a smartphone, personal computer, or tablet terminal. Specifically, the terminal device includes a communication interface for communicating with external devices such as the content provision device 50 via a network NW, a display, and a GNSS (Global Navigation Satellite System) receiver. The network NW includes the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), a cellular network, and the like.

The communication interface includes, for example, a network card such as a NIC (Network Interface Card) and a wireless communication module. The display includes, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro-Luminescence) display. The display shows a GUI (Graphical User Interface) for accepting various input operations from the user. The GNSS receiver measures the position of the terminal device.

The terminal device further includes a processor such as a CPU (Central Processing Unit). When the CPU executes a UA (User Agent), various content is shown on the display.

When a web browser or application is launched as a UA on the user's terminal device and the UA sends a request, the information processing device 100 provides various content to the terminal device as a response to that request. The content includes, for example, satellite images to which objects such as pedestrian crossings have been attached as annotations.

As shown in the figure, the information processing device 100 comprises a communication unit 102, a processing unit 110, and a storage unit 130.

The communication unit 102 is, for example, a communication interface such as a network card for connecting to the network NW.

The processing unit 110 comprises an acquisition unit 112, a region detection unit 114, an orientation estimation unit 116, an image processing unit 118, a communication control unit 120, and a learning unit 122. These components are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of the components of the processing unit 110 may be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), ASIC (Application Specific Integrated Circuit), FPGA (Field-Programmable Gate Array), or GPU (Graphics Processing Unit), or by software and hardware working together.

The storage unit 130 is realized by, for example, a storage device such as an HDD (Hard Disk Drive), flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), ROM (Read Only Memory), or RAM (Random Access Memory). The storage unit 130 stores firmware, application programs, and other programs executed by the processor. A program may be stored in the storage unit 130 in advance, or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed in the storage unit 130 when the storage medium is loaded into a drive device.

The storage unit 130 also stores map data 132 and a training dataset 134. The map data 132 includes various images of the ground surface captured from above, such as satellite and aerial images. The training dataset 134 is a dataset (pairs of input data and correct output data) prepared for training the machine learning models described later.

[Processing Flow of the Information Processing Device]
The processing performed by the processing unit 110 is described below with reference to a flowchart. Figure 2 is a flowchart showing the sequence of processing performed by the processing unit 110 according to the embodiment. The processing in this flowchart may be executed repeatedly, for example, at a predetermined cycle.

First, the acquisition unit 112 acquires the satellite image IMG1 (step S100). For example, if the satellite image IMG1 is stored in the storage unit 130 as map data 132, the acquisition unit 112 may read it from the storage unit 130. The acquisition unit 112 may also acquire the satellite image IMG1 from an external server (e.g., a data source server) via the communication unit 102. Alternatively, if a non-transitory storage medium (e.g., flash memory) containing the satellite image IMG1 is connected to the drive device of the information processing device 100, the acquisition unit 112 may read the satellite image IMG1 from that storage medium. As noted above, the acquisition unit 112 may acquire an aerial or airborne image instead of a satellite image. In the following, the image to be acquired is assumed to be the satellite image IMG1 as an example.

Next, the region detection unit 114 detects the region of a pedestrian crossing, which is an example of an object, in the satellite image IMG1 (which may instead be an aerial or airborne image) acquired by the acquisition unit 112 (step S102).

For example, the region detection unit 114 uses the first machine learning model MDL1, which is a segmentation model, to detect the region of a pedestrian crossing in the satellite image IMG1 (that is, to perform segmentation).

The first machine learning model MDL1 may be, for example, a segmentation model implemented using a neural network such as a CNN (Convolutional Neural Network).

The first machine learning model MDL1 is trained on a first training dataset in which each training satellite image is associated with the correct region of the object (e.g., a pedestrian crossing) that should be detected in that image. When the object is a pedestrian crossing, the region is a quadrilateral with four vertices.

Figure 3 is a diagram illustrating a method for detecting the region of a pedestrian crossing using the first machine learning model MDL1. The region detection unit 114 inputs the satellite image IMG1 acquired by the acquisition unit 112 into the first machine learning model MDL1. As described above, the first machine learning model MDL1 has been trained on the first training dataset. Therefore, when the satellite image IMG1 is input, the first machine learning model MDL1 outputs the position and size of the pedestrian crossing region. The first machine learning model MDL1 may additionally output a score (hereinafter, the confidence score) indicating how plausible, that is, how reliable, the position and size of the region are.

Returning to the flowchart, the image processing unit 118 next crops (cuts out) the pedestrian crossing region from the satellite image IMG1 and generates the cropped portion of the satellite image IMG1 as a cropped image IMG2 (step S104).
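The cropping in step S104 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes the image is a list of pixel rows, the detected region is a list of (x, y) vertices, and the crop is the axis-aligned bounding box of the region. The names `bounding_box` and `crop` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Image = List[List[int]]  # assumption: rows of pixel values


def bounding_box(polygon: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) enclosing the polygon; right/bottom are exclusive."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return int(min(xs)), int(min(ys)), int(max(xs)) + 1, int(max(ys)) + 1


def crop(image: Image, polygon: Sequence[Point]) -> Image:
    """Cut out the bounding box of the detected region (assumed cropping strategy)."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = bounding_box(polygon)
    # Clamp the box to the image so regions touching the border stay valid.
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    return [row[left:right] for row in image[top:bottom]]
```

Cropping by bounding box is only one possible choice; the source does not state how the cut-out area is derived from the detected region.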

When the region of a pedestrian crossing is detected using the first machine learning model MDL1, the region tends to be detected as a quadrilateral, but it may also be detected as a pentagon with five vertices or as a polygon with even more vertices.

When the region of a pedestrian crossing is detected as a polygon with five or more vertices, the image processing unit 118 may approximate the polygonal region by a quadrilateral with four vertices (or a pentagon with five vertices) according to the following conditions.
(1) First, approximate the pedestrian crossing region with a four-vertex quadrilateral polygon.
(2) If the difference between the area of the approximating quadrilateral polygon and the area of the original pedestrian crossing region is less than 25% of the area of the original region, or if the area of the original region is less than 128 (taking the area of one pixel as 1), adopt the original region as is, without approximation.
(3) Next, further approximate the region that was approximated by the quadrilateral polygon with a five-vertex pentagonal polygon.
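The area comparison in condition (2) can be sketched as below. This is an illustrative reading only: polygon areas are computed with the shoelace formula, and the function simply reports whether condition (2) as worded above holds. How the approximating quadrilateral itself is obtained is not specified in the text, so it is passed in by the caller. The names `polygon_area` and `condition_2_holds` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(polygon: Sequence[Point]) -> float:
    """Area of a simple polygon via the shoelace formula (one pixel has area 1)."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def condition_2_holds(
    original: Sequence[Point],
    quad: Sequence[Point],
    max_ratio: float = 0.25,   # the 25% threshold from condition (2)
    min_area: float = 128.0,   # the 128-pixel threshold from condition (2)
) -> bool:
    """True when the area difference is under 25% of the original area,
    or the original region is smaller than 128 pixels."""
    original_area = polygon_area(original)
    difference = abs(polygon_area(quad) - original_area)
    return difference < max_ratio * original_area or original_area < min_area
```

For a pentagon of area 220 and an approximating rectangle of area 200, the difference (20) is below 25% of 220 (55), so the condition holds.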

Next, the image processing unit 118 determines whether the aspect ratio of the cropped image IMG2 is within an allowable range (step S106). For example, if the cropped image IMG2 input to the second machine learning model MDL2 (described later) is assumed to be square (that is, if a second training dataset containing square images was used for training), the allowable range may be exactly the 1:1 aspect ratio of a square, or that ratio with an error of a few percent permitted.

If the aspect ratio of the cropped image IMG2 is outside the allowable range (that is, the cropped image IMG2 is not square), the image processing unit 118 pads the cropped image IMG2 so that the aspect ratio falls within the allowable range (step S108).

For example, as padding, the image processing unit 118 adds arbitrary pixel values (for example, pixel values of the same color as the road surface) to any one or all of the top, bottom, left, and right edges of the matrix representing the cropped image IMG2, thereby bringing the shape of the cropped image IMG2 closer to a square.

On the other hand, if the aspect ratio of the cropped image IMG2 is within the allowable range (that is, the cropped image IMG2 is square), the image processing unit 118 skips the padding of step S108.
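Steps S106 and S108 can be sketched as follows. This is a minimal sketch under stated assumptions: the image is a list of pixel rows, the target shape is a square, the tolerance of 3% stands in for the "few percent" mentioned above, and padding is split evenly between opposite edges. The names `aspect_ratio_ok` and `pad_to_square` and the `fill` parameter are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumption: rows of pixel values


def aspect_ratio_ok(width: int, height: int, tolerance: float = 0.03) -> bool:
    """S106: True when width/height is within `tolerance` of 1:1 (assumed 3%)."""
    if width <= 0 or height <= 0:
        return False
    return abs(width / height - 1.0) <= tolerance


def pad_to_square(image: Image, fill: int) -> Image:
    """S108: pad the shorter dimension with `fill` (e.g. a road-surface colour)."""
    if not image or not image[0]:
        return []
    height, width = len(image), len(image[0])
    side = max(width, height)
    pad_left = (side - width) // 2
    pad_right = side - width - pad_left
    pad_top = (side - height) // 2
    pad_bottom = side - height - pad_top
    # Widen every existing row, then add full-width rows above and below.
    rows = [[fill] * pad_left + list(row) + [fill] * pad_right for row in image]
    top = [[fill] * side for _ in range(pad_top)]
    bottom = [[fill] * side for _ in range(pad_bottom)]
    return top + rows + bottom
```

A caller would run `pad_to_square` only when `aspect_ratio_ok` returns `False`, mirroring the branch between S106 and S108.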

In general, the shape of a pedestrian crossing is determined by the road and the surrounding traffic conditions, so a pedestrian crossing may be extremely elongated vertically or horizontally. If cropped images IMG2 of pedestrian crossings with such varied shapes are input to the second machine learning model MDL2 (described later), the orientations estimated by MDL2 (details below) tend to deviate from the orientations assumed during training. For example, if the orientations assumed during training are 0 degrees, 90 degrees, and 180 degrees, MDL2 becomes more likely to estimate angles other than these.

By contrast, adjusting the aspect ratio of the cropped image IMG2 by padding before it is input to the second machine learning model MDL2 improves the accuracy of MDL2's output. For example, inputting the cropped image IMG2 to MDL2 after the aspect ratio has been made uniform makes MDL2 more likely to output the orientations assumed during training (0 degrees, 90 degrees, 180 degrees).

If the cropped image IMG2 input to the second machine learning model MDL2 is assumed to be rectangular rather than square (that is, if a second training dataset containing rectangular images was used for training), the allowable range of the aspect ratio compared in the determination of step S106 may be set according to the aspect ratio of that rectangle rather than that of a square.

Next, the orientation estimation unit 116 estimates the orientation V1 of the pedestrian crossing contained in the cropped image IMG2 and the orientation V2 of the white lines painted on the road as the pedestrian crossing (step S110).

The orientation V1 of the pedestrian crossing is the direction in which pedestrians cross, and it intersects the longitudinal direction of the white lines (the direction in which the white lines extend). The orientation V2 of the white lines is their longitudinal direction (the direction in which the white lines extend). The orientation V1 of the pedestrian crossing is an example of the "first direction," and the orientation V2 of the white lines is an example of the "second direction."

For example, the orientation estimation unit 116 uses the second machine learning model MDL2, implemented using a neural network such as a CNN (Convolutional Neural Network), to estimate the orientation V1 of the pedestrian crossing and the orientation V2 of the white lines in the cropped image IMG2.

The second machine learning model MDL2 is trained on a second training dataset in which each training cropped image is associated with the orientation V1 of the pedestrian crossing and the orientation V2 of the white lines as the correct orientations of the object's region to be estimated in that image. A training cropped image is an image in which the pedestrian crossing region has been cut out of a training satellite image and is, for example, a square image as described above.

Figure 4 is a diagram illustrating a method for estimating the orientations using the second machine learning model MDL2. The orientation estimation unit 116 inputs the cropped image IMG2 generated by the image processing unit 118 into the second machine learning model MDL2. As described above, the second machine learning model MDL2 has been trained on the second training dataset. Therefore, when the cropped image IMG2 is input, the second machine learning model MDL2 outputs the orientation V1 of the pedestrian crossing and the orientation V2 of the white lines as the orientations of the object's region.

Next, the image processing unit 118 reshapes the pedestrian crossing region detected by the region detection unit 114, based on the orientation V1 of the pedestrian crossing and the orientation V2 of the white lines estimated by the orientation estimation unit 116 (step S112).

Figure 5 is a diagram illustrating a method for reshaping the region of a pedestrian crossing. Part (a) of the figure shows the cropped image IMG2, and R on the cropped image IMG2 denotes the pedestrian crossing region detected by the region detection unit 114. The cropped image IMG2 is drawn as a rectangle for convenience, but in practice it may be square. Even if the cropped image IMG2 is rectangular, it may be converted to a square by padding as described above.

Part (b) of the figure shows that the orientation V1 of the pedestrian crossing and the orientation V2 of the white lines have been estimated on the cropped image IMG2 of part (a). As illustrated, the detected pedestrian crossing region R may not completely mask the actual pedestrian crossing. This is because the segmentation is model-based, relying on the first machine learning model MDL1. The image processing unit 118 therefore reshapes the region R detected using the first machine learning model MDL1, based on the orientation V1 of the pedestrian crossing and the orientation V2 of the white lines, so that it completely masks the actual pedestrian crossing.

As shown in (c), the image processing unit 118 first divides the four sides of the quadrilateral detected as the pedestrian crossing region R into two pairs of mutually opposite sides. Specifically, the image processing unit 118 classifies the four sides into the two sides whose angle is close to the orientation V1 of the pedestrian crossing (small angular difference from V1, hence more nearly parallel to it) and the two sides whose angle is close to the orientation V2 of the white lines (small angular difference from V2, hence more nearly parallel to it). The two sides close to V1 are an example of the "first pair of opposite sides," and the two sides close to V2 are an example of the "second pair of opposite sides."
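The classification of sides in (c) can be sketched as follows. This is one possible reading, not the embodiment's exact procedure: the quadrilateral is given as four vertices in order, angles are treated as undirected and expressed in degrees in the range [0, 180), and whichever pair of opposite sides yields the smaller total angular difference is assigned to V1. The names `side_angle`, `angle_gap`, and `classify_sides` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Side = Tuple[Point, Point]


def side_angle(p: Point, q: Point) -> float:
    """Undirected angle of segment pq in degrees, folded into [0, 180)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0


def angle_gap(a: float, b: float) -> float:
    """Smallest difference between two undirected angles (0 and 180 coincide)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def classify_sides(
    quad: Sequence[Point], v1_deg: float, v2_deg: float
) -> Tuple[List[Side], List[Side]]:
    """Return (sides closest to V1, sides closest to V2), each a pair of opposite sides."""
    sides = [(quad[i], quad[(i + 1) % 4]) for i in range(4)]
    pair_a, pair_b = [sides[0], sides[2]], [sides[1], sides[3]]

    def cost(pair: List[Side], target: float) -> float:
        return sum(angle_gap(side_angle(p, q), target) for p, q in pair)

    # Compare the two possible assignments and keep the one with the smaller total gap.
    if cost(pair_a, v1_deg) + cost(pair_b, v2_deg) <= cost(pair_b, v1_deg) + cost(pair_a, v2_deg):
        return pair_a, pair_b
    return pair_b, pair_a
```

Folding angles into [0, 180) matters here because a side traversed in the opposite direction has the same orientation.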

As shown in (d), the image processing unit 118 next rotates the two sides whose angle is close to the orientation V1 of the pedestrian crossing so that they approach V1. Similarly, it rotates the two sides whose angle is close to the orientation V2 of the white lines so that they approach V2. In doing so, the image processing unit 118 may rotate each side about its own midpoint.

As shown in (e), the image processing unit 118 takes the region enclosed by the four angle-corrected sides (that is, the reshaped region) as the new pedestrian crossing region R#. However, if the area of the new region R# is 50% or less of the area of the region R before reshaping, the image processing unit 118 does not adopt R# and adopts the region R before reshaping instead. Likewise, if the portion where the new region R# overlaps the region R before reshaping is 50% or less of the area of R, the image processing unit 118 may adopt the region R before reshaping instead of R#.
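Steps (d) and (e) can be sketched as follows. This is a simplified sketch under several assumptions: each side is rotated about its midpoint all the way to its target angle (the text only says "approach"), the new vertices are taken as intersections of adjacent rotated sides extended to full lines, and only the area check is implemented; the overlap check is omitted because it requires polygon clipping. The names `line_intersection` and `reshape` are hypothetical, and `target_angles` is assumed to hold, for each side in order, either V1 or V2 in degrees.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(polygon: Sequence[Point]) -> float:
    """Shoelace formula."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1]
            - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    return abs(s) / 2.0


def line_intersection(m1: Point, a1_deg: float, m2: Point, a2_deg: float) -> Optional[Point]:
    """Intersect the line through m1 at angle a1 with the line through m2 at angle a2."""
    d1 = (math.cos(math.radians(a1_deg)), math.sin(math.radians(a1_deg)))
    d2 = (math.cos(math.radians(a2_deg)), math.sin(math.radians(a2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:  # parallel lines never meet
        return None
    dx, dy = m2[0] - m1[0], m2[1] - m1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (m1[0] + t * d1[0], m1[1] + t * d1[1])


def reshape(quad: Sequence[Point], target_angles: Sequence[float]) -> List[Point]:
    """Return R# if it passes the area check, otherwise the original region R."""
    mids = [((quad[i][0] + quad[(i + 1) % 4][0]) / 2.0,
             (quad[i][1] + quad[(i + 1) % 4][1]) / 2.0) for i in range(4)]
    reshaped: List[Point] = []
    for i in range(4):
        prev = (i - 1) % 4  # vertex i is shared by side i-1 and side i
        point = line_intersection(mids[prev], target_angles[prev], mids[i], target_angles[i])
        if point is None:
            return list(quad)
        reshaped.append(point)
    # Fall back to R when R# shrinks to 50% or less of R's area.
    if polygon_area(reshaped) <= 0.5 * polygon_area(quad):
        return list(quad)
    return reshaped
```

With the skewed quadrilateral `[(0, 0), (10, 1), (11, 5), (1, 4)]` and target angles `[0, 90, 0, 90]`, the result is approximately the axis-aligned rectangle with corners (0.5, 0.5) and (10.5, 4.5).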

Through this series of steps, the pedestrian crossing region R is reshaped, and the processing of this flowchart ends. When the communication unit 102 receives a request for map data sent from the user's terminal device, the communication control unit 120 may, as a response to that request, transmit to the user's terminal device via the communication unit 102 a satellite image annotated with the reshaped pedestrian crossing region R# (or with the region R before reshaping if the area conditions are not met).
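The overall flow of steps S102 to S112 can be summarized in a short sketch. It is a structural outline only: each step is supplied by the caller as a function, so no model or image library is assumed, and it further assumes that detection may return several regions per image, which the text does not state. The name `run_pipeline` and all parameter names are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple


def run_pipeline(
    image: Any,
    detect: Callable[[Any], Sequence[Any]],              # S102: MDL1-based region detection
    crop: Callable[[Any, Any], Any],                     # S104: cut out the region
    needs_padding: Callable[[Any], bool],                # S106: aspect ratio outside range?
    pad: Callable[[Any], Any],                           # S108: pad toward the target shape
    estimate: Callable[[Any], Tuple[float, float]],      # S110: MDL2-based V1 and V2
    reshape: Callable[[Any, float, float], Any],         # S112: reshape the region
) -> List[Any]:
    """Run the per-region steps in flowchart order and collect the reshaped regions."""
    results: List[Any] = []
    for region in detect(image):
        cropped = crop(image, region)
        if needs_padding(cropped):
            cropped = pad(cropped)
        v1, v2 = estimate(cropped)
        results.append(reshape(region, v1, v2))
    return results
```

Passing the steps in as functions keeps the outline independent of how the region detection unit 114, the orientation estimation unit 116, and the image processing unit 118 are actually implemented.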

Figure 6 is a diagram showing an example of an annotated satellite image. AN in the figure denotes the annotation, which superimposes the reshaped pedestrian crossing region R# on the pedestrian crossing in the satellite image. Displaying the annotation AN in this way lets the user perceive the position and shape of objects such as pedestrian crossings more realistically and use services such as navigation without any sense of unnaturalness.

[Training of the Machine Learning Models]
The training methods for the first machine learning model MDL1 and the second machine learning model MDL2 described above are explained below. The learning unit 122 trains the first machine learning model MDL1 using the first training dataset and trains the second machine learning model MDL2 using the second training dataset. When training the second machine learning model MDL2, the learning unit 122 defines the loss function as follows. The term "orientation" used above may be treated as synonymous with "angle." The values output by the second machine learning model MDL2 as the angles of V1 and V2 lie in the range 0 to 1, and this range corresponds to the angle range 0 degrees to 180 degrees.

(i) If the angle output by the second machine learning model MDL2 (hereinafter, the estimated angle) differs from the correct angle in the second training dataset by 90 degrees (0.5) or more, and
(ii) both the estimated angle and the correct angle are either less than 18 degrees (0.1) or at least 162 degrees (0.9), then
(iii) the correct angle is modified to bring it closer to the estimated angle (that is, closer to 0 degrees or 180 degrees), and the loss function is then optimized using BCELoss (Binary Cross Entropy Loss).

When the angles of V1 and V2 that the second machine learning model MDL2 outputs (estimates) in response to the input cropped image IMG2 are nearly 0 degrees or nearly 180 degrees, the correct angle and the estimated angle may end up at opposite ends of the range. For example, when the correct angle is about 179.8 degrees (about 0.999 as an output value of the second machine learning model MDL2), the estimated angle may come out as nearly 0 degrees (about 0.0001 as an output value of the second machine learning model MDL2).

Nearly 180 degrees and nearly 0 degrees are in practice almost the same angle. Accordingly, even when the estimated angle output by the second machine learning model MDL2 and the correct angle differ numerically, one being nearly 0 (e.g., about 0.0001) and the other nearly 1 (e.g., about 0.999), the estimate is better regarded as correct.

Therefore, the second machine learning model MDL2 is trained using a loss function that includes the exception conditions (i) to (iii) above. This can improve both the training efficiency and the accuracy of the second machine learning model MDL2.
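The exception conditions (i) to (iii) can be sketched for a single scalar output as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the names `adjust_target`, `bce`, and `angle_loss` are hypothetical, and "moving the correct angle toward the estimated angle" is interpreted here as snapping the target to 0.0 or 1.0, whichever end the estimate lies at. The specification does not state the exact form of the correction.

```python
import math

LOW = 0.1    # 18 degrees on the 0-to-1 scale
HIGH = 0.9   # 162 degrees on the 0-to-1 scale
GAP = 0.5    # 90 degrees on the 0-to-1 scale


def near_wraparound(value: float) -> bool:
    """Condition (ii) for one value: below 0.1 or at least 0.9."""
    return value < LOW or value >= HIGH


def adjust_target(pred: float, target: float) -> float:
    """Apply conditions (i) to (iii) to the target before computing the loss.

    Assumption: the corrected target is snapped to the end (0.0 or 1.0)
    nearest to the prediction.
    """
    if (abs(pred - target) >= GAP
            and near_wraparound(pred)
            and near_wraparound(target)):
        return 0.0 if pred < 0.5 else 1.0
    return target


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one scalar, clamped to avoid log(0)."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def angle_loss(pred: float, target: float) -> float:
    """BCE computed against the wraparound-adjusted target."""
    return bce(pred, adjust_target(pred, target))


if __name__ == "__main__":
    # Target near 180 degrees, prediction near 0 degrees.
    print("plain BCE   :", bce(0.0001, 0.999))
    print("adjusted BCE:", angle_loss(0.0001, 0.999))
```

With the example values from the text (target about 0.999, estimate about 0.0001), the adjusted loss is far smaller than the plain binary cross-entropy, so the model is not penalized for an estimate that is nearly the same physical angle.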

According to the embodiment described above, the information processing device 100 detects the region of an object (for example, a pedestrian crossing) on an image of the ground surface captured from above, such as a satellite image (satellite photograph), an aerial image (aerial photograph), or an airborne image (airborne photograph), and estimates the orientation of that region. The information processing device 100 then shapes the region of the object based on the estimated orientation. Shaping the region of the object in this way improves the accuracy of the segmentation. As a result, when an object such as a pedestrian crossing is added to a satellite image or the like as an annotation, the annotation can be performed effectively.

<Hardware configuration>
The information processing device 100 of the embodiment described above is realized, for example, by a hardware configuration such as that shown in Figure 7. Figure 7 is a diagram showing an example of the hardware configuration of the information processing device 100 of the embodiment.

The information processing device 100 has a configuration in which a NIC 100-1, a CPU 100-2, a RAM 100-3, a ROM 100-4, a secondary storage device 100-5 such as a flash memory or an HDD, and a drive device 100-6 are interconnected by an internal bus or a dedicated communication line. A portable storage medium such as an optical disc is mounted in the drive device 100-6. A program stored in the secondary storage device 100-5, or in the portable storage medium mounted in the drive device 100-6, is loaded into the RAM 100-3 by a DMA controller (not shown) or the like and executed by the CPU 100-2, thereby realizing the processing unit 110. The program referenced by the CPU 100-2 may be downloaded from another device via the network NW.

Although a mode for carrying out the present invention has been described above using an embodiment, the present invention is in no way limited to that embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

100...information processing device, 102...communication unit, 110...processing unit, 112...acquisition unit, 114...region detection unit, 116...orientation estimation unit, 118...image processing unit, 120...communication control unit, 122...learning unit

Claims

An information processing device comprising:
a detection unit that detects a region of an object on an image of the ground surface captured from above;
an estimation unit that estimates the orientation of the region;
an image processing unit that shapes the region based on the orientation;
a communication unit that communicates with a terminal device that displays the image upon receiving a request from a user; and
a communication control unit that transmits an annotated image, which is the image annotated with the shaped region of the object, to the terminal device via the communication unit,
wherein the annotated image is displayed on the terminal device as a response to the request.

The information processing device according to claim 1, wherein
the object is a pedestrian crossing,
the detection unit detects the region of the pedestrian crossing on the image,
the estimation unit estimates a first direction, which is the direction in which pedestrians cross the pedestrian crossing, and a second direction, which is the longitudinal direction of the white lines drawn on the road as the pedestrian crossing, and
the image processing unit shapes the region of the pedestrian crossing based on the first direction and the second direction.

The information processing device according to claim 2, wherein
the region of the pedestrian crossing is a quadrilateral region consisting of four sides, and
the image processing unit shapes the quadrilateral region such that a first pair of opposite sides among the four sides is parallel to the first direction and a second pair of opposite sides among the four sides is parallel to the second direction.

The information processing device according to claim 1 or 2, wherein
the detection unit uses a first machine learning model to detect the region of the object on the image, and
the first machine learning model is a machine learning model trained on a first training dataset in which a training image is associated with the correct region of the object to be detected on that training image.

The information processing device according to claim 1 or 2, wherein
the estimation unit uses a second machine learning model to estimate the orientation of the region on a cropped image, which is an image obtained by cutting out the detected region from the image, and
the second machine learning model is a machine learning model trained on a second training dataset in which a training cropped image, which is an image obtained by cutting out the region of the object from a training image, is associated with the correct orientation of the region of the object to be estimated on that training cropped image.

An information processing method using a computer, the method comprising:
detecting a region of an object on an image of the ground surface captured from above;
estimating the orientation of the region;
shaping the region based on the orientation;
communicating with a terminal device that displays the image upon receiving a request from a user; and
transmitting an annotated image, which is the image annotated with the shaped region of the object, to the terminal device via a communication unit,
wherein the annotated image is displayed on the terminal device as a response to the request.

A program for causing a computer to execute:
detecting a region of an object on an image of the ground surface captured from above;
estimating the orientation of the region;
shaping the region based on the orientation;
communicating with a terminal device that displays the image upon receiving a request from a user; and
transmitting an annotated image, which is the image annotated with the shaped region of the object, to the terminal device via a communication unit,
wherein the annotated image is displayed on the terminal device as a response to the request.