JP7416614B2

JP7416614B2 - Learning model generation method, computer program, information processing device, and information processing method

Info

Publication number: JP7416614B2
Application number: JP2019233573A
Authority: JP
Inventors: 文彦高橋
Original assignee: Ｇｏ株式会社
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2024-01-17
Anticipated expiration: 2039-12-24
Also published as: JP2021103386A

Description

Application of Article 30, Paragraph 2 of the Patent Act: disclosed on June 4, 2019 at the 2019 Annual Conference of the Japanese Society for Artificial Intelligence (33rd); disclosed on July 1, 2019 at the "Data Scientist: Yurufuwa Recruitment Roundtable" (データサイエンティスト：ゆるふわ採用座談会); disclosed on July 13, 2019 at CCSE 2019; disclosed on July 16, 2019 at Data Driven Developer Meetup #6.

The present invention relates to a method for generating a learning model that recognizes character strings from images, a computer program, an information processing device, and an information processing method.

Various learning methods have been proposed for recognizing character strings in images using neural networks. Patent Document 1 discloses a method that improves the accuracy of recognizing character strings from the image data of a form by training with a database in which the words that may be written on the form are registered.

Patent Document 1: Japanese Patent No. 6590355

Although character recognition using neural networks is possible in a variety of situations, it is still difficult to recognize characters from arbitrary images without any particular conditions. To perform neural-network character recognition accurately and to improve learning efficiency, it is necessary, as shown in Patent Document 1, to restrict the target images to a specific kind of image, to exclude words that are not recognition targets, and to make it possible to infer characters from the preceding and following characters.

An object of the present invention is to provide a method for generating a learning model for character recognition, a computer program, an information processing device, and an information processing method that can learn accurately and efficiently even when the amount of training data is small.

A learning model generation method according to an embodiment of the present disclosure acquires training data that includes pairs of image data of an image showing a character string and text data of that character string, and pairs of image data and presence/absence data indicating whether a character string appears in the image of that image data, and uses the training data to generate a learning model that, when image data is input, outputs text data of the character string shown and presence/absence data.

In the learning model generation method of the present disclosure, the learning model is generated by training a single network on both the determination of whether a character string is present and the recognition of the character string, using image data of images that show no character string, image data of images that show a character string, and correct-answer data for the character strings shown in those images.

An information processing device according to an embodiment of the present disclosure includes: a storage unit storing a model trained to output, when image data is input, text data that is the recognition result of a character string appearing in the image of the image data, and presence/absence data; an image acquisition unit that acquires image data; and an output unit that, based on the text data and presence/absence data output from the model when the image data acquired by the image acquisition unit is input to the model, outputs whether a character string appears in the image of the image data and the character string that appears.
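
The role of the output unit can be illustrated with a minimal sketch. The function name `interpret_output` and the 0.5 threshold are assumptions made for illustration only; the disclosure does not state how the presence/absence data is turned into a yes/no decision.

```python
from typing import Optional, Tuple


def interpret_output(text: str, presence_prob: float,
                     threshold: float = 0.5) -> Tuple[bool, Optional[str]]:
    """Combine the two model outputs into one result.

    text          -- recognized string from the text output (may be noise)
    presence_prob -- presence/absence output, assumed here to be a probability
    threshold     -- hypothetical cut-off; not specified in the source
    """
    present = presence_prob >= threshold
    # Report the recognized text only when a string is judged to be present.
    return present, (text if present else None)
```

Under these assumptions, `interpret_output("148", 0.93)` reports that a string is present and returns it, while a low presence value suppresses whatever text the recognition branch produced.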

An information processing method according to an embodiment of the present disclosure includes processing of: storing a model trained to output, when image data is input, text data that is the recognition result of a character string appearing in the image of the image data, and presence/absence data; acquiring image data; inputting the acquired image data to the model; and, based on the text data and presence/absence data output from the model, outputting whether a character string appears in the image of the image data and the character string that appears.

A computer program according to an embodiment of the present disclosure causes a computer, which stores a model trained to output, when image data is input, text data that is the recognition result of a character string appearing in the image of the image data, and presence/absence data, to execute processing of: acquiring image data; inputting the acquired image data to the model; and, based on the text data and presence/absence data output from the model, outputting whether a character string appears in the image of the image data and the character string that appears.

With the computer program of the present disclosure, recognition accuracy can be improved using the learning model even for targets that are low in resolution and have little training data, such as characters in images captured from a vehicle.

A computer program according to an embodiment of the present disclosure causes a computer storing a first model and a second model to execute processing. The first model is trained to detect, when image data is input, the range in which a character string to be detected appears in the image of the image data. The second model is trained to output, when image data is input, text data indicating the character string shown and presence/absence data. The processing includes: acquiring first image data; inputting the acquired first image data to the first model; acquiring second image data by extracting, from the image of the first image data, the detection range detected by the first model; inputting the second image data to the second model; and storing, in association with the first image data, the detection range and the confidence that a character string appears in the detection range, both output from the first model, together with the text data output from the second model and the confidence, output with the text data, that it is a character string.
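
The two-model flow described above can be sketched as follows. This is only an outline under stated assumptions: `detect` and `recognize` are hypothetical stand-ins for the first and second models, the image is represented as a nested list of pixel values, and the `Record` fields are illustrative names rather than a storage format given in the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom); assumed layout
Image = Sequence[Sequence[int]]   # rows of pixel values; stand-in for real image data


@dataclass
class Record:
    """What is stored in association with the first image data."""
    image_id: str
    box: Box
    detection_confidence: float
    text: str
    text_confidence: float


def crop(image: Image, box: Box) -> List[List[int]]:
    """Extract the detection range (the second image data) from the first image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def run_pipeline(image_id: str, image: Image,
                 detect: Callable[[Image], Iterable[Tuple[Box, float]]],
                 recognize: Callable[[Image], Tuple[str, float]]) -> List[Record]:
    """Run the first model, crop each detected range, then run the second model."""
    records = []
    for box, det_conf in detect(image):
        text, text_conf = recognize(crop(image, box))
        records.append(Record(image_id, box, det_conf, text, text_conf))
    return records
```

Keeping both confidences in the stored record lets a later stage decide how much to trust each detection without rerunning either model.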

According to the generation method of the present disclosure, a learning model can be generated that, even with a small amount of training data, recognizes character strings with about the same accuracy as when a large amount of training data is available.

FIG. 1 is a block diagram of an information processing device that performs character recognition.
FIG. 2 is a schematic diagram of a learning model.
FIG. 3 is a flowchart showing an example of a learning model generation method.
FIG. 4 is a schematic diagram of the training of the learning model.
FIG. 5 is a schematic diagram of the training of the learning model.
FIG. 6 is a graph showing the relationship between the amount of training data and accuracy.
FIG. 7 is a schematic diagram of the information providing service of Embodiment 2.
FIG. 8 is a block diagram showing the configuration of a communication device that implements the information providing service.
FIG. 9 is a block diagram showing the configurations of a collection device, an information processing device, and a storage device.
FIG. 10 is a flowchart showing an example of the processing procedure performed by the collection device and the information processing device.
FIG. 11 shows the results of processing an image captured by a drive recorder.
FIG. 12 is a block diagram showing the configurations of an information providing device and an information terminal device.
FIG. 13 shows a display example of information provided by the information providing device.

The present disclosure is described in detail below with reference to the drawings showing its embodiments.

(Embodiment 1)
FIG. 1 is a block diagram of an information processing device 1 that performs character recognition. The information processing device 1 includes a control unit 10, an image processing unit 11, a storage unit 12, a communication unit 13, and a reading unit 14. The information processing device 1 and its operation are described below as a single server computer, but the processing may be distributed among a plurality of computers.

The control unit 10 uses a processor such as a CPU (Central Processing Unit), a memory, and the like to control the components of the device and realize various functions. The image processing unit 11 uses a processor such as a GPU (Graphics Processing Unit) or a dedicated circuit, and a memory, to execute image processing in accordance with control instructions from the control unit 10, and functions as the learning model 1M. The control unit 10 and the image processing unit 11 may be integrated hardware. The control unit 10 and the image processing unit 11 may also be configured as a single piece of hardware (SoC: System on a Chip) that integrates processors such as a CPU and a GPU, a memory, and also the storage unit 12 and the communication unit 13.

The storage unit 12 uses a hard disk or flash memory. The storage unit 12 stores a learning program 1P and a DL (Deep Learning) library 1L. The storage unit 12 also stores definition information that defines the learning model 1M generated by training with the DL library 1L, and parameter information including the weight coefficients of each layer of the trained (or in-training) learning model 1M.

The storage unit 12 stores the learning program (computer program) 1P for training the learning model 1M. Based on the learning program 1P, the control unit 10 executes the process of learning the parameters of the network defined by the definition information stored in the storage unit 12.

The communication unit 13 is an interface that receives data from outside and transmits data to the outside. The communication unit 13 is a communication module that realizes a communication connection to a communication network such as the Internet. The communication unit 13 may include a network card, a wireless communication device, or a carrier communication module. The communication unit 13 may be a USB interface.

The reading unit 14 uses, for example, a disk drive and can read a learning program 71P and a DL library 71L stored on a recording medium 7 such as an optical disc. The DL library 1L stored in the storage unit 12 may be a copy, made by the control unit 10 in the storage unit 12, of the learning program 71P read by the reading unit 14 from the recording medium 7.

Based on the DL library 1L and the learning program 1P stored in the storage unit 12, the information processing device 1 configured in this way performs processing that learns the parameters of a convolutional neural network (hereinafter, CNN) and a recurrent NN so that they form the learning model 1M.

FIG. 2 is a schematic diagram of the learning model 1M. The learning model 1M includes a CNN 121 that receives image data of a character string and outputs feature values, and an RNN 122, which is a Bi-LSTM (Long Short-Term Memory) that sequentially receives the output from the CNN 121. The learning model 1M also includes a CRF (Conditional Random Field) layer 123 that outputs a probability distribution over plausible characters from the output of the RNN 122.

The learning model 1M includes a first output layer 124 that receives one branch of the output from the CRF layer 123 and outputs data of the character string appearing in the image, and a second output layer 125 that receives the other branch and outputs presence/absence data indicating whether a character string appears in the image.

In this way, the learning model 1M of the present disclosure has a single input, the CNN 121, and is generated by multitask learning in which the first output layer 124 and the second output layer 125, which output different data, are trained simultaneously.

The CNN 121 includes a filter that extracts mutually overlapping ranges from the image of the input image data along the direction in which the characters of the string are arranged. The CNN 121 outputs feature values for the sequentially extracted ranges of image data, in order along that direction.
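
The overlapping extraction can be pictured with a small sketch that enumerates the column ranges a fixed-width window would visit. The function `sliding_windows` and its `window` and `stride` parameters are illustrative assumptions; in a real CNN the overlap arises from the receptive fields of the convolution filters rather than from explicit cropping.

```python
from typing import List, Tuple


def sliding_windows(width: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges covering [0, width), left to right.

    Consecutive ranges overlap because stride < window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride >= window:
        raise ValueError("stride must be smaller than window for ranges to overlap")
    if width <= window:
        return [(0, width)]
    starts = list(range(0, width - window + 1, stride))
    # Add a final window flush with the right edge if the stride left a gap.
    if starts[-1] + window < width:
        starts.append(width - window)
    return [(s, s + window) for s in starts]
```

For a 10-column image with a window of 4 and a stride of 2, every interior column falls into two windows, which is why the same character can be reported at consecutive positions.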

The RNN 122 takes as input, in order, the feature values output in order from the CNN 121 and, for each input range, sequentially outputs the confidence of the characters appearing in that range. The RNN 122 outputs these confidences with the blank character included as a candidate.

From the characters and angles sequentially output from the RNN 122, the CRF layer 123 outputs the combination of characters that has the highest probability of appearing in the image of the image data input to the CNN 121.
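
The source does not say how the CRF layer 123 selects the most probable combination. A common choice for a linear-chain CRF is Viterbi decoding, sketched below as an assumption rather than as the disclosed method. The per-step `emissions` and pairwise `transitions` scores are hypothetical inputs expressed as log-scores.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the label sequence with the highest total score.

    emissions   -- one dict per step mapping label -> score
    transitions -- score for moving from one label to the next (default 0.0)
    """
    if not emissions:
        return []
    best = dict(emissions[0])          # best score of any path ending in each label
    back: List[Dict[str, str]] = []    # back-pointers, one dict per later step
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur, emit in scores.items():
            prev = max(best, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + emit
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Trace the best final label back to the first step.
    label = max(best, key=best.get)
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]
```

Unlike picking the top label at each step independently, this scores whole sequences, so an unlikely transition between two individually plausible labels can change the result.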

The first output layer 124 takes into account that the combination of characters output from the CRF layer 123 was extracted with overlap by the filter of the CNN 121, judges that consecutive occurrences of the same character are a single character, and outputs the character string most likely to appear in the image of the image data input to the CNN 121. As shown in FIG. 2, the first output layer 124 uses CTC (Connectionist Temporal Classification): when the same character is output consecutively with no blank character in between, it is output as a single character.
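
The collapsing rule used by CTC can be shown in a few lines. This sketch covers only the decoding step applied to an already chosen per-position label sequence; the blank symbol `"-"` and the function name `ctc_collapse` are assumptions for illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical symbol standing for the CTC blank character


def ctc_collapse(frames: str, blank: str = BLANK) -> str:
    """Merge runs of the same character, then drop blanks.

    A repeated character survives only when a blank separates the repeats.
    """
    merged = (ch for ch, _ in groupby(frames))
    return "".join(ch for ch in merged if ch != blank)
```

For example, the per-position output `11-33--3` collapses to `133`: the two leading `1`s merge, the first pair of `3`s merges, and the final `3` is kept as a separate character because blanks precede it.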

From the combination of characters output from the CRF layer 123 and its probability, the second output layer 125 outputs presence/absence data, which is the result of determining whether a character string appears at all in the image of the image data input to the CNN 121.

FIG. 3 is a flowchart showing an example of a method for generating the learning model 1M, and FIGS. 4 and 5 are schematic diagrams of the training of the learning model 1M. The schematic diagrams in FIGS. 4 and 5 show the learning model 1M of FIG. 2 together with examples of training data.

Based on the learning program 1P and the DL library 1L, the control unit 10 uses the image processing unit 11 to generate a network that defines the CNN 121, the RNN 122, the CRF layer 123, the first output layer 124, and the second output layer 125 as shown in FIG. 2 (step S101).

As shown in FIG. 4, the control unit 10 gives the CNN 121 of the network, as training data, image data of an image showing a character string (a number string in FIG. 4) (step S102). The control unit 10 identifies the character string and its confidence output from the first output layer 124 of the network, and the presence/absence data (probability) output from the second output layer 125 (step S103).

The control unit 10 computes a loss using the text of the character string, the confidence, and the presence/absence data identified in step S103, together with the training data, namely the text (correct answer) representing the character string shown in the image and the presence/absence data (correct answer) indicating that a character string is present (step S104).

The control unit 10 uses the image processing unit 11 to execute training that back-propagates the loss obtained in step S104 through the network, and updates the parameters (step S105). In step S105, the control unit 10 may train repeatedly on one piece of image data until the number of training iterations reaches a predetermined number or the accuracy of the character string output from the first output layer 124 reaches a predetermined accuracy.

As shown in FIG. 5, the control unit 10 gives the CNN 121 of the network, as training data, image data of an image that shows no character string (step S106). The control unit 10 does not use the output from the first output layer 124 and identifies only the presence/absence data (probability) output from the second output layer 125 (step S107).

The control unit 10 computes a loss using the presence/absence data identified in step S107 and the training data, namely the presence/absence data (correct answer) indicating that no character string is present in the image (step S108).
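
The difference between the losses of steps S104 and S108 can be sketched as follows. The source does not give the loss functions, so this assumes a binary cross-entropy on the presence/absence output and an unweighted sum with a text-recognition loss; `text_loss` is a hypothetical value supplied by whatever sequence loss the text branch uses.

```python
import math
from typing import Optional


def binary_cross_entropy(prob: float, target: int, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability and a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def sample_loss(presence_prob: float, has_text: bool,
                text_loss: Optional[float] = None) -> float:
    """Loss for one training image.

    With a character string (S104): presence loss plus text loss.
    Without one (S108): presence loss only; the text output is ignored.
    """
    loss = binary_cross_entropy(presence_prob, 1 if has_text else 0)
    if has_text:
        if text_loss is None:
            raise ValueError("text_loss is required when the image contains text")
        loss += text_loss
    return loss
```

Because both kinds of sample back-propagate through the same shared layers, images without any character string still contribute to the features used for recognition.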

The control unit 10 uses the image processing unit 11 to execute training that back-propagates the loss obtained in step S108 through the network, and updates the parameters (step S109). In step S109 as well, the control unit 10 may train repeatedly on one piece of image data until the accuracy of the presence/absence data output from the second output layer 125 reaches a predetermined accuracy.

The control unit 10 determines whether the training process has been executed for all of the image data prepared as training data (step S110). If the control unit 10 determines that the training process has not been executed for all of it (S110: NO), it returns the process to step S102. If it determines that the training process has been executed for all of it (S110: YES), the control unit 10 ends the training process.

In the procedure shown in the flowchart of FIG. 3, the control unit 10 selects one image that shows a character string and executes steps S102 to S105, then selects one image that shows no character string and executes steps S106 to S109, and repeats this alternation until all of the image data has been processed. However, the control unit 10 may instead execute steps S102 to S105 for all image data of images that show a character string, and then execute steps S106 to S109 for all image data of images that show no character string.
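
The two orderings described above can be written out as a small sketch. The function names and the `(image_id, has_text)` pair are illustrative assumptions; how leftover images are handled when the two sets differ in size is not stated in the source, and here they are simply appended in order.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple

Sample = Tuple[str, bool]  # (image id, whether the image shows a character string)


def alternating(with_text: Sequence[str], without_text: Sequence[str]) -> List[Sample]:
    """One image with a string (S102-S105), then one without (S106-S109), repeated."""
    order: List[Sample] = []
    for pos, neg in zip_longest(with_text, without_text):
        if pos is not None:
            order.append((pos, True))
        if neg is not None:
            order.append((neg, False))
    return order


def sequential(with_text: Sequence[str], without_text: Sequence[str]) -> List[Sample]:
    """All images with a string first, then all images without one."""
    return [(p, True) for p in with_text] + [(n, False) for n in without_text]
```

Either ordering visits every image exactly once per pass; they differ only in how the two tasks are interleaved during parameter updates.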

For retraining, the control unit 10 may execute steps S102 to S110 using new image data.

In this way, the learning model 1M is generated by training a single network on both the determination of whether a character string is present and the recognition of the character string, using image data of images that show no character string, image data of images that show a character string, and correct-answer data for the character strings shown in those images. This lets the learning model 1M learn the shapes of the characters (digits) themselves, and recognition accuracy improves even with a small amount of training data.

In Embodiment 1, the learning model 1M has been described as a CNN-RNN configuration, as shown in FIGS. 2 to 5. The learning model 1M may instead be realized without the filter that sequentially extracts ranges of the image along the character string direction, and without the RNN 122 and the CRF layer 123. For example, the learning model 1M may be realized using the CNN 121, which receives image data, together with another known network for image recognition.

FIG. 6 is a graph showing the relationship between the amount of training data and accuracy. FIG. 6 shows how recognition accuracy improves as a function of the proportion of training data consumed. In FIG. 6, the solid line shows the accuracy while the learning model 1M is being generated by the training method described above. The broken line shows the accuracy of a model that does not use the second output layer 125, that is, a model trained only on character string recognition.

As FIG. 6 shows, with multitask learning, in which the network parameters are updated from multiple outputs, accuracy reaches about the same level as the accuracy after training has progressed, even with a small amount of training data.

(Embodiment 2)
In Embodiment 2, character strings are recognized by the information processing device 1 described in Embodiment 1, a database is created that stores each recognized character string in association with position information indicating where the image containing the character string was captured, and an information providing service is realized that provides information based on the text data represented by the character strings. In Embodiment 2, the character string to be recognized is a number string indicating the price of gasoline at a gas station.

FIG. 7 is a schematic diagram of the information providing service of Embodiment 2. The information providing service includes a vehicle V traveling on a road along which a gas station is located, the information processing device 1, a communication device 2, a collection device 3, a storage device 4, an information providing device 5, and an information terminal device 6.

The communication device 2 is mounted on the vehicle V. The communication device 2 transmits image data obtained by a drive recorder provided in the vehicle V to the collection device 3. The communication device 2 and the collection device 3 can be communicatively connected via a network N that includes a public network N1 and a carrier network N2. The network N may include a dedicated network for the communication connection between the vehicle V and the collection device 3, or may be a network used for communicating road traffic information.

The collection device 3 collects image data from a plurality of vehicles V and stores it in the storage device 4. The collection device 3, the information processing device 1, the storage device 4, and the information providing device 5 can be communicatively connected to one another via the service provider's local network LN, and each can be connected to other devices via the external network N. The collection device 3 gives the image data stored in the storage device 4 to the information processing device 1. The collection device 3 acquires the data output from the information processing device 1 indicating whether a number string appears in the image of the image data and, if so, what numerical value the number string represents. The collection device 3 accumulates the acquired numerical data, that is, the numerical data indicating the price of gasoline, in the storage device 4 in association with position data indicating the traveling position of the vehicle V when the image data was captured, and successively updates it to the latest data.
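
Accumulating prices by position while keeping only the latest value could look like the sketch below. The class `PriceStore`, the rounding of coordinates to form a key, and the use of a capture timestamp to decide which observation is newer are all assumptions for illustration; the source does not describe the storage schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class PriceObservation:
    position: Position
    price: int          # recognized number string as a value
    captured_at: float  # capture time, e.g. seconds since the epoch


class PriceStore:
    """Keeps the most recent price observed at each (rounded) position."""

    def __init__(self, precision: int = 4) -> None:
        self._precision = precision  # decimal places used to group nearby positions
        self._latest: Dict[Position, PriceObservation] = {}

    def _key(self, position: Position) -> Position:
        lat, lon = position
        return (round(lat, self._precision), round(lon, self._precision))

    def update(self, obs: PriceObservation) -> bool:
        """Store obs if it is newer than what is held; return True if stored."""
        key = self._key(obs.position)
        current = self._latest.get(key)
        if current is not None and current.captured_at >= obs.captured_at:
            return False
        self._latest[key] = obs
        return True

    def lookup(self, position: Position) -> Optional[PriceObservation]:
        return self._latest.get(self._key(position))
```

Comparing capture times rather than arrival order keeps a delayed upload from one vehicle from overwriting a more recent price reported by another.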

The information providing device 5 can be communicatively connected to the information terminal device 6 via the network N. The information providing device 5 provides a service that displays gasoline prices on a map, based on the numerical data indicating the price of gasoline stored in the storage device 4 in association with position data.

FIG. 8 is a block diagram showing the configuration of the communication device 2 that implements the information providing service. The communication device 2 includes a communication unit 22 and a GPS receiver 23, and sequentially acquires image data of moving images from a photographing device 20 (for example, a drive recorder) mounted on the vehicle V so as to capture the outside of the vehicle. The communication device 2 uses the GPS receiver 23 to acquire position data indicating the position of the vehicle V at the time the image data was acquired from the photographing device 20, and associates it with the image data. The communication device 2 transmits the associated image data and position data to the collection device 3 through the communication unit 22, which implements wireless communication over a carrier network, a dedicated network, beacons, or the like. The communication device 2 may also transmit identification information of the vehicle V in which the photographing device 20 is installed, and time information, in association with the image data and position data.
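
One way to picture the association of image data, position data, and the optional vehicle identifier and time is a single message per frame. The JSON layout, the field names, and the base64 encoding below are assumptions chosen for illustration; the source does not specify a transmission format.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FrameMessage:
    """One captured frame with the data associated with it."""
    image_b64: str                     # image bytes, base64-encoded for transport
    latitude: float
    longitude: float
    vehicle_id: Optional[str] = None   # optional, per the description
    captured_at: Optional[str] = None  # optional time information


def encode_frame(frame_bytes: bytes, latitude: float, longitude: float,
                 vehicle_id: Optional[str] = None,
                 captured_at: Optional[str] = None) -> str:
    """Bundle a frame and its position into a JSON string ready to send."""
    message = FrameMessage(
        image_b64=base64.b64encode(frame_bytes).decode("ascii"),
        latitude=latitude,
        longitude=longitude,
        vehicle_id=vehicle_id,
        captured_at=captured_at,
    )
    return json.dumps(asdict(message))
```

Sending the position inside the same message as the image means the collection device never has to match frames to GPS fixes after the fact.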

FIG. 9 is a block diagram showing the configurations of the collection device 3, the information processing device 1, and the storage device 4. The collection device 3 includes a control unit 30, an image processing unit 31, a storage unit 32, and a communication unit 33. The control unit 30 uses a processor such as a CPU, a memory, and the like to control the components of the device and realize various functions. The image processing unit 31 uses a processor such as a GPU or a dedicated circuit, and a memory, to execute image processing in accordance with control instructions from the control unit 30. The control unit 30 and the image processing unit 31 may be integrated hardware.

記憶部３２は、処理プログラム３０Ｐ及び画像処理プログラム３１Ｐを記憶する。記憶部３２には、画像処理部３１によってガソリンの値段が写っている可能性が高い領域を検出するための検出モデル３Ｍの定義データが記憶されている。 The storage unit 32 stores a processing program 30P and an image processing program 31P. The storage unit 32 stores definition data of a detection model 3M that is used by the image processing unit 31 to detect an area where the price of gasoline is likely to be captured.

制御部３０は、処理プログラム３０Ｐに基づき、通信機２から画像データ及び位置データを受信し、少なくとも、画像データを識別する識別ＩＤと、受信した日時のデータとを対応付けて記憶装置４へ送信して記憶させる処理を実行する。 Based on the processing program 30P, the control unit 30 receives image data and position data from the communication device 2, associates them with at least an identification ID that identifies the image data and with data on the date and time of reception, and transmits them to the storage device 4 for storage.

制御部３０は、画像処理プログラム３１Ｐに基づき、取得した画像データから、ガソリンの値段が写っている可能性が高い領域を検出する検出モデル３Ｍによる領域検出処理を、画像処理部３１を用いて実行する。検出モデル３Ｍは、ドライブレコーダで撮像された画像から、ガソリンの値段が写っている領域を検出するように学習されている。検出モデル３Ｍは、例えばＳＳＤ（Single Shot Detector／Single Shot Multibox Detector）等を用い、予めその領域がガソリンの値段が写っている領域であるという教師データに基づいて学習済みである。画像処理部３１は、その他、画像データにおける特定のものが写っている可能性が高い領域を画像から抽出するための公知の手法を用いてもよい。 Based on the image processing program 31P, the control unit 30 uses the image processing unit 31 to execute area detection processing with the detection model 3M, which detects, from the acquired image data, an area that is likely to show the price of gasoline. The detection model 3M is trained to detect an area showing the price of gasoline in an image captured by a drive recorder. The detection model 3M uses, for example, an SSD (Single Shot Detector/Single Shot Multibox Detector) or the like, and has been trained in advance on training data indicating that a given area is an area showing the price of gasoline. The image processing unit 31 may instead use another known method for extracting, from an image, an area that is likely to show a specific object.

通信部３３は、ローカルネットワークＬＮ、又はネットワークＮを介した記憶装置４及び情報処理装置１との通信接続を実現する通信モジュールである。通信部３３は例えばネットワークカードである。 The communication unit 33 is a communication module that realizes a communication connection with the storage device 4 and the information processing device 1 via the local network LN or the network N. The communication unit 33 is, for example, a network card.

記憶装置４は、制御部４０、記憶部４１及び通信部４２を備える。制御部４０は、ＣＰＵ等のプロセッサを用いる。記憶部４１は、ハードディスク、ＳＳＤ（Solid State Drive）等の大容量不揮発性メモリを用いる。記憶部４１には、車両Ｖのドライブレコーダで撮像されて送信された画像データ及び位置データ、画像データから文字列の部分を抽出した画像データ、並びに、抽出後の画像データから認識されたテキストデータ（数値データ）及び存否データが、識別ＩＤに対応付けて記憶される。 The storage device 4 includes a control unit 40, a storage unit 41, and a communication unit 42. The control unit 40 uses a processor such as a CPU. The storage unit 41 uses a large-capacity nonvolatile memory such as a hard disk or an SSD (Solid State Drive). The storage unit 41 stores, in association with the identification ID, the image data and position data captured and transmitted by the drive recorder of the vehicle V, image data obtained by extracting the character string portion from that image data, and the text data (numeric data) and presence/absence data recognized from the extracted image data.

図１０は、収集装置３及び情報処理装置１による処理手順の一例を示すフローチャートである。 FIG. 10 is a flowchart illustrating an example of a processing procedure by the collection device 3 and the information processing device 1.

制御部３０は、通信機２から画像データを通信によって取得する（ステップＳ３０１）。制御部３０は、画像データと対応付けて送信される位置データを取得する（ステップＳ３０２）。 The control unit 30 acquires image data from the communication device 2 through communication (step S301). The control unit 30 acquires position data that is transmitted in association with the image data (step S302).

制御部３０は、画像データ及び位置データを、識別ＩＤに対応付けて記憶装置４に記憶させる（ステップＳ３０３）。 The control unit 30 stores the image data and position data in the storage device 4 in association with the identification ID (step S303).

制御部３０は、画像データを検出モデル３Ｍへ入力し（ステップＳ３０４）、検出モデル３Ｍから出力される検出範囲、及び、検出範囲にガソリンの値段を示す文字列（数字列）が写っている確信度を示すデータを取得する（ステップＳ３０５）。 The control unit 30 inputs the image data to the detection model 3M (step S304) and acquires, as output from the detection model 3M, the detection range and data indicating the confidence that a character string (number string) indicating the price of gasoline appears in the detection range (step S305).

制御部３０は、確信度に基づいてガソリンの値段が写っている領域が検出できたか否かを判断する（ステップＳ３０６）。制御部３０は、ステップＳ３０５によって複数の領域が検出されている場合、夫々に対してステップＳ３０６の処理及び以下のステップＳ３０７以降の処理を実行する。 The control unit 30 determines, based on the confidence, whether an area showing the price of gasoline has been detected (step S306). When a plurality of areas are detected in step S305, the control unit 30 executes the process of step S306 and the processes from step S307 onward for each area.

ステップＳ３０６にて、値段が写っている領域が検出できたと判断された場合（Ｓ３０６：ＹＥＳ）、制御部３０は、ステップＳ３０１で取得した画像データの画像から、検出範囲を抽出し（ステップＳ３０７）、抽出された画像の画像データを取得する（ステップＳ３０８）。 If it is determined in step S306 that an area showing the price has been detected (S306: YES), the control unit 30 extracts the detection range from the image of the image data acquired in step S301 (step S307) and obtains image data of the extracted image (step S308).

制御部３０は、抽出後の画像データを、情報処理装置１へ通信部３３から送信する（ステップＳ３０９）。 The control unit 30 transmits the extracted image data to the information processing device 1 from the communication unit 33 (step S309).

情報処理装置１は、収集装置３から送信された抽出後の画像データを通信部１３から取得する（ステップＳ１２１）。情報処理装置１の制御部１０は、取得した画像データを、学習済みの学習モデル１Ｍとして機能する画像処理部１１へ入力する（ステップＳ１２２）。 The information processing device 1 acquires the extracted image data transmitted from the collection device 3 from the communication unit 13 (step S121). The control unit 10 of the information processing device 1 inputs the acquired image data to the image processing unit 11 that functions as the trained learning model 1M (step S122).

画像処理部１１は、学習モデル１Ｍとして、値段を示すテキストデータ（数値データ）、認識された数値の確信度のデータ、及び値段が写っているか否かを示す存否データを出力する（ステップＳ１２３）。 Functioning as the learning model 1M, the image processing unit 11 outputs text data (numeric data) indicating the price, data on the confidence of the recognized numerical value, and presence/absence data indicating whether a price appears in the image (step S123).

制御部１０は、テキストデータ（数値データ）及び存否データを通信部１３から収集装置３へ送信する（ステップＳ１２４）。 The control unit 10 transmits text data (numeric data) and presence/absence data from the communication unit 13 to the collection device 3 (step S124).

収集装置３の制御部３０は、情報処理装置１から送信されたテキストデータ及び存否データを通信部３３によって取得する（ステップＳ３１０）。制御部３０は、取得したテキストデータ及び存否データを、識別ＩＤに対応付けて記憶装置４に記憶させ（ステップＳ３１１）、処理を終了する。 The control unit 30 of the collection device 3 uses the communication unit 33 to acquire the text data and presence/absence data transmitted from the information processing device 1 (step S310). The control unit 30 stores the acquired text data and presence/absence data in the storage device 4 in association with the identification ID (step S311), and ends the process.

ステップＳ３１１により、存否データが、文字列が写っていないことを示す画像データであっても、記憶装置４に蓄積される。 Through step S311, image data is accumulated in the storage device 4 even when its presence/absence data indicates that no character string is captured.

ステップＳ３０６にて、値段が写っている領域が検出できていなかったと判断された場合（Ｓ３０６：ＮＯ）、制御部３０は、ステップＳ３０７－Ｓ３１１の処理を省略して処理を終了する。制御部３０はこの場合、ステップＳ３０３で記憶した画像データ及び位置データを削除してもよい。 If it is determined in step S306 that the area in which the price is shown has not been detected (S306: NO), the control unit 30 omits the processes in steps S307 to S311 and ends the process. In this case, the control unit 30 may delete the image data and position data stored in step S303.
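
The flow of steps S301 to S311 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation described in this publication: the `detect` and `recognize` callables stand in for the detection model 3M and the learning model 1M, the dictionary stands in for the storage device 4, and the fixed threshold of 0.5 used for the decision in step S306 is an assumption (the description only says the decision is based on the confidence).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Image = List[List[int]]          # toy grayscale image: rows of pixel values


@dataclass
class Detection:
    box: Box
    confidence: float  # confidence that the box shows a gasoline price


@dataclass
class Recognition:
    text: str      # recognized digits, e.g. "142"
    present: bool  # presence/absence data


@dataclass
class Record:
    image: Image
    position: Tuple[float, float]
    results: List[Recognition] = field(default_factory=list)


def crop(image: Image, box: Box) -> Image:
    """Extract the detection range from the image (steps S307-S308)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def process_frame(
    image_id: str,
    image: Image,
    position: Tuple[float, float],
    detect: Callable[[Image], List[Detection]],   # hypothetical stand-in for model 3M
    recognize: Callable[[Image], Recognition],    # hypothetical stand-in for model 1M
    storage: Dict[str, Record],                   # hypothetical stand-in for storage device 4
    threshold: float = 0.5,                       # assumed decision rule for S306
) -> Optional[Record]:
    storage[image_id] = Record(image, position)   # S303: store image and position
    detections = detect(image)                    # S304-S305: ranges and confidences
    hits = [d for d in detections if d.confidence >= threshold]  # S306
    if not hits:
        del storage[image_id]                     # S306: NO (optional deletion)
        return None
    for d in hits:                                # repeat S307 onward per area
        patch = crop(image, d.box)
        # S309-S311: recognize and store, even when present is False
        storage[image_id].results.append(recognize(patch))
    return storage[image_id]
```

Passing the two models as callables keeps the sketch independent of whether the collection device 3 and the information processing device 1 run on separate machines or on one computer.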

収集装置３は、情報処理装置１とは異なるハードウェアとして存在し、図１０のフローチャートに示す処理手順を実行するとして説明したが、収集装置３及び情報処理装置１は１つのコンピュータで実現されてもよい。 Although the collection device 3 has been described as hardware separate from the information processing device 1 that executes the processing procedure shown in the flowchart of FIG. 10, the collection device 3 and the information processing device 1 may be implemented on a single computer.

図１１は、ドライブレコーダで撮像された画像に対する処理結果を示す。図１１に示すように、撮像画像に対し、検出モデル３Ｍに基づいて値段が写っている領域が、所定の確信度で検出されて抽出される。図１１に示すように、抽出後の画像が学習モデル１Ｍに入力された場合、ガソリンの値段を示す数値、領域検出の確信度、数値の確信度、存否結果の確信度が出力される。 FIG. 11 shows processing results for images captured by a drive recorder. As shown in FIG. 11, based on the detection model 3M, an area in which a price is shown is detected and extracted with a predetermined confidence in the captured image. As shown in FIG. 11, when the extracted image is input to the learning model 1M, the numerical value indicating the price of gasoline, the confidence level of area detection, the confidence level of the numerical value, and the confidence level of the presence/absence result are output.

このように情報処理装置１を用いて得られる画像に写っている文字列を学習済みの学習モデル１Ｍを用いて認識できることにより、以下に示すように、ガソリンの値段を地図上に示すサービスが実現される。 Because character strings appearing in the obtained images can be recognized in this way by the information processing device 1 using the trained learning model 1M, a service that shows gasoline prices on a map is realized, as described below.

図１２は、情報提供装置５及び情報端末装置６の構成を示すブロック図である。情報提供装置５は、サーバコンピュータであって、制御部５０、記憶部５１、及び通信部５２を備える。 FIG. 12 is a block diagram showing the configurations of the information providing device 5 and the information terminal device 6. The information providing device 5 is a server computer and includes a control unit 50, a storage unit 51, and a communication unit 52.

制御部５０は、ＣＰＵ又はＧＰＵ等であるプロセッサ及びメモリ等を用い、装置の構成部を制御して各種機能を実現する。記憶部５１には、情報提供プログラム５Ｐ、情報端末装置６からのリクエストを受け付けるためのＷｅｂサーバプログラム、及び地図データが記憶されている。記憶部５１にはその他、制御部５０が参照するデータが記憶されている。 The control unit 50 uses a processor such as a CPU or GPU, a memory, and the like to control the components of the device and realize various functions. The storage unit 51 stores an information providing program 5P, a web server program for accepting requests from the information terminal device 6, and map data. The storage unit 51 also stores data that the control unit 50 refers to.

通信部５２は、ネットワークＮを介した情報端末装置６との通信接続を実現するためのネットワークカード又は無線通信モジュールである。通信部５２は、記憶装置４からデータを読み出すためにネットワークＮを介した通信接続を実現するためのネットワークカード又は無線通信モジュールを含む。 The communication unit 52 is a network card or a wireless communication module for realizing a communication connection with the information terminal device 6 via the network N. The communication unit 52 includes a network card or a wireless communication module for realizing a communication connection via the network N in order to read data from the storage device 4.

制御部５０は、情報提供プログラム５Ｐに基づき、情報端末装置６からのリクエストに基づいて、リクエストで指示される位置データを含む所定範囲におけるガソリンの値段を、地図上に示すＷｅｂページのデータを情報端末装置６へ送信する。 Based on the information providing program 5P, in response to a request from the information terminal device 6, the control unit 50 transmits to the information terminal device 6 the data of a Web page that shows, on a map, gasoline prices within a predetermined range including the position data specified in the request.
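
One way to select the prices that fall within a predetermined range of the requested position is a great-circle distance filter, sketched below. The record layout (`PriceRecord`), the function names, and the 2 km default radius are assumptions made for illustration; the description does not specify how the range is defined or how the stored data is queried.

```python
import math
from typing import Iterable, List, NamedTuple


class PriceRecord(NamedTuple):
    lat: float        # latitude from the position data, in degrees
    lon: float        # longitude from the position data, in degrees
    price: int        # recognized gasoline price (numeric data)
    observed_at: str  # ISO 8601 timestamp of reception


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using a mean Earth radius."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def prices_near(
    records: Iterable[PriceRecord],
    lat: float,
    lon: float,
    radius_km: float = 2.0,  # assumed size of the "predetermined range"
) -> List[PriceRecord]:
    """Return the records whose position lies within radius_km of the request."""
    return [r for r in records if haversine_km(lat, lon, r.lat, r.lon) <= radius_km]
```

A linear scan is enough for a sketch; a production service would more likely rely on a spatial index in the data store.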

情報端末装置６は、パーソナルコンピュータ、スマートフォン、タブレット端末等のコンピュータである。情報端末装置６は、制御部６０、記憶部６１、表示部６２、操作部６３、及び通信部６４を備える。 The information terminal device 6 is a computer such as a personal computer, a smartphone, or a tablet terminal. The information terminal device 6 includes a control section 60, a storage section 61, a display section 62, an operation section 63, and a communication section 64.

制御部６０は、ＣＰＵまたはＧＰＵを用いたプロセッサである。制御部６０は、ＣＰＵ、又はＧＰＵ等のプロセッサと、メモリ等を含む。制御部６０は、記憶部６１に記憶されている汎用のＷｅｂブラウザベースの表示プログラム６Ｐに基づき、情報提供装置５と通信接続し、汎用コンピュータを、情報提供サービスを受ける端末装置として動作させる。 The control unit 60 is a processor using a CPU or GPU. The control unit 60 includes a processor such as a CPU or GPU, a memory, and the like. The control unit 60 communicates with the information providing device 5 based on a general-purpose Web browser-based display program 6P stored in the storage unit 61, and operates the general-purpose computer as a terminal device that receives information providing services.

記憶部６１は、例えばフラッシュメモリ等の不揮発性メモリを含む。記憶部６１には、上述の表示プログラム６Ｐが記憶されている。 The storage unit 61 includes, for example, a nonvolatile memory such as a flash memory. The storage unit 61 stores the above-mentioned display program 6P.

表示部６２は、液晶パネル又は有機ＥＬディスプレイ等のディスプレイ装置を含む。操作部６３は、ユーザの操作を受け付けるインタフェースであり、物理ボタン、ディスプレイ内蔵のタッチパネルデバイスを含む。操作部６３は、物理ボタンまたはタッチパネルにて表示部６２で表示している画面上における操作を受け付けることが可能である。操作部６３は、マイクロフォン等を含み、マイクロフォンにて入力音声から操作内容を認識して操作を受け付けてもよい。 The display unit 62 includes a display device such as a liquid crystal panel or an organic EL display. The operation unit 63 is an interface that accepts user operations, and includes physical buttons and a touch panel device with a built-in display. The operation unit 63 can accept operations on the screen displayed on the display unit 62 using physical buttons or a touch panel. The operation unit 63 may include a microphone or the like, and may recognize the operation content from the input voice using the microphone and accept the operation.

通信部６４は、ネットワークＮを介した情報提供装置５との通信接続を実現するためのネットワークカード又は無線通信モジュールである。 The communication unit 64 is a network card or a wireless communication module for realizing a communication connection with the information providing device 5 via the network N.

制御部６０は、表示プログラム６Ｐに基づいて、操作部６３で受け付けた位置データをリクエストとして情報提供装置５へ送信し、位置データが示す位置周辺におけるガソリンの値段を表示部６２に表示させることができる。 Based on the display program 6P, the control unit 60 transmits the position data received by the operation unit 63 to the information providing device 5 as a request, and can cause the display unit 62 to display gasoline prices around the position indicated by the position data.

図１３は、情報提供装置５によって提供される情報の表示例を示す。図１３には、地図画像上に、各位置で撮像された画像の画像データから認識されたガソリンの値段のテキスト又は画像が重畳して表示されている。これにより、情報端末装置６を操作する操作者は、ガソリンの値段を把握することができる。ガソリンの値段は、車両Ｖで撮像された画像データが送信される都度、最新のデータに更新される。履歴として記憶装置４に蓄積されるので、制御部６０は、操作に応じて、ガソリンの値段の推移を表示部６２に表示させてもよい。 FIG. 13 shows a display example of information provided by the information providing device 5. In FIG. 13, text or images of the gasoline prices recognized from the image data captured at each location are superimposed on the map image. This allows the operator of the information terminal device 6 to grasp gasoline prices. The gasoline price is updated to the latest data each time image data captured by the vehicle V is transmitted. Because the prices are accumulated in the storage device 4 as a history, the control unit 60 may display the change in gasoline prices on the display unit 62 in response to an operation.
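
The "latest price plus history" behavior described above can be sketched as follows. The grouping key (`station`) is a hypothetical identifier introduced for illustration; the description only states that recognized prices are stored in association with position data and an identification ID. Timestamps are assumed to be ISO 8601 strings in one uniform format, so that sorting them as strings orders them chronologically.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (station key, ISO 8601 timestamp, recognized price); the station key is hypothetical
Observation = Tuple[str, str, int]


def build_history(observations: Iterable[Observation]) -> Dict[str, List[Tuple[str, int]]]:
    """Group observations by station and order each series by timestamp."""
    history: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for station, timestamp, price in observations:
        history[station].append((timestamp, price))
    for series in history.values():
        series.sort()  # uniform ISO 8601 strings sort chronologically
    return dict(history)


def latest_prices(history: Dict[str, List[Tuple[str, int]]]) -> Dict[str, int]:
    """Pick the most recent price per station for display on the map."""
    return {station: series[-1][1] for station, series in history.items()}
```

The full series returned by `build_history` is what a price-trend view would draw from, while `latest_prices` corresponds to the values overlaid on the map.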

このようにして、ドライブレコーダで撮像されるガソリンスタンドにおけるガソリンの値段といった比較的小さな範囲の低解像となる画像内の文字列であっても、学習モデル１Ｍを用いた認識であれば、高精度に値段を認識できる。 In this way, even a character string that occupies a relatively small, low-resolution part of an image, such as the gasoline price at a gas station captured by a drive recorder, can be recognized with high accuracy by using the learning model 1M.

実施の形態２においては、収集装置３が収集する画像データは、車両Ｖのドライブレコーダにて撮影された画像の画像データであった。車両Ｖはタクシーに限られないし、運搬車輌であってもよい。自動運転機能を有する車両であってもよい。更に、文字列（数字列）の認識対象となる画像データは、車両Ｖに限らず、所謂ドローンと呼ばれるような無人機に搭載されている撮影装置によって撮影されたデータであってもよい。 In the second embodiment, the image data collected by the collection device 3 is image data of an image captured by the drive recorder of the vehicle V. The vehicle V is not limited to a taxi, and may be a transport vehicle. The vehicle may also have an automatic driving function. Further, the image data to be recognized as a character string (number string) is not limited to the vehicle V, and may be data taken by a photographing device mounted on an unmanned aircraft such as a so-called drone.

上述のように開示された実施の形態は全ての点で例示であって、制限的なものではない。本発明の範囲は、特許請求の範囲によって示され、特許請求の範囲と均等の意味及び範囲内での全ての変更が含まれる。 The embodiments disclosed above are illustrative in all respects and are not restrictive. The scope of the present invention is indicated by the claims, and includes all changes within the meaning and range equivalent to the claims.

１情報処理装置
１０制御部
１１画像処理部
１２記憶部
１３通信部
１Ｍ学習モデル（第２モデル）
１２１ＣＮＮ
１２２ＲＮＮ
１２３ＣＲＦ層
１２４第１出力層
１２５第２出力層
１Ｐ学習プログラム
３収集装置
３０制御部
３２記憶部
３０Ｐ処理プログラム
３１Ｐ画像処理プログラム
３Ｍ検出モデル（第１モデル）
４記憶装置
４１記憶部
６情報端末装置
６２表示部

1 Information processing device
10 Control unit
11 Image processing unit
12 Storage unit
13 Communication unit
1M Learning model (second model)
121 CNN
122 RNN
123 CRF layer
124 First output layer
125 Second output layer
1P Learning program
3 Collection device
30 Control unit
32 Storage unit
30P Processing program
31P Image processing program
3M Detection model (first model)
4 Storage device
41 Storage unit
6 Information terminal device
62 Display unit

Claims

A method for generating a learning model that outputs, when image data is input, text data and presence/absence data of a character string in the image, the method comprising:
The learning model includes a convolutional neural network that separates and sequentially inputs image data, and a recurrent neural network that sequentially inputs feature data output from the convolutional neural network,
The recurrent neural network includes a first output layer that outputs a character string appearing in an image of input image data, and a second output layer that outputs presence/absence data of a character string in the image,
For the learning model, obtain training data including a pair of image data of an image in which a character string is shown and text data of the character string, and, for each item of image data of images in which a character string is shown and images in which no character string is shown, a pair of that image data and presence/absence data indicating whether or not a character string is included in the image,
When image data of an image containing a character string included in the training data is input, the parameters in the convolutional neural network are learned using the output from the first output layer and the output from the second output layer,
When image data of an image that does not include a character string and is included in the training data is input, the parameters in the convolutional neural network and the recurrent neural network are learned, without using the output from the first output layer, by a loss function using the output from the second output layer.

The generation method according to claim 1, wherein the convolutional neural network is configured to sequentially extract mutually overlapping predetermined ranges in the horizontal direction from the image of the input image data, and to receive as input the image data of the images in the extracted predetermined ranges.

The generation method according to claim 1 or 2, wherein, when image data of an image showing the character string is input, the parameters in the convolutional neural network and the recurrent neural network are learned, using the output from the first output layer and the output from the second output layer, so as to maximize the probability that the correct text data of the training data is obtained.

The generation method according to any one of claims 1 to 3, wherein the learning model is trained to:
output, when image data of an image showing a character string is input, text data of the character string and presence/absence data indicating that the character string is shown; and
output, when image data of an image that does not show a character string is input, presence/absence data indicating that no character string is shown.

The generation method according to any one of claims 1 to 4, wherein the image data is image data photographed by a photographing device mounted outward on a vehicle.

A computer program that causes a computer to generate a learning model that outputs, when image data is input, text data and presence/absence data of a character string in the image, wherein:
The learning model includes a convolutional neural network that separates and sequentially inputs image data, and a recurrent neural network that sequentially inputs feature data output from the convolutional neural network,
The recurrent neural network includes a first output layer that outputs a character string appearing in an image of input image data, and a second output layer that outputs presence/absence data of a character string in the image, and
The computer program causes the computer to execute processing of:
obtaining, for the learning model, training data including a pair of image data of an image in which a character string is shown and text data of the character string, and, for each item of image data of images in which a character string is shown and images in which no character string is shown, a pair of that image data and presence/absence data indicating whether or not a character string is included in the image,
learning, when image data of an image containing a character string included in the training data is input, the parameters in the convolutional neural network using the output from the first output layer and the output from the second output layer, and
learning, when image data of an image that does not include a character string and is included in the training data is input, the parameters in the convolutional neural network and the recurrent neural network, without using the output from the first output layer, by a loss function using the output from the second output layer.

An information processing method in which a computer generates a learning model that outputs, when image data is input, text data and presence/absence data of a character string in the image, wherein:
The learning model includes a convolutional neural network that separates and sequentially inputs image data, and a recurrent neural network that sequentially inputs feature data output from the convolutional neural network,
The recurrent neural network includes a first output layer that outputs a character string appearing in an image of input image data, and a second output layer that outputs presence/absence data of a character string in the image, and
The computer:
obtains, for the learning model, training data including a pair of image data of an image in which a character string is shown and text data of the character string, and, for each item of image data of images in which a character string is shown and images in which no character string is shown, a pair of that image data and presence/absence data indicating whether or not a character string is included in the image,
learns, when image data of an image containing a character string included in the training data is input, the parameters in the convolutional neural network using the output from the first output layer and the output from the second output layer, and
learns, when image data of an image that does not include a character string and is included in the training data is input, the parameters in the convolutional neural network and the recurrent neural network, without using the output from the first output layer, by a loss function using the output from the second output layer.

A computer program that causes a computer storing a model trained by the generation method according to any one of claims 1 to 5 to execute processing of:
acquiring image data,
inputting the acquired image data into the model, and
determining, based on the text data and presence/absence data output from the model, whether or not a character string appears in the image of the image data, and outputting the character string that appears.

The computer program according to claim 8, which causes the computer to execute processing of storing, in association with one another, the text data output from the model, information on the confidence that a character string appears in the image of the image data, which is output together with the text data, and the image data.

The computer program according to claim 8 or 9, which causes the computer to execute processing of storing the image data in association with data indicating the absence of a character string when the presence/absence data output from the model indicates that no character string appears in the image.

The computer program according to any one of claims 8 to 10, wherein the image data is image data photographed by a photographing device mounted outward on a vehicle.

The computer program according to claim 11, which causes the computer to execute processing of:
obtaining position data of the vehicle at the timing when the image data was photographed, and
storing the acquired position data in association with the text data and presence/absence data output from the model when the image data is input into the model.

A computer program that causes a computer, which stores a first model trained to detect, when image data is input, a range in which a character string to be detected is included in an image of the image data, and a second model trained by the generation method described above, to execute processing of:
obtaining first image data,
inputting the acquired first image data into the first model,
obtaining second image data in which a detection range detected by the first model is extracted from the first image data,
inputting the second image data into the second model, and
storing, in association with the first image data, the detection range output from the first model and the confidence that the character string is included in the detection range, and the text data output from the second model and the confidence of the character string output together with the text data.

An information processing device comprising:
a storage unit storing a model trained by the generation method according to any one of claims 1 to 5;
an image acquisition unit that acquires image data; and
an output unit that inputs the image data acquired by the image acquisition unit into the model, determines, based on the text data and presence/absence data output from the model, whether or not a character string appears in the image of the image data, and outputs the character string that appears.

An information processing method including processing of:
storing a model trained by the generation method according to any one of claims 1 to 5,
acquiring image data,
inputting the acquired image data into the model, and
determining, based on the text data and presence/absence data output from the model, whether or not a character string appears in the image of the image data, and outputting the character string that appears.