JP3338570B2

JP3338570B2 - Prediction device for processing end time in batch processing of natural language

Info

Publication number: JP3338570B2
Application number: JP28776294A
Authority: JP
Inventors: 信明高橋; 晴嗣加藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-11-22
Filing date: 1994-11-22
Publication date: 2002-10-28
Anticipated expiration: 2017-10-28
Also published as: JPH08147304A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a device for predicting the processing end time in batch processing of natural language.

[0002] Natural language processing, which performs tasks such as language translation and character recognition on natural languages such as Japanese and English, takes considerable time when the volume of documents (text) to be processed is very large. However, because the processing end time cannot be predicted accurately, users often either wait idly for processing to finish or leave it unattended. This is inconvenient because users cannot plan their schedules, and an improvement is desired.

[0003]

[Prior Art] Conventional natural language processing has offered a function that displays progress as the number of the document being processed or as a percentage, but it could not predict the end time. Japanese Unexamined Patent Publication No. 63-284677 discloses a technique for predicting the end time. The technique described in that publication is shown in FIG. 5 as an explanatory diagram of the conventional example.

[0004] In FIG. 5, A is a block diagram and B is a flowchart. When a sentence to be processed is first input from an external storage device or a keyboard, it is held in the input sentence holding unit 91, and at this time the sentence length measuring unit 94 counts the number of characters in the input sentence (S1 in FIG. 5). When holding of the input sentence is complete, the predicted processing time is calculated (S2 and S3). Here, the processing capacity is held in advance in the processing capacity holding unit 95, and the predicted processing time is obtained by dividing the number of characters measured by the sentence length measuring unit 94 by the processing capacity (characters/second). The calculation result is held in the predicted processing time holding unit 97 and displayed on request (S5 and S6). Natural language processing is then performed by the natural language processing unit 93 (S7), and when the entire contents of the input sentence holding unit 91 have been processed (S8), the process returns to step S1, holds a new input sentence, and repeats the same operation.
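The conventional calculation amounts to a single division. The following is a minimal Python sketch of it; the function name and the sample figures (1,200 characters, 40 characters per second) are illustrative assumptions, not values taken from the cited publication.

```python
def conventional_estimate(char_count: int, chars_per_second: float) -> float:
    """Predicted processing time in seconds: character count / preset capacity."""
    if chars_per_second <= 0:
        raise ValueError("processing capacity must be positive")
    return char_count / chars_per_second


# Hypothetical input: 1,200 characters at a preset capacity of 40 characters/second.
print(conventional_estimate(1200, 40.0))  # 30.0
```

Because the capacity is fixed in advance, the estimate cannot adapt when the actual processing speed differs from the preset value.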

[0005] The publication of this conventional example also shows another flowchart in which, after the input character length is measured, natural language processing is performed, followed by a step of calculating the predicted processing time and holding the result, and then further natural language processing. In this variant, the processing capacity is not held in advance in the processing capacity holding unit; instead, the capacity is calculated and held during processing from the length of the sentences already processed and their processing time. However, because no details are given, the specific configuration is unclear.

[0006]

[Problems to Be Solved by the Invention] The conventional example above works on the principle of calculating the predicted processing time by measuring the number of characters held in the input sentence holding unit and dividing it by the number of characters that can be processed per second, which is held in the processing capacity holding unit.

[0007] However, when natural language processing such as language translation is performed on a text made up of many sentences, each consisting of many characters, it is difficult to set in advance an accurate value for the sentence processing capacity, that is, how many characters can be processed per second, because this value varies with the content. As the text becomes longer the error grows, and the predicted value is not trusted.

[0008] Consequently, at present the end time of language processing is estimated only very roughly by users based on rules of thumb. An object of the present invention is to provide a device for predicting the processing end time in batch processing of natural language that, when a long text subject to natural language processing is batch processed, can predict the end time accurately while adjusting it as processing progresses.

[0009]

[Means for Solving the Problems] FIG. 1 is a diagram showing the principle configuration of the present invention. In FIG. 1, 1 is a processing device; 10 is a sentence identification unit; 11 is a table that stores the number of sentences for each per-sentence character count and data generated in the course of processing; 12 is a natural language processing unit; 13 is a timer unit; 14 is a table setting unit; 15 is an end time prediction unit; 15a is an end time calculation unit for character counts with actual data; 15b is an end time calculation unit for character counts without actual data; 15c is an end time totaling unit; 2 is a document storage unit that stores the documents subject to natural language processing as well as the processing results; and 3 is a display device.

[0010] When batch processing a long text containing many natural language sentences, the present invention measures the number of characters in each sentence across the entire text and sets the number of sentences for each character count in a table. Language processing then starts, and each time natural language processing of one sentence finishes, its processing time is registered in the table as the processing time corresponding to that character count. The processing end time is then obtained and displayed by a calculation that covers the end time of the sentences for each character count already processed and includes a prediction for character counts not yet processed.

[0011]

[Operation] The sentence identification unit 10 of the processing device 1 sequentially retrieves the data of all documents from the document storage unit 2, in which the natural language text is stored, identifies how many characters each sentence (a string of characters delimited by a period, a Japanese full stop, or the like) contains, and updates the contents of table 11 according to the result. Table 11 is divided by the number of characters per sentence (into character-count groups of a fixed width rather than single-character units) and provides locations for storing the sentence count and other data. The sentence count for the character count matching each identification result is incremented (+1) in turn. When identification is complete for all sentences in the document storage unit 2, table 11 holds the sentence counts by character count for every sentence in all documents.
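The table-building pass described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `Row`, `split_sentences`, `group_upper_bound`, and `build_table` are hypothetical, groups are keyed by their upper bound with an assumed fixed width of 10, and sentences are split only on `.` and `。`.

```python
import re
from dataclasses import dataclass


@dataclass
class Row:
    """One table row for a character-count group (field names are assumptions)."""
    sentences: int = 0     # total number of sentences in this group
    processed: int = 0     # sentences already processed (actual data)
    elapsed: float = 0.0   # cumulative processing time in seconds (actual data)


def split_sentences(text: str) -> list[str]:
    """Split on '.' or the Japanese full stop; a simplified delimiter rule."""
    return [s.strip() for s in re.split(r"[.。]", text) if s.strip()]


def group_upper_bound(char_count: int, width: int = 10) -> int:
    """Upper bound of the group holding char_count: 0-10 -> 10, 11-20 -> 20, ..."""
    return max(width, -(-char_count // width) * width)


def build_table(text: str, width: int = 10) -> dict[int, Row]:
    """Count sentences per character-count group, ignoring whitespace characters."""
    table: dict[int, Row] = {}
    for sentence in split_sentences(text):
        n = sum(1 for ch in sentence if not ch.isspace())
        table.setdefault(group_upper_bound(n, width), Row()).sentences += 1
    return table
```

Keying each group by its upper bound keeps the table sparse: only groups that actually contain sentences get a row.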

[0012] Next, the natural language processing unit 12 starts processing. The natural language processing unit 12 uses the timer unit 13 to time each sentence from the start to the end of its processing, and when processing ends it supplies the character count of the processed sentence and the processing time obtained from the timer unit 13 to the table setting unit 14. The table setting unit 14 stores the actual data for the corresponding character count in table 11. This actual data comprises the number of processed sentences and the cumulative processing time; if earlier actual data exists, it is updated with the current data.

[0013] Next, the end time prediction unit 15 is started. The end time prediction unit 15 first refers to table 11, and for character counts that have actual data, the end time calculation unit 15a for character counts with actual data is driven and uses the actual data in table 11 to calculate the time required to process the unprocessed sentences. Next, if there are character counts in table 11 for which no actual data has been set, the end time calculation unit 15b for character counts without actual data uses the actual data from the calculation unit 15a to calculate the time needed to process the sentences of each such character count. When both calculation units 15a and 15b have finished, the end time totaling unit 15c adds their results, and the sum is displayed on the display device 3.

[0014]

[Embodiment] FIG. 2 is a flowchart of the embodiment, and FIGS. 3 and 4 show specific examples (part 1) and (part 2) of the table, whose contents are set and changed as processing proceeds.

[0015] The flowchart of FIG. 2 is executed by a processing device (corresponding to 1 in FIG. 1) comprising a CPU and memory. The natural language documents to be processed are stored beforehand in a file device such as a hard disk (corresponding to the document storage unit 2 in FIG. 1).

(1) Initial processing

First, the number of sentences in the entire document is obtained. Sentence boundaries are detected, the sentences are classified by the number of characters they contain, and a table is created (S1 in FIG. 2).

[0016] In this processing, document data is retrieved sequentially from the file device, sentence boundaries are identified (by preset delimiters such as periods and Japanese full stops), the number of characters making up each sentence is counted, and the sentence count for the corresponding per-sentence character count is set in a table such as that shown in FIG. 3A (corresponding to 11 in FIG. 1).

[0017] Sentences are identified using the same logic (the same criteria) as sentence identification in the natural language processing itself. When counting characters, blanks (spaces) are not counted, a katakana character string is treated as one character, and for English and other European-language text the number of words is used as the character count.
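The counting rules above can be sketched with a single regular expression. This is a minimal Python illustration, not the patented logic: the name `count_units` is hypothetical, words are recognized only from ASCII letters (accented European letters would need a wider character class), and punctuation marks are assumed to count as one character each, which the text does not specify.

```python
import re

# A run of katakana (including the prolonged sound mark) counts as one unit,
# a Latin-alphabet word counts as one unit, and any other non-space
# character counts as one unit. Whitespace is never matched, so it is skipped.
_UNIT = re.compile(r"[\u30A1-\u30FA\u30FC]+|[A-Za-z]+(?:['\-][A-Za-z]+)*|\S")


def count_units(sentence: str) -> int:
    """Return the character count of a sentence under the rules above."""
    return len(_UNIT.findall(sentence))


print(count_units("コンピュータ で 翻訳 する"))  # 6
print(count_units("The cat sat"))               # 3
```

In the first example the katakana word counts once, followed by five individual characters; in the second, each English word counts once.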

[0018] Furthermore, when the table is created, its structure (the number of classes, the character-count range of each class, and so on) is set dynamically. That is, because identifying the many sentences in the whole document reveals the maximum and minimum number of characters per sentence, these values are used to determine the optimal number of classes in the table and the range of character counts each class covers.

[0019] In the specific example of the table set by the initial processing, shown in FIG. 3A, the number of characters per sentence is at most 80 and at least within 10, so the per-sentence character counts are classified into eight groups: 0 to 10, 11 to 20, ..., 71 to 80. As a result of identifying all documents, the number of sentences in each character-count group is set as shown in FIG. 3; for example, there are 30 sentences consisting of 0 to 10 characters. The other items provided in FIG. 3A, the number of processed sentences and the cumulative processing time, are each set when natural language processing is executed.

(2) Natural language processing

When classification of the sentences in all documents is complete, it is determined whether any source sentence remains to be processed (S2 in FIG. 2). If none remains, processing ends; otherwise natural language processing begins. First, the processing start time is recorded for the one sentence to be processed (S3), and natural language processing (for example, translation) is performed (S4). When processing finishes, the processing end time is recorded (S5). Next, the number of processed sentences for the corresponding character count in the table is incremented (S6), and then the processing time is obtained as the difference between the processing end time and the processing start time and is recorded in the cumulative processing time for that character count (S7).
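Steps S2 to S7 can be sketched as a timed loop. The following Python sketch is illustrative only: `process_all`, the dictionary layout, and the `process` callback (standing in for the translation step) are assumptions, and `time.perf_counter` plays the role of the timer.

```python
import time
from typing import Callable


def process_all(
    sentences: list[tuple[int, str]],
    table: dict[int, dict[str, float]],
    process: Callable[[str], object],
) -> None:
    """Process (group key, text) pairs and record actual data per group."""
    for key, text in sentences:                # S2: continue while a sentence remains
        start = time.perf_counter()            # S3: record the start time
        process(text)                          # S4: natural language processing
        end = time.perf_counter()              # S5: record the end time
        row = table.setdefault(key, {"processed": 0, "elapsed": 0.0})
        row["processed"] += 1                  # S6: count the processed sentence
        row["elapsed"] += end - start          # S7: accumulate the processing time
```

A call such as `process_all([(60, "text")], table, translate)` would add one processed sentence and its measured duration to the row for the 51 to 60 group, where `translate` is whatever processing function is in use.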

[0020] FIG. 3B is an example of recording the number of processed sentences for the character count of a processed sentence. It shows the state after just one sentence of 51 to 60 characters has been processed. The number of processed sentences in the table is accumulated (incremented) each time a sentence belonging to the same character-count group is processed.

[0021] FIG. 3C is an example of recording the cumulative processing time for the character count of a processed sentence. It shows the state in which the processing time (25 seconds) taken to process the sentence of FIG. 3B (a sentence of 51 to 60 characters) has been set as the cumulative processing time. This time is accumulated (added to) each time a sentence of the same character count is processed.

[0022] In step S3 above, the processing time is recorded against the character count (and accumulated from the second time onward). Separately, the processing time of each processed sentence may also be recorded to build a sample group table. In that case, once a statistically valid sample of per-sentence processing times is available partway through processing, a more accurate prediction becomes possible by replacing statistically anomalous per-sentence processing times with the mean when calculating the cumulative processing time.

(3) Prediction of the end time for each character count

For each character count, the processing end time of the unprocessed sentences is predicted from the sentences already processed, on the assumption that processing time is proportional to the number of sentences.

[0023] a. Prediction when actual data exists

First, for each character count that has actual data, the processing end time of the remaining sentences of that character count is obtained by proportion and set (S8 in FIG. 2).

[0024] In the example table of FIG. 3C above, the recorded processing time for one sentence of 51 to 60 characters is 25 seconds, so the time required to process the remaining 10 of the 11 sentences with this character count is calculated, assuming direct proportionality, as 25 (seconds) × 10 = 250 (seconds). This value is held in the memory of the processing device as the predicted processing end time for sentences of 51 to 60 characters.
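The proportional calculation for a character count with actual data can be written as a short Python sketch. The function and parameter names are hypothetical; the figures in the usage line are the ones given in the text for the 51 to 60 group.

```python
def remaining_time_with_data(total: int, processed: int, elapsed: float) -> float:
    """Average time per processed sentence times the number of unprocessed sentences."""
    if processed <= 0:
        raise ValueError("no actual data for this character-count group")
    return (elapsed / processed) * (total - processed)


# Group 51-60 from the text: 11 sentences in total, 1 processed in 25 seconds.
print(remaining_time_with_data(total=11, processed=1, elapsed=25.0))  # 250.0
```

Dividing the cumulative time by the processed count means the per-sentence figure becomes an average as soon as more than one sentence of the group has been processed.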

[0025] FIG. 4A shows the result of the prediction processing for character counts that have actual data.

b. Prediction when no actual data exists

Next, for character counts without actual data, the processing end time for each character count is predicted proportionally from other actual data (S9 in FIG. 2). That is, when a character count has no actual data, the prediction is calculated from other actual data on the assumption that processing time is directly proportional to the number of characters. When actual data exists for several other character counts, the actual data with the closest character count is used.

[0026] Taking the example of FIG. 4A above, the average for sentences of 51 to 60 characters is 25 seconds per sentence (here the actual data comes from only one sentence, but this figure is averaged as actual data from more sentences is obtained). For other character counts, the predicted time for the remaining sentences of each character count is therefore obtained by direct proportion, as in the following three examples.

[0027]
71 to 80 characters: 25 seconds × 5 sentences × 80/60 = 166 seconds
61 to 70 characters: 25 seconds × 10 sentences × 70/60 = 291 seconds
41 to 50 characters: 25 seconds × 3 sentences × 50/60 = 62 seconds

Predicted times for the other character counts are obtained by the same calculation, giving predicted times for all character counts as shown in FIG. 4B.

(4) Display of the predicted end time

The processing end times for all character counts are summed to predict the processing end time for the entire document (S10 in FIG. 2). The resulting total is then displayed on the display device as the predicted processing time (S11). In the example of FIG. 7, the predicted times for each character count are added and the total is displayed on the display device (3 in FIG. 1).
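The per-group predictions and their total can be sketched together in Python. This is a minimal illustration under stated assumptions: `predict` and the dictionary layout are hypothetical, groups are keyed by their upper bound, results are truncated to whole seconds to match the worked figures (166, 291, 62), and only the four groups whose sentence counts appear in the text are included.

```python
def predict(table: dict[int, dict[str, float]]) -> dict[int, int]:
    """Map each group's upper bound to predicted seconds for its unprocessed sentences.

    table[upper] = {"sentences": total, "processed": done, "elapsed": seconds}
    """
    with_data = {k: r for k, r in table.items() if r["processed"] > 0}
    if not with_data:
        raise ValueError("at least one group needs actual data")
    result: dict[int, int] = {}
    for upper, row in table.items():
        remaining = row["sentences"] - row["processed"]
        if upper in with_data:
            # Group with actual data: average time per sentence x remaining sentences.
            avg = row["elapsed"] / row["processed"]
            result[upper] = int(avg * remaining)
        else:
            # Group without actual data: borrow the average of the nearest group
            # that has data and scale it by the ratio of the upper bounds.
            ref = min(with_data, key=lambda k: abs(k - upper))
            avg = with_data[ref]["elapsed"] / with_data[ref]["processed"]
            result[upper] = int(avg * remaining * upper / ref)
    return result


table = {
    50: {"sentences": 3, "processed": 0, "elapsed": 0.0},
    60: {"sentences": 11, "processed": 1, "elapsed": 25.0},
    70: {"sentences": 10, "processed": 0, "elapsed": 0.0},
    80: {"sentences": 5, "processed": 0, "elapsed": 0.0},
}
per_group = predict(table)
print(per_group)                # {50: 62, 60: 250, 70: 291, 80: 166}
print(sum(per_group.values()))  # 769
```

The total of 769 seconds covers only these four groups; a full table would also include the remaining groups of FIG. 3, whose counts are not all given in the text.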

[0028] In the processing flow of FIG. 2, if a next sentence exists after step S10, the processing of S4 to S10 is executed for that sentence. As the actual data (number of processed sentences and cumulative processing time) grows, the predicted time is corrected in line with the accumulated actual data and its accuracy increases.

[0029] The correction of the predicted time when actual data is obtained for another character count after the predicted times of FIG. 4B have been obtained is explained using the example of predicted time correction shown in FIG. 4C.

[0030] FIG. 4C is an example in which, after the predicted times shown in FIG. 4B were obtained, one sentence of 61 to 70 characters was processed and its processing time was 33 seconds. In this case, since the actual data shows that a sentence of 61 to 70 characters takes 33 seconds, the time required to process the remaining nine sentences of the same character count is 9 × 33 = 297 seconds. Because this differs from the predicted value in FIG. 4B (291 seconds), the prediction is corrected to 297 seconds.

[0031] Thus, as processing proceeds and actual data arises for each character count, the corresponding predicted time is corrected. Moreover, as the number of processed sentences of the same character count increases, the cumulative processing time grows, and the average processing time per sentence clearly comes to approximate the actual processing time.

[0032] Accordingly, although the predicted time is not sufficiently accurate shortly after natural language processing starts, the user can be given an approximate time to completion, and as processing advances the user can be given an increasingly accurate end time.

[0033]

[Effect of the Invention] According to the device of the present invention, the processing time for a large volume of natural language documents can be predicted with high accuracy using the number of characters making up the sentences in the documents. Users no longer need to wait idly for the predicted processing end time to arrive, and language processing can be carried out efficiently.

[Brief description of the drawings]

FIG. 1 is a diagram showing the principle configuration of the present invention.

FIG. 2 is a diagram showing a flowchart of the embodiment.

FIG. 3 is a diagram showing a specific example (part 1) of the table, whose contents are set and changed as processing proceeds.

FIG. 4 is a diagram showing a specific example (part 2) of the table, whose contents are set and changed as processing proceeds.

FIG. 5 is an explanatory diagram of the conventional example.

[Explanation of symbols]

1 processing device
10 sentence identification unit
11 table
12 natural language processing unit
13 timer unit
14 table setting unit
15 end time prediction unit
15a end time calculation unit for character counts with actual data
15b end time calculation unit for character counts without actual data
15c end time totaling unit
2 document storage unit
3 display device

Continuation of the front page

(56) References: JP-A-61-84779 (JP, A); JP-A-62-229471 (JP, A); JP-A-62-241069 (JP, A); JP-A-63-109575 (JP, A); JP-A-63-220364 (JP, A); JP-A-63-284677 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G06F 17/21 - 17/28

Claims

(57) [Claims]

1. A device for predicting the processing end time in batch processing of natural language, in which a processing device that batch processes natural language is connected to a document storage unit in which natural language documents are stored and to a display device, wherein the processing device identifies, for all natural language document data in the document storage unit, the number of characters making up each sentence, and comprises: a natural language processing unit that performs natural language processing on each sentence in the document storage unit, measures its processing time, and sets the processing time in a table as actual data corresponding to the number of characters; and an end time prediction unit that is activated each time processing of a sentence is completed and predicts the processing end time by calculating, using the actual data for each number of characters set in the table, the processing end time for the number of unprocessed sentences.

2. The device for predicting the processing end time in batch processing of natural language according to claim 1, wherein the natural language processing unit sets in the table, as the actual data, the number of processed sentences and the cumulative processing time corresponding to each number of characters, and the end time prediction unit comprises: an end time calculation unit for character counts with actual data, which, for each number of characters in the table that has actual data, calculates the time for processing the number of unprocessed sentences using the actual data for that number of characters; and an end time calculation unit for character counts without actual data, which, for each number of characters that has no actual data, calculates the processing time for the number of unprocessed sentences using the actual data.