JP6375201B2

JP6375201B2 - Automatic data flow parallelization system

Info

Publication number: JP6375201B2
Application number: JP2014216939A
Authority: JP
Inventors: 矢野　純一; 純一矢野; 健哉小島; 秀夫酒井; 平野　智哉; 智哉平野; 英司後藤
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2014-10-24
Filing date: 2014-10-24
Publication date: 2018-08-15
Anticipated expiration: 2034-10-24
Also published as: JP2016085546A

Description

The present invention relates to an automatic data flow parallelization system, and more particularly to a technique for automatically executing in parallel, without any design work for parallelization, a series of business processes carried out through the cooperation of a plurality of programs.

As CPUs have become multi-core, there is a growing demand to shorten overall processing time by parallelizing the individual processes in a data flow.
Typical parallelization techniques include data parallelism, task parallelism, and pipeline parallelism (see Non-Patent Document 1).

Data parallelism works in three steps. The data to be processed is divided into multiple groups based on a key value specified in a configuration file such as an XML file. The same processing is then executed on every group simultaneously. Finally, the processing results are combined.
Task parallelism uses information on how data is passed between tasks to identify tasks that have no dependency on one another, and executes those tasks in parallel.
Pipeline parallelism starts tasks that exchange data at the same time. The preceding task passes already-processed data to the succeeding task before it has processed all of its input data, so tasks in a preceding/succeeding relationship run in parallel.
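The pipeline form can be illustrated with two threads joined by a queue. This minimal Python sketch is not taken from the source or from Non-Patent Document 1. The stage functions and their arithmetic are placeholders. CPython threads share one interpreter lock, so the sketch shows the hand-off structure rather than a CPU speed-up. CPU-bound stages would normally run in separate processes.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def preceding_task(records, out_q):
    """Preceding task: hands each result downstream as soon as it is ready,
    instead of waiting until the whole input has been processed."""
    for r in records:
        out_q.put(r * 2)  # placeholder for real processing
    out_q.put(_DONE)


def succeeding_task(in_q, results):
    """Succeeding task: starts working as soon as the first item arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        results.append(item + 1)  # placeholder for real processing


def run_pipeline(records):
    q = queue.Queue(maxsize=4)  # bounded buffer between the two stages
    results = []
    t1 = threading.Thread(target=preceding_task, args=(records, q))
    t2 = threading.Thread(target=succeeding_task, args=(q, results))
    t1.start()  # both tasks are started at the same time
    t2.start()
    t1.join()
    t2.join()
    return results


print(run_pipeline([1, 2, 3]))  # [3, 5, 7]
```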

Non-Patent Document 1: I-10-8. "Basics of Parallel Processing Programming, Parallelization Processing", http://ossforum.jp/node/578 (retrieved October 8, 2014).

When the above parallelization techniques are applied, each process is assigned a separate execution thread (CPU core). This makes processing more efficient by taking advantage of a multi-core CPU.

To parallelize a data flow in this way, however, the data flow designer or operations staff has had to analyze the key information of each program and the data connections between programs, and decide whether parallelization is possible. They then hand-code a control program for parallel execution based on that decision. As a result, the design is difficult and bugs are easily introduced.

The present invention was devised in view of this situation. Its object is to provide a technique that automatically parallelizes a data flow without requiring any design work for parallelization.

To achieve the above object, the automatic data flow parallelization system according to claim 1 comprises the following:
- a plurality of types of business processing programs for generating different types of business processing means;
- program analysis means for analyzing the parallelization determination data output from each business processing means and generating parallelization design data;
- program execution means for allocating an execution thread to each business processing program, starting it, and thereby generating the respective business processing means.

Each business processing means outputs, as the parallelization determination data, two items: the input/output data defined in its business processing program, and a splittable key, which is a data item that permits the input data to be split.
The program analysis means handles the case where the output data of one business processing program is the input data of another, and the splittable keys defined in both programs are the same. In that case, it generates parallelization design data specifying that, each time the preceding business processing means completes one unit of the splittable key, the processing result is output to the succeeding business processing means for processing.
Based on the parallelization design data, the program execution means starts the business processing means simultaneously and has them execute the necessary processing.

The automatic data flow parallelization system according to claim 2 comprises the following:
- a business processing program for generating business processing means;
- a data division program for generating a data division unit;
- a data combining program for generating a data combining unit;
- program analysis means for analyzing the parallelization determination data output from the business processing means and generating parallelization design data;
- program execution means for allocating execution threads to the business processing program, the data division program, and the data combining program, starting them, and thereby generating the business processing means, the data division unit, and the data combining unit.

The business processing means outputs, as the parallelization determination data, a splittable key. This is a data item defined in the business processing program that permits the input data to be split.
The program analysis means generates parallelization design data specifying three things. First, the file storing the input data of the business processing means is divided into a predetermined number of files. Second, a plurality of business processing means are started so that each one processes one of the divided files simultaneously. Third, their output data are combined once all of them have finished.
Based on the parallelization design data, the program execution means first starts the data division unit to divide the input data file into the specified number of files. It then starts the same number of business processing means to execute the necessary processing simultaneously. Finally, it starts the data combining unit to combine the output data from the business processing means.

With the system according to claim 1, a data flow for pipeline parallelism is formed automatically through the cooperation of the business processing means, the program analysis means, and the program execution means. With the system according to claim 2, a data flow for data parallelism is formed automatically.
Because data flows for the various forms of parallelism are generated automatically, this approach takes less effort than designing for parallelization by hand and carries a lower risk of introducing bugs.

FIG. 1 is a block diagram showing the functional configuration of an automatic data flow parallelization system 10 according to the present invention. The system comprises:
- a program analysis unit 12;
- a program execution unit 14;
- a sales totaling program 16;
- a store name assigning program 18;
- a product name extraction program 20;
- a product name assigning program 22;
- a data division program 24;
- a data combining program 26.

The program analysis unit 12 and the program execution unit 14 are implemented by the computer's CPU operating according to a predetermined application program.

The data division program 24 and the data combining program 26 are general-purpose programs that split and combine data, respectively.
The sales totaling program 16, the store name assigning program 18, the product name extraction program 20, and the product name assigning program 22 are programs that execute specific business processes.

The computer has a plurality of CPU cores. Each program becomes executable when the program execution unit 14 assigns it an execution thread (CPU core).

FIG. 2 is a block diagram showing the original data flow 30 realized by executing the business processing programs. The data flow 30 consists of:
- a sales file 32;
- a sales totaling unit 34;
- a total sales file 36;
- a store name assigning unit 38;
- a store name file 40;
- a total sales file with store names 42;
- a product master file 44;
- a product name extraction unit 46;
- a product name file 48;
- a product name assigning unit 50;
- a total sales file with store and product names 52.

The sales file 32 stores a large number of records. Each record has data items such as a store number (store code), a product code, an amount, and a date and time. The records are sorted by store number (1 to 2000).
The sales totaling unit 34 is a processing function realized by starting the sales totaling program 16. It totals the amounts of the records in the sales file 32 for each store number/product code combination and stores the totals in the total sales file 36.
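Because the sales file is sorted by store number, the totals for one store can be finalized as soon as that store's records end. The following is a minimal sketch. It assumes a hypothetical record layout of `(store_no, product_code, amount)` tuples; the source does not specify one.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator

# Hypothetical record layout: (store_no, product_code, amount)
Sale = tuple[int, str, int]


def total_sales(sales: Iterable[Sale]) -> Iterator[list[tuple[int, str, int]]]:
    """Total the amounts per (store number, product code).

    The input is assumed to be sorted by store number, so all records for one
    store are contiguous. Each yielded list is one completed store-number unit.
    """
    for store_no, store_rows in groupby(sales, key=itemgetter(0)):
        totals: dict[str, int] = {}
        for _, product_code, amount in store_rows:
            totals[product_code] = totals.get(product_code, 0) + amount
        yield [(store_no, code, amt) for code, amt in sorted(totals.items())]


sales = [(1, "A", 100), (1, "B", 50), (1, "A", 30), (2, "A", 70)]
print(list(total_sales(sales)))
# [[(1, 'A', 130), (1, 'B', 50)], [(2, 'A', 70)]]
```

Emitting one completed store-number unit at a time lets a downstream stage keyed on store number start before the whole file has been totaled.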

The store name assigning unit 38 is a processing function realized by starting the store name assigning program 18. It refers to the store name file 40, which maps store numbers to store names. It adds a store name to each store number/product code total in the total sales file 36 and stores the result in the total sales file with store names 42.

The product master file 44 stores a large number of records. Each record has data items such as a product code, a product name, and a supplier.
The product name extraction unit 46 is a processing function realized by starting the product name extraction program 20. It extracts product code/product name pairs from the records in the product master file 44 and stores them in the product name file 48.

The product name assigning unit 50 is a processing function realized by starting the product name assigning program 22. It adds the product names in the product name file 48 to each total record in the total sales file with store names 42. It then stores the result in the total sales file with store and product names 52.
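The three remaining business processes are essentially lookups and projections. The sketch below assumes record layouts that the source does not give. The sample store and product names are made up purely for illustration.

```python
# Hypothetical record layouts (not specified in the source):
#   totals:         (store_no, product_code, amount)
#   store names:    {store_no: store_name}
#   product master: (product_code, product_name, supplier)


def assign_store_names(totals, store_names):
    """Store name assigning: append the store name looked up by store number."""
    return [(s, p, amt, store_names[s]) for s, p, amt in totals]


def extract_product_names(product_master):
    """Product name extraction: keep only the code -> name pairs."""
    return {code: name for code, name, _supplier in product_master}


def assign_product_names(named_totals, product_names):
    """Product name assigning: append the product name looked up by code."""
    return [row + (product_names[row[1]],) for row in named_totals]


totals = [(1, "A", 130), (2, "B", 70)]
store_names = {1: "Store One", 2: "Store Two"}  # made-up sample data
master = [("A", "Apple", "Supplier X"), ("B", "Banana", "Supplier Y")]

with_stores = assign_store_names(totals, store_names)
result = assign_product_names(with_stores, extract_product_names(master))
print(result)
# [(1, 'A', 130, 'Store One', 'Apple'), (2, 'B', 70, 'Store Two', 'Banana')]
```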

Each of the sales totaling program 16, the store name assigning program 18, the product name extraction program 20, and the product name assigning program 22 contains code for its original business processing unit. In addition, each contains code that implements one of the transmission units 54 to 60. Each business processing unit together with its transmission unit corresponds to the "business processing means" of the present invention.

Each of the transmission units 54 to 60 passes the input data and output data of the program's original business processing to the program analysis unit 12.
If the program defines a splittable key that permits its input data to be split, the transmission unit also passes that key to the program analysis unit 12.
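The source does not specify the format in which a transmission unit reports its data. The sketch below is a minimal illustration under these assumptions:
- each program reports one record containing its input files, its output files, and an optional splittable key;
- the class name and file names are hypothetical;
- the source does not state splittable keys for the two product name programs, so none is assumed for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParallelizationInfo:
    """Parallelization determination data reported by one transmission unit
    (hypothetical structure; the source does not define a format)."""
    program: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    splittable_key: Optional[str] = None  # None when the program defines none


def report(program, inputs, outputs, key=None):
    return ParallelizationInfo(program, frozenset(inputs), frozenset(outputs), key)


# What the four business programs of FIG. 2 might report under this sketch
REPORTS = [
    report("sales_totaling", {"sales_file"}, {"total_sales_file"}, "store_no"),
    report("store_name_assigning", {"total_sales_file", "store_name_file"},
           {"total_sales_with_store_names"}, "store_no"),
    report("product_name_extraction", {"product_master_file"}, {"product_name_file"}),
    report("product_name_assigning",
           {"total_sales_with_store_names", "product_name_file"},
           {"total_sales_with_store_and_product_names"}),
]
```

Keeping the declaration this small reflects the point made later in the source: a developer only has to state inputs, outputs, and (if any) the splittable key.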

The processing procedure of the system 10 is described below with reference to the flowchart of FIG. 3.
First, the program execution unit 14 assigns an execution thread to each business processing program and starts it (S10). These programs are the sales totaling program 16, the store name assigning program 18, the product name extraction program 20, and the product name assigning program 22.

Next, the program analysis unit 12 acquires the parallelization determination data from the transmission units 54 to 60 of the business processing programs (S12).
That is, the transmission units 54 to 60 pass the "input data and output data" predefined in their programs to the program analysis unit 12. If a program also defines a "splittable key", that key is passed to the program analysis unit 12 as well.

Next, the program analysis unit 12 analyzes the parallelization determination data acquired from the transmission units 54 to 60. It performs three analyses: splittable key analysis (S14), pipeline connection analysis (S16), and file input/output relationship analysis (S18). From the analysis results it generates parallelization design data (S20), which it outputs to the program execution unit 14.

"Splittable key analysis" applies when a business processing program defines a key item that permits its input data to be split. In that case, the analysis generates design data instructing, among other things, that the file to be processed be split along value boundaries of that key item.

For example, the sales totaling program 16 totals sales amounts per store number × product code combination. Suppose records with the same store number were split across multiple files. The totals computed per file would then be incorrect. For this reason, "store number" is specified in advance as its splittable key.
In response, the program analysis unit 12 generates parallelization design data with two instructions. First, the sales file is divided into a predetermined number of files along store number boundaries for parallel processing. Second, the processing results are combined once the parallel processing is complete.

"Pipeline connection analysis" applies when two business processing programs define the same splittable key and their processes are consecutive steps. In that case, the analysis generates design data instructing that the two processes be pipeline-parallelized.

For example, the store name assigning program 18 inserts store names into the total sales file 36 per store number, so "store number" is specified in advance as its splittable key. As noted above, "store number" is also the splittable key of the sales totaling program 16. Furthermore, the total sales file 36 output by the sales totaling unit 34 is the input data of the store name assigning unit 38, so the two processes are consecutive.
On this basis, the program analysis unit 12 generates design data instructing that the processing of the sales totaling unit 34 and that of the store name assigning unit 38 be pipeline-parallelized.
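Under the rule just described, detecting pipeline candidates comes down to two checks for each ordered pair of programs:
- does an output file of one feed an input file of the other?
- do both declare the same splittable key?

The sketch below uses hypothetical file names. The source does not state splittable keys for the two product name programs, so none is assumed for them.

```python
def pipeline_pairs(reports):
    """Return (preceding, succeeding) program pairs eligible for pipeline
    parallelism: an output of one is an input of the other, and both
    declare the same (non-empty) splittable key."""
    pairs = []
    for pre_name, pre in reports.items():
        for post_name, post in reports.items():
            if pre_name == post_name:
                continue
            connected = bool(pre["outputs"] & post["inputs"])
            same_key = pre["key"] is not None and pre["key"] == post["key"]
            if connected and same_key:
                pairs.append((pre_name, post_name))
    return pairs


# Hypothetical reports for the programs of FIG. 2
reports = {
    "sales_totaling": {"inputs": {"sales"}, "outputs": {"total_sales"},
                       "key": "store_no"},
    "store_name_assigning": {"inputs": {"total_sales", "store_names"},
                             "outputs": {"sales_with_store"}, "key": "store_no"},
    "product_name_extraction": {"inputs": {"product_master"},
                                "outputs": {"product_names"}, "key": None},
    "product_name_assigning": {"inputs": {"sales_with_store", "product_names"},
                               "outputs": {"sales_with_store_and_product"},
                               "key": None},
}
print(pipeline_pairs(reports))  # [('sales_totaling', 'store_name_assigning')]
```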

"File input/output relationship analysis" compares the input data and output data of the business processing programs. When two programs have no dependency between them, it determines that they can be task-parallelized.

For example, the product name extraction program 20 takes the product master file 44 as input and produces the product name file 48 as output. Neither file appears among the inputs or outputs of the sales totaling program 16 or the store name assigning program 18. The program analysis unit 12 therefore determines that the product name extraction program 20 can run in parallel with those two programs, and generates parallelization design data instructing this.
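A purely pairwise file comparison can be misleading. Sales totaling and product name assigning share no file directly, yet they are linked through store name assigning. This sketch therefore adds one check the source does not spell out: a pair counts as independent only if no dependency path exists in either direction. It also assumes that dependencies arise only when one program's output feeds another's input. File names are hypothetical.

```python
def dependency_edges(reports):
    """Edge a -> b whenever some output file of program a is an input of b."""
    return {a: {b for b in reports
                if b != a and reports[a]["outputs"] & reports[b]["inputs"]}
            for a in reports}


def reachable(edges, start):
    """All programs that depend, directly or indirectly, on `start`."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def task_parallel_pairs(reports):
    """Pairs of programs with no dependency path in either direction."""
    edges = dependency_edges(reports)
    reach = {name: reachable(edges, name) for name in reports}
    names = list(reports)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if y not in reach[x] and x not in reach[y]]


# Hypothetical file names for the four programs of FIG. 2
reports = {
    "sales_totaling": {"inputs": {"sales"}, "outputs": {"total_sales"}},
    "store_name_assigning": {"inputs": {"total_sales", "store_names"},
                             "outputs": {"sales_with_store"}},
    "product_name_extraction": {"inputs": {"product_master"},
                                "outputs": {"product_names"}},
    "product_name_assigning": {"inputs": {"sales_with_store", "product_names"},
                               "outputs": {"sales_with_store_and_product"}},
}
print(task_parallel_pairs(reports))
# [('sales_totaling', 'product_name_extraction'), ('store_name_assigning', 'product_name_extraction')]
```

The result matches the determination in the text: product name extraction may run alongside sales totaling and store name assignment.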

The program execution unit 14 receives the parallelization design data from the program analysis unit 12. Following that data, it assigns execution threads to the programs in turn (S22). This realizes the data flow 70 for parallel processing shown in FIG. 4.

The data flow 70 for parallel processing consists of:
- the sales file 32;
- a data division unit 72;
- a first sales file 32a and a second sales file 32b;
- a first sales totaling unit 34a and a second sales totaling unit 34b;
- a first store name assigning unit 38a and a second store name assigning unit 38b;
- the store name file 40;
- a first total sales file with store names 42a and a second total sales file with store names 42b;
- a data combining unit 74;
- the total sales file with store names 42;
- the product master file 44;
- the product name extraction unit 46;
- the product name file 48;
- the product name assigning unit 50;
- the total sales file with store and product names 52.

To build the data flow 70, the program execution unit 14 first starts the data division program 24 to generate the data division unit 72.
The program execution unit 14 then passes "store number" to the data division unit 72 as the splittable key and instructs it to divide the data into two files.

The sales file 32 contains data for store numbers 1 through 2000. The data division unit 72 therefore divides it equally into a first sales file 32a, containing the data for store numbers 1 to 1000, and a second sales file 32b, containing the data for store numbers 1001 to 2000.
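The division step can be sketched as cutting the sorted records only at store-number boundaries. The sketch also shows why the per-part totals can simply be merged afterward. The balancing policy used here, an equal number of key groups per part, is an assumption, and so is the record layout. The source only gives the two-way example.

```python
from itertools import groupby
from operator import itemgetter


def split_by_key(records, key_index, n_parts):
    """Divide records already sorted by the splittable key into at most
    n_parts contiguous parts, cutting only between key groups so that no
    key value is spread over two parts."""
    groups = [list(g) for _, g in groupby(records, key=itemgetter(key_index))]
    parts = [[] for _ in range(min(n_parts, len(groups)))]
    # Hand out whole key groups in order, giving each part about the same
    # number of groups (one possible balancing policy, assumed here).
    for i, group in enumerate(groups):
        parts[i * len(parts) // len(groups)].extend(group)
    return parts


def totals(records):
    """Sum amounts per (store_no, product_code)."""
    out = {}
    for store_no, product, amount in records:
        out[(store_no, product)] = out.get((store_no, product), 0) + amount
    return out


sales = sorted([(1, "A", 10), (2, "A", 5), (1, "A", 7), (3, "B", 4), (2, "B", 1)])
parts = split_by_key(sales, key_index=0, n_parts=2)
combined = {}
for part in parts:
    combined.update(totals(part))  # safe because parts share no store number
print(combined == totals(sales))  # True
```

With records for stores 1 to 2000 and `n_parts=2`, this policy cuts between store 1000 and store 1001, matching the division described above.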

Next, the program execution unit 14 starts the sales totaling program 16 and the store name assigning program 18. This generates a first sales totaling unit 34a and a first store name assigning unit 38a, which perform sales totaling and store name assignment on the first sales file 32a.

The first sales totaling unit 34a totals the amounts of the records in the first sales file 32a per store number × product code combination and passes the totals to the first store name assigning unit 38a.
The first store name assigning unit 38a refers to the store name file 40, adds the store name corresponding to the store number to each record, and stores the records in the first total sales file with store names 42a.
The two units repeat this processing, one store number × product code unit at a time, until all the data in the first sales file 32a has been processed.

In parallel, the program execution unit 14 starts the sales totaling program 16 and the store name assigning program 18 again. This generates a second sales totaling unit 34b and a second store name assigning unit 38b, which perform sales totaling and store name assignment on the second sales file 32b.

The second sales totaling unit 34b totals the amounts of the records in the second sales file 32b per store number × product code combination and passes the totals to the second store name assigning unit 38b.
The second store name assigning unit 38b refers to the store name file 40, adds the store name corresponding to the store number to each record, and stores the records in the second total sales file with store names 42b.
The two units repeat this processing, one store number × product code unit at a time, until all the data in the second sales file 32b has been processed.

The program execution unit 14 waits until both branches have finished: the first sales totaling unit 34a with the first store name assigning unit 38a, and the second sales totaling unit 34b with the second store name assigning unit 38b. It then starts the data combining program 26 to generate the data combining unit 74.
The data combining unit 74 combines the first total sales file with store names 42a and the second total sales file with store names 42b and stores the result in the total sales file with store names 42.
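The partitioned stage can be sketched with a thread pool. Each partition runs totaling followed by store name assignment, and the results are concatenated in partition order once every partition has finished. The following are assumptions for illustration only:
- the record layout and the store name table are hypothetical;
- within a partition, a generator chain stands in for the pipelined pair of units. It interleaves the two stages in one thread rather than running them on separate cores.

For CPU-bound Python work, `ProcessPoolExecutor` would normally replace the thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

STORE_NAMES = {1: "Store One", 2: "Store Two", 3: "Store Three"}  # hypothetical


def total_per_store(sales):
    """Sales totaling: yield (store, product, total) rows one store at a time
    (input assumed sorted by store number)."""
    for store, rows in groupby(sales, key=itemgetter(0)):
        sums = {}
        for _, product, amount in rows:
            sums[product] = sums.get(product, 0) + amount
        for product in sorted(sums):
            yield store, product, sums[product]


def process_partition(part):
    """One partition: totals flow row by row into store name assignment."""
    return [(s, p, t, STORE_NAMES[s]) for s, p, t in total_per_store(part)]


def run_data_parallel(parts):
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        results = list(pool.map(process_partition, parts))  # waits for all
    combined = []
    for r in results:  # data combining: concatenate in partition order
        combined.extend(r)
    return combined


part_a = [(1, "A", 10), (1, "A", 5), (2, "B", 3)]
part_b = [(3, "A", 8)]
print(run_data_parallel([part_a, part_b]))
# [(1, 'A', 15, 'Store One'), (2, 'B', 3, 'Store Two'), (3, 'A', 8, 'Store Three')]
```

Plain concatenation is a valid combining step here only because the division never splits a store number across partitions.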

In parallel with the processes above, the program execution unit 14 also starts the product name extraction program 20 to generate the product name extraction unit 46, which performs product name extraction. The processes it runs alongside are:
- the division of the sales file 32 by the data division unit 72;
- the sales totaling and store name assignment by the first sales totaling unit 34a and the first store name assigning unit 38a;
- the sales totaling and store name assignment by the second sales totaling unit 34b and the second store name assigning unit 38b;
- the combining of the total sales files with store names by the data combining unit 74.

The product name extraction unit 46 extracts product code/product name pairs from the data in the product master file 44 and stores them in the product name file 48.

The program execution unit 14 then starts the product name assigning program 22 to generate the product name assigning unit 50, which performs product name assignment. It does so only after all of the following processes have been completed:
- the division of the sales file 32 by the data division unit 72;
- the sales totaling and store name assignment by the first sales totaling unit 34a and the first store name assigning unit 38a;
- the sales totaling and store name assignment by the second sales totaling unit 34b and the second store name assigning unit 38b;
- the combining of the total sales files with store names by the data combining unit 74;
- the product name extraction by the product name extraction unit 46.
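The ordering described here can be sketched with futures. The store-name branch (split, total, name, combine) and product name extraction are submitted together. Product name assignment starts only after both results are available. The three callables in the usage example are hypothetical stand-ins for the real units.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(store_branch, extract_product_names, assign_product_names):
    """Run the two independent branches together (task parallelism), then
    start product name assignment only after both have finished."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        branch = pool.submit(store_branch)          # split/total/name/combine
        names = pool.submit(extract_product_names)  # product name extraction
        # .result() blocks until each branch completes
        return assign_product_names(branch.result(), names.result())


# Hypothetical stand-ins for the real processing units
def store_branch():
    return [(1, "A", 15, "Store One")]


def extract():
    return {"A": "Apple"}


def assign(rows, names):
    return [row + (names[row[1]],) for row in rows]


print(run_flow(store_branch, extract, assign))
# [(1, 'A', 15, 'Store One', 'Apple')]
```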

The product name assigning unit 50 refers to the product name file 48 and adds a product name to each record in the total sales file with store names 42. It then stores the result in the total sales file with store and product names 52.

As described above, the data flow 70 for parallel processing combines three forms of parallelism:
- Data parallelism: the first partition (first sales totaling unit 34a and first store name assigning unit 38a) and the second partition (second sales totaling unit 34b and second store name assigning unit 38b) are processed in parallel.
- Task parallelism: these processes run in parallel with the processing by the product name extraction unit 46.
- Pipeline parallelism: within the first partition, units 34a and 38a are pipeline-parallelized, and within the second partition, units 34b and 38b are pipeline-parallelized as well.

As a result, overall processing efficiency can be greatly improved.

Moreover, the data flow 70 for parallel processing is formed automatically through the cooperation of the transmission units 54 to 60 of the business processing programs, the program analysis unit 12, and the program execution unit 14. Compared with designing for parallelization by hand, this takes less effort and greatly reduces the risk of introducing bugs.

Admittedly, the developer of each business processing program must also write the code for its transmission unit 54 to 60, in addition to the code for the program's original function. However, this only requires specifying the input data, the output data, and the splittable key if one exists. The developer already knows these details thoroughly, so the added burden is small.

Traditionally, the biggest bottleneck for data flow design experts designing data flows for parallel processing has been understanding the content of each individual business process. In that light, a mechanism in which the business processing programs themselves output the parallelization determination data, and a data flow is generated automatically from it, is highly significant.

When the data flow 70 for parallel processing is executed, up to five processes run simultaneously: those of the first sales totaling unit 34a, the first store name assigning unit 38a, the second sales totaling unit 34b, the second store name assigning unit 38b, and the product name extraction unit 46. A CPU core must also be allocated to the OS. It is therefore desirable to use a computer with at least six CPU cores.

With a computer that has more CPU cores, the data division unit 72 can divide the file into three or more parts, which increases overall processing efficiency further.
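The six-core figure given above follows from simple arithmetic. Two partitions each run a two-stage pipeline, product name extraction runs alongside them, and one core is reserved for the OS: 2 × 2 + 1 + 1 = 6. Generalizing this count to n partitions is an extrapolation, not something the source states. The sketch below assumes every partition keeps the same two-stage pipeline.

```python
import os


def cores_needed(n_parts, stages_per_part=2, other_tasks=1, os_cores=1):
    """Peak simultaneous processes plus the OS core, following the FIG. 4
    example: each partition runs a pipelined totaling + naming pair, and
    product name extraction runs alongside them."""
    return n_parts * stages_per_part + other_tasks + os_cores


def max_parts(cores, stages_per_part=2, other_tasks=1, os_cores=1):
    """Largest partition count whose peak load fits on `cores` cores
    (never less than 1, even on machines too small for that)."""
    return max(1, (cores - other_tasks - os_cores) // stages_per_part)


print(cores_needed(2))                 # 6, matching the text
print(max_parts(os.cpu_count() or 1))  # depends on the machine
```

Under these assumptions, an 8-core machine would allow three partitions, since 3 × 2 + 1 + 1 = 8.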

A block diagram showing the functional configuration of the automatic data flow parallelization system according to the present invention.
A block diagram showing the original data flow realized by executing each business processing program.
A flowchart showing the processing procedure in this system.
A block diagram showing the data flow for parallel processing realized by executing each business processing program.

10 Automatic data flow parallelization system
12 Program analysis unit
14 Program execution unit
16 Sales totaling program
18 Store name assigning program
20 Product name extracting program
22 Product name assigning program
24 Data dividing program
26 Data combining program
30 Original data flow
32 Sales file
32a First sales file
32b Second sales file
34 Sales totaling unit
34a First sales totaling unit
34b Second sales totaling unit
36 Total sales file
38 Store name assigning unit
38a First store name assigning unit
38b Second store name assigning unit
40 Store name file
42 Total sales file with store names
42a First total sales file with store names
42b Second total sales file with store names
44 Product master file
46 Product name extracting unit
48 Product name file
50 Product name assigning unit
52 Total sales file with store names and product names
54 Transmission unit
56 Transmission unit
58 Transmission unit
60 Transmission unit
70 Data flow for parallel processing
72 Data dividing unit
74 Data combining unit

Claims

An automatic data flow parallelization system comprising:
a plurality of types of business processing programs for generating different types of business processing means;
program analysis means for analyzing parallelization determination data output from each business processing means and generating parallelization design data; and
program execution means for allocating an execution thread to each of the business processing programs and starting it, thereby generating each business processing means,
wherein each business processing means outputs, as the parallelization determination data, the input and output data defined in its business processing program and a splittable key, which is a data item by which the input data may be split;
when the output data of one business processing program is the input data of another business processing program and the splittable key defined in both programs is the same, the program analysis means generates parallelization design data specifying that, each time the preceding business processing means completes processing for one splittable-key unit, the processing result is output to the subsequent business processing means and processed there; and
the program execution means, based on the parallelization design data, starts these business processing means at the same time and causes them to execute the required processing.
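The analysis rule in this claim can be read as a pairwise check over the reported parallelization determination data. The sketch below assumes each program's data is available as a record of input names, output names, and an optional splittable key. The `DeterminationData` type, the data names, and the keys assigned to each program (in particular `product_code`) are hypothetical and only loosely modeled on the sales example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class DeterminationData:
    """Parallelization determination data reported by one business program."""
    program: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    splittable_key: Optional[str]


def pipeline_pairs(programs: List[DeterminationData]) -> List[Tuple[str, str]]:
    """Return (preceding, subsequent) pairs that may be pipelined.

    A pair qualifies when some output of the preceding program is an input
    of the subsequent one and both declare the same splittable key.
    """
    pairs = []
    for up in programs:
        for down in programs:
            if up is down or up.splittable_key is None:
                continue
            if up.outputs & down.inputs and up.splittable_key == down.splittable_key:
                pairs.append((up.program, down.program))
    return pairs


flow = [
    DeterminationData("sales_totaling", frozenset({"sales"}),
                      frozenset({"total_sales"}), "store_code"),
    DeterminationData("store_name_assigning",
                      frozenset({"total_sales", "store_names"}),
                      frozenset({"total_sales_with_store"}), "store_code"),
    DeterminationData("product_name_extracting", frozenset({"product_master"}),
                      frozenset({"product_names"}), None),
    DeterminationData("product_name_assigning",
                      frozenset({"total_sales_with_store", "product_names"}),
                      frozenset({"final"}), "product_code"),
]
print(pipeline_pairs(flow))  # [('sales_totaling', 'store_name_assigning')]
```

Only the sales totaling to store name assigning link qualifies here: the store name assigning to product name assigning link shares a data file but, under these assumed keys, not a splittable key.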

An automatic data flow parallelization system comprising:
a business processing program for generating business processing means;
a data dividing program for generating a data dividing unit;
a data combining program for generating a data combining unit;
program analysis means for analyzing parallelization determination data output from the business processing means and generating parallelization design data; and
program execution means for allocating an execution thread to each of the business processing program, the data dividing program, and the data combining program and starting it, thereby generating the business processing means, the data dividing unit, and the data combining unit,
wherein the business processing means outputs, as the parallelization determination data, a splittable key, which is a data item by which the input data defined in the business processing program may be split;
the program analysis means generates parallelization design data specifying that the file storing the input data of the business processing means is to be divided into a predetermined number of files, that a plurality of the business processing means are to be started to process the respective divided files, and that their output data are to be combined after processing by each business processing means is completed; and
the program execution means, based on the parallelization design data, starts the data dividing unit to divide the file storing the input data into the specified number of files, starts the same number of business processing means at the same time to execute the processing, and then starts the data combining unit to combine the output data from each business processing means.
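The divide, execute, and combine sequence of this claim can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch under stated assumptions: records are `(key, amount)` tuples, the business processing is a hypothetical per-key total, and the data dividing unit assigns contiguous ranges of sorted key values to each part. Each part then covers a contiguous key range and each part's output is sorted, so the data combining unit only has to concatenate the outputs in part order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Record = Tuple[str, int]  # (splittable key value, amount); illustrative


def divide(records: Sequence[Record], n: int) -> List[List[Record]]:
    """Data dividing unit: give each part a contiguous range of sorted keys."""
    keys = sorted({key for key, _ in records})
    size = max(1, -(-len(keys) // n))  # ceiling division, at least 1
    part_of = {key: i // size for i, key in enumerate(keys)}
    parts: List[List[Record]] = [[] for _ in range(n)]
    for rec in records:
        parts[part_of[rec[0]]].append(rec)
    return parts


def total_by_key(part: List[Record]) -> List[Record]:
    """Hypothetical business processing: total the amounts per key."""
    totals = {}
    for key, amount in part:
        totals[key] = totals.get(key, 0) + amount
    return sorted(totals.items())


def run_divided(records: Sequence[Record], n: int,
                process: Callable[[List[Record]], List[Record]]) -> List[Record]:
    parts = divide(records, n)
    # Start n instances of the same business processing at the same time;
    # pool.map returns their outputs in part order.
    with ThreadPoolExecutor(max_workers=n) as pool:
        outputs = list(pool.map(process, parts))
    # Data combining unit: contiguous key ranges make concatenation sufficient.
    return [rec for out in outputs for rec in out]
```

For example, `run_divided([("S03", 4), ("S01", 1), ("S02", 2), ("S01", 3)], 2, total_by_key)` gives the same result as totaling all records at once. The claim speaks of execution threads; in CPython, CPU-bound business processing would usually need `ProcessPoolExecutor` to occupy several cores at once.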