JP6900265B2

JP6900265B2 - Data analysis system and data analysis method

Info

Publication number: JP6900265B2
Application number: JP2017140756A
Authority: JP
Inventors: 知也藤原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-07-20
Filing date: 2017-07-20
Publication date: 2021-07-07
Anticipated expiration: 2037-07-20
Also published as: JP2019021176A

Description

The present invention relates to a data analysis system and a data analysis method.

Patent Document 1 states that its aim is to "efficiently save data generated at intermediate stages of analysis processing and reuse the intermediate data," and that "data generated at intermediate stages of analysis is saved, and quantified feedback information on the saved data is accepted as an evaluation value. Intermediate data that received no evaluation value is preferentially deleted, whereas for intermediate data that received a particularly high evaluation value, analysis processing of similar data is performed. Intermediate data is managed automatically by background processing so that analysis of the data to be compared, and analyses expected to derive from it, can be performed at high speed."

Japanese Unexamined Patent Publication No. 2011-2911

Data analysis work using information processing devices often proceeds through repeated trial and error. In particular, in data analysis targeting large volumes of data such as big data, work often proceeds while adjusting the algorithms (logic) of the processing elements that make up the data analysis process, such as elements that perform data collection, processing, classification, search, analysis, machine learning, and visualization.

In such work, however, the entire data analysis process must be re-executed from the beginning every time some of the processing elements are changed. When the target data is enormous or the processing is complex, it takes a long time to obtain results, which hinders the efficiency of data analysis work. Patent Document 1 describes reusing intermediate data, but discloses no mechanism for re-executing a series of analysis processes from a midpoint.

An object of the present invention is to provide a data analysis system and a data analysis method that enable data analysis work to be performed efficiently.

One aspect of the present invention is a data analysis system comprising: a storage unit that stores analysis process definition information including information defining the execution order of a plurality of processing elements constituting a data analysis process; a processing execution unit that executes the processing elements according to the analysis process definition information using deployed materials, and stores the data generated by each of the processing elements; a build execution unit that executes a build based on the source code of the processing elements to generate materials to be deployed; and a change management unit that stores current deployment information, which is update information of the materials currently deployed in the processing execution unit, accepts a change to the source code, executes a build based on the changed source code to generate new materials, and generates change information, which is information indicating the difference between the current deployment information and the update information of the new materials. When executing the data analysis process, the processing execution unit determines, based on the change information, whether any processing element of the data analysis process has been changed. If it determines that a processing element has been changed, it omits execution of the processing elements that the analysis process definition information defines to be executed before the changed processing element, and executes processing starting from the changed processing element, using the data previously generated by the processing element that the analysis process definition information defines to be executed before the changed processing element.

Other problems disclosed in the present application, and their solutions, will become clear from the description of the embodiments and the drawings.

According to the present invention, data analysis work can be performed efficiently.

A diagram showing the schematic configuration of the data analysis system.
An example of an information processing device used to realize the data analysis system.
A diagram explaining the functions of the source management device, the build device, and the change management device, and the information (data) managed by these devices.
A diagram explaining the functions of the change management device and the execution device, and the information (data) managed by these devices.
An example of analysis process definition information.
An example of current deployment information.
An example of change information.
A sequence diagram explaining the processing performed in the data analysis system.
A flowchart explaining the processing performed by the execution device.
A diagram showing an example of how data is managed in the data lake.
A diagram showing the relationship between the flow of the analysis process executed by the execution device and the data stored in the data lake by the analysis process.
A diagram showing the relationship between the flow of the analysis process executed by the execution device and the data stored in the data lake by the analysis process.
An example of an execution request instruction screen.

Embodiments are described below with reference to the drawings. In the following description, identical or similar components may be given the same reference numerals, and duplicate descriptions may be omitted.

FIG. 1 shows the schematic configuration of a data analysis system 1 described as an embodiment. The data analysis system 1 includes information processing devices belonging to two management segments, a server segment 2 and a user segment 3. The information processing devices belonging to the server segment 2 and those belonging to the user segment 3 are communicably connected via a communication network 5. The communication network 5 is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, or a dedicated line.

A user of the data analysis system 1 (a person who performs data analysis work, for example, a data scientist or a person who develops or maintains programs for processing related to data analysis (hereinafter referred to as data analysis processing)) accesses the information processing devices belonging to the server segment 2 by using a user terminal 110 belonging to the user segment 3.

As shown in the figure, the server segment 2 includes a plurality of information processing devices: a source management device 101, a build device 102, a change management device 103, and an execution device 104. In this example the functions are distributed across a plurality of information processing devices, but a plurality of the functions may instead be implemented on one or more shared information processing devices.

FIG. 2 shows an example of an information processing device (hereinafter referred to as the information processing device 10) used to realize each of the source management device 101, the build device 102, the change management device 103, and the execution device 104, which belong to the server segment 2, and the user terminal 110, which belongs to the user segment 3. As shown in the figure, the information processing device 10 includes a processor 11, a storage device 12, an input device 13, an output device 14, and a communication device 15. These are communicably connected to one another via a communication means such as a bus. All or part of the information processing device 10 may be realized using virtual resources, such as a cloud server in a cloud system.

The processor 11 is configured using, for example, a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or a GPU (Graphics Processing Unit). All or part of the functions of the source management device 101, the build device 102, the change management device 103, the execution device 104, and the user terminal 110 are realized, for example, by the processor 11 reading and executing programs stored in the storage device 12. The information processing device 10 may also provide functions such as an operating system, a file system, device drivers, and a DBMS (DataBase Management System). The source repository 106, archive repository 107, data lake 108, current deployment information 160, and change information 170, all described later, can be realized or managed using, for example, a DBMS.

The storage device 12 is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a non-volatile semiconductor memory, a hard disk drive, or an SSD (Solid State Drive), and stores programs and data. Each piece of information (data) stored in the storage device 12 is managed by, for example, a file system or a DBMS.

The input device 13 is a user interface that accepts external input, for example, a keyboard, a mouse, or a touch panel. The output device 14 is a user interface that provides various information, such as processing progress and processing results, to the outside, for example, an image display device (liquid crystal monitor, LCD (Liquid Crystal Display), graphics card, etc.) or a printing device.

The communication device 15 realizes communication with other devices and is configured using, for example, a NIC (Network Interface Card) or a wireless communication module. The information processing device 10 may, for example, accept external input via the communication device 15. It may also provide various information, such as processing progress and processing results, to the outside via the communication device 15.

FIG. 3 is a diagram explaining the functions of the source management device 101, the build device 102, and the change management device 103, which belong to the server segment 2, and the information (data) managed by these devices.

The source management device 101 has a program management unit 201. The source management device 101 manages a source repository 106.

When a user accesses the source management device 101 via the user terminal 110 and commits (registers) source code for the data analysis process (hereinafter also referred to as analysis logic), the program management unit 201 registers the source code created or updated by the user in the source repository 106. The program management unit 201 manages the update information (revision information, version information, etc.) of the source code stored in the source repository 106. For example, the program management unit 201 assigns the same update information to a group of source code files committed at the same time. The source code is written in, for example, the Java (registered trademark) language.
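The revision handling described above can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the source repository 106 and an integer revision counter; the class name `SourceRepository`, its methods, and the file names are hypothetical and do not appear in the specification.

```python
from typing import Dict, Tuple


class SourceRepository:
    """Hypothetical in-memory stand-in for the source repository 106."""

    def __init__(self) -> None:
        self._revision = 0
        # program name -> (revision, source text)
        self._files: Dict[str, Tuple[int, str]] = {}

    def commit(self, sources: Dict[str, str]) -> int:
        """Register a group of source files and stamp them with one shared revision."""
        self._revision += 1
        for program_name, text in sources.items():
            self._files[program_name] = (self._revision, text)
        return self._revision

    def source_info(self) -> Dict[str, int]:
        """Return program name -> revision for every registered file."""
        return {name: rev for name, (rev, _) in self._files.items()}


repo = SourceRepository()
first = repo.commit({"Collect.java": "// v1", "Classify.java": "// v1"})
second = repo.commit({"Classify.java": "// v2"})  # only one file changes
info = repo.source_info()
```

Files committed together share a revision, so a later comparison of revisions per program is enough to tell which programs changed since a given deployment.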

The build device 102 has a build execution unit 202. The build device 102 manages an archive repository 107. The archive repository 107 stores the materials generated by executing a build that are to be deployed (rolled out to production) to the execution device 104 (hereinafter also referred to as deployment target materials). In this example, these materials are assumed to be application archives. One example of an application archive is a "jar" file in Java (registered trademark).

The change management device 103 has a change management unit 203. The change management device 103 manages current deployment information 160 and change information 170. The current deployment information 160 contains the update information of each component (a constituent of an application archive, for example, an executable module corresponding to source code) included in the application archives currently deployed on the execution device 104. The change information 170 contains information indicating the difference between the update information of each component included in the application archives generated by the most recently executed build and the update information of each component in the current deployment information 160.

Upon receiving a "change information generation instruction" from the user terminal 110, the change management unit 203 acquires information about the source code from the source management device 101. This information (hereinafter also referred to as source information) includes the program name of each source code file (for example, the program name 162 in FIG. 6, described later) and the update information of each source code file (for example, the update information 163 in FIG. 6, described later).

The change management unit 203 also acquires information about the application archives from the build device 102. This information (hereinafter referred to as build information) indicates which component is packaged into which application archive (for example, the correspondence between the application archive name 161 and the program name 162 in FIG. 6, described later).

The change management unit 203 identifies the difference by comparing the update information of each component included in the application archives generated by the most recently executed build with the update information in the current deployment information 160, and outputs the result as the change information 170. The change information 170 is described in detail later.

Upon receiving a "current deployment information update instruction" from the user terminal 110, the change management unit 203 updates the current deployment information 160 to the latest content based on the content of the change information 170.
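One way to picture this update step is the sketch below. It assumes that each change information record carries an archive name, a program name, a revision, and a flag marking whether the component differs from what is deployed; the names `ChangeRecord` and `apply_change_info`, the dictionary layout, and the sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class ChangeRecord(NamedTuple):
    archive_name: str
    program_name: str
    revision: str
    changed: bool  # True when the component differs from the deployed one


Key = Tuple[str, str]  # (archive name, program name)


def apply_change_info(current: Dict[Key, str],
                      change_info: List[ChangeRecord]) -> Dict[Key, str]:
    """Return a copy of the deployment info with changed components brought up to date."""
    updated = dict(current)
    for record in change_info:
        if record.changed:
            updated[(record.archive_name, record.program_name)] = record.revision
    return updated


current = {
    ("element1.jar", "Collect.java"): "r10",
    ("element2.jar", "Classify.java"): "r10",
}
change_info = [
    ChangeRecord("element1.jar", "Collect.java", "r10", False),
    ChangeRecord("element2.jar", "Classify.java", "r12", True),
]
updated = apply_change_info(current, change_info)
```

Returning a new dictionary rather than mutating the input keeps the previous deployment state available for comparison; this is a design choice of the sketch, not something the specification prescribes.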

FIG. 4 is a diagram explaining the functions of the change management device 103 and the execution device 104, which belong to the server segment 2, and the information (data) managed by these devices.

The execution device 104 (processing execution unit) has the following functions: an execution request receiving unit 301, a process execution control unit 302, a command calling unit 303, an analysis logic execution unit 304, and a data input/output unit 305. Of these, the execution request receiving unit 301, the process execution control unit 302, the command calling unit 303, and the data input/output unit 305 are provided to the user in advance, for example, as a platform system (infrastructure system) for performing data analysis work.

As shown in the figure, the execution device 104 manages a data lake 108. The data lake 108 manages, for example, the following: data to be analyzed (for example, data acquired from websites, data acquired from an SNS (Social Networking Service), data collected from sensors, POS (point of sale) data, questionnaire data, and so-called big data; hereinafter also referred to as analysis target data); data generated at each processing stage of the data analysis process executed by the execution device 104, that is, each time a processing element described later is executed (hereinafter also referred to as intermediate data); and data containing information about the results of the analysis process (hereinafter also referred to as result data).

The data lake 108, for example, accumulates and stores the analysis target data in the format of its source (native format) and, in response to a request from the data input/output unit 305, converts the analysis target data into the required form (structured data, semi-structured data, unstructured data, etc.) and provides it to the data input/output unit 305. This conversion is performed using, for example, an ETL (Extract/Transform/Load) tool.

The execution device 104 stores analysis process definition information 306. The analysis process definition information 306 describes information specifying the execution order of the plurality of processing elements that constitute the data analysis process. The user creates the analysis process definition information 306 using, for example, a configuration function provided by the execution device 104 or the user terminal 110. In this example, the analysis process definition information 306 is assumed to be one of the deployment target materials.

FIG. 5 shows an example of the analysis process definition information 306. As shown in the figure, the analysis process definition information 306 describes the execution control statement of each processing element in the order in which the elements are executed. In this example, five processing elements, processing element 1 through processing element 5, are described in execution order. Each execution control statement includes an execution command and the file name of the file to be executed (a "jar" format file in Java (registered trademark)).
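Because the content of FIG. 5 is not reproduced in the text, the following sketch assumes a purely hypothetical textual format in which each execution control statement is a line such as `java -jar element1.jar`. The function `parse_definition`, the `ControlStatement` type, and the file names are illustrative assumptions, not the format used in the specification.

```python
import shlex
from typing import List, NamedTuple


class ControlStatement(NamedTuple):
    command: str   # execution command, e.g. "java"
    archive: str   # file to execute, e.g. "element1.jar"


def parse_definition(text: str) -> List[ControlStatement]:
    """Parse one execution control statement per line, preserving execution order."""
    statements = []
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # blank and '#' lines yield no tokens
        if not tokens:
            continue
        jars = [t for t in tokens if t.endswith(".jar")]
        if len(jars) != 1:
            raise ValueError(f"expected exactly one jar file in: {line!r}")
        statements.append(ControlStatement(tokens[0], jars[0]))
    return statements


definition = """\
# hypothetical definition with five processing elements
java -jar element1.jar
java -jar element2.jar
java -jar element3.jar
java -jar element4.jar
java -jar element5.jar
"""
order = [s.archive for s in parse_definition(definition)]
```

Keeping the statements in a list preserves the execution order, which is the property the rest of the system relies on when deciding where to resume.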

Returning to FIG. 4, the execution request receiving unit 301 receives a request to execute the data analysis process from the user terminal 110. The analysis process definition information 306 may be stored in the execution device 104 in advance; alternatively, for example, the execution request receiving unit 301 may receive the analysis process definition information 306 from the user terminal 110 together with the execution request and store it. Upon receiving the execution request, the execution request receiving unit 301 instructs the process execution control unit 302 to execute the data analysis process. The execution request receiving unit 301 also provides the user terminal 110 with the result of the data analysis process executed in response to that instruction.

The process execution control unit 302 controls execution of the data analysis process. The process execution control unit 302 executes the data analysis process according to the analysis process definition information 306 using the currently deployed application archives. Upon receiving the execution instruction from the execution request receiving unit 301, the process execution control unit 302 acquires the change information 170 from the change management unit 203 of the change management device 103 and passes the content of the change information 170 to the command calling unit 303.

The command calling unit 303 executes the data analysis process according to the analysis process definition information 306 (that is, it executes the processing elements described in the analysis process definition information 306). The command calling unit 303 executes the analysis process in cooperation with the analysis logic execution unit 304. When the second execution mode, described later, is selected, the command calling unit 303 checks, based on the change information 170, whether each processing element has been changed (that is, whether the application archive that implements the processing element has been changed). If a processing element has been changed, the command calling unit 303 omits (skips) execution of those processing elements described in the analysis process definition information 306 that can be omitted.
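The skip decision can be sketched as follows, under the reading that the processing elements preceding the first changed element are the ones that can be omitted. The function name `plan_execution`, the element and archive names, and the fallback of running every element when nothing has changed are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def plan_execution(elements: List[Tuple[str, str]],
                   changed_archives: Set[str]) -> Tuple[List[str], List[str]]:
    """Split ordered (element name, archive name) pairs into (skipped, to_run).

    Elements before the first changed element are skipped; the run resumes at
    the changed element, which would read the intermediate data that its
    predecessor stored on an earlier run.
    """
    for index, (_, archive) in enumerate(elements):
        if archive in changed_archives:
            skipped = [name for name, _ in elements[:index]]
            to_run = [name for name, _ in elements[index:]]
            return skipped, to_run
    # Assumption of this sketch: with no changes, run everything.
    return [], [name for name, _ in elements]


elements = [
    ("element1", "element1.jar"),
    ("element2", "element2.jar"),
    ("element3", "element3.jar"),
    ("element4", "element4.jar"),
    ("element5", "element5.jar"),
]
skipped, to_run = plan_execution(elements, {"element3.jar"})
```

In this example the first two elements are skipped and execution resumes at the third, so the cost of a rerun depends on how late in the pipeline the change occurs.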

The analysis logic execution unit 304 executes the execution command (processing element) called by the command calling unit 303. When the analysis logic execution unit 304 needs to access the data lake 108 (to input or output data) while executing a processing element, it proceeds in cooperation with the data input/output unit 305.

The data input/output unit 305 functions, for example, as a data access object and accesses the data lake 108 in response to requests from the analysis logic execution unit 304.

FIG. 6 shows an example of the current deployment information 160 described above. As shown in the figure, the current deployment information 160 consists of one or more records, each having the items application archive name 161, program name 162, and update information 163.

The application archive name 161 holds information that identifies an application archive. In this example, it holds the file name of a "jar" file (hereinafter also referred to as the application archive name). The program name 162 holds information that identifies source code. In this example, it holds the program name of the source code. The update information 163 holds the revision information of that source code.

FIG. 7 shows an example of the change information 170 described above. As shown in the figure, the change information 170 consists of one or more records, each having the items application archive name 171, program name 172, update information 173, and difference information 174.

Of these items, the application archive name 171, program name 172, and update information 173 are the same as the application archive name 161, program name 162, and update information 163 of the current deployment information 160, respectively, so their description is omitted. The difference information 174 holds information indicating the difference between the current deployment information 160 and the information about the new application archive generated by the build executed after the source code was changed. In this example, "Yes" is set if the source code has been changed (that is, the update information of the component has changed), and the field is left blank if the source code has not been changed.
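The comparison that produces the difference information can be sketched as follows. The record types, the function `generate_change_info`, and the sample archive names, program names, and revisions are hypothetical; the sketch also treats a component that is absent from the current deployment information as changed, which the text does not address.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComponentRecord:
    archive_name: str   # application archive name, e.g. a jar file name
    program_name: str   # program name of the source code
    revision: str       # update (revision) information


@dataclass(frozen=True)
class ChangeRecord:
    archive_name: str
    program_name: str
    revision: str
    difference: str     # "Yes" when the revision differs, "" (blank) otherwise


def generate_change_info(current: List[ComponentRecord],
                         latest: List[ComponentRecord]) -> List[ChangeRecord]:
    """Compare the latest build against the deployed state, one record per component."""
    deployed = {(r.archive_name, r.program_name): r.revision for r in current}
    result = []
    for r in latest:
        same = deployed.get((r.archive_name, r.program_name)) == r.revision
        result.append(ChangeRecord(r.archive_name, r.program_name, r.revision,
                                   "" if same else "Yes"))
    return result


current = [
    ComponentRecord("element1.jar", "Collect.java", "r10"),
    ComponentRecord("element2.jar", "Classify.java", "r10"),
]
latest = [
    ComponentRecord("element1.jar", "Collect.java", "r10"),
    ComponentRecord("element2.jar", "Classify.java", "r12"),
]
change_info = generate_change_info(current, latest)
changed_archives = {c.archive_name for c in change_info if c.difference == "Yes"}
```

The set of changed archive names derived at the end is the kind of input an executor would need to decide which processing elements were affected.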

FIG. 8 is a sequence diagram explaining the processing performed in the data analysis system 1 (the flow of the data analysis process performed after source code changed by the user has been committed). The steps are described below in order with reference to the figure.

First, the user terminal 110 transmits the source code to be committed, together with a commit instruction for that source code, to the source management device 101. Upon receiving the commit instruction, the source management device 101 commits (registers) the source code received with the commit instruction to the source repository 106 (S801).

Next, the user terminal 110 transmits a build execution instruction to the source management device 101 (S802). Upon receiving the execution instruction, the build device 102 transmits a source code acquisition request to the source management device 101 (S803).

Upon receiving the acquisition request, the source management device 101 transmits the source code required for the build to the build device 102 (S804).

Upon receiving the source code from the source management device 101, the build device 102 executes a build using the received source code and generates application archives (S805). The generated application archives become the deployment target materials in the deployment process described later.

Next, the user terminal 110 transmits an instruction to generate change information to the change management device 103 (S806). Upon receiving the generation instruction, the change management device 103 transmits a source information acquisition request to the source control device 101 (S807). Upon receiving the acquisition request, the source control device 101 transmits the source information to the change management device 103 (S808).

Next, the change management device 103 transmits to the build device 102 a request to acquire the build information recorded at build time (S809). Upon receiving the acquisition request, the build device 102 transmits the build information to the change management device 103 (S810).

From the acquired source information and build information, the change management device 103 can obtain information such as the revision of the source code and which application archive contains the source code.

Next, the change management device 103 generates the change information 170 based on the current deployment information 160 and the acquired source information and build information (S811, S812). Having generated the change information 170, the change management device 103 transmits a generation completion notification for the change information 170 to the user terminal 110 (S813).

Next, the user terminal 110 transmits a request to acquire the deployment target material to the build device 102 (S814). Upon receiving the acquisition request, the build device 102 transmits the deployment target material to the user terminal 110 (S815). Upon receiving the deployment target material, the user terminal 110 deploys it to the execution device 104 (S816).

Next, the user terminal 110 transmits an instruction to update the current deployment information 160 to the change management device 103 (S817). Upon receiving the update instruction, the change management device 103 updates the current deployment information 160 to the latest content based on the content of the change information 170 (S818). Having updated the current deployment information 160, the change management device 103 transmits an update completion notification for the current deployment information 160 to the user terminal 110 (S819).
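Purely by way of example, the update of S818 may be sketched as follows. The dictionary-based representation, the field names, and the function name `update_current_deployment` are hypothetical; the embodiment only states that the current deployment information 160 is brought up to date based on the change information 170.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (application archive name, program name)


def update_current_deployment(current: Dict[Key, str],
                              change_info: List[Dict[str, str]]) -> Dict[Key, str]:
    """Return a new mapping in which each component listed in the change
    information carries its latest update information (sketch of S818)."""
    updated = dict(current)  # leave the caller's mapping untouched
    for row in change_info:
        updated[(row["archive_name"], row["program_name"])] = row["update_info"]
    return updated
```

Returning a new mapping rather than mutating the argument is a design choice of this sketch; it makes it easy to keep the previous state if the update has to be rolled back.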

Next, the user terminal 110 transmits an instruction to execute the data analysis process to the execution device 104 (S820). When issuing the execution instruction, the user can specify one of two execution modes on the user terminal 110. In the first (hereinafter, the first execution mode), the processing elements described in the analysis processing definition information 306 are executed in order from first to last. In the second (hereinafter, the second execution mode), execution of the processing elements that precede the processing element whose source code (component) has been changed is omitted (skipped), and the processing elements are re-executed in order starting from the changed processing element. When the user selects the second execution mode, the execution device 104 executes the processing elements from the changed processing element onward by reusing data that was generated in a past data analysis process and is stored in the data lake 108.

Upon receiving the execution instruction from the user terminal 110, the execution device 104 transmits a response to the user terminal 110 indicating that the instruction has been accepted (S821). The data analysis process by the execution device 104 may, for example, be performed independently of (asynchronously with) the user terminal 110.
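The asynchronous handling mentioned above might, for instance, look like the following sketch, in which the acceptance response is returned immediately and the analysis runs on a background thread. The class `ExecutionDevice`, the `job_id` field, and the shape of the response are assumptions for illustration only; the embodiment does not specify how the asynchrony is realized.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class ExecutionDevice:
    """Hypothetical stand-in for the execution device 104."""

    def __init__(self, run_analysis: Callable[[str], str]) -> None:
        self._run_analysis = run_analysis            # callable(execution_mode) -> result
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._jobs: Dict[str, Future] = {}

    def accept(self, execution_mode: str) -> Dict[str, str]:
        """Acknowledge the instruction at once (S821) and start the analysis in the background."""
        job_id = str(uuid.uuid4())                   # hypothetical handle, not the execution ID of S901
        self._jobs[job_id] = self._pool.submit(self._run_analysis, execution_mode)
        return {"status": "accepted", "job_id": job_id}

    def result(self, job_id: str, timeout: Optional[float] = None) -> str:
        """Block until the background analysis finishes and return its result."""
        return self._jobs[job_id].result(timeout=timeout)
```

A caller would invoke `accept(...)`, receive the acknowledgement without waiting, and call `result(...)` later only if it needs the outcome.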

Before executing the data analysis process, the execution device 104 transmits a request to acquire the change information 170 to the change management device 103 (S822). Upon receiving the acquisition request, the change management device 103 transmits the change information 170 to the execution device 104 (S823).

Upon receiving the change information 170, the execution device 104 executes the data analysis process in the execution mode selected by the user (S824). The details of S824 are described later.

After executing the data analysis process, the execution device 104 stores the result of the data analysis process (result data) in the data lake 108 (S825).

FIG. 9 is a flowchart illustrating the details of the processing performed by the execution device 104 in S824 of FIG. 8. It is described below with reference to the figure.

First, the execution device 104 assigns an identifier (hereinafter, the execution ID) to the data analysis process that it is about to execute in response to the execution instruction received from the user terminal 110 (S901).

Next, the execution device 104 acquires the change information 170 from the change management device 103 (S902).

Next, the execution device 104 determines which execution mode has been selected (S903). If the first execution mode has been selected (S903: first execution mode), the execution device 104 executes the processing elements described in the analysis processing definition information 306 in order from first to last (S904 to S907). If the second execution mode has been selected (S903: second execution mode), the processing proceeds to S910.

In S910, the execution device 104 selects the processing element described first in the analysis processing definition information 306. Next, based on the change information 170, the execution device 104 determines whether the selected processing element has been changed (that is, whether a component of the application archive that implements the processing element has been changed) (S911). If the selected processing element has been changed (S911: YES), the processing proceeds to S905, and the execution device 104 executes the processing elements in order from the selected processing element to the last processing element described in the analysis processing definition information 306 (S905 to S907).

If the selected processing element has not been changed (S911: NO), the processing proceeds to S912, where the execution device 104 omits (skips) execution of the selected processing element. The execution device 104 then determines whether a subsequent processing element is described in the analysis processing definition information 306 (S913). If a subsequent processing element is described (S913: YES), the execution device 104 selects the next processing element (S914) and performs the processing from S911 on the newly selected processing element. If no subsequent processing element is described (S913: NO), that is, if none of the processing elements of the data analysis process has been changed, the processing ends.
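The branch structure of S903 and S910 to S914 can be sketched, under simplifying assumptions, as a search for the first changed processing element. The mode constants, the function names, and the representation of the change information as a set of changed element names are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional, Sequence, Set

FIRST_MODE = "first"    # execute every processing element
SECOND_MODE = "second"  # skip the elements that precede the first changed one


def find_start_index(elements: Sequence[str], changed: Set[str], mode: str) -> Optional[int]:
    """Return the index of the first element to execute, or None if nothing is executed."""
    if mode == FIRST_MODE:                       # S903: first execution mode
        return 0 if elements else None
    for index, name in enumerate(elements):      # S910 / S914: select elements in order
        if name in changed:                      # S911: YES, execute from here (S905)
            return index
        # S911: NO, skip this element (S912) and look for a successor (S913)
    return None                                  # S913: NO, no element was changed


def elements_to_run(elements: Sequence[str], changed: Set[str], mode: str) -> List[str]:
    """List the processing elements that would actually be executed."""
    start = find_start_index(elements, changed, mode)
    return [] if start is None else list(elements[start:])
```

For a five-element definition in which only the third element has changed, `elements_to_run` yields the third to fifth elements in the second execution mode and all five in the first execution mode.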

When execution of one or more processing elements has been omitted (S912) and the processing then proceeds from S911 to S905 (S911: YES), so that the data analysis process is re-executed starting partway through the processing elements described in the analysis processing definition information 306, the execution device 104 supplies input to the processing element at which re-execution starts. That input is the data (input data or intermediate data) generated in a past data analysis process (one performed before the source code was changed) by a preceding processing element, that is, a processing element that the analysis processing definition information 306 defines as executing before (for example, immediately before) the processing element at which re-execution starts. Which of the data generated in the past by such a preceding processing element is used for the re-execution may be made configurable (automatically, by the user, and so on) according to the nature of the data analysis process.

FIG. 10 shows an example of how data is managed in the data lake 108. As shown in the figure, in this example a data space is reserved in the data lake 108 for each execution ID assigned to a data analysis process in S901 of FIG. 9 (that is, for each execution instruction for a data analysis process that the execution device 104 accepts from the user terminal 110). A data space is reserved automatically, for example by the data input/output unit 305 or by the data lake 108, when the execution device 104 assigns an execution ID. In this example, the data lake 108 holds a data space 1010 reserved for the data analysis process with execution ID "1" and a data space 1020 reserved for the data analysis process with execution ID "2". Each data space stores the data generated by the processing elements of the data analysis process (input data, intermediate data, result data, and so on). In this example, the data is shown as consisting of one or more records, each containing a combination of a "key" and a "value".
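One conceivable in-memory model of this management form is sketched below. The class `DataLake` and its methods are hypothetical, and the sketch merges the assignment of the execution ID and the reservation of the data space into a single call for brevity, whereas in the embodiment the execution device 104 assigns the ID and the data input/output unit 305 or the data lake 108 reserves the space.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # one ("key", "value") record


class DataLake:
    """Hypothetical model: one data space per execution ID, one table per name."""

    def __init__(self) -> None:
        self._spaces: Dict[int, Dict[str, List[Record]]] = {}
        self._next_id = 1

    def reserve_space(self) -> int:
        """Assign a new execution ID and reserve an empty data space for it."""
        execution_id = self._next_id
        self._next_id += 1
        self._spaces[execution_id] = {}
        return execution_id

    def put(self, execution_id: int, table: str, records: Iterable[Record]) -> None:
        """Store the records of one table in the data space of the given execution ID."""
        self._spaces[execution_id][table] = list(records)

    def get(self, execution_id: int, table: str) -> List[Record]:
        """Return a copy of the records of one table."""
        return list(self._spaces[execution_id][table])

    def tables(self, execution_id: int) -> List[str]:
        """Return the table names stored under the given execution ID."""
        return sorted(self._spaces[execution_id])
```

Because each execution writes only into its own data space, the tables of an earlier execution remain available when a later execution needs them as input.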

FIG. 11 shows the relationship between the flow of a data analysis process executed by the execution device 104 and the data (tables) stored in the data lake 108 as a result of that execution. It is described below with reference to the figure.

First, the execution device 104 assigns "1" as the execution ID of the data analysis process and reserves a data space 1150 for execution ID "1" in the data lake 108 (S901 of FIG. 9).

Next, the execution device 104 executes processing element 1 (S1111), whereby table 1 (input data) 1151 is stored in the data space 1150 for execution ID "1" in the data lake 108 (S904 of FIG. 9).

Next, the execution device 104 executes processing element 2 (S1112), processing element 3 (S1113), and processing element 4 (S1114) in order, whereby table 2 (intermediate data) 1152, table 3 (intermediate data) 1153, and table 4 (intermediate data) 1154 are generated in order in the data space 1150 for execution ID "1".

Next, the execution device 104 executes processing element 5 (S1115), whereby table 5 (result data) 1155 is generated in the data space 1150 for execution ID "1" (S905 to S907 of FIG. 9).

FIG. 12 shows the relationship between the flow of the data analysis process and the data (tables) stored in the data lake 108 in the case where, after the data analysis process shown in FIG. 11 has been executed, processing element 3 (S1113) is changed (its source code is changed), a build is performed, and the data analysis process is then re-executed. In this example, the execution device 104 re-executes the data analysis process using the data (tables) generated by the data analysis process shown in FIG. 11.

First, the execution device 104 assigns "2" as the execution ID of the data analysis process to be re-executed and reserves a data space 1250 for execution ID "2" in the data lake 108 (S901 of FIG. 9). It is assumed that, when the data analysis process shown in FIG. 12 is performed, the data space 1150 is still reserved in the data lake 108 and all of the data stored in the data space 1150 by the data analysis process shown in FIG. 11 remains in the data lake 108.

Next, the execution device 104 confirms from the change information 170 that processing element 1 (S1111) and processing element 2 (S1112) have not been changed, and omits (skips) their execution (S911: NO and S912 of FIG. 9).

Next, the execution device 104 confirms that the source code of processing element 3 (S1113) has been changed, and executes processing element 3 (S1113) with table 2 (intermediate data) 1152, generated by the data analysis process shown in FIG. 11, as input. As a result, table 3 (intermediate data) 1161 is stored in the data space 1250 for execution ID "2" in the data lake 108.

Next, the execution device 104 executes processing element 4 (S1114) and processing element 5 (S1115), whereby table 4 (intermediate data) 1162 and table 5 (result data) 1165 are stored in the data space 1250 for execution ID "2" (S911: YES and S905 to S907 of FIG. 9).
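The two executions of FIG. 11 and FIG. 12 can be imitated with a toy pipeline, as in the following sketch. The function `run`, the element names `e1` to `e5`, the integer tables, and the rule that the input is always the table of the immediately preceding element of a given earlier execution are assumptions of this example; the embodiment leaves the choice of reused data configurable.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Table = List[int]
Element = Tuple[str, Callable[[Table], Table]]   # (name, function of the preceding table)
Lake = Dict[int, Dict[str, Table]]               # execution ID -> element name -> table


def run(pipeline: List[Element], lake: Lake, execution_id: int,
        changed: Optional[Set[str]] = None,
        previous_id: Optional[int] = None) -> List[str]:
    """Execute the pipeline and return the names of the elements that actually ran.
    If `changed` is given, elements before the first changed one are skipped;
    `previous_id` must then name the earlier execution whose tables are reused."""
    lake[execution_id] = {}                      # reserve the data space (S901)
    names = [name for name, _ in pipeline]
    start = 0                                    # first execution mode: run everything
    if changed is not None:                      # second execution mode
        hits = [i for i, name in enumerate(names) if name in changed]
        if not hits:
            return []                            # nothing changed, nothing to run
        start = hits[0]
    # Reuse the table produced by the immediately preceding element in the earlier run.
    data: Table = lake[previous_id][names[start - 1]] if start > 0 else []
    executed = []
    for name, func in pipeline[start:]:
        data = func(data)
        lake[execution_id][name] = data
        executed.append(name)
    return executed


v1: List[Element] = [
    ("e1", lambda _: [1, 2, 3, 4]),               # produces the input data
    ("e2", lambda t: [x * 10 for x in t]),
    ("e3", lambda t: [x + 1 for x in t]),
    ("e4", lambda t: [x for x in t if x > 20]),
    ("e5", lambda t: [sum(t)]),                   # produces the result data
]
v2 = v1[:2] + [("e3", lambda t: [x + 2 for x in t])] + v1[3:]   # only e3 is changed

lake: Lake = {}
first_run = run(v1, lake, 1)                                   # corresponds to FIG. 11
second_run = run(v2, lake, 2, changed={"e3"}, previous_id=1)   # corresponds to FIG. 12
```

After both calls, the data space for execution ID 2 contains only the tables of `e3`, `e4`, and `e5`, while the data space for execution ID 1 still holds all five tables, mirroring the relationship between the data spaces 1150 and 1250.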

FIG. 13 shows an example of the user interface (execution request instruction screen 1300) that the user terminal 110 provides to the user when, in FIG. 4, the user requests via the user terminal 110 that the execution device 104 execute a data analysis process. As shown in the figure, the execution request instruction screen 1300 includes a data analysis process designation field 1311, an execution mode selection field 1312, and an execution request instruction button 1313.

By entering the identifier of a data analysis process (data analysis process ID) in the data analysis process designation field 1311, the user can specify the data analysis process that the execution device 104 is to execute. The user can specify the execution mode described above via the execution mode selection field 1312. By operating the execution request instruction button 1313, the user can issue the execution request for the data analysis process to the execution device 104.

As described above, the data analysis system 1 of the present embodiment automatically identifies the changed processing element and re-executes the data analysis process from that processing element using data generated by a previously executed data analysis process, so the data analysis process can be executed efficiently. Moreover, even when the data analysis system 1 is shared by a plurality of users, for example in a large-scale project, and it is difficult to keep accurate track of the update status of the deployment target material, the data analysis system 1 of the present embodiment can accurately identify the processing elements that can be omitted, so data analysis work can proceed efficiently.

Needless to say, the present invention is not limited to the embodiment described above and can be modified in various ways without departing from its gist. For example, the above embodiment has been described in detail in order to explain the present invention clearly, and the invention is not necessarily limited to a configuration that includes every element described. Other elements may also be added to, removed from, or substituted for part of the configuration of the above embodiment.

Some or all of the configurations, functional units, processing units, processing means, and the like described above may be implemented in hardware, for example by designing them as integrated circuits. They may also be implemented in software by having a processor interpret and execute programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be placed in a recording device such as a memory, hard disk, or SSD, or on a recording medium such as an IC card, SD card, or DVD.

In each of the above figures, only the control lines and information lines considered necessary for the explanation are shown; not all of the control lines and information lines of an actual implementation are necessarily shown. In practice, almost all of the components may be considered to be interconnected.

The arrangement of the various functional units, processing units, and databases of the data analysis system 1 described above is only an example. It may be changed to whatever arrangement is optimal in view of the performance of the hardware and software of the data analysis system 1, processing efficiency, communication efficiency, and the like.

The configuration of the databases described above (schema and the like) may also be changed flexibly in view of efficient use of resources and improved processing, access, and search efficiency.

1 data analysis system, 2 server segment, 3 user segment, 5 communication network, 10 information processing device, 101 source control device, 102 build device, 103 change management device, 104 execution device, 106 source repository, 107 archive repository, 108 data lake, 110 user terminal, 160 current deployment information, 170 change information, 201 program management unit, 202 build execution unit, 203 change management unit, 301 execution request reception unit, 302 process execution control unit, 303 command invocation unit, 304 analysis logic execution unit, 305 data input/output unit, 306 analysis processing definition information

Claims

A data analysis system comprising:
a storage unit that stores analysis processing definition information including information that defines the execution order of a plurality of processing elements constituting a data analysis process;
a process execution unit that executes the processing elements in accordance with the analysis processing definition information using deployed material, and stores data generated by each of the processing elements;
a build execution unit that executes a build based on source code of the processing elements to generate material to be deployed; and
a change management unit that stores current deployment information, which is update information of the material currently deployed to the process execution unit, and that, when a change to the source code is accepted and the build is executed based on the changed source code to generate new material, generates change information, which is information indicating the difference between the current deployment information and update information of the new material,
wherein, when executing the data analysis process, the process execution unit determines, based on the change information, whether a processing element of the data analysis process has been changed, and, upon determining that a processing element has been changed, omits execution of the processing elements that the analysis processing definition information defines as executing before the changed processing element, and executes processing from the changed processing element using data generated in the past by a processing element that the analysis processing definition information defines as executing before the changed processing element.

The data analysis system according to claim 1, wherein
the material is an application archive,
the current deployment information includes update information of the components included in the application archive, and
the change information includes information indicating the difference between the update information in the current deployment information and update information of the components included in the application archive of the new material.

The data analysis system according to claim 2, wherein
each of the processing elements is implemented by one application archive,
the build execution unit manages information indicating which application archive contains a component based on the source code, and
the change management unit generates, based on information acquired from the build execution unit, the change information including information indicating the correspondence between the application archive and the component.

The data analysis system according to any one of claims 1 to 3, wherein
the change management unit has a source code change information receiving unit that accepts changes to the source code.

The data analysis system according to any one of claims 1 to 3, wherein
the process execution unit has an analysis processing definition information receiving unit that receives and stores the analysis processing definition information.

The data analysis system according to any one of claims 1 to 3, further comprising
a user interface that allows a user to select whether to perform the omission,
wherein, when it is selected via the user interface that the omission is not to be performed, the process execution unit executes the processing elements defined in the analysis processing definition information in order from the first, regardless of whether any processing element has been changed.

A data analysis method in which an information processing device executes:
a step of storing analysis processing definition information including information that defines the execution order of a plurality of processing elements constituting a data analysis process;
a step of executing the processing elements in accordance with the analysis processing definition information using deployed material, and storing data generated by each of the processing elements;
a step of executing a build based on source code of the processing elements to generate material to be deployed;
a step of storing current deployment information, which is update information of the currently deployed material, and, when a change to the source code is accepted and the build is executed based on the changed source code to generate new material, generating change information, which is information indicating the difference between the current deployment information and update information of the new material; and
a step of, when executing the data analysis process, determining, based on the change information, whether a processing element of the data analysis process has been changed, and, upon determining that a processing element has been changed, omitting execution of the processing elements that the analysis processing definition information defines as executing before the changed processing element, and executing processing from the changed processing element using data generated in the past by a processing element that the analysis processing definition information defines as executing before the changed processing element.

The data analysis method according to claim 7, wherein
the material is an application archive,
the current deployment information includes update information of the components included in the application archive, and
the change information includes information indicating the difference between the update information in the current deployment information and update information of the components included in the application archive of the new material.

The data analysis method according to claim 8, wherein
each of the processing elements is implemented by one application archive, and
the information processing device further executes:
a step of managing information indicating which application archive contains a component based on the source code; and
a step of generating, based on that information, the change information including information indicating the correspondence between the application archive and the component.

The data analysis method according to any one of claims 7 to 9, wherein
the information processing device further executes a step of accepting changes to the source code.

The data analysis method according to any one of claims 7 to 9, wherein
the information processing device further executes a step of receiving and storing the analysis processing definition information.

The data analysis method according to any one of claims 7 to 9, wherein
the information processing device includes a user interface that allows a user to select whether to perform the omission, and
the information processing device further executes a step of executing, when it is selected via the user interface that the omission is not to be performed, the processing elements defined in the analysis processing definition information in order from the first, regardless of whether any processing element has been changed.