JP7682117B2

JP7682117B2 - Data processing device and method

Info

Publication number: JP7682117B2
Application number: JP2022032681A
Authority: JP
Inventors: 記史西川; 和彦茂木; 俊彦樫山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2025-05-23
Anticipated expiration: 2042-03-03
Also published as: JP2023128370A

Description

The present invention relates generally to data processing, and more particularly to index design for database tables.

One example of this type of technology is disclosed in Patent Document 1. That technology extracts potential index candidates from a given query, evaluates the index candidates using optimization, and recommends indexes based on the evaluation results.

Patent Document 1: US10,762,085

As disclosed in Patent Document 1, there are techniques for recommending indexes for the tables used by a given query.

However, for an environment or database other than the one to which the table corresponding to the recommended index belongs, indexes for the tables belonging to that other environment or database must be designed separately. This requires using queries exhaustively for that other environment or database, which increases the man-hours needed for index design.

In addition, when a new table is input or an existing table is modified, an index may need to be designed for the input or modified table. That design also requires using queries for the input or modified table, which again increases the man-hours needed for index design.

A data processing device acquires metadata of a column constituting a second table, and determines whether metadata similar to the acquired metadata exists in a first data catalog (data having a column name and metadata for each column constituting a first table in a database). If the result of the determination is true, the data processing device identifies, from the first data catalog, the column name corresponding to the metadata similar to the acquired metadata, and generates a first index, which is an index of the first table that includes the identified column name. The data processing device recommends the generated first index, or a second index, which is an index of the second table generated based on the first index.

According to the present invention, the man-hours required to design a table index can be reduced.

FIG. 1 shows the configuration of a data processing device according to a first embodiment of the present invention.
FIG. 2 shows the structure of data catalog information.
FIG. 3 shows an example of the structure of index definition information.
FIG. 4 shows the flow of an index design support process according to the first embodiment.
FIG. 5 shows the flow of a similarity determination process.
FIG. 6 shows the flow of an index recommendation process.
FIG. 7 shows the flow of an index design support process according to a second embodiment.
FIG. 8 shows the flow of an index design support process according to a third embodiment.
FIG. 9 shows the flow of an index design support process according to a fourth embodiment.
FIG. 10 shows the flow of an index change determination process.

In the following description, an "interface device" may be one or more interface devices. The one or more interface devices may be at least one of the following:
・One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device to at least one of an I/O device and a remote display computer. The I/O interface device to the display computer may be a communication interface device. At least one I/O device may be a user interface device, for example, either an input device such as a keyboard or pointing device, or an output device such as a display device.
・One or more communication interface devices. The one or more communication interface devices may be one or more communication interface devices of the same type (e.g., one or more NICs (Network Interface Cards)) or two or more communication interface devices of different types (e.g., a NIC and an HBA (Host Bus Adapter)).

In the following description, "memory" refers to one or more memory devices, which are an example of one or more storage devices, and may typically be a main storage device. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.

In the following description, a "persistent storage device" may be one or more persistent storage devices, which are an example of one or more storage devices. A persistent storage device may typically be a non-volatile storage device (e.g., an auxiliary storage device), and more specifically may be, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an NVMe (Non-Volatile Memory Express) drive.

In the following description, a "processor" may be one or more processor devices. At least one processor device may typically be a microprocessor device such as a CPU (Central Processing Unit), but may also be another type of processor device such as a GPU (Graphics Processing Unit). At least one processor device may be single-core or multi-core. At least one processor device may be a processor core. At least one processor device may also be a processor device in the broad sense, such as a hardware circuit that performs part or all of the processing (e.g., an FPGA (Field-Programmable Gate Array), a CPLD (Complex Programmable Logic Device), or an ASIC (Application Specific Integrated Circuit)).

In the following description, functions are sometimes described using the expression "yyy unit". A function may be realized by a processor executing one or more computer programs, by one or more hardware circuits (e.g., an FPGA or ASIC), or by a combination thereof. When a function is realized by a processor executing a program, the specified processing is performed while using a storage device and/or an interface device as appropriate, so the function may be regarded as at least a part of the processor. Processing described with a function as the subject may be processing performed by a processor or by a device having that processor. A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable storage medium (e.g., a non-transitory storage medium). The description of each function is an example; multiple functions may be combined into one function, and one function may be divided into multiple functions.

Several embodiments will be described below.
[First embodiment]

FIG. 1 shows the configuration of a data processing device according to the first embodiment of the present invention.

The data processing device 100 includes an interface device 101, a persistent storage device 102, a memory 103, and a processor 104.

The interface device 101 is connected to a communication network (e.g., the Internet) 150. The interface device 101 communicates with a user device 110 via the communication network 150. The user device 110 may be a physical computer such as a personal computer, or a virtual computer based on a physical computer. The interface device 101 may also be connected to an input/output device serving as a user interface device, instead of or in addition to the user device 110. That is, the data processing device 100 can input and output information with either the user device 110 or the input/output device.

The persistent storage device 102 stores a database 121, data catalog information 122, and index definition information 123. Of the database 121, the data catalog information 122, and the index definition information 123, at least the database 121 may exist in plural. For example, there may be a database 121 belonging to a test environment (an example of a first environment) and a database 121 belonging to a production environment (an example of a second environment different from the first environment). There may also be a first database 121 and a second database 121 different from the first database 121 (the environment to which the first database 121 belongs and the environment to which the second database 121 belongs may be the same or different). The database 121 includes tables and indexes. The data catalog information 122 and the index definition information 123 will be described later.

The memory 103 stores one or more computer programs. When these programs are executed by the processor 104, functions such as an index candidate generation unit 131 and an index design support unit 132 are realized. The index candidate generation unit 131 generates an index for a table used by a given query, using that query. The index candidate generation unit 131 may be a function that follows existing technology. The index design support unit 132 performs the index design support process described later. The index design support unit 132 includes functions such as a similarity determination unit 136 and an index recommendation unit 137, which will be described later.

Although not shown, the memory 103 may store a computer program with which the processor 104 realizes a DBMS (Database Management System). At least one of the index candidate generation unit 131 and the index design support unit 132 may be included in the DBMS or may be a function outside the DBMS. The DBMS receives a query from a query source and, in accordance with the query, refers to the index of the table specified in the query and performs input/output to the database 121. The query source may be a device external to the data processing device 100, such as the user device 110, or may be an element inside the data processing device 100 (for example, an application realized by the processor 104 executing a computer program in the memory).

FIG. 2 shows the structure of the data catalog information 122.

The data catalog information 122 includes a data catalog for each table in one or more databases 121. A data catalog is data that has a column name 202 and metadata 210 for each column constituting the table.

The data catalog information 122 has an entry for each table. Taking the entry for one table as an example, the entry includes information such as a table name 201, column name 202, type/statistics 203, first attribute 204, second attribute 205, and description 206. A table consists of one or more columns. Taking one column as an example, the metadata 210 of the column includes the type/statistics 203, first attribute 204, second attribute 205, and description 206. Information 201 to 206 is explained below using one table and one column as an example.

The table name 201 represents the name of the table. The column name 202 represents the name of the column. The type/statistics 203 represents the type of the data in the column (e.g., numeric or character) and statistics of the data in the column (e.g., maximum, minimum, and average values). The first attribute 204 represents characteristics of the data in the column (e.g., whether the column contains duplicate data and whether it is sorted). The second attribute 205 represents how the column is used, that is, conditions (e.g., join key, sort key, filter condition) relating to database operations (e.g., join, sort, filter) that can be specified in a query (e.g., an SQL statement) to the database 121. The description 206 represents a description of the column (e.g., a text sentence).
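The data catalog layout described above could be modeled roughly as follows. This is a minimal sketch, not the patent's data format: the class names, field names, and the sample `orders` table are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Metadata 210 of one column (hypothetical field names)."""
    data_type: str                                   # type part of type/statistics 203
    statistics: dict = field(default_factory=dict)   # statistics part of 203 (max, min, avg)
    first_attribute: dict = field(default_factory=dict)  # data characteristics 204
    second_attribute: frozenset = frozenset()        # usage 205 (join key, sort key, ...)
    description: str = ""                            # description 206


@dataclass
class DataCatalog:
    """Data catalog of one table: table name 201 plus metadata per column name 202."""
    table_name: str
    columns: dict = field(default_factory=dict)      # column name 202 -> ColumnMetadata


# Illustrative entry for an assumed table "orders".
orders_catalog = DataCatalog(
    table_name="orders",
    columns={
        "customer_id": ColumnMetadata(
            data_type="numeric",
            statistics={"max": 9999, "min": 1, "avg": 5000.0},
            first_attribute={"has_duplicates": True, "sorted": False},
            second_attribute=frozenset({"join_key", "filter_condition"}),
            description="identifier of the ordering customer",
        ),
    },
)
```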

FIG. 3 shows an example of the structure of the index definition information 123.

The index definition information 123 has an entry for each index already generated in one or more databases 121. The entry for one index corresponds to the index definition of that index. Taking the entry for one index as an example, the entry has information such as an index name 301, table name 302, index type 303, and column name 304.

The index name 301 represents the name of the index. The table name 302 represents the name of the table to which the index corresponds. The index type 303 represents the type of the index (e.g., B-tree or range). The column name 304 represents the names of the columns included in the index.

As FIG. 3 shows, one table may have one or more indexes. In addition, at least one index definition may be included in the data catalog of the table to which the index represented by that index definition corresponds.
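As an illustration of the index definition entries above, the following sketch represents each entry as a small record and looks up all indexes of one table. The class, function, and sample names are assumptions made for the example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDefinition:
    """One entry of the index definition information 123 (hypothetical field names)."""
    index_name: str        # index name 301
    table_name: str        # table name 302
    index_type: str        # index type 303, e.g. "B-tree" or "range"
    column_names: tuple    # column name 304 (one or more columns)


def indexes_for_table(definitions, table_name):
    """Return every index definition whose table name 302 matches table_name."""
    return [d for d in definitions if d.table_name == table_name]


# Illustrative data: one table may have several indexes.
definitions = [
    IndexDefinition("idx_orders_customer", "orders", "B-tree", ("customer_id",)),
    IndexDefinition("idx_orders_date", "orders", "range", ("order_date",)),
    IndexDefinition("idx_items_sku", "items", "B-tree", ("sku",)),
]
orders_indexes = indexes_for_table(definitions, "orders")
```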

FIG. 4 shows the flow of the index design support process according to the first embodiment.

The index design support unit 132 acquires an index definition (for example, an index definition included in the data catalog of table A, the table to which the index represented by the index definition corresponds) (S401). This index definition may be input from the user device 110 and stored in the persistent storage device 102, or may be acquired from the persistent storage device 102.

From the data catalog whose table name matches the table name (the name of table A) included in the index definition acquired in S401, the index design support unit 132 identifies the column name 202 that matches the column name 304 of that index definition, and acquires the metadata 210 corresponding to the identified column name 202 (S402). Note that, instead of the metadata 210 acquired from the data catalog, the metadata acquired in S402 may be metadata acquired from table A itself, that is, the table named in the index definition acquired in S401 (in particular, for example, metadata acquired from the table for each column represented by that index definition).
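Steps S401 and S402 amount to a lookup keyed by table name and then by column name. The sketch below shows one way this could look, assuming the data catalog information is held as nested dictionaries; the function name, dictionary keys, and sample data are hypothetical.

```python
def metadata_for_index(index_definition, catalog_info):
    """S402 sketch: collect the metadata of the columns named in an index definition.

    catalog_info maps table name -> {column name -> metadata}; this layout is
    an assumption of the example.
    """
    # Pick the data catalog whose table name matches the index definition.
    catalog = catalog_info.get(index_definition["table_name"], {})
    # Keep only the columns listed in the index definition (column name 304).
    return {
        column: catalog[column]
        for column in index_definition["column_names"]
        if column in catalog
    }


catalog_info = {
    "orders": {
        "customer_id": {"type": "numeric"},
        "order_date": {"type": "character"},
        "note": {"type": "character"},
    }
}
index_definition = {  # S401: index definition acquired for table A
    "index_name": "idx_orders_1",
    "table_name": "orders",
    "index_type": "B-tree",
    "column_names": ["customer_id", "order_date"],
}
source_metadata = metadata_for_index(index_definition, catalog_info)
```

Columns of the table that are not part of the index (here `note`) are left out, so only the indexed columns serve as comparison sources in the later similarity determination.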

The similarity determination unit 136 of the index design support unit 132 performs a similarity determination process (S403).

If the similarity determination process finds metadata similar to the metadata 210 (metadata whose similarity S_i to the metadata 210 is equal to or greater than a predetermined threshold Th_i) (S404: YES), the index recommendation unit 137 of the index design support unit 132 performs an index recommendation process (S405).

FIG. 5 shows the flow of the similarity determination process.

In the description of FIG. 5, the comparison source metadata is the metadata acquired before the similarity determination process (in this embodiment, the metadata acquired in S402). The comparison target metadata is the metadata compared with the comparison source metadata: it is the metadata of a column of table B (metadata in the data catalog of table B), where table B is a table different from table A, which has the column corresponding to the comparison source metadata. For one piece of comparison source metadata, the metadata of all columns of table B may be treated as comparison target metadata. Alternatively, for one piece of comparison source metadata, the comparison target metadata may be limited to the metadata of the column whose column name matches that of the column corresponding to the comparison source metadata. The similarity determination process is described below using one piece of comparison source metadata and one piece of comparison target metadata as an example.
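The two ways of choosing comparison target metadata described above (all columns of table B, or only the column with the same name) can be sketched as a single selection function. The function name, the `same_name_only` flag, and the dictionary layout are assumptions of this example.

```python
def comparison_targets(source_column, table_b_catalog, same_name_only=False):
    """Return {column name: metadata} of the table B columns to compare against.

    table_b_catalog maps column name -> metadata for table B (assumed layout).
    """
    if same_name_only:
        # Restrict the targets to the column whose name matches the source column.
        return {
            name: meta
            for name, meta in table_b_catalog.items()
            if name == source_column
        }
    # Otherwise every column of table B is a comparison target.
    return dict(table_b_catalog)


table_b_catalog = {
    "customer_id": {"type": "numeric"},
    "order_date": {"type": "character"},
}
all_targets = comparison_targets("customer_id", table_b_catalog)
named_targets = comparison_targets("customer_id", table_b_catalog, same_name_only=True)
```

Restricting the targets by name reduces the number of comparisons, at the cost of missing similar columns that are named differently in table B.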

The similarity determination unit 136 determines whether the column name 202 corresponding to the comparison source metadata matches the column name 202 of the column corresponding to the comparison target metadata (S501). If the determination result of S501 is true (S501: YES), the similarity determination unit 136 adds a predetermined score (for example, "1") to the current score S_c.

The similarity determination unit 136 also determines whether the data type represented by the type/statistics 203 in the comparison source metadata matches the data type represented by the type/statistics 203 in the comparison target metadata (S502). If the determination result of S502 is true (S502: YES), the similarity determination unit 136 adds a predetermined score (for example, "1") to the current score S_c.

Thereafter, the similarity determination unit 136 adds to the current score S_c a score that depends on the degree of match between the statistics represented by the type/statistics 203 in the comparison source metadata and the statistics represented by the type/statistics 203 in the comparison target metadata (S505). The "degree of match" in this paragraph depends on the number of elements (e.g., maximum, minimum, or average value) that match between the two sets of statistics.

The similarity determination unit 136 determines whether the data characteristics (first attribute 204) in the comparison source metadata match the data characteristics (first attribute 204) in the comparison target metadata (S506). If the determination result of S506 is false (S506: NO), the similarity determination unit 136 determines that the comparison target metadata is not similar to the comparison source metadata (S511). In this case, the score S_c may be reset to its initial value.

If the determination result of S506 is true (S506: YES), the similarity determination unit 136 determines whether the usage (second attribute 205) in the comparison source metadata matches the usage (second attribute 205) in the comparison target metadata (S507). If the determination result of S507 is false (S507: NO), S511 is performed.

If the determination result of S507 is true (S507: YES), the similarity determination unit 136 adds to the current score S_c a score that depends on the degree of match between the description 206 (text sentence) in the comparison source metadata and the description 206 in the comparison target metadata (S508). The "degree of match" in this paragraph depends on the number of elements that match between the two descriptions 206. An "element" here may be a word, or a combination of a word and its position. The elements are identified from the description 206 by the similarity determination unit 136.

The similarity determination unit 136 determines whether the current score S_c is equal to or greater than a predetermined threshold Th_c (S509). If the determination result of S509 is false (S509: NO), S511 is performed.

If the determination result of S509 is true (S509: YES), the similarity determination unit 136 determines that the comparison target metadata is similar to the comparison source metadata (S510). In this case, the score S_c may serve as the similarity S_i described above, and the threshold Th_c for the score S_c may serve as the threshold Th_i described above. In other words, if the determination result of S509 is true for at least one piece of comparison source metadata, the determination result of S404 in FIG. 4 may be true.
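The flow S501 to S511 can be sketched as a single scoring function. The text fixes only the order of the checks and the example score of "1" for S501 and S502; the one-point-per-matching-statistic rule, the one-point-per-shared-word rule, the threshold value of 3, the score reset on every S511, and all names below are assumptions of this sketch.

```python
def words(text):
    """Split a description 206 into a set of lower-cased words (assumed 'elements')."""
    return set(text.lower().split())


def judge_similarity(src_name, src, dst_name, dst, threshold=3.0):
    """Return (is_similar, score) for one comparison source/target metadata pair."""
    score = 0.0                                   # current score S_c
    if src_name == dst_name:                      # S501: column names match
        score += 1
    if src["type"] == dst["type"]:                # S502: data types match
        score += 1
    # S505: assumed rule, one point per statistic with an equal value.
    score += sum(
        1 for key, value in src["stats"].items()
        if key in dst["stats"] and dst["stats"][key] == value
    )
    # S506 and S507: data characteristics and usage must both match.
    if src["first_attr"] != dst["first_attr"] or src["second_attr"] != dst["second_attr"]:
        return False, 0.0                         # S511: not similar, score reset
    # S508: assumed rule, one point per word shared by the two descriptions.
    score += len(words(src["description"]) & words(dst["description"]))
    if score >= threshold:                        # S509: compare with Th_c
        return True, score                        # S510: similar
    return False, 0.0                             # S511: not similar, score reset


source = {
    "type": "numeric",
    "stats": {"max": 100, "min": 1, "avg": 50.0},
    "first_attr": {"has_duplicates": True, "sorted": False},
    "second_attr": {"join_key"},
    "description": "customer identifier",
}
target = dict(source, stats={"max": 100, "min": 1, "avg": 42.0},
              description="identifier of customer")
result = judge_similarity("customer_id", source, "customer_id", target)
```

Because S506 and S507 act as hard gates, a mismatch in data characteristics or usage rejects the pair regardless of how many points the earlier steps accumulated.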

FIG. 6 shows the flow of the index recommendation process.

For one and the same table B, the index recommendation unit 137 lists the column names 304 of the columns corresponding to the comparison target metadata determined to be similar in the similarity determination process (S601).

The index recommendation unit 137 generates an index (index candidate) having the listed column names 304 (S602), and recommends that index (index candidate) as an index for table B (S603).

The index generated in S602 may be of the same type as the index type represented by the index definition acquired in S401.

The "recommendation" in S603 may mean presenting the generated index (index candidate) to the user (index designer), for example by displaying it on the user device 110, or storing the generated index (index candidate) in the database 121 that includes table B as one of the indexes of table B. The "recommendation" in S603 may also mean outputting the definition of the generated index (index candidate) to the memory 103 or the persistent storage device 102.
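Steps S601 to S603 can be sketched as building an index definition for table B from the columns judged similar and reusing the index type of the source index. The function name, the generated index name, and the dictionary layout are assumptions; here the "recommendation" is simply the returned definition.

```python
def recommend_index(table_b, similar_columns, source_index_type):
    """Sketch of S601 to S603: build an index candidate for table B.

    similar_columns are the table B column names judged similar; the index
    naming scheme below is purely illustrative.
    """
    # S601: list up the column names, dropping repeats but keeping their order.
    columns = list(dict.fromkeys(similar_columns))
    if not columns:
        return None  # nothing was judged similar, so there is nothing to recommend
    # S602: generate the index candidate with the same type as the source index.
    return {
        "index_name": "idx_" + table_b + "_" + "_".join(columns),
        "table_name": table_b,
        "index_type": source_index_type,
        "column_names": columns,
    }


# S603: in this sketch the candidate definition is returned to the caller.
candidate = recommend_index(
    "orders_prod", ["customer_id", "order_date", "customer_id"], "B-tree"
)
```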

According to the first embodiment, an index (index candidate) for table B can be automatically generated and recommended using the data catalog identified based on the index definition of the index of table A. As a result, the man-hours required to design the index of table B are reduced. Note that the index of table A may be generated by the index candidate generation unit 131.
[Second embodiment]

The second embodiment will now be described. The description focuses mainly on the differences from the first embodiment, and points in common with the first embodiment are omitted or simplified (the same applies to the third and fourth embodiments).

FIG. 7 shows the flow of the index design support process according to the second embodiment.

S701 and S702 are performed instead of S401 and S402.

In S701, the index design support unit 132 acquires metadata for a column of table A that is not registered in the data catalog, and an index of table A. The column corresponding to the acquired metadata may be a column represented by the acquired index.

In S702, the index design support unit 132 acquires the column name 304 included in the index definition of the index acquired in S701.

S703 to S705 are substantially the same as S403 to S405. In S703, the comparison source metadata may be the metadata acquired in S701 (the metadata of the column of table A).

According to the second embodiment, an index (index candidate) for table B can be automatically generated and recommended based on the index definition of the index of table A. As a result, the man-hours required to design the index of table B are reduced.
[Third embodiment]

FIG. 8 shows the flow of the index design support process according to the third embodiment.

S801 is performed instead of S401 and S402.

In S801, the index design support unit 132 acquires metadata of a column of table A.

S802 to S804 are substantially the same as S403 to S405. In S802, the comparison source metadata may be the metadata acquired in S801 (the metadata of the column of table A).

According to the third embodiment, an index (index candidate) for table B can be automatically generated and recommended based on the metadata of the columns of table A. As a result, the man-hours required to design the index of table B are reduced.
[Fourth embodiment]

FIG. 9 shows the flow of the index design support process according to the fourth embodiment.

S901 to S903 are performed instead of S401 and S402.

In S901, the index design support unit 132 acquires changes in the metadata of the columns of a new table A or an updated table A. Specifically, for example, either of the following may apply.
・The new table A may be a table newly stored in the database 121. A data catalog of the new table A may be added to the data catalog information 122. The metadata of each column of table A may be acquired from the added data catalog (or from the new table A).
・The updated table A may be a table in which at least one column has been changed (e.g., added or updated). When table A is updated, at least one piece of metadata in the data catalog of table A may be updated accordingly. The metadata of the changed column may be acquired from the updated data catalog (or from the changed column of the updated table A).

Ｓ９０２では、索引設計支援部１３２は、索引変更判定処理を行う。 In S902, the index design support unit 132 performs index change determination processing.

Ｓ９０３では、索引設計支援部１３２は、索引変更判定処理における判定結果が変更要か否かを判定する。 In S903, the index design support unit 132 determines whether the result of the index change determination process indicates that a change is required.

Ｓ９０３の判定結果が真の場合（Ｓ９０３：ＹＥＳ）、Ｓ９０４～Ｓ９０６が行われる。Ｓ９０４～Ｓ９０６は、Ｓ４０３～Ｓ４０５と実質的に同じでよい。例えば、下記が採用されてよい。 If the determination result of S903 is true (S903: YES), S904 to S906 are performed. S904 to S906 may be substantially the same as S403 to S405. For example, the following may be adopted.

すなわち、Ｓ９０４において、比較元のメタデータは、Ｓ９０１で取得されたメタデータ（新たな表Ａの列毎のメタデータ、又は、更新された表Ａのうちの変更されたメタデータ）でよい。 That is, in S904, the metadata to be compared may be the metadata acquired in S901 (the metadata for each column of the new Table A, or the changed metadata in the updated Table A).

また、Ｓ９０６において（具体的には、図６のＳ６０３において）、推薦される索引は、Ｓ６０２において生成された索引Ｂ（表Ｂの索引）と、当該索引Ｂを基に索引推薦部１３７により生成された索引Ａ（新たな表Ａ又は更新された表Ａの索引（索引候補））とのうちの一方又は両方でよい。索引Ａは、例えば、索引Ｂの索引種別と同じ索引種別の索引でよい。 In addition, in S906 (specifically, in S603 in FIG. 6), the recommended index may be one or both of index B (index of table B) generated in S602 and index A (index (index candidate) of new table A or updated table A) generated by the index recommendation unit 137 based on index B. Index A may be, for example, an index of the same index type as index B.

また、索引Ａは、索引Ｂに加えて、更新前の表Ａの索引Ａ´を用いて生成されてよい。索引Ａの列名のうち、変更が無い列の列名は、索引Ａ´の列名と同じでよい。索引Ａは、例えば、索引Ａ´の索引種別と同じ索引種別の索引でよい。 Index A may be generated using index A' of table A before the update, in addition to index B. The column names of the columns in index A that remain unchanged may be the same as the column names in index A'. Index A may be, for example, an index of the same index type as index A'.

図１０は、索引変更判定処理の流れを示す。 Figure 10 shows the flow of the index change determination process.

索引設計支援部１３２は、Ｓ９０１で取得されたメタデータの第１属性２０４のうちのソート有無が、変更前の列のメタデータの第１属性２０４のうちのソート有無と異なるか否かを判定する（Ｓ１００１）。 The index design support unit 132 determines whether the sorting status of the first attribute 204 of the metadata acquired in S901 differs from the sorting status of the first attribute 204 of the metadata of the column before the change (S1001).

Ｓ１００１の判定結果が真の場合（Ｓ１００１：ＹＥＳ）、索引設計支援部１３２は、変更前の表Ａの一つ又は複数の索引のうち、索引種別が“レンジ”であるレンジ索引を、変更要の索引と判定する（Ｓ１００２）。 If the result of the determination in S1001 is true (S1001: YES), the index design support unit 132 determines that the range index whose index type is "range" among one or more indexes of table A before the change is an index that needs to be changed (S1002).

Ｓ１００１の判定結果が偽の場合（Ｓ１００１：ＮＯ）、索引設計支援部１３２は、Ｓ９０１で取得されたメタデータの第２属性２０５（使われ方）が、変更前の列のメタデータの第２属性２０５と異なるか否かを判定する（Ｓ１００３）。 If the determination result of S1001 is false (S1001: NO), the index design support unit 132 determines whether the second attribute 205 (usage) of the metadata acquired in S901 differs from the second attribute 205 of the metadata of the column before the change (S1003).

Ｓ１００３の判定結果が真の場合（Ｓ１００３：ＹＥＳ）、索引設計支援部１３２は、変更前の表Ａの一つ又は複数の索引のうち、索引種別が“Ｂ－ｔｒｅｅ”であるＢ－ｔｒｅｅ索引を、変更要の索引と判定する（Ｓ１００４）。 If the result of the determination in S1003 is true (S1003: YES), the index design support unit 132 determines that the B-tree index whose index type is "B-tree" among one or more indexes of table A before the change is an index that needs to be changed (S1004).

Ｓ１００３の判定結果が偽の場合（Ｓ１００３：ＮＯ）、索引設計支援部１３２は、Ｓ９０１で取得されたメタデータが新たな表Ａのメタデータか否かを判定する（Ｓ１００５）。 If the determination result of S1003 is false (S1003: NO), the index design support unit 132 determines whether the metadata acquired in S901 is metadata for a new table A (S1005).

Ｓ１００５の判定結果が真の場合（Ｓ１００５：ＹＥＳ）、索引設計支援部１３２は、変更要（索引Ａの生成）と判定する（Ｓ１００６）。 If the result of the determination in S1005 is true (S1005: YES), the index design support unit 132 determines that a change is required (generation of index A) (S1006).

Ｓ１００１、Ｓ１００３及びＳ１００５のうちのいずれかの判定結果が真の場合、Ｓ９０３の判定結果が真である。一方、Ｓ１００１、Ｓ１００３及びＳ１００５のいずれの判定結果も偽の場合、Ｓ９０３の判定結果が偽である。 If the determination result of any of S1001, S1003, and S1005 is true, the determination result of S903 is true. On the other hand, if the determination results of S1001, S1003, and S1005 are all false, the determination result of S903 is false.
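The index change determination process of FIG. 10 can be sketched as a short chain of checks. This is an illustrative sketch only: the class names, field names, and the `is_new_table` flag are assumptions, and treating a missing pre-change metadata record as "no difference" in S1001 and S1003 is also an assumption made so that a new table reaches S1005.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ColumnMetadata:
    sorted_flag: bool  # sorting status within the first attribute 204
    usage: str         # second attribute 205 (usage)


@dataclass
class Index:
    name: str
    index_type: str    # e.g. "range" or "B-tree"


def determine_index_change(
    new_meta: ColumnMetadata,
    old_meta: Optional[ColumnMetadata],
    existing_indexes: List[Index],
    is_new_table: bool,
) -> Tuple[bool, List[Index]]:
    """Return (result used in S903, indexes of table A judged to need a change)."""
    # S1001: does the sorting status differ from the pre-change metadata?
    if old_meta is not None and new_meta.sorted_flag != old_meta.sorted_flag:
        # S1002: range indexes need to be changed.
        return True, [i for i in existing_indexes if i.index_type == "range"]
    # S1003: does the usage differ from the pre-change metadata?
    if old_meta is not None and new_meta.usage != old_meta.usage:
        # S1004: B-tree indexes need to be changed.
        return True, [i for i in existing_indexes if i.index_type == "B-tree"]
    # S1005: is the metadata for a new table A?
    if is_new_table:
        # S1006: a change is required (index A is to be generated).
        return True, []
    return False, []


indexes = [Index("ix_range", "range"), Index("ix_btree", "B-tree")]
before = ColumnMetadata(sorted_flag=False, usage="equality")
after = ColumnMetadata(sorted_flag=True, usage="equality")
print(determine_index_change(after, before, indexes, is_new_table=False))
```

The first element of the returned tuple corresponds to the result examined in S903: it is true when any of the three checks is true and false when all of them are false.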

第４の実施形態によれば、表Ａの列のメタデータを基に、索引Ｂ（表Ｂの索引）を自動生成すること、及び、索引Ｂと索引Ａ（表Ａの索引であり索引Ｂを基に生成された索引）の一方又は両方を推薦することができる。結果として、索引Ｂ及び索引Ａの設計に要する工数が低減される。 According to the fourth embodiment, it is possible to automatically generate index B (an index for table B) based on the metadata of columns in table A, and to recommend one or both of index B and index A (an index for table A that is generated based on index B). As a result, the amount of work required to design index B and index A is reduced.

上述の第１～第４の実施形態を、例えば下記のように総括することができる。なお、下記の総括は、変形例の説明や補足説明を含んでよい。 The first to fourth embodiments described above can be summarized, for example, as follows. Note that the summary below may include explanations of modified examples and supplementary explanations.

データ処理装置１００は、記憶装置（例えば永続記憶装置１０２及びメモリ１０３を含む）とプロセッサ１０４とを含む。記憶装置は、データベース１２１における表Ｂ（第１の表の一例）を構成する列毎の列名２０２とメタデータ２１０とを有するデータカタログＢ（第１のデータカタログの一例）を記憶する。 The data processing device 100 includes a storage device (e.g., including a persistent storage device 102 and a memory 103) and a processor 104. The storage device stores a data catalog B (an example of a first data catalog) having column names 202 and metadata 210 for each column constituting table B (an example of a first table) in a database 121.

プロセッサ１０４は、表Ａ（第２の表の一例）を構成する列のメタデータを取得する。プロセッサ１０４は、当該取得されたメタデータと類似するメタデータ２１０がデータカタログＢにあるか否かを判定する。当該判定の結果が真の場合、プロセッサ１０４は、データカタログＢから、上記取得されたメタデータと類似するメタデータ２１０に対応した列名２０２を特定する。プロセッサ１０４は、当該特定された列名２０２を含んだ索引であって表Ｂの索引である索引Ｂ（第１の索引の一例）を生成する。プロセッサ１０４は、当該生成した索引Ｂと、当該索引Ｂに基づき生成され表Ａの索引である索引Ａとのうちの少なくとも一つを推薦する。 The processor 104 acquires metadata of columns constituting table A (an example of a second table). The processor 104 determines whether metadata 210 similar to the acquired metadata is present in the data catalog B. If the result of the determination is true, the processor 104 identifies, from the data catalog B, a column name 202 corresponding to the metadata 210 similar to the acquired metadata. The processor 104 generates an index B (an example of a first index) that includes the identified column name 202 and is an index of table B. The processor 104 recommends at least one of the generated index B and index A that is generated based on index B and is an index of table A.
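The processor steps summarized above can be sketched as a single function. This is a rough illustration under stated assumptions: the data catalog B is modeled as a dictionary from column name to metadata, the similarity test is supplied by the caller, and the function name, the returned dictionary layout, and the default index type are all hypothetical.

```python
def generate_index_b(table_a_metadata, catalog_b, is_similar, index_type="B-tree"):
    """Generate an index candidate for table B from metadata of table A columns.

    table_a_metadata: list of metadata records obtained for table A columns.
    catalog_b: dict mapping each table B column name to its metadata.
    is_similar: callable(meta_a, meta_b) -> bool, supplied by the caller.
    Returns an index description, or None when no similar metadata exists.
    """
    matched_columns = []
    for meta_a in table_a_metadata:
        for column_b, meta_b in catalog_b.items():
            # Identify column names of table B whose metadata is similar.
            if is_similar(meta_a, meta_b) and column_b not in matched_columns:
                matched_columns.append(column_b)
    if not matched_columns:
        return None
    # Index B contains the identified column names and is an index of table B.
    return {"table": "B", "columns": matched_columns, "index_type": index_type}


# Hypothetical catalog; exact equality stands in for the similarity test.
catalog_b = {"cust_id": {"dtype": "int"}, "note": {"dtype": "text"}}
print(generate_index_b([{"dtype": "int"}], catalog_b, lambda a, b: a == b))
```

Index A for table A could then be derived from the returned description, for example by reusing its index type, although how the columns of index A are chosen is left open here.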

このように、表Ａの列のメタデータと類似するメタデータを有する列を持った表Ｂの索引として、当該類似するメタデータを有する列の列名を含んだ索引Ｂが、自動生成され推薦される。このため、索引Ｂの設計に要する工数が低減する。 In this way, as an index for table B, which has a column with metadata similar to that of a column in table A, index B containing the column name of the column with the similar metadata is automatically generated and recommended. This reduces the amount of work required to design index B.

表ＢのデータカタログＢは、既存の技術を利用して用意されてよい。一般に、データカタログＢを含むデータカタログ情報１２２は、データベース１２１の中身の概要（例えば、どのような表が存在するか）を知るために用意される。このようなデータカタログ情報１２２におけるデータカタログＢを利用して、索引Ｂ（索引候補）の自動生成が実現される。このため、索引Ｂを生成するために表Ｂに関するクエリを作成する必要が無い。 Data catalog B for table B may be prepared using existing technology. In general, data catalog information 122 including data catalog B is prepared to obtain an overview of the contents of database 121 (e.g., what tables exist). Using data catalog B in such data catalog information 122, automatic generation of index B (index candidate) is realized. For this reason, there is no need to create a query related to table B in order to generate index B.

表Ｂが属する環境（例えば本番環境）は、表Ａが属する環境（例えばテスト環境）と異なっていてよい。また、表Ｂが含まれるデータベース１２１は、表Ａが含まれるデータベース１２１と異なっていてよい。例えば、それらの環境又はデータベース１２１のうち、一方が、ローカルの環境又はデータベース１２１であり、他方が、リモートの環境又はデータベース１２１（例えば、クラウド又はクラウドストレージ内のデータベース）でもよい。 The environment to which table B belongs (e.g., a production environment) may be different from the environment to which table A belongs (e.g., a test environment). Furthermore, the database 121 in which table B is included may be different from the database 121 in which table A is included. For example, one of the environments or databases 121 may be a local environment or database 121, and the other may be a remote environment or database 121 (e.g., a database in the cloud or cloud storage).

記憶装置は、表Ａの一つ以上の索引を表す索引定義を含む索引定義情報１２３を記憶してよい。一つの索引Ａ´を例に取ると、索引定義は、索引Ａ´が有する列名であって表Ａの列名を含んでよい。プロセッサ１０４は、索引Ａ´について索引定義から列名を特定してよい。上記取得されたメタデータは、当該特定された列名に対応する列のメタデータでよく、推薦される索引は、索引Ｂでよい。このように、表Ａの索引Ａ´を利用して表Ｂに関するクエリ無しに表Ｂの索引Ｂを生成し推薦することができる。 The storage device may store index definition information 123 including an index definition representing one or more indexes of table A. Taking one index A' as an example, the index definition may include column names of table A that are included in index A'. Processor 104 may identify column names for index A' from the index definition. The obtained metadata may be metadata of columns corresponding to the identified column names, and the recommended index may be index B. In this way, index A' of table A can be used to generate and recommend index B for table B without a query on table B.
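Looking up the metadata to compare, starting from an index definition of table A, might look like the following. It is a small sketch assuming the index definition is a dictionary with a `columns` list and the data catalog of table A is a dictionary keyed by column name; both layouts and all names are hypothetical.

```python
def metadata_for_index_columns(index_definition, catalog_a):
    """Collect the catalog metadata of each column named in an index definition."""
    return {
        column: catalog_a[column]
        for column in index_definition["columns"]
        if column in catalog_a
    }


# Hypothetical definition of index A' and data catalog of table A.
index_a_prime = {"name": "ix_orders_customer", "columns": ["customer_id", "order_date"]}
catalog_a = {
    "customer_id": {"usage": "equality"},
    "order_date": {"usage": "range"},
    "memo": {"usage": "none"},
}
print(metadata_for_index_columns(index_a_prime, catalog_a))
```

Only columns that appear in index A' contribute metadata, so no query against table B is involved at this stage.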

表Ａは、新たに入力された表、或いは、列の変更（例えば、列の追加又は更新）があった表でよい。推薦される索引は、索引Ｂ及び索引Ａのうち少なくとも索引Ａでよい。これにより、新たに入力された表Ａ、或いは、列の変更があった表Ａについて、当該表Ａに関するクエリ無しに、索引Ａ（索引候補）の自動生成及び推薦がされる。 Table A may be a newly entered table or a table that has had a column change (e.g., a column added or updated). The recommended index may be at least index A out of index B and index A. As a result, index A (index candidate) is automatically generated and recommended for newly entered table A or table A that has had a column change, without a query on table A.

表Ａについて取得されたメタデータは、変更された列のメタデータでよい。このメタデータは、表Ａのデータカタログから取得されてよい。プロセッサ１０４は、当該取得されたメタデータのうちの変更されたデータの属性種別（例えば、データ特性又は使われ方）を特定し、当該特定された属性種別を基に、索引の変更の要否を判定してよい。当該判定結果が真の場合、プロセッサ１０４は、当該特定された属性種別に対応する索引種別（例えば、“レンジ”又は“Ｂ－ｔｒｅｅ”）を特定し、索引Ｂを、表Ｂについて既に存在する一つ又は複数の索引のうちの、当該特定された索引種別の索引を用いて生成してよい。索引Ａは、この索引Ｂを基に生成される。このため、索引Ｂの索引種別と、生成された索引Ａの索引種別は同じである。具体的には、特定された属性種別が、ソート有無の場合、又は、使われ方の場合、索引の変更が要と判定されてよい。特定された属性種別が、ソート有無の場合、特定された索引種別は、レンジでよい。特定された属性種別が、使われ方の場合、特定された索引種別は、Ｂ－ｔｒｅｅでよい。このように、表Ｂについて既に存在する一つ又は複数の索引から、変更されたデータの属性種別に適切な索引種別の索引が選択され、当該選択された索引を基に、索引Ｂ及び索引Ａの生成がされる。つまり、適切な索引種別の索引Ａを効率的に生成することができる。 The metadata acquired for table A may be the metadata of a changed column, and may be acquired from the data catalog of table A. The processor 104 may identify the attribute type (e.g., data characteristics or usage) of the data that changed within the acquired metadata and, based on the identified attribute type, determine whether an index needs to be changed. If the determination result is true, the processor 104 may identify the index type (e.g., "range" or "B-tree") corresponding to the identified attribute type, and may generate index B using an index of that index type from among the one or more indexes that already exist for table B. Index A is generated based on this index B, so index B and the generated index A have the same index type. Specifically, an index change may be determined to be necessary when the identified attribute type is the sorting status or the usage. When the identified attribute type is the sorting status, the identified index type may be range; when it is the usage, the identified index type may be B-tree. In this way, an index whose index type suits the attribute type of the changed data is selected from the one or more indexes that already exist for table B, and index B and index A are generated based on the selected index. In other words, index A of an appropriate index type can be generated efficiently.

列毎のメタデータは、当該列におけるデータの特性、及び、使われ方、を含んでよい。取得されたメタデータと類似するメタデータは、データ特性及び使われ方のいずれも一致していて、当該取得されたメタデータのうちのデータ特性及び使われ方以外のデータについて類似度が一定値以上であることでよい。これにより、類似するメタデータとして適切なメタデータの特定が可能である。 The metadata for each column may include the characteristics of the data in that column and how it is used. Metadata that is similar to the acquired metadata may have the same data characteristics and how it is used, and the degree of similarity of the acquired metadata other than the data characteristics and how it is used may be equal to or greater than a certain value. This makes it possible to identify appropriate metadata as similar metadata.
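The similarity condition described above can be sketched as a two-stage test: exact agreement on data characteristics and usage, followed by a threshold on the remaining fields. The patent text here does not say how the similarity is computed, so the fraction of matching fields used below, the key names, and the default threshold of 0.8 are all assumptions for illustration.

```python
REQUIRED_KEYS = ("data_characteristics", "usage")


def is_similar(meta_a, meta_b, threshold=0.8):
    """Return True when meta_b counts as similar to meta_a."""
    # Stage 1: data characteristics and usage must both match exactly.
    for key in REQUIRED_KEYS:
        if meta_a.get(key) != meta_b.get(key):
            return False
    # Stage 2: similarity over the remaining fields must reach the threshold.
    other_keys = (set(meta_a) | set(meta_b)) - set(REQUIRED_KEYS)
    if not other_keys:
        return True  # assumption: nothing else to compare counts as similar
    matches = sum(
        1
        for key in other_keys
        if key in meta_a and key in meta_b and meta_a[key] == meta_b[key]
    )
    return matches / len(other_keys) >= threshold


a = {"data_characteristics": "unique", "usage": "equality", "dtype": "int", "unit": "yen"}
b = {"data_characteristics": "unique", "usage": "equality", "dtype": "int", "unit": "usd"}
print(is_similar(a, b), is_similar(a, b, threshold=0.5))
```

Requiring the exact match first keeps columns with a different usage from ever being proposed, however close their other metadata may be.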

以上、幾つかの実施形態を説明したが、これらは本発明の説明のための例示であって、本発明の範囲をこれらの実施形態にのみ限定する趣旨ではない。本発明は、他の種々の形態でも実行することが可能である。 Although several embodiments have been described above, these are merely examples for the purpose of explaining the present invention, and the scope of the present invention is not intended to be limited to these embodiments. The present invention can also be implemented in various other forms.

１００…データ処理装置 100...Data processing device

Claims

1. A data processing device comprising a storage device and a processor, wherein
the storage device stores a first data catalog having column names and metadata for each column constituting a first table in a database, and
the processor:
obtains metadata for columns constituting a second table;
determines whether the first data catalog contains metadata similar to the obtained metadata;
if the result of the determination is true, identifies, from the first data catalog, column names that correspond to the metadata similar to the obtained metadata;
generates a first index that includes the identified column names and that is an index of the first table; and
recommends at least one of the generated first index and a second index that is generated based on the first index and is an index of the second table.

2. The data processing device according to claim 1, wherein
the storage device stores an index definition for the second table,
the index definition includes column names of the second table that are included in an index of the second table,
the processor identifies column names from the index definition,
the obtained metadata is metadata for a column corresponding to the identified column name, and
the recommended index is the first index.

3. The data processing device according to claim 1, wherein
the second table is a table that has been newly entered or has had a column change, and
the recommended index is, of the first index and the second index, at least the second index.

4. The data processing device according to claim 3, wherein
the obtained metadata is metadata for a changed column, and
the processor:
identifies an attribute type that differs between the obtained metadata and the metadata of the column before the change;
determines, based on the identified attribute type, whether an index needs to be changed; and
if the result of the determination is true,
identifies an index type corresponding to the identified attribute type, and
generates the first index using an index of the identified index type from among one or more indexes already existing for the first table.

5. The data processing device according to claim 4, wherein
the result of the determination is true if the identified attribute type is sorting status, or usage, which is a condition relating to a database operation that can be specified in a query,
if the identified attribute type is sorting status, the identified index type is range, and
if the identified attribute type is usage, the identified index type is B-tree.

6. The data processing device according to claim 1, wherein
the metadata for each column includes data representing:
a data characteristic, which is a characteristic of the data in the column; and
usage, which is a condition, specifiable in a query, relating to a database operation performed using the data in the column, and
metadata similar to the obtained metadata is metadata for which:
both the data characteristic and the usage match those of the obtained metadata; and
the similarity to the obtained metadata, with respect to data other than the data characteristic and the usage, is equal to or greater than a certain value.

7. A data processing method carried out by a computer, comprising:
(A) obtaining metadata for columns constituting a second table;
(B) determining whether metadata similar to the obtained metadata is present in a first data catalog;
(C) if the result of the determination is true, identifying, from the first data catalog, column names corresponding to the metadata similar to the obtained metadata;
(D) generating a first index that includes the identified column names and that is an index of a first table; and
(E) recommending at least one of the generated first index and a second index that is generated based on the first index and is an index of the second table, wherein
the first table is a table in a database, and
the first data catalog is data having column names and metadata for each column constituting the first table.