JP7789079B2

JP7789079B2 - Asynchronous persistence of replicated data changes in database accelerators

Info

Publication number: JP7789079B2
Application number: JP2023553633A
Authority: JP
Inventors: バイエル、フェリックス; バタースタイン、デニス; ルーク、エイナー; ペラトナー－チャフラー、サビーネ
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2021-03-19
Filing date: 2022-02-16
Publication date: 2025-12-19
Anticipated expiration: 2042-02-16
Also published as: WO2022193894A1; JP2024510137A; WO2022193893A1; DE112022000492T5; JP2024512335A; JP7843769B2; DE112022000767T5

Description

The present invention relates generally to crash recovery for databases, and more specifically to a computer-implemented method for crash recovery for linked databases. The invention further relates to a linked database system with crash recovery for linked databases, and to a computer program product.

Managing large amounts of data remains a continuing challenge for enterprise IT (information technology) organizations. This is due to (i) the ever-increasing amount of data and (ii) the growing variety of that data. In addition to traditional structured data, enterprise IT systems also store large amounts of semi-structured and so-called unstructured data. Furthermore, to meet growing industry requirements for data analysis, specialized database systems optimized for analytical processing have been introduced alongside traditional transaction-oriented databases. Attempts have also been made to perform all analytical processing in traditional transactional databases. However, analytical processing turned out to have too strong and too unpredictable an impact on the performance of online transaction processing.

As a result, clever combinations of transaction-oriented database management systems and analytics-optimized database management systems have been introduced. One example of such a combined product is based on the IBM DB2 (a registered trademark of IBM) Analytics Accelerator (IDAA) architecture.

On one side, a full-fledged row-based database management system (DBMS) may serve as the application endpoint for data manipulation language (DML) operations and query execution. Using heuristic decision criteria and rules, the query optimizer may transparently decide whether a query should be executed in the source DBMS, in particular for online transaction processing, or offloaded to the target DBMS, in particular for online analytical processing.

The target DBMS may be a full-fledged column-based database management system that holds shadow copies of a selected set of source DBMS tables. A strategy for creating the shadow copies in the target DBMS may involve transferring data stored in one or more tables of the source DBMS to the target DBMS at a given point in time. However, the data queried in the target DBMS may be out of date if the load took place some time ago and the corresponding source tables have been modified in the meantime. The table contents are therefore typically adapted incrementally as changes are recorded in the corresponding source database tables.

Both database systems implement transaction mechanisms to guarantee the ACID (atomicity, consistency, isolation, durability) properties of their respective databases. That is, concurrent modifications are properly isolated by locking techniques, consistency checks may ensure that the database moves from one consistent state to another, and logging techniques such as write-ahead logging may be implemented to guarantee atomicity and durability of transactional changes.

However, guaranteeing the ACID properties while the source database is being modified by current transactions can incur significant overhead. In particular, logging slows down the processing of insert/update/delete (IUD) statements, because change records must be written to a persistent storage medium before the transaction can continue. Compared with the commonly used in-memory processing of changes to the contents of database tables, adding this persistence layer by means of a transaction log is relatively slow. On the other hand, crash recovery is required to restore the latest consistent state of the respective database in the event of a failure.

In the context of a database accelerator such as IDAA, the target accelerator database simply mirrors a snapshot of the source database. That is, the source database may act as the master of the data, or data manipulation via IUD statements may be handled by the source database management system, with changes replicated to the target database via an update technique. The accelerator database may therefore act as a cache for storing those partitions of the source database that may require fast analytical operations.

Several disclosures already exist in this context. Patent Document 1 describes a method for replicating database data in a clean shutdown state and generating a read-only copy of the replicated data. The related system may include a tracking module that monitors first transactions from a database application to a source storage device and generates log entries having at least one marker indicating a known good state of the application. The system further includes a computer coupled to a target storage device that contains a database and log files. The computer processes the transactions based on the log entries to replicate the data to the target storage device, and performs a first snapshot to replay the data stored in the log files into the database.

In addition, Patent Document 2 describes a distributed database system that provides fast crash recovery. Upon recovery from a failure of a database head node, connections may be established with one or more storage nodes of a distributed storage system that store the data for the database implemented by that database head node. Once the connections with the storage nodes are established, the database may be made available for access, for example for various access requests.

However, all conventional combined databases that are optimized for transactions on one side and for analytical processing on the other may require the well-known overhead of maintaining a persistent recovery log. This overhead can significantly slow down processing in the combined database, in addition to the non-optimized use in the target database of the changes coming from the source database. There may therefore be a need to reduce the required overhead in order to achieve higher performance for combined transactional/analytical database management systems and, at the same time, to solve the problem of recovering the target database elegantly.

Patent Document 1: U.S. Patent Application Publication No. 2015/0205853 (A1)
Patent Document 2: U.S. Patent Application Publication No. 2014/0279930 (A1)

According to one aspect of the present invention, a computer-implemented method for crash recovery for linked databases may be provided. The linked databases may include a source database and an associated target database, and selected queries against the database management system that includes the source database may be transferred for processing to the database management system that includes the target database. The method may include synchronizing selected portions of the contents of tables in the source database with respective portions of the contents of tables in the target database; applying, during the synchronization, changes to the source database to an in-memory target database portion of the database management system that includes the target database, using a recovery log file of the source database; and asynchronously storing the changes applied to the in-memory target database portion in persistent target database storage.

The method may further include, upon a database crash of the target database system, restoring the in-memory target database portion from the latest snapshot available in the persistent target database storage, and applying to the in-memory target database portion those changes from the source database recovery log file that have a timestamp later than that latest available snapshot.
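The restore-then-replay step described above can be illustrated with a minimal Python sketch. All names here (`LogRecord`, `Snapshot`, `recover`) are hypothetical, and a monotonically increasing sequence number is assumed as a stand-in for the timestamp mentioned in the text; the patent does not prescribe this representation.

```python
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    seq: int              # assumed position in the source recovery log
    op: str               # "insert", "update" or "delete"
    key: str
    value: object = None

@dataclass
class Snapshot:
    rows: dict = field(default_factory=dict)  # persisted table contents
    last_seq: int = 0                         # newest log position contained in the snapshot

def apply_record(rows, rec):
    """Apply one source change to the in-memory table representation."""
    if rec.op == "delete":
        rows.pop(rec.key, None)
    else:  # insert and update both set the current value
        rows[rec.key] = rec.value

def recover(snapshot, source_log):
    """Restore from the latest snapshot, then replay only the newer log records."""
    rows = dict(snapshot.rows)
    for rec in source_log:
        if rec.seq > snapshot.last_seq:
            apply_record(rows, rec)
    return rows

log = [
    LogRecord(1, "insert", "a", 1),
    LogRecord(2, "insert", "b", 2),
    LogRecord(3, "update", "a", 10),
    LogRecord(4, "delete", "b"),
]
snap = Snapshot(rows={"a": 1, "b": 2}, last_seq=2)  # changes 3 and 4 were not yet persisted
restored = recover(snap, log)
```

In this example the snapshot is two changes behind the source, and only records 3 and 4 are replayed instead of reloading the whole table.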

According to another aspect of the present invention, a linked database system having crash recovery for linked databases may be provided. The linked databases may include a source database and an associated target database, and selected portions of the contents of tables in the source database may be synchronized with respective portions of the contents of tables in the target database. The linked database system may include a processor and a memory communicatively coupled to the processor. The memory may store program code portions that, when executed, enable the processor to synchronize selected portions of the contents of tables in the source database with respective portions of the contents of tables in the target database; to apply, during the synchronization, changes to the source database to an in-memory target database portion of the database management system that includes the target database, using a recovery log file of the source database; and to asynchronously store the changes applied to the in-memory target database portion in persistent target database storage.

In addition, the memory may store program code portions that, when executed, enable the processor, upon a database crash of the target database system, to restore the in-memory target database portion from the latest snapshot available in the persistent target database storage, and to apply to the in-memory target database portion those changes from the source database recovery log file that have a timestamp later than that latest available snapshot.

The proposed computer-implemented method for crash recovery for linked databases may offer multiple advantages, technical effects, contributions, and/or improvements.

Updates and changes to the in-memory portion, as well as queries, may continue without any slowdown caused by also storing the changes persistently in the target database. The logging and state storage of the physical database data structures may thus be decoupled from IUD processing inside database transactions. This means that the persistence phase is skipped when changes are made to the target database, in particular its in-memory portion, so that update operations may continue without additional delay. The records that may be needed for crash recovery of the target database may instead be written asynchronously by a dedicated asynchronous data persistence service. Changes made by IUD transactions are applied only to the in-memory representation of the data, where they may be processed immediately by database queries offloaded to the target database, i.e., the accelerator, regardless of whether they have already been stored on a persistent medium. This may also be interpreted as relaxing the durability constraint of the database's ACID properties.

Because the changes are written asynchronously from the in-memory data structures to their persistent counterparts, some changes may be missing in the target database system in the event of a database crash. Crash recovery may therefore need to synchronize a potentially incomplete snapshot with the source database system by replaying the missing changes. Unlike in other solutions, the offloaded tables do not have to be reloaded completely by a bulk load mechanism; instead, the missing portions may be recovered incrementally. That is, the target database state may be recovered from the latest asynchronously persisted snapshot of the target database, and recovery metadata helps to determine which additional changes may still be missing, i.e., which changes from the source database must be replayed by the target database from the transaction or recovery log file of the source database system in order to finally restore the current consistent state of the target database.

This makes it possible to delay incoming queries at the target database while the recovery process is active, until a consistent, up-to-date state has been re-established in the in-memory portion of the target database. The delay caused by recovery may be considerably lower than with known solutions, while the processing and storage cost of maintaining the database snapshots asynchronously in the target database system may be slightly higher. However, this is only a small price to pay for higher availability and shorter recovery times of the target database system.

Asynchronous persistent storage of changes to the target database may also help to perform logging more efficiently: I/O operations may be executed more efficiently, for example in batch processes, and self-cancelling changes may be dropped altogether. That is, changes to a target record that are reverted again before the next persistence step for target database changes may be removed in the buffer between the target database and the persistent storage medium.
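One way to picture the removal of self-cancelling changes is a small coalescing pass over the buffered changes before they are written out. The following Python sketch is only an illustration under assumed conventions: the `(op, key, value)` tuple format and the function name `coalesce` are hypothetical and not taken from the patent.

```python
def coalesce(pending):
    """Collapse buffered changes so that at most one net change per key remains.

    pending: list of (op, key, value) tuples in arrival order, where op is
    "insert", "update" or "delete" (an assumed format for this sketch).
    """
    net = {}  # key -> (op, value); dicts keep first-insertion order
    for op, key, value in pending:
        prev = net.get(key)
        if prev is not None and prev[0] == "insert" and op == "delete":
            del net[key]                      # insert then delete cancels out entirely
        elif prev is not None and prev[0] == "insert" and op == "update":
            net[key] = ("insert", value)      # still a new row, carrying the latest value
        elif prev is not None and prev[0] == "delete" and op == "insert":
            net[key] = ("update", value)      # row existed before, so the net effect is an update
        else:
            net[key] = (op, value)            # first change, or a later change superseding it
    return [(op, key, value) for key, (op, value) in net.items()]

buffered = [
    ("insert", "a", 1),
    ("update", "a", 2),
    ("insert", "b", 5),
    ("delete", "b", None),   # cancels the insert of "b"
    ("update", "c", 7),
    ("update", "c", 8),
]
to_persist = coalesce(buffered)
```

Six buffered changes shrink to two writes here, which is the kind of I/O saving the asynchronous persistence step makes possible.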

In the following, additional embodiments of the inventive concept, applicable to both the method and the system, are described.

According to one interesting embodiment of the method, the synchronizing may include reading entries of the recovery log file relating to the source database and applying the read entries to the target database. This may be done by a management portion of the target database management system. However, other synchronization techniques may also be used, for example ones based on SQL (structured query language) or on Q, i.e., a query language optimized for column-based databases.

According to one useful embodiment of the method, the source database may be optimized for transactions, i.e., online transaction processing (OLTP), and/or the source database may be a row-oriented relational DBMS. This may represent and efficiently support the operational backbone of enterprise IT operations. Row-oriented databases may be optimized for balanced, fast read, write, and change operations on the data in the database. They may also be well suited for generating reports. However, this type of database is often not optimal for analytical operations.

According to one permissible embodiment of the method, the target database may be optimized for analytical operations, i.e., online analytical processing (OLAP), and/or the target database may be a column-oriented database. According to a further embodiment, this type of database may be a column-oriented database. Such a database may support queries with many dependencies and cross-correlations considerably better than OLTP-optimized database systems.

According to an advantageous embodiment, the method may also include, in the event of a crash of the target database, delaying queries against the target database until the recovery of the target database has finished. The recovery process may thus be transparent to the user. Users may at most experience a slight delay in their analytical queries. However, since response times for complex analytical queries are comparatively long anyway, users may not notice the delay at all.

Optionally, queries directed at the target database may be executed by the source database while the recovery process of the target database is running. This may slow down transaction processing in the source database slightly, but it is likely to be an acceptable compromise in terms of overall user satisfaction.

According to a preferred embodiment of the method, the metadata defining the selected tables may be part of the recovery log file. In this way, the general architecture of the in-memory target database may already be defined in the recovery log file of the source database. The same may apply to the portions of the table data that are to be mirrored from the source database. A single source may thus be used for the data definitions, so that conflicting situations cannot arise.

According to an advanced embodiment of the method, storing the applied changes persistently may include waiting until a predetermined number of changes have been completed in the in-memory target database portion. The predetermined number may be configurable when the database is set up and/or may be changed while the database management system is running. An advantage of this approach may be that the analytical operations of the target database are not slowed down. A group of updates to the target database may thus be extracted and stored persistently in one go. In addition to using a predetermined number of changes to the target database, it is also conceivable to monitor the load on the target database and to perform the persistent storage of changes during periods in which the analytical load on the target database is comparatively low.

This feature may be implemented in a separate thread or process that waits until a predetermined number of changes have completed in the in-memory portion. Every change in the set of changes completed since the last run is then persisted to the persistent database portion of the target database. In addition, associated metadata describing the most recently persisted change (e.g., the last corresponding block record sequence number in the source database system) is also stored persistently, so that the change replay point can be identified during the recovery phase. This makes it possible to identify the point in time of the last consistently stored change set. That point in time may be useful for recovering the in-memory portion of the target database from the persistent portion of the target database and for incrementally recovering the remaining IUDs of the in-memory portion from the recovery log file of the source database. These additional data are requested from the source database or from the incremental update process when recovery of the target database is required. At the end of the recovery process, the query processor may be informed by a "re-activate" signal that queries may again be processed by the target database.
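The batching behaviour and the persisted replay metadata can be sketched in Python as follows. This is an illustrative sketch only: the class `PersistenceService`, its attribute names, and the use of a plain integer sequence number are assumptions, and the background thread is reduced to a `run_once` method (the body a background loop would call repeatedly) so that the example stays deterministic.

```python
class PersistenceService:
    """Toy model of asynchronous persistence with a configurable batch threshold."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # predetermined number of changes to wait for
        self.memory = {}              # in-memory target table, visible to queries at once
        self.storage = {}             # stand-in for the persistent target database storage
        self.persisted_seq = 0        # recovery metadata: last persisted source log position
        self._pending = []            # changes applied in memory but not yet persisted

    def apply_change(self, seq, key, value):
        """Called by the replication path; it never waits for persistent storage."""
        self.memory[key] = value
        self._pending.append((seq, key, value))

    def run_once(self):
        """Persist the accumulated changes only if the threshold has been reached."""
        if len(self._pending) < self.batch_size:
            return False
        batch, self._pending = self._pending, []
        for _seq, key, value in batch:
            self.storage[key] = value
        self.persisted_seq = batch[-1][0]  # replay after a crash starts behind this point
        return True

svc = PersistenceService(batch_size=3)
svc.apply_change(1, "a", "x")
svc.apply_change(2, "b", "y")
first = svc.run_once()           # threshold not reached, nothing is written
svc.apply_change(3, "c", "z")
second = svc.run_once()          # three changes pending, batch is written
svc.apply_change(4, "d", "w")    # applied in memory only so far
```

After a crash at this point, the stored metadata says that everything up to sequence number 3 is safe, so only change 4 would have to be requested from the source recovery log.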

According to another advantageous embodiment of the method, restoring tables of the in-memory target database portion, or partitions thereof, may include prioritizing the recovery according to one selected from the group consisting of data usage, query priority, and data priority.

These options are detailed in the following paragraphs. The general concept, however, is to allow different optimization approaches during the recovery process, so that queries can be executed even while the target database is "under reconstruction", i.e., in recovery mode. These optimization options may be configurable and may generally be combined in one implementation.

According to one optional embodiment of the method, prioritizing the recovery by data usage may include maintaining a counter for each table in the target database, or for each partition thereof. The counter value may indicate how many queries are waiting for the associated table, and the database table with the highest counter value is restored first. The other tables may then be recovered afterwards. This option may allow a quick recovery of those tables of the target database that are in high demand. Users may thus be provided with a fully recovered part of the database, i.e., the tables in high demand, as early as possible. This option may be denoted as demand-optimized or data-usage-optimized recovery.

The list of waiting queries may thus be taken into account to control the recovery process and to minimize the recovery time. When a new database query arrives at the target database while recovery is in progress, its data access may be analyzed. If the query does not access any non-recovered data, it may be processed immediately. Otherwise, the query is registered in the list of waiting queries as part of the recovery state. A recovery scheduler evaluates the recovery state to derive a sequence of recovery actions that gives priority to the tables and/or table partitions that are actually blocking queries. The most beneficial recovery action may be scheduled via one of the available recovery strategies; upon its completion, the recovery state may be updated, and queries waiting for the recovered tables and/or partitions may be notified once they no longer need to be blocked.

In more detail, crash recovery for the target database can be described as follows. First, the list of tables or table partitions that need to be recovered is determined using the recovery metadata known in the target database. Second, this list of tables or table partitions to be recovered is stored in the recovery state. Third, as long as the list is not empty, the following may be performed:
(i) determining the next table and/or table partition to be recovered from the list;
(ii) scheduling the recovery action via the recovery strategy configured for the target database;
(iii) waiting until the recovery action has completed;
(iv) updating the recovery state, i.e., marking the table/partition as recovered and removing it from the blocking data lists of all queries waiting for that table and/or table partition; and
(v) determining the list of queries whose blocking data lists have become empty and notifying the query processor that these queries can now be processed.
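Steps (i) to (v) can be sketched as a simple loop in Python. This is a minimal sketch under several assumptions: the function and parameter names are hypothetical, the recovery action runs synchronously (so scheduling and waiting collapse into one call), and the next table is chosen alphabetically unless another `pick_next` function is supplied.

```python
def run_recovery(to_recover, waiting, recover_table, pick_next=min):
    """Recover tables one by one and release queries as they become unblocked.

    to_recover:    set of table (or partition) names still to be recovered
    waiting:       dict mapping query id -> set of tables still blocking that query
    recover_table: callable performing the recovery action for one table
    pick_next:     strategy choosing the next table from to_recover
    """
    released = []
    while to_recover:
        table = pick_next(to_recover)      # (i) determine the next item
        recover_table(table)               # (ii) + (iii) schedule the action and wait for it
        to_recover.discard(table)          # (iv) update the recovery state ...
        for blocked in waiting.values():
            blocked.discard(table)         # ... and the blocking data list of every query
        ready = [qid for qid, blocked in waiting.items() if not blocked]
        for qid in ready:                  # (v) these queries can now be processed
            del waiting[qid]
            released.append(qid)
    return released

tables = {"orders", "customers", "items"}
waiting_queries = {"q1": {"orders"}, "q2": {"orders", "items"}}
recovered_order = []
released_queries = run_recovery(tables, waiting_queries, recovered_order.append)
```

With the default alphabetical strategy both queries are released only after the last table, which is exactly the situation the prioritization options in the surrounding text are meant to improve.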

The list of tables and/or table partitions to be recovered may be implemented as a priority queue, using the algorithm described below to dynamically compute the next recovery item for the next recovery cycle. To accelerate the recovery process, the step referred to in (iii) may be performed in parallel for multiple tables and/or table partitions.

The determination of the next table and/or table partition to be recovered may be performed as follows:
(i) when the crash recovery process is started, creating an empty histogram of blocked tables and/or table partitions;
(ii) when a blocked query is registered in the recovery state, incrementing by one the usage counter of each table and/or table partition in the blocked data list of that query;
(iii) when the next table/table partition to be recovered needs to be determined, selecting the table/table partition with the highest usage counter; and
(iv) when the recovery operation for a table/table partition has finished, removing the corresponding histogram data.
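
The usage-counter selection described above may, purely as an example, be sketched in Python as follows. The class name `BlockedItemHistogram` and its method names are hypothetical. The alphabetical tie-break is an assumption added only to make the result deterministic; the description does not specify how ties are resolved. Note that this sketch only tracks items for which at least one blocked query has been registered.

```python
from collections import Counter
from typing import Iterable, Optional


class BlockedItemHistogram:
    """Counts how many blocked queries are waiting for each table or partition."""

    def __init__(self) -> None:
        # (i) start with an empty histogram when crash recovery begins
        self._usage: Counter = Counter()

    def register_blocked_query(self, blocked_items: Iterable[str]) -> None:
        # (ii) increment the usage counter of every item the query is blocked on
        for item in set(blocked_items):
            self._usage[item] += 1

    def next_item(self) -> Optional[str]:
        # (iii) pick the item with the highest counter (ties: alphabetical, an assumption)
        if not self._usage:
            return None
        return min(self._usage, key=lambda item: (-self._usage[item], item))

    def mark_recovered(self, item: str) -> None:
        # (iv) drop the histogram data of a recovered item
        self._usage.pop(item, None)
```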

According to another optional embodiment of the method, prioritizing the recovery by query priority may include first restoring the database tables that receive the queries with the highest priority. Such priority values may be assigned to database systems, e.g., a production database versus a test database, or may be assigned on the basis of individual queries. This option may be denoted as query priority or simply data priority optimized recovery.

According to a further optional embodiment of the method, prioritizing the recovery by data priority may include maintaining (at least) two groups of database tables, each group relating to a separate group of users, and first restoring the database tables of the group with the higher configured group priority. Such a situation may arise in a multi-user/multi-group/multi-tenant environment in which one user, group, or tenant may be assigned a higher priority for its queries. For example, one tenant may have been guaranteed a higher availability of the database system. In such a case, this tenant may be assigned a higher priority. Such a scenario may work best when the multi-tenant database is operated in a cloud computing data center. This option may be denoted as customer priority optimized recovery.

According to another embodiment, the method may further include determining the data volume to be recovered for the next table to be recovered, and recovering that table (or tables) using a recovery strategy that depends on the volume to be recovered. The recovery strategy is either an incremental update strategy or a bulk update strategy. Thus, in order to minimize the total recovery time of the target database, the update strategy to be used may be decided for each database table (or group of database tables). This option may be denoted as time-optimized recovery.
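
A minimal sketch of such a volume-dependent strategy choice is given below, in Python. The description does not state the decision criterion, so the ratio of pending changes to table size and the default threshold of 0.5 are assumptions chosen purely for illustration, as are the function and parameter names.

```python
def choose_recovery_strategy(
    pending_changes: int,
    table_rows: int,
    bulk_threshold: float = 0.5,  # assumed tuning parameter, not from the description
) -> str:
    """Return "incremental" or "bulk" depending on the estimated volume to recover."""
    if pending_changes < 0 or table_rows < 0:
        raise ValueError("counts must be non-negative")
    if table_rows == 0:
        # Nothing stored yet: a bulk load is assumed to be the simplest option.
        return "bulk"
    change_ratio = pending_changes / table_rows
    # Few changes relative to the table size: replaying them is assumed cheaper.
    return "bulk" if change_ratio >= bulk_threshold else "incremental"
```

In practice, the cost model would probably also consider log read throughput and bulk load throughput; those factors are left out here.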

Furthermore, embodiments may take the form of a related computer program product, accessible from a computer-usable or computer-readable medium providing program code for use by, or in connection with, a computer or any instruction execution system. For the purpose of this description, a computer-usable or computer-readable medium may be any apparatus that may contain means for storing, communicating, propagating, or transporting the program for use by, or in connection with, the instruction execution system, apparatus, or device.

It should be noted that embodiments of the invention are described with reference to different subject matter. In particular, some embodiments are described with reference to method-type claims, whereas other embodiments are described with reference to apparatus-type claims. However, a person skilled in the art will gather from the above and the following description that, unless otherwise indicated, in addition to any combination of features belonging to one type of subject matter, any combination of features relating to different subject matter, in particular between features of the method-type claims and features of the apparatus-type claims, is also considered to be disclosed within this document.

The aspects defined above and further aspects of the present invention are apparent from the examples of embodiments described hereinafter and are explained with reference to those examples, to which the invention is, however, not limited.

Preferred embodiments of the invention will now be described, by way of example only, with reference to the following drawings:

FIG. 1 is a block diagram illustrating an embodiment of the inventive computer-implemented method for crash recovery for linked databases.
FIG. 2 is a block diagram illustrating an embodiment of linked databases.
FIG. 3 is a block diagram illustrating an embodiment of how the target database may be synchronized.
FIG. 4 is a block diagram illustrating an embodiment of the proposed concept in a form closer to an implementation.
FIG. 5 is a block diagram illustrating an embodiment of a linked database system including components for a customer priority optimized recovery strategy.
FIG. 6 is a block diagram illustrating an embodiment of a linked database system including components for a volume-optimized recovery strategy.
FIG. 7 is a block diagram illustrating an embodiment of a linked database system for crash recovery for linked databases.
FIG. 8 is a diagram illustrating an embodiment of a computer system including the linked database system.

In the context of this description, the following conventions, terms, and/or expressions may be used:

The term "crash recovery" may denote the process of reconstructing the state of a database as it was before the crash occurred. When a crash occurs, data may be unavailable or inconsistent.

The term "linked databases" may denote at least two databases that are closely related to each other. In the context of this document, linked databases may be databases that store, at least in part, identical data. On the other hand, the primary database of such a pair may be optimized for a different task than the secondary database.

The term "source database" or "primary database" may denote a database that is optimized for, e.g., fast transactions, i.e., online transaction processing. However, a database optimized in this way, i.e., for fast read, write, and update operations on the data, may be slow in executing complex queries involving many tables or large amounts of data, as in the case of online analytical processing. In addition, online analytical processing may slow down online transaction processing. Therefore, highly optimized database management systems for the two types of databases just mentioned may work well together as a tandem.

The term "target database" or "secondary database" may denote the second database in such a tandem of databases optimized for different tasks. In the context of the concept described here, the target database may be optimized for online analytical processing. It may store at least some of the tables of the source database and at least some of the data of those tables. In addition, the target database may comprise two parts: an in-memory part for fast execution of complex, multi-dimensional queries, and a persistent part that may store the tables and data of the in-memory part in longer-term storage such as a hard disk or flash memory. In this way, the target database may be enabled to recover most of its content from the persistent storage in the case of a crash of the target database.

The term "selected portion of the table contents" may denote the portion or partition of the data of a table of the primary database, as just mentioned, that may be copied to and kept synchronized in the target database.

The term "selected queries" may denote a specific type of query directed to the linked databases which, based on the nature of the query, may be better executed by one of the two databases, in particular the target database. For example, if the query type relates to online analytical processing, the query may be forwarded to the target database and may not be executed by the source database.

The term "database management system" may denote the combination of an administration/management system, typically implemented as a combination of hardware and software, and at least one related database storing the data.

The term "in-memory target database portion" may denote the part of the target database that may hold nearly all of its data in the main memory of the computer system. The database management system of the target database may comprise an in-memory portion and a persistent portion of the target database, where the persistent portion may be a persistently stored copy of the in-memory portion, apart from the most recent changes made in memory.

The term "persistent target database storage" may describe the part of the target database management system that is enabled to store the data of the target database persistently, i.e., using a hard disk or flash memory instead of in-memory storage.

The term "latest snapshot" may denote the last consistently stored state of the target database.

The term "later timestamp", in particular a change having a later timestamp, may denote, for example, a record in the recovery log file of the source database whose time indicator shows that it was created later than the latest snapshot stored in the persistent portion of the target database.

The term "recovery log file" may denote a sequential file that logs the operations performed on a database, in particular all operations that modify data, i.e., insert, update, and delete operations. The recovery log file may be designed to allow a complete reconstruction of the database. Therefore, the table definitions of the database may also be part of the recovery log file.

The term "metadata" may denote data about data, in particular the definitions of the data of the tables in a database and, potentially, the relationships between them.

In the following, a detailed description of the figures is given. All instructions in the figures are schematic. First, a block diagram of an embodiment of the inventive computer-implemented method for crash recovery for linked databases is given. Afterwards, further embodiments, as well as embodiments of the linked database system with crash recovery for linked databases, are described.

Figure 1 shows a block diagram of a preferred embodiment of the computer-implemented method 100 for crash recovery for linked databases, in particular according to an IDAA architecture. The linked databases provided (102) comprise a source or primary database, in particular one optimized for transactions, e.g., a row-based database, and a related target or secondary database. The target database may be optimized for online analytical processing (OLAP) operations and may advantageously be organized in a column-based manner.

Selected queries against the database management system comprising the source database, in particular queries directed to analytical operations, are transferred, i.e., offloaded, to the database management system comprising the target database for processing.

In addition, the method 100 comprises synchronizing (104) selected portions of the contents of the tables of the source database (in some implementations only part of the data of part of the tables, in other implementations a complete copy) with the respective portions of the contents of the tables of the target database. During synchronization, this is done by applying (106) the changes made to the source database, using the recovery log file of the source database, to the in-memory target database portion of the database management system comprising the target database.

Additionally, the method 100 comprises asynchronously storing (108) the changes applied to the in-memory target database portion persistently in the persistent target database storage portion. Upon a database crash of the target database system, the method comprises restoring (110) the in-memory target database portion from the latest snapshot available in the persistent target database storage, and applying (112) to the in-memory target database portion those changes from the recovery log file of the source database that have a later timestamp than the latest snapshot available in the persistent target database storage.
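
The restore-then-replay sequence of steps 110 and 112 may be illustrated by the following Python sketch. It models a single table as a dictionary of keys and values and a recovery log as a list of records; the `LogRecord` type, the integer timestamps, and the function name are assumptions made for this example and do not reflect the actual log format of any database product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified entry of the source database recovery log."""
    timestamp: int
    operation: str               # "insert", "update" or "delete"
    key: str
    value: Optional[str] = None


def recover_in_memory(
    snapshot: Dict[str, str],
    snapshot_timestamp: int,
    log: Iterable[LogRecord],
) -> Dict[str, Optional[str]]:
    """Rebuild the in-memory table from the latest snapshot plus later log entries."""
    table: Dict[str, Optional[str]] = dict(snapshot)  # step 110: restore the snapshot
    for record in sorted(log, key=lambda r: r.timestamp):
        if record.timestamp <= snapshot_timestamp:
            continue                                  # already contained in the snapshot
        # step 112: replay only changes newer than the snapshot
        if record.operation in ("insert", "update"):
            table[record.key] = record.value
        elif record.operation == "delete":
            table.pop(record.key, None)
        else:
            raise ValueError(f"unknown operation: {record.operation}")
    return table
```

Because the snapshot is written asynchronously, it may lag behind the in-memory state; the replay of later log entries is what closes this gap in the sketch.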

Optionally, the persistent portion of the target database may also be updated in parallel. However, this would require that the persistent storage process or storage processor is also active in order to persistently store the changes to the in-memory portion of the target database.

In addition, it may be advantageous to bulk-load the target database from the source database, in particular during an initialization process. This may help to avoid the comparatively slow incremental update or synchronization process, because the source database may already contain a large number of entries and thus a large number of entries in its recovery log file. The initialization may also apply only to selected tables or partitions thereof.

Figure 2 shows a block diagram of an embodiment 200 of linked databases. The primary or source database 202 receives OLTP queries 210 and OLAP queries 214. Queries identified as OLAP queries 214 are transferred or offloaded 222 to the secondary or target database 206. The source database 202 comprises a plurality of tables 204 and the related stored data. The target database 206 also comprises, in its database tables 208, tables that represent at least a subset of the database tables 204 of the source database 202, and at least a subset of their data.

After the OLAP operation has been performed by the target database 206, the data are returned 224, and the OLAP output 220 is returned to the requesting program or process. OLTP queries 210 are executed directly in the source database 202, and the results are returned to the requesting program or process as OLTP output 212. Hence, the OLTP or source database 202 may operate at its best performance because it is not slowed down by any resource-intensive OLAP queries.

Because the data organization in the target database 206 may differ, e.g., column-oriented instead of row-oriented as in the source or OLTP database 202, the target database may return 224 OLAP results considerably faster than the source database 202.

Figure 3 shows a block diagram of an embodiment 300 of how the target database may be synchronized. The source database management system 302 controls the operation of the source database 202 and its related tables 204 (compare Figure 2). The same applies to the target database management system 308 with respect to the target database 206 and its related tables 208.

The source database management system 302 also maintains the recovery log file 306 for the source database 202. A log reader or log file reader 314, which reads the recovery log file 306, provides these data to the apply unit 316, which applies the changes made to the source database (i.e., inserts, updates, deletes) to the target database 206 as well, for the selected tables and the selected set of data. The selected tables and the selected set of data may be a predefined subset of the related tables and data in the source database 202. The apply unit 316 can optimize the application of the changes to the target database 206 depending on the OLAP queries being executed. For this purpose, the log buffer 318 may be useful.

For an initialization of the target database 206, a bulk load operation 312 from the source database 202 to the target database 206 may be performed for performance reasons.

It should be noted that, for reasons of clarity of the inventive concept, Figure 3 does not yet show the split of the target database management system into an in-memory portion and a persistent portion of the target database 206 and its tables 208. This is shown in the next figure.

It should also be noted that this synchronization mechanism may represent only one of many synchronization techniques. Other synchronization techniques may also be applicable.

Figure 4 shows a block diagram of an embodiment of the proposed concept in a form 400 that is closer to an implementation. The elements of the source database management system that are optionally used for synchronizing the source database with the target database (mainly the upper part of Figure 4) are not described again.

The target database management system 308 comprises the in-memory portion 402 of the target database and the persistent portion 404 of the target database. During operation, the persistence service 406 writes the state of the in-memory target database portion 402 to the persistent database 404. During such regular operation, e.g., after a predetermined number of updates have been made to the in-memory database 402, the query processor 408 directs incoming queries to the in-memory portion 402 of the target database.

However, when the target DBMS is currently performing a crash recovery, in particular a demand-optimized or data-usage-optimized recovery, the operation differs as follows:
(i) the query processor 408, together with the recovery processor 410, analyzes the data accesses of the query and determines, using state-of-the-art query parsing and view resolution techniques, the list of target tables and/or table partitions that need to be available for the query;
(ii) the blocked data list is determined by looking up all tables and/or table partitions of the data access list from the previous step that are marked in the recovery state as not yet recovered;
(iii) if the list from the previous step is not empty, the query and its blocked data list are added to the waiting queries, and the query process waits until the recovery has finished (i.e., its blocked data list is empty), after which the query can continue; and
(iv) after the completion notification has been sent from the recovery processor 410 to the query processor 408, the queries against the target database are processed as usual.
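
The handling of queries that arrive during crash recovery may be sketched in Python as follows. The class `RecoveryState`, its attributes, and the query identifiers are hypothetical; the data access list is assumed to have been produced already by query analysis, which is not modeled here.

```python
from typing import Dict, Iterable, List, Set


class RecoveryState:
    """Tracks unrecovered items and the queries waiting for them (illustrative only)."""

    def __init__(self, items_to_recover: Iterable[str]) -> None:
        self.unrecovered: Set[str] = set(items_to_recover)
        self.waiting: Dict[str, Set[str]] = {}   # query id -> blocked data list

    def submit(self, query_id: str, accessed_items: Iterable[str]) -> bool:
        """Return True if the query may run now, False if it had to be queued."""
        # (ii) blocked data list = accessed items that are not yet recovered
        blocked = set(accessed_items) & self.unrecovered
        if blocked:
            # (iii) register the query together with its blocked data list
            self.waiting[query_id] = blocked
            return False
        return True

    def item_recovered(self, item: str) -> List[str]:
        """Record a finished recovery and return the queries that can now continue."""
        self.unrecovered.discard(item)
        released: List[str] = []
        for query_id in list(self.waiting):
            self.waiting[query_id].discard(item)
            if not self.waiting[query_id]:
                del self.waiting[query_id]
                released.append(query_id)
        return released
```

A query that touches only recovered (or never affected) tables is therefore not delayed by the ongoing recovery of other tables.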

During the operation controlled by the recovery processor 410, the latest consistent snapshot of the related tables and/or table partitions available in the persistent database 404 is loaded 412 into the in-memory portion 402 of the target database, and the entries of the recovery log file 306 of the source database 202 that have a later timestamp than the snapshot in the persistent database 404 are replayed to the in-memory database 402. These entries are requested by the recovery processor 410 via the query processor 408 and are provided by the source DBMS 302, for example from the recovery log file 306 via the log reader 314 and the apply unit 316.

Figure 5 shows a block diagram of an embodiment 500 of a linked database system including components for a customer priority optimized recovery strategy. The source DBMS 502 exemplarily comprises a first source database 504 of a first user or tenant (e.g., in a cloud computing environment) and a second source database 506 of a second user or tenant. In addition, separate source database systems for other users or tenants may be available.

In order to synchronize the source databases 504, 506 with the respective in-memory portions 518, 520 of the in-memory target database 516, separate data synchronization subsystems 510, 512 are implemented within the data synchronization system 508. The query process 522 receives the database queries for execution in the in-memory portion of the target database(s) 516. These queries are typically OLAP queries that have been offloaded from the source database(s) 504, 506.

The recovery process or processor 524 of the target DBMS 514 also receives from the query processor data about the queries to be registered and analyzed in the data access analyzer 526. During the recovery process of the target database, the query analyzer determines the waiting queries 530 and the already recovered table partitions 532 in the recovery state management system 528 and, based on the priority of the queries of specific users, determines which tables should be recovered first. This is finally determined and decided by the recovery schedule 534. To perform this task, the recovery schedule 534 is in constant data exchange with the recovery state management system 528 in order to check the actual recovery state, and it receives configuration data from the workload management system configuration storage. Furthermore, the recovery schedule 534 also exchanges data with the data synchronization system 508, thereby triggering the recovery load of the target database management system from the source database management system 502.

In this way, it can be ensured that users or customers with a higher configured priority gain early access to their recovered database tables, and the recovery time can also be optimized depending on the access usage of specific tables.

In detail, this may be achieved by the following procedure. When the crash recovery process for the target database is initialized, an empty histogram of blocked tables/table partitions is created for each tenant. When a new blocked query is registered in the recovery state, the usage counter of each table/partition in the blocked data list of that query is incremented by one. Then, when the next table/partition to be recovered needs to be determined, the recovery priority of each table is determined based on the number of blocked queries counted for it and on the WLM (workload management system) configuration of the respective tenant (i.e., the tenant's priority or importance), and the recovery item with the highest priority is selected. Finally, when the recovery operation for a table/partition has finished, the corresponding histogram entry is removed as well.

The calculation of the user-specific or tenant-specific recovery priority may exemplarily be implemented as follows. If there is a strict priority between pairs of tenants, e.g., a production system being more important than a test system (which may be specified by a simple configuration text), the list of tables/partitions to be recovered is grouped by user/tenant, the list of groups is ordered by descending user/tenant priority, and the next table/partition is selected from the first non-empty group using the query usage counter selection algorithm.
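
For the strict-priority case, the grouping and selection may be sketched in Python as below. The tuple layout `(item, tenant, usage_count)`, the numeric tenant priorities (larger meaning more important), and the function name are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple


def next_item_strict(
    items: List[Tuple[str, str, int]],   # (item, tenant, usage_count), hypothetical layout
    tenant_priority: Dict[str, int],     # larger value = more important tenant (assumed)
) -> Optional[str]:
    """Pick the most-used item of the highest-priority tenant that still has items."""
    # Group the items to be recovered by tenant.
    groups: Dict[str, List[Tuple[str, int]]] = {}
    for item, tenant, usage in items:
        groups.setdefault(tenant, []).append((item, usage))
    # Walk the groups in descending tenant priority.
    for tenant in sorted(groups, key=lambda t: tenant_priority[t], reverse=True):
        candidates = groups[tenant]
        if candidates:
            # Within the group, apply the query usage counter selection.
            return max(candidates, key=lambda c: c[1])[0]
    return None
```

With this scheme, an item of a lower-priority tenant is only selected once no item of any higher-priority tenant remains, regardless of how many queries are waiting for it.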

しかし、ＷＬＭ構成における各テナントに対する相対的なリソース共有位置（例、テナント１５０％、テナント２３０％、テナント３２０％）が存在するとき、各回復アイテムに対する優先順位は以下のとおりに算出される。（ｉ）回復アイテムに対するクエリ使用カウンタｑを決定する；（ｉｉ）次いで、回復アイテムが属するテナントｔを決定する；（ｉｉｉ）次に、テナントｔに対するリソース共有位置ｒ（ｔ）を決定する；（ｉｖ）それに基づいて優先順位をａ＊ｑ＊ｂ＊ｒ（ｔ）として決定し、ここでａおよびｂは特に［０...１］の範囲の静的構成パラメータであり、ここでパラメータの数を減らすために、ｂはａに基づいて、たとえばｂ＝１－ａとして算出され得る。最後に、（ｖ）たとえば優先順位キューなどにおいて、回復アイテムのリストを優先順位によって順序付けできる。 However, when there is a relative resource sharing position for each tenant in the WLM configuration (e.g., Tenant 1 50%, Tenant 2 30%, Tenant 3 20%), the priority for each recovery item is calculated as follows: (i) determine the query usage counter q for the recovery item; (ii) then determine the tenant t to which the recovery item belongs; (iii) then determine the resource sharing position r(t) for tenant t; (iv) based thereon, determine the priority as a*q*b*r(t), where a and b are static configuration parameters, particularly in the range [0...1], and to reduce the number of parameters, b can be calculated based on a, e.g., b=1-a. Finally, (v) the list of recovery items can be ordered by priority, e.g., in a priority queue.
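Steps (i) to (v) can be sketched with a heap-based priority queue. The sketch uses the expression exactly as it is written in the text, a*q*b*r(t) with b = 1 - a. The function names, the default a = 0.5, and the tuple layout are assumptions for illustration only.

```python
import heapq


def recovery_priority(q, r_t, a=0.5):
    """Priority as written in the text: a * q * b * r(t), with b derived as 1 - a."""
    b = 1.0 - a
    return a * q * b * r_t


def build_queue(items, shares, a=0.5):
    """items: list of (tenant, table, q); shares: tenant -> resource share r(t) in [0, 1]."""
    heap = []
    for tenant, table, q in items:
        prio = recovery_priority(q, shares[tenant], a)
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(heap, (-prio, tenant, table))
    return heap


def pop_next(heap):
    _, tenant, table = heapq.heappop(heap)
    return tenant, table
```

With shares of 50%, 30% and 20% and usage counters 2, 10 and 4, the priorities are 0.25, 0.75 and 0.2, so the second tenant's table is recovered first even though its resource share is smaller.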

図６は、ボリューム最適化回復戦略のためのコンポーネントを含むリンクド・データベース・システムの実施形態６００のブロック図を示す。図５によってすでに紹介されたエレメントは、同じ参照番号で示される。左上側に、ソースＤＢＭＳ５０２がソース・データベース５０４および関連する回復ログ・ファイル６０４と共に示される。最初に、ターゲットＤＢＭＳ５１４のメモリ内データベース部分５１６（永続的部分は示されていない）はバルク・ローダー６０２を介してバルクでロードされてもよい。 Figure 6 shows a block diagram of an embodiment 600 of a linked database system including components for a volume-optimized recovery strategy. Elements already introduced by Figure 5 are indicated with the same reference numerals. On the upper left side, the source DBMS 502 is shown along with the source database 504 and associated recovery log file 604. Initially, the in-memory database portion 516 (persistent portion not shown) of the target DBMS 514 may be bulk loaded via the bulk loader 602.

ターゲットＤＢＭＳ５１４のメモリ内部分５１６に加えて、ここでは他のメタデータ６０８および回復プロセスまたは回復プロセッサ６１４に焦点を合わせてもよい。回復プロセスは、回復アイテム選択ユニット６１６と、変更推定ユニット６１８と、回復スケジューラ６２０との少なくとも３つのコンポーネントを含む。回復プロセッサ６１４は、ターゲットＤＢＭＳ５１４のメモリ内データベース部分５１６のテーブルに関する状態情報を収集するために、メモリ内データベース５１６とデータ交換を行う。 In addition to the in-memory portion 516 of the target DBMS 514, the focus here is on the other metadata 608 and on the recovery process, or recovery processor, 614. The recovery process includes at least three components: a recovery item selection unit 616, a change estimation unit 618, and a recovery scheduler 620. The recovery processor 614 exchanges data with the in-memory database 516 to collect state information about the tables in the in-memory database portion 516 of the target DBMS 514.

すでに上述したとおり、ターゲットＤＢＭＳ側でデータベース・クラッシュが起こった場合、バルク・ロード機構を介して、または増分的に、ソース・データベース５０４からターゲット・データベース５１６を復元する必要がある。加えてここでは、回復させるべきテーブル／テーブル・パーティションが回復プロセス６１４によって動的に選択される。ここで提案される概念は、クラッシュ回復の際に復元する必要があるデータの量を推定または決定することを担う変更推定コンポーネント６１８によって拡張される。したがって、この変更推定コンポーネント６１８は、回復ベースライン・タイムスタンプ以後にソース・データベースにどれほどのデータ変更が蓄積したかを推定するために、データ変更統計を評価する。この情報に基づいて、回復スケジューラ６２０は、回復させるべきテーブル／テーブル・パーティションのデータを復元するために最も効率的なデータ同期方法を選択する。データ変更統計は、ターゲット・データベース（すなわち、メモリ内データベース部分５１６）の増分的またはバルク・ロードを介する規則的な更新処理の間に維持される。変更推定６１８は、回復アイテム選択コンポーネント６１６によって引き起こされてもよい。 As already mentioned above, in the event of a database crash on the target DBMS side, the target database 516 needs to be restored from the source database 504 either via a bulk load mechanism or incrementally. Additionally, the tables/table partitions to be recovered are dynamically selected by the recovery process 614. The proposed concept is extended by a change estimation component 618, which is responsible for estimating or determining the amount of data that needs to be restored during crash recovery. Therefore, this change estimation component 618 evaluates data change statistics to estimate how many data changes have accumulated in the source database since the recovery baseline timestamp. Based on this information, the recovery scheduler 620 selects the most efficient data synchronization method for restoring the data of the tables/table partitions to be recovered. The data change statistics are maintained during regular update processes via incremental or bulk loads of the target database (i.e., the in-memory database portion 516). The change estimation 618 may be triggered by the recovery item selection component 616.

回復スケジューラ６２０は、メモリ内データベース部分５１６の回復プロセスを管理するために、バルク・ローダー６０２および増分更新プロセス６０６の細部ともデータ交換を行う。たとえば、メモリ内データベース部分５１６の回復が完了したとき、バルク・ローダー６０２からの回復完了通知が受信される。他方側では、回復スケジューラ６２０がメモリ内データベース部分５１６の特定のテーブルに対する変更再現を要求する。バルク・ローダー６０２からの信号と同様に、回復スケジューラ６２０は増分更新プロセス（プロセッサ）６０６からの回復完了通知も受信する。図３の文脈ですでに説明されたとおり、増分更新プロセス６０６は、回復ログ・ファイル６０４から回復ログ・ファイル・エントリを読取るために適合されたログ・リーダー（ここには示されない）と、ソース・データベース５０４からのそれぞれの回復ログ・ファイル・エントリを用いてメモリ・データベース部分５１６を増分的に更新するために適合された回復ログ・ファイル適用ユニット（ここには示されない）とを含む。詳細については図３に戻って参照されたい。 To manage the recovery of the in-memory database portion 516, the recovery scheduler 620 also exchanges data with the bulk loader 602 and with the incremental update process 606. For example, when a bulk-load recovery of the in-memory database portion 516 has completed, the recovery scheduler 620 receives a recovery completion notification from the bulk loader 602. In the other direction, the recovery scheduler 620 requests that changes be replayed for specific tables of the in-memory database portion 516. As with the signals from the bulk loader 602, the recovery scheduler 620 also receives recovery completion notifications from the incremental update process (processor) 606. As already explained in the context of Figure 3, the incremental update process 606 includes a log reader (not shown here) adapted to read recovery log file entries from the recovery log file 604, and a recovery log file apply unit (not shown here) adapted to incrementally update the in-memory database portion 516 with the respective recovery log file entries from the source database 504. Refer back to Figure 3 for details.

本明細書に記載される実施形態のプロセスを正常に管理するために、データ変更統計は、ターゲット・データベース・システム５１４のメタデータ６０８に含まれる永続メタデータ・カタログに記憶されるべきであり、かつターゲット・データベース・システム５１４が更新されるとき、すなわち増分更新またはバルク・ロード戦略を介して更新されるときに維持されるべきである。データ変更統計は、以下の情報を記憶してもよい。（ｉ）更新が処理されたときのタイムスタンプ、（ｉｉ）データベース・テーブルのスキーマ情報、たとえば列タイプ、列幅、...、（ｉｉｉ）更新による影響を受けたテーブル／テーブル・パーティション当たりのデータ変更の量、すなわち挿入された記録、および削除された記録、更新された記録；ならびに集約されたメトリック、たとえば変更された記録の総数、適用された合計データ・ボリューム、更新の合計実行時間など。 To successfully manage the processes of the embodiments described herein, data change statistics should be stored in a persistent metadata catalog included in the metadata 608 of the target database system 514 and should be maintained when the target database system 514 is updated, i.e., via incremental updates or bulk load strategies. The data change statistics may store the following information: (i) the timestamp when the update was processed; (ii) schema information for the database table, e.g., column types, column widths, ...; (iii) the amount of data changes per table/table partition affected by the update, i.e., inserted, deleted, and updated records; and aggregated metrics, e.g., total number of changed records, total data volume applied, total execution time of the update, etc.
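One possible shape for such a statistics record is sketched below. The field names and the two classes `UpdateStats` and `ChangeStatistics` are assumptions chosen for the example. Schema information (column types and widths) and the persistence of the catalog itself are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UpdateStats:
    """Statistics for one update of one table/partition (hypothetical layout)."""
    processed_at: datetime      # (i) timestamp when the update was processed
    table: str
    inserted: int = 0           # (iii) per-table change counts
    deleted: int = 0
    updated: int = 0
    bytes_applied: int = 0      # applied data volume
    duration_s: float = 0.0     # execution time of the update

    @property
    def changed_records(self):
        return self.inserted + self.deleted + self.updated


@dataclass
class ChangeStatistics:
    """In-memory stand-in for the persistent metadata catalog."""
    entries: list = field(default_factory=list)

    def record(self, stats):
        # Called as part of every incremental update or bulk load cycle.
        self.entries.append(stats)

    def totals(self, table):
        # Aggregated metrics across all recorded updates of one table.
        rows = [e for e in self.entries if e.table == table]
        return {
            "changed_records": sum(e.changed_records for e in rows),
            "bytes_applied": sum(e.bytes_applied for e in rows),
            "duration_s": sum(e.duration_s for e in rows),
        }
```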

統計データは、たとえば最後のｘ日間などの時間間隔とリンクされ得る。加えて統計データは、各更新サイクルの一部として増分的に管理されてもよい。 Statistical data may be linked to a time interval, such as the last x number of days. Additionally, statistical data may be maintained incrementally as part of each update cycle.

さらに、回復ベースライン６１０は、ターゲット・データベース・システム５１４の永続メタデータ６０８（例、メタデータ・カタログ）でも維持される。回復ベースライン６１０は、復元される必要があるデータ・ボリュームを決定するための推定に必要とされる。したがって、正確なタイムスタンプを決定する必要はなく、その値を推定することで十分であり、たとえば、それは規則的な時間間隔でターゲット・データベース・システムによって更新されるハートビート・タイムスタンプとして維持されてもよいし、それはクラッシュ後の回復プロセスの出発点として維持されてもよいし、それは最後に正常に持続されたターゲット・データベース・スナップショットの時間として維持されてもよい。それによって、タイムスタンプはターゲット・データベースのテーブルごとに維持されてもよい。 Furthermore, the recovery baseline 610 is also maintained in the persistent metadata 608 (e.g., metadata catalog) of the target database system 514. The recovery baseline 610 is needed for estimation to determine the data volume that needs to be restored. Therefore, it is not necessary to determine the exact timestamp; it is sufficient to estimate its value; for example, it may be maintained as a heartbeat timestamp updated by the target database system at regular time intervals, it may be maintained as a starting point for the recovery process after a crash, or it may be maintained as the time of the last successfully persisted target database snapshot. Thereby, a timestamp may be maintained for each table in the target database.

ターゲット・データベースに対するクラッシュ回復は、以下のとおりに実行され得る。（ｉ）最初に、次の回復させるべきテーブル／パーティションが決定される；（ｉｉ）回復ベースライン以後の回復させる必要があるデータ・ボリュームが推定される；（ｉｉｉ）推定されたデータ・ボリュームに基づいて最良の回復戦略が選択され、回復時間が推定される；（ｉｖ）次いで、選択された戦略によるテーブルの回復がスケジュールされる；および（ｖ）すべてのデータが回復されるまで、これらのステップがループで繰り返される。 Crash recovery for the target database can be performed as follows: (i) first, the next table/partition to be recovered is determined; (ii) the data volume that needs to be recovered since the recovery baseline is estimated; (iii) the best recovery strategy is selected based on the estimated data volume, and the recovery time is estimated; (iv) the table recovery according to the selected strategy is then scheduled; and (v) these steps are repeated in a loop until all data is recovered.
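The loop in steps (i) to (v) can be sketched as follows. The decision rule in `choose_strategy` (bulk load when the estimated changes exceed a fraction of the table size) and its threshold of 0.3 are assumptions, because the text only states that the most efficient strategy is chosen from the estimated volume. The recovery time estimate of step (iii) is left out.

```python
def choose_strategy(estimated_changes, table_rows, threshold=0.3):
    """Assumed rule: many changes relative to table size favor a full bulk load."""
    if table_rows == 0 or estimated_changes / table_rows > threshold:
        return "bulk_load"
    return "incremental"


def run_recovery(pending, pick_next, estimate, table_rows, schedule):
    """Recover all pending tables/partitions.

    pending:    mutable set of table names still to be recovered
    pick_next:  callable(pending) -> table name            (step i)
    estimate:   callable(table) -> estimated change count  (step ii)
    table_rows: table name -> current row count
    schedule:   callable(table, strategy)                  (step iv)
    """
    plan = []
    while pending:                                            # step v: repeat until done
        item = pick_next(pending)
        est = estimate(item)
        strategy = choose_strategy(est, table_rows[item])     # step iii
        schedule(item, strategy)
        pending.remove(item)
        plan.append((item, strategy))
    return plan
```

Passing the selection, estimation, and scheduling steps as callables keeps the loop independent of how each of them is implemented.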

回復させるべきデータ・ボリュームの推定は、以下のとおりに実行されてもよい。（ｉ）回復させるべきテーブル／パーティションに対して、対応するデータ変更統計６１２が探索される；（ｉｉ）そのテーブル／パーティションに対する回復ベースラインが決定される；および（ｉｉｉ）その間隔［回復ベースライン、現在の回復時間］における増分更新プロセスを介して複製する必要がある変更の数が推定される。 Estimating the data volume to be restored may be performed as follows: (i) for the table/partition to be restored, the corresponding data change statistics 612 are looked up; (ii) a recovery baseline for that table/partition is determined; and (iii) the number of changes that need to be replicated via the incremental update process in the interval [recovery baseline, current recovery time] is estimated.
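One way to carry out step (iii) is to extrapolate from the change rate observed before the baseline. This extrapolation is an assumption of the sketch: the text does not say how the number of changes in the interval is derived from the statistics. The function name, the seven-day window, and the `(timestamp, changed_records)` history format are likewise illustrative.

```python
from datetime import datetime, timedelta


def estimate_changes(history, baseline, now, window=timedelta(days=7)):
    """Estimate changes accumulated in [baseline, now] for one table/partition.

    history: list of (timestamp, changed_records) taken from the data change statistics
    """
    # Average change rate over the window that ends at the recovery baseline.
    start = baseline - window
    observed = sum(n for ts, n in history if start <= ts <= baseline)
    rate_per_second = observed / window.total_seconds()

    # Extrapolate that rate over the interval that has to be recovered.
    return rate_per_second * (now - baseline).total_seconds()
```

Because the recovery baseline only needs to be approximate, a coarse estimate of this kind may be enough to decide between a bulk load and an incremental replay.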

図７は、リンクド・データベースに対するクラッシュ回復のためのリンクド・データベース・システム７００の実施形態のブロック図を示す。リンクド・データベース７００はソース・データベース７０６と、関連ターゲット・データベース７０８とを含み、ソース・データベースのテーブルの内容の選択された部分が、ターゲット・データベースのテーブルの内容のそれぞれの部分と同期される。 Figure 7 shows a block diagram of an embodiment of a linked database system 700 for crash recovery for linked databases. The linked database 700 includes a source database 706 and an associated target database 708, in which selected portions of the contents of tables in the source database are synchronized with respective portions of the contents of tables in the target database.

リンクド・データベース・システム７００はプロセッサ７０２と、プロセッサ７０２に通信的に結合されたメモリ７０４とを含み、メモリ７０４はプログラム・コード部分を記憶しており、このプログラム・コード部分は実行されたときに、プロセッサが、たとえば同期ユニット７１４などを用いて、ソース・データベース７０６のテーブルの内容の選択された部分を、ターゲット・データベース７０８のテーブルの内容のそれぞれの部分と同期させることを可能にする。 The linked database system 700 includes a processor 702 and a memory 704 communicatively coupled to the processor 702, the memory 704 storing program code portions that, when executed, enable the processor to synchronize selected portions of the contents of tables in the source database 706 with respective portions of the contents of tables in the target database 708, for example, using a synchronization unit 714.

記憶されるプログラム・コード部分は実行されたときに、プロセッサ７０２が同期中に適用ユニット７１６を用いて、ソース・データベース７０６に対する変更を、ターゲット・データベースを含むデータベース管理システムのメモリ内ターゲット・データベース部分７１０に適用することと、たとえばストレージ・プロセッサ７１８などによって、メモリ内ターゲット・データベース部分７１０に対して適用された変更を非同期的に永続ターゲット・データベース・ストレージに永続的に記憶することとを可能にする。 When executed, the stored program code portions enable the processor 702 to apply, during the synchronization and using the apply unit 716, changes made to the source database 706 to an in-memory target database portion 710 of a database management system that includes the target database, and to asynchronously store, for example by the storage processor 718, the changes applied to the in-memory target database portion 710 in persistent target database storage.
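The split between applying a change in memory and persisting it asynchronously can be sketched with a background writer thread. The class `AcceleratorTable`, the dictionaries standing in for the in-memory and persistent portions, and the `flush` helper are assumptions for illustration. A real accelerator would write snapshots or pages to disk rather than to a dictionary.

```python
import queue
import threading


class AcceleratorTable:
    """Toy target table: changes are applied in memory and persisted in the background."""

    def __init__(self):
        self.memory = {}        # stand-in for the in-memory target database portion
        self.persistent = {}    # stand-in for the persistent target database storage
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._persist_loop, daemon=True)
        self._writer.start()

    def apply_change(self, key, value):
        # The change is visible to queries immediately ...
        self.memory[key] = value
        # ... and is only queued for persistence, so the caller does not wait for I/O.
        self._pending.put((key, value))

    def _persist_loop(self):
        while True:
            key, value = self._pending.get()
            self.persistent[key] = value    # placeholder for the actual disk write
            self._pending.task_done()

    def flush(self):
        # Block until every queued change has been persisted.
        self._pending.join()
```

Between `apply_change` and the background write, the persistent copy lags behind the in-memory copy, which is why a crash recovery step is needed to close the gap.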

さらに、記憶されるプログラム・コード部分は実行されたときに、プロセッサ７０２が、たとえば復元ユニット７２０などによって、ターゲット・データベース・システムのデータベース・クラッシュの際に、永続ターゲット・データベース・ストレージ部分７１２において利用可能な最新のスナップショットによってメモリ内ターゲット・データベース部分７１０を復元することと、たとえば第２の適用ユニット７２２などによって、ターゲット・データベース・システムのデータベース・クラッシュの際に、永続ターゲット・データベース・ストレージ部分７１２において利用可能な最新のスナップショットよりも後のタイムスタンプを有するソース・データベース回復ログ・ファイルからの変更をメモリ内ターゲット・データベース部分７１０に適用することとを可能にする。 Furthermore, when executed, the stored program code portions enable the processor 702, in the event of a database crash of the target database system, to restore the in-memory target database portion 710, for example by a restore unit 720, from the latest snapshot available in the persistent target database storage portion 712, and to apply to the in-memory target database portion 710, for example by a second apply unit 722, those changes from the source database recovery log file that have a later timestamp than the latest snapshot available in the persistent target database storage portion 712.
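The two recovery steps, restoring the latest snapshot and then replaying newer log entries, can be sketched as below. The function name, the `(timestamp, op, key, value)` log entry format, and the two operation names are assumptions made for the example.

```python
def recover_in_memory(snapshot, snapshot_ts, log_entries):
    """Rebuild the in-memory portion after a crash.

    snapshot:    dict holding the latest persisted snapshot
    snapshot_ts: timestamp of that snapshot
    log_entries: iterable of (timestamp, op, key, value) from the source recovery log,
                 where op is "upsert" or "delete" (assumed format)
    """
    # Step 1: restore the in-memory state from the latest available snapshot.
    memory = dict(snapshot)

    # Step 2: replay, in timestamp order, only the entries newer than the snapshot.
    for ts, op, key, value in sorted(log_entries, key=lambda e: e[0]):
        if ts <= snapshot_ts:
            continue    # already contained in the snapshot
        if op == "delete":
            memory.pop(key, None)
        else:
            memory[key] = value
    return memory
```

Entries at or before the snapshot timestamp are skipped, so the work done after a crash is bounded by the changes made since the last successful persist.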

なお加えて、リンクド・データベース・システム７００のすべてのモジュールおよびユニットは、信号もしくはデータまたはその両方を交換するために電気的に相互接続されてもよい。このことはプロセッサ７０２、メモリ７０４、ソース・データベース・システム７０６、メモリ内部分７１０および永続的部分７１２を含むターゲット・データベース・システム７０８、同期ユニット７１４、適用ユニット７１６、ストレージ・プロセッサ７１８、復元ユニット７２０、ならびに第２の適用ユニット７２２に適用されてもよい。これらのモジュールおよびユニット間の１：１接続の代わりに、これらは信号伝達データ交換のためのリンクド・データベース・システム内部バス・システム７２４に接続されてもよい。 Additionally, all modules and units of the linked database system 700 may be electrically interconnected to exchange signals and/or data. This may apply to the processor 702, memory 704, source database system 706, target database system 708 including in-memory portion 710 and persistent portion 712, synchronization unit 714, application unit 716, storage processor 718, restore unit 720, and second application unit 722. Instead of 1:1 connections between these modules and units, they may be connected to a linked database system internal bus system 724 for signaling and data exchange.

本発明の実施形態は、プラットフォームがプログラム・コードの記憶もしくは実行またはその両方に適しているかどうかにかかわらず、実質的に任意のタイプのコンピュータと共に実装されてもよい。図８は例として、提案される方法に関するプログラム・コードを実行するために好適なコンピュータ・システム８００を示す。 Embodiments of the present invention may be implemented in conjunction with virtually any type of computer, regardless of whether the platform is suitable for storing and/or executing program code. Figure 8 illustrates, by way of example, a computer system 800 suitable for executing program code relating to the proposed method.

コンピュータ・システム８００は、好適なコンピュータ・システムの単なる一例であり、コンピュータ・システム８００が上記に示された機能のいずれかの実装もしくは実行またはその両方が可能であるかどうかにかかわらず、本明細書に記載される本発明の実施形態の使用または機能の範囲に関する任意の限定を示唆することは意図されていない。コンピュータ・システム８００には、多数の他の汎用目的または特定目的のコンピュータ・システム環境または構成と共に動作するコンポーネントが存在する。コンピュータ・システム／サーバ８００と共に使用するために好適であり得る周知のコンピュータ・システム、環境、もしくは構成、またはその組み合わせの例は、パーソナル・コンピュータ・システム、サーバ・コンピュータ・システム、シン・クライアント、シック・クライアント、ハンドヘルドまたはラップトップ・デバイス、マルチプロセッサ・システム、マイクロプロセッサ・ベースのシステム、セット・トップ・ボックス、プログラマブル家電機器、ネットワークＰＣ、ミニコンピュータ・システム、メインフレーム・コンピュータ・システム、および上記のシステムまたはデバイスのいずれかを含む分散型クラウド・コンピューティング環境などを含むが、それに限定されない。コンピュータ・システム／サーバ８００は、コンピュータ・システム８００によって実行されるたとえばプログラム・モジュールなどのコンピュータ・システム実行可能命令の一般的な文脈で記載されてもよい。一般的に、プログラム・モジュールは、特定のタスクを実行するか、または特定の抽象データ・タイプを実装するルーチン、プログラム、オブジェクト、コンポーネント、ロジック、およびデータ構造などを含んでもよい。コンピュータ・システム／サーバ８００は、通信ネットワークを通じてリンクされたリモート処理デバイスによってタスクが実行される分散型クラウド・コンピューティング環境において実施されてもよい。分散型クラウド・コンピューティング環境において、プログラム・モジュールは、メモリ・ストレージ・デバイスを含むローカルおよびリモート・コンピュータ・システム記憶媒体の両方に位置してもよい。 Computer system 800 is merely one example of a suitable computer system, and whether computer system 800 is capable of implementing and/or performing any of the functions set forth above is not intended to suggest any limitation as to the scope of use or functionality of the embodiments of the present invention described herein. Computer system 800 has components that operate in conjunction with numerous other general-purpose or special-purpose computer system environments or configurations. Examples of well-known computer systems, environments, or configurations, or combinations thereof, that may be suitable for use with computer system/server 800 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices. 
Computer system/server 800 may be described in the general context of computer system-executable instructions, such as program modules, executed by computer system 800. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer system/server 800 may also be practiced in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media, including memory storage devices.

図面に示されるとおり、コンピュータ・システム／サーバ８００は汎用目的のコンピュータ・デバイスの形態で示されている。コンピュータ・システム／サーバ８００のコンポーネントは、１つ以上のプロセッサまたは処理ユニット８０２、システム・メモリ８０４、およびシステム・メモリ８０４を含むさまざまなシステム・コンポーネントをプロセッサ８０２に結合するバス８０６を含んでもよいが、それに限定されない。バス８０６は、メモリ・バスまたはメモリ・コントローラ、ペリフェラル・バス、アクセラレーテッド・グラフィクス・ポート、およびさまざまなバス・アーキテクチャのいずれかを用いるプロセッサまたはローカル・バスを含むいくつかのタイプのバス構造のいずれか１つ以上を表す。限定ではなく例として、こうしたアーキテクチャは、インダストリ・スタンダード・アーキテクチャ（ＩＳＡ：ＩｎｄｕｓｔｒｙＳｔａｎｄａｒｄＡｒｃｈｉｔｅｃｔｕｒｅ）バス、マイクロ・チャネル・アーキテクチャ（ＭＣＡ：ＭｉｃｒｏＣｈａｎｎｅｌＡｒｃｈｉｔｅｃｔｕｒｅ）バス、拡張ＩＳＡ（ＥＩＳＡ：ＥｎｈａｎｃｅｄＩＳＡ）バス、ビデオ・エレクトロニクス・スタンダーズ・アソシエーション（ＶＥＳＡ：ＶｉｄｅｏＥｌｅｃｔｒｏｎｉｃｓＳｔａｎｄａｒｄｓＡｓｓｏｃｉａｔｉｏｎ）ローカル・バス、およびペリフェラル・コンポーネント・インターコネクト（ＰＣＩ：ＰｅｒｉｐｈｅｒａｌＣｏｍｐｏｎｅｎｔＩｎｔｅｒｃｏｎｎｅｃｔｓ）バスを含む。コンピュータ・システム／サーバ８００は通常、さまざまなコンピュータ・システム可読媒体を含む。こうした媒体は、コンピュータ・システム／サーバ８００によってアクセス可能な任意の利用可能な媒体であってもよく、それは揮発性および不揮発性媒体、取り外し可能および取り外し不可能媒体の両方を含む。 As shown in the drawings, computer system/server 800 is shown in the form of a general-purpose computing device. Components of computer system/server 800 may include, but are not limited to, one or more processors or processing units 802, system memory 804, and a bus 806 that couples various system components, including system memory 804, to processor 802. Bus 806 represents any one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus. Computer system/server 800 typically includes a variety of computer system-readable media. Such media may be any available media that is accessible by the computer system/server 800 and includes both volatile and non-volatile media, removable and non-removable media.

システム・メモリ８０４は、たとえばランダム・アクセス・メモリ（ＲＡＭ：ｒａｎｄｏｍａｃｃｅｓｓｍｅｍｏｒｙ）８０８もしくはキャッシュ・メモリ８１０またはその両方などの、揮発性メモリの形のコンピュータ・システム可読媒体を含んでもよい。コンピュータ・システム／サーバ８００はさらに、他の取り外し可能／取り外し不可能な揮発性／不揮発性コンピュータ・システム記憶媒体を含んでもよい。単なる例として、取り外し不可能な不揮発性磁気媒体（示されておらず、通常は「ハード・ドライブ」と呼ばれる）からの読取りおよびそこへの書込みのために、ストレージ・システム８１２が提供されてもよい。示されていないが、取り外し可能な不揮発性磁気ディスク（例、「フレキシブル・ディスク」）からの読取りおよびそこへの書込みのための磁気ディスク・ドライブ、ならびにたとえばＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、またはその他の光媒体などの取り外し可能な不揮発性光ディスクからの読取りまたはそこへの書込みのための光ディスク・ドライブが提供されてもよい。こうした場合に、各々は１つ以上のデータ媒体インターフェースによってバス８０６に接続され得る。以下にさらに示されて説明されることとなるとおり、メモリ８０４は、本発明の実施形態の機能を実行するように構成されたプログラム・モジュールのセット（例、少なくとも１つ）を有する少なくとも１つのプログラム製品を含んでもよい。 System memory 804 may include computer system-readable media in the form of volatile memory, such as random access memory (RAM) 808 and/or cache memory 810. Computer system/server 800 may also include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 812 may be provided for reading from and writing to non-removable, non-volatile magnetic media (not shown, typically referred to as a "hard drive"). Although not shown, a magnetic disk drive may be provided for reading from and writing to removable, non-volatile magnetic disks (e.g., "floppy disks"), and an optical disk drive may be provided for reading from or writing to removable, non-volatile optical disks, such as CD-ROMs, DVD-ROMs, or other optical media. In such cases, each may be connected to bus 806 by one or more data media interfaces. As will be further shown and described below, memory 804 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of embodiments of the present invention.

限定ではなく例として、プログラム・モジュール８１６のセット（少なくとも１つ）を有するプログラム／ユーティリティ、ならびにオペレーティング・システム、１つ以上のアプリケーション・プログラム、その他のプログラム・モジュール、およびプログラム・データがメモリ８０４に記憶されてもよい。オペレーティング・システム、１つ以上のアプリケーション・プログラム、その他のプログラム・モジュール、およびプログラム・データ、またはその何らかの組み合わせの各々は、ネットワーク形成環境の実装を含んでもよい。プログラム・モジュール８１６は一般的に、本明細書に記載される本発明の実施形態の機能もしくは方法またはその両方を実行する。 By way of example and not limitation, a program/utility having a set (at least one) of program modules 816, as well as an operating system, one or more application programs, other program modules, and program data, may be stored in memory 804. Each of the operating system, one or more application programs, other program modules, and program data, or any combination thereof, may comprise an implementation of a networking environment. The program modules 816 generally perform the functions and/or methods of embodiments of the present invention described herein.

コンピュータ・システム／サーバ８００は、たとえばキーボード、ポインティング・デバイス、ディスプレイ８２０などの１つ以上の外部デバイス８１８；ユーザがコンピュータ・システム／サーバ８００と対話することを可能にする１つ以上のデバイス；もしくはコンピュータ・システム／サーバ８００が１つ以上の他のコンピュータ・デバイスと通信することを可能にする任意のデバイス（例、ネットワーク・カード、モデムなど）；またはその組み合わせとも通信してもよい。こうした通信は、入力／出力（Ｉ／Ｏ：Ｉｎｐｕｔ／Ｏｕｔｐｕｔ）インターフェース８１４を介して生じ得る。さらに、コンピュータ・システム／サーバ８００は、ネットワーク・アダプタ８２２を介して、たとえばローカル・エリア・ネットワーク（ＬＡＮ：ｌｏｃａｌａｒｅａｎｅｔｗｏｒｋ）、一般的な広域ネットワーク（ＷＡＮ：ｗｉｄｅａｒｅａｎｅｔｗｏｒｋ）、もしくはパブリック・ネットワーク（例、インターネット）、またはその組み合わせなどの１つ以上のネットワークと通信してもよい。示されるとおり、ネットワーク・アダプタ８２２は、バス８０６を介してコンピュータ・システム／サーバ８００のその他のコンポーネントと通信してもよい。示されていないが、他のハードウェアもしくはソフトウェア・コンポーネントまたはその両方が、コンピュータ・システム／サーバ８００と共に使用され得ることが理解されるべきである。その例は、マイクロコード、デバイス・ドライバ、冗長処理ユニット、外部ディスク・ドライブ・アレイ、ＲＡＩＤシステム、テープ・ドライブ、およびデータ・アーカイバル・ストレージ・システムなどを含むが、それに限定されない。 The computer system/server 800 may also communicate with one or more external devices 818, such as a keyboard, pointing device, or display 820; one or more devices that allow a user to interact with the computer system/server 800; or any device (e.g., a network card, modem, etc.) that allows the computer system/server 800 to communicate with one or more other computer devices; or a combination thereof. Such communication may occur via an input/output (I/O) interface 814. Additionally, the computer system/server 800 may communicate with one or more networks, such as a local area network (LAN), a general wide area network (WAN), or a public network (e.g., the Internet), or a combination thereof, via a network adapter 822. As shown, the network adapter 822 may communicate with other components of the computer system/server 800 via the bus 806. Although not shown, it should be understood that other hardware and/or software components may be used with computer system/server 800. Examples include, but are not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems.

加えて、リンクド・データベースに対するクラッシュ回復を有するリンクド・データベース・システム７００がバス・システム８０６に取り付けられてもよい。 In addition, a linked database system 700 with crash recovery for the linked database may be attached to the bus system 806.

本発明のさまざまな実施形態の説明は例示の目的のために提供されたものであるが、開示される実施形態に対して網羅的または限定的になることは意図されていない。記載される実施形態の範囲から逸脱することなく、当業者には多くの修正および変更が明らかになるだろう。本明細書で使用される用語は、実施形態の原理、実際の適用、または市場で見出される技術に対する技術的改善を最もよく説明するため、または他の当業者が本明細書で開示される実施形態を理解できるようにするために選択されたものである。 The descriptions of various embodiments of the present invention have been provided for illustrative purposes and are not intended to be exhaustive or limiting to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope of the described embodiments. The terminology used herein has been selected to best explain the principles of the embodiments, practical applications, or technical improvements over technology found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.

本発明はシステム、方法、もしくはコンピュータ・プログラム製品、またはその組み合わせとして具現化されてもよい。コンピュータ・プログラム製品は、プロセッサに本発明の態様を実行させるためのコンピュータ可読プログラム命令を有するコンピュータ可読記憶媒体（または複数の媒体）を含んでもよい。 The present invention may be embodied as a system, method, or computer program product, or a combination thereof. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions for causing a processor to perform aspects of the present invention.

媒体は、伝播媒体のための電子、磁気、光学、電磁気、赤外、または半導体のシステムであってもよい。コンピュータ可読媒体の例は半導体または固体メモリ、磁気テープ、取り外し可能コンピュータ・ディスケット、ランダム・アクセス・メモリ（ＲＡＭ）、リード・オンリ・メモリ（ＲＯＭ：ｒｅａｄ－ｏｎｌｙｍｅｍｏｒｙ）、剛性磁気ディスク、および光ディスクを含んでもよい。光ディスクの現在の例は、コンパクト・ディスク・リード・オンリ・メモリ（ＣＤ－ＲＯＭ：ｃｏｍｐａｃｔｄｉｓｋ－ｒｅａｄｏｎｌｙｍｅｍｏｒｙ）、コンパクト・ディスク読取り／書込み（ＣＤ－Ｒ／Ｗ：ｃｏｍｐａｃｔｄｉｓｋ－ｒｅａｄ／ｗｒｉｔｅ）、ＤＶＤ、およびＢｌｕ－Ｒａｙ（Ｒ）ディスクを含む。 The medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system for propagation media. Examples of computer-readable media may include semiconductor or solid-state memory, magnetic tape, removable computer diskettes, random access memory (RAM), read-only memory (ROM), rigid magnetic disks, and optical disks. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W), DVDs, and Blu-Ray® disks.

コンピュータ可読記憶媒体は、命令実行デバイスによって使用するための命令を保持および記憶できる有形デバイスであり得る。コンピュータ可読記憶媒体は、たとえば電子ストレージ・デバイス、磁気ストレージ・デバイス、光ストレージ・デバイス、電磁気ストレージ・デバイス、半導体ストレージ・デバイス、または前述の任意の好適な組み合わせなどであってもよいが、それに限定されない。コンピュータ可読記憶媒体のより具体的な例の非網羅的なリストは以下を含む。ポータブル・コンピュータ・ディスケット、ハード・ディスク、ランダム・アクセス・メモリ（ＲＡＭ）、リード・オンリ・メモリ（ＲＯＭ）、消去可能プログラマブル・リード・オンリ・メモリ（ｅｒａｓａｂｌｅｐｒｏｇｒａｍｍａｂｌｅｒｅａｄ－ｏｎｌｙｍｅｍｏｒｙ）（ＥＰＲＯＭまたはフラッシュ・メモリ）、スタティック・ランダム・アクセス・メモリ（ＳＲＡＭ：ｓｔａｔｉｃｒａｎｄｏｍａｃｃｅｓｓｍｅｍｏｒｙ）、ポータブル・コンパクト・ディスク・リード・オンリ・メモリ（ＣＤ－ＲＯＭ）、デジタル多用途ディスク（ＤＶＤ：ｄｉｇｉｔａｌｖｅｒｓａｔｉｌｅｄｉｓｋ）、メモリ・スティック、フレキシブル・ディスク、機械的にコード化されたデバイス、たとえばパンチ・カードまたは記録された命令を有する溝の中の隆起構造体など、および前述の任意の好適な組み合わせ。本明細書において用いられるコンピュータ可読記憶媒体は、たとえば電波もしくはその他の自由に伝播する電磁波、導波路もしくはその他の伝送媒体を通じて伝播する電磁波（例、光ファイバ・ケーブルを通過する光パルス）、またはワイヤを通じて伝送される電気信号など、それ自体が一時的な信号であると解釈されるべきではない。 A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. A computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of computer-readable storage media includes the following: Portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanically encoded devices such as punch cards or raised structures in grooves with recorded instructions, and the like, and any suitable combination of the foregoing. 
As used herein, computer-readable storage media should not be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses passing through a fiber optic cable), or electrical signals transmitted through wires.

本明細書に記載されるコンピュータ可読プログラム命令は、コンピュータ可読記憶媒体からそれぞれのコンピューティング／処理デバイスにダウンロードされ得るか、あるいはたとえばインターネット、ローカル・エリア・ネットワーク、広域ネットワーク、もしくはワイヤレス・ネットワーク、またはその組み合わせなどのネットワークを介して外部コンピュータまたは外部ストレージ・デバイスにダウンロードされ得る。ネットワークは銅伝送ケーブル、光伝送ファイバ、ワイヤレス伝送、ルータ、ファイアウォール、スイッチ、ゲートウェイ・コンピュータ、もしくはエッジ・サーバ、またはその組み合わせを含んでもよい。各コンピューティング／処理デバイス内のネットワーク・アダプタ・カードまたはネットワーク・インターフェースは、ネットワークからコンピュータ可読プログラム命令を受信して、そのコンピュータ可読プログラム命令をそれぞれのコンピューティング／処理デバイス内のコンピュータ可読記憶媒体に記憶するために転送する。 The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to each computing/processing device, or may be downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, or a wireless network, or a combination thereof. The network may include copper transmission cables, optical fiber transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, or edge servers, or a combination thereof. A network adapter card or network interface within each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions to a computer-readable storage medium within the respective computing/processing device for storage.

本発明の動作を実行するためのコンピュータ可読プログラム命令はアセンブラ命令、命令セット・アーキテクチャ（ＩＳＡ：ｉｎｓｔｒｕｃｔｉｏｎ－ｓｅｔ－ａｒｃｈｉｔｅｃｔｕｒｅ）命令、マシン命令、マシン依存命令、マイクロコード、ファームウェア命令、状態設定データ、または１つ以上のプログラミング言語の任意の組み合わせで書かれたソース・コードもしくはオブジェクト・コードであってもよく、このプログラミング言語はオブジェクト指向プログラミング言語、たとえばＳｍａｌｌｔａｌｋ、またはＣ＋＋など、および従来の手続き型プログラミング言語、たとえば「Ｃ」プログラミング言語または類似のプログラミング言語などを含む。コンピュータ可読プログラム命令は、すべてがユーザのコンピュータで実行されてもよいし、スタンドアロン・ソフトウェア・パッケージとして部分的にユーザのコンピュータで実行されてもよいし、一部がユーザのコンピュータで、一部がリモート・コンピュータで実行されてもよいし、すべてがリモート・コンピュータまたはサーバで実行されてもよい。後者のシナリオにおいて、リモート・コンピュータは、ローカル・エリア・ネットワーク（ＬＡＮ）または広域ネットワーク（ＷＡＮ）を含む任意のタイプのネットワークを通じてユーザのコンピュータに接続されてもよいし、（たとえば、インターネット・サービス・プロバイダを用いてインターネットを通じて）外部コンピュータへの接続が行われてもよい。いくつかの実施形態において、たとえばプログラマブル・ロジック回路、フィールド・プログラマブル・ゲート・アレイ（ＦＰＧＡ：ｆｉｅｌｄ－ｐｒｏｇｒａｍｍａｂｌｅｇａｔｅａｒｒａｙｓ）、またはプログラマブル・ロジック・アレイ（ＰＬＡ：ｐｒｏｇｒａｍｍａｂｌｅｌｏｇｉｃａｒｒａｙｓ）などを含む電子回路は、本発明の態様を行うために電子回路をパーソナライズするためのコンピュータ可読プログラム命令の状態情報を使用することによって、コンピュータ可読プログラム命令を実行してもよい。 The computer-readable program instructions for carrying out the operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., over the Internet using an Internet Service Provider). 
In some embodiments, electronic circuitry, including, for example, programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), may execute computer-readable program instructions by using state information of the computer-readable program instructions to personalize the electronic circuitry to perform aspects of the present invention.

本明細書においては、本発明の実施形態による方法、装置（システム）、およびコンピュータ・プログラム製品のフローチャート図もしくはブロック図またはその両方を参照して、本発明の態様を説明している。フローチャート図もしくはブロック図またはその両方の各ブロック、およびフローチャート図もしくはブロック図またはその両方におけるブロックの組み合わせは、コンピュータ可読プログラム命令によって実装され得ることが理解されるだろう。 Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

これらのコンピュータ可読プログラム命令は、汎用目的のコンピュータか、特定目的のコンピュータか、またはマシンを生成するためのその他のプログラマブル・データ処理装置のプロセッサに提供されることによって、そのコンピュータまたはその他のプログラマブル・データ処理装置のプロセッサを介して実行される命令が、フローチャートもしくはブロック図またはその両方の単数または複数のブロックにおいて指定される機能／動作を実装するための手段を生じてもよい。これらのコンピュータ可読プログラム命令は、コンピュータ、プログラマブル・データ処理装置、もしくはその他のデバイス、またはその組み合わせに特定の方式で機能するように指示できるコンピュータ可読記憶媒体にも記憶されることによって、命令が記憶されたコンピュータ可読記憶媒体が、フローチャートもしくはブロック図またはその両方の単数または複数のブロックにおいて指定される機能／動作の態様を実装する命令を含む製造物を含んでもよい。 These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to create a machine, such that the instructions, executed by the processor of the computer or other programmable data processing apparatus, cause means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored on a computer-readable storage medium that can instruct a computer, programmable data processing apparatus, or other device, or combination thereof, to function in a particular manner, such that the computer-readable storage medium on which the instructions are stored includes instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or another device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the block or blocks of the flowcharts and/or block diagrams.

The flowcharts and/or block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special-purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements, as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the invention. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Briefly, the inventive concept may be summarized in the following items:

1. A computer-implemented method for crash recovery for linked databases, wherein the linked databases include a source database and an associated target database, and wherein selected queries against a database management system comprising the source database are transferred for processing to a database management system comprising the target database, the method comprising:
- synchronizing selected portions of the contents of tables of the source database with respective portions of the contents of tables of the target database;
- applying, during the synchronizing, changes made to the source database to an in-memory target database portion of the database management system comprising the target database;
- asynchronously storing persistently applied changes to the in-memory target database portion in a persistent target database storage;
- upon a database crash of the target database system, restoring the in-memory target database portion using the latest snapshot available in the persistent target database storage; and
- upon a database crash of the target database system, applying to the in-memory target database portion changes from the source database recovery log file that have a timestamp later than the latest snapshot available in the persistent target database storage.
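
The two recovery steps of item 1 (restore from the latest snapshot, then replay newer source log entries) can be illustrated with a minimal Python sketch. The `LogEntry` and `Snapshot` types, the integer timestamps, and the convention that a `None` value stands for a delete are assumptions made for illustration only and do not come from the specification; the sketch also assumes that at least one snapshot exists.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    timestamp: int   # position of the change in the source recovery log (assumed integer)
    table: str
    key: str
    value: object    # None stands for a delete in this sketch


@dataclass
class Snapshot:
    timestamp: int                          # last log timestamp covered by the snapshot
    tables: dict = field(default_factory=dict)


def recover_in_memory(snapshots, source_log):
    """Rebuild the in-memory target portion after a crash (hypothetical helper)."""
    # Step 1: restore from the latest snapshot found in persistent storage.
    latest = max(snapshots, key=lambda s: s.timestamp)
    memory = {name: dict(rows) for name, rows in latest.tables.items()}
    # Step 2: replay only the source log entries newer than that snapshot.
    for entry in sorted(source_log, key=lambda e: e.timestamp):
        if entry.timestamp <= latest.timestamp:
            continue
        rows = memory.setdefault(entry.table, {})
        if entry.value is None:
            rows.pop(entry.key, None)
        else:
            rows[entry.key] = entry.value
    return memory


snapshots = [
    Snapshot(10, {"orders": {"a": 1}}),
    Snapshot(20, {"orders": {"a": 1, "b": 2}}),
]
source_log = [
    LogEntry(15, "orders", "b", 2),      # already contained in the snapshot at 20
    LogEntry(25, "orders", "a", 5),
    LogEntry(30, "orders", "b", None),
    LogEntry(35, "items", "x", 9),
]
recovered = recover_in_memory(snapshots, source_log)
```

In this sketch the entry with timestamp 15 is skipped because the snapshot taken at timestamp 20 already reflects it; only the three later entries are replayed.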

2. The method according to item 1, wherein the synchronizing comprises:
- reading entries of a recovery log file relating to the source database, and applying the read entries to the target database.

3. The method according to item 1 or 2, wherein the source database is optimized for transactions, and/or wherein the source database is a row-oriented relational database management system.

4. The method according to any of the preceding items, wherein the target database is optimized for analytical operations, and/or wherein the target database is a column-oriented database.

5. The method according to any of the preceding items, further comprising:
- in the event of a crash of the target database, delaying queries against the target database until recovery of the target database has finished.
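
One possible way to delay queries during recovery, as described in item 5, is to gate them on a flag that is cleared when recovery begins and set again when it ends. The following Python sketch uses `threading.Event` for this; the class `TargetDatabase` and its method names are hypothetical and serve only to illustrate the idea.

```python
import threading


class TargetDatabase:
    """Toy target database whose queries wait while recovery is running."""

    def __init__(self):
        self._ready = threading.Event()
        self._ready.set()            # available until a crash is reported
        self._tables = {}

    def begin_recovery(self):
        self._ready.clear()          # from now on, queries are held back

    def finish_recovery(self, tables):
        self._tables = tables
        self._ready.set()            # release all waiting queries

    def query(self, table, timeout=None):
        # Block (delay) the query while recovery is in progress.
        if not self._ready.wait(timeout):
            raise TimeoutError("target database is still recovering")
        return self._tables.get(table, {})


db = TargetDatabase()
db.begin_recovery()
results = []
worker = threading.Thread(target=lambda: results.append(db.query("orders")))
worker.start()                       # the query is issued while recovery is running
db.finish_recovery({"orders": {"a": 1}})
worker.join()
```

The query thread returns only after `finish_recovery` has been called, so it always sees the recovered table contents.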

6. The method according to any of the preceding items, wherein metadata defining the selected tables is part of the recovery log file.

7. The method according to any of the preceding items, wherein storing the persistently applied changes comprises:
- waiting until a predetermined number of changes has been completed in the in-memory target database portion.
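
The combination of asynchronous persistence (item 1) with waiting for a predetermined number of changes (item 7) might look like the following Python sketch. The class `AsyncPersister`, the background writer thread, and the list that stands in for persistent storage are all illustrative assumptions, not the mechanism disclosed in the specification.

```python
import copy
import queue
import threading


class AsyncPersister:
    def __init__(self, batch_size):
        self.batch_size = batch_size     # the "predetermined number" of changes
        self.memory = {}                 # in-memory target database portion
        self.persistent = []             # stands in for persistent storage
        self._pending = 0
        self._queue = queue.Queue()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def apply_change(self, timestamp, key, value):
        # Apply to memory right away; replication does not wait for storage.
        self.memory[key] = value
        self._pending += 1
        if self._pending >= self.batch_size:
            # Hand a copy to the background writer and carry on.
            self._queue.put((timestamp, copy.deepcopy(self.memory)))
            self._pending = 0

    def _write_loop(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:         # sentinel: stop the writer
                    return
                self.persistent.append(item)   # a real system would write to disk
            finally:
                self._queue.task_done()

    def close(self):
        self._queue.put(None)
        self._queue.join()               # wait until the writer has drained the queue


persister = AsyncPersister(batch_size=3)
for ts in range(1, 8):
    persister.apply_change(ts, f"k{ts}", ts)
persister.close()
```

With seven changes and a batch size of three, snapshots are written after the third and sixth change. The seventh change exists only in memory, which is exactly the kind of change that would have to be replayed from the source recovery log after a crash.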

8. The method according to any of the preceding items, wherein restoring tables of the in-memory target database portion comprises prioritizing recovery by one selected from the group consisting of data usage, query priority, and data priority.

9. The method according to item 8, wherein prioritizing recovery by data usage comprises:
- maintaining a counter for each table in the target database, wherein the counter value indicates how many queries are waiting for the associated table; and
- restoring the database table having the highest counter value first.
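
The counter-based ordering of item 9 can be sketched in a few lines of Python. The names `waiting`, `register_waiting_query`, and `restore_order` are hypothetical; the sketch assumes that each waiting query reports the tables it needs.

```python
from collections import Counter

waiting = Counter()   # table name -> number of queries waiting for that table


def register_waiting_query(tables):
    """Increase the counter of every table a delayed query depends on."""
    for table in tables:
        waiting[table] += 1


def restore_order(all_tables):
    # Highest counter first; tables no query waits for come last.
    return sorted(all_tables, key=lambda t: -waiting[t])


register_waiting_query(["orders", "customers"])
register_waiting_query(["orders"])
register_waiting_query(["orders", "items"])
register_waiting_query(["items"])
order = restore_order(["customers", "items", "orders", "audit"])
```

Here `orders` has three waiting queries, `items` two, `customers` one, and `audit` none, so the tables are restored in that order.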

10. The method according to item 8 or 9, wherein prioritizing recovery by query priority comprises:
- restoring first the database table that receives the query having the highest priority.
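
Prioritizing by query priority (item 10) can be sketched similarly. The function name, the representation of a waiting query as a `(priority, tables)` pair, and the convention that a larger number means a higher priority are assumptions made for this example.

```python
def restore_order_by_query_priority(waiting_queries, all_tables):
    """waiting_queries: list of (priority, tables); a larger number means higher priority."""
    best = {}   # table -> highest priority of any query waiting for it
    for priority, tables in waiting_queries:
        for table in tables:
            best[table] = max(best.get(table, priority), priority)
    # Tables without any waiting query sort to the end.
    return sorted(all_tables, key=lambda t: -best.get(t, float("-inf")))


queries = [(1, ["a", "b"]), (5, ["c"]), (3, ["b"])]
priority_order = restore_order_by_query_priority(queries, ["a", "b", "c", "d"])
```

Table `c` is restored first because the query with priority 5 waits for it, followed by `b` (priority 3), `a` (priority 1), and finally `d`, for which no query is waiting.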

11. The method according to any of items 8 to 10, wherein prioritizing recovery by data priority comprises:
- maintaining two groups of database tables, wherein each group relates to a separate group of users; and
- restoring first the database tables of the group having the higher configured group priority.
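
For the data-priority variant of item 11, a minimal sketch only needs a mapping from tables to groups and a configured priority per group. The group names, table names, and numeric priorities below are invented for illustration.

```python
def restore_order_by_group(table_group, group_priority, all_tables):
    """Order tables so that those of the higher-priority group come first."""
    return sorted(all_tables, key=lambda t: -group_priority[table_group[t]])


# Hypothetical configuration: two table groups, each belonging to a user group.
table_group = {"sales": "finance", "ledger": "finance", "logs": "ops"}
group_priority = {"finance": 2, "ops": 1}   # larger number = restored earlier
group_order = restore_order_by_group(
    table_group, group_priority, ["logs", "sales", "ledger"]
)
```

Because Python's `sorted` is stable, tables within the same group keep their original relative order.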

12. The method according to any of the preceding items, further comprising:
- determining the data volume to be recovered for the next table to be recovered; and
- recovering the table using a recovery strategy that depends on the volume to be recovered, wherein the recovery strategy is an incremental update strategy or a bulk update strategy.
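
The volume-dependent choice in item 12 can be illustrated as follows. The item does not state how the volume maps to a strategy; the sketch assumes that a large share of changed rows favors a bulk update, and the threshold `bulk_ratio` and the function name are hypothetical.

```python
def choose_recovery_strategy(changes_to_replay, table_row_count, bulk_ratio=0.2):
    """Pick 'incremental' or 'bulk' from the volume to be recovered (assumed rule)."""
    if table_row_count == 0:
        return "bulk"                    # nothing to patch; simply reload the table
    ratio = changes_to_replay / table_row_count
    # Assumption: replaying a few changes is cheaper than reloading the table,
    # while many changes are cheaper to handle with a bulk reload.
    return "bulk" if ratio > bulk_ratio else "incremental"


small = choose_recovery_strategy(10, 1000)
large = choose_recovery_strategy(500, 1000)
```

A table with 10 outstanding changes out of 1000 rows is patched incrementally, whereas 500 outstanding changes trigger a bulk update under the assumed 20 percent threshold.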

13. A linked database system with crash recovery for linked databases, wherein the linked databases include a source database and an associated target database, and wherein selected portions of the contents of tables of the source database are synchronized with respective portions of the contents of tables of the target database, the linked database system comprising:
- a processor and a memory communicatively coupled to the processor, wherein the memory stores program code portions that, when executed, enable the processor to:
- synchronize selected portions of the contents of the tables of the source database with respective portions of the contents of the tables of the target database;
- apply, during the synchronization, changes made to the source database to an in-memory target database portion of a database management system comprising the target database;
- asynchronously store persistently applied changes to the in-memory target database portion in a persistent target database storage;
- upon a database crash of the target database system, restore the in-memory target database portion using the latest snapshot available in the persistent target database storage; and
- upon a database crash of the target database system, apply to the in-memory target database portion changes from the source database recovery log file that have a timestamp later than the latest snapshot available in the persistent target database storage.

14. The linked database system according to item 13, wherein the program code portions further enable the processor to:
- read, for the synchronization, entries of a recovery log file relating to the source database, and apply the read entries to the target database.

15. The linked database system according to item 13 or 14, wherein the source database is optimized for transactions, and/or wherein the source database is a row-oriented relational database management system.

16. The linked database system according to any of items 13 to 15, wherein the target database is optimized for analytical operations, and/or wherein the target database is a column-oriented database.

17. The linked database system according to any of items 13 to 16, wherein the program code portions further enable the processor to:
- in the event of a crash of the target database, delay queries against the target database until recovery of the target database has finished.

18. The linked database system according to any of items 13 to 17, wherein metadata defining the selected tables is part of the recovery log file.

19. The linked database system according to any of items 13 to 18, wherein the program code portions further enable the processor to:
- wait, for storing the persistently applied changes, until a predetermined number of changes has been completed in the in-memory target database portion.

20. The linked database system according to any of items 13 to 19, wherein restoring tables of the in-memory target database portion comprises prioritizing recovery by one selected from the group consisting of data usage, query priority, and data priority.

21. The linked database system according to item 20, wherein, for prioritizing recovery by data usage, the program code portions further enable the processor to:
- maintain a counter for each table in the target database, wherein the counter value indicates how many queries are waiting for the associated table; and
- restore the database table having the highest counter value first.

22. The linked database system according to item 20 or 21, wherein, for prioritizing recovery by query priority, the program code portions further enable the processor to:
- restore first the database table that receives the query having the highest priority.

23. The linked database system according to any of items 20 to 22, wherein, for prioritizing recovery by data priority, the program code portions further enable the processor to:
- maintain two groups of database tables, wherein each group relates to a separate group of users; and
- restore first the database tables of the group having the higher configured group priority.

24. The linked database system according to any of items 13 to 15, wherein the program code portions further enable the processor to:
- determine the data volume to be recovered for the next table to be recovered; and
- recover the table using a recovery strategy that depends on the volume to be recovered, wherein the recovery strategy is an incremental update strategy or a bulk update strategy.

25. A computer program product for a linked database system with crash recovery for linked databases, wherein the linked databases include a source database and an associated target database, and wherein selected portions of the contents of tables of the source database are synchronized with respective portions of the contents of tables of the target database, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions being executable by one or more computer systems or controllers to cause the one or more computer systems to:
- synchronize selected portions of the contents of the tables of the source database with respective portions of the contents of the tables of the target database;
- apply, during the synchronization, changes made to the source database to an in-memory target database portion of a database management system comprising the target database;
- asynchronously store persistently applied changes to the in-memory target database portion in a persistent target database storage;
- upon a database crash of the target database system, restore the in-memory target database portion using the latest snapshot available in the persistent target database storage; and
- upon a database crash of the target database system, apply to the in-memory target database portion changes from the source database recovery log file that have a timestamp later than the latest snapshot available in the persistent target database storage.

Claims

1. A computer-implemented method for crash recovery for linked databases, the linked databases including a source database and an associated target database, wherein selected queries against a database management system including the source database are transferred for processing to a database management system including the target database, the method comprising:
synchronizing selected portions of the contents of tables of the source database with respective portions of the contents of tables of the target database;
applying, during the synchronizing, changes made to the source database to an in-memory target database portion of the database management system that includes the target database;
asynchronously storing persistently applied changes to the in-memory target database portion in a persistent target database storage;
upon a crash of the target database, restoring the in-memory target database portion using the latest snapshot available in the persistent target database storage; and
upon the crash of the target database, applying to the in-memory target database portion changes from a recovery log file of the source database that have a timestamp later than the latest snapshot available in the persistent target database storage.

2. The method of claim 1, wherein the synchronizing comprises reading entries of a recovery log file relating to the source database and applying the read entries to the target database.

3. The method of claim 1, wherein the source database is optimized for transactions or is a row-oriented relational database management system.

4. The method of claim 1, wherein the target database is optimized for analytical operations or is a column-oriented database.

5. The method of claim 1, further comprising, upon the crash of the target database, delaying queries against the target database until recovery of the target database is complete.

6. The method of claim 1, wherein metadata defining the selected tables is part of the recovery log file.

7. The method of claim 1, wherein storing the persistently applied changes comprises waiting until a predetermined number of changes has been completed in the in-memory target database portion.

8. The method of claim 1, wherein restoring tables of the in-memory target database portion comprises prioritizing recovery from the crash by one selected from the group consisting of data usage, query priority, and data priority.

9. The method of claim 8, wherein prioritizing the recovery by the data usage comprises:
maintaining a counter for each table in the target database, the counter value indicating how many queries are waiting for the associated table; and
restoring the database table having the highest counter value first.

10. The method of claim 8, wherein prioritizing the recovery by the query priority comprises restoring first the database table that receives a query having the highest priority.

11. The method of claim 8, wherein prioritizing the recovery by the data priority comprises:
maintaining two groups of database tables, each group relating to a separate group of users; and
restoring first the database tables of the group having the higher configured group priority.

12. The method of claim 1, further comprising:
determining a data volume to be recovered for the next table to be recovered; and
recovering the table using a recovery strategy that depends on the volume to be recovered, the recovery strategy being an incremental update strategy or a bulk update strategy that recovers via a bulk load mechanism.

13. A linked database system, the linked databases including a source database and an associated target database, wherein selected portions of the contents of tables of the source database are synchronized with respective portions of the contents of tables of the target database, the linked database system comprising:
a processor and a memory communicatively coupled to the processor, wherein the processor:
synchronizes selected portions of the contents of the tables of the source database with respective portions of the contents of the tables of the target database;
applies, during the synchronization, changes made to the source database to an in-memory target database portion of a database management system that includes the target database;
asynchronously stores persistently applied changes to the in-memory target database portion in a persistent target database storage;
upon a crash of the target database, restores the in-memory target database portion using the latest snapshot available in the persistent target database storage; and
upon the crash of the target database, applies to the in-memory target database portion changes from a recovery log file of the source database that have a timestamp later than the latest snapshot available in the persistent target database storage.

14. The linked database system of claim 13, wherein the synchronizing comprises reading entries of the recovery log file relating to the source database and applying the read entries to the target database.

15. The linked database system of claim 13, wherein the source database is optimized for transactions or is a row-oriented relational database management system.

16. The linked database system of claim 13, wherein the target database is optimized for analytical operations or is a column-oriented database.

17. The linked database system of claim 13, wherein, upon the crash of the target database, the processor delays queries against the target database until recovery of the target database is complete.

18. The linked database system of claim 13, wherein metadata defining the selected tables is part of the recovery log file.

19. The linked database system of claim 13, wherein storing the persistently applied changes comprises waiting until a predetermined number of changes has been completed in the in-memory target database portion.

20. The linked database system of claim 13, wherein restoring tables of the in-memory target database portion comprises prioritizing the crash recovery by one selected from the group consisting of data usage, query priority, and data priority.

21. The linked database system of claim 20, wherein prioritizing the recovery by the data usage comprises:
maintaining a counter for each table in the target database, the counter value indicating how many queries are waiting for the associated table; and
restoring the database table having the highest counter value first.

22. The linked database system of claim 20, wherein prioritizing the recovery by the query priority comprises restoring first the database table that receives a query having the highest priority.

23. The linked database system of claim 20, wherein prioritizing the recovery by the data priority comprises:
maintaining two groups of database tables, each group relating to a separate group of users; and
restoring first the database tables of the group having the higher configured group priority.

24. The linked database system of claim 13, wherein the processor further:
determines a data volume to be recovered for the next table to be recovered; and
recovers the table using a recovery strategy that depends on the volume to be recovered, the recovery strategy being an incremental update strategy or a bulk update strategy that recovers via a bulk load mechanism.

25. A computer program for crash recovery for linked databases, the computer program causing a processor to carry out the method according to any one of claims 1 to 12.