JP4528307B2

JP4528307B2 - Dynamic performance monitoring based approach to memory management

Info

Publication number: JP4528307B2
Application number: JP2006542904A
Authority: JP
Inventors: Ali-Reza Adl-Tabatabai; Sreenivas Subramoney; Richard Hudson; Mauricio Serrano
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2003-12-31
Filing date: 2004-12-24
Publication date: 2010-08-18
Anticipated expiration: 2024-12-24
Also published as: WO2005066791A1; CN1902598A; JP2007513437A; US20060143421A1; EP1702269A1; US7490117B2; EP1702269B1; CN100549982C

Description

The present disclosure relates generally to memory management in a processor-based system and, more particularly, to apparatus and techniques for optimizing memory management.

A well-known performance gap exists between microprocessor speed and memory performance. Microprocessor clock speeds double every few years, while memory speeds barely improve. A microprocessor may operate at GHz clock speeds, but the random access memory (RAM) it uses has a clock speed that is at least an order of magnitude slower. Consumers may intuitively understand that the performance gap affects not only mass storage such as hard drives and CD-ROM storage but also faster memory such as RAM and cache. Furthermore, the performance gap stems not only from clock speed but also from latency problems and memory stalls.

Computer systems use multiple memory levels to address the performance gap. Each level is closer to the registers and offers them lower latency. Level 1 memory is a relatively small, extremely fast memory, typically located on the microprocessor chip, that stores low-level instructions and data. Level 2 cache memory is a larger memory that is also located on the microprocessor. Additional levels of cache memory are also possible. These memories are typically much smaller than RAM but much faster.

Unfortunately, level 1 and level 2 cache memories suffer from latency problems and memory stalls while waiting for data from RAM. Larger cache memories, for example, incur larger read and write latencies, more data translation lookaside buffer (DTLB) misses, and more cache misses. A DTLB is coupled to a cache and is used to help locate data in higher-level memory such as the cache.

Various techniques have been developed to improve memory performance. Examples include data prefetching, multithreaded code, dynamic instruction scheduling, speculative code execution, and cache-conscious data placement. These solutions attempt to address memory latency. Other solutions attempt to address memory allocation. For example, garbage collection algorithms were designed to reclaim unused memory regions in the heap and to organize existing memory objects more efficiently. More importantly, they free the programmer from managing the reclamation of unused memory.

Several garbage collection techniques exist, such as copying garbage collection, mark-and-sweep garbage collection, generational garbage collection, and sliding compaction. Sliding compaction is a popular garbage collection technique in which live memory objects are rewritten over the dead spaces in the memory heap while allocation order is maintained. The technique is particularly useful for object-oriented applications, such as those written in C# or Java®, and for frameworks such as the .NET Framework (originally developed by Microsoft Corporation of Redmond, Washington), some of which are used in server-based environments.

A garbage collection scheme searches the memory heap for areas that are unreachable and therefore reclaimable. A garbage collector that fragments memory by restricting where objects can be allocated can adversely affect object allocation times and cause more DTLB misses. Sliding compaction reduces the number of DTLB entries needed to support the running code set, because live objects in the managed heap are moved close together. A useful property of sliding compaction is that it does not disturb the spatial order in which objects were placed before compaction: it removes the intervening dead space while preserving that order. In-place compaction therefore actually improves spatial locality. Fewer DTLB misses lead to fewer CPU stalls and faster code. Moreover, reducing dead space can reduce cache misses.
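The order-preserving property of sliding compaction can be illustrated with a minimal sketch. It models one heap region as a Python list in which `None` marks a dead slot; the list model and the `slide_compact` name are assumptions made for illustration only and do not come from the disclosure.

```python
def slide_compact(region):
    """Slide live objects toward the start of the region, preserving order.

    `region` is a list of slots; None marks dead space (a modelling
    assumption). Returns a forwarding table {old_index: new_index} that a
    collector could use to update references to the moved objects.
    """
    forwarding = {}
    free = 0  # next slot that a live object may slide into
    for old, obj in enumerate(region):
        if obj is None:
            continue  # dead space: nothing to move
        if old != free:
            region[free] = obj
            region[old] = None
        forwarding[old] = free
        free += 1
    return forwarding


region = ["a", None, "b", None, None, "c"]
table = slide_compact(region)
```

After the call, the live objects sit contiguously at the start of the region in their original relative order, and all dead space is gathered at the end.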

Despite its performance benefits, sliding compaction is considerably more expensive than some other garbage collection routines and imposes significant space and time overhead in every phase of garbage collection. These problems are exacerbated by large heap sizes. Even incremental sliding compaction, in which only part of the heap is compacted during a given garbage collection cycle, cannot reach problem areas quickly enough, because many memory regions must wait many collection cycles before they are processed.

In the end, memory latency and stalls place a heavy burden on current memory management techniques. Software code spends a large amount of time on memory management regardless of the technique used. Problematic memory regions must be identified each time the code is executed, and reclamation of memory space within those regions is too imprecise for efficient code implementation, especially for large heaps.

FIG. 1 shows a block diagram of a central processing unit (CPU) with a memory performance monitor and a memory controller.

FIG. 2 shows the memory performance monitor of FIG. 1 in more detail.

FIG. 3 shows a flow diagram of an example of memory management optimization.

FIG. 4 shows a memory heap with two bad regions.

FIG. 5 shows an example of the optimization of one bad region of FIG. 4.

FIG. 6 shows an example of the optimization of the other bad region of FIG. 4.

FIG. 7 shows the memory heap of FIG. 4 after optimization of the bad regions.

FIG. 8 shows the memory heap of FIG. 4 after another optimization of the bad regions.

FIG. 9 shows a flow diagram of an example of code execution.

Various techniques are described for optimizing memory management within a processor system. By focusing on the results achieved by memory management, the execution of application code, that is, the mutator, in dynamically managed runtime environments such as Java® and .NET environments running on the processor system can be improved. The techniques may be implemented on a processor, or on a processor architecture, capable of monitoring performance through hardware monitoring. Example microprocessors include the Pentium® 4 (Precise Event Based Sampling) and Itanium® (Performance Monitoring Unit) processors available from Intel Corporation of Santa Clara, California. The techniques may also be implemented in dedicated processor environments, for example input/output (I/O) processors used in storage, networking, and embedded applications. In I/O applications such as servers, workstations, and storage subsystems, the techniques may be implemented to optimize memory management across a network of devices in order to optimize code execution and data flow. Examples include the i960® RM/RN/RS I/O processors and the IOP331 I/O processor built on the XScale® core microarchitecture, both available from Intel Corporation. Those skilled in the art will appreciate that these processors are examples and that the described techniques may be implemented on other processors.

FIG. 1 shows an example computer system 100 that includes a CPU 102 having a level 2 cache 104 and a level 1 cache 106. CPU 102 is coupled to a RAM 108 and a read-only memory (ROM) 110 via a memory bus 112. In the illustrated example, memory bus 112 is coupled to a system bus 114. Alternatively, memory bus 112 may itself be the system bus. Those skilled in the art will appreciate that the illustrated configuration is provided by way of example only.

CPU 102 may have a discrete arithmetic logic unit, a plurality of registers, and a control unit, all coupled together. Alternatively, as shown, CPU 102 may be an integrated microprocessor. CPU 102 has register blocks 115. Block 106, the level 1 cache, includes a data cache, an execution cache, and an instruction cache, each operating at processor speed. Level 2 cache 104 may be a known cache memory and may include a cache interface that transfers data on every clock cycle. The level 2 cache may reside on the CPU chip (box 102) or may be separate and coupled to it via a CPU bus.

CPU 102 has a data translation lookaside buffer (DTLB) 116 and an instruction translation lookaside buffer (ITLB) 117.

CPU 102 also has a performance monitoring unit (PMU) 118 that, as shown, is on the CPU chip or coupled to it. Suitable microprocessors that provide on-chip PMUs include the Pentium® 4 and Itanium® processors. CPU 102 may represent any processor or processor architecture capable of monitoring performance (for example, one with an external PMU).

System bus 114 is coupled to a network controller 120, a display unit controller 122, an input device 124, and a data storage/memory medium 126, such as a mass storage device. Examples of the various devices coupled to bus 114 are known. In the illustrated example, bus 114 is coupled to another bus 128 via a bus bridge 130.

The operating system running on processor 102 may be one of a variety of systems, for example one of the WINDOWS® family of systems available from Microsoft Corporation of Redmond, Washington, such as WINDOWS® 95, 98, 2000, ME, or XP. Alternatively, the operating system may be one of the UNIX® family of systems, originally developed by Bell Laboratories (now Bell Labs of Lucent Technologies) in Murray Hill, New Jersey, and available from various sources. As a further alternative, the operating system may be an open-source system such as the LINUX operating system. It will be appreciated that still other operating systems may be used.

Processor 102 executes memory management code, for example garbage collection, based on data from PMU 118. The code is used both for memory reclamation and for initial allocation. Many different garbage collection routines exist. For example, a reference-counting garbage collection program keeps track of the number of references to a particular memory region (for example, a block) and frees the region when no references to that memory location remain. A mark-and-sweep garbage collection program traces the objects reachable from the roots of the currently running threads and marks them. It then examines all objects and frees the memory used by objects that are not marked (that is, objects no longer reachable from the root of any running thread). A copying garbage collection program divides the available memory heap into two sections, or spaces, and at some point moves the objects that are reachable (transitively from the roots of the application threads) from the space currently in use ("FromSpace") to the space not currently in use ("ToSpace"). The application threads allocate objects in "ToSpace" until it is full. The copying garbage collection program then reclaims "FromSpace" by reversing the roles of the two spaces: the old "FromSpace" becomes the new "ToSpace", and the old "ToSpace" becomes the new "FromSpace".
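As a rough illustration of the mark-and-sweep routine described above, the following sketch traces a toy object graph from a set of roots and then frees everything left unmarked. The dictionary-based heap, the object names, and the `mark` and `sweep` helpers are hypothetical and chosen only to keep the example self-contained.

```python
def mark(roots, refs):
    """Return the set of objects reachable from `roots`.

    `refs` maps each object id to the ids it references (a toy object graph).
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(refs.get(obj, ()))
    return marked


def sweep(heap, marked):
    """Free every object in `heap` that was not marked; return the freed ids."""
    freed = sorted(obj for obj in heap if obj not in marked)
    for obj in freed:
        del heap[obj]
    return freed


heap = {"A": 16, "B": 32, "C": 8, "D": 64}  # object id -> size in bytes
refs = {"A": ["B"], "B": ["A"], "C": ["D"], "D": []}
freed = sweep(heap, mark(["A"], refs))
```

Objects `C` and `D` reference each other's neighborhood but are not reachable from the root `A`, so both are freed. The cycle between `A` and `B` is traced only once because already-marked objects are skipped.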

As a further alternative, a generational garbage collection program focuses on the section of the memory heap in which most recent memory allocations were made. It moves the objects in the focus area that are reachable from outside the focus area to a new area. To keep track of the objects reachable from outside the focus area, the generational garbage collection program may use a write barrier and a log in the form of a store buffer. The write barrier checks every write to determine whether an object outside the focus area references an object inside it. If so, the reference is recorded in the log. The garbage collection program later examines the log at memory reclamation and reallocation time to determine which objects in the focus area should be moved to the new area. The log can be encoded as a card table, a hash table, or a simple sequential buffer.
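The write barrier and log can be sketched as follows, using the simple sequential-buffer encoding of the log. The `GenerationalHeap` class, its method names, and the object identifiers are illustrative assumptions, not an API from the disclosure or from any real runtime.

```python
class GenerationalHeap:
    """Toy heap with a write barrier that logs references into the focus area."""

    def __init__(self, focus_ids):
        self.focus = set(focus_ids)  # objects in the focus area
        self.fields = {}             # (owner, field) -> target
        self.log = []                # sequential-buffer form of the log

    def write(self, owner, field, target):
        """Store a reference, running the write barrier first."""
        if owner not in self.focus and target in self.focus:
            self.log.append((owner, field, target))
        self.fields[(owner, field)] = target

    def externally_referenced(self):
        """Focus-area objects still referenced from outside, per the log."""
        return {
            target
            for (owner, field, target) in self.log
            if self.fields.get((owner, field)) == target  # skip overwritten slots
        }


heap = GenerationalHeap({"y1", "y2"})
heap.write("old1", "x", "y1")  # outside -> inside: logged
heap.write("y1", "next", "y2")  # inside -> inside: not logged
heap.write("old2", "x", "y2")  # outside -> inside: logged
heap.write("old2", "x", None)  # slot overwritten; log entry is now stale
roots = heap.externally_referenced()
```

Only the logged slots need to be examined at collection time, which avoids scanning the whole heap. Stale log entries are filtered by re-reading the slot, which is one possible design; a card table would instead mark coarse-grained heap ranges as dirty.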

Another example garbage collection routine is the sliding compaction described generally above. Other known techniques include beltway collection, oldest-first collection, and hybrid collection that combines any number of the garbage collection routines above. An oldest-first collector focuses on collecting the oldest objects in the system rather than the newest, as is typical of generational collectors. A beltway collector uses a round-robin approach to look for areas with a high death rate. When it finds one, it focuses collection on that area. Collectors may be concurrent or incremental. Concurrent means that they can run in parallel with application code. Incremental means that they reclaim only some of the dead objects during each GC cycle.

Unlike conventional garbage collection routines, system 100 relies on data from PMU 118 to focus garbage collection. FIG. 2 shows PMU 118 in more detail. PMU 118 has control logic 150, a plurality of counters 152, and a plurality of registers 154. PMU 118 may be on-chip hardware that monitors individual events during code execution. Counters 152 include dedicated programmable event counters capable of monitoring memory performance, such as global timestamp counters and DTLB counters that track DTLB misses and examine the memory references that cause them. The dedicated programmable event counters may monitor events in level 1 and level 2 memories 106 and 104 as well as events in any DTLB. PMU 118 may be extended to monitor events in RAM 108 and/or mass storage memories via memory bus 112 or system bus 114. In a networked system, PMU 118 may provide monitored data remotely via network controller 120.

PMU 118 may monitor any memory performance event. Example events include instruction cache misses, data cache misses, branch mispredictions, ITLB misses, DTLB misses, stalls due to data dependencies, and data cache write-backs.

The events to be monitored are specified by event registers 154, which control counters 152 so that the desired memory performance events are counted incrementally. Each register in register block 154 may control several counters in counter block 152. By way of example only, 32-bit counters and 32-bit or 64-bit registers, respectively, may be used.

PMU 118 monitors the entire memory system associated with processor 102 and counts the number of events specified in control registers 154. Events can occur during the execution of various code instructions and include attempts to read from and write to cache 104 or the DTLB. In object-oriented languages such as those mentioned above, as well as in environments such as .NET, stored objects may be associated with other stored objects and may be usable by other code. Associated stored objects have temporal locality; for example, they may be accessed by code in immediate succession, which makes spatial locality within the heap desirable. PMU 118 may monitor memory performance events to help the memory manager achieve such spatial locality. PMU 118 may monitor multiple events in parallel so that different memory events are counted simultaneously.

PMUs may function in different ways depending on the processor implementation, but in one implementation PMU 118 includes counters that count events such as data cache or DTLB misses and instruction cache or ITLB misses. PMU 118 includes a memory buffer for storing historical data indicating the number of such misses attributable to particular memory regions. PMU 118 or external code may control the size of the monitored memory regions. The data monitored by PMU 118 may be performance data for memory regions that are the size of an individual memory block or larger. As an example, the memory regions may be 64K in size.
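The per-region history can be pictured as a histogram keyed by region index. The sketch below assumes that "64K" means 65,536-byte regions aligned on 64K boundaries, so that the region index is simply the miss address shifted right by 16 bits; the function names and the sample addresses are invented for illustration.

```python
from collections import Counter

REGION_SHIFT = 16  # assumption: 64K regions are 2**16 bytes and 64K-aligned


def region_of(address):
    """Map a miss address to the index of the region that contains it."""
    return address >> REGION_SHIFT


def record_misses(sample_addresses):
    """Build the per-region miss history from sampled miss addresses."""
    misses = Counter()
    for addr in sample_addresses:
        misses[region_of(addr)] += 1
    return misses


samples = [0x0001_2340, 0x0001_FFF8, 0x0002_0000, 0x0001_0008]
histogram = record_misses(samples)
```

Three of the four sampled addresses fall in region 1 and one falls in region 2. A coarser or finer granularity only changes `REGION_SHIFT`.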

PMU 118 may be programmed to identify events as they occur. Alternatively, PMU 118 may be configured to interrupt monitoring temporarily and output the monitored data when the amount of data for a memory region reaches a threshold. The threshold is determined by code and may be set on the basis of past PMU monitoring, for example by comparing the buffered historical data for one monitored event against the buffered historical data for another monitored event, or may be set while monitoring is under way. Upon detecting a memory region that has reached the threshold for a memory event, PMU 118 determines that the region is a bad region. PMU 118 may be programmed to interrupt monitoring and output an identifier for the region for subsequent memory management. Alternatively, system 100 may determine that a memory region is a bad region through code outside PMU 118, based on monitoring data from PMU 118. PMU 118 may be used with stop-the-world or concurrent garbage collection; the latter allows the garbage collector to run in parallel with the execution of the code, that is, the mutator.
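The threshold-driven mode can be modelled in software as a monitor that invokes a callback the first time a region's event count reaches the threshold, standing in for the interrupt that outputs the region identifier. The `RegionMonitor` class, the callback, and the region identifiers are hypothetical; real PMU hardware would raise an interrupt rather than call a function.

```python
class RegionMonitor:
    """Software model of threshold-based bad-region reporting."""

    def __init__(self, threshold, on_bad_region):
        self.threshold = threshold          # events per region that trigger a report
        self.on_bad_region = on_bad_region  # stands in for the PMU interrupt
        self.counts = {}
        self.reported = set()

    def record_event(self, region_id):
        count = self.counts.get(region_id, 0) + 1
        self.counts[region_id] = count
        # Report each region at most once, when it first reaches the threshold.
        if count >= self.threshold and region_id not in self.reported:
            self.reported.add(region_id)
            self.on_bad_region(region_id)


bad = []
monitor = RegionMonitor(threshold=3, on_bad_region=bad.append)
for region_id in [7, 9, 7, 2, 7, 7]:
    monitor.record_event(region_id)
```

Region 7 reaches the threshold on its third event and is reported once; later events in the same region are still counted but do not trigger another report.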

FIG. 3 shows an example process 300 that uses PMU 118 to inform memory management. Process 300 may be implemented by software stored and executed on system 100. In the example shown, process 300 executes various software routines or steps, which are described with reference to blocks 302-314.

PMU 118 monitors memory operations in level 1 cache 106, level 2 cache 104, DTLB 116, and ITLB 117 and sends the monitored information to an effective address block 302, which identifies the effective address of each high-latency load miss, whether it is a cache miss or a DTLB miss. A high-latency miss requires data to be fetched from main memory or RAM. The effective address of a cache load miss identifies a memory object that is not in the cache. Block 302 supplies the load-miss effective addresses to a record data block 304, which maintains a frequency count for each memory region. Block 304 may be implemented by counters 152, by RAM 108, or by another storage medium such as mass storage. The memory regions may have any desired granularity, for example 64K. Block 304 passes control to a decision block 306, which determines whether PMU 118 has provided enough data samples for bad regions to be identified and the memory management code to be executed.

If not enough samples have been obtained, control passes to an increment block 308, which stores the total number of PMU samples obtained. Control then returns to PMU 118 for further monitoring of memory performance data. If block 306 determines that enough data samples have been collected, for example when the desired sample count is stored in block 308, the historical data from block 304 is provided to a block 310, which identifies bad regions of the memory heap. Block 310 may, for example, identify the region or regions of the memory heap in which 90% of the identified cache or DTLB misses occurred and mark them as bad. Block 310 may identify multiple bad regions by dividing the memory heap into sections and then determining, for each section, where a threshold number of miss locations is concentrated. The granularity of a bad region may be set by block 310 and may be the same as or different from the size of the memory regions originally monitored. That is, one bad region may include many memory regions having multiple miss locations.
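One way to read the 90% example is as a coverage rule: take regions in descending order of miss count until the chosen regions account for the target share of all recorded misses. The sketch below follows that reading, which is an interpretation rather than the stated method; `find_bad_regions`, its `coverage_percent` parameter, and the sample counts are assumptions.

```python
def find_bad_regions(miss_counts, coverage_percent=90):
    """Return the hottest regions that together cover the target share of misses.

    `miss_counts` maps region id -> number of recorded misses. Integer
    arithmetic is used for the comparison to avoid floating-point rounding.
    """
    total = sum(miss_counts.values())
    if total == 0:
        return []
    bad, covered = [], 0
    # Hottest regions first; ties are broken by region id for determinism.
    ranked = sorted(miss_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for region, count in ranked:
        if covered * 100 >= coverage_percent * total:
            break
        bad.append(region)
        covered += count
    return bad


counts = {0: 2, 1: 50, 2: 3, 7: 40, 9: 5}
bad_regions = find_bad_regions(counts)
```

With these counts, regions 1 and 7 together account for 90 of the 100 recorded misses, so only those two regions are marked as bad.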

The identified bad regions are provided to a memory management block 312 for heap optimization. Block 312 may execute one or more garbage collection routines, such as any of those described above. The routines may be executed on the bad regions only or on both bad and non-bad regions. For example, memory management block 312 may apply a default garbage collection algorithm to the non-bad regions of the memory heap and apply sliding compaction only to the bad regions, that is, the regions exhibiting excessively high memory stalls. In this way, process 300 applies a first memory management routine to the bad region or regions and a different, second memory management routine to the non-bad region or regions. In each of these examples, the sliding compaction garbage collector is directed to the most problematic areas of the heap. The infrastructure that may be used to support sliding compaction is described next.
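The selection of a first routine for bad regions and a second routine for the rest can be sketched as a simple dispatch. Everything here is a stand-in: regions are lists with `None` for dead space, `sliding_compaction` is a simplified compactor, and `default_collect` is a placeholder that leaves layout untouched, since the text does not specify the default algorithm.

```python
def sliding_compaction(region):
    """Pack live objects to the front of the region, keeping their order."""
    live = [obj for obj in region if obj is not None]
    region[:] = live + [None] * (len(region) - len(live))


def default_collect(region):
    """Placeholder for the runtime's default collector (no layout change here)."""
    pass


def manage_heap(regions, bad_ids, first_routine, second_routine):
    """Apply `first_routine` to bad regions and `second_routine` to the others."""
    applied = {}
    for region_id, region in regions.items():
        routine = first_routine if region_id in bad_ids else second_routine
        routine(region)
        applied[region_id] = routine.__name__
    return applied


regions = {
    1: ["p", None, "q"],
    2: ["a", None, "b", None, "c"],
}
applied = manage_heap(regions, {2}, sliding_compaction, default_collect)
```

Only region 2, the one flagged as bad, pays the cost of compaction; region 1 is handled by the cheaper default routine and keeps its layout.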

During the mark phase of garbage collection at block 312, all live objects are marked. In addition, to support sliding compaction in a later phase, the memory objects in the heap that point into the compaction regions, for example the bad regions, are recorded. As a result, during sliding compaction all compaction blocks are processed and their memory objects are packed together. After memory manager 312 has run, a block 314 synchronizes PMU data collection so that further collection of PMU data relates to the current heap configuration. As part of this synchronization, earlier samples are discarded.

Although blocks 302, 304, 306, 308, and 310 have been described separately, they may all be executed by the PMU 118.

FIG. 4 shows an example of a memory heap 400 divided into a plurality of memory areas 402-420. The number of memory areas is provided for illustrative purposes only. Memory areas 404 and 416 have been identified by block 310 as defective areas (indicated by hatched shading) because they have encountered a threshold number or percentage of the load misses for the heap 400. An example of the state of memory area 404 is shown in more detail. Memory objects 422, 424, and 426 are spaced apart in memory area 404 by dead spaces 428 and 430. Memory area 416 has two memory objects 432 and 434 separated by a dead space 436. Both memory objects are spaced from the start of memory area 416 by a dead space 438.
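The identification of areas that reach a threshold percentage of load misses can be sketched as follows. This is a minimal Python illustration that assumes equally sized areas and a list of sampled load-miss addresses; `find_defective_areas`, its parameters, and the sample addresses are hypothetical and not taken from the disclosure.

```python
from collections import Counter
from typing import Iterable, List


def find_defective_areas(
    miss_addresses: Iterable[int],
    heap_start: int,
    area_size: int,
    num_areas: int,
    min_share: float,
) -> List[int]:
    """Return indices of areas whose share of sampled load misses
    reaches min_share (a fraction between 0 and 1)."""
    counts: Counter = Counter()
    total = 0
    for addr in miss_addresses:
        index = (addr - heap_start) // area_size
        if 0 <= index < num_areas:   # ignore samples outside the heap
            counts[index] += 1
            total += 1
    if total == 0:
        return []
    return sorted(i for i, n in counts.items() if n / total >= min_share)


# Ten areas of 0x1000 bytes each; indices 1 and 7 play the roles of
# areas 404 and 416 in FIG. 4.
samples = [0x1100, 0x1180, 0x11F0, 0x7010, 0x7020, 0x7400, 0x3000, 0x9000]
defective = find_defective_areas(samples, heap_start=0x0, area_size=0x1000,
                                 num_areas=10, min_share=0.3)
```

An absolute count threshold could be used instead of a share by comparing `n` directly against a fixed number.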

Defective areas 404 and 416 are identified to block 312, which performs garbage collection on those areas only, without affecting areas 402, 406, 408, 410, 412, 414, 418, and 420. FIG. 5 shows the resulting optimized memory area 404' after sliding compaction has been performed. FIG. 6 shows the resulting optimized memory area 416' after sliding compaction has been performed. The resulting memory heap 400 (FIG. 7) shows that all memory areas are optimized (i.e., there is no hatched shading). Although sliding compaction has been described, other garbage collection routines may be executed on defective areas 404 and 416. The selective and deliberate application of sliding compaction to only such defective areas, in order to improve subsequent memory performance, is thus achieved. Alternatively, garbage collection may be performed over the entire heap 400, with or without a distinct garbage collection routine (e.g., sliding compaction) being executed on the particular defective areas 404 and 416. Further, in an alternative embodiment, the defective areas may be temporarily blocked from memory storage, as shown by blocked areas 404'' and 416'' in FIG. 8.
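The alternative of temporarily blocking a defective area from memory storage might look like the following Python sketch. It assumes a simple bump allocator over fixed-size areas; the class `AreaAllocator` and its methods are hypothetical names introduced only for this illustration.

```python
from typing import Dict, Optional, Set


class AreaAllocator:
    """Bump allocator over fixed-size areas that skips blocked areas."""

    def __init__(self, num_areas: int, area_size: int) -> None:
        self.area_size = area_size
        self.used: Dict[int, int] = {i: 0 for i in range(num_areas)}
        self.blocked: Set[int] = set()

    def block(self, area: int) -> None:
        """Temporarily exclude an area from allocation."""
        self.blocked.add(area)

    def unblock(self, area: int) -> None:
        """Make a previously blocked area available again."""
        self.blocked.discard(area)

    def allocate(self, size: int) -> Optional[int]:
        """Return the address of a new object, or None if no area fits it."""
        for area, used in self.used.items():
            if area in self.blocked:
                continue
            if used + size <= self.area_size:
                self.used[area] = used + size
                return area * self.area_size + used
        return None


alloc = AreaAllocator(num_areas=3, area_size=64)
alloc.block(0)                # area 0 stands in for a defective area
first = alloc.allocate(48)    # placed in area 1
second = alloc.allocate(48)   # area 1 is too full, so placed in area 2
third = alloc.allocate(48)    # no unblocked area has room
alloc.unblock(0)
fourth = alloc.allocate(48)   # area 0 is usable again
```

Blocking is reversible in this sketch, which matches the description of the block as temporary.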

FIG. 9 shows an example of a process 500 for executing code in the system 100. The process 500 may be implemented by software stored and executed in the system 100. In the illustrated example, the process 500 executes various software routines or steps described with reference to blocks 502-510.

Block 502 executes application code, also called a mutator, on the CPU 102. Example languages for the code include C# and JAVA (registered trademark), but the code is not limited to these languages. The code may also be written under the .NET framework. The code may be an operating system, or an application that runs on an operating system.

Block 502 passes control to a decision block 504, which determines whether the memory manager can allocate a new memory object in the heap of the system 100 for the code being executed. If block 504 determines that the answer is yes, control passes to a decision block 506, which determines whether additional code is to be executed. If block 504 determines that the answer is no, control passes to a block 508, which performs certain heap memory reclamation in addition to optimizing the memory performance of recently discovered defective areas by applying the sliding compaction technique described above to those areas. If block 504 determines that garbage collection cannot allocate the lost object, control passes to block 508, which identifies defective areas in a manner similar to block 310. Block 508 passes control to a memory management/optimization block 510, which is similar to block 312.
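The control flow among blocks 504 to 510 can be summarized in a short Python sketch. The callbacks `try_allocate`, `identify_defective`, and `optimize`, the toy heap model, and the retry of the allocation after block 510 are all assumptions made for illustration; the description does not specify them.

```python
from typing import Callable, Dict, List, Set


def run_mutator(
    requests: List[int],
    try_allocate: Callable[[int], bool],
    identify_defective: Callable[[], Set[int]],
    optimize: Callable[[Set[int]], None],
) -> List[str]:
    """Walk allocation requests through blocks 504-510 and return a trace."""
    trace: List[str] = []
    for size in requests:                 # block 506: more code to execute?
        if try_allocate(size):            # block 504: allocation succeeds
            trace.append("allocated")
            continue
        defective = identify_defective()  # block 508: find defective areas
        optimize(defective)               # block 510: reclaim and compact
        trace.append("collected")
        # Assumption: the failed request is retried once after collection.
        trace.append("allocated" if try_allocate(size) else "out of memory")
    return trace


# Toy heap model (hypothetical): half of every allocation later becomes garbage.
heap: Dict[str, int] = {"free": 100, "garbage": 0}


def try_allocate(size: int) -> bool:
    if size > heap["free"]:
        return False
    heap["free"] -= size
    heap["garbage"] += size // 2
    return True


def identify_defective() -> Set[int]:
    return {404, 416}


def optimize(areas: Set[int]) -> None:
    heap["free"] += heap["garbage"]
    heap["garbage"] = 0


trace = run_mutator([60, 60, 200], try_allocate, identify_defective, optimize)
```

In this model the second request succeeds only after collection, and the third request is larger than the heap can ever satisfy.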

The above techniques have been described in connection with optimizing cache memory. The techniques may be used to optimize any level of memory storage for which a performance monitor measures memory performance. In addition, the techniques may be used to optimize remotely located memory devices, such as peripheral devices, or memory devices within a network or server application.

Although certain apparatus and techniques constructed in accordance with the teachings of the invention have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all embodiments of the teachings of the invention that fairly fall within the scope of the appended claims, either literally or under the doctrine of equivalents.

Claims

A program for causing a computer to execute: a procedure of obtaining, from a performance monitor, performance data for a memory heap having a plurality of memory areas;
a procedure of determining, based on the performance data, whether at least one of the plurality of memory areas is a defective area; and
a procedure of executing, in response to determining that at least one of the plurality of memory areas is a defective area, a garbage collection routine as a memory management routine to optimize the at least one defective area in the memory heap.

The program of claim 1, wherein the performance data represents at least one memory performance event.

The program of claim 1, wherein the performance data is selected from the group comprising cache misses, translation lookaside buffer misses, branch prediction misses, data dependent stalls, and data cache writebacks.

The program according to any one of claims 1 to 3, wherein the performance monitor is a performance monitoring unit (PMU).

The program according to claim 1, further causing the computer to execute:
a procedure of executing the memory management routine on at least one defective area; and
a procedure of executing a secondary memory management routine on at least one non-defective area,
wherein the secondary memory management routine is different from the memory management routine.

The program according to any one of claims 1 to 5, wherein the garbage collection routine is selected from the group comprising reference counting collection, copying collection, generational collection, mark-and-sweep collection, Beltway collection, oldest-first collection, sliding compaction, and hybrid collection.

The program according to any one of claims 1 to 6, further causing the computer to execute,
before acquiring the performance data for the plurality of memory areas, a procedure of determining a size granularity of the plurality of memory areas.

The program according to any one of claims 1 to 7, wherein the performance data is obtained from a performance monitoring unit, and
wherein, when executed by the computer, the program causes the performance monitoring unit
to execute a procedure of counting the number of occurrences of the performance data.

The program according to claim 8, wherein, when executed by the computer, the program causes the performance monitoring unit
to execute a procedure of comparing the count of occurrences of the performance data with a threshold, and
wherein a defective area is determined to exist when the count exceeds the threshold.

The program according to claim 9, further causing the computer to execute, before comparing the count with the threshold, a procedure of determining whether at least a predetermined number of data samples has been obtained.

The program according to claim 10, further causing the computer to execute a procedure of collecting additional data samples from the memory heap in response to a determination that additional data samples are to be acquired.

The program according to any one of claims 1 to 11, further causing the computer to execute a procedure of blocking the defective area as an unallocated area.

A method comprising: identifying a plurality of load miss memory addresses from a memory heap having a plurality of memory areas;
maintaining a frequency count or frequency ratio of the identified load miss memory addresses;
determining whether the frequency count or frequency ratio of load miss memory addresses has reached a threshold in any of the plurality of memory areas; and
optimizing, in response to a determination that the threshold has been reached in at least one of the plurality of memory areas, the memory area of the memory heap in which the frequency count or frequency ratio of load miss memory addresses has reached the threshold,
wherein optimizing the memory area that has reached the threshold comprises performing garbage collection on at least one of the memory areas in which the frequency count or frequency ratio of load miss memory addresses has reached the threshold.

The method of claim 13, wherein optimizing the memory area of the memory heap that has reached the threshold comprises blocking, as an unallocated area, the memory area in which the frequency count or frequency ratio of load miss memory addresses has reached the threshold.

The method according to claim 13 or claim 14, wherein the garbage collection is selected from the group comprising reference counting collection, copying collection, generational collection, mark-and-sweep collection, Beltway collection, oldest-first collection, sliding compaction, and hybrid collection.

The method according to any one of claims 13 to 15, further comprising: executing a first memory management routine on at least one memory area in which the frequency count or frequency ratio of load miss memory addresses has reached the threshold; and
executing a second memory management routine, different from the first memory management routine, on at least one memory area in which the frequency count or frequency ratio of load miss memory addresses has not reached the threshold.

A system comprising: hardware that monitors the performance of a memory heap and collects performance data for a plurality of memory areas in the memory heap, the hardware being able to determine, based on the collected performance data, whether any of the plurality of memory areas is a defective area; and
a memory manager that optimizes the defective areas,
wherein the hardware has a performance monitoring unit, and
the memory manager is a garbage collector.

The system of claim 17, wherein the garbage collector performs the optimization by executing a garbage collection selected from the group comprising reference counting collection, copying collection, generational collection, mark-and-sweep collection, Beltway collection, oldest-first collection, sliding compaction, and hybrid collection.