JP7657963B2

JP7657963B2 - Credit Scheme for Multi-Queue Memory Controllers - Patent application

Info

Publication number: JP7657963B2
Application number: JP2023559128A
Authority: JP
Inventors: Kedarnath Balakrishnan; Sriram Ravichandran
Original assignee: Advanced Micro Devices Inc
Current assignee: Advanced Micro Devices Inc
Priority date: 2021-03-31
Filing date: 2022-03-21
Publication date: 2025-04-07
Anticipated expiration: 2042-03-21
Also published as: KR102729694B1; US11379388B1; JP2024512623A; EP4315085A4; WO2022212100A1; CN117120992A; KR20230158571A; EP4315085B1; EP4315085A1

Description

Computer systems commonly use inexpensive, high-density dynamic random access memory (DRAM) chips for main memory. Most DRAM chips sold today conform to various double data rate (DDR) DRAM standards promulgated by the Joint Electron Devices Engineering Council (JEDEC). DDR DRAM uses conventional DRAM memory cell arrays with high-speed access circuits to achieve high transfer rates and improve memory bus utilization. A DDR memory controller may interface with multiple DDR channels to accommodate more DRAM modules and to exchange data with memory faster than is possible over a single channel. For example, some memory controllers include two or four DDR memory channels.

Modern DDR memory controllers maintain queues for storing pending memory access requests so that, to increase efficiency, pending requests can be selected out of order relative to the order in which they were generated or stored. To prevent memory access requests from being rejected because a particular queue is full, the memory controller's data interface controls the flow of memory access requests using a credit control scheme. In this scheme, request credits are provided to various parts of the host system, such as its data interface fabric, allowing the host system to send memory requests for entry into the command queue. The memory controller also needs to be flexible enough to be configured for different memory types, densities, and memory channel topologies without requiring a large amount of additional circuit area, which would increase chip cost, to support these different modes.

FIG. 1 is a block diagram of an accelerated processing unit (APU) and memory system known in the prior art.
FIG. 2 is a block diagram of a partial data processing system including a dual channel memory controller suitable for use in an APU similar to that of FIG. 1, according to some embodiments.
FIG. 3 is a block diagram of a credit control circuit suitable for implementing the credit control circuit of FIG. 2, according to some embodiments.
FIG. 4 is a flow diagram of a process for managing request credits, according to some embodiments.
FIG. 5 is a flow diagram of another process for managing request credits in a dual channel memory controller.

In the following description, the use of the same reference numbers in different figures indicates similar or identical items. Unless otherwise noted, the word "coupled" and its related verb forms include both direct connection and indirect electrical connection by means known in the art, and unless otherwise noted, any description of a direct connection also implies alternative embodiments using suitable forms of indirect electrical connection.

A memory controller includes an address decoder, a first command queue, a second command queue, and a request credit control circuit. The address decoder has a first input for receiving memory access requests, a first output, and a second output. The first command queue has an input connected to the first output of the address decoder for receiving memory access requests for a first memory channel, and a number of entries for holding memory access requests. The second command queue has an input connected to the second output of the address decoder for receiving memory access requests for a second memory channel, and a number of entries for holding memory access requests. The request credit control circuit is connected to the first command queue and the second command queue. The request credit control circuit is operable to track a number of outstanding request credits and to issue request credits based on the number of available entries in the first command queue and the second command queue.

A method includes receiving a plurality of memory access requests at a memory controller. The addresses of the memory access requests are decoded, and one of a first memory channel and a second memory channel is selected to receive each of the memory access requests. After decoding the addresses, the method includes sending each memory access request to one of a first command queue associated with the first memory channel and a second command queue associated with the second memory channel. In response to a specified event, the method includes issuing request credits based on the number of available entries in the first command queue and the second command queue.

A data processing system includes a data fabric, first and second memory channels, and a memory controller connected to the data fabric and to the first and second memory channels for fulfilling memory access requests received over the data fabric from at least one memory access engine. The memory controller includes an address decoder, a first command queue, a second command queue, and a request credit control circuit. The address decoder has a first input for receiving memory access requests, a first output, and a second output. The first command queue has an input connected to the first output of the address decoder for receiving memory access requests for the first memory channel, and a number of entries for holding memory access requests. The second command queue has an input connected to the second output of the address decoder for receiving memory access requests for the second memory channel, and a number of entries for holding memory access requests. The request credit control circuit is connected to the first command queue and the second command queue. The request credit control circuit is operable to track a number of outstanding request credits and to issue request credits based on the number of available entries in the first command queue and the second command queue.

FIG. 1 is a block diagram of an accelerated processing unit (APU) 100 and memory system 130 known in the prior art. APU 100 is an integrated circuit suitable for use as a processor in a host data processing system and generally includes a central processing unit (CPU) core complex 110, a graphics core 120, a set of display engines 122, a data fabric 125, a memory management hub 140, a set of peripheral controllers 160, a set of peripheral bus controllers 170, and a system management unit (SMU) 180.

CPU core complex 110 includes a CPU core 112 and a CPU core 114. In this example, CPU core complex 110 includes two CPU cores, but in other embodiments CPU core complex 110 can include any number of CPU cores. Each of CPU cores 112 and 114 is bidirectionally connected to a system management network (SMN), which forms a control fabric, and to data fabric 125, and can provide memory access requests to data fabric 125. Each of CPU cores 112 and 114 may be a unitary core, or may itself be a core complex with two or more unitary cores that share certain resources such as caches.

Graphics core 120 is a high-performance graphics processing unit (GPU) capable of performing graphics operations such as vertex processing, fragment processing, shading, and texture blending in a highly integrated and parallel fashion. Graphics core 120 is bidirectionally connected to the SMN and to data fabric 125 and can provide memory access requests to data fabric 125. In this regard, APU 100 can support either a unified memory architecture in which CPU core complex 110 and graphics core 120 share the same memory space, or a memory architecture in which CPU core complex 110 and graphics core 120 share a portion of the memory space while graphics core 120 also uses a private graphics memory that is not accessible by CPU core complex 110.

Display engines 122 render and rasterize objects generated by graphics core 120 for display on a monitor. Graphics core 120 and display engines 122 are bidirectionally connected to a common memory management hub 140 through data fabric 125 for uniform translation into appropriate addresses in memory system 130.

Data fabric 125 includes a crossbar switch for routing memory access requests and memory responses between any memory accessing agent and memory management hub 140. Data fabric 125 also includes a system memory map, defined by the basic input/output system (BIOS), for determining the destinations of memory accesses based on the system configuration, as well as buffers for each virtual connection.
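The memory-map lookup described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the address ranges, the destination names, and the function `route` are hypothetical and do not come from the disclosure, which states only that a BIOS-defined map determines where accesses are sent.

```python
import bisect

# Hypothetical system memory map: (base address, destination), sorted by base.
# Each region extends up to the base of the next entry.
MEMORY_MAP = [
    (0x0000_0000, "memory_hub"),
    (0xC000_0000, "io_hub"),
    (0x1_0000_0000, "memory_hub"),
]


def route(addr: int) -> str:
    """Return the destination whose region contains addr."""
    bases = [base for base, _ in MEMORY_MAP]
    index = bisect.bisect_right(bases, addr) - 1
    if index < 0:
        raise ValueError("address below the first mapped region")
    return MEMORY_MAP[index][1]
```

A sorted table with binary search is used here only because it is compact in software; a hardware fabric would more likely compare the address against all range registers in parallel.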

Peripheral controllers 160 include a universal serial bus (USB) controller 162 and a Serial Advanced Technology Attachment (SATA) interface controller 164, each of which is bidirectionally connected to a system hub 166 and to the SMN bus. These two controllers are merely exemplary of peripheral controllers that may be used in APU 100.

Peripheral bus controllers 170 include a system controller or "Southbridge" (SB) 172 and a Peripheral Component Interconnect Express (PCIe) controller 174, each of which is bidirectionally connected to an input/output (I/O) hub 176 and to the SMN bus. I/O hub 176 is also bidirectionally connected to system hub 166 and to data fabric 125. Thus, for example, a CPU core can program registers in USB controller 162, SATA interface controller 164, SB 172, or PCIe controller 174 through accesses that data fabric 125 routes through I/O hub 176. Software and firmware for APU 100 are stored in a system data drive or system BIOS memory (not shown), which can be any of a variety of non-volatile memory types, such as read-only memory (ROM) or flash electrically erasable programmable ROM (EEPROM). Typically, the BIOS memory is accessed through the PCIe bus, and the system data drive through the SATA interface.

SMU 180 is a local controller that controls the operation of the resources on APU 100 and synchronizes communication among them. SMU 180 manages power-up sequencing of the various processors on APU 100 and controls multiple off-chip devices via reset, enable, and other signals. SMU 180 includes one or more clock sources (not shown), such as a phase locked loop (PLL), to provide clock signals for each of the components of APU 100. SMU 180 also manages power for the various processors and other functional blocks, and may receive measured power consumption values from CPU cores 112 and 114 and graphics core 120 to determine appropriate power states.

Memory management hub 140 and its associated physical interfaces (PHYs) 151 and 152 are integrated with APU 100 in this embodiment. Memory management hub 140 includes memory channels 141 and 142 and a power engine 149. Memory channel 141 includes a host interface 145, a memory channel controller 143, and a physical interface 147. Host interface 145 bidirectionally connects memory channel controller 143 to data fabric 125 over a scalable data port (SDP) link. Physical interface 147 bidirectionally connects memory channel controller 143 to PHY 151 and conforms to the DDR PHY Interface (DFI) specification. Memory channel 142 includes a host interface 146, a memory channel controller 144, and a physical interface 148. Host interface 146 bidirectionally connects memory channel controller 144 to data fabric 125 over another SDP link. Physical interface 148 bidirectionally connects memory channel controller 144 to PHY 152 and conforms to the DFI specification. Power engine 149 is bidirectionally connected to SMU 180 over the SMN bus, to PHYs 151 and 152 over the APB, and to memory channel controllers 143 and 144. PHY 151 has a bidirectional connection to memory channel 131. PHY 152 has a bidirectional connection to memory channel 133.

Memory management hub 140 is an instantiation of a memory controller having two memory channel controllers, and uses a shared power engine 149 to control operation of both memory channel controller 143 and memory channel controller 144 in a manner described further below. Each of memory channels 141 and 142 can connect to state-of-the-art DDR memories such as DDR version five (DDR5), DDR version four (DDR4), low power DDR4 (LPDDR4), graphics DDR version five (GDDR5), and high bandwidth memory (HBM), and can be adapted for future memory technologies. These memories provide high bus bandwidth and high-speed operation. At the same time, they provide low power modes to save power for battery-powered applications such as laptop computers, and also provide built-in thermal monitoring.

Memory system 130 includes a memory channel 131 and a memory channel 133. Memory channel 131 includes a set of dual inline memory modules (DIMMs) connected to a DDRx bus 132, including representative DIMMs 134, 136, and 138, which in this example correspond to separate ranks. Likewise, memory channel 133 includes a set of DIMMs connected to a DDRx bus 129, including representative DIMMs 135, 137, and 139.

APU 100 operates as the central processing unit (CPU) of a host data processing system and provides various buses and interfaces useful in modern computer systems. These interfaces include two double data rate (DDRx) memory channels, a PCIe root complex for connection to a PCIe link, a USB controller for connection to a USB network, and an interface to a SATA mass storage device.

APU 100 also implements various system monitoring and power saving functions. One system monitoring function in particular is thermal monitoring. For example, if APU 100 becomes hot, SMU 180 can reduce the frequency and voltage of CPU cores 112 and 114 and/or graphics core 120. If APU 100 becomes too hot, it can be shut down entirely. Thermal events can also be received from external sensors by SMU 180 via the SMN bus, and SMU 180 can reduce the clock frequency and/or power supply voltage in response.

FIG. 2 is a block diagram of a partial data processing system 200 including a dual channel memory controller 210 suitable for use in an APU like that of FIG. 1. Dual channel memory controller 210 is shown connected to data fabric 125, over which it can communicate with a number of memory agents present in data processing system 200, including a coherent slave agent 250 and a coherent master agent 260. Dual channel memory controller 210 replaces the two separate memory channel controllers 143 and 144 (FIG. 1) and can control two DDRx channels together in a manner that is transparent to data fabric 125 and to the various memory addressing agents in data processing system 200, so that a single memory controller interface 212 can be used to send memory access commands and receive results. Dual channel memory controller 210 can also control two sub-channels, for example those defined in the DDR5 specification for use with DDR5 DRAMs, or those defined in the High Bandwidth Memory 2 (HBM2) and HBM3 standards. Dual channel memory controller 210 generally includes interface 212, a credit control circuit 221, an address decoder 222, and two instances of memory channel control circuitry 223, each assigned to a different memory channel. Each instance of memory channel control circuitry 223 includes a memory interface queue 214, a command queue 220, a content addressable memory (CAM) 224, replay control logic 231 including a replay queue 230, a timing block 234, a page table 236, an arbiter 238, an error correction code (ECC) check circuit 242, an ECC generation block 244, a data buffer 246, and refresh control logic 232 including an activation counter 248. In other embodiments, only command queue 220, arbiter 238, and memory interface queue 214 are duplicated for each memory channel or sub-channel used, and the remaining illustrated circuitry is adapted for use with two channels. Furthermore, although the illustrated dual channel memory controller includes two instances of arbiter 238, command queue 220, and memory interface queue 214 for controlling two memory channels or sub-channels, other embodiments may include more instances, such as three, four, or more, used to communicate with DRAM over a corresponding number of channels or sub-channels in accordance with the credit management techniques described herein.

Interface 212 has a first bidirectional connection to data fabric 125 over a communication bus and a second bidirectional connection to credit control circuit 221. In this embodiment, interface 212 uses scalable data port (SDP) links to establish several channels for communicating with data fabric 125, but other interface link standards are also suitable. For example, in another embodiment the communication bus is compatible with the Advanced eXtensible Interface version four specified by ARM Holdings, PLC of Cambridge, England, known as "AXI4", and in still other embodiments it can be another type of interface. Interface 212 translates memory access requests from a first clock domain, known as the "FCLK" (or "MEMCLK") domain, to a second clock domain internal to dual channel memory controller 210, known as the "UCLK" domain. Similarly, memory interface queue 214 passes memory accesses from the UCLK domain to the "DFICLK" domain associated with the DFI interface.

Credit control circuit 221 includes a bidirectional communication link to interface 212, which may be shared with address decoder 222 or may include a dedicated SDP channel for managing request credits. Credit control circuit 221 also has inputs connected to both command queues 220, shown in the figure as shared with address decoder 222. Credit control circuit 221 generally controls the request credits allocated to the data fabric for both memory channels. As described further below, the control process performed by credit control circuit 221 includes tracking the number of outstanding request credits; issuing a request credit in response to a memory access request being deallocated from either of the first and second command queues 220 if the number of outstanding request credits is less than the minimum number of available entries in the first and second command queues 220; and otherwise not issuing a request credit in response to the deallocation. Credit control circuit 221 also operates to issue a request credit without a corresponding deallocation from the first or second command queue 220 when a received memory access request is allocated to whichever of the first and second command queues 220 has the greatest number of available entries.
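The credit rules described above can be sketched as a small software model. This is a minimal illustration, not the disclosed circuit: the class name `CreditController`, the queue depth of four entries, the initial grant equal to the smaller number of free entries, and the extra guard on the receive path (which keeps outstanding credits from exceeding the free entries of the fuller queue when both queues are equally full) are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class CreditController:
    """Toy model of request-credit tracking for two command queues."""
    capacity: int = 4                                    # entries per queue (assumed)
    used: list = field(default_factory=lambda: [0, 0])   # occupied entries per queue
    outstanding: int = 0                                 # credits held by the fabric

    def free(self, channel: int) -> int:
        return self.capacity - self.used[channel]

    def min_free(self) -> int:
        return min(self.free(0), self.free(1))

    def issue_initial(self) -> None:
        # Assumption: start by granting as many credits as the fuller queue can absorb.
        self.outstanding = self.min_free()

    def on_request(self, channel: int) -> bool:
        """A request arrives for `channel`, consuming one credit.
        Returns True if a replacement credit is issued immediately."""
        if self.outstanding == 0:
            raise RuntimeError("request received without a credit")
        had_most_free = self.free(channel) >= self.free(1 - channel)
        self.outstanding -= 1
        self.used[channel] += 1
        # Issue without a deallocation when the request went to the emptier queue.
        if had_most_free and self.outstanding < self.min_free():
            self.outstanding += 1
            return True
        return False

    def on_dealloc(self, channel: int) -> bool:
        """An entry leaves `channel`'s queue. Returns True if a credit is issued."""
        if self.used[channel] == 0:
            raise RuntimeError("deallocation from an empty queue")
        self.used[channel] -= 1
        if self.outstanding < self.min_free():
            self.outstanding += 1
            return True
        return False
```

In this model the number of outstanding credits never exceeds the free entries of the fuller queue, so a request can always be accepted whichever channel its address decodes to.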

Address decoder 222 has a bidirectional link to credit control circuit 221, a first output connected to the first command queue 220 (labeled "Command Queue 0"), and a second output connected to the second command queue 220 (labeled "Command Queue 1"). Address decoder 222 decodes the addresses of memory access requests received from data fabric 125 via interface 212. Each memory access request includes an access address in the physical address space, represented in a normalized format. Based on the access address, address decoder 222 selects one of the memory channels, each of which has an associated one of command queues 220, to handle the request. The selected channel is identified to credit control circuit 221 for each request so that credit issuance decisions can be made. Address decoder 222 converts the normalized address into a format that can be used to address the actual memory devices in memory system 130 and to schedule related accesses efficiently. This format includes a region identifier that associates the memory access request with a particular rank, row address, column address, bank address, and bank group. At startup, the system BIOS queries the memory devices in memory system 130 to determine their size and configuration, and programs a set of configuration registers associated with address decoder 222. Address decoder 222 uses the configuration stored in the configuration registers to translate normalized addresses into the appropriate format. Each memory access request is loaded into the command queue 220 for the memory channel selected by address decoder 222.
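The translation of a normalized address into channel, rank, bank, row, and column fields can be illustrated with a short sketch. The bit layout below is entirely hypothetical: in the described controller the mapping comes from configuration registers programmed by the BIOS, and the field widths, their order, and the names `LAYOUT`, `Decoded`, and `decode` are assumptions chosen for the example.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    channel: int
    rank: int
    bank_group: int
    bank: int
    row: int
    column: int


# Hypothetical field layout, least significant field first: (name, width in bits).
# This tuple stands in for the BIOS-programmed configuration registers.
LAYOUT = (
    ("column", 10),
    ("channel", 1),
    ("bank_group", 2),
    ("bank", 2),
    ("row", 16),
    ("rank", 1),
)


def decode(addr: int, layout=LAYOUT) -> Decoded:
    """Split a normalized address into DRAM fields according to `layout`."""
    fields = {}
    for name, width in layout:
        fields[name] = addr & ((1 << width) - 1)  # take the low `width` bits
        addr >>= width                            # then move on to the next field
    return Decoded(**fields)
```

The `channel` field is what would select Command Queue 0 or Command Queue 1 and be reported to the credit control circuit; placing it just above the column bits is one possible interleaving choice, not one stated in the text.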

Each command queue 220 is a queue of memory access requests received from the various memory accessing engines in APU 100, such as CPU cores 112 and 114 and graphics core 120. Each command queue 220 is bidirectionally connected to a respective arbiter 238, which selects memory access requests from the command queue 220 to be issued over the associated memory channel. Each command queue 220 stores the address fields decoded by address decoder 222 as well as other address information that allows the respective arbiter 238 to select memory accesses efficiently, including access type and quality of service (QoS) identifiers. Each CAM 224 includes information for enforcing ordering rules, such as write after write (WAW) and read after write (RAW) ordering rules.

Each arbiter 238 is bidirectionally connected to a respective command queue 220 to select memory access requests to be fulfilled with appropriate commands. The arbiters 238 generally improve the efficiency of their respective memory channels by intelligently scheduling accesses to improve usage of the memory channel's memory bus. Each arbiter 238 uses its respective timing block 234 to enforce proper timing relationships by determining whether a particular access in its respective command queue 220 is eligible for issue based on DRAM timing parameters. For example, each DRAM has a minimum specified time between activation commands, known as "tRC". Each timing block 234 maintains a set of counters that determine eligibility based on this and other timing parameters defined in the JEDEC specification, and is bidirectionally connected to the replay queue 230. Each page table 236 maintains state information about active pages in each bank and rank of its respective memory channel for the arbiter 238, and is bidirectionally connected to its respective replay queue 230. The arbiter 238 uses the decoded address information, the timing eligibility information indicated by the timing block 234, and the active page information indicated by the page table 236 to schedule memory accesses efficiently while observing other criteria such as quality of service (QoS) requirements. For example, the arbiter 238 prioritizes accesses to open pages to avoid the overhead of the precharge and activation commands required to change memory pages, and hides overhead accesses to one bank by interleaving them with read and write accesses to another bank. In particular, during normal operation, the arbiter 238 normally keeps pages open in different banks until they must be precharged before a different page is selected. In some embodiments, the arbiter 238 determines eligibility for command selection based at least on the respective values of the activation counter 248 for the target memory regions of the respective commands.
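The counter-based eligibility check performed by a timing block can be illustrated with a minimal Python sketch. It models only the tRC constraint, applied per bank and counted in controller clock cycles. The class name `TimingBlock`, its methods, and the per-bank countdown representation are hypothetical and chosen for illustration; the description does not specify how the counters are organized.

```python
class TimingBlock:
    """Hypothetical model of a timing block that enforces tRC per bank."""

    def __init__(self, t_rc_cycles: int, num_banks: int) -> None:
        self.t_rc_cycles = t_rc_cycles
        # Cycles remaining until each bank may accept another activation.
        self.remaining = [0] * num_banks

    def tick(self) -> None:
        """Advance one clock cycle; every countdown moves toward zero."""
        self.remaining = [max(0, r - 1) for r in self.remaining]

    def can_activate(self, bank: int) -> bool:
        """An activation is eligible once the bank's tRC countdown expires."""
        return self.remaining[bank] == 0

    def record_activate(self, bank: int) -> None:
        """Start the tRC countdown for a bank that was just activated."""
        if not self.can_activate(bank):
            raise ValueError(f"bank {bank} violates tRC")
        self.remaining[bank] = self.t_rc_cycles


timing = TimingBlock(t_rc_cycles=3, num_banks=2)
timing.record_activate(0)
blocked_now = not timing.can_activate(0)   # bank 0 must wait for tRC
other_bank_free = timing.can_activate(1)   # bank 1 is unaffected
for _ in range(3):
    timing.tick()
eligible_later = timing.can_activate(0)
```

An arbiter in this sketch would query `can_activate` before selecting a command that needs to open a new page, and would combine the result with page-table and QoS information that is not modeled here.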

各エラー訂正コード（ＥＣＣ）生成ブロック２４４は、メモリに送られる書き込みデータのＥＣＣを判定する。ＥＣＣチェック回路２４２は、受信されたＥＣＣを着信ＥＣＣと照合してチェックする。 Each error correction code (ECC) generation block 244 determines the ECC of the write data sent to the memory. The ECC check circuit 242 checks the received ECC against the incoming ECC.

各リプレイキュー２３０は、アドレス及びコマンドパリティ応答等の応答を待っているアービタ２３８によって選択されたメモリアクセスを記憶するための一時的なキューである。リプレイ制御ロジック２３１は、ＥＣＣチェック回路２４２にアクセスして、戻されたＥＣＣが正しいか又はエラーを示すかを判定する。リプレイ制御ロジック２３１は、これらのサイクルのうち１つのパリティ又はＥＣＣエラーの場合にアクセスがリプレイされるリプレイシーケンスを開始して制御する。リプレイされたコマンドは、メモリインターフェースキュー２１４に配置される。 Each replay queue 230 is a temporary queue for storing a memory access selected by arbiter 238 awaiting a response, such as an address and command parity response. Replay control logic 231 accesses ECC check circuitry 242 to determine whether the returned ECC is correct or indicates an error. Replay control logic 231 initiates and controls a replay sequence in which an access is replayed in the event of a parity or ECC error in one of these cycles. The replayed command is placed in memory interface queue 214.

リフレッシュ制御ロジック２３２の各インスタンスは、メモリアクセスエージェントから受信した通常の読み取り及び書き込みメモリアクセス要求とは別に生成される様々な電源断、リフレッシュ及び終端抵抗（ＺＱ）較正サイクルのためのステートマシンを含む。例えば、メモリランクがプリチャージパワーダウンにある場合、リフレッシュ制御ロジックは、リフレッシュサイクルを実行するために定期的に起動されなければならない。リフレッシュ制御ロジック２３２は、ＤＲＡＭチップ内のメモリセルの蓄積キャパシタからの電荷の漏れによって引き起こされるデータエラーを防止するために、定期的に、定められた条件に応じて、リフレッシュコマンドを生成する。リフレッシュ制御ロジック２３２の各インスタンスはアクティブ化カウンタ２４８を含み、この実施形態では、アクティブ化カウンタ２４８は、メモリチャネルを介してメモリ領域に送信されるアクティブ化コマンドのローリング数をカウントするカウンタをメモリ領域ごとに有する。メモリ領域は、いくつかの実施形態ではメモリバンクであり、他の実施形態ではメモリサブバンクである。更に、リフレッシュ制御ロジック２３２は、システム内の熱変化に起因するオンダイ終端抵抗の不一致を防止するためにＺＱを定期的に較正する。 Each instance of the refresh control logic 232 includes state machines for various power-down, refresh, and termination resistor (ZQ) calibration cycles that are generated separately from normal read and write memory access requests received from memory access agents. For example, when a memory rank is in precharge power-down, the refresh control logic must be periodically awakened to perform refresh cycles. The refresh control logic 232 periodically generates refresh commands in response to defined conditions to prevent data errors caused by charge leakage from storage capacitors of memory cells in the DRAM chip. Each instance of the refresh control logic 232 includes an activation counter 248, which in this embodiment has a counter for each memory region that counts the rolling number of activation commands sent to the memory region over the memory channel. The memory region is a memory bank in some embodiments and a memory sub-bank in other embodiments. Additionally, the refresh control logic 232 periodically calibrates the ZQ to prevent on-die termination resistor mismatches due to thermal changes in the system.

ＥＣＣ生成ブロック２４４は、インターフェース２１２から受信した書き込みメモリアクセス要求に応じて、書き込みデータに従ってＥＣＣを計算する。データバッファ２４６は、受信したメモリアクセス要求に関する書き込みデータ及びＥＣＣを記憶する。データバッファ２４６は、それぞれのアービタ２３８がメモリチャネルへのディスパッチのために対応する書き込みアクセスを選択すると、組み合わされた書き込みデータ／ＥＣＣをそれぞれのメモリインターフェースキュー２１４に出力する。 The ECC generation block 244 calculates the ECC according to the write data in response to a write memory access request received from the interface 212. The data buffer 246 stores the write data and ECC for the received memory access request. The data buffer 246 outputs the combined write data/ECC to the respective memory interface queue 214 when the respective arbiter 238 selects the corresponding write access for dispatch to the memory channel.

３つ以上のメモリチャネル又はサブチャネルを有する実施形態では、追加のコマンドキュー、アービタ及びメモリインターフェースキューは、単一のアドレスデコーダ２２２及びクレジット制御回路２２１を使用して、図示されたものと並列に追加される。このような設計により、以下に説明するクレジット制御方式を３つ以上のチャネル又はサブチャネルとともに使用することが可能となり、キュー容量及びチャネル容量を使用する際に対応する効率が得られる。説明したように、メモリチャネル制御回路２２３のグループ全体は、各チャネル又はサブチャネルに対して再現されてもよく、あるいは、同じロジックブロックは、追加されたコマンドキュー、アービタ及びメモリインターフェースキューを追跡するために、追加された追加容量とともに使用されてもよい。 In embodiments having more than two memory channels or sub-channels, additional command queues, arbiters, and memory interface queues are added in parallel to those shown, using a single address decoder 222 and credit control circuitry 221. Such a design allows the credit control scheme described below to be used with more than two channels or sub-channels, with corresponding efficiencies in using queue and channel capacity. As described, the entire group of memory channel control circuits 223 may be duplicated for each channel or sub-channel, or the same logic blocks may be used with the additional capacity added to track the added command queues, arbiters, and memory interface queues.

FIG. 3 illustrates a block diagram of a credit control circuit 300 suitable for implementing the credit control circuit 221 of FIG. 2, according to some embodiments. The credit control circuit 300 includes outstanding credit tracking logic 302, queue 0 occupancy logic 304, queue 1 occupancy logic 306, interface logic 308, credit issuing logic 310, a request monitor 312, a command queue monitor 314, and a first-in-first-out (FIFO) credit queue 316 (FIFO queue 316). The outstanding credit tracking logic 302 generally maintains a count of issued request credits, issues new request credits, and tracks request credits that are redeemed when an associated memory access request is received at the memory controller 210. The request credits are issued to one or more requesting agents on the data fabric. In this embodiment, each request credit is one of two types: an initial credit, or an additional credit that is issued because of the higher capacity provided by the use of two command queues and two channels or sub-channels. The use of additional credits allows the credit control circuit 300 to issue further credits beyond the number of initial credits under certain conditions in order to utilize the capacity of both command queues more fully and efficiently. The additional credits are tracked by the outstanding credit tracking logic 302 in the same manner as the initial credits and count toward the total outstanding credits.

キュー０占有ロジック３０４及びキュー１占有ロジック３０６は、それぞれのコマンドキュー内の割り当てられていないエントリの数のカウントを維持する。いくつかの実施形態では、カウントは、コマンドキューサイズから各コマンドキューの占有されたエントリの現在の数を減算することによって生成される。他の実施形態では、占有されていないエントリは、コマンドキューから直接的に追跡されるか、又は、各コマンドキューにロードされるエントリ及び各コマンドキューから割り当て解除されるエントリの追跡に基づいて間接的に追跡される。 Queue 0 occupancy logic 304 and queue 1 occupancy logic 306 maintain a count of the number of unallocated entries in their respective command queues. In some embodiments, the count is generated by subtracting the current number of occupied entries in each command queue from the command queue size. In other embodiments, the unallocated entries are tracked directly from the command queues, or indirectly based on tracking the entries loaded into and deallocated from each command queue.
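The first counting approach described above (queue size minus occupied entries) can be sketched in Python as follows. The class `QueueOccupancy` and its method names are hypothetical; the sketch assumes one instance per command queue and that loads and deallocations are reported to it one entry at a time.

```python
class QueueOccupancy:
    """Hypothetical per-queue occupancy logic (one instance per queue)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.occupied = 0

    def on_load(self) -> None:
        """Called when a request is loaded into the command queue."""
        if self.occupied >= self.size:
            raise OverflowError("command queue is already full")
        self.occupied += 1

    def on_deallocate(self) -> None:
        """Called when a request is deallocated from the command queue."""
        if self.occupied == 0:
            raise ValueError("command queue is already empty")
        self.occupied -= 1

    @property
    def available(self) -> int:
        """Unallocated entries: queue size minus occupied entries."""
        return self.size - self.occupied


queue0 = QueueOccupancy(size=4)
queue0.on_load()
queue0.on_load()
free_after_loads = queue0.available
queue0.on_deallocate()
free_after_dealloc = queue0.available
```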

The request monitor 312 monitors incoming requests assigned by the address decoder 222 to each command queue, including which queue receives each request. The credit issuing logic 310 uses this information in determining when and whether to issue new request credits. The command queue monitor 314 monitors both command queues to determine when requests are deallocated from the command queues. The FIFO queue 316 holds additional request credits that are issued when commands are deallocated from each command queue under certain conditions, as described with respect to FIG. 5. These credits are released to the fabric as soon as the credit issuing logic 310 determines that this is permitted, as described further below. The credit issuing logic 310 uses the number of outstanding credits, the queue occupancy of each queue, and the monitored information from the request monitor 312 and the command queue monitor 314 to determine when to issue request credits, as described further below with respect to FIGS. 4 and 5. In some versions, the credit control function is embodied in monitoring logic circuitry within the memory controller's arbiter (e.g., arbiter 238, FIG. 2). In other versions, the process may be performed by digital logic or a controller having similar functionality, while using a different arbitration method than that used in the sub-arbiter 305 and final arbiter 350 discussed above.

FIG. 4 is a flow diagram 400 of a process for managing request credits, according to some embodiments. The illustrated process is suitable for execution by a credit control circuit, such as the credit control circuit 300 of FIG. 3, embodied in a dual channel memory controller or in a memory controller coupled to two or more memory channels or sub-channels, or by another suitable digital control circuit that tracks outstanding request credits and monitors two or more command queues for a dual channel memory controller. The process generally functions to manage request credits for memory access requests to both of the memory channels associated with command queue 0 and command queue 1. The request credits are used by the data fabric regardless of which command queue and memory channel is ultimately selected to receive the associated access request. That is, the existence of two memory channels or sub-channels managed by the memory controller is transparent to the data fabric and to the various memory agents that access the data fabric.

In response to the two memory channels being initialized at block 402, the process at block 404 issues initial request credits to the data fabric, which are redeemed for incoming read or write commands. Write commands also require the use of data credits to manage the data buffer 246 (FIG. 2). The data credits are managed separately from the initial and additional credits described herein. The number of initial request credits is determined by the size of the command queues 220. Preferably, enough initial request credits are released to fill half of the entries in each command queue 220, which ensures that no overflow occurs even if all of the credits are redeemed to place commands in a single queue. If the two command queues are equal in size, the number of credits released is typically the size of one command queue. If the two command queues are unequal in size, the size of the smaller command queue is used to determine the number of initial credits, ensuring that the credit process is initialized with no more credits than the smallest command queue can hold. At this point, the data fabric holds request credits that one or more memory access agents connected to the data fabric can use to send requests to the memory controller 210.
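The sizing rule for the initial credits can be written as a short Python sketch. The function name `initial_request_credits` is hypothetical, and the sketch assumes the typical case stated above, in which the initial credit count equals the size of the smallest command queue.

```python
from typing import Sequence


def initial_request_credits(queue_sizes: Sequence[int]) -> int:
    """Return the number of initial credits for the given command queues.

    Assumption: the count equals the smallest queue size, so that even if
    every credit is redeemed for a request that lands in the same queue,
    that queue cannot overflow.
    """
    if not queue_sizes:
        raise ValueError("at least one command queue is required")
    return min(queue_sizes)


equal_queues = initial_request_credits([64, 64])    # half of all 128 entries
unequal_queues = initial_request_credits([64, 48])  # bounded by the smaller
```

With two equal queues of 64 entries each, 64 credits correspond to half of the combined 128 entries, which matches the "half of the entries in each command queue" wording for the equal-size case.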

At block 406, the process begins receiving read and write memory access requests, each having an associated request credit. For each received access request, at block 408, the credit control circuit redeems the outstanding request credit, for example in the outstanding credit tracking logic 302 (FIG. 3). The access request is also processed by the address decoder 222, which decodes the associated address and, based on the address, selects one of the memory channels to receive the memory access request. At block 410, the request is assigned to a memory channel by loading it into the command queue of the selected memory channel under the control of the address decoder 222. The credit control circuit 300 monitors the access requests loaded into each command queue at block 410.

ブロック４１２において、プロセスは、１つ以上の追加の要求クレジットが既に発行されており、クレジット制御回路ＦＩＦＯキュー３１６（図３）において解放の保留中であるかどうかを判定する。追加の要求クレジットの解放は、図５に関して更に説明される。追加の要求クレジットが解放の保留中である場合、プロセスはブロック４２０に進み、現在の着信要求に対して要求クレジットは解放されない。そうでない場合、プロセスはブロック４１４に進み、両方のコマンドキューが最大占有率にあるかどうかを判定する。そうである場合、プロセスはブロック４２０に進む。そうでない場合、プロセスはブロック４１６に進み、そこでプロセスは、未処理の要求クレジットが最大値であるかどうかを判定する。最大値は構成可能であり、典型的には、両方のコマンドキューの最大占有率の合計に設定される。ブロック４１６において未処理の要求クレジットが最大値である場合、プロセスはブロック４２０に進む。そうでない場合、プロセスはブロック４１８に進む。 At block 412, the process determines whether one or more additional request credits have already been issued and are pending release in the credit control circuit FIFO queue 316 (FIG. 3). The release of additional request credits is further described with respect to FIG. 5. If additional request credits are pending release, the process proceeds to block 420, where no request credits are released for the current incoming request. If not, the process proceeds to block 414, where the process determines whether both command queues are at maximum occupancy. If so, the process proceeds to block 420. If not, the process proceeds to block 416, where the process determines whether the outstanding request credits are at a maximum value. The maximum value is configurable and is typically set to the sum of the maximum occupancy of both command queues. If the outstanding request credits are at a maximum value at block 416, the process proceeds to block 420. If not, the process proceeds to block 418.

ブロック４１８において、プロセスは、最大数の利用可能なエントリを有するコマンドキューに要求が割り当てられたかどうかを判定する。そうである場合、プロセスはブロック４２２に進み、要求クレジットをデータファブリックに発行させる。クレジット発行ロジック３１０（図３）又は他の好適なデジタルロジック若しくは制御回路は、要求クレジット発行を実行し、未処理のクレジットを更新する。ブロック４２２における要求クレジット発行は、コマンドキューのうち何れかからのコマンドの対応する割り当て解除なしに行われ、これは、２つのコマンドキューのより効率的な使用を可能にするので、図示されたプロセスにおいて有益である。ブロック４１８において、アクセス要求が、最も多い利用可能なエントリを有するコマンドキューに割り当てられていない場合、プロセスはブロック４２０に進み、この特定のアクセス要求が割り当てられていることに応じて、要求クレジットを発行しない。 At block 418, the process determines whether the request has been assigned to the command queue with the most available entries. If so, the process proceeds to block 422, where the request credit is issued to the data fabric. The credit issuance logic 310 (FIG. 3) or other suitable digital or control circuitry performs the request credit issuance and updates the outstanding credits. The request credit issuance at block 422 is performed without a corresponding deallocation of commands from either of the command queues, which is beneficial in the illustrated process because it allows for more efficient use of the two command queues. At block 418, if the access request has not been assigned to the command queue with the most available entries, the process proceeds to block 420, where the request credit is not issued in response to this particular access request being assigned.
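The decision chain of blocks 412 through 422 can be summarized in a small Python sketch. The function name and its parameters are hypothetical. The sketch also makes two assumptions that the description leaves open: the counts passed in reflect the state after the incoming request's credit has been redeemed, and a tie in available entries counts as the target queue having the most available entries.

```python
from typing import Sequence


def should_issue_credit_on_load(
    outstanding: int,
    max_outstanding: int,
    available: Sequence[int],
    target_queue: int,
    pending_in_fifo: int,
) -> bool:
    """Decide whether loading a request triggers an extra request credit."""
    # Block 412: additional credits already waiting in the FIFO take priority.
    if pending_in_fifo > 0:
        return False
    # Block 414: every command queue is at maximum occupancy.
    if all(free == 0 for free in available):
        return False
    # Block 416: the configurable cap on outstanding credits is reached.
    if outstanding >= max_outstanding:
        return False
    # Block 418: issue only if the request went to the emptiest queue
    # (ties count as "most available" in this sketch).
    return available[target_queue] == max(available)


issue_emptier = should_issue_credit_on_load(4, 16, [5, 3], 0, 0)
issue_fuller = should_issue_credit_on_load(4, 16, [5, 3], 1, 0)
issue_fifo_busy = should_issue_credit_on_load(4, 16, [5, 3], 0, 1)
```

Returning `True` corresponds to block 422 (a credit is issued without any matching deallocation), and `False` corresponds to block 420.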

The illustrated process achieves a performance advantage because each command queue is utilized to a higher capacity: "extra" or additional request credits can be issued when a command is assigned to the less occupied command queue. Using the illustrated process in combination with a dual arbiter memory controller architecture such as that shown in FIG. 2 achieves a further performance advantage, because each memory channel can be arbitrated separately while the arbiters generally operate with more commands available in their command queues to select from than they would under a more pessimistic approach that lacks the queue capacity checks shown in FIGS. 4 and 5.

Although flowchart 400 shows blocks 410, 412, 414, 416, and 418 occurring in sequence, in an actual implementation these decisions are made by digital logic and, in various embodiments, are made in any suitable order or in parallel, with logic circuitry checking some or all of the depicted conditions simultaneously.

図５は、デュアルチャネルメモリコントローラにおいて要求クレジットを管理するための別のプロセスのフロー図５００である。この実施形態では、プロセスは、図４のプロセスとともにクレジット制御回路３００によって実行されて、要求クレジットがデュアルチャネルメモリコントローラ、又は、２つ以上のメモリチャネル若しくはサブチャネルのためのメモリコントローラに発行される２つの異なる方法が提供される。 FIG. 5 is a flow diagram 500 of another process for managing request credits in a dual channel memory controller. In this embodiment, the process is performed by the credit control circuitry 300 in conjunction with the process of FIG. 4 to provide two different ways in which request credits may be issued to a dual channel memory controller, or a memory controller for two or more memory channels or sub-channels.

ブロック５０２において、プロセスは、メモリアクセス要求が２つのコマンドキューのうち何れかから割り当て解除されることに応じて開始する。ブロック５０４において、プロセスは、各コマンドキューにおいて利用可能なエントリの数を取得する。この情報は、クレジット制御回路、例えば、キュー０占有ロジック３０４及びキュー１占有ロジック３０６（図３）において維持されることが好ましい。いくつかの実施形態では、プロセスは、ブロック５０４においてコマンドキューに直接アクセスして、各コマンドキュー内の利用可能なエントリの数を取得又は計算することができる。関連する数は、ブロック５０２における割り当て解除された要求を考慮した後の数である。 At block 502, the process begins in response to a memory access request being deallocated from one of two command queues. At block 504, the process obtains the number of available entries in each command queue. This information is preferably maintained in the credit control circuitry, e.g., queue 0 occupancy logic 304 and queue 1 occupancy logic 306 (FIG. 3). In some embodiments, the process may directly access the command queues at block 504 to obtain or calculate the number of available entries in each command queue. The relevant number is the number after taking into account the deallocated request at block 502.

ブロック５０６において、プロセスは、未処理の要求クレジットの数が、２つのコマンドキューの利用可能なエントリの最小数よりも少ないかどうかをチェックし、そうである場合、ブロック５０８において追加の要求クレジットを発行する。この要求クレジットは、ＦＩＦＯキュー３１６（図３）にロードされ、できるだけ早くデータファブリックに解放されることが好ましい。２つのコマンドキューの利用可能なエントリの数が等しい場合に、このプロセスは、未処理の要求クレジットの数が当該等しい数の利用可能なエントリよりも少ない場合に要求クレジットを発行し、そうでない場合（少なくない場合）に、メモリアクセス要求が割り当て解除されることに応じて要求クレジットを発行しない。未処理クレジット追跡ロジック３０２は、追加の要求クレジットがＦＩＦＯキュー３１６を出て、データファブリック上の受信側メモリエージェントによって受信されたことが確認された場合に、追加の要求クレジットを未処理としてカウントすることが好ましい。ブロック５０６において、未処理の要求クレジットの数が、２つのコマンドキューの利用可能なエントリの最小数よりも少なくない場合、プロセスはブロック５１０に進み、ブロック５０２においてメモリアクセス要求が割り当て解除されることに応じて、要求クレジットを発行しない。 At block 506, the process checks whether the number of outstanding request credits is less than the minimum number of available entries in the two command queues, and if so, issues additional request credits at block 508. The request credits are preferably loaded into the FIFO queue 316 (FIG. 3) and released to the data fabric as soon as possible. If the number of available entries in the two command queues is equal, the process issues request credits if the number of outstanding request credits is less than the equal number of available entries, and if not (not less), does not issue request credits in response to the memory access request being deallocated. The outstanding credit tracking logic 302 preferably counts additional request credits as outstanding when it is determined that the additional request credits have left the FIFO queue 316 and been received by a receiving memory agent on the data fabric. If at block 506, the number of outstanding request credits is not less than the minimum number of available entries in the two command queues, the process proceeds to block 510 and does not issue request credits in response to the memory access request being deallocated at block 502.
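The check at block 506 can be sketched in Python as follows. The function name `on_deallocate`, the use of `collections.deque` to stand in for the FIFO queue 316, and the integer token placed in it are all hypothetical. The available-entry counts are assumed to already include the entry freed at block 502.

```python
from collections import deque
from typing import Deque, Sequence


def on_deallocate(
    outstanding: int,
    available: Sequence[int],
    credit_fifo: Deque[int],
) -> bool:
    """Handle a deallocation; queue an additional credit if permitted."""
    # Block 506: compare outstanding credits with the fullest queue's
    # free entries, i.e. the minimum of the available counts.
    if outstanding < min(available):
        # Block 508: the credit waits in the FIFO until it can be
        # released to the data fabric.
        credit_fifo.append(1)
        return True
    # Block 510: no credit is issued for this deallocation.
    return False


fifo: Deque[int] = deque()
issued_first = on_deallocate(outstanding=2, available=[5, 3], credit_fifo=fifo)
issued_second = on_deallocate(outstanding=3, available=[5, 3], credit_fifo=fifo)
pending_credits = len(fifo)
```

In this sketch the caller keeps the outstanding count unchanged until the credit leaves the FIFO, which mirrors the preference stated above for counting an additional credit as outstanding only once the fabric has received it.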

This credit issuance process has the advantage of allowing more efficient use of the two command queues while ensuring that the number of outstanding credits never exceeds the number of available entries in the most occupied queue. The data fabric and the requesting memory agents attached to it preferably have no information about whether a particular request credit is an initial credit or an additional credit, which makes the credit tracking process transparent to the data fabric. The data fabric can treat the dual channel memory controller as if it were a single controller with a higher throughput capacity than that of a single channel. The capacities of the two command queues and two memory channels are combined in a manner that is transparent to the data fabric, while request credits can be issued more aggressively than if a typical credit management process for a single command queue were used.

図２のデュアルチャネルメモリコントローラ２１０、又は、クレジット管理回路２２１及びアドレスデコーダ２２２等のその任意の部分は、プログラムによって読み取られ、集積回路を製造するために直接的又は間接的に使用され得るデータベース又は他のデータ構造の形態のコンピュータアクセス可能データ構造によって記述又は表現され得る。例えば、このデータ構造は、Ｖｅｒｉｌｏｇ又はＶＨＤＬ等の高レベル設計言語（high level design language、ＨＤＬ）におけるハードウェア機能の挙動レベル記述又はレジスタ転送レベル（register-transfer level、ＲＴＬ）記述であってもよい。記述は、合成ライブラリからゲートのリストを含むネットリストを生成するために記述を合成することができる合成ツールによって読み取られることができる。ネットリストは、集積回路を含むハードウェアの機能も表すゲートのセットを含む。ネットリストは、次いで、マスクに適用される幾何学的形状を記述するデータセットを生成するために配置され、ルーティングされ得る。次いで、マスクを様々な半導体製造工程で使用して、集積回路を製造することができる。代替的に、コンピュータアクセス可能格納媒体上のデータベースは、所望に応じて、ネットリスト（合成ライブラリの有無にかかわらず）若しくはデータセット、又は、グラフィックスデータシステム（Graphic Data System、ＧＤＳ）ＩＩデータであり得る。 The dual channel memory controller 210 of FIG. 2, or any portion thereof, such as the credit management circuit 221 and the address decoder 222, may be described or represented by a computer-accessible data structure in the form of a database or other data structure that may be read by a program and used directly or indirectly to manufacture an integrated circuit. For example, this data structure may be a behavioral or register-transfer level (RTL) description of the hardware functionality in a high level design language (HDL) such as Verilog or VHDL. The description may be read by a synthesis tool that may synthesize the description to generate a netlist that includes a list of gates from a synthesis library. The netlist includes a set of gates that also represent the functionality of the hardware that comprises the integrated circuit. The netlist may then be placed and routed to generate a data set that describes the geometric shapes that are applied to a mask. The mask may then be used in various semiconductor manufacturing processes to manufacture the integrated circuit. Alternatively, the database on the computer-accessible storage medium may be a netlist (with or without a synthesis library) or a data set, or Graphics Data System (GDS) II data, as desired.

While specific embodiments have been described, various modifications to these embodiments will be apparent to those skilled in the art. For example, while a dual channel memory controller is used as an example, the techniques herein may be applied to three or more memory channels to combine their capacities in a manner that is transparent to the data fabric and the host data processing system. For example, three or four memory channels may be controlled using the techniques herein by providing a separate command queue and memory channel control circuit for each channel while providing a single interface, a single address decoder, and a credit control circuit that issues request credits to the data fabric independently of the individual memory channels. Additionally, the internal architecture of the dual channel memory controller 210 may vary in different embodiments. The dual channel memory controller 210 may interface with types of memory other than DDRx, such as high bandwidth memory (HBM), Rambus DRAM (RDRAM), and the like. Although the illustrated embodiment shows each rank of memory corresponding to a separate DIMM or SIMM, in other embodiments each module may support multiple ranks. Still other embodiments may include other types of DRAM modules, or DRAM not contained in a particular module, such as DRAM mounted on a host motherboard. Accordingly, the appended claims are intended to cover all modifications of the disclosed embodiments that fall within the scope of the disclosed embodiments.

Claims

1. A memory controller, comprising:
an address decoder having a first input for receiving a memory access request, a first output, and a second output;
a first command queue having an input coupled to the first output of the address decoder for receiving memory access requests for a first memory channel, and a number of entries for holding memory access requests;
a second command queue having an input coupled to the second output of the address decoder for receiving memory access requests for a second memory channel, and a number of entries for holding memory access requests; and
a request credit control circuit coupled to the first command queue and the second command queue, the request credit control circuit operable to track a number of outstanding request credits and to issue request credits to a data fabric based on a number of available entries in the first command queue and the second command queue.

2. The memory controller of claim 1, wherein issuing request credits based on the number of available entries in the first command queue and the second command queue includes issuing a request credit when the number of outstanding request credits is less than the minimum number of available entries in the first command queue and the second command queue.

3. The memory controller of claim 1, wherein the request credit control circuit is coupled to a data fabric of a data processing unit and is operable to issue request credits to memory accessing agents via the data fabric.

4. The memory controller of claim 1, wherein the request credit control circuit is operable to issue a request credit without a corresponding deallocation from the first command queue or the second command queue when a memory access request is received that is assigned to the command queue, among the first command queue and the second command queue, that has the maximum number of available entries.

5. The memory controller of claim 1, further comprising:
a first arbiter coupled to the first command queue to select an entry from the first command queue, place the entry in a first memory interface queue, and transmit the entry over the first memory channel; and
a second arbiter coupled to the second command queue to select an entry from the second command queue, place the entry in a second memory interface queue, and transmit the entry over the second memory channel.

6. The memory controller of claim 1, wherein the address decoder is operable to direct each memory access request to either the first command queue or the second command queue based on a target address of the memory access request.

7. The memory controller of claim 1, wherein the first command queue has a different size than the second command queue.

8. The memory controller of claim 1, further comprising at least one additional command queue and at least one additional arbiter coupled to the additional command queue,
wherein the request credit control circuit is operable to issue a request credit in response to a memory access request being deallocated from any of the command queues if the number of outstanding request credits is less than the minimum number of available entries among all of the command queues, and otherwise to not issue a request credit in response to the memory access request being deallocated.

9. A method comprising:
receiving a plurality of memory access requests at a memory controller;
decoding addresses of the memory access requests and selecting one of a first memory channel and a second memory channel for receiving each of the memory access requests;
sending each memory access request, after decoding its address, to one of a first command queue associated with the first memory channel and a second command queue associated with the second memory channel; and
issuing request credits to a data fabric based on a number of available entries in the first command queue and the second command queue in response to a specified event.

The method of claim 9, wherein issuing request credits based on the number of available entries in the first command queue and the second command queue includes issuing a request credit when the number of outstanding request credits is less than the minimum number of available entries in the first command queue and the second command queue.

The method of claim 10, further comprising issuing a request credit without a corresponding deallocation from the first command queue or the second command queue when a received memory access request is assigned to whichever of the first command queue and the second command queue has the greater number of available entries.
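
For illustration only, and not as part of the claims, the two credit rules recited above can be modeled in a few lines of Python. The class name `CreditModel`, its fields, and the decision to apply the outstanding-credit check on arrival as well as on deallocation are assumptions made for this sketch; the claims do not specify an implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CreditModel:
    """Toy bookkeeping for request credits over two command queues (hypothetical names)."""
    available: List[int]   # free entries in each command queue
    outstanding: int = 0   # credits issued to the fabric and not yet used

    def _issue_if_allowed(self) -> bool:
        # Issue one credit only while outstanding credits < the smaller free-entry count.
        if self.outstanding < min(self.available):
            self.outstanding += 1
            return True
        return False

    def receive(self, queue: int) -> bool:
        """A request arrives, using one credit, and is allocated to `queue`.

        Returns True if a credit is issued without any deallocation.
        """
        had_most_free = self.available[queue] == max(self.available)
        self.outstanding -= 1
        self.available[queue] -= 1
        # Assumption: the outstanding-credit check still applies on arrival.
        return had_most_free and self._issue_if_allowed()

    def deallocate(self, queue: int) -> bool:
        """A request leaves `queue`; returns True if a credit is issued in response."""
        self.available[queue] += 1
        return self._issue_if_allowed()
```

Starting from `CreditModel(available=[4, 2], outstanding=2)`, a request allocated to queue 0 leaves the smaller free-entry count unchanged, so `receive(0)` returns a credit at once; a request allocated to queue 1 lowers that count, so `receive(1)` returns none until an entry is deallocated.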

The method of claim 9, wherein the specified event is a memory access request being deallocated from either the first command queue or the second command queue.

The method of claim 9, further comprising:
selecting an entry from the first command queue using a first arbiter, placing the entry in a first memory interface queue, and transmitting the entry over the first memory channel; and
selecting an entry from the second command queue using a second arbiter, placing the entry in a second memory interface queue, and transmitting the entry over the second memory channel.

The method of claim 9, further comprising directing each memory access request to either the first command queue or the second command queue based on a target address of the memory access request.
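
As a non-limiting illustration of address-based routing, the sketch below steers each request to one of two queues using a single address bit. The bit position `CHANNEL_SELECT_BIT`, the function names, and the use of plain integers as requests are all hypothetical; the claims do not say which address bits select the channel.

```python
from collections import deque

# Hypothetical: bit 8 selects the channel, i.e. channels interleave every 256 bytes.
CHANNEL_SELECT_BIT = 8


def select_channel(address: int) -> int:
    """Return 0 or 1 depending on the channel-select bit of the target address."""
    return (address >> CHANNEL_SELECT_BIT) & 1


def route(addresses, queues):
    """Append each request address to the command queue of its decoded channel."""
    for address in addresses:
        queues[select_channel(address)].append(address)
```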

A data processing system comprising:
a data fabric;
a first memory channel and a second memory channel; and
a memory controller coupled to the data fabric and to the first and second memory channels to fulfill memory access requests received via the data fabric from at least one memory access engine,
wherein the memory controller includes:
an address decoder having a first input for receiving a memory access request, a first output, and a second output;
a first command queue having an input coupled to the first output of the address decoder for receiving memory access requests for the first memory channel, and a number of entries for holding memory access requests;
a second command queue having an input coupled to the second output of the address decoder for receiving memory access requests for the second memory channel, and a number of entries for holding memory access requests; and
a request credit control circuit coupled to the first command queue and the second command queue, the request credit control circuit operable to track a number of outstanding request credits and to issue request credits to the data fabric based on a number of available entries in the first command queue and the second command queue.

The data processing system of claim 15, wherein issuing request credits based on the number of available entries in the first command queue and the second command queue includes issuing a request credit when the number of outstanding request credits is less than the minimum number of available entries in the first command queue and the second command queue.

The data processing system of claim 15, wherein the request credit control circuit is coupled to the data fabric and is operable to issue request credits to memory accessing agents via the data fabric.

The data processing system of claim 15, wherein the request credit control circuit is operable to issue a request credit without a corresponding deallocation from the first command queue or the second command queue when a received memory access request is assigned to whichever of the first command queue and the second command queue has the greater number of available entries.

The data processing system of claim 15, further comprising a queue occupancy circuit operable to calculate a current number of available command entries in each command queue for the request credit control circuit.
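
Purely as an illustrative sketch, the occupancy calculation can be pictured as subtracting the entries in use from each queue's capacity. The `CommandQueue` class, its field names, and the example capacities are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class CommandQueue:
    capacity: int       # total entries; the queues need not be the same size
    occupied: int = 0   # entries currently holding memory access requests

    def available(self) -> int:
        """Entries currently free to accept new requests."""
        return self.capacity - self.occupied


def available_entries(queues):
    """Per-queue free-entry counts, as a credit controller might consume them."""
    return [q.available() for q in queues]
```

The smallest value in the returned list is the bound against which the number of outstanding request credits would be compared.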

The data processing system of claim 15, further comprising:
a first arbiter coupled to the first command queue to select an entry from the first command queue, place the entry in a first memory interface queue, and transmit the entry over the first memory channel; and
a second arbiter coupled to the second command queue to select an entry from the second command queue, place the entry in a second memory interface queue, and transmit the entry over the second memory channel.

The data processing system of claim 15, wherein the address decoder is operable to direct each memory access request to either the first command queue or the second command queue based on a target address of the memory access request.

The data processing system of claim 15, wherein the at least one memory access engine is a coherent memory slave controller coupled to the data fabric for executing memory access requests from at least one data processing unit.

The data processing system of claim 15, wherein the at least one memory access engine is a coherent memory master controller coupled to the data fabric for executing memory access requests from at least one data processing unit.