Back to Christian Bell homepage
Keywords: RDMA, Memory Registration, GASNet, Global Address Space Languages.
Firehose is a distributed memory registration strategy for supporting Remote DMA (RDMA) operations over pinning-based networks. RDMA, in the field of High-Performance Computing, is an extension of user-level networking in which user processes read and write data directly to the network. The concept itself is not new: it first appeared in research with the U-Net project and has since been adopted by many High-Performance Computing vendors in various forms.
Approaches taken to register memory for Remote DMA operations divide the available technologies into two categories:
In a single sentence, Firehose is a memory registration mechanism that establishes a maximum amount of locally pinnable memory and distributes the management of a fixed fraction of this memory to each node in a parallel job. For example, if at most M pages of physical memory can be registered on node n_i, every node can assume that it can independently manage p pages on node n_i, where p = M / nodes. As such, up to p mappings (or firehoses) can be maintained into n_i's address space, where a firehose is a mapping that guarantees that the underlying physical pages are pinned, so Remote DMA through it can operate fully one-sided and synchronization-free (data is free to pour through a firehose).
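The static partition above can be sketched in a few lines. This is only an illustration of the quota arithmetic, not GASNet code; the names (`FirehoseTable`, `has_capacity`) are invented for the example:

```python
# Sketch of Firehose's static partition of pinnable pages (illustrative values).
M = 4096          # assumed: at most M pages can be pinned on node n_i
NODES = 8         # assumed: number of nodes in the parallel job

# Each remote node may independently manage p firehoses into n_i's memory.
p = M // NODES    # p = M / nodes

class FirehoseTable:
    """Per-peer table of firehoses (pinned-page mappings) into one remote node."""
    def __init__(self, quota):
        self.quota = quota
        self.mappings = set()   # remote page numbers currently mapped

    def has_capacity(self):
        return len(self.mappings) < self.quota

table = FirehoseTable(p)    # a node's view of its quota into n_i
```

With M = 4096 and 8 nodes, each peer independently owns p = 512 firehoses into n_i, and no coordination is needed until a peer exhausts its own quota.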
A node that requires more firehose mappings than the p it is statically allotted to every other node must reuse some of its unused mappings, or move one of its existing but stale firehoses. A firehose is active while data is pouring through it, and inactive otherwise.
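The reuse-or-move policy can be sketched as follows. This is a simplified model under assumed names (`acquire`, `release` and a FIFO of stale firehoses are not the actual GASNet interface); a real implementation would also issue the round-trip needed to move a firehose:

```python
# Illustrative sketch of reusing a stale (inactive) firehose once the static
# quota p is exhausted.
class FirehoseSet:
    def __init__(self, quota):
        self.quota = quota
        self.active = set()    # pages with data currently pouring through
        self.inactive = []     # stale firehoses, candidates for moving (FIFO)

    def acquire(self, page):
        if page in self.active or page in self.inactive:
            # Page already mapped: registration is free.
            if page in self.inactive:
                self.inactive.remove(page)
            self.active.add(page)
            return "hit"
        if len(self.active) + len(self.inactive) < self.quota:
            self.active.add(page)              # quota not yet exhausted
            return "new"
        if self.inactive:
            victim = self.inactive.pop(0)      # move a stale firehose
            self.active.add(page)
            return ("moved", victim)
        raise RuntimeError("all firehoses active; must wait for one to drain")

    def release(self, page):
        # Data stopped pouring: the firehose goes stale but stays mapped.
        self.active.discard(page)
        self.inactive.append(page)
```

Note that releasing a firehose does not unmap it; a later `acquire` of the same page is a free hit, which is exactly the temporal-locality bet Firehose makes.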
Naturally, constructive interference works to Firehose's advantage: when more than one remote node happens to reference the same set of pages, the registration operation becomes essentially free. Consider the live Firehose snapshot shown below:
Live Firehose snapshot of node B's memory space, with nodes A and C each mapping
5 of their 8 firehoses into B's memory space. Node B keeps a reference count
for every one of its pinned pages: pages with a zero refcount are pinned but
unreferenced by any firehose, while pages with a non-zero refcount are
referenced by one or more firehoses.
A few optimizations are possible over the base case shown above. For one, once a reference count reaches zero, the underlying page can remain pinned and be deregistered lazily, or only when the local node is running out of pages it can pin. Also, statically partitioning per-node firehoses at startup assumes that each node will require the same number of firehoses to every other node, something that doesn't hold for all applications (e.g. nearest-neighbour and/or boundary-type computations). Because of lazy deregistration, these applications may incur extra roundtrips to move firehoses, but the registration cost will not necessarily be paid again.
To be continued...