IPC medley: message-queue peeking, io_uring, and bus1
Peeking at message queues
The POSIX message-queue API is not heavily used, but there are users out there who care about how well it works. Message queues are named objects that, by default, all share a global namespace, though IPC namespaces can be used to separate them. There is a whole set of system calls for the creation, configuration, use, and destruction of message queues; see the mq_overview man page for an introduction to this subsystem.
Of interest here is mq_timedreceive(), which can be used to receive messages from a message queue:
ssize_t mq_timedreceive(size_t msg_len;
mqd_t mqdes, char msg_ptr[msg_len],
size_t msg_len, unsigned int *msg_prio,
const struct timespec *abs_timeout);
This call will receive the highest-priority message pending in the queue described by mqdes (which is a file descriptor on Linux systems) into the buffer pointed to by msg_ptr, which must be at least msg_len bytes in length. If abs_timeout is not null, it specifies the absolute time (rather than an interval) at which a blocked call should give up and return a timeout error. On successful receipt of a message, the location pointed to by msg_prio (if non-null) will be set to the priority of the received message.
That system call has a fair number of parameters, but Mathura Kumar would like to add some more. Since mq_timedreceive() was not designed for extensibility, that means adding a new system call, which is why Kumar's patch set adds mq_timedreceive2(). But there is an additional constraint here: there are architecture-imposed limits on the number of arguments that can be passed to system calls, and Kumar's plans would exceed those limits. As a result, the new system call is defined as:
struct mq_timedreceive2_args {
size_t msg_len;
unsigned int *msg_prio;
char *msg_ptr;
};
ssize_t mq_timedreceive2(mqd_t mqdes,
struct mq_timedreceive2_args *uargs,
unsigned int flags,
unsigned long index,
const struct timespec *abs_timeout);
The msg_len, msg_prio, and msg_ptr arguments have been moved into the new mq_timedreceive2_args structure, freeing up two slots for new parameters to the system call. That structure is passed by pointer, without using the common pattern of passing its length, which would make future additions easier; that may change if this patch series moves forward.
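The "common pattern" mentioned here is the one used by system calls like openat2() and clone3(): user space passes the size of the structure along with the pointer, and the kernel (by way of copy_struct_from_user()) accepts both older, shorter structures and newer, longer ones. The following user-space sketch models that rule. It is only an illustration; copy_struct_model(), the two structure versions, and future_field are hypothetical names and are not part of Kumar's patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* First version of a hypothetical argument structure. */
struct args_v1 {
    size_t msg_len;
    unsigned int *msg_prio;
    char *msg_ptr;
};

/* Hypothetical later version: new fields are only ever appended. */
struct args_v2 {
    size_t msg_len;
    unsigned int *msg_prio;
    char *msg_ptr;
    unsigned long future_field;    /* invented for this example */
};

/* User-space model of the rule applied by copy_struct_from_user().
 * dst/ksize describe the structure this "kernel" knows about;
 * src/usize describe what the caller passed in.
 * Returns 0 on success, or -E2BIG if the caller set a field that
 * this "kernel" does not understand. */
int copy_struct_model(void *dst, size_t ksize, const void *src, size_t usize)
{
    size_t size = usize < ksize ? usize : ksize;
    const unsigned char *s = src;

    /* Newer caller: the unknown tail must be all zeroes. */
    for (size_t i = ksize; i < usize; i++)
        if (s[i] != 0)
            return -E2BIG;

    /* Older caller: fields it did not supply read as zero. */
    memset(dst, 0, ksize);
    memcpy(dst, src, size);
    return 0;
}
```

With a scheme like this, an old binary passing the shorter structure keeps working on a newer kernel, and a new binary works on an older kernel as long as it leaves the fields that kernel does not know about set to zero.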
The new arguments are flags and index. In this series, only one flag (MQ_PEEK) is defined; if it is present, the message will be returned as usual, but without removing it from the queue, meaning that it will still be there the next time a receive operation is performed. The index argument indicates which message is of interest; a value of zero will return the highest-priority message, and higher values will return messages further back in the queue.
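Since mq_timedreceive2() is not in any released kernel, the semantics described above can only be modeled in user space for now. The sketch below keeps a small array of messages sorted by priority and shows the difference between a peek and a normal receive. Everything in it (MODEL_PEEK, model_send(), model_receive(), the fixed sizes) is a hypothetical stand-in rather than code from the patch set, and the behavior of a non-peeking receive with a nonzero index is an assumption, since that case is not described here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MODEL_PEEK 0x1u    /* stand-in for the proposed MQ_PEEK flag */
#define MODEL_MAX  8

struct model_msg {
    unsigned int prio;
    char text[16];
};

struct model_queue {
    struct model_msg msgs[MODEL_MAX];   /* sorted, highest priority first */
    size_t count;
};

/* Insert behind all messages of equal or higher priority, so that
 * messages at the same priority stay in FIFO order. */
int model_send(struct model_queue *q, const char *text, unsigned int prio)
{
    size_t pos = 0;

    if (q->count == MODEL_MAX)
        return -1;
    while (pos < q->count && q->msgs[pos].prio >= prio)
        pos++;
    memmove(&q->msgs[pos + 1], &q->msgs[pos],
            (q->count - pos) * sizeof(q->msgs[0]));
    q->msgs[pos].prio = prio;
    snprintf(q->msgs[pos].text, sizeof(q->msgs[pos].text), "%s", text);
    q->count++;
    return 0;
}

/* Copy out the message at position index (zero is the highest priority).
 * With MODEL_PEEK the queue is left untouched; otherwise the message is
 * removed (an assumption for index values other than zero). */
int model_receive(struct model_queue *q, unsigned int flags, size_t index,
                  struct model_msg *out)
{
    if (index >= q->count)
        return -1;
    *out = q->msgs[index];
    if (!(flags & MODEL_PEEK)) {
        memmove(&q->msgs[index], &q->msgs[index + 1],
                (q->count - index - 1) * sizeof(q->msgs[0]));
        q->count--;
    }
    return 0;
}
```

A monitoring tool in this model would loop over index values with MODEL_PEEK set until model_receive() fails, seeing every queued message without disturbing the consumer.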
There are a few use cases for these features described in the patch cover letter. One would be monitoring tools, which may want to look at the message traffic without interfering with it. Another one is Checkpoint/Restore in Userspace, which can read a series of messages out of a queue, then restore them with the rest of the process at a future time.
The series as a whole has not received much attention so far, which is perhaps unsurprising given that few developers have much interest in POSIX message queues. If this work is to proceed, it will need to attract some reviews, and probably go through some more rounds to address the problems that are found.
IPC in io_uring
Since its inception, the io_uring subsystem has steadily gained functionality. After having started as the asynchronous I/O mechanism that Linux has long lacked, it has evolved into a separate system-call interface providing access to increasing amounts of kernel functionality. While io_uring can be used for interprocess communication (by way of Unix-domain sockets, for example), it has not yet acquired its own IPC scheme. This patch series from Daniel Hodges seeks to change that situation, but it probably needs a fair amount of work to get there.
Hodges's goal is to provide a high-bandwidth IPC mechanism, similar to D-Bus, that will perform well on large systems. By using shared ring buffers, processes should be able to communicate with minimal copying of data. It is worth noting that other developers have attempted to solve this problem over the years, generally without success; see, for example, the sad story of kdbus. Hope springs eternal, though, and perhaps io_uring is the platform upon which a successful solution can be built.
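For readers unfamiliar with the shared-ring approach, here is a generic single-producer, single-consumer ring using C11 atomics. It is not the design used in Hodges's patches; the structure layout, the sizes, and the ring_send() and ring_recv() names are all invented for illustration. In a real IPC setting the ring would live in memory mapped into both processes, which this sketch leaves out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8u     /* must be a power of two */
#define SLOT_BYTES 64u

struct ring {
    _Atomic unsigned int head;   /* next slot to write; advanced by producer */
    _Atomic unsigned int tail;   /* next slot to read; advanced by consumer */
    size_t lens[RING_SLOTS];
    unsigned char slots[RING_SLOTS][SLOT_BYTES];
};

/* Producer side: returns false if the message is too big or the ring is full. */
bool ring_send(struct ring *r, const void *msg, size_t len)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (len > SLOT_BYTES || head - tail == RING_SLOTS)
        return false;
    memcpy(r->slots[head % RING_SLOTS], msg, len);
    r->lens[head % RING_SLOTS] = len;
    /* Publish the slot only after its contents have been written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: buf must hold SLOT_BYTES; returns false if the ring is empty. */
bool ring_recv(struct ring *r, void *buf, size_t *len)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *len = r->lens[tail % RING_SLOTS];
    memcpy(buf, r->slots[tail % RING_SLOTS], *len);
    /* Hand the slot back to the producer once the data has been copied out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The point of the pattern is that, once the ring is set up, neither side needs a system call or a kernel-side copy to move a message; each index is written by only one party, so no locks are required.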
There are facilities for direct and broadcast messages. Communication is done through "channels"; it all starts when one process issues an IORING_REGISTER_IPC_CHANNEL_CREATE operation to create a channel. Other processes can attach to existing channels if the permissions allow. Two basic operations, IORING_OP_IPC_SEND and IORING_OP_IPC_RECV, are used to send and receive messages, respectively. There is no documentation, naturally, but interested readers can look at this patch containing a set of self-tests that exercise the new features.
The io_uring maintainer, Jens Axboe, quickly noticed that the patch showed signs of LLM-assisted creation, something that Hodges owned up to. Axboe also noted that the series falls short of being a complete D-Bus replacement, lacking features like credential management. Still, he agreed that an IPC feature for io_uring "makes sense to do" and seemed happy with the overall design of the code. Some questions he asked, though, went unanswered. For this work to proceed, Hodges will need to return and do the hard work to bring a proof-of-concept patch up to the level needed for integration into a core subsystem like io_uring.
Bus1 returns
Back in 2016, David Rheinsberg (then David Herrmann) proposed a new kernel subsystem called "bus1", which would provide kernel-mediated interprocess communication along the lines of D-Bus. It allowed the passing of messages, but also of capabilities, represented by bus1 handles and open file descriptors. The proposal attracted some attention, and brought some interesting ideas (see the above-linked article for details), but stalled fairly quickly and was never seriously considered for merging into the mainline kernel.
Ten years later, bus1 is back, posted once again by Rheinsberg. The code has seen a few changes in the intervening decade:
The biggest change is that we stripped everything down to the basics and reimplemented the module in Rust. It is a delight not having to worry about refcount ownership and object lifetimes, but at the cost of a C<->Rust bridge that brings some challenges.
The core features of bus1 remain similar to what was proposed in 2016. For the time being, Rheinsberg is focusing on the Rust aspects of the work and requesting help from the Rust for Linux community to get that integration into better shape.
At some future time, presumably, the new bus1 implementation will be more widely exposed within the kernel community, at which point we will see if there is an appetite for this kind of in-kernel IPC mechanism or not. For those who would like an early look, this patch contains documentation on how the bus1 API will work, though with a number of details left unspecified.
[Editor's note: we originally missed that David had changed his name. Apologies for the error.]