I am trying to create a Rust library to simplify working with shared memory on Linux. Processes would only be able to request shared memory through the library, but I cannot guarantee that they won't panic or be killed abruptly. I intend to use the library in runs, during which there is always at least one program using the shared memory, but once the run is concluded the shared memory needs to be cleaned up. There is no guarantee that a program will run for the entirety of the run, nor is it guaranteed that the last exiting program will exit cleanly. What is the best way to preserve the shared memory while the processes are still executing, but remove it immediately after the last attached process exits/is terminated?
As far as I understand, there are two options for creating shared memory on Linux: the System V-style shmget/shmat or the POSIX-style shm_open. I prefer shm_open since it creates shared memory accessible through the file system, but this is not a strict requirement.
My problem would not exist if a new process could attach to a System V-style shared memory segment that is already marked for deletion:
use libc::{shmat, shmctl, shmget, IPC_CREAT, IPC_EXCL, IPC_RMID};
use std::ptr::null_mut;

unsafe {
    let shmid = shmget(KEY, SHMEM_SIZE, IPC_CREAT | IPC_EXCL | 0o770);
    let shmaddr = shmat(shmid, null_mut(), 0);
    shmctl(shmid, IPC_RMID, null_mut()); // mark it here for deletion
    let ptr = shmaddr as *mut [u8; SHMEM_SIZE as usize];
    *ptr = [12; SHMEM_SIZE as usize];
}
And from a different process:
unsafe {
    // This will not work: KEY is already marked for deletion, so shmget
    // cannot look it up (IPC_EXCL without IPC_CREAT has no effect here).
    let shmid = shmget(KEY, SHMEM_SIZE, IPC_EXCL);
    let shmaddr = shmat(shmid, null_mut(), 0);
    let ptr = shmaddr as *mut [u8; SHMEM_SIZE as usize];
    *ptr = [13; SHMEM_SIZE as usize];
}
This would delay the deletion until both the new and the old process exit.
From what I see, I have two options—neither of which is particularly good:
- I could create a daemon-like background process that new processes must register with by sending their PID. The daemon would store each PID internally and would obtain a file descriptor by calling the pidfd_open syscall. I could then use poll to monitor whether processes are still running and remove the shared memory when the PID list becomes empty (see the footer for my half-baked solution). The problem with this approach is that it feels brittle, and I would like to avoid spawning a separate process just to monitor shared memory usage.
- I could use eBPF to track processes opening my shared memory file. The eBPF program would maintain a PID list similar to the first approach and remove the shared memory when no processes remain. However, I have little experience with eBPF, so I'm unsure whether this is a viable approach. Additionally, eBPF typically requires a privileged environment, which I would prefer to avoid.
Apologies if this question has been asked before—it seems too obvious to me—but I have searched for an elegant solution and haven't found one.
The referenced daemon loop:
pub fn handle_unix_socket(
    &self,
    pollfd: &mut PollFd,
) -> Result<(Option<OwnedFd>, Option<u32>), std::io::Error> {
    let mut buffer = [0; 4];
    if pollfd
        .revents()
        .unwrap_or(PollFlags::empty())
        .contains(PollFlags::POLLIN)
    {
        if let Ok((mut stream, _)) = self.listener.accept() {
            // Read the registering process's PID (4 bytes, big-endian).
            stream.read_exact(&mut buffer)?;
            let pid = u32::from_be_bytes(buffer);
            // pidfd_open gives us an fd that polls readable once the
            // process exits, however it exits.
            let fd = unsafe { syscall2(syscalls::Sysno::pidfd_open, pid as usize, 0).unwrap() };
            let fd = unsafe { OwnedFd::from_raw_fd(fd as i32) };
            // Echo the PID back as an acknowledgement.
            stream.write_all(&buffer)?;
            Ok((Some(fd), Some(pid)))
        } else {
            todo!()
        }
    } else {
        Ok((None, None))
    }
}
pub fn event_loop(&mut self) -> Result<(), std::io::Error> {
    loop {
        // Index 0 is the listening socket; the rest are pidfds, one per
        // registered process.
        let mut pollfds = vec![PollFd::new(self.listener.as_fd(), PollFlags::POLLIN)];
        for fd in self.fds.iter() {
            pollfds.push(PollFd::new(fd.as_fd(), PollFlags::POLLIN));
        }
        poll(&mut pollfds, PollTimeout::NONE)?;
        let (new_fd, new_pid) = self.handle_unix_socket(&mut pollfds[0])?;
        // A pidfd polls readable when its process has terminated; collect
        // those indices in reverse so removals don't shift later entries.
        let remove_indices = (1..pollfds.len())
            .rev()
            .filter(|i| {
                pollfds[*i]
                    .revents()
                    .unwrap_or(PollFlags::empty())
                    .contains(PollFlags::POLLIN)
            })
            .collect::<Vec<_>>();
        for i in remove_indices {
            self.pids.remove(i - 1);
            self.fds.remove(i - 1);
        }
        if let (Some(new_fd), Some(new_pid)) = (new_fd, new_pid) {
            self.fds.push(new_fd);
            self.pids.push(new_pid);
        }
        // No registered processes left: the run is over.
        if self.pids.is_empty() {
            return Ok(());
        }
    }
}
- see shm_open docs where the creator of the shared memory waits on a semaphore for the child to signal a successful attach before it unlinks the memory. – Ahmed AEK Commented Mar 12 at 22:55
- I don't believe I can unlink the shared memory object at any point (except when there are no more consumers) as I don't know if a new process trying to access the shared memory will appear. – sesh Commented Mar 12 at 23:14
- I don't see a problem with spawning a process just to monitor shared memory usage, that's what every other language typically does when it comes to shared resources, and the performance isn't that bad, they usually also add some encryption on top to prevent malicious activity. just make sure there's always a leader whose job is to kill that monitor process and all the other child processes to not leave any zombies around. – Ahmed AEK Commented Mar 13 at 0:14
2 Answers
Your criteria:
Processes would only be able to request shared memory through the library, but I cannot guarantee that they won't panic or be killed abruptly.
I intend to use the library in runs, during which there is always at least one program using the shared memory, but once the run is concluded the shared memory needs to be cleaned up.
It is not clear what you mean by a "run". Do you mean that:
- All programs are a subprocess of the main/master process (i.e. the "launcher")? If so, the launcher can just wait, and when all subprocesses are done, you can unlink the shared memory file. But even the "unlinking" isn't needed. See below.
- A master process is running, but as long as it runs or a subprocess of it is running, a program that is not a subprocess of the master may be started. As an example, we start the master process. Then, in a separate terminal window, we start another program that "attaches" to the shared memory area.
The first option is clean and easy. The second can be done but there is more overhead.
Method A:
Because of criteria (2), for run type (1), we can use SysV shm* or mmap, using anonymous mappings.
In the case of SysV shm*, the master/parent/top process will do the following (see the sketch below):
- Use shmget with IPC_PRIVATE.
- Use shmat normally.
- All subprocesses use the shared memory area (from a global pointer).
- When the last process detaches from the area, because of IPC_PRIVATE, the area is automatically reclaimed by the system.
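A minimal sketch of these steps, assuming the libc crate. Like the code in the question, it also marks the segment with IPC_RMID right after attaching, which guarantees reclamation once the last attached process detaches or dies:

use libc::{fork, shmat, shmctl, shmdt, shmget, waitpid, IPC_PRIVATE, IPC_RMID};
use std::ptr::null_mut;

const SHMEM_SIZE: usize = 4096;

fn main() {
    unsafe {
        // IPC_PRIVATE always creates a fresh segment with no key, so only
        // this process and its descendants (via fork) can reach it.
        let shmid = shmget(IPC_PRIVATE, SHMEM_SIZE, 0o600);
        assert!(shmid != -1);
        let shmaddr = shmat(shmid, null_mut(), 0);
        assert!(shmaddr as isize != -1);
        // Mark for deletion now: the segment lives on until the last
        // attached process detaches or exits (even abnormally).
        shmctl(shmid, IPC_RMID, null_mut());

        let ptr = shmaddr as *mut u8;
        *ptr = 12;
        if fork() == 0 {
            // The child inherits the attachment across fork; no shmat needed.
            *ptr = 13;
            std::process::exit(0);
        }
        let mut status = 0;
        waitpid(-1, &mut status, 0);
        assert_eq!(*ptr, 13); // the child's write is visible here
        shmdt(shmaddr); // last detach: the kernel reclaims the segment
    }
}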
For mmap (see the sketch below):
- Use mmap with MAP_SHARED and MAP_ANONYMOUS.
- All subprocesses use the same shared memory pointer.
- When all processes terminate, the shared memory is automatically reclaimed.
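The same idea with an anonymous mapping, again assuming the libc crate; the mapping is inherited across fork, so there is no name to clean up at all:

use libc::{fork, mmap, munmap, waitpid, MAP_ANONYMOUS, MAP_FAILED, MAP_SHARED, PROT_READ, PROT_WRITE};
use std::ptr::null_mut;

fn main() {
    unsafe {
        let len = 4096;
        // No backing file: the memory vanishes when the last process
        // unmaps it or exits, however it exits.
        let addr = mmap(
            null_mut(),
            len,
            PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANONYMOUS,
            -1,
            0,
        );
        assert!(addr != MAP_FAILED);
        let ptr = addr as *mut u8;
        *ptr = 1;
        if fork() == 0 {
            *ptr = 2; // shared: the parent sees this write
            std::process::exit(0);
        }
        let mut status = 0;
        waitpid(-1, &mut status, 0);
        assert_eq!(*ptr, 2);
        munmap(addr, len);
    }
}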
Method B:
For run type (2), it is a bit tricky. Because some programs can be started that are not a subprocess of the master process, we can't just do wait and terminate.
But, having said that, it's really not necessary to have a given program "register" with the master. And, personally, the bpf notion is fragile.
With an example file of /tmp/myshared, we use mmap with MAP_SHARED. A given subprocess or "external" program will do the following (see the sketch after the list):
- Do open on /tmp/myshared.
- Use ftruncate as needed to grow/shrink the file.
- Do mmap on the open file descriptor with MAP_SHARED.
- Use the shared memory.
- Either close the file descriptor or terminate the program (even abnormally).
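A minimal client-side sketch of those steps, assuming the libc crate for ftruncate and mmap, with /tmp/myshared as the example file from above:

use libc::{ftruncate, mmap, MAP_FAILED, MAP_SHARED, PROT_READ, PROT_WRITE};
use std::fs::OpenOptions;
use std::os::fd::AsRawFd;
use std::ptr::null_mut;

fn main() -> std::io::Result<()> {
    let len: usize = 4096;
    // Any process, subprocess or not, can join by opening the same path.
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open("/tmp/myshared")?;
    unsafe {
        // Grow the file to the shared size before mapping it.
        assert!(ftruncate(file.as_raw_fd(), len as libc::off_t) == 0);
        let addr = mmap(null_mut(), len, PROT_READ | PROT_WRITE, MAP_SHARED, file.as_raw_fd(), 0);
        assert!(addr != MAP_FAILED);
        // ... use the shared memory; exiting (even abnormally) releases
        // this process's reference to the file ...
        *(addr as *mut u8) = 42;
    }
    Ok(())
}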
The master process will have to poll/monitor /proc, looking at all [accessible] /proc/<pid>/fd directories:
- The master will use opendir/readdir/closedir on the /proc/<pid>/fd directory.
- The entries are all numeric (corresponding to the given file descriptor number).
- For each entry, we do readlink on the path (e.g. /proc/<pid>/fd/5). We are looking for any entry that maps to our shared file: /tmp/myshared.
The master must loop on all /proc/<pid>/fd repeatedly. If none are found to reference /tmp/myshared, the master can unlink /tmp/myshared.
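A rough sketch of that sweep in plain std Rust. One caveat to hedge: scanning /proc/<pid>/fd only sees processes that keep the descriptor open; a client that mmaps and then closes the fd would need a similar scan of /proc/<pid>/maps, which this sketch omits:

use std::fs;
use std::path::Path;

// True if any process still holds an fd resolving to `target`
// (essentially what fuser does). Unreadable /proc entries are skipped.
fn in_use(target: &Path) -> bool {
    let Ok(procs) = fs::read_dir("/proc") else { return false };
    for entry in procs.flatten() {
        // Only numeric directory names are PIDs.
        if !entry.file_name().to_string_lossy().chars().all(|c| c.is_ascii_digit()) {
            continue;
        }
        let Ok(fds) = fs::read_dir(entry.path().join("fd")) else { continue };
        for fd in fds.flatten() {
            if fs::read_link(fd.path()).is_ok_and(|link| link == target) {
                return true;
            }
        }
    }
    false
}

fn main() {
    // The master loops until no fd references the file, then unlinks it.
    let target = Path::new("/tmp/myshared");
    while in_use(target) {
        std::thread::sleep(std::time::Duration::from_secs(1));
    }
    let _ = fs::remove_file(target);
}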
This polling is what the fuser program does (would do) for /tmp/myshared.
This will definitely work for open and mmap.
It will also work for shm_open et al. For a given shm_open with /myshared, we just look in /proc for anything that references /dev/shm/myshared.
Method A (using SysV shmget and shmat with IPC_PRIVATE) is, if at all possible, the preferred method. I've written several commercial, mission-critical, must-not-fail systems that use this method.
So, I highly recommend that you structure your "runs" to be able to use this method.
However, if you must, you can get passable results with method B. The "master" process could even be a daemon if needed.
UPDATE:
There is an issue with method B. How do we "register" the list of shared memory filenames or /whatever names for shm_open?
We don't want to (even try to) unlink a shared file that is not part of a "run" in any way.
For mmap with MAP_SHARED, we could just put all such files under a special directory (e.g. /home/myuser/run_controlled). The master process could monitor this directory and poll /proc/<pid>/fd for any entries that are prefixed with /home/myuser/run_controlled/whatever. This would be private to the given user and extensible.
For shm_open, all files generated by all users [AFAIK, but I could be wrong about this] are in a shared/common namespace:
- If a different user created an entry (e.g.) /foobar, although an attempt to unlink it would fail due to file permissions, that user might not appreciate having even unsuccessful unlink attempts "pinging" the file.
- This would also be true for the same user running an arbitrary program that is completely unrelated to a "run".
So, we'd need some prefix that is unique (e.g.) /myuser_run_controlled_something. Then, only files that begin with such a prefix would be considered.
This is the easy way [recommended].
Otherwise, the master would need some way for the "run" programs to "register" random shared filename paths. I don't recommend the additional complexity, but the registration could be done with:
- A PF_UNIX socket.
- A PF_INET socket to localhost and a given port/service number.
- A SysV IPC message (e.g. msgsnd/msgrcv).
- Similar to PF_INET, the master could establish a multicast socket on the local host (e.g. 239.x.y.z) and clients could send UDP messages with the filenames.
This last method has the additional advantage that all clients could monitor this IGMP stream and be aware of other peers. And, as an added bonus, all clients could periodically repeat such UDP messages. This could function as a "keepalive" protocol.
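For illustration, a sketch of that multicast registration using only std::net; the group address and port here are invented for the example:

use std::net::{Ipv4Addr, UdpSocket};

const GROUP: Ipv4Addr = Ipv4Addr::new(239, 1, 2, 3); // hypothetical group
const PORT: u16 = 5007; // hypothetical port

// Client: announce (and, in real use, periodically re-announce) the
// shared file we are using.
fn announce(path: &str) -> std::io::Result<()> {
    let sock = UdpSocket::bind(("0.0.0.0", 0))?; // ephemeral sender port
    sock.send_to(path.as_bytes(), (GROUP, PORT))?;
    Ok(())
}

// Master (or any peer): join the group and collect the announcements,
// treating them as registrations and keepalives.
fn listen() -> std::io::Result<()> {
    let sock = UdpSocket::bind(("0.0.0.0", PORT))?;
    sock.join_multicast_v4(&GROUP, &Ipv4Addr::UNSPECIFIED)?;
    let mut buf = [0u8; 512];
    loop {
        let (n, from) = sock.recv_from(&mut buf)?;
        println!("{from} registered {}", String::from_utf8_lossy(&buf[..n]));
    }
}

fn main() -> std::io::Result<()> {
    std::thread::spawn(|| listen());
    announce("/tmp/myshared")
}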
If the crux of the solution is to detect when all users of a system resource have terminated — which is what Craig Estey's /proc/*/fd sweeping daemon answer does and what IPC_RMID doesn't quite do for your use case — you can make use of at least two other venerable unix facilities: advisory locking and, separately, pipes.
Advisory Locking with Periodically Sweeping Daemon
One daemon process overall, and one client fd per shared memory segment.
Shared memory clients obtain a shared lock (F_RDLCK) on a lockfile whose name embeds the shared memory identifier.* Of course, in the case of mmap'd files, the underlying file itself could be the lockfile. When the sweeping daemon is able to exclusively lock (F_WRLCK) the same, it can safely dismiss the shared memory and lockfile.
To avoid a race where the daemon locks a lockfile that the very first shared memory client in a run has created but not yet locked, the lockfile should be created with a different path or naming prefix and then atomically rename()d (or link()d / unlink(original)'d) into place after locking.
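A minimal sketch of the locking itself, using fcntl record locks through the libc crate; the lockfile path is hypothetical, borrowing the /run/mylibrary naming from the example at the end of this answer:

use std::fs::OpenOptions;
use std::os::fd::AsRawFd;

// Hypothetical lockfile whose name embeds the shared memory identifier.
const LOCKFILE: &str = "/run/mylibrary/shmid.lock";

// Try to take a whole-file fcntl lock, non-blocking; `kind` is F_RDLCK
// (shared, for clients) or F_WRLCK (exclusive, for the sweeping daemon).
fn try_lock(fd: i32, kind: i32) -> bool {
    let mut fl: libc::flock = unsafe { std::mem::zeroed() };
    fl.l_type = kind as i16;
    fl.l_whence = libc::SEEK_SET as i16;
    fl.l_start = 0;
    fl.l_len = 0; // zero length means "lock the whole file"
    unsafe { libc::fcntl(fd, libc::F_SETLK, &mut fl as *mut libc::flock) == 0 }
}

fn main() -> std::io::Result<()> {
    let file = OpenOptions::new().read(true).write(true).open(LOCKFILE)?;
    // Client: hold a shared lock while using the segment. The kernel
    // releases it when this process exits, no matter how it exits.
    assert!(try_lock(file.as_raw_fd(), libc::F_RDLCK as i32));

    // Daemon side, for contrast: an exclusive lock succeeds only when no
    // client holds a shared one, so the segment and lockfile can then be
    // dismissed safely.
    // if try_lock(fd, libc::F_WRLCK as i32) { /* remove shm + lockfile */ }
    Ok(())
}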
Per-segment Cleanup Process
One cleanup process per shared memory segment, and one client fd per shared memory segment.
The very first shared memory client mkfifos a pipe whose name embeds the shared memory identifier.* This client opens the pipe for writing (O_NONBLOCK|O_CLOEXEC) and also separately for reading, then spawns a small, purpose-built cleanup process with the reading fd on stdin and the shared memory identifier as argv[1]. The client then closes its read fd and allocates and initializes the shared memory. Subsequent shared memory clients open the pipe for writing only.
The small cleanup process (e.g., /usr/libexec/mylibrary/rmshmem) simply read()s on stdin awaiting EOF, perhaps after calling setsid(); that EOF indicates that all write fds to the FIFO are closed and thus, in your use case, that all writers have terminated. After that EOF, it can dismiss the shared memory, delete the named pipe, and exit.
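A minimal sketch of such a cleanup helper, assuming the libc crate for setsid; the paths are hypothetical (following the /run/mylibrary naming in the example below), and the mmap'd-file case is assumed for the "dismiss" step (a POSIX segment would call shm_unlink instead):

use std::io::Read;

fn main() {
    // The spawning client put the FIFO's read end on our stdin and the
    // shared memory identifier in argv[1].
    let id = std::env::args().nth(1).expect("shared memory identifier");
    // Detach into our own session so we outlive the spawning client.
    unsafe { libc::setsid(); }
    let mut stdin = std::io::stdin();
    let mut buf = [0u8; 64];
    // read() returning 0 (EOF) means every write end of the FIFO is
    // closed, i.e. every client has exited, cleanly or not.
    while matches!(stdin.read(&mut buf), Ok(n) if n > 0) {}
    // Dismiss the shared memory, then delete the named pipe.
    let _ = std::fs::remove_file(format!("/run/mylibrary/{id}"));
    let _ = std::fs::remove_file(format!("/run/mylibrary/{id}.fifo"));
}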
For example:
/run/mylibrary/shmid.lock -> ftok("/run/mylibrary/shmid.lock", ...)
/run/mylibrary/shmid.fifo -> sem_open("/shmid", ...)
/run/mylibrary/shmid -> fd=open("/run/mylibrary/shmid", ...); mmap(..., fd, 0)