From 1437d3df15c1efae3164ae45c3285bd9959def5f Mon Sep 17 00:00:00 2001 From: Jade Lovelace Date: Wed, 7 Aug 2024 02:00:50 -0700 Subject: [PATCH] darwin: workaround PROC_PIDLISTFDS on processes with no fds This has been causing various seemingly spurious CI failures as well as some failures on people running tests on beta builds. lix> ++(nix-collect-garbage-dry-run.sh:20) nix-store --gc --print-dead lix> ++(nix-collect-garbage-dry-run.sh:20) wc -l lix> finding garbage collector roots... lix> error: Listing pid 87261 file descriptors: Undefined error: 0 There is no real way to write a proper test for this, other than to start a process like the following: int main(void) { for (int i = 0; i < 1000; ++i) { close(i); } sleep(10000); } and then let Lix's gc look at it. I have a relatively high confidence this *will* fix the problem since I have manually confirmed the behaviour of the libproc call is as-unexpected, and it would perfectly explain the observed symptom. Fixes: https://git.lix.systems/lix-project/lix/issues/446 Change-Id: I67669b98377af17895644b3bafdf42fc33abd076 --- doc/manual/rl-next/haunted-gc-macos.md | 15 +++++++++++++++ src/libstore/platform/darwin.cc | 17 ++++++++++++++++- 2 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 doc/manual/rl-next/haunted-gc-macos.md diff --git a/doc/manual/rl-next/haunted-gc-macos.md b/doc/manual/rl-next/haunted-gc-macos.md new file mode 100644 index 000000000..3ce912b2d --- /dev/null +++ b/doc/manual/rl-next/haunted-gc-macos.md @@ -0,0 +1,15 @@ +--- +synopsis: "Fix unexpectedly-successful GC failures on macOS" +cls: 1723 +issues: fj#446 +credits: jade +category: Fixes +--- + +Has the following happened to you on macOS? This failure has been successfully eliminated, thanks to our successful deployment of advanced successful-failure detection technology (it's just `if (failed && errno == 0)`. Patent pendingnot really): + +``` +$ nix-store --gc --print-dead +finding garbage collector roots... +error: Listing pid 87261 file descriptors: Undefined error: 0 +``` diff --git a/src/libstore/platform/darwin.cc b/src/libstore/platform/darwin.cc index 1b591fde3..1f7e9be23 100644 --- a/src/libstore/platform/darwin.cc +++ b/src/libstore/platform/darwin.cc @@ -56,12 +56,27 @@ void DarwinLocalStore::findPlatformRoots(UncheckedRoots & unchecked) while (fdBufSize > fds.size() * sizeof(struct proc_fdinfo)) { // Reserve some extra size so we don't fail too much fds.resize((fdBufSize + fdBufSize / 8) / sizeof(struct proc_fdinfo)); + errno = 0; fdBufSize = proc_pidinfo( pid, PROC_PIDLISTFDS, 0, fds.data(), fds.size() * sizeof(struct proc_fdinfo) ); + // errno == 0???! Yes, seriously. This is because macOS has a + // broken syscall wrapper for proc_pidinfo that has no way of + // dealing with the system call successfully returning 0. It + // takes the -1 error result from the errno-setting syscall + // wrapper and turns it into a 0 result. But what if the system + // call actually returns 0? Then you get an errno of success. + // + // https://github.com/apple-opensource/xnu/blob/4f43d4276fc6a87f2461a3ab18287e4a2e5a1cc0/libsyscall/wrappers/libproc/libproc.c#L100-L110 + // https://git.lix.systems/lix-project/lix/issues/446#issuecomment-5483 + // FB14695751 if (fdBufSize <= 0) { - throw SysError("Listing pid %1% file descriptors", pid); + if (errno == 0) { + break; + } else { + throw SysError("Listing pid %1% file descriptors", pid); + } } } fds.resize(fdBufSize / sizeof(struct proc_fdinfo));