darwin: work around PROC_PIDLISTFDS on processes with no fds
This has been causing various seemingly spurious CI failures as well as
some failures on people running tests on beta builds.

    lix> ++(nix-collect-garbage-dry-run.sh:20) nix-store --gc --print-dead
    lix> ++(nix-collect-garbage-dry-run.sh:20) wc -l
    lix> finding garbage collector roots...
    lix> error: Listing pid 87261 file descriptors: Undefined error: 0

There is no real way to write a proper test for this, other than to
start a process like the following:

    #include <unistd.h>

    int main(void) {
        /* Close every low-numbered fd so the process has none open. */
        for (int i = 0; i < 1000; ++i) {
            close(i);
        }
        sleep(10000);
    }

and then let Lix's gc look at it.

I have a relatively high confidence this *will* fix the problem since I
have manually confirmed the behaviour of the libproc call is
as-unexpected, and it would perfectly explain the observed symptom.

Fixes: https://git.lix.systems/lix-project/lix/issues/446
Change-Id: I67669b98377af17895644b3bafdf42fc33abd076
parent 529eed74c4
commit 1437d3df15
2 changed files with 31 additions and 1 deletion
15
doc/manual/rl-next/haunted-gc-macos.md
Normal file
@@ -0,0 +1,15 @@
+---
+synopsis: "Fix unexpectedly-successful GC failures on macOS"
+cls: 1723
+issues: fj#446
+credits: jade
+category: Fixes
+---
+
+Has the following happened to you on macOS? This failure has been successfully eliminated, thanks to our successful deployment of advanced successful-failure detection technology (it's just `if (failed && errno == 0)`. Patent pending<sup>not really</sup>):
+
+```
+$ nix-store --gc --print-dead
+finding garbage collector roots...
+error: Listing pid 87261 file descriptors: Undefined error: 0
+```
@@ -56,12 +56,27 @@ void DarwinLocalStore::findPlatformRoots(UncheckedRoots & unchecked)
 while (fdBufSize > fds.size() * sizeof(struct proc_fdinfo)) {
     // Reserve some extra size so we don't fail too much
     fds.resize((fdBufSize + fdBufSize / 8) / sizeof(struct proc_fdinfo));
+    errno = 0;
     fdBufSize = proc_pidinfo(
         pid, PROC_PIDLISTFDS, 0, fds.data(), fds.size() * sizeof(struct proc_fdinfo)
     );

+    // errno == 0???! Yes, seriously. This is because macOS has a
+    // broken syscall wrapper for proc_pidinfo that has no way of
+    // dealing with the system call successfully returning 0. It
+    // takes the -1 error result from the errno-setting syscall
+    // wrapper and turns it into a 0 result. But what if the system
+    // call actually returns 0? Then you get an errno of success.
+    //
+    // https://github.com/apple-opensource/xnu/blob/4f43d4276fc6a87f2461a3ab18287e4a2e5a1cc0/libsyscall/wrappers/libproc/libproc.c#L100-L110
+    // https://git.lix.systems/lix-project/lix/issues/446#issuecomment-5483
+    // FB14695751
     if (fdBufSize <= 0) {
-        throw SysError("Listing pid %1% file descriptors", pid);
+        if (errno == 0) {
+            break;
+        } else {
+            throw SysError("Listing pid %1% file descriptors", pid);
+        }
     }
 }
 fds.resize(fdBufSize / sizeof(struct proc_fdinfo));
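The ambiguity described in the code comment can be modelled without macOS. The following is a minimal sketch, assuming the wrapper behaves exactly as that comment says (a `-1` from the errno-setting layer is collapsed into `0`). Everything here is a hypothetical stand-in for illustration: `wrapperModel`, `Outcome`, `listFdsOutcome`, and the `raw*` functions are not libproc APIs, and the real code calls `proc_pidinfo` with `PROC_PIDLISTFDS`.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical outcomes a caller wants to tell apart.
enum class Outcome { HasFds, NoFds, Failed };

// Hypothetical stand-in for the raw, errno-setting call:
// returns a byte count, or -1 with errno set on failure.
using RawCall = int (*)();

// A process with no open fds: the call succeeds and reports 0 bytes.
int rawNoFds() { return 0; }

// A failing call: -1 with errno set (ESRCH chosen only as an example).
int rawNoSuchProcess() { errno = ESRCH; return -1; }

// A process with some fds: a positive byte count (value is arbitrary).
int rawSomeFds() { return 24; }

// Model of the wrapper as described in the comment: -1 becomes 0,
// so the return value alone cannot distinguish "error" from "empty".
int wrapperModel(RawCall raw)
{
    int ret = raw();
    return ret == -1 ? 0 : ret;
}

// Same shape as the patched check: clear errno before the call, then
// treat a non-positive result with errno == 0 as "no fds", not an error.
Outcome listFdsOutcome(RawCall raw)
{
    errno = 0;
    int size = wrapperModel(raw);
    if (size <= 0) {
        return errno == 0 ? Outcome::NoFds : Outcome::Failed;
    }
    return Outcome::HasFds;
}

// Usage: each fake raw call maps to the outcome its comment names.
void checkExamples()
{
    assert(listFdsOutcome(rawNoFds) == Outcome::NoFds);
    assert(listFdsOutcome(rawNoSuchProcess) == Outcome::Failed);
    assert(listFdsOutcome(rawSomeFds) == Outcome::HasFds);
}
```

Clearing `errno` immediately before the call matters in this model: without it, a stale value left by an earlier failure would make the "no fds" case look like an error.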