The buildkite agent supports multiple tags with the same key. This
functionality is used to have a [single agent listen on multiple
queues](https://buildkite.com/docs/agent/v3/queues#setting-an-agents-queue).
However, having the tags be of type `attrsOf str` means that
we cannot suport this use case. This commit modifies the type
of tags to be `attrsOf (either str (listOf str))` where the list
is expanded into multiple tags with the same key.
Example:
```
{tags = {queue = ["default", "testing"];};}
```
generates
```
tags="queue=default,queue=testing"
```
in the buildkite agent configuration.
* github-runner: init at 2.277.1
* nixos/github-runner: initial version
* nixos/github-runner: add warning if tokenFile in Nix store
* github-runner: don't accept unexpected attrs
* github-runner: formatting nits
* github-runner: add pre and post hooks to checkPhase
* nixos/github-runner: update ExecStartPre= comment
* nixos/github-runner: adapt tokenFile option description
Also note that not only a change to the option value will trigger a
reconfiguration but also modifications to the file's content.
* nixos/github-runner: remove mkDefault for DynamicUser=
* nixos/github-runner: create a parent for systemd dirs
Adds a parent directory "github-runner/" to all of the systemd lifecycle
directories StateDirectory=, RuntimeDirectory= and LogDirectory=.
Doing this has two motivations:
1. Something like this would required if we want to support multiple
runners configurations. Please note that this is already possible
using NixOS containers.
2. Having an additional parent directory makes it easier to remap
any of the directories. Without a parent, systemd is going to
complain if, for example, the given StateDirectory= is a symlink.
* nixos/github-runner: use specifier to get abs runtime path
* nixos/github-runner: use hostname as default for option `name`
Until now, the runner registration did not set the `--name` argument if
the configuration option was `null`, the default for the option.
According to GitHub's documentation, this instructs the registration
script to use the machine's hostname.
This commit causes the registration to always pass the `--name` argument
to the runner configuration script. The option now defaults to
`networking.hostName` which should be always set on NixOS.
This change becomes necessary as the systemd service name includes the
name of the runner since fcfa809 and, hence, expects it to be set. Thus,
an unset `name` option leads to an error.
* nixos/github-runner: use types.str for `name` option
Forcing a `name` option to comply with a pattern which could also be
used as a hostname is probably not required by GitHub.
* nixos/github-runner: pass dir paths explicitly for ExecStartPre=
* nixos/github-runner: update variable and script naming
* nixos/github-runner: let systemd choose the user/group
User and group naming restrictions are a complex topic [1] that I don't
even want to touch. Let systemd figure out the username and group and
reference it in our scripts through the USER environment variable.
[1] https://systemd.io/USER_NAMES/
* Revert "nixos/github-runner: use types.str for `name` option"
The escaping applied to the subdirectory paths given to StateDirectory=,
RuntimeDirectory= and LogsDirectory= apparently doesn't use the same
strategy that is used to escape unit names (cf. systemd-escape(1)). This
makes it unreasonably hard to construct reliable paths which work for
StateDirectory=/RuntimeDirectory=/LogsDirectory= and ExecStartPre=.
Against this background, I decided to (re-)apply restrictions to the
name a user might give for the GitHub runner. The pattern for
`networking.hostName` seems like a reasonable choice, also as its value
is the default if the `name` option isn't set.
This reverts commit 193ac67ba337990c22126da24a775c497dbc7e7d.
* nixos/github-runner: use types.path for `tokenFile` option
* nixos/github-runner: escape options used as shell arguments
* nixos/github-runner: wait for network-online.target
* github-runner: ignore additional online tests
This will make it easier to track specifically where queries are being
made from (assuming a `log_line_prefix` that includes `%a` in the
postgres configuration).
These were broken since 2016:
f0367da7d1
since StartLimitIntervalSec got moved into [Unit] from [Service].
StartLimitBurst has also been moved accordingly, so let's fix that one
too.
NixOS systems have been producing logs such as:
/nix/store/wf98r55aszi1bkmln1lvdbp7znsfr70i-unit-caddy.service/caddy.service:31:
Unknown key name 'StartLimitIntervalSec' in section 'Service', ignoring.
I have also removed some unnecessary duplication in units disabling
rate limiting since setting either interval or burst to zero disables it
(ad16158c10/src/basic/ratelimit.c (L16))
This should NOT be backported to 20.09!
When 21.03 is released, the DB changes are about a year old and
operators had two release cycles for the upgrade. At this point it
should be fair to remove the compat layer to reduce the complexity of
the module itself.
In the current implementation, there's no possibility to modify the default
parameter for keepalive. This is a number that indicates how frequently
keepalive messages should be sent from the worker to the buildmaster,
expressed in seconds. The default (600) causes a message to be sent to
the buildmaster at least once every 10 minutes.
If the worker is behind a NAT box or stateful firewall, these messages
may help to keep the connection alive: some NAT boxes tend to forget about
a connection if it has not been used in a while. When this happens, the
buildmaster will think that the worker has disappeared, and builds will
time out. Meanwhile the worker will not realize than anything is wrong.
Since Buildbot 0.9.0, status targets were deprecated and ignored.
There's a very small line on startup explaining that, and status simply
isn't reported. Avoid others the same headaches, and do it right in the
NixOS module.
As there might have been changes in the way reporters are organized, and
configuration might need to be migrated remove the old option, and not
just provide an alias.
It's pbPort, and it's also a connection string, meaning
listen-on-localhost is also possible. Provide an alias for the old
option name, so old configs still work.
We already set the relevant env vars in the systemd services. That does
not help one when executing any of the executables outside a service,
e.g. when creating a new user.
Also removed `pkgs.hydra-flakes` since flake-support has been merged
into master[1]. Because of that, `pkgs.hydra-unstable` is now compiled
against `pkgs.nixFlakes` and currently requires a patch since Hydra's
master doesn't compile[2] atm.
[1] https://github.com/NixOS/hydra/pull/730
[2] https://github.com/NixOS/hydra/pull/732
Upgrades Hydra to the latest master/flake branch. To perform this
upgrade, it's needed to do a non-trivial db-migration which provides a
massive performance-improvement[1].
The basic ideas behind multi-step upgrades of services between NixOS versions
have been gathered already[2]. For further context it's recommended to
read this first.
Basically, the following steps are needed:
* Upgrade to a non-breaking version of Hydra with the db-changes
(columns are still nullable here). If `system.stateVersion` is set to
something older than 20.03, the package will be selected
automatically, otherwise `pkgs.hydra-migration` needs to be used.
* Run `hydra-backfill-ids` on the server.
* Deploy either `pkgs.hydra-unstable` (for Hydra master) or
`pkgs.hydra-flakes` (for flakes-support) to activate the optimization.
The steps are also documented in the release-notes and in the module
using `warnings`.
`pkgs.hydra` has been removed as latest Hydra doesn't compile with
`pkgs.nixStable` and to ensure a graceful migration using the newly
introduced packages.
To verify the approach, a simple vm-test has been added which verifies
the migration steps.
[1] https://github.com/NixOS/hydra/pull/711
[2] https://github.com/NixOS/nixpkgs/pull/82353#issuecomment-598269471
* nixos/buildkite: drop user option
This reverts 8c6b1c3eaa.
Turns out, buildkite-agent has logic to write .ssh/known_hosts files and
only really works when $HOME and the user homedir are in sync.
On top of that, we provision ssh keys in /var/lib/buildkite-agent, which
doesn't work if that other users' homedir points elsewhere (we can cheat
by setting $HOME, but then getent and $HOME provide conflicting
results).
So after all, it's better to only run the system-wide buildkite agent as
the "buildkite-agent" user only - if one wants to run buildkite as
different users, systemd user services might be a better fit.
* nixosTests.buildkite-agent: add node with separate user and no ssh key
This gets passed to BUILDKITE_SHELL, which will specify the shell being
used to executes script in.
Defaults to `${pkgs.bash}/bin/bash -e -c`, matching how buildkite
behaves on other distros.
SSH public keys aren't needed to clone private repos, and if we only
need to configure a single attribute, there's no need for the "openssh"
attrset anymore.