The new option `services.prometheus.enableReload` has been introduced
which, when enabled, causes the prometheus systemd service to reload
when its config file changes.
More specifically the following property holds: switching to a
configuration (`switch-to-configuration`) that changes the prometheus
configuration only finishes successully when prometheus has finished
loading the new configuration.
`enableReload` is `false` by default in which case the old semantics
of restarting the prometheus systemd service are in effect.
- Add HDFS journalnode and ZKFC services
- Test failover of HDFS and YARN master services in full hadoop test
- Check if a minimal HDFS cluster works in the minimal HDFS test
Some upstream changes broke the automatic fallback to SwiftShader.
Without this workaround the GPU initialization fails (apparently due to requiring Vulkan):
machine # libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
machine # [1001:1001:1101/104304.000625:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
machine # libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
machine # [1052:1052:1101/104305.060496:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
machine # [1084:1084:1101/104305.165898:ERROR:angle_platform_impl.cc(44)] Display.cpp:894 (initialize): ANGLE Display::initialize error 0: Internal Vulkan error (-3): Initialization of an object could not be completed for implementation-specific reasons, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, initialize:1048.
machine # [1084:1084:1101/104305.171191:ERROR:gl_surface_egl.cc(782)] EGL Driver message (Critical) eglInitialize: Internal Vulkan error (-3): Initialization of an object could not be completed for implementation-specific reasons, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, initialize:1048.
machine # [1084:1084:1101/104305.178707:ERROR:gl_surface_egl.cc(1382)] eglInitialize SwANGLE failed with error EGL_NOT_INITIALIZED
machine # [1084:1084:1101/104305.180111:ERROR:gl_ozone_egl.cc(20)] GLSurfaceEGL::InitializeOneOff failed.
machine # [1084:1084:1101/104305.189760:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
machine # [1092:1092:1101/104305.233470:ERROR:gpu_init.cc(457)] Passthrough is not supported, GL is disabled, ANGLE is
This workaround restores the previous result:
machine # [1004:1004:1101/104551.131190:ERROR:gpu_init.cc(457)] Passthrough is not supported, GL is swiftshader, ANGLE is
The workaround is only required for Chromium, not Google Chrome.
Some specialisations (such as those which affect various boot-time
attributes) cannot be switched to at runtime. This allows picking the
specialisation at boot time.
Massively reduce the time it takes running the test by building a
proper root disk image and increasing the virtualized core count to
4. This should make it much easier for the tests to pass even on
weaker systems.
With my laptop (AMD Ryzen 7 PRO 2700U) as the reference system, I see
the following test run times:
- No change:
Times out after 28 mins
- Building a root image:
7 mins, 48 secs
- Building a root image and bumping the core count:
7 mins, 17 secs
The times include the time it takes to build the image
(~1 min, 20 secs).
Massively reduce the time it takes running the test by building a
proper root disk image and increasing the virtualized core count to
4. This should make it much easier for the tests to pass even on
weaker systems.
With my laptop (AMD Ryzen 7 PRO 2700U, 32GB RAM) as the reference
system, I see the following test run times:
- No change:
25 mins, 49 secs
- Building a root image:
4 mins, 44 secs
- Building a root image and bumping the core count:
3 mins, 6 secs
The times include the time it takes to build the image (~40 secs).
pathsInNixDB isn't a very accurate name when a Nix store image is
built (virtualisation.useNixStoreImage); rename it to additionalPaths,
which should be general enough to cover both cases.
The existing tests for HDFS and YARN only check if the services come up and expose their web interfaces.
The new combined hadoop test will also test whether the services and roles work together as intended.
It spin up an HDFS+YARN cluster and submit a demo YARN application that uses the hadoop cluster for
storageand yarn cluster for compute.
The version 20 of Nextcloud will be EOLed by the end of this month[1].
Since the recommended default (that didn't raise an eval-warning) on
21.05 was Nextcloud 21, this shouldn't affect too many people.
In order to ensure that nobody does a (not working) upgrade across
several major-versions of Nextcloud, I replaced the derivation of
`nextcloud20` with a `throw` that provides instructions how to proceed.
The only case that I consider "risky" is a setup upgraded from 21.05 (or
older) with a `system.stateVersion` <21.11 and with
`services.nextcloud.package` not explicitly declared in its config. To
avoid that, I also left the `else-if` for `stateVersion < 21.03` which
now sets `services.nextcloud.package` to `pkgs.nextcloud20` and thus
leads to an eval-error. This condition can be removed
as soon as 21.05 is EOL because then it's safe to assume that only
21.11. is used as stable release where no Nextcloud <=20 exists that can
lead to such an issue.
It can't be removed earlier because then every `system.stateVersion <
21.11` would lead to `nextcloud21` which is a problem if `nextcloud19`
is still used.
[1] https://docs.nextcloud.com/server/20/admin_manual/release_schedule.html
during the rewrite the checkPasswords=false feature of the old module
was lost. restore it, and with it systems that allow any client to use
any username.
borg is able to process stdin during backups when backing up the special path -,
which can be very useful for backing up things that can be streamed (eg database
dumps, zfs snapshots).
mosquitto needs a lot of attention concerning its config because it doesn't
parse it very well, often ignoring trailing parts of lines, duplicated config
keys, or just looking back way further in the file to associated config keys
with previously defined items than might be expected.
this replaces the mosquitto module completely. we now have a hierarchical config
that flattens out to the mosquitto format (hopefully) without introducing spooky
action at a distance.
This commit changes a lot more that you'd expect but it also adds a lot
of new testing code so nothing breaks in the future. The main change is
that sockets are now restarted when they change. The main reason for
the large amount of changes is the ability of activation scripts to
restart/reload units. This also works for socket-activated units now,
and honors reloadIfChanged and restartIfChanged. The two changes don't
really work without each other so they are done in the one large commit.
The test should show what works now and ensure it will continue to do so
in the future.
allows configuration of foo-over-udp decapsulation endpoints. sadly networkd
seems to lack the features necessary to support local and peer address
configuration, so those are only supported when using scripted configuration.