NVLink topology systems need fabricmanager. Fabricmanager depends on the
datacenter driver set rather than the regular x11 one, and it is also
tightly tied to the driver version. Furthermore, the current cudaPackages
default to version 11.8, which corresponds to the 520 datacenter drivers.
A future improvement would be to select the main nvidia datacenter driver version
based on `config.cudaVersion`, since the matching versions are well known from:
> https://docs.nvidia.com/deploy/cuda-compatibility/index.html#use-the-right-compat-package
This adds the NixOS configuration options `hardware.nvidia.datacenter.enable` and
`hardware.nvidia.datacenter.settings` (the settings configure fabricmanager).
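A minimal sketch of how the two options might be used in a system configuration. The option names come from this commit message. The shape of `settings` (an attribute set rendered into the fabricmanager configuration) and the `LOG_LEVEL` key are assumptions made for illustration only.

```nix
{ ... }:
{
  # Option named in this commit message: switch to the Data Center driver set.
  hardware.nvidia.datacenter.enable = true;

  # Assumption: settings is an attribute set of fabricmanager configuration
  # keys. LOG_LEVEL is an illustrative key, not a value taken from the commit.
  hardware.nvidia.datacenter.settings = {
    LOG_LEVEL = 4;
  };
}
```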
Other interesting external links related to this commit are:
* Fabricmanager download site:
- https://developer.download.nvidia.com/compute/cuda/redist/fabricmanager/linux-x86_64/
* Data Center drivers:
- https://www.nvidia.com/Download/driverResults.aspx/193711/en-us/
Implementation specific details:
* Fabricmanager is added as a passthru package, similar to settings and
persistenced.
* Adds `use{Settings,Persistenced,Fabricmanager}` with defaults to preserve x11
expressions.
* Uses mkMerge to split the `hardware.nvidia` module into three
comment-delimited sections:
1. Common
2. X11/xorg
3. Data Center
* Uses asserts to make the configurations mutually exclusive.
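The module layout described above could look roughly like the following sketch. It is not the code from this commit: the way the X11 path is detected (`x11Enabled`) and the assertion message are assumptions, and the section bodies are left empty.

```nix
{ config, lib, ... }:
let
  cfg = config.hardware.nvidia;
  # Assumption: the X11 path is active when "nvidia" is a selected video driver.
  x11Enabled = lib.elem "nvidia" config.services.xserver.videoDrivers;
  dcEnabled = cfg.datacenter.enable;
in
{
  config = lib.mkMerge [
    # Common
    (lib.mkIf (x11Enabled || dcEnabled) {
      assertions = [
        {
          # The two driver sets must not be enabled together.
          assertion = !(x11Enabled && dcEnabled);
          message = "The X11 and Data Center NVIDIA drivers are mutually exclusive.";
        }
      ];
    })

    # X11/xorg
    (lib.mkIf x11Enabled {
      # X11-specific configuration would go here.
    })

    # Data Center
    (lib.mkIf dcEnabled {
      # fabricmanager service and related configuration would go here.
    })
  ];
}
```

Keeping each section in its own `mkIf` branch lets the assertion live in the common part while the X11 and Data Center settings stay separate.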
Notes:
* Data Center Drivers are `x86_64` only.
* Reuses the `nvidia_x11` attribute in nixpkgs on enable, i.e. it does not
introduce an `nvidia_driver` attribute that is set to either `nvidia_x11` or
`nvidia_dc`.
* There should be a helper function that switches on `config.cudaVersion`,
similar to `selectHighestVersion` but named `selectCudaCompatibleVersion`.
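One possible shape for the proposed helper, as a sketch only. `selectCudaCompatibleVersion` does not exist yet; the lookup table, the argument order, and the assumption that driver packages expose a `version` attribute are all hypothetical. Only the 11.8 to 520 pairing is taken from the text above.

```nix
{ lib }:
let
  # Hypothetical table mapping CUDA versions to data center driver branches.
  # Only the "11.8" -> "520" entry is stated in this commit message.
  cudaToDriverBranch = {
    "11.8" = "520";
  };
in
{
  # Hypothetical helper: pick the first driver whose version is on the branch
  # that matches the given CUDA version.
  selectCudaCompatibleVersion = cudaVersion: drivers:
    let
      branch = cudaToDriverBranch.${cudaVersion}
        or (throw "No known data center driver branch for CUDA ${cudaVersion}");
      matching = lib.filter (d: lib.hasPrefix "${branch}." d.version) drivers;
    in
    if matching == [ ] then
      throw "No data center driver on branch ${branch} is available"
    else
      lib.head matching;
}
```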
the conversion procedure is simple:
- find all things that look like options, ie calls to either `mkOption`
or `lib.mkOption` that take an attrset. remember the attrset as the
option
- for all options, find a `description` attribute whose value is not a
call to `mdDoc` or `lib.mdDoc`
- textually convert the entire value of the attribute to MD with a few
simple regexes (the set from mdize-module.sh)
- if the change produced a change in the manual output, discard
- if the change kept the manual unchanged, add some text to the
description to make sure we've actually found an option. if the
manual changes this time, keep the converted description
this procedure converts 80% of nixos options to markdown. around 2000
options remain to be inspected, but most of those fail the "does not
change the manual output" check: currently the MD conversion process
does not faithfully convert docbook tags like <code> and <package>, so
any option using such tags will not be converted at all.
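As an illustration of the simplest case the procedure handles, the sketch below shows one option before and after conversion. The option names under `example` are hypothetical, and the description text is invented for the illustration. It assumes a nixpkgs revision in which `lib.mdDoc` is available.

```nix
{ lib, ... }:
{
  options.example = {
    # Before: a plain-string description, which the procedure would pick up
    # because its value is not a call to mdDoc or lib.mdDoc.
    before = lib.mkOption {
      type = lib.types.bool;
      default = false;
      description = "Whether to enable the example feature.";
    };

    # After: the same text wrapped in lib.mdDoc. The conversion is kept only
    # if the rendered manual output stays the same.
    after = lib.mkOption {
      type = lib.types.bool;
      default = false;
      description = lib.mdDoc "Whether to enable the example feature.";
    };
  };
}
```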
We can't assume that DRI card minor is the same as NVidia GPU device minor,
because some DRI minors could be taken by GPUs of other vendors.
Fixes #87788, #98942.
The current type for the busId options is too relaxed; a stricter
constraint should be imposed to guard against typos that leave
Xorg unable to start.
This commit restricts the type to adhere to the B/D/F notation[1] for
addressing devices as expected by the module option.
[1] - https://wiki.osdev.org/PCI#Configuration_Space_Access_Mechanism_.231
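A stricter type of this kind can be sketched with `types.strMatching`. The pattern below is an assumption for illustration (bus IDs of the form `PCI:<bus>:<device>:<function>`, with the empty string allowed as "unset") and is not the exact expression from this commit. The option name `example.busId` is hypothetical.

```nix
{ lib, ... }:
let
  # Assumed pattern: "PCI:" followed by bus, device and function numbers,
  # or the empty string when the option is left unset.
  busIdType = lib.types.strMatching "(PCI:[0-9]{1,3}:[0-9]{1,2}:[0-9])?";
in
{
  # Hypothetical option used only to demonstrate the type.
  options.example.busId = lib.mkOption {
    type = busIdType;
    default = "";
    example = "PCI:1:0:0";
    description = "Bus ID of the GPU in B/D/F notation.";
  };
}
```

With a type like this, a typo such as `PCI:1:0` is rejected at evaluation time instead of surfacing later as an Xorg startup failure.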
GDM enables Wayland on supported platforms automatically (see ${gnome.gdm}/lib/udev/rules.d/61-gdm.rules), so we removed the `gdm.nvidiaWayland` option.
You will still need `hardware.nvidia.modesetting.enable = true;` with the `nvidia` driver, though.
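A minimal configuration fragment for the setup described above. Selecting the driver through `services.xserver.videoDrivers` is an assumption about how the `nvidia` driver is enabled on the system. The GDM options are left out because they are not spelled out in this note.

```nix
{ ... }:
{
  # Assumption: the nvidia driver is selected through videoDrivers.
  services.xserver.videoDrivers = [ "nvidia" ];

  # Still required for Wayland under GDM with the nvidia driver.
  hardware.nvidia.modesetting.enable = true;
}
```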
This patch fixes a bug caused by an incorrect reference to
'nvidiaSettings' rather than 'cfg.nvidiaSettings'. The bug caused
the system to not build when using the nvidia drivers.
Tested on my local machine.