document version 0.0.4, vep2, 2024-12-07
GPU System Setup
0) OS Installation
Update
whatever floats your ⛵
Upgrade
whatever floats your ⛵
Back up /etc, change default shell
# cd /etc
# git config --global user.email root@vertex
# git config --global user.name root
# git init .
# git add .
# git commit -m "[+] init"
# vim /etc/default/useradd # change default shell to bash
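The manual edit above can also be scripted. A minimal sketch, assuming the defaults file carries a `SHELL=` line (as on stock Ubuntu) and that GNU sed is available; the helper name `set_default_shell` is hypothetical:

```shell
# Sketch: set the default login shell in a useradd defaults file.
# $1 = path to the defaults file, $2 = shell (defaults to /bin/bash).
set_default_shell() {
    file="$1"
    shell="${2:-/bin/bash}"
    if grep -Eq '^#?[[:space:]]*SHELL=' "$file"; then
        # Rewrite an existing (possibly commented-out) SHELL= line.
        sed -i -E "s|^#?[[:space:]]*SHELL=.*|SHELL=${shell}|" "$file"
    else
        # No SHELL= line yet: append one.
        printf 'SHELL=%s\n' "$shell" >> "$file"
    fi
}

# Usage (as root):
#   set_default_shell /etc/default/useradd
#   git -C /etc commit -am "[+] default shell: bash"
```

Committing right after the change keeps the /etc history readable, one change per commit.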
Add scratch disk
if available:
# ln -s /scratch /srv/data
# mkdir /srv/data/extended-local-storage
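The "if available" condition can be made explicit. A minimal sketch; the helper name `link_scratch` is hypothetical, and the paths are passed as arguments so nothing is hard-coded:

```shell
# Sketch: link a scratch disk into place only if it exists.
# $1 = scratch mount point, $2 = symlink to create.
link_scratch() {
    scratch="$1"
    link="$2"
    if [ ! -d "$scratch" ]; then
        echo "no scratch disk at $scratch, skipping"
        return 0
    fi
    # Create the symlink only once; leave an existing path untouched.
    [ -e "$link" ] || ln -s "$scratch" "$link"
    mkdir -p "$link/extended-local-storage"
}

# Usage (as root):
#   link_scratch /scratch /srv/data
```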
Additional packages
#apt install restic # maybe get it from GitHub (?)
apt install vim tree ncdu htop btop nvitop nvtop
1) LDAP
follow the instructions given here:
https://ubuntu.com/server/docs/how-to/sssd/with-ldap/
then read the document sssd-ldap-setup.md :
- working BFH /etc/sssd/sssd.conf file: ...
- working BFH /etc/ldap.conf file: ...
- working BFH /etc/pam.d/sshd file: ...
2) CUDA
General
ubuntu-drivers devices to get an overview of installed/supported GPU devices
ubuntu-drivers --gpgpu list
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
Detailed
adapt the link below w/ the appropriate target_version=..., acc. to your Ubuntu release
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/
(for Ubuntu 24.04)
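The release part of the repository URL can be derived from the version number instead of typed by hand. A minimal sketch, assuming the `ubuntuYYMM` naming seen in the link above holds for your release; the helper name `cuda_repo_url` is hypothetical:

```shell
# Sketch: build the CUDA repo URL from an Ubuntu version such as 24.04.
cuda_repo_url() {
    version_id="$1"
    # 24.04 -> ubuntu2404
    release="ubuntu$(printf '%s' "$version_id" | tr -d '.')"
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/\n' "$release"
}

# Usage: VERSION_ID comes from /etc/os-release
#   . /etc/os-release
#   cuda_repo_url "$VERSION_ID"
```

Check that the resulting URL actually exists before using it; not every Ubuntu release has a repository.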
follow these steps for the straightforward approach
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network
(read through this)
add some tools
apt install nvitop && apt install nvtop
after installation, reboot the system
check w/ nvcc --version (might not work, as this has not been added to your $PATH),
nvidia-smi, nvitop, nvtop, cat /proc/driver/nvidia/version
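The checks above can be run in one go. A minimal sketch that only tests whether each tool is found on $PATH (it does not run them); the helper name `check_tools` is hypothetical:

```shell
# Sketch: report which of the given tools are reachable via $PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool (not installed, or not on \$PATH)"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_tools nvcc nvidia-smi nvitop nvtop
#   cat /proc/driver/nvidia/version
```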
if the above does not work (!)
1) intro: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu
1 a) https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#prepare-ubuntu
remove old keyring: sudo apt-key del 7fa2af80
wget https://developer.download.nvidia.com/compute/cuda/repos/<release>/x86_64/cuda-keyring_1.1-1_all.deb # placeholder <release> means `ubuntu2x04`
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb
sudo apt update
2) https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-ubuntu
wget https://developer.download.nvidia.com/compute/cuda/repos/<release>/x86_64/cuda-<release>.pin
sudo mv cuda-<release>.pin /etc/apt/preferences.d/cuda-repository-pin-600 # placeholder <release> means `ubuntu2x04`
4) driver: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#driver-installation
4 a) https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#additional-package-manager-capabilities
see Section 3.12.2. Meta Packages to get current newest version; this info will be used for the placeholder <driver_branch>
sudo apt-get install cuda-drivers-<driver_branch>, e.g., sudo apt-get install cuda-drivers-555
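To avoid typos in the package name, the branch can be validated first. A minimal sketch, assuming the branch is a plain number such as 555; the helper name `driver_package` is hypothetical:

```shell
# Sketch: turn a driver branch number into the meta package name.
# Rejects empty or non-numeric input.
driver_package() {
    case "$1" in
        ''|*[!0-9]*)
            echo "invalid driver branch: '$1'" >&2
            return 1
            ;;
    esac
    printf 'cuda-drivers-%s\n' "$1"
}

# Usage:
#   apt-cache search --names-only '^cuda-drivers-[0-9]+$'   # list available branches
#   sudo apt-get install "$(driver_package 555)"
```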
5) https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
6) check w/ nvcc --version (might not work, as this has not been added to your $PATH),
nvidia-smi, nvitop, cat /proc/driver/nvidia/version
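If nvcc is not found, the toolkit's bin directory is most likely missing from $PATH. A minimal sketch, assuming the toolkit sits under /usr/local/cuda (the usual default; verify on your system); the helper name `add_to_path` is hypothetical:

```shell
# Sketch: append a directory to $PATH once, and only if it exists.
add_to_path() {
    dir="$1"
    [ -d "$dir" ] || return 1
    case ":$PATH:" in
        *":$dir:"*) ;;                        # already present, nothing to do
        *) PATH="$PATH:$dir"; export PATH ;;
    esac
}

# Usage (e.g. in ~/.bashrc or a file under /etc/profile.d/):
#   add_to_path /usr/local/cuda/bin
#   nvcc --version
```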
for (un)-supported platforms (?)
1) Check GPU/CUDA capabilities
https://docs.nvidia.com/dgx/dgx-os-5-user-guide/installing_on_ubuntu.html#installing-the-dgx-software-stack
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_network
3) docker
NOTE: do not apt install docker.io
To fully profit from the GPU hardware, do not rely on the stock docker service; follow the guidelines for NVIDIA's docker runtime:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
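After following the guide, it is worth confirming that docker actually knows about the NVIDIA runtime. A minimal sketch, assuming the runtime was registered with `nvidia-ctk runtime configure --runtime=docker` and therefore shows up as a "nvidia" key in /etc/docker/daemon.json; the helper name `has_nvidia_runtime` is hypothetical:

```shell
# Sketch: check whether a docker daemon config mentions an "nvidia" runtime.
# $1 = path to daemon.json
has_nvidia_runtime() {
    [ -f "$1" ] && grep -q '"nvidia"[[:space:]]*:' "$1"
}

# Usage:
#   has_nvidia_runtime /etc/docker/daemon.json && echo "nvidia runtime configured"
#   sudo docker run --rm --gpus all ubuntu nvidia-smi   # end-to-end test, needs a GPU
```

The grep is only a quick text check, not a JSON parser; the `docker run` line is the real test.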
# su sysadmin
$ cd ~ # change to /home/sysadmin
$ mkdir -p workspaces/docker
$ cd ~/workspaces/docker
$ scp -r sysadmin@peak.ti.bfh.ch:workspaces/docker/pyenv . # creates ./pyenv
$ cd pyenv
# check `Dockerfile`
# docker build . -t local:pyenv
Others
# pull a docker image for `gpu_burn`
Test docker pyenv
- pyenv install 3.13.1
- pyenv global 3.13.1
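To confirm that the pyenv switch took effect, compare the reported interpreter version with the one just installed. A minimal sketch; the helper name `python_version_matches` is hypothetical:

```shell
# Sketch: compare the output of `python --version` with an expected version.
# $1 = expected version (e.g. 3.13.1), $2 = output of `python --version`
python_version_matches() {
    [ "$2" = "Python $1" ]
}

# Usage (inside the container):
#   python_version_matches 3.13.1 "$(python --version)" && echo "pyenv ok"
```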