NVIDIA GRID Driver Installation

By default, all our E2E vGPU nodes come with the licensed GRID drivers preinstalled, so the driver does not need to be installed again. If you want to reinstall it, read on.

Drivers downloaded from the public NVIDIA site are not compatible with the vGPU card. This tutorial explains how to install the licensed NVIDIA GRID drivers on your vGPU node.

Note: The following instructions were tested on a fresh node. These steps involve rebooting the OS; rebooting a production server causes temporary downtime and may affect running services or applications. Before rebooting, ensure that all critical services are stopped or paused and that affected users have been notified. We recommend scheduling a maintenance window during a period of low activity to minimize the impact on users.

Instructions for Ubuntu OS:

  • To update package cache:

sudo apt-get update -y
  • To upgrade all packages:

sudo apt-get upgrade -y

Then reboot the machine at a convenient time. If no kernel packages were updated, you can skip the reboot:

sudo reboot
  • Log in to the machine again once it is back up.

  • Install the gcc compiler, build dependencies, and kernel headers:

sudo apt-get install -y gcc make dkms linux-headers-$(uname -r)
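
The driver build needs headers that match the running kernel. As a quick sanity check after the install step, the sketch below looks for the build directory under /lib/modules. The helper name kernel_build_dir is only illustrative, and the path assumes the usual layout in which /lib/modules/<release>/build points at the installed headers.

```shell
# Illustrative helper (not part of any package): print the directory where
# the headers for a given kernel release are normally linked.
kernel_build_dir() {
  printf '/lib/modules/%s/build\n' "${1:-$(uname -r)}"
}

# Report whether headers for the running kernel appear to be installed.
if [ -d "$(kernel_build_dir)" ]; then
  echo "Kernel headers found for $(uname -r)"
else
  echo "Kernel headers missing for $(uname -r); reboot into the updated kernel and re-run the install step"
fi
```

If the headers are reported missing right after an upgrade, the most likely cause is that the machine has not yet been rebooted into the new kernel.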

Disable the nouveau driver by running the following command:

cat << EOF | sudo tee --append /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

Then regenerate the initramfs:

sudo update-initramfs -u

Append the following line to /etc/default/grub

GRUB_CMDLINE_LINUX="rdblacklist=nouveau"

Then update the GRUB configuration:

sudo update-grub
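
The blacklist only takes effect after the next reboot. Once the machine has been rebooted, a check along the following lines can confirm that nouveau is no longer loaded. This is a minimal sketch: nouveau_loaded is a hypothetical helper name, and it reads /proc/modules, the same list that lsmod prints.

```shell
# Illustrative helper: succeed if the nouveau module appears in a module list.
# $1 is the file to inspect; it defaults to /proc/modules.
nouveau_loaded() {
  grep -q '^nouveau ' "${1:-/proc/modules}" 2>/dev/null
}

if nouveau_loaded; then
  echo "nouveau is still loaded; check the blacklist file and reboot"
else
  echo "nouveau is not loaded"
fi
```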

You can obtain the GRID driver installation utility by running the following command:

wget https://e2e-nvidia-grid.objectstore.e2enetworks.net/NVIDIA-Linux-x86_64-525.105.17-grid.run

Then run the installer:

sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run

Replace “<version>” with the version number of the driver package you downloaded (525.105.17 for the file above). Follow the prompts in the installation script to complete the installation.
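
To avoid typing the file name by hand, the version can be kept in a variable. The sketch below assumes the file was downloaded into the current directory with the name used in the wget URL above; GRID_VERSION and grid_installer_name are illustrative names, not part of the installer.

```shell
# Version string taken from the download URL above; change it if you
# downloaded a different GRID package.
GRID_VERSION="525.105.17"

# Illustrative helper: build the installer file name for a given version.
grid_installer_name() {
  printf 'NVIDIA-Linux-x86_64-%s-grid.run\n' "$1"
}

installer="$(grid_installer_name "$GRID_VERSION")"
if [ -s "./$installer" ]; then
  echo "Found ./$installer; run it with: sudo sh ./$installer"
else
  echo "./$installer not found or empty; download it again before installing"
fi
```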

(OR)

To install without prompts, run the following command instead:

sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run --silent --dkms

Once the installation is complete, reboot your system to activate the new driver.

Verify that the NVIDIA GRID driver is installed correctly by running the following command:

nvidia-smi
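
Beyond eyeballing the nvidia-smi table, the reported driver version can be compared with the one you installed. The following is a sketch only: EXPECTED_VERSION assumes the 525.105.17 package from the download step, and same_version is a hypothetical helper.

```shell
EXPECTED_VERSION="525.105.17"  # assumption: the version installed above

# Illustrative helper: succeed when two version strings are identical.
same_version() {
  [ "$1" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Ask nvidia-smi for the driver version only (the first GPU is enough).
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if same_version "$installed" "$EXPECTED_VERSION"; then
    echo "Driver $installed is active"
  else
    echo "Driver reports '$installed', expected $EXPECTED_VERSION"
  fi
else
  echo "nvidia-smi not found; the driver is not installed or the reboot is pending"
fi
```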

Instructions for CentOS/Rocky:

  • To update all packages:

sudo yum update -y
  • Then reboot the machine at a convenient time. If no kernel packages were updated, you can skip the reboot:

sudo reboot
  • Log in to the machine again once it is back up.

  • Install the gcc compiler, build dependencies, and kernel headers:

sudo yum install -y gcc dkms kernel-devel-$(uname -r)

Disable the nouveau driver by running the following command:

cat << EOF | sudo tee --append /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

Then rebuild the initramfs:

sudo dracut --force --omit-drivers=nouveau

Append the following line to /etc/default/grub

GRUB_CMDLINE_LINUX="rdblacklist=nouveau"

Then update the GRUB configuration:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
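
After the next reboot, the new kernel parameter should be visible on the running kernel's command line. One way to confirm this is sketched below; cmdline_blacklists_nouveau is a hypothetical helper name, and the check simply searches /proc/cmdline for the parameter set in /etc/default/grub.

```shell
# Illustrative helper: succeed if a kernel command line contains the
# rdblacklist=nouveau parameter. $1 defaults to /proc/cmdline.
cmdline_blacklists_nouveau() {
  grep -qw 'rdblacklist=nouveau' "${1:-/proc/cmdline}" 2>/dev/null
}

if cmdline_blacklists_nouveau; then
  echo "rdblacklist=nouveau is active on this boot"
else
  echo "rdblacklist=nouveau not found; regenerate the GRUB configuration and reboot"
fi
```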

You can obtain the GRID driver installation utility by running the following command:

wget https://e2e-nvidia-grid.objectstore.e2enetworks.net/NVIDIA-Linux-x86_64-525.105.17-grid.run

Then run the installer:

sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run

Replace “<version>” with the version number of the driver package you downloaded (525.105.17 for the file above). Follow the prompts in the installation script to complete the installation.

(OR)

To install without prompts, run the following command instead:

sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run --silent --dkms

Once the installation is complete, reboot your system to activate the new driver.

Verify that the NVIDIA GRID driver is installed correctly by running the following command:

nvidia-smi
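
Because GRID drivers are licensed, it can also be useful to look at the license state that the driver reports. The sketch below filters the detailed nvidia-smi -q output. The helper license_status is hypothetical, and the "License Status" field label is an assumption that may differ between driver releases, so treat an empty result as "not reported" rather than as an error.

```shell
# Illustrative helper: read `nvidia-smi -q` output on stdin and print the
# text after the first "License Status :" label.
license_status() {
  grep -i 'License Status' | head -n 1 | cut -d: -f2- | sed 's/^[[:space:]]*//'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  status="$(nvidia-smi -q | license_status)"
  echo "License status: ${status:-not reported}"
else
  echo "nvidia-smi not found"
fi
```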

CUDA Installation:

You may also need the CUDA toolkit to run some GPU applications. Follow the steps below to install it.

Open the following website:

https://developer.nvidia.com/cuda-toolkit-archive

Choose the CUDA version you require and select the appropriate OS. Make sure to choose “runfile (local)” as the installer type.

../../_images/NG1.png

The commands shown in the screenshot above are for reference; the site lists the exact commands for your selection.

Then download and run the installation script as below:

wget https://developer.download.nvidia.com/compute/cuda/XXX/local_installers/cuda_XXX_linux.run
sudo sh cuda_XXX_linux.run

Type “accept” to accept the license agreement and press enter.

Note

The GRID driver is already installed, so the driver bundled with this package should not be installed. Unselect the driver in the prompt, move down, and select Install.

../../_images/NG2.png

OR

To install only the toolkit without prompts, run the following command:

sudo sh cuda_XXX_linux.run --silent --toolkit
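
Once the installer finishes, you will need the toolkit's installation directory for the PATH settings. The sketch below lists candidate directories, assuming the runfile's default prefix of /usr/local/cuda-<version>; list_cuda_dirs is an illustrative helper, and a custom install path would need a different base directory.

```shell
# Illustrative helper: print every cuda-* directory under a base directory
# that contains an executable nvcc. $1 defaults to /usr/local.
list_cuda_dirs() {
  base="${1:-/usr/local}"
  for d in "$base"/cuda-*; do
    if [ -x "$d/bin/nvcc" ]; then
      echo "$d"
    fi
  done
  return 0
}

list_cuda_dirs
```

An empty result means that no toolkit was found under the default prefix.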

To add the CUDA environment path to your .bashrc file, follow these steps:

Open your .bashrc file using a text editor. You can use any text editor of your choice, but we will use the nano editor in this example.

nano ~/.bashrc

Scroll down to the end of the file and add the following lines:

# CUDA 11.8 path
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH

The example above is for CUDA 11.8; replace /usr/local/cuda-11.8 with the path to your CUDA installation.

Save the changes and exit the editor by pressing Ctrl + X, then Y, then Enter.

Then reload the file so that the changes take effect in the current shell:

source ~/.bashrc
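
If you prefer not to edit the file by hand, the same lines can be appended from the shell. The following is a minimal sketch: add_cuda_paths is a hypothetical helper, and it uses a marker comment so that running it twice does not duplicate the entries.

```shell
# Illustrative helper: append CUDA path exports to a shell rc file once.
# $1 = CUDA installation directory, $2 = rc file to modify.
add_cuda_paths() {
  cuda_dir="$1"
  rc_file="$2"
  marker="# CUDA path: ${cuda_dir}"
  # Skip the append if the marker line is already present.
  if grep -qF "$marker" "$rc_file" 2>/dev/null; then
    return 0
  fi
  {
    echo "$marker"
    echo "export PATH=${cuda_dir}/bin:\$PATH"
    echo "export LD_LIBRARY_PATH=${cuda_dir}/lib64:\$LD_LIBRARY_PATH"
  } >> "$rc_file"
}

# Example usage (adjust the directory to your installation):
#   add_cuda_paths /usr/local/cuda-11.8 ~/.bashrc && . ~/.bashrc
```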

Verify that the CUDA directory has been added to your PATH by running the following command:

echo $PATH

You should see the CUDA path added to the list of paths.
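
Instead of reading the whole PATH by eye, the check can be scripted. This sketch assumes the CUDA 11.8 location used in the example above; CUDA_BIN and path_contains are illustrative names. If the directory is on PATH, it also prints the nvcc version as a final confirmation that the toolkit resolves.

```shell
CUDA_BIN="/usr/local/cuda-11.8/bin"  # adjust to your installation

# Illustrative helper: succeed if directory $1 is an entry of the
# colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$CUDA_BIN" "$PATH"; then
  echo "$CUDA_BIN is on PATH"
  # nvcc ships with the toolkit; print its version if it resolves.
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
  fi
else
  echo "$CUDA_BIN is not on PATH; re-check ~/.bashrc and run: source ~/.bashrc"
fi
```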

Voilà! You have installed the NVIDIA GRID driver and the CUDA toolkit.