Devoxx France 2025

Components: NVIDIA GPU Driver

Installs the NVIDIA driver if needed (optional)

$ kubectl logs nvidia-driver-daemonset-zljb6
  The kernel was built by: gcc (Ubuntu 11.4.
Cleaning kernel module build directory.
Building kernel modules:

  [##############################] 100%
Kernel module compilation complete.
Kernel messages:
[  405.347857] nvidia: loading out-of-tree module taints kernel.
[  405.347875] nvidia: module license 'NVIDIA' taints kernel.
[  405.347879] Disabling lock debugging due to kernel taint
[  405.361004] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  405.385913] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[  405.385920] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.90.07  Fri May 31 09:35:42 UTC 2024
[  405.458423] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[  405.464070] nvidia-uvm: Loaded the UVM driver, major device number 511.
[  405.469814] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.90.07  Fri May 31 09:30:47 UTC 2024
[  405.476414] nvidia-modeset: Unloading
[  405.523298] nvidia-uvm: Unloaded the UVM driver.
[  405.564398] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (550.90.07):: Installing

GPUs in my containers!

Rémi Verchère

Who am I?

Agenda

Topics covered

Topics not covered

Dev side

Enabling the GPU on the dev side

Easy!

Demo environment

Ops side

Kubernetes cluster

Requirements

Demo environment

Demo time!

Enabling the GPU on the DevOoops side

Deployment on Kubernetes

NVIDIA GPU Operator

Components: NVIDIA GPU Operator

Components: NVIDIA GPU Driver

Good Old Times

Components: NVIDIA Container Toolkit

Components: NVIDIA Kubernetes Device Plugin

Components: NVIDIA GPU Feature Discovery

Bonus component: Node Feature Discovery

Example of a typical GPU container deployment

my-app-using-gpu.yaml
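
The manifest itself is not reproduced in this outline. A minimal sketch of what such a file could look like, assuming the NVIDIA device plugin is running on the node and advertises the `nvidia.com/gpu` extended resource. The pod name, container name, and image tag are hypothetical placeholders, not the speaker's values:

```yaml
# my-app-using-gpu.yaml (illustrative sketch, not the original file from the talk)
apiVersion: v1
kind: Pod
metadata:
  name: my-app-using-gpu          # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda                  # hypothetical container name
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi"]     # prints the GPUs visible inside the container
      resources:
        limits:
          nvidia.com/gpu: 1       # extended resource advertised by the device plugin
```

GPUs are requested in whole units under `limits`; Kubernetes uses the limit as the request when no request is given.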

Components for containerized apps with GPU access

Demo time!

When I have more than 2 applications

On my cluster...

GPU Sharing

Strategy 1: Time Slicing
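
A sketch of a time-slicing configuration for the device plugin, assuming the GPU Operator is installed in a `gpu-operator` namespace. The ConfigMap name, the `any` key, and the replica count are illustrative choices, not values taken from the talk:

```yaml
# Illustrative sketch: oversubscribe each physical GPU through time slicing.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config       # hypothetical name
  namespace: gpu-operator         # assumed namespace
data:
  any: |-                         # hypothetical key, referenced from the ClusterPolicy
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4           # each physical GPU is advertised as 4 nvidia.com/gpu
```

The device plugin only picks this up once the operator's ClusterPolicy points at the ConfigMap. Time slicing shares compute time only: the replicas get no memory or fault isolation from one another.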

GPU Memory

Strategy 2: Multi-Process Service (MPS)
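
A sketch of the equivalent device plugin sharing configuration for MPS, assuming a device plugin version that supports the `mps` sharing option. The replica count is an illustrative value:

```yaml
# Illustrative sketch: device plugin configuration enabling MPS sharing.
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 4               # assumed value: 4 clients share each physical GPU
```

The layout mirrors the time-slicing configuration; only the key under `sharing` changes.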

Strategy 3: Multi-Instance GPUs (MIG)
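
A sketch of a pod requesting a single MIG slice, assuming a MIG-capable GPU partitioned with the `1g.5gb` profile and the `mixed` MIG strategy, under which each profile is exposed as its own resource name. The pod name, container name, and image tag are hypothetical:

```yaml
# Illustrative sketch: request one MIG instance instead of a whole GPU.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-using-mig          # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda                  # hypothetical container name
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi", "-L"]                # lists the devices visible to the container
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1   # assumed profile; depends on the GPU model and partitioning
```

With the `single` strategy, the slices are advertised as plain `nvidia.com/gpu` instead.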

Which strategy?

Demo time!

"An app that isn't monitored is not an app in production"

Components: NVIDIA DCGM Exporter

(Data Center GPU Manager)
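
A sketch of Prometheus alerting rules built on metrics exposed by the DCGM Exporter, assuming Prometheus already scrapes the exporter. The group name, alert names, thresholds, and durations are illustrative assumptions:

```yaml
# Illustrative sketch: alerting rules on DCGM Exporter metrics.
groups:
  - name: gpu-alerts                      # hypothetical group name
    rules:
      - alert: GpuHighTemperature         # hypothetical alert name
        expr: DCGM_FI_DEV_GPU_TEMP > 85   # GPU temperature in degrees Celsius; threshold is assumed
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GPU {{ $labels.gpu }} has been above 85 C for 5 minutes"
      - alert: GpuIdle                    # hypothetical alert name
        expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]) < 5   # utilization in percent; threshold is assumed
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "GPU {{ $labels.gpu }} has been mostly idle"
```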

Bonus point

Deployment on other cloud providers

GCP

Example: timeSlicing on GKE
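
A sketch of a pod targeting a GKE node pool with GPU time-sharing, assuming the node pool was created with time-sharing enabled and at most 3 clients per GPU. The pod name, container name, image tag, and client count are illustrative:

```yaml
# Illustrative sketch: schedule onto GKE nodes whose GPUs are time-shared.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-timeshared         # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-gpu-sharing-strategy: time-sharing
    cloud.google.com/gke-max-shared-clients-per-gpu: "3"   # assumed node pool setting
  containers:
    - name: cuda                  # hypothetical container name
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # one time-shared slot, not a dedicated physical GPU
```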

Amazon EKS

Azure AKS

Scaleway

Resources

Thank you!