


OpenStack 14 CentOS and Nvidia driver for vgpu?


Hello list, 

I'm struggling to deploy Rocky with vGPU using the NVIDIA drivers.
Has anyone else had problems loading the NVIDIA kernel modules?

I'm talking about the hypervisor part of the setup. NVIDIA provides two modules. One of them, nvidia.ko, loads correctly.
The other one, nvidia-vgpu-vfio.ko, does not.

When I try to load it, it seems that the 7.6 kernel is no longer compatible with it:

modprobe nvidia-vgpu-vfio 
modprobe: ERROR: could not insert 'nvidia_vgpu_vfio': Invalid argument 

dmesg shows this: 
nvidia_vgpu_vfio: disagrees about version of symbol vfio_pin_pages 
nvidia_vgpu_vfio: Unknown symbol vfio_pin_pages (err -22) 
nvidia_vgpu_vfio: disagrees about version of symbol vfio_unpin_pages 
nvidia_vgpu_vfio: Unknown symbol vfio_unpin_pages (err -22) 
nvidia_vgpu_vfio: disagrees about version of symbol vfio_register_notifier 
nvidia_vgpu_vfio: Unknown symbol vfio_register_notifier (err -22) 
nvidia_vgpu_vfio: disagrees about version of symbol vfio_unregister_notifier 
nvidia_vgpu_vfio: Unknown symbol vfio_unregister_notifier (err -22) 
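
To get a short list of the affected symbols out of a longer dmesg log, something like the following could be used. This is only a minimal sketch, assuming Python 3 is available on the hypervisor; the function name `mismatched_symbols` is made up for illustration, and the sample text is the dmesg output quoted above.

```python
import re

# Matches kernel log lines such as:
#   nvidia_vgpu_vfio: disagrees about version of symbol vfio_pin_pages
# re.search is used so that a leading dmesg timestamp does not matter.
MISMATCH = re.compile(r"(\w+): disagrees about version of symbol (\w+)")


def mismatched_symbols(dmesg_text):
    """Return {module: [symbols]} for every version-mismatch line found."""
    result = {}
    for line in dmesg_text.splitlines():
        match = MISMATCH.search(line)
        if match:
            module, symbol = match.groups()
            symbols = result.setdefault(module, [])
            if symbol not in symbols:  # keep each symbol once, in log order
                symbols.append(symbol)
    return result


sample = """\
nvidia_vgpu_vfio: disagrees about version of symbol vfio_pin_pages
nvidia_vgpu_vfio: Unknown symbol vfio_pin_pages (err -22)
nvidia_vgpu_vfio: disagrees about version of symbol vfio_unpin_pages
nvidia_vgpu_vfio: Unknown symbol vfio_unpin_pages (err -22)
nvidia_vgpu_vfio: disagrees about version of symbol vfio_register_notifier
nvidia_vgpu_vfio: Unknown symbol vfio_register_notifier (err -22)
nvidia_vgpu_vfio: disagrees about version of symbol vfio_unregister_notifier
nvidia_vgpu_vfio: Unknown symbol vfio_unregister_notifier (err -22)
"""

for module, symbols in mismatched_symbols(sample).items():
    print(module, "->", ", ".join(symbols))
```

All four symbols are exported by the vfio module, which is consistent with the problem being on the vfio side rather than in nvidia.ko.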

modinfo nvidia-vgpu-vfio 
filename: /lib/modules/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia-vgpu-vfio.ko 
version: 430.27 
supported: external 
license: MIT 
rhelversion: 7.6 
srcversion: 0A179A61A02AD500D05FB1A 
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00* 
alias: pci:v000010DEd*sv*sd*bc03sc02i00* 
alias: pci:v000010DEd*sv*sd*bc03sc00i00* 
depends: nvidia,mdev,vfio 
vermagic: 3.10.0-940.el7.x86_64 SMP mod_unload modversions 


My guess is that somewhere along the RHEL/CentOS 7.6 lifecycle the vfio module changed and broke compatibility.
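
The kernel the module was built for and the kernel tree it is installed under can be read straight from the modinfo output. The following is a rough sketch, again assuming Python 3; the helper names are hypothetical and the sample is a shortened copy of the modinfo output above. Note that a differing build kernel is normal for a module linked in through weak-updates and is not an error on its own; it only shows which two kernels are involved.

```python
import re


def parse_modinfo(text):
    """Turn 'key: value' lines into a dict; repeated keys keep the first value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def kernel_from_path(filename):
    """Kernel release taken from a /lib/modules/<release>/... path, or None."""
    match = re.match(r"/lib/modules/([^/]+)/", filename)
    return match.group(1) if match else None


def build_vs_install(modinfo_text):
    """Return (kernel built against, kernel tree installed under, differ?)."""
    fields = parse_modinfo(modinfo_text)
    built = fields["vermagic"].split()[0]  # first token is the kernel release
    installed = kernel_from_path(fields["filename"])
    return built, installed, built != installed


sample = """\
filename: /lib/modules/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia-vgpu-vfio.ko
version: 430.27
rhelversion: 7.6
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: nvidia,mdev,vfio
vermagic: 3.10.0-940.el7.x86_64 SMP mod_unload modversions
"""

built, installed, differ = build_vs_install(sample)
print("built against:  ", built)
print("installed under:", installed)
print("differ:         ", differ)
```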

NVIDIA provides these modules built against the 7.6 BETA release and assumes that weak-modules will make them work on later kernels.
Somehow it does not.

Does anybody have suggestions on how to handle this? I'm working on it with NVIDIA enterprise support, but maybe one of you got there first.

best regards 

-- 
Piotr Baranowski 