之前一直是用.deb文件安装cuda,自动安装显卡驱动,简单粗暴稳得一笔。但是因为用deb文件安装cuda时显卡驱动是从源上下载的,方便的同时也带了一些问题。 最近可能是Ubuntu的源抽风了,安装cuda的时候自动装的显卡驱动有问题,安装结束后nvidia-smi找不到显卡,因此记录下从.run文件安装cuda的步骤以备不时之需。
如果驱动是用.run文件安装的驱动,可以用
sudo ./NVIDIA-Linux-x86_64-384.59.run --uninstall来卸载,不过为了稳妥起见我还是先重装系统,毕竟重装系统能省掉很多不必要的麻烦。
在最后添加如下内容
blacklist vga16fb blacklist nouveau blacklist rivafb blacklist rivatv blacklist nvidiafb保存后退出 输入
sudo update-initramfs -u然后重启电脑
这里要尤其注意,安装显卡驱动要先切换到文字界面,(按Ctrl+Alt+F1~F6).所以,启动电脑后,先进入文字界面,然后输入命令
sudo service lightdm stop sudo bash NVIDIA-Linux-x86_64-390.87.run安装完成后输入 每次安装的时候nvidia会让你看一大堆协议,可以:q直接跳过看协议,直接进入accept界面。
sudo nvidia-smi如果看到GPU的信息列表则说明安装成功
Thu Oct 11 16:00:03 2018 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 390.87 Driver Version: 390.87 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GTX 108... Off | 00000000:0B:00.0 On | N/A | | 23% 33C P8 10W / 250W | 368MiB / 11177MiB | 1% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 1193 G /usr/lib/xorg/Xorg 209MiB | | 0 2024 G compiz 143MiB | | 0 2310 G /usr/bin/nvidia-settings 13MiB | +-----------------------------------------------------------------------------+然后重启电脑
注意:执行后会有一系列提示让你确认,但是注意,有个让你选择是否安装nvidia387驱动时,一定要选择否:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
因为前面我们已经安装了更加新的nvidia387,所以这里不要选择安装。其余的都直接默认或者选择是即可。
打开~/.bashrc文件:
sudo gedit ~/.bashrc将以下内容写入到~/.bashrc尾部:
export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}} export LD_LIBRARY_PATH=/usr/local/cuda9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}如果显示一些关于GPU的信息,则说明安装成功。
./deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) Device 0: "GeForce GTX 1080 Ti" CUDA Driver Version / Runtime Version 9.1 / 9.1 CUDA Capability Major/Minor version number: 6.1 Total amount of global memory: 11177 MBytes (11720130560 bytes) (28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores GPU Max Clock rate: 1582 MHz (1.58 GHz) Memory Clock rate: 5505 Mhz Memory Bus Width: 352-bit L2 Cache Size: 2883584 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: Yes Device PCI Domain ID / Bus ID / location ID: 0 / 11 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1 Result = PASS如果安装成功会打印如下信息
Testing single precision Testing conv ^^^^ CUDA : elapsed = 4.81606e-05 sec, Test PASSED Testing half precision (math in single precision) Testing conv ^^^^ CUDA : elapsed = 3.40939e-05 sec, Test PASSEDUbuntu更新内核后可能导致显卡驱动损坏,因此锁定内核
sudo dpkg --get-selections | grep linux-image sudo apt-mark hold linux-image-5.0.0-23-generic sudo apt-mark hold linux-image-generic-hwe-18.04