Installing the Nvidia driver, CUDA, cuDNN, and tensorflow-gpu on Ubuntu 16.04

xiaoxiao 2025-08-18

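This post is also published on my personal blog. I have set this up before on an Alibaba Cloud GPU server; this time I start from installing the Nvidia driver.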

1. Install the Nvidia driver

1.1 Find the Nvidia driver version to install

1.1.1 Official website

Look up the driver on the official site: https://www.nvidia.com/Download/index.aspx?lang=en-us
The card used here is a GeForce GTX 1080 Ti, for which the site recommends version 410.

1.1.2 Check the recommended driver from the command line

List the available drivers with ubuntu-drivers devices:

    ubuntu@ubuntu-System-Product-Name:~$ ubuntu-drivers devices
    == cpu-microcode.py ==
    driver   : intel-microcode - distro free

    == /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
    vendor   : NVIDIA Corporation
    modalias : pci:v000010DEd00001B06sv00001458sd0000374Dbc03sc00i00
    driver   : nvidia-410 - third-party free recommended
    driver   : nvidia-384 - distro non-free
    driver   : xserver-xorg-video-nouveau - distro free builtin
    driver   : nvidia-390 - third-party free
    driver   : nvidia-396 - third-party free

Note that the graphics-drivers PPA had already been added when this output was captured. Without the PPA, the newest driver offered may be nvidia-384. CUDA 9.0 needs driver version 384.81 or later; with an older driver, tensorflow-gpu will also report errors after it is installed.

The driver requirement comes from the table in the CUDA release notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
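To pick the recommended package out of that listing in a script, something like the following could work. It is a minimal sketch: the function name `recommended_driver` and the sample text are illustrative, and it assumes the `driver : <name> - <flags>` line layout shown in the output above.

```python
import re


def recommended_driver(report: str):
    """Return the package on the line flagged 'recommended', or None."""
    for line in report.splitlines():
        # Lines look like: "driver   : nvidia-410 - third-party free recommended"
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)", line)
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None


# Excerpt of the `ubuntu-drivers devices` output from this post.
DRIVER_REPORT = """\
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor   : NVIDIA Corporation
driver   : nvidia-410 - third-party free recommended
driver   : nvidia-384 - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

if __name__ == "__main__":
    print(recommended_driver(DRIVER_REPORT))
```

On a real machine the text could come from `subprocess.run(["ubuntu-drivers", "devices"], capture_output=True, text=True).stdout` instead of the embedded sample.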

Add the PPA (make sure the machine is online and any proxy is disabled):

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update

After that, ubuntu-drivers devices shows the result above.

Install:

Dependencies that may be needed:

    sudo apt install dkms build-essential linux-headers-generic

Some machines may need the nouveau module disabled; see: https://blog.csdn.net/u012235003/article/details/54575758

    sudo apt-get install linux-headers-$(uname -r)
    sudo apt install nvidia-410

Reboot the machine.

Check: nvidia-smi should print something like this:

    (wangyongzhi_ml) ubuntu@ubuntu-System-Product-Name:/usr/local/cuda-10.0/bin$ nvidia-smi
    Thu Oct 25 15:49:46 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 410.66       Driver Version: 410.66       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   44C    P8    20W / 250W |     42MiB / 11174MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
    |  0%   50C    P8    20W / 250W |      2MiB / 11178MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0       949      G   /usr/lib/xorg/Xorg                            39MiB |
    +-----------------------------------------------------------------------------+
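Since CUDA 9.0 needs driver 384.81 or later, it can be worth checking the installed driver before moving on. The sketch below parses the `Driver Version:` field from nvidia-smi text; the helper names are illustrative, and only the 384.81 threshold quoted in this post is encoded. Versions are compared as integer tuples, so 384.130 correctly counts as newer than 384.81, which a float comparison would get wrong.

```python
import re

# Minimum Linux driver for CUDA 9.0, as quoted in this post.
MIN_DRIVER_FOR_CUDA_9_0 = (384, 81)


def driver_version(smi_output: str):
    """Extract 'Driver Version: X.Y' from nvidia-smi text as a tuple of ints."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def supports_cuda_9_0(smi_output: str) -> bool:
    """True if the reported driver is at least 384.81."""
    version = driver_version(smi_output)
    return version is not None and version >= MIN_DRIVER_FOR_CUDA_9_0


if __name__ == "__main__":
    header = "| NVIDIA-SMI 410.66  Driver Version: 410.66  CUDA Version: 10.0 |"
    print(driver_version(header), supports_cuda_9_0(header))
```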

2. Install CUDA

Official site: https://developer.nvidia.com/cuda-toolkit-archive
Choose the version you want to install; cuda-9.0 is used here.

Download and install:

    chmod +x cuda_9.0.176_384.81_linux-run
    sudo ./cuda_9.0.176_384.81_linux-run

Follow the prompts and choose the install options.

Add the environment variables. Open ~/.bashrc:

    vim ~/.bashrc

and add:

    # cuda9.0
    export PATH=/usr/local/cuda-9.0/bin/:$PATH;
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH;

Test 1

Run nvcc -V; the version reported is V9.0.176:

    (wangyongzhi_ml) ubuntu@ubuntu-System-Product-Name:~/wangyongzhi/software$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Sep__1_21:08:03_CDT_2017
    Cuda compilation tools, release 9.0, V9.0.176

Test 2

If you chose to install the Examples during installation, a NVIDIA_CUDA-9.0_Samples folder is created under ~. Enter it and build:

    cd NVIDIA_CUDA-9.0_Samples
    make

Then go to NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release and run ./deviceQuery. The output should look similar to this:

    ./deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 2 CUDA Capable device(s)

    Device 0: "GeForce GTX 1080 Ti"
      CUDA Driver Version / Runtime Version          10.0 / 9.0
      CUDA Capability Major/Minor version number:    6.1
      Total amount of global memory:                 11174 MBytes (11717181440 bytes)
      (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1683 MHz (1.68 GHz)
      Memory Clock rate:                             5505 Mhz
      Memory Bus Width:                              352-bit
      L2 Cache Size:                                 2883584 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
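When several toolkits are installed, it helps to confirm which nvcc the shell actually picks up. The following sketch pulls the release and build strings out of `nvcc -V` text; `nvcc_release` is an illustrative name, and the pattern assumes the `release 9.0, V9.0.176` wording shown in Test 1.

```python
import re


def nvcc_release(nvcc_output: str):
    """Return (release, build) from `nvcc -V` text, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+),\s*(V[\d.]+)", nvcc_output)
    return match.groups() if match else None


# The `nvcc -V` output recorded in this post.
NVCC_SAMPLE = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
"""

if __name__ == "__main__":
    print(nvcc_release(NVCC_SAMPLE))
```

If the release printed is not the one expected, the PATH entry added to ~/.bashrc is the first thing to check.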

3. Install cuDNN

Official site: https://developer.nvidia.com/rdp/cudnn-download
Choose the build that matches your CUDA version. I chose cuDNN v7.3.1 for CUDA 9.0.

Install:

    tar -zxvf cudnn-9.0-linux-x64-v7.3.1.20.tgz

Then copy the contents of the extracted cuda folder into the corresponding subfolders of /usr/local/cuda-9.0.
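One way to confirm the copy worked is to read the version macros from the installed header. This is a sketch under assumptions: cuDNN 7.x defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` in cudnn.h, and the path /usr/local/cuda-9.0/include/cudnn.h assumes the header was copied into the toolkit's include folder as described above. The function name is illustrative.

```python
import re


def cudnn_version(header_text: str):
    """Read the CUDNN_MAJOR/MINOR/PATCHLEVEL defines; return 'X.Y.Z' or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    # Assumed location after copying the extracted files.
    path = "/usr/local/cuda-9.0/include/cudnn.h"
    try:
        with open(path) as header:
            print(cudnn_version(header.read()))
    except OSError as error:
        print("cudnn.h not readable:", error)
```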

4. Install Anaconda and tensorflow-gpu

Official site: https://www.anaconda.com/download/#linux
Download and install it; I chose the python3.7 version. After installation, add it to the environment variables:

    # anaconda3
    export PATH=/home/ubuntu/anaconda3/bin:$PATH

Create a virtual environment so the install does not pollute the environment other users rely on:

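    conda create -n xxx python=3.6
    source activate xxx
    conda install tensorflow-gpu

Activate the new environment before installing, otherwise tensorflow-gpu goes into the base environment. On newer conda releases the activation command is conda activate xxx.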

Test

    import tensorflow as tf
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

This prints the following:

    2018-10-25 16:25:35.683507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
    pciBusID: 0000:01:00.0
    totalMemory: 10.91GiB freeMemory: 10.72GiB
    2018-10-25 16:25:35.783459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2018-10-25 16:25:35.783843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
    pciBusID: 0000:02:00.0
    totalMemory: 10.92GiB freeMemory: 10.76GiB
    2018-10-25 16:25:35.784321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1
    2018-10-25 16:25:36.069610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-10-25 16:25:36.069634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1
    2018-10-25 16:25:36.069637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y
    2018-10-25 16:25:36.069639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N
    2018-10-25 16:25:36.069852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10367 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
    2018-10-25 16:25:36.101498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10409 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
    Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
    /job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
    2018-10-25 16:25:36.134430: I tensorflow/core/common_runtime/direct_session.cc:288] Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
    /job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
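The session-based test above uses the TensorFlow 1.x API. For a check that yields a plain list instead of log lines, a sketch like the one below may help. The helper name `gpu_device_names` is illustrative. It uses `tf.config.list_physical_devices` when the installed release provides it and otherwise falls back to `device_lib.list_local_devices()`, and it returns None when TensorFlow cannot be imported at all.

```python
def gpu_device_names():
    """Return the names of GPUs TensorFlow can see, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Newer releases expose the device list directly.
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return [device.name for device in config.list_physical_devices("GPU")]

    # Fallback for older releases such as the 1.x version used in this post.
    from tensorflow.python.client import device_lib
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


if __name__ == "__main__":
    print(gpu_device_names())
```

An empty list means TensorFlow imported but found no usable GPU, which usually points back at the driver, CUDA, or cuDNN steps.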

5. Switching between multiple CUDA versions

Installing cuda-9.0 creates a symlink /usr/local/cuda that points to /usr/local/cuda-9.0/, as the listing of /usr/local shows (captured on a Chinese-locale system):

    (wangyongzhi_ml) ubuntu@ubuntu-System-Product-Name:/usr/local$ ll
    总用量 48
    drwxr-xr-x 12 root root 4096 10月 25 14:51 ./
    drwxr-xr-x 13 root root 4096 10月 25 09:39 ../
    drwxr-xr-x  2 root root 4096 4月  21  2016 bin/
    lrwxrwxrwx  1 root root   19 10月 25 00:41 cuda -> /usr/local/cuda-9.0/
    drwxr-xr-x 19 root root 4096 10月 25 14:52 cuda-10.0/
    drwxr-xr-x 18 root root 4096 10月 25 00:41 cuda-9.0/
    drwxr-xr-x  2 root root 4096 4月  21  2016 etc/
    drwxr-xr-x  2 root root 4096 4月  21  2016 games/
    drwxr-xr-x  2 root root 4096 4月  21  2016 include/
    drwxr-xr-x  4 root root 4096 4月  21  2016 lib/
    lrwxrwxrwx  1 root root    9 10月 24 14:52 man -> share/man/
    drwxr-xr-x  2 root root 4096 4月  21  2016 sbin/
    drwxr-xr-x  8 root root 4096 4月  21  2016 share/
    drwxr-xr-x  2 root root 4096 4月  21  2016 src/

So to switch, install the other CUDA version normally, then recreate the symlink so it points at the version you want:

    sudo rm /usr/local/cuda
    sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda

The switch only affects shells whose PATH and LD_LIBRARY_PATH go through /usr/local/cuda rather than a versioned directory such as /usr/local/cuda-9.0.
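The same switch can be scripted. The sketch below is one possible approach, not part of the original procedure: it creates a temporary symlink next to the real one and renames it into place, so the link is never missing in between. The function name `switch_cuda` and the `.tmp` suffix are illustrative, and writing under /usr/local requires root.

```python
import os


def switch_cuda(target: str, link: str = "/usr/local/cuda") -> None:
    """Point `link` at `target` by renaming a temporary symlink over it."""
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    tmp = link + ".tmp"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # rename(2) replaces the old symlink itself; it does not follow it.
    os.replace(tmp, link)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        switch_cuda(sys.argv[1])
        print(os.readlink("/usr/local/cuda"))
```

Saved under a name of your choice, it would be run with root privileges and the target directory as its only argument, for example /usr/local/cuda-10.0.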

Reference

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

https://blog.csdn.net/u012235003/article/details/54575758

When reposting, please credit the original URL: https://www.6miu.com/read-5035008.html
