RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

edward_zcl · Published 2022-04-29 11:52:46

I have to say, having hardware that is too powerful and too new brings its own kind of trouble, haha.

(base) [s503-1@s518-7 code_Spiking_CNN_Rathi_hybrid]$ conda activate pytorch17
(pytorch17) [s503-1@s518-7 code_Spiking_CNN_Rathi_hybrid]$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.5.11
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /data1/s503-1/anaconda3/envs/pytorch17

  added / updated specs:
    - cudatoolkit=11.0
    - pytorch==1.7.1
    - torchaudio==0.7.2
    - torchvision==0.8.2


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    torchvision-0.8.2          |       py38_cu110        17.9 MB  http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
    pytorch-1.7.1              |py3.8_cuda11.0.221_cudnn8.0.5_0       770.6 MB  http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
    cudatoolkit-11.0.221       |       h6bb024c_0       952.7 MB  defaults
    ------------------------------------------------------------
                                           Total:        1.70 GB

The following packages will be UPDATED:

    cudatoolkit: 10.2.89-hfd86e86_1                   defaults                                                   --> 11.0.221-h6bb024c_0                   defaults
    pytorch:     1.7.1-py3.8_cuda10.2.89_cudnn7.6.5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch --> 1.7.1-py3.8_cuda11.0.221_cudnn8.0.5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
    torchvision: 0.8.2-py38_cu102                     http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch --> 0.8.2-py38_cu110                      http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch

Proceed ([y]/n)? y


Downloading and Extracting Packages
torchvision-0.8.2    | 17.9 MB   | ############################################################################# | 100%
pytorch-1.7.1        | 770.6 MB  | ############################################################################# | 100%
cudatoolkit-11.0.221 | 952.7 MB  | ############################################################################# | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: - By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html
                                                                                                                      done

         kernel_size          : 3
         test_acc_every_batch : False
         train_acc_batches    : 200
         devices              : 0
 Loaded module.features.0.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.features.3.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.features.6.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.classifier.0.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.classifier.3.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.classifier.6.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 DataParallel(
  (module): VGG_SNN_STDB(
    (input_layer): PoissonGenerator()
    (features): Sequential(
      (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): ReLU(inplace=True)
      (2): AvgPool2d(kernel_size=2, stride=2, padding=0)
      (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): ReLU(inplace=True)
      (5): Dropout(p=0.3, inplace=False)
      (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (7): ReLU(inplace=True)
      (8): AvgPool2d(kernel_size=2, stride=2, padding=0)
    )
    (classifier): Sequential(
      (0): Linear(in_features=6272, out_features=4096, bias=False)
      (1): ReLU(inplace=True)
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=4096, out_features=4096, bias=False)
      (4): ReLU(inplace=True)
      (5): Dropout(p=0.5, inplace=False)
      (6): Linear(in_features=4096, out_features=10, bias=False)
    )
  )
)
 Adam (
Parameter Group 0
    amsgrad: True
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0001
    weight_decay: 0.0005
)
snn.py:182: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  mask = torch.tensor(mask,dtype=torch.float)

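When installing torch, be sure to check which CUDA version your GPU needs.

For example, a program that runs fine on an RTX 2080 fails in the same environment on an A100 with the following error: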

NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75. If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Roughly, this means: the NVIDIA A100-PCIE-40GB has CUDA compute capability 8.0, which the installed PyTorch build does not support. That build supports compute capabilities 3.7, 5.0, 6.0, 7.0, and 7.5.

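The supported compute capabilities depend on the CUDA version the PyTorch build was compiled against. As the error message shows, the CUDA 10.2 build supports only 3.7, 5.0, 6.0, 7.0, and 7.5, not 8.0. CUDA 11 builds do support 8.0.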
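The comparison described above can be sketched in a few lines of Python. This is only an illustration under a simplified rule (a binary built for sm_XY is assumed to run on devices with the same major version and an equal or higher minor version); PTX entries such as `compute_80` are ignored. The helper names `parse_sm`, `is_supported`, and `check_local_gpu` are made up for this sketch. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the PyTorch calls I would expect to use on a real machine, assuming a reasonably recent PyTorch release.

```python
import re

def parse_sm(arch):
    """Turn an arch string such as 'sm_75' into a (major, minor) tuple, or None."""
    match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
    if match is None:
        return None  # e.g. 'compute_80' PTX entries are ignored in this sketch
    return divmod(int(match.group(1)), 10)

def is_supported(capability, arch_list):
    """Simplified rule: a binary built for sm_XY runs on devices X.Z with Z >= Y."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is not None and parsed[0] == major and parsed[1] <= minor:
            return True
    return False

# Arch list copied from the error message above (the CUDA 10.2 build).
cu102_archs = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]
print(is_supported((8, 0), cu102_archs))  # A100 (sm_80): False
print(is_supported((7, 5), cu102_archs))  # RTX 2080 (sm_75): True

def check_local_gpu():
    """Run the same check against the local install; needs torch and a visible GPU."""
    import torch  # imported lazily so the pure-Python part runs without torch
    capability = torch.cuda.get_device_capability(0)
    return is_supported(capability, torch.cuda.get_arch_list())
```

On the A100 machine, `check_local_gpu()` would be expected to return False before the reinstall and True afterwards, but I have not verified this against every PyTorch version.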

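The torch version installed at this point is 1.7.0 (the conda log above shows 1.7.1 built against CUDA 10.2), so what is needed is a build for CUDA 11 or later that does not conflict with torch 1.7.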

Go to the PyTorch website: Previous PyTorch Versions | PyTorch

Pick a suitable CUDA version. The Previous PyTorch Versions page lists the available combinations.

In the end I chose v1.7.1 with CUDA 11.0:


   
     
      
     
     
      
# CUDA 11.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
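The `+cu110` suffix in the command above is what selects the CUDA 11.0 wheel. As a small illustration, the tag can be decoded like this. The function name `cuda_version_from_tag` is hypothetical, and the rule "last digit is the minor version" is an assumption based on the tags seen in this post (cu102, cu110), not something taken from a specification.

```python
import re

def cuda_version_from_tag(version):
    """Return 'major.minor' from a '+cuXYZ' local version label, or None."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None  # no CUDA tag, e.g. a conda build or a CPU-only wheel
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"

print(cuda_version_from_tag("1.7.1+cu110"))  # 11.0
print(cuda_version_from_tag("0.8.2+cu110"))  # 11.0
print(cuda_version_from_tag("0.7.2"))        # None
```

With a pip wheel installed, passing `torch.__version__` to this helper should give the same value that `torch.version.cuda` reports.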

Problem solved.
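To confirm the fix without running the whole training script, a quick smoke test along these lines may help. It is a sketch, not part of the original code: the function name `smoke_test_conv` is made up, and the layer simply mirrors the first Conv2d of the model printed above (1 input channel, 64 output channels, 3x3 kernel) on a batch of 28x28 inputs.

```python
def smoke_test_conv():
    """Run one small convolution on the GPU and return a short status string."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    # Report which CUDA / cuDNN versions this PyTorch build was compiled with.
    print("torch", torch.__version__,
          "| CUDA", torch.version.cuda,
          "| cuDNN", torch.backends.cudnn.version())
    conv = torch.nn.Conv2d(1, 64, kernel_size=3, padding=1, bias=False).cuda()
    x = torch.randn(8, 1, 28, 28, device="cuda")
    try:
        y = conv(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as err:
        return f"convolution failed: {err}"
    return f"ok, output shape {tuple(y.shape)}"

print(smoke_test_conv())
```

If the install matches the GPU, the last line should report an output shape of (8, 64, 28, 28). With a mismatched build I would expect one of the RuntimeError messages discussed in this post instead.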

Reference: https://zhuanlan.zhihu.com/p/427395039

This problem often comes with the following messages:
NVIDIA A100 GPU - RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Problem:
A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

I recommend reading this thread:
NVIDIA A100 GPU - RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

https://discuss.pytorch.org/t/nvidia-a100-gpu-runtimeerror-cudnn-error-cudnn-status-mapping-error/121648

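Also, if installing pytorch or tensorflow with conda fails, the environment solve fails, or the download is too slow, the cause may be your network connection, or the channel you are using may simply not carry that version. In that case, switch to a different channel or pick a different version.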