RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
I have to say, having hardware this powerful and this new brings its own headaches, haha.
(base) [s503-1@s518-7 code_Spiking_CNN_Rathi_hybrid]$ conda activate pytorch17
(pytorch17) [s503-1@s518-7 code_Spiking_CNN_Rathi_hybrid]$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.5.11
latest version: 4.12.0
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /data1/s503-1/anaconda3/envs/pytorch17
added / updated specs:
- cudatoolkit=11.0
- pytorch==1.7.1
- torchaudio==0.7.2
- torchvision==0.8.2
The following packages will be downloaded:
package | build
---------------------------|-----------------
torchvision-0.8.2 | py38_cu110 17.9 MB http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
pytorch-1.7.1 |py3.8_cuda11.0.221_cudnn8.0.5_0 770.6 MB http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
cudatoolkit-11.0.221 | h6bb024c_0 952.7 MB defaults
------------------------------------------------------------
Total: 1.70 GB
The following packages will be UPDATED:
cudatoolkit: 10.2.89-hfd86e86_1 defaults --> 11.0.221-h6bb024c_0 defaults
pytorch: 1.7.1-py3.8_cuda10.2.89_cudnn7.6.5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch --> 1.7.1-py3.8_cuda11.0.221_cudnn8.0.5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
torchvision: 0.8.2-py38_cu102 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch --> 0.8.2-py38_cu110 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
Proceed ([y]/n)? y
Downloading and Extracting Packages
torchvision-0.8.2 | 17.9 MB | ############################################################################# | 100%
pytorch-1.7.1 | 770.6 MB | ############################################################################# | 100%
cudatoolkit-11.0.221 | 952.7 MB | ############################################################################# | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: - By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html
done
kernel_size : 3
test_acc_every_batch : False
train_acc_batches : 200
devices : 0
Loaded module.features.0.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
Loaded module.features.3.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
Loaded module.features.6.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
Loaded module.classifier.0.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
Loaded module.classifier.3.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
Loaded module.classifier.6.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
DataParallel(
(module): VGG_SNN_STDB(
(input_layer): PoissonGenerator()
(features): Sequential(
(0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): ReLU(inplace=True)
(2): AvgPool2d(kernel_size=2, stride=2, padding=0)
(3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): ReLU(inplace=True)
(5): Dropout(p=0.3, inplace=False)
(6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(7): ReLU(inplace=True)
(8): AvgPool2d(kernel_size=2, stride=2, padding=0)
)
(classifier): Sequential(
(0): Linear(in_features=6272, out_features=4096, bias=False)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=False)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=10, bias=False)
)
)
)
Adam (
Parameter Group 0
amsgrad: True
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.0001
weight_decay: 0.0005
)
snn.py:182: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
mask = torch.tensor(mask,dtype=torch.float)
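When installing torch, be sure to check which CUDA version your GPU requires.
For example, the program runs normally on an RTX 2080, but in the same environment on an A100 it fails with the following error: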

NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75. If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
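Roughly, this means: the NVIDIA A100-PCIE-40GB has CUDA compute capability 8.0, which does not match the installed PyTorch build; that build supports compute capabilities 3.7, 5.0, 6.0, 7.0, and 7.5.
The supported compute capabilities depend on the CUDA version the PyTorch build targets. A CUDA 10.2 build supports only 3.7, 5.0, 6.0, 7.0, and 7.5, and does not support 8.0, whereas CUDA 11 does support 8.0.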
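As a rough way to spot this mismatch before a convolution fails, the sketch below compares a GPU's compute capability with the architectures a PyTorch binary was built for. The helper name `is_capability_supported` is hypothetical, and the major-version comparison is only a simplified imitation of the check behind the warning above. The optional part assumes `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` exist in the installed version.

```python
import re


def is_capability_supported(capability, arch_list):
    """Return True if some compiled 'sm_XY' arch shares the GPU's major version.

    Hypothetical, simplified helper: `capability` is a (major, minor) tuple,
    `arch_list` looks like ['sm_37', 'sm_50', ...].
    """
    major, _minor = capability
    compiled = []
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if match:  # entries such as 'compute_80' are ignored
            compiled.append(int(match.group(1)))
    return any(sm // 10 == major for sm in compiled)


def check_installed_torch():
    """Run the check against the local GPU, if torch and CUDA are present."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return None
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return None
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    ok = is_capability_supported(cap, archs)
    print(f"GPU capability {cap}, binary built for {archs}, supported: {ok}")
    return ok


if __name__ == "__main__":
    # Arch list copied from the error message (CUDA 10.2 build).
    old_build = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]
    print(is_capability_supported((8, 0), old_build))  # A100
    print(is_capability_supported((7, 5), old_build))  # RTX 2080
    check_installed_torch()
```

With the arch list from the error message, the A100 (8.0) is reported as unsupported and the RTX 2080 (7.5) as supported, which matches the behaviour described here.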
The torch version currently installed is 1.7.0, so I need a build for CUDA 11 or later that does not conflict with torch 1.7.0.
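The "CUDA 11 or later" requirement can be written down as a small lookup. This is only an illustrative sketch: the table lists a few common GPUs rather than every architecture, and the names `MIN_CUDA_FOR_CAPABILITY` and `cuda_build_is_new_enough` are made up for this example. A new enough CUDA version is necessary but not sufficient, since a given PyTorch binary must also have been compiled for that architecture.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Illustrative and incomplete; only a few common entries are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3090
}


def cuda_build_is_new_enough(capability, cuda_version):
    """Check a CUDA version string such as '10.2' against the table above."""
    required = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if required is None:
        raise KeyError(f"capability {capability} is not in the table")
    found = tuple(int(part) for part in cuda_version.split(".")[:2])
    return found >= required


if __name__ == "__main__":
    print(cuda_build_is_new_enough((8, 0), "10.2"))  # A100 with a CUDA 10.2 build
    print(cuda_build_is_new_enough((8, 0), "11.0"))  # A100 with a CUDA 11.0 build
```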
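Go to the PyTorch website: Previous PyTorch Versions | PyTorch (https://pytorch.org/get-started/previous-versions/)

Choose the build that matches your CUDA version; the Previous PyTorch Versions page lists the available combinations.

I ended up choosing v1.7.1 with CUDA 11.0: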
# CUDA 11.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
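To confirm that the reinstalled build actually works on the GPU, a quick smoke test along these lines may help. It is a sketch rather than part of the original fix. The function name `describe_torch_install` is hypothetical, and the layer shape simply mirrors the first `Conv2d(1, 64, ...)` layer from the model printout above, applied to MNIST-sized input.

```python
def describe_torch_install():
    """Collect version info and run one small convolution on the GPU.

    Returns a dict, or None when torch is missing or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    info = {
        "torch": torch.__version__,               # e.g. a '+cu110' suffix for pip wheels
        "cuda": torch.version.cuda,               # CUDA version the binary was built with
        "cudnn": torch.backends.cudnn.version(),  # bundled cuDNN version
        "gpu": torch.cuda.get_device_name(0),
    }

    # One forward pass through a convolution; this is the kind of call that
    # raised the cuDNN error with the mismatched build.
    conv = torch.nn.Conv2d(1, 64, kernel_size=3, padding=1, bias=False).cuda()
    x = torch.randn(8, 1, 28, 28, device="cuda")
    info["conv_output_shape"] = tuple(conv(x).shape)
    return info


if __name__ == "__main__":
    print(describe_torch_install())
```

If the convolution runs without the cuDNN error and the reported CUDA version is 11.0, the new build is most likely the one being picked up.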
Problem solved.
Reference: https://zhuanlan.zhihu.com/p/427395039
This problem is often accompanied by the following output:
NVIDIA A100 GPU - RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Problem:
A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
I recommend reading this thread:
NVIDIA A100 GPU - RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
https://discuss.pytorch.org/t/nvidia-a100-gpu-runtimeerror-cudnn-error-cudnn-status-mapping-error/121648
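Also, if installing pytorch or tensorflow with conda fails, the environment cannot be solved, or the download is too slow, the cause may be your network speed, or the channel you are using may simply not carry that version. In that case, switch to a different channel or a different version.
More references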
https://blog.csdn.net/hb_learing/article/details/114851335
https://blog.csdn.net/n_fly/article/details/120952287
https://blog.csdn.net/xiaobai11as/article/details/108357857
https://discuss.pytorch.org/t/nvidia-a100-gpu-runtimeerror-cudnn-error-cudnn-status-mapping-error/121648
https://blog.csdn.net/Willen_
https://blog.csdn.net/wxd1233/article/details/120509750
https://blog.csdn.net/weixin_43615569/article/details/108932451