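Introduction
A few days ago I was tinkering with Ubuntu, mainly because I wanted to make use of the NVIDIA card in my old computer: run code on the GPU and enjoy the thrill of many-core computing.
Luckily, even this old machine of mine supports CUDA: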
```
$ sudo lshw -C display
  *-display
       description: 3D controller
       product: GK208M [GeForce GT 740M]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:35 memory:f0000000-f0ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:6000(size=128)
```
Installing the tools
First, install the CUDA development tools:
```
$ sudo apt install nvidia-cuda-toolkit
```
Check the version information:
```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```
Install the required Python packages with Conda:
```
conda install numba cudatoolkit
```
numba can also be installed with pip (pip install numba); it works the same way.
Testing and installing the driver
A quick test, however, ended in an error:
```
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py
Traceback (most recent call last):
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 246, in ensure_initialized
    self.cuInit(0)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 319, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 387, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py", line 15, in <module>
    gpu_print[1, 2]()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 862, in __getitem__
    return self.configure(*args)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 857, in configure
    return _KernelConfiguration(self, griddim, blockdim, stream, sharedmem)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 718, in __init__
    ctx = get_context()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 153, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 487, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 284, in __getattr__
    self.ensure_initialized()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 250, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
```
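Before launching any kernel, it can help to ask Numba whether it sees a CUDA device at all, which gives a shorter diagnosis than the traceback above. A minimal sketch, assuming Numba is the GPU library in use; cuda_status is a hypothetical helper name, while numba.cuda.is_available() is part of Numba's CUDA API and returns False instead of raising when the driver cannot be initialized:

```python
def cuda_status():
    """Return (available, message) describing whether Numba can use CUDA.

    cuda_status is a hypothetical helper written for illustration.
    """
    try:
        # Imported lazily so that a missing numba is reported, not raised.
        from numba import cuda
    except ImportError:
        return False, "numba is not installed"

    # is_available() catches driver-initialization errors such as
    # CUDA_ERROR_NO_DEVICE and simply returns False.
    if cuda.is_available():
        return True, "numba found a usable CUDA device"
    return False, "numba is installed, but no usable CUDA driver/device was found"


if __name__ == "__main__":
    ok, message = cuda_status()
    print(message)
```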
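A web search showed that this was a driver problem: as the lshw output above shows (configuration: driver=nouveau), the card was still using the open-source nouveau driver, which CUDA cannot work with. I first installed the graphics driver with Ubuntu's built-in tool, but it still failed: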
```
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```
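One way to see which driver the card is actually using is to look at the loaded kernel modules: on Linux, /proc/modules lists one module per line with the module name first. A small sketch, assuming a Linux system; loaded_gpu_drivers is a hypothetical helper that only looks for the module names nouveau and nvidia:

```python
from pathlib import Path

# Open-source vs. proprietary NVIDIA kernel module names.
GPU_MODULES = ("nouveau", "nvidia")


def loaded_gpu_drivers(modules_text):
    """Return the GPU driver modules found in /proc/modules-style text.

    loaded_gpu_drivers is a hypothetical helper; each non-empty line of
    /proc/modules starts with the module name followed by whitespace.
    """
    names = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return sorted(name for name in GPU_MODULES if name in names)


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if modules.exists():
        print(loaded_gpu_drivers(modules.read_text()))
    else:
        print("/proc/modules not found (not a Linux system?)")
```

If the result contains nouveau, CUDA cannot use the card; with the proprietary driver loaded you would expect nvidia instead.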
In the end, installing the driver from the command line solved the problem:
```
$ sudo apt install nvidia-driver-470
```
Checking again, everything now works:
```
$ nvidia-smi
Wed Dec  7 22:13:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   51C    P8    N/A /  N/A |      4MiB /  2004MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
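The same information can also be queried in a machine-readable form through nvidia-smi's --query-gpu and --format=csv,noheader options. A minimal sketch; parse_gpu_csv and query_gpus are hypothetical helpers, and the field list (name, driver_version, memory.total) is just one possible choice:

```python
import shutil
import subprocess

QUERY = "name,driver_version,memory.total"  # assumed field selection


def parse_gpu_csv(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps a GPU name intact even if it happens to contain a comma
        name, driver, memory = (field.strip() for field in line.rsplit(",", 2))
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list ([] if unavailable or failing)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:  # e.g. it cannot communicate with the NVIDIA driver
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```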
The test code now runs as well.
Testing Python code
Printing IDs
Prepare the following code:
```python
from numba import cuda


def cpu_print():
    # ordinary Python, executed on the CPU
    print('cpu print')


@cuda.jit
def gpu_print():
    # global thread index: position inside the block plus the offset of the block
    dataIndex = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    print('gpu print ', cuda.threadIdx.x, cuda.blockIdx.x, cuda.blockDim.x, dataIndex)


if __name__ == '__main__':
    gpu_print[4, 4]()   # launch 4 blocks of 4 threads each (16 threads in total)
    cuda.synchronize()  # wait until the GPU has finished before continuing
    cpu_print()
```
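The code has two functions that both print something: cpu_print runs on the CPU and gpu_print runs on the GPU. The key is the @cuda.jit decorator, which compiles gpu_print into a CUDA kernel that executes on the GPU; the call gpu_print[4, 4]() launches it with 4 blocks of 4 threads each. The output is: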
```
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/print_test.py
gpu print 0 3 4 12
gpu print 1 3 4 13
gpu print 2 3 4 14
gpu print 3 3 4 15
gpu print 0 2 4 8
gpu print 1 2 4 9
gpu print 2 2 4 10
gpu print 3 2 4 11
gpu print 0 1 4 4
gpu print 1 1 4 5
gpu print 2 1 4 6
gpu print 3 1 4 7
gpu print 0 0 4 0
gpu print 1 0 4 1
gpu print 2 0 4 2
gpu print 3 0 4 3
cpu print
```
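As you can see, the GPU printed 16 times, each time from a different thread. The order of the lines can change from run to run, because there is no guarantee which blocks and threads execute first. Kernel launches are also asynchronous with respect to the CPU, so the program calls the synchronization function cuda.synchronize() to make sure the GPU has finished before it carries on; that is why "cpu print" comes last.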
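The index arithmetic in gpu_print can be reproduced on the CPU to see why the 16 lines cover the indices 0 to 15 exactly once. This is a plain-Python model, not GPU code; global_indices is a hypothetical helper that enumerates every (blockIdx.x, threadIdx.x) pair of a one-dimensional launch kernel[grid_dim, block_dim]:

```python
def global_indices(grid_dim, block_dim):
    """Map each (blockIdx.x, threadIdx.x) pair to
    dataIndex = threadIdx.x + blockIdx.x * blockDim.x for a 1-D launch."""
    return {
        (block_idx, thread_idx): thread_idx + block_idx * block_dim
        for block_idx in range(grid_dim)
        for thread_idx in range(block_dim)
    }


if __name__ == "__main__":
    indices = global_indices(4, 4)   # same launch shape as gpu_print[4, 4]()
    print(sorted(indices.values()))  # every index from 0 to 15 appears exactly once
```

Inside a Numba kernel, cuda.grid(1) returns this same value, so dataIndex could equally be written as cuda.grid(1).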
Measuring the time
Let's use the following code to see the power of GPU parallelism:
```python
from numba import jit
import numpy as np
# to measure execution time
from timeit import default_timer as timer


# normal function to run on the CPU
def func(a):
    for i in range(10000000):
        a[i] += 1


# the same loop, JIT-compiled by Numba with target_backend='cuda'
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1


if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)  # the first call also includes Numba's JIT compilation time
    print("with GPU:", timer() - start)
```
The results:
```
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/time_test.py
without GPU: 3.7136273959999926
with GPU: 0.4040513340000871
```
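The plain CPU version took about 3.7 seconds, while func2 took only about 0.4 seconds, a considerable speedup. Bear in mind, though, that func2 is still a single sequential loop, so much of the gain likely comes from Numba compiling it to machine code rather than from GPU parallelism, and the 0.4 seconds also includes the one-time JIT compilation on the first call. Of course, this does not mean the GPU is always faster than the CPU; it depends on the type of task.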
Original article: https://www.cnblogs.com/larrydpk/p/17093627.html