Vulkan is usable at least on Colab with a Tesla P100.
https://qiita.com/syoyo/items/3956e98e4a607cde6cb2
(What about the V100 or A100?)
Let's try VkInline, which lets you call Vulkan compute kernels inline from Python, much like CuPy.
https://github.com/fynv/VkInline
A Vulkan 1.2 driver is required. (At the time of writing, perhaps only RADV (the open-source Linux Vulkan driver for AMD GPUs) and the Windows Adrenalin driver support 1.2?)
VkInline
The license is the Anti 996 License (companies that ignore labor standards law, for example by making people work from 9:00 to 21:00, 6 days a week, should not use Anti-996-licensed software).
https://github.com/996icu/996.ICU
Setup poses no particular problems: follow the procedure and it goes smoothly.
VK_KHR_buffer_device_address is required. It may not work with amdgpu on Windows or on older AMD GPUs.
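One way to check both requirements (Vulkan 1.2 and VK_KHR_buffer_device_address) before trying VkInline is to look at the report printed by the vulkaninfo tool. The following is only a minimal sketch and is not part of VkInline: the helper names are hypothetical, and the regular expression assumes the report contains an "apiVersion = ..." line with a dotted version somewhere on it, which may differ between vulkaninfo releases.

```python
import re
import shutil
import subprocess

REQUIRED_EXTENSION = "VK_KHR_buffer_device_address"


def parse_api_version(report):
    """Return (major, minor) of the first apiVersion line in the report, or None."""
    # Assumption: the line looks like "apiVersion = 4202627 (1.2.131)"
    # or "apiVersion = 1.2.131 (4202627)"; only the first match is used.
    m = re.search(r"apiVersion\s*=[^\n]*?(\d+)\.(\d+)\.(\d+)", report)
    return (int(m.group(1)), int(m.group(2))) if m else None


def check_report(report):
    """Return (reports Vulkan >= 1.2, lists the required extension)."""
    version = parse_api_version(report)
    version_ok = version is not None and version >= (1, 2)
    return version_ok, REQUIRED_EXTENSION in report


def query_local_driver():
    """Run vulkaninfo if it is installed and return its text output."""
    if shutil.which("vulkaninfo") is None:
        return ""  # tool not installed; nothing to inspect
    result = subprocess.run(["vulkaninfo"], capture_output=True, text=True)
    return result.stdout
```

Calling check_report(query_local_driver()) on the target machine should give (True, True) when the driver looks suitable. Note that this only checks what the driver reports; it does not prove that VkInline itself will run.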
This time I confirmed that it works on an RX 5700 (Navi) with the RADV driver.
See also: notes on using Vulkan (compute kernels) with RADV together with ROCm https://qiita.com/syoyo/items/ce3943757281acbdba49
Let's run test_compute.py.
# from VkInline test_compute.py
import VkInline as vki
import numpy as np
# interface with numpy
harr = np.array([1.0, 2.0, 3.0, 4.0, 5.0], dtype='float32')
darr = vki.device_vector_from_numpy(harr)
print(darr.to_host())
# GLSL data type
print(darr.name_view_type())
harr2 = np.array([6,7,8,9,10], dtype='int32')
darr2 = vki.device_vector_from_numpy(harr2)
# kernel with auto parameters, launched twice with different types
kernel = vki.Computer(['arr_in', 'arr_out', 'k'],
'''
void main()
{
    uint id = gl_GlobalInvocationID.x;
    if (id >= get_size(arr_in)) return;
    set_value(arr_out, id, get_value(arr_in, id)*k);
}
''')
darr_out = vki.SVVector('float', 5)
kernel.launch(1,128, [darr, darr_out, vki.SVFloat(10.0)])
print (darr_out.to_host())
darr_out = vki.SVVector('int', 5)
kernel.launch(1,128, [darr2, darr_out, vki.SVInt32(5)])
print (darr_out.to_host())
# create a vector from python list with GLSL type specified
darr3 = vki.device_vector_from_list([3.0, 5.0, 7.0, 9.0 , 11.0], 'float')
print(darr3.to_host())
[1. 2. 3. 4. 5.]
Comb_bb4c7639fd354507
[10. 20. 30. 40. 50.]
[30 35 40 45 50]
[ 3. 5. 7. 9. 11.]
:tada:
I'm surprised that it works so smoothly, without any problems! It looks promising, but there are still various issues to solve before building an application such as machine learning on top of it.
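As a side note on the kernel in test_compute.py: the guard "if (id >= get_size(arr_in)) return;" matters because kernel.launch(1, 128, ...) appears to start one workgroup of 128 invocations for only 5 elements. Reading the two launch arguments as group count and group size is an assumption based on the example. The pure-Python sketch below emulates that behavior on the CPU; emulate_launch and groups_for are hypothetical helpers for illustration, not VkInline functions.

```python
def groups_for(n, block_size):
    """Number of workgroups needed to cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def emulate_launch(num_groups, block_size, arr_in, k):
    """Run the kernel body once per invocation id, sequentially on the CPU."""
    arr_out = [0] * len(arr_in)
    for gid in range(num_groups * block_size):  # plays the role of gl_GlobalInvocationID.x
        if gid >= len(arr_in):                  # same guard as in the GLSL kernel
            continue                            # surplus invocations write nothing
        arr_out[gid] = arr_in[gid] * k
    return arr_out


print(emulate_launch(1, 128, [1.0, 2.0, 3.0, 4.0, 5.0], 10.0))
print(emulate_launch(1, 128, [6, 7, 8, 9, 10], 5))
print(groups_for(1000, 128))
```

The two lists hold the same values as the GPU results printed above, and the last line shows that an array of 1000 elements would need 8 groups of 128 under the same assumption.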
TODO