I am implementing sparse matrix multiplication on a Nalla 510t and am having trouble transferring my matrix to the device.
The problem is that I have split the matrix into 512 stripes for parallel computation.
First I tried to create 512 cl_mem instances and pass the 512 pointers to the kernel function, but I found that there is a limit on the kernel argument size, and only about 100 pointers are supported. So it doesn't compile.
Then I tried to create a single cl_mem instance for the image and another cl_mem instance holding the 512 offset addresses as indices, but at compile time it is reported that the cl_mem is too large (I set the size to 1024*1024*512 elements of uint type, which should be 2GB). The design only compiles when the buffer size is set to less than 2GB.
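To make the sizes concrete, here is a minimal host-side sketch in C of the arithmetic behind the second attempt. The helper names `buffer_bytes` and `stripe_offsets` are made up for illustration, and equal-sized stripes are a simplifying assumption. It only shows that 1024*1024*512 uint elements come to 2^31 bytes, which is one byte more than a signed 32-bit integer can hold; whether that is related to the limit I am hitting is a guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a buffer holding n_elems 32-bit uints.
 * Computed in 64 bits so that sizes of 2 GiB and above do not wrap. */
static uint64_t buffer_bytes(uint64_t n_elems) {
    return n_elems * (uint64_t)sizeof(uint32_t);
}

/* Hypothetical helper: fill offsets[0..n_stripes] with the starting element
 * index of each stripe inside one flat buffer. Stripes are assumed to be
 * (nearly) equal in size; offsets[n_stripes] is the total element count.
 * The caller must provide room for n_stripes + 1 entries. */
static void stripe_offsets(uint64_t n_elems, uint32_t n_stripes,
                           uint64_t *offsets) {
    uint64_t base = n_elems / n_stripes;  /* elements every stripe gets */
    uint64_t rem  = n_elems % n_stripes;  /* first `rem` stripes get one extra */
    uint64_t pos  = 0;
    for (uint32_t i = 0; i < n_stripes; ++i) {
        offsets[i] = pos;
        pos += base + (i < rem ? 1u : 0u);
    }
    offsets[n_stripes] = pos;
}
```

With `n_elems = 1024ULL * 1024 * 512` and `n_stripes = 512`, each stripe starts 1048576 elements after the previous one, and the offsets are kept as 64-bit values so they stay valid if the matrix grows beyond 2GB.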
I want to run larger matrices, and since the Nalla 510t has 16GB of on-board DDR memory, this should be physically possible. How can I achieve it using OpenCL?
Thanks in advance for any help!