Hi,
When you don't want to compute on every element of a 2D array/OpenCL buffer, which is better on the FPGA: launching exactly the number of work items required (assuming one work item per array element) and using offsets, or specifying an NDRange that covers the entire array (or some other multiple) and using a simple if statement within the kernel to control which array elements are actually processed?
For example, suppose I have an array of X by X elements but don't want to process the elements in the outer halo.
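To make the two options concrete, here is a rough host-side sketch in plain C that emulates both launch strategies with ordinary loops. It is not real kernel code: `X`, `HALO`, `process`, and both function names are hypothetical, and the loops only stand in for the NDRange. In a real kernel the loop indices would come from `get_global_id()`, which already includes any global offset passed at enqueue time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define X 8     /* hypothetical array edge length */
#define HALO 1  /* hypothetical halo width */

/* Stand-in for the kernel body: touches one array element. */
void process(float *a, size_t row, size_t col) {
    a[row * X + col] += 1.0f;
}

/* Option A: launch only the interior work items and shift them with an
 * offset, so every launched work item does useful work.
 * Returns the number of work items "launched". */
size_t run_exact_with_offset(float *a) {
    assert(X > 2 * HALO);
    size_t launched = 0;
    for (size_t gy = HALO; gy < X - HALO; ++gy) {
        for (size_t gx = HALO; gx < X - HALO; ++gx) {
            process(a, gy, gx);
            ++launched;
        }
    }
    return launched;
}

/* Option B: launch one work item per array element and let an if
 * statement skip the halo. Returns the number of work items "launched". */
size_t run_full_with_guard(float *a) {
    assert(X > 2 * HALO);
    size_t launched = 0;
    for (size_t gy = 0; gy < X; ++gy) {
        for (size_t gx = 0; gx < X; ++gx) {
            ++launched;
            if (gy >= HALO && gy < X - HALO && gx >= HALO && gx < X - HALO) {
                process(a, gy, gx);
            }
        }
    }
    return launched;
}
```

Both versions update the same interior elements; the difference is only in how many work items are issued and whether each one carries the extra branch, which is the trade-off I am asking about.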
Similarly, is it generally faster to launch just the required number of work items in the NDRange, or to round the NDRange up to a particular value? On other architectures I found rounding up to be beneficial. I note that the value of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE seems to be 0 on Altera; is this significant?
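By "rounding up" I mean something along the lines of the helper below, which pads the global size to the next multiple of some chosen value. This is only a sketch of what I do on other architectures; `round_up` is a hypothetical name, and the multiple is whatever the host code decides to use. Note that a reported multiple of 0 cannot be used directly here, which is partly why I am asking about it.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of `multiple` (which must be non-zero).
 * The extra work items would need to be masked off inside the kernel. */
size_t round_up(size_t n, size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, 36 interior work items padded to a multiple of 16 would become 48, leaving 12 work items that do nothing but fail the bounds check.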
Regarding specifying the work-group size at compile time (assuming the problem size doesn't change): would I be better off specifying a work-group size equal to the number of work items that will eventually be launched (assuming this size fits in the hardware), or some other value?
Many thanks