The Intel FPGA SDK for OpenCL Programming Guide (page 43) describes a producer, consumer and manager strategy.
It also gives reference code for the producer:
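__kernel void __attribute__((task)) producer(
    __global const int* restrict src,
    __global volatile int* restrict shared_mem,
    const int iterations)
{
    int base_offset;
    for (int gid = 0; gid < iterations; gid++) {
        // Position inside the current 256-word block
        int lid = 0xff & gid;
        if (lid == 0) {
            // Start of a block: get its base offset from the req channel
            base_offset = read_channel_intel(req);
        }
        shared_mem[base_offset + lid] = src[gid];
        // Commit the memory write before any token is sent on the channel
        mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_CHANNEL_MEM_FENCE);
        if (lid == 255) {
            // End of a block: pass its base offset on through channel c
            write_channel_intel(c, base_offset);
        }
    }
}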
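To check my reading of the indexing, I traced it with a small host-side model in plain C. This is only my own sketch, not code from the guide: `block_trace` and `trace_blocks` are names I made up, and no channels or FPGA calls are involved. It just counts how often the loop above would read a token from `req` and write a token to `c`.

```c
#include <assert.h>

/* Hypothetical host-side model of the producer's index arithmetic. */
typedef struct {
    int requests;  /* times the loop would call read_channel_intel(req) */
    int releases;  /* times the loop would call write_channel_intel(c, ...) */
} block_trace;

block_trace trace_blocks(int iterations)
{
    assert(iterations >= 0);
    block_trace t = {0, 0};
    for (int gid = 0; gid < iterations; gid++) {
        int lid = 0xff & gid;   /* position inside the current 256-word block */
        if (lid == 0)
            t.requests++;       /* a new block starts: a base offset is requested */
        if (lid == 255)
            t.releases++;       /* the block is full: its offset is passed on */
    }
    return t;
}
```

If I trace this correctly, `trace_blocks(1024)` gives 4 requests and 4 releases, while `trace_blocks(300)` gives 2 requests but only 1 release, so the last partial block would never be passed on unless `iterations` is a multiple of 256.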
I searched around, but I couldn't find any detailed example code for this strategy.
As I understand it, shared_mem is used as the "channel".
It seems to require the producer and the consumer to run loops with the same loop count, "iterations".
However, a regular OpenCL channel should be able to do the same thing, so I don't see any reason
to use this design.
Now suppose the producer does some filtering: for instance, it discards data that is larger than 10 and sends the
rest to the consumer. The loop count in the consumer is then no longer fixed.
In this case the basic on-chip channel doesn't work, and the producer, consumer and manager scheme can't be applied either.
Since the producer sends no feedback signal to the manager, the manager can't tell whether the consumer has finished all the
processing.
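To make the filter case concrete, here is a plain C model of what the filtering producer does. It is only an illustration with a made-up name (`filter_keep_small`), it runs on the host, and it uses an ordinary array where the real design would use a channel. The point is just that the number of values passed on depends on the input data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the filtering producer.
   Values larger than 10 are discarded; the rest are copied to dst.
   Returns how many values were passed on. */
size_t filter_keep_small(const int *src, size_t n, int *dst)
{
    assert(src != NULL && dst != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] <= 10)
            dst[kept++] = src[i];  /* in the real kernel: a channel write */
    }
    return kept;
}
```

The consumer would have to loop exactly as many times as the value returned here, but that value is only known once the producer has seen all of its input.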
In short, I have two questions.
1) Is there a way to do on-chip communication between two kernels in OpenCL when the amount of data communicated
is not known at compile time? Take the simple filter as an example: Kernel 0 filters the input data stream and extracts the data
that is larger than 10. Kernel 1 receives the output data of Kernel 0 and does some processing.
2) Will the producer, consumer and manager scheme work for the filter example? If not, when should it be used instead of
the basic on-chip channel based design?
Any suggestions will be appreciated.
Regards,
Cheng Liu