Hello,
I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.931 ms to perform a floating-point add on 1M elements, which works out to about 144 MFLOPS (1,000,000 adds / 6.931 ms). I expected much higher performance, on the order of 100 GFLOPS. Is there a way to achieve better performance?
------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)
Time: 108.505 ms
Kernel time (device 0): 6.931 ms
Verification: PASS
--------------------------------------------------
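For reference, here is a small sketch of how the throughput figure follows from the log above. The memory traffic estimate is my own assumption (4-byte floats, two reads and one write per element, no caching); it is not something the example reports.

```python
# Figures taken from the log above.
elements = 1_000_000          # "Launching for device 0 (1000000 elements)"
kernel_time_s = 6.931e-3      # "Kernel time (device 0): 6.931 ms"

# vector_add performs one floating-point add per element.
flops = elements / kernel_time_s

# Assumption: each element moves 3 floats of 4 bytes (read a, read b, write c).
bytes_moved = elements * 3 * 4
bandwidth = bytes_moved / kernel_time_s

print(f"Throughput: {flops / 1e6:.1f} MFLOPS")
print(f"Estimated memory traffic: {bandwidth / 1e9:.2f} GB/s")
```

Under these assumptions the kernel moves roughly 12 bytes per add, so the rate may be bounded by memory transfers rather than by arithmetic.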
Thanks
Pavan