Channel: Altera Forums

vector_add example - measuring the performance

Hello,

I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.9 ms to perform a floating-point add on 1M elements, so the performance is around 145 MFLOPS. I expected much higher performance, on the order of 100 GFLOPS. Is there a way to achieve better performance?

------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)

Time: 108.505 ms
Kernel time (device 0): 6.931 ms

Verification: PASS
--------------------------------------------------
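
For reference, the arithmetic behind the figure quoted above can be reproduced from the log. This is a minimal sketch, not part of the vector_add example itself: the helper name `throughput` is hypothetical, and it assumes one add per element and single-precision (4-byte) floats, so each element moves 12 bytes (two reads, one write). Under those assumptions it also gives the effective memory traffic and the share of total time spent in the kernel.

```python
# Figures taken from the log output above.
ELEMENTS = 1_000_000   # "Launching for device 0 (1000000 elements)"
KERNEL_MS = 6.931      # "Kernel time (device 0): 6.931 ms"
TOTAL_MS = 108.505     # "Time: 108.505 ms"


def throughput(elements, kernel_ms, bytes_per_element=12):
    """Return (FLOPS, bytes per second) for an element-wise add.

    Assumption: one floating-point add per element, and 12 bytes of
    memory traffic per element (two 4-byte reads plus one 4-byte write).
    """
    seconds = kernel_ms / 1e3
    flops = elements / seconds
    bandwidth = elements * bytes_per_element / seconds
    return flops, bandwidth


flops, bandwidth = throughput(ELEMENTS, KERNEL_MS)
kernel_share = KERNEL_MS / TOTAL_MS

print(f"Compute throughput: {flops / 1e6:.1f} MFLOPS")
print(f"Effective memory traffic: {bandwidth / 1e9:.2f} GB/s")
print(f"Kernel share of total time: {kernel_share:.1%}")
```

With the numbers from the log this gives roughly 144 MFLOPS, in line with the estimate in the question. If the 12-bytes-per-element assumption holds, the same run corresponds to about 1.7 GB/s of memory traffic, which may be worth comparing against the board's memory bandwidth when looking for the bottleneck.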

Thanks
Pavan
