Hi everybody,
I am currently playing around with the Cyclone V SoC Development board.
I have a Qsys system running that resembles the Golden System Reference Design (GSRD), except that the h2f_axi_clock is driven by a PLL on the FPGA side.
The h2f_axi_clock is 80 MHz in my case.
I have the on-chip memory up and running as in the GSRD. I used the Linux example to run a Linux application on the HPS. From within the application I can write to and read from the on-chip memory; for this purpose, the memory address range is mmap()ed into Linux user space.
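For reference, a user-space mapping like the one described is commonly set up by opening /dev/mem and calling mmap() on the physical address of the bridge window. The sketch below is a minimal, hypothetical helper (`map_region` is not from the GSRD). It assumes the on-chip memory is reached through the HPS-to-FPGA AXI bridge, whose window on the Cyclone V starts at 0xC0000000; the exact offset of the memory inside that window comes from your Qsys address map. Note that on ARM Linux, opening /dev/mem with O_SYNC typically yields an uncached mapping, so every access crosses the bridge individually instead of as cache-line bursts. That is worth keeping in mind when interpreting throughput numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `span` bytes starting at offset `base` of the
 * device or file `path` (e.g. "/dev/mem") into user space.
 * `base` must be page-aligned. Returns MAP_FAILED on error. */
void *map_region(const char *path, off_t base, size_t span)
{
    /* O_SYNC asks for an uncached mapping when `path` is /dev/mem. */
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

Usage would look like `map_region("/dev/mem", 0xC0000000 + offset, ONCHIP_MEMORY2_0_SPAN)`, where `offset` (hypothetical) is the on-chip memory's base address on the bridge as assigned in Qsys.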
Now the question:
I get about 45 MByte/s throughput when using memcpy to copy a 65 kByte block of data from the HPS to the FPGA (the transfer takes about 1.4 ms). I measured the memcpy time as follows:
clock_gettime(CLOCK_REALTIME, &start);
memcpy((void*)hw_onchip_mem_base, (void*)&buffer[0], ONCHIP_MEMORY2_0_SPAN);
clock_gettime(CLOCK_REALTIME, &end);
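The measurement above can be turned into a number with a couple of small helpers. The sketch below is an assumption-laden variant, not the original code: it uses CLOCK_MONOTONIC instead of CLOCK_REALTIME (the monotonic clock is not affected by wall-clock adjustments such as NTP steps) and repeats the copy `reps` times so that a single short run does not dominate the result. Assuming ONCHIP_MEMORY2_0_SPAN is 65,536 bytes, 65,536 bytes in 1.4 ms works out to roughly 46.8 MByte/s, consistent with the "about 45 MByte/s" figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Elapsed time in seconds between two timestamps. */
double elapsed_s(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Throughput in MByte/s (1 MByte = 10^6 bytes). */
double throughput_mbyte_s(size_t bytes, double seconds)
{
    return (double)bytes / seconds / 1e6;
}

/* Average memcpy throughput over `reps` copies of `len` bytes, timed with
 * CLOCK_MONOTONIC. */
double measure_memcpy(void *dst, const void *src, size_t len, int reps)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return throughput_mbyte_s(len * (size_t)reps, elapsed_s(&t0, &t1));
}
```

On the board, `dst` would be `hw_onchip_mem_base` and `src` would be `&buffer[0]`.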
45 MByte/s seems quite low. The bus to the memory is 64 bits wide and clocked at 80 MHz, so I would expect a theoretical throughput of about 640 MByte/s (8 bytes × 80 MHz).
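The peak-bandwidth arithmetic, and how far the measurement is from it, can be written out as a small sketch (the function names are illustrative only). With a 64-bit bus at 80 MHz the peak is 640,000,000 bytes/s, and 45 MByte/s is only about 7% of that.

```c
#include <stdint.h>

/* Peak bandwidth in bytes/s of a bus that moves `width_bits` per clock. */
uint64_t peak_bandwidth(unsigned width_bits, uint64_t clock_hz)
{
    return (uint64_t)(width_bits / 8u) * clock_hz;
}

/* Fraction of the peak bandwidth actually achieved. */
double efficiency(double measured_bytes_s, uint64_t peak_bytes_s)
{
    return measured_bytes_s / (double)peak_bytes_s;
}
```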
Of course, I can imagine that the bridge only transmits with a certain burst size, that arbitration must take place, that Linux data handling adds some overhead, and that there may be other restrictions.
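The effect of burst size and per-transaction overhead can be illustrated with a simple model in which each transaction moves `burst_len` beats and costs `overhead_cycles` extra clock cycles (address phase, arbitration, write response, and so on). The numbers used below are purely hypothetical and are not measured properties of the Cyclone V bridge. They only show that, for example, single-beat 32-bit writes with about 6 cycles of overhead each would land near 45.7 MByte/s at 80 MHz, the same order as the measurement, whereas 16-beat 64-bit bursts with the same kind of overhead would approach the peak.

```c
/* Rough throughput model: each transaction moves `burst_len` beats of
 * `bytes_per_beat` bytes and costs `overhead_cycles` extra clock cycles
 * on top of one cycle per beat. Returns bytes per second. */
double effective_bandwidth(double bytes_per_beat, unsigned burst_len,
                           unsigned overhead_cycles, double clock_hz)
{
    double cycles = (double)burst_len + (double)overhead_cycles;
    return bytes_per_beat * burst_len * clock_hz / cycles;
}
```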
But is 45 MByte/s all I can get? That would be quite bad...
Any ideas how to improve the performance?
What am I doing wrong?
Has anybody achieved better results, and if so, how?
Tools: Quartus 13.1, Linux kernel 3.9.
Thanks in advance!!
Volker