Channel: Altera Forums

Fixed point optimization

Hello,

I have written two kernels to compare fixed-point and floating-point operations.

a)
__kernel
__attribute__((task))
void test_multiplier(global char *restrict in, global char *restrict weights, global int *restrict out) {


int output = 0;
#pragma unroll 100
for(int i=0; i<VEC_SIZE; i++){
output += in[i] * weights[i];
}


*out = output;

}

b)
__kernel
__attribute__((task))
void test_multiplier(global float *restrict in, global float *restrict weights, global float *restrict out) {


int output = 0;
#pragma unroll 100
for(int i=0; i<VEC_SIZE; i++){
output += in[i] * weights[i];
}


*out = output;

}

Both kernels give me the same number of DSPs, i.e., 100 (the unroll factor). I was expecting 25 DSPs in the 8-bit (char argument) case. Does the aoc compiler optimize well for fixed-point quantization?
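
For checking the output of the 8-bit kernel on the host, a plain C reference can help. The following is only a minimal sketch: the function name dot_i8 and the length parameter n (standing in for VEC_SIZE) are hypothetical, and it says nothing about how aoc maps the multiplies to DSP blocks. It relies on OpenCL's char being a signed 8-bit type, so every product fits in 16 bits and a 32-bit accumulator has ample headroom for 100 terms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side reference for the char kernel: accumulate signed
 * 8-bit products in a 32-bit integer, as the kernel's `int output` does. */
int32_t dot_i8(const int8_t *in, const int8_t *weights, size_t n)
{
    /* Each product lies in [-16256, 16384], so fewer than 131072 terms
     * can never overflow a 32-bit signed accumulator. */
    assert(n < 131072u);

    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)in[i] * (int32_t)weights[i];
    }
    return acc;
}
```

Calling dot_i8 on the same input buffers that were passed to the kernel gives a value to compare against *out.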

Multiple kernels using channels vs Single Merged Kernel

Which implementation would be more effective: one huge single work-item kernel (with communication through local memory inside the kernel), or several kernels communicating through channels?

I see a lot of implementations using the distributed-kernel approach. Is writing the entire system as one single work-item kernel challenging?

Can I use Transceivers as LVDS Lines

Hi, can I use transceivers as LVDS lines? LVDS lines in low-cost FPGAs (Cyclone 10 GX) can run up to 1.4 Gbps. I need to run LVDS lines at 2.5 Gbps. Can I use transceivers instead of LVDS I/Os?

Cannot find how to set environment variable in Quartus Prime 18.0 Lite for Windows 10

Just about to start using the Intel 10 LP FPGA Kit.
The "manual" states before you start using the examples....

The BTS relies on the Quartus Prime Pro software's specific library. Before running the BTS, open the
Quartus Prime Pro software. It sets the environment variable $QUARTUS_ROOTDIR automatically.

None of the dropdown menus seem to provide an option to set environment variables.

What am I missing?
:(
John

How to modify the official website's fft1d kernel to get a 32k point Fourier transform

I ran into a problem: when I modify the official kernel to use more than 4k points, I no longer get the correct result. How can I modify the code for a larger FFT?

Some doubts about the Stratix 10 PCIe hard IP

Hi,

In the Qsys design, I have DDR3 (512 MB) and PCI Express with BAR0 configured as 64-bit prefetchable memory and connected to the DDR3.
After generating the .sof file and programming the device, I start the host system and type "dmesg" in a terminal. It shows a 1 GB memory space assigned to BAR0, which should be 512 MB. I don't know where I am going wrong.

Even if I proceed further, I am able to access only the first 64 MB of the entire 512 MB space from the host. The "dmesg" output also contains an error message about a "vmap" virtual memory capacity issue.

I am very new to operating systems, kernel modifications, and modifying driver C code.
It would be a great help if someone could explain or clear up this issue.

Regards,
Anil

Cannot edit the generated PLL Intel FPGA IP v18.0

Hi All,


I am using Quartus Prime v18.0 and generated a PLL IP v18.0. I successfully generated the PLL and added it to the project, but when I tried to modify it from the IP Components list, the MegaWizard failed to launch. The complete error message is below.
"Failed to launch MegaWizard Plug-In Manager. PLL Intel FPGA IP v18.0 could not be found in the specified library paths."
See attachment for the screenshot.
I tried other IP components, like FIFO, ALTLVDS_RX, ALTLVDS_TX, and ALTCLKCTRL, and all of them work fine.


Quartus Prime: Version 18.0.0 Build 614 04/24/2018 SJ Standard Edition
Target Device: Arria V GX (5AGXBB3D4F35I5)
Machine: Windows 10 Pro 64bit


I also tried the same thing on another PC (Windows 7), but the problem is the same.
I need help to fix this issue.


Thank You.


Zeahr

OpenCL compile error for high resource utilization

Hello,

I encounter a compile error (Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful) when my resource utilization is high. My estimated resource usage is:

+--------------------------------------------------------------------+
; Estimated Resource Usage Summary ;
+----------------------------------------+---------------------------+
; Resource + Usage ;
+----------------------------------------+---------------------------+
; Logic utilization ; 65% ;
; ALUTs ; 39% ;
; Dedicated logic registers ; 29% ;
; Memory blocks ; 59% ;
; DSP blocks ; 79% ;
+----------------------------------------+---------------------------;

I have attached the compile log. I have seen previous threads in the forum suggesting that this error might be caused by insufficient RAM. I have allocated 20 GB of RAM and 60 GB of swap for this compilation. Do you think my error is still caused by insufficient RAM?

My log file ends with the following error:
Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings
Error: Peak virtual memory: 693 megabytes
Error: Processing ended: Mon Jun 4 20:49:47 2018
Error: Elapsed time: 13:00:32
Error: Total CPU time (on all processors): 01:54:02

How to improve the frequency of FPGA with max-fanout?

My FPGA design currently runs at about 240 MHz, and we would like to improve it.

We added '-max-fanout=1024 --fmax 300' to the aoc command, but it had no effect on the frequency.

The paper 'Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network' notes: 'To achieve a higher working frequency, we use register duplication to limit the maximum fan-out to 100. We found that the paths with the highest fan-out are the control signals, which are generated by the dispatcher and connected to all of the PEs.' So, how can register duplication be used to limit the maximum fan-out with OpenCL, and how can I find the paths with the highest fan-out?

Thanks a lot for your help!

Achieve Low Latency in OpenCL implementation of a State Space Equation Solver

Dear All,

I am exploring OpenCL as an alternative to HDL to implement a simple accelerator, to solve state space equations of an Induction Motor, on an FPGA PCIe board.
My aim is to have a low latency implementation, with a very modest bandwidth, but able to provide the results with possibly <50us latency (including memory transfers to the host, a Xeon CPU).

I implemented my first kernel, following the Intel Programming and Best Practices guide.

Code:

    typedef union{
        float f[4];
        float4 f4;
    }float4_t;
   
    typedef union{
        float f[8];
        float8 f8;
        float4_t f4_t[2];
    }float8_t;

typedef struct __attribute__((packed)){
    t_float3 iabc;
    t_float  wr;
    t_float  Te;
}ker_out_t;




#define RANK 4


__attribute__((task))
__kernel void induction_machine(
        __constant float4_t* restrict vabc,
        __constant float* restrict theta,
        __global ker_out_t*  restrict ker_out,
        __global float4_t* restrict AC,
        __global float4_t* restrict BD
        ){
 
    int tid = 0;


    const float kpc = sqrt(2.0/3.0);
    const float sqr2b2 = sqrt(2.0)/2.0;
    const float pi2b3 = 2.0*pi/3.0;
    const float sqrt3b2 = sqrt(3.0)/2.0;
    const float f1b2 = 0.5;


    float4_t KP[4];
    float4_t iKP[4];


    float4_t iabc={};
    float  Te;
    float  wr;


    float4_t vdqo = {};


    float4_t x;
    float4_t u = {};
    float4_t y = {};
    float4_t xn = {};
 
    float8_t ACx={};
    float8_t BDu={};
   
    float costh = cos(theta[tid]);
    float sinth = sin(theta[tid]);
    float costh_p_pi2b3 = -sqrt3b2*sinth -f1b2*costh; //cos(theta[tid] + pi2b3);
    float costh_m_pi2b3 = sqrt3b2*sinth -f1b2*costh;//cos(theta[tid] - pi2b3);
    float sinth_p_pi2b3 = sqrt3b2*costh -f1b2*sinth; //sin(theta[tid] + pi2b3);
    float sinth_m_pi2b3 = -sqrt3b2*costh -f1b2*sinth; //sin(theta[tid] - pi2b3);


    // park and inverse park coefficients
    KP[0].f4 =(float4) (kpc*costh,kpc*costh_m_pi2b3,kpc*costh_p_pi2b3,0);
    KP[1].f4 =(float4) (kpc*(-sinth),kpc*(-sinth_m_pi2b3),kpc*(-sinth_p_pi2b3),0);
    KP[2].f4 = (float4) (kpc*sqr2b2,kpc*sqr2b2,kpc*sqr2b2,0);


    iKP[0].f4 =(float4) (kpc*costh,-kpc*sinth,kpc*sqr2b2,0);
    iKP[1].f4 =(float4) (kpc*costh_m_pi2b3,kpc*(-sinth_m_pi2b3),kpc*sqr2b2,0);
    iKP[2].f4 =(float4) (kpc*costh_p_pi2b3,kpc*(-sinth_p_pi2b3),kpc*sqr2b2,0);


    // park transform
   
    for(int i=0;i<RANK;i++){
    #pragma unroll   
        for(int j=0;j<RANK;j++)   
            vdqo.f[i]+=KP[i].f[j]*vabc->f[j];
    }


    u.f4.s01 = vdqo.f4.s01; //state space input
   
    BDu.f8 = (float8) (0,0,0,0,0,0,0,0);
    ACx.f8 = (float8) (0,0,0,0,0,0,0,0);
 
 //state solver
#pragma unroll   
    for(int i=0; i<2*RANK;i++){
#pragma unroll   
      for(int j=0; j<RANK;j++){
        ACx.f[i] += AC[i].f[j]*x.f[j];
        BDu.f[i] += BD[i].f[j]*u.f[j];
      }
    }


#pragma unroll
    for(int i=0;i<RANK;i++){
        xn.f[i] = x.f[i] + h*(ACx.f[i] + BDu.f[i]);
        y.f[i] = ACx.f[4+i] + BDu.f[4+i];
    }
 
    //torque and speed output
    ker_out->Te = (3.0/2.0)*(P/2.0)*(x.f[1]*y.f[0] - x.f[0]*y.f[1]);
    ker_out->wr = ker_out->wr + (P/(2*J))*(ker_out->Te - Td)*h;   
 
    //system update
    AC[2].f[3] = ker_out->wr-w;
    AC[3].f[2] = w-ker_out->wr;




    //output currents inverse park transform
    for(int i=0;i<RANK;i++){
    #pragma unroll   
        for(int j=0;j<RANK;j++)   
            iabc.f[i]+=iKP[i].f[j]*y.f[j];
    }
   
    ker_out->iabc = iabc.f4.s012;
   
    //state update
    x = xn;
}
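
As a side check on the kernel above, the shortcut expressions for costh_p_pi2b3, costh_m_pi2b3, sinth_p_pi2b3 and sinth_m_pi2b3 follow from the angle-addition identities with cos(2π/3) = -1/2 and sin(2π/3) = √3/2. This is a worked derivation added for reference, not part of the original post:

```latex
\begin{aligned}
\cos\!\left(\theta \pm \tfrac{2\pi}{3}\right)
  &= \cos\theta\cos\tfrac{2\pi}{3} \mp \sin\theta\sin\tfrac{2\pi}{3}
   = -\tfrac{1}{2}\cos\theta \mp \tfrac{\sqrt{3}}{2}\sin\theta, \\
\sin\!\left(\theta \pm \tfrac{2\pi}{3}\right)
  &= \sin\theta\cos\tfrac{2\pi}{3} \pm \cos\theta\sin\tfrac{2\pi}{3}
   = -\tfrac{1}{2}\sin\theta \pm \tfrac{\sqrt{3}}{2}\cos\theta.
\end{aligned}
```

These match the four expressions in the kernel term by term, so the commented-out cos/sin calls and the shortcuts should agree up to floating-point rounding.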

Profiling this code on an Arria 10 GX, I noticed that it takes roughly 70 us to execute.

Do you think this result is reasonable? Is there a way to reduce this figure?
When I run the kernel several times in a for loop on the host, using EnqueueTask, the output of aocl report also shows that a lot of time is spent between executions. Does that part represent the memory transfer?
A screenshot of the profiling timeline is attached.
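
To validate the kernel's numerical results independently of the FPGA, a host-side reference of the state update can be useful. The sketch below is an assumption-laden reading of the kernel, not the poster's code: it treats AC as the matrix A stacked over C and BD as B stacked over D (row-major, 8 rows of 4), and performs one forward-Euler step with step size h. The function name state_step and the flat array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RANK 4

/* One forward-Euler step of x' = A x + B u with output y = C x + D u.
 * AC holds A in rows 0..3 and C in rows 4..7; BD holds B and D likewise.
 * Both are row-major arrays of 2*RANK rows by RANK columns (assumed layout). */
void state_step(const float *AC, const float *BD,
                const float *x, const float *u, float h,
                float *xn, float *y)
{
    assert(h > 0.0f);

    float ACx[2 * RANK] = {0};
    float BDu[2 * RANK] = {0};

    /* Matrix-vector products, mirroring the kernel's "state solver" loop. */
    for (size_t i = 0; i < 2 * RANK; i++) {
        for (size_t j = 0; j < RANK; j++) {
            ACx[i] += AC[i * RANK + j] * x[j];
            BDu[i] += BD[i * RANK + j] * u[j];
        }
    }

    /* Upper half advances the state; lower half produces the output. */
    for (size_t i = 0; i < RANK; i++) {
        xn[i] = x[i] + h * (ACx[i] + BDu[i]);
        y[i]  = ACx[RANK + i] + BDu[RANK + i];
    }
}
```

Feeding the same AC, BD, x and u to this function and to the kernel allows the outputs to be compared within a floating-point tolerance.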

Any suggestion is appreciated.

Thank you in advance,

Peter



quartus_pgm: Programming option E is illegal

Hello everyone, and sorry for the newbie question
(I don't know much about Altera FPGAs),
but I could not find the answer to this error either in the Altera documentation or in this forum.

I am trying to dump the flash from a Cyclone III board
(a Cyclone III EP3C120F48417N with an Intel 256P30B flash)
through JTAG with quartus_pgm version 13.0sp1, but I always receive the following error:
"Error (213002): Programming option E is illegal. Refer to --help for legal programming option combinations."
As the programmer, I am using a USB-Blaster that works correctly with a Cyclone V.

Searching for devices in the chain:
Code:

Info: Command: quartus_pgm -a
Info (213045): Using programming cable "USB-Blaster [1-1]"
1) USB-Blaster [1-1]
  020F70DD  EP3C120/EP4CE115


Trying to examine the flash:
Code:

Info: Running Quartus II 64-Bit Programmer
Info: Command: quartus_pgm --operation=E;smtest.pof;EP3C120 --mode=jtag --no_banner
Info (213045): Using programming cable "USB-Blaster [1-1]"
Error (213002): Programming option E is illegal. Refer to --help for legal programming option combinations.
Error: Quartus II 64-Bit Programmer was unsuccessful. 1 error, 0 warnings

Any suggestions on how to avoid that error?

Thank you so much!

Limiting the number of threads on NDRange kernels

Hi,

Is there a way to limit the number of simultaneously executing threads for an NDRange kernel?


My NDRange kernel has a high thread capacity (127 simultaneous threads) and uses local memories.
I suspect that the high number of threads is one of the reasons the local memories are replicated several times (as the report says).

Is there an "elegant" way of limiting the number of concurrent (pipelined) threads so that the compiler reduces the memory usage?

Right now, the compiler replicates hardware excessively (even to more than 2000%, as reported by the early estimator).

My current workaround is to introduce a barrier at the end of the outer-loop iteration.
It does not reduce the "thread capacity" reported by the early estimator, but it effectively reduces the memory replication factor.

Best Regards

Cyclone V E dev kit programming failed in EPCQ mode

Hi,
I am trying to use the Cyclone V E dev kit to practice with the < Board Update Portal based on Nios II Processor with EPCQ > example, but it always fails at 95% when I program the .jic into the EPCQ.
I have already changed the configuration resistors R17 and R19. Does anyone know why?
Thanks!

Convert Rocketboard.org GHRD v17.1 from Quartus Prime Pro to Standard

For reasons not worth discussing here, we required that the Arria10 GHRD design be available in the Quartus Prime Standard tools suite. The steps below were used to convert the 17.1 GHRD design (provided by RocketBoards.org, http://rocketboards.org/), which was developed under Quartus Prime Pro, to build with the Quartus Prime Standard tools. The rbf generated by the Standard tools was confirmed to perform correctly (same as the Pro rbf) on the Arria10 development board.

It is strongly recommended (practically mandatory) to have the GHRD design open in Quartus Prime Pro/Platform Designer while performing the conversion to Quartus Prime Standard.


  1. Make a copy of the 17.1 GHRD Pro directory that is known to have been successfully built and verified on the Arria10 dev board.
  2. Within the copied version of the GHRD: Remove anything that wouldn't have existed when a 'new' project is created, such as the gdb/ directory and all *.tcl, etc. The following 'rm -rf' is what was actually used, and the 'ls' shows the remaining files and directories.
    1. a10_soc_devkit_ghrd$ rm -rf ghrd_10as066n2/ *.tcl *.csv qdb tmp-clearbox top_level_template.v.terp Makefile *.xml hps_isw_handoff output_files sd_fat.tar.gz software tgz readme.txt .qsys_edit
    2. a10_soc_devkit_ghrd$ ls
       cti_tapping.stp fpga_niosii.sdc fpga_pr.sdc ghrd_10as066n2.dts ghrd_10as066n2.qsf ghrd_a10_top.v hps_sgmii.sdc jtag.sdc
       fpga_dp.sdc fpga_pcie.sdc fpga_sgmii.sdc ghrd_10as066n2.dtb ghrd_10as066n2.qpf ghrd_10as066n2.qsys ghrd_timing.sdc ip

  3. Rebuild EACH Platform Designer (Qsys) IP using the Quartus Prime Standard tools, which has the effect of converting Pro Platform Designer IP to Standard Platform Designer IP.
    1. Edit the IPs' .qip files to remove the "Pro" from the IP_TOOL_ENV "QsysPrimePro", so that it reads IP_TOOL_ENV "QsysPrime". The following grep/sed was used to perform this step; take note of the location in which it was executed.
      1. a10_soc_devkit_ghrd/ip/ghrd_10as066n2$ grep -rl 'QsysPrimePro' ./ | xargs sed -i 's/QsysPrimePro/QsysPrime/g'

    2. Within Quartus Prime Standard, "File" -> "Open Project" -> IP .qip and click through the "Next" windows, but ensure the correct "Device" (10AS066N3F40E2SG) is selected, and click "Finish".
    3. Execute "Start Analysis and Synthesis"
    4. Close Project (I repeatedly observed Quartus Prime Standard crashing when attempting to Open the next project, without Closing the current project).
    5. Repeat steps for each IP

  4. Convert the .Qsys from Pro to Standard:
    1. Edit the ghrd_10as066n2.qsys file: Remove " tool="QsysPro" ".

  5. Open the Qsys design in the Platform Designer, and re-declare each IP's "Type" and update IP "Parameters":
    NOTE: In this step, the IP Type names will be updated, parameters values changed and connections verified to match the Pro design.
    1. Launch Quartus Prime Standard, then launch the Platform Designer and open the 'modified' Platform Designer (ghrd_10as066n2.qsys) project.
    2. Notice that all of the IP modules are in RED text and that all of their Parameter:Type values are the same; 'altera_generic_component'.
    3. For each IP, execute the following steps:
      1. Click-on IP name in the System Contents window to highlight the Parameters.
      2. Within the Parameters tab, change the "Type" to match name used in Pro (For comparison, it helps to have Pro design open in Pro/Qsys(Platform Designer) tools).
        1. NOTE: There's no need to manually change "Version", as it should change automatically when correct "Type" is set and a "Refresh System" is executed in the next step.

      3. "File" -> "Refresh System" -> "Yes" (to "Save changes before refresh?"). The Version is updated as can be confirmed by reviewing the pop-up for "IP_name 17.1 (instead of 1.0)".
        1. NOTE: While the Type and Version have changed, the conversion from Pro to Standard is NOT complete.

      4. Close the "Save and Refresh System Completed" pop-up window.
      5. If the IP has more parameters, modify them to match the Pro design.

    4. Review the connections to ensure they match the Pro implementation.
    5. "File" -> "Save"
    6. "Generate HDL" -> Check: Clear output directories for selected generation targets -> "Generate". Review the messages in the pop-up window.
    7. "Finish"

  6. Build the Quartus Prime Standard project and generate an rbf:
    1. Edit the .qsf:
      1. Replace "Pro" with "Standard": set_global_assignment -name LAST_QUARTUS_VERSION "17.1.0 Standard Edition"
      2. Remove/comment out Pro specific assignments: set_global_assignment -name GENERATE_PR_RBF_FILE OFF

    2. Open ghrd_10as066n2.qpf in Quartus Prime Standard.
    3. "Assignments" -> "Settings" -> "Files" : Remove ghrd_10as066n2/ghrd_10as066n2.qip from the list.
    4. "Processing" -> "Start Assembler" (should build everything). It seems to be normal for many of the entities to be reported as "entity does not exist in design". Since the generated RBF seems to work, it is unclear if this is truly an issue.



Good luck!

MAX10 ADC Core Channel Sequence Issue

Hi All,

I am having a small issue with the ADC Core-only IP. I have two channels enabled (11 and 12) and a controller that sequences between them: Once 11 is converted, it starts 12, then 11, and so on.

The problem is that when response_channel is 11, and response_valid is 1 response_data is showing the ADC count that corresponds to channel 12.

Here is a screenshot from the Signal Tap I have set up. The Signal Tap clock is 100 MHz and the ADC clock is 10 MHz.
https://i.imgur.com/YTUzpyc.png

I tied channel 12 (ADC1IN12, PAD F2 on 10M04DAF256C8G) low and channel 11 (ADC1IN11, PAD E3 on 10M04DAF256C8G) to the analog supply voltage. As you can see from the waveform, the ADC is reporting that channel 11 is low and channel 12 is high.

Any ideas why this might be happening? This seems like a fairly simple interface and my control signals seem to match that of the user manual. Any help would be greatly appreciated.

- Sam

Chip name for the Intel Cyclone 10 LP Evaluation Board and Quartus V18.0

I am just starting to use Quartus Prime (Lite) V18.0 with the Intel Cyclone 10 LP Evaluation Board.
In all the YouTube demonstrations, during the setup at the start, the "Cyclone 10P" is listed in the "Family, Device, Board Settings".
With this Quartus version, at least, all you see is Cyclone IV E and Cyclone GX in the Device Family dropdown box.
Numerous device names are listed below it, but I cannot find one that turns out to be programmable, which is very frustrating because I only find out at the end.

The chip label is "10CL025YU25617G" on my (new) Intel Cyclone 10 LP board.
Does anybody have an answer?
Which device do you select for your Cyclone FPGA on the "Intel Cyclone 10 LP Evaluation Board" in Quartus?

Thanks in advance
John

OpenCL BSP Design

Does anyone have experience designing a BSP?

I have an A10 board without a BSP, and it has two DDR4 memories,
so I tried to follow the opencl dsp design tutorial to design a BSP.
I copied a10_ref and named it my_board.
However, a10_ref has only one DDR memory, and its device and pin assignments do not match my board.
I first changed the device in device.tcl from 10AX115S2F45I1SG to 10AX115N3F40E2SG and also modified the pins in flat.qsf.
Then I opened board.qsys, acl_ddr4_a10.qsys, and acl_ddr4_a10_core.qsys and clicked "sync all system info" to update the device in the .qsys files.
Then I generated the HDL.
Then I ran the base compile with export ACL_QSH_COMPILE_CMD="quartus_sh --flow compile top -c base" and compiled boardtest.cl.
However, I keep getting this error:

Info: Command: quartus_syn --read_settings_files=off --write_settings_files=off top -c base
Info: Using INI file /root/intelFPGA_pro/17.0/hld/board/custom_platform_toolkit/tests/boardtest/boardtest/quartus.ini
Info: qis_default_flow_script.tcl version: #1
Info: Initializing Synthesis...
Info: Project = "top"
Info: Revision = "base"
Info: Analyzing source files
Error (18185): Your design contains IP components that must be regenerated. To regenerate your IP, use the Upgrade IP Components dialog box, available on the Project menu in the Quartus Prime software
Error (18186): You must upgrade the IP component instantiated in file ip/board/board_kernel_clk_gen.ip to the latest version of the IP component.
Error (18186): You must upgrade the IP component instantiated in file ip/board/board_pcie.ip to the latest version of the IP component.
Error: Flow failed: ERROR: Current design not found

Input Soft_LVDS

Hi!
Does anyone know how to connect differential LVDS inputs to ALTERA_SOFT_LVDS?

Error deleting long path file name. Please help

How do I delete a file whose name is too long?

ModelSim does not generate clock signal

Hello,
I am trying to use testbenches in a big project. Quartus created the testbench itself, and I just added a few lines to the test module:

initial begin
clk_clk = 0;
end
always #10 clk_clk = ~clk_clk;

Then, using RTL simulation, I opened ModelSim, which added all the wires of the project correctly, but it does not generate even this simple clock
(see the attached picture).
1) I did not find any explanation of the grey lines. Why is the value even Pu0 (pull-down?)
2) What did I do wrong? I tried to follow the Altera RTL simulation guide.
Looking forward to any help :c I am very new to Quartus.