512-deep FIFO showing "full" after a single word has been written.

March 20, 2018, 2:12 am

I have recently encountered a bug where the Altera FIFO v17.1 Megafunction is driving the "full" signal after a single word has been written.
This appears to be timing related. In the best-case build, I've seen it occur once in ~400 reprograms; in the worst case, it occurs once in ~30 reprograms. The rest of the time, the FIFO behaves as expected. It also correlates with temperature: the error occurs more often at higher temperatures. In all builds, my design meets timing closure, and I have double-checked my constraints closely. I have not added any custom false-path constraints to my design.

The FIFO is a single-clock FIFO with the following configuration:

defparam
	scfifo_component.add_ram_output_register = "OFF",
	scfifo_component.intended_device_family = "Cyclone IV E",
	scfifo_component.lpm_numwords = 512,
	scfifo_component.lpm_showahead = "ON",
	scfifo_component.lpm_type = "scfifo",
	scfifo_component.lpm_width = 72,
	scfifo_component.lpm_widthu = 9,
	scfifo_component.overflow_checking = "ON",
	scfifo_component.underflow_checking = "ON",
	scfifo_component.use_eab = "ON";

I caught this behaviour in SignalTap.

I have edited in the signal names. At time 0, the FIFO's empty flag is asserted, and my code drives a write request and supplies data on the data lines. This FIFO is 512 words deep, so while I understand that there is latency between data being written to the write port and read from the read port, I would not expect full to be asserted so soon.

I continue to write into the FIFO. The write request signal is driven by a separate counter that keeps track of the number of writes that have occurred; it doesn't use the full flag. An overflow-detect mechanism in the code flags any write that occurs while the FIFO is full (this is how I discovered the error).

I perform a read request every 7th cycle and see that the FIFO drops the 6 samples in between, which indicates that it is behaving as if it were full even though only a couple of the 512 words have been written. This is not shown in this particular SignalTap capture, but I saw it previously at another debug stage.

As an example of a normal behaviour, here is what it looks like in the working case. In this example I have added the almost_full and almost_empty signals too.

This is the expected behaviour. After the first write, the full is not asserted and the wrreq continues to write data for the following 200 words. The read request reads every 7th word and continues to do so after the writes have been completed (until it empties the fifo, not seen here).
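For reference, the expected behaviour can be reproduced with a small software model. This is only a sketch: `fifo_model`, `fifo_clock`, and `run_pattern` are hypothetical names, and the model tracks nothing but the word count, so it says nothing about how scfifo is built internally. It assumes the traffic pattern described in this post (200 back-to-back writes, one read request every 7th cycle) and a depth of 512.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical occupancy-only model of a single-clock FIFO.
 * It is NOT the scfifo implementation; it just counts stored words. */
typedef struct {
    unsigned depth;   /* capacity in words (512 in this post) */
    unsigned used;    /* words currently stored */
    unsigned dropped; /* writes rejected because the FIFO was full */
} fifo_model;

static bool fifo_full(const fifo_model *f)  { return f->used == f->depth; }
static bool fifo_empty(const fifo_model *f) { return f->used == 0; }

/* Advance one clock cycle. Flags are sampled at the start of the cycle. */
static void fifo_clock(fifo_model *f, bool wrreq, bool rdreq)
{
    bool do_write = wrreq && !fifo_full(f);   /* models overflow_checking = "ON" */
    bool do_read  = rdreq && !fifo_empty(f);  /* models underflow_checking = "ON" */
    if (wrreq && !do_write)
        f->dropped++;
    if (do_write) f->used++;
    if (do_read)  f->used--;
}

/* Write on every cycle for `writes` cycles and request a read on every
 * `read_period`-th cycle. Returns the peak occupancy that was observed. */
static unsigned run_pattern(fifo_model *f, unsigned writes,
                            unsigned read_period, bool *full_seen)
{
    assert(f->depth > 0 && read_period > 0);
    unsigned peak = 0;
    *full_seen = false;
    for (unsigned cycle = 0; cycle < writes; cycle++) {
        bool rdreq = (cycle % read_period) == read_period - 1;
        fifo_clock(f, true, rdreq);
        if (f->used > peak) peak = f->used;
        if (fifo_full(f))   *full_seen = true;
    }
    return peak;
}
```

With `fifo_model f = {512, 0, 0};` and `run_pattern(&f, 200, 7, &full_seen)`, the occupancy in this model peaks at 172 words, so `full_seen` stays false and no write is dropped. A full flag after one word is therefore not explained by the traffic pattern itself.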

This is the behaviour seen >97% of the time the FPGA is reprogrammed. This error is occurring on a Cyclone IV E device and I'm using Quartus Prime Lite Edition, V 17.1.0, Build 590 in Linux Debian Jessie.

I have done a fair amount of searching and have found examples of a bug in cross-clock FIFOs and where FIFOs are small, but nowhere have I found a report of this happening on a larger, single clock fifo. Is this a known issue, or has anyone run into this before? My next step will be to contact Altera directly by opening a support ticket.

Thanks,
Stacey

Attached Images

empty_full_simultaneous.jpg (27.7 KB)
empty_full_working.jpg (26.1 KB)

↧

How to deal with the Out-of-Order Loop Iterations in single work-item kernel?

March 20, 2018, 3:59 am

Hi,

Today I tried to use a single work-item kernel with a nested loop. In the loop report, I found that my outer loop is not pipelined due to:

Loop iteration ordering: iterations may get out of order with respect to the inner loop,
as the number of iterations of the inner loop may be different for different iterations of this loop.

I understand the problem: different iterations of the outer loop need a different number of inner-loop iterations. In the "Out-of-Order Loop Iterations" section of the Best Practices Guide, I found an example that is very similar to my code:

Code:

__kernel void order( __global unsigned* restrict input,
                     __global unsigned* restrict output, int N ) {
    unsigned sum=0;
    for (unsigned i = 0; i < N; i++) {
        for (unsigned j = 0; j < i; j++)
            sum += input[i+j];
    }
    output[0] = sum;
}

But no solution is mentioned there. How can I pipeline the loop, or otherwise deal with this problem? If I use multiple kernels, will it work?
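One restructuring that might be worth trying (it is not taken from the guide) is to flatten the triangular nest into a single loop whose trip count is known before the loop starts. The sketch below is plain C rather than kernel code, and `sum_nested` and `sum_flat` are hypothetical names. Whether the offline compiler actually pipelines the flattened form better is an assumption that has to be checked in the loop report.

```c
#include <assert.h>

/* Reference: the nested triangular loop from the example, in plain C. */
static unsigned sum_nested(const unsigned *input, unsigned n)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < i; j++)
            sum += input[i + j];
    return sum;
}

/* Flattened variant: one loop with a fixed trip count of n*(n-1)/2.
 * The indices i and j are advanced by hand inside the loop body. */
static unsigned sum_flat(const unsigned *input, unsigned n)
{
    assert(n <= 65536u);               /* keeps n*(n-1) within 32 bits */
    unsigned total = n * (n - 1) / 2;  /* number of (i, j) pairs with j < i */
    unsigned sum = 0;
    unsigned i = 1, j = 0;             /* i = 0 has no inner iterations */
    for (unsigned k = 0; k < total; k++) {
        sum += input[i + j];
        if (++j == i) {                /* end of this "row": move to next i */
            j = 0;
            i++;
        }
    }
    return sum;
}
```

Both functions visit the same (i, j) pairs in the same order, so they return the same sum. The flattened version trades the variable inner trip count for a little index bookkeeping.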

↧

Implementing ARM Cortex M to Altera MAX10

March 20, 2018, 5:20 am

Hi,

Does anyone have experience implementing an ARM Cortex-M on an Altera MAX10 FPGA?

↧

Altera Flex 10K Driver

March 20, 2018, 7:28 am

Hi guys,

I've run into an issue here at work. A PC in the Dental Operatory was imaged to Windows 7, and I'm now trying to find a driver for an Altera Flex PCI Fiber Card. It appears to be a 10K, judging by the string of text on the chip. Would anyone know if or where I might find a driver for this device?

↧

Execution of vlib Failed (Ubuntu 16.04)

March 20, 2018, 12:33 pm

I have been running into an issue where I cannot run a simulation because "vlib" cannot be found. I'm running Quartus 17.1 Lite and Modelsim-Intel SE 10.5b on Ubuntu 16.04.

I run the simulation from Quartus, which launches it automatically with a script, but I get several errors. By the looks of it, the program is searching for vlib in a non-existent directory, .../linuxpe/vlib. I found that file in .../linuxaloem/vlib, but I don't understand why it's looking in the other location. I previously had to install a bunch of 32-bit libraries to get Modelsim to work at all. Is there something that needs to be changed, or am I missing some libraries?

# do fred_run_msim_rtl_verilog.do
# if {[file exists rtl_work]} {
# vdel -lib rtl_work -all
# }
# vlib rtl_work
# vmap work rtl_work
# Model Technology ModelSim - Intel FPGA Edition vmap 10.5b Lib Mapping Utility 2016.10 Oct 5 2016
# vmap work rtl_work
# Copying /home/jakeros/intelFPGA_lite/17.1/modelsim_ase/linuxaloem/../modelsim.ini to modelsim.ini
# Modifying modelsim.ini
#
# vlog -vlog01compat -work work +incdir+/media/sf_Shared1/fred {/media/sf_Shared1/fred/jaa721_processor.v}
# Model Technology ModelSim - Intel FPGA Edition vlog 10.5b Compiler 2016.10 Oct 5 2016
# Start time: 15:20:23 on Mar 20,2018
# vlog -reportprogress 300 -vlog01compat -work work "+incdir+/media/sf_Shared1/fred" /media/sf_Shared1/fred/jaa721_processor.v
# ** Error: (vlog-66) Execution of vlib failed. Please check the error log for more details.
# sh: 1: /home/jakeros/intelFPGA_lite/17.1/modelsim_ase/linuxpe/vlib: not found
# End time: 15:20:24 on Mar 20,2018, Elapsed time: 0:00:01
# Errors: 1, Warnings: 0
# ** Error: /home/jakeros/intelFPGA_lite/17.1/modelsim_ase/linuxaloem/vlog failed.
# Error in macro ./fred_run_msim_rtl_verilog.do line 8
# /home/jakeros/intelFPGA_lite/17.1/modelsim_ase/linuxaloem/vlog failed.
# while executing
# "vlog -vlog01compat -work work +incdir+/media/sf_Shared1/fred {/media/sf_Shared1/fred/jaa721_processor.v}"

↧

[QuestaSim] WLF (dataset) files compatibility

March 20, 2018, 12:52 pm

Hi All,

I have a WLF file, which was created with Questa v10.6, but I have Questa v10.4.

So, it seems there is some incompatibility in the WLF versions, which are generated by the tool - please see the error message below:

Code:

# WLF Error: Cannot open WLF file because of version incompatibility.
# The WLF file version is 170.  Modelsim 10.4c can read up to and including WLF file version 150.
# Cannot open file: D:/LPHUD/units/bit/verification/bit_20180320_1319.wlf

So, is it possible to save the WLF file in Questa v10.6 in such a way that it can be opened in Questa v10.4?

Thank you!

↧

Local variables in Kernel

March 20, 2018, 1:55 pm

Hi!

I was getting bad results in my kernel, so I decided to printf the local variables marked in the code in bold and blue.

At every iteration of the 2nd loop, the value from the previous iteration is still stored instead of a new local variable being created. What's going wrong?

P.S. In red I have a memory dependency; how do I resolve this?

Code:

int total_gin[20000];

  for(int w=0; w < row;w++)
  {

    int fcont = 0;
    int row_mat = w*col;

    for(i = 0; i < (col >> 3);i++)
    {
      int aux = i * 8;
      int lcontsum = 0;
      int ccol_co[8];
      int copl=0;
      int aux_tt[8];
      int lcont[8];

      #pragma unroll 8
      for(int j=0;j < 8;j++)
      {

        int aux_g1 = aux + j;

        lcont[j] = loc_col[aux_g1] & (g1[aux_g1] != 0);

        lcontsum += lcont[j];

        if(lcont[j])
        {
            ccol_co[copl] = j+1;
            copl++;
            aux_tt[j]=1;
        }
      }

      for(int bb=0; bb < 8;bb++){
        total_gin[aux + bb] += aux_tt[bb];
        int ax_col = (aux + (ccol_co[bb]-1));

        if(ccol_co[bb] != 0){
          in_cols[ax_col*w  + total_gin[ax_col]] = w;
        }

      }

      fcont += lcontsum;
    }

  }
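One thing that may explain the stale values: in C and OpenCL C, an array declared inside a loop body without an initializer has indeterminate contents, and nothing clears it when the body runs again. In the code above, aux_tt[j] is written only when lcont[j] is nonzero and ccol_co is filled only up to copl, yet the bb loop reads all eight entries of both. The sketch below is plain C, and `process_block` is a hypothetical stand-in for one pass of the `i` loop body. It only shows the explicit zero-initialization and is not a fix for the whole kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for one pass of the `i` loop body.
 * Returns the number of entries marked in aux_tt and reports how many
 * entries of ccol_co are nonzero. */
static int process_block(const int lcont[8], int *nonzero_cols)
{
    int aux_tt[8]  = {0};   /* zeroed every time the body runs */
    int ccol_co[8] = {0};   /* unused slots now read as 0, not as leftovers */
    int copl = 0;

    for (int j = 0; j < 8; j++) {
        if (lcont[j]) {
            ccol_co[copl] = j + 1;
            copl++;
            aux_tt[j] = 1;
        }
    }

    int marked = 0;
    *nonzero_cols = 0;
    for (int bb = 0; bb < 8; bb++) {
        marked += aux_tt[bb];
        if (ccol_co[bb] != 0)
            (*nonzero_cols)++;
    }
    /* Holds only because both arrays start from zero on every call. */
    assert(marked == copl && *nonzero_cols == copl);
    return marked;
}
```

Without the `= {0}` initializers, the two counts at the end can pick up values left over from an earlier pass, which matches the symptom described.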

↧

changing output from two always blocks

March 20, 2018, 4:44 pm

I'm new to Verilog, so please forgive any ignorance.

I'm using the vga_adapter module, which for the purpose of this question just requires an x and a y value.

I use two always blocks to do the math and get the values I want: reg [7:0] x0,y0 from always block 1 and reg [7:0] x1,y1 from always block 2. How would I code something such that the vga_adapter gets x0,y0 if write_en = 1, or else x1,y1?

↧

Does anyone have board and dts files that contain i2c0-mux entry?

March 20, 2018, 5:49 pm

Hi,

I'm currently using a DE10-Standard board. I noticed that if I use the Linux Console image and write it to a microSD card, the card will contain four files.

fpga.dtbo
socfpga.dtb
u-boot.scr
zImage

After I've booted and logged in as root, I see that there is an i2c0-mux under the /sys/class/leds directory. This allows me to use HPS_CONTROL_I2C functionality, which is a way to use the HPS to access the onboard ADV7180 Video chip. I can use the C code found in the hps_i2c_switch to read registers contained in the ADV7180.

Even if I remove the fpga.dtbo from the microSD card, I can still boot Linux completely and I still see i2c0-mux in the /sys/class/leds directory. Somehow, the dts file that was used to generate the socfpga.dtb file contains an entry for the i2c0-mux in the device tree.

I would like to use the i2c0-mux in my own projects but I don't know the device tree syntax that is needed to make it work.

I was wondering if anyone in the forum can point me to board info and dts files that allow me to create a dtb file that contains the i2c0-mux entry.

Even better would be instructions on how to create the socfpga.dtb file that is part of the Linux Console image.

I've tried some things out myself, but I'm not sure whether they are complete. My first go was adding this right below hps0_led in the hps_common_board_info.xml file.

<DTAppend name="label" type="string" parentlabel="i2c0mux" val="i2c0-mux"/>
<DTAppend name="gpios" parentlabel="i2c0mux" >
<val type="phandle">hps_0_gpio1_porta</val>
<val type="number">19</val>
<val type="number">0</val>
</DTAppend>

I haven't been able to get it to work because my .rbf file somehow prevents HPS I2C access.

Thanks,
Raul

↧

Reduce logic utilization

March 20, 2018, 7:51 pm

Hi,

I have this part in my kernel that takes too much logic:

Code:

if(relu == 1){
    if(out < 0)
        conv_in = 0.1*out;
    else
        conv_in = out;
}

out is a float. The report.html shows this function taking 4k ALUTs and 8k FFs, which is too much for my DE1-SoC to handle. Any idea how to reduce it?
By the way, the function is a leaky activation function in which negative data is multiplied by 0.1.
Thanks in advance.

EDIT:
What are the upsides and downsides of using these two compiler flags?
1) -fp-relaxed
2) -fpc
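One detail in the snippet that may matter for area: the literal 0.1 has no suffix. If the compiler treats it as a double, as standard C does, then 0.1*out is evaluated in double precision and converted back to float. That this accounts for the reported ALUT and FF usage is an assumption about the toolchain, not something confirmed here. The sketch below is plain C with hypothetical function names and keeps everything in single precision.

```c
#include <assert.h>
#include <stddef.h>

/* Leaky ReLU in single precision: the `f` suffix keeps the constant,
 * and therefore the multiply, in float. */
static float leaky_relu(float out)
{
    return (out < 0.0f) ? 0.1f * out : out;
}

/* Apply the activation to a buffer in place. */
static void leaky_relu_inplace(float *data, size_t n)
{
    assert(data != NULL || n == 0);
    for (size_t k = 0; k < n; k++)
        data[k] = leaky_relu(data[k]);
}
```

The result can differ from the double-precision version in the last bit, since 0.1f is not exactly the same value as 0.1 rounded after the multiply.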

↧

Quartus 14.1:How to generate MCS file ?

March 21, 2018, 4:13 am

I'm using Quartus 14.1 (64-bit Web Edition).

After programming the .sof file, how do I convert it into MCS format so that it can be loaded into a flash chip?

I tried converting the .sof programming file to a .jic file.

While generating the .jic file, I get an error message saying that the "Serial Flash Loader" is missing.

Thanks

↧

Huge Size Hello world with FreeRTOS on NIOS II

March 21, 2018, 6:41 am

Hello everyone,

I've created a hello world program with FreeRTOS on a NIOS II, based on a Cyclone III FPGA.
The ELF file that is created is huge: 8244 KBytes, to be exact.
This size is not acceptable, as I intend to run the code from on-chip memory and I have only 128 KB of on-chip memory available.
I've tried the hello world program with uCOSII as well, and the ELF size is only 102 KB.

Can anybody please suggest why the hello world based on FreeRTOS is so large?

Thank you,

↧

Max 10 dac output

March 21, 2018, 7:02 am

Hey everyone,
I'm trying to check whether the DAC_SMA connector is working with the code below.
I compiled it and programmed it onto the MAX10 board, but when I connect the output to an external scope I see nothing.

What could be the reason?

I simulated it with Modelsim, and the code meets the timing constraints of the DAC8551.
From my understanding, the scope should show a steady voltage level.

Code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

ENTITY try IS
PORT(USER_PB:IN STD_LOGIC_vector(0 downto 0);
     CLK_10_ADC:IN STD_LOGIC;
     SYNC:OUT STD_LOGIC;
     SCLK:OUT STD_LOGIC;
     DIN:OUT STD_LOGIC;
     USER_LED:OUT STD_LOGIC_VECTOR(2 DOWNTO 0)
     --USER_LED:OUT STD_LOGIC_VECTOR(2 DOWNTO 0)
     );
END;

ARCHITECTURE try OF try IS
SIGNAL SYS_CLK: STD_LOGIC;
SIGNAL ST:STD_LOGIC_VECTOR(2 DOWNTO 0);
SIGNAL RDATA:STD_LOGIC_VECTOR(23 DOWNTO 0);
SIGNAL CNT:STD_LOGIC_VECTOR(7 DOWNTO 0);
SIGNAL DATA:STD_LOGIC_VECTOR(15 DOWNTO 0);
BEGIN
DATA<="1111111111111111";
PROCESS(USER_PB,CLK_10_ADC)
BEGIN
    IF(user_pb(0)='0')THEN
        ST<="000";
        USER_LED<="000";
        SYNC<='1';
        SCLK<='0';
        DIN<='0';
        CNT<="00000000";
    ELSIF(CLK_10_ADC='1' AND CLK_10_ADC'EVENT)THEN
        USER_LED<=NOT(ST);
        CASE ST IS
            WHEN "000"=>
                SCLK  <='0';
                RDATA(23 DOWNTO 0)<="00000011"&DATA;
                SYNC  <='0';
                ST<= "001";
                --USER_LED<= "110" ;
            WHEN "001"=>
                SCLK  <='1';
                RDATA(23 DOWNTO 0)<=RDATA(22 DOWNTO 0)&'0';
                DIN<=RDATA(23);
                SYNC  <='0';
                ST<= "010";
                --USER_LED<= "101"        ;
            WHEN "010"=>
                CNT <= CNT +'1';
                ST<= "011";
                --USER_LED<= "100";
                SCLK  <='0' ;
                IF(CNT=24)THEN
                    ST<="100";
                    --USER_LED<= "011";
                    CNT<="00000000";
                    SYNC<='1';
                ELSE
                    ST<="001";
                    --USER_LED<= "110";
                END IF;
            WHEN "100"=>
                IF (CNT=4)THEN
                    CNT<="00000000";
                    ST<="000" ;
                    --USER_LED<= "000";
                    DIN   <='0' ;
                ELSE
                    CNT<=CNT+'1';
                END IF;
            WHEN OTHERS=>
                NULL;
        END CASE;
    END IF;
END PROCESS;
END ARCHITECTURE;
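One thing that may be worth checking against the DAC8551 datasheet is the 8-bit prefix in `"00000011"&DATA`. As far as I recall the 24-bit frame layout (an assumption here, not verified against the datasheet), bits 23..18 are don't-care, bits 17..16 select the power-down mode, and bits 15..0 carry the data, with mode 0 being normal operation and mode 3 a high-impedance power-down. The sketch below packs a frame under that assumed layout; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DAC8551 power-down modes (PD1:PD0); verify in the datasheet. */
enum dac_pd_mode {
    DAC_PD_NORMAL = 0,  /* normal operation */
    DAC_PD_1K     = 1,  /* output pulled to GND through about 1 kOhm */
    DAC_PD_100K   = 2,  /* output pulled to GND through about 100 kOhm */
    DAC_PD_HIZ    = 3   /* output high impedance */
};

/* Pack a 24-bit frame: mode in bits 17..16, data code in bits 15..0.
 * Bits 23..18 are left at 0 (treated as don't-care in this sketch). */
static uint32_t dac_frame(enum dac_pd_mode mode, uint16_t code)
{
    assert((unsigned)mode <= 3u);
    return ((uint32_t)mode << 16) | code;
}
```

Under this assumed layout, `dac_frame(DAC_PD_HIZ, 0xFFFF)` gives 0x03FFFF, which is the same bit pattern as `"00000011"&DATA` in the posted code, while `dac_frame(DAC_PD_NORMAL, 0xFFFF)` gives 0x00FFFF.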

↧

IP block error

March 21, 2018, 7:26 am

Hello!

I am trying to compile an OpenCL kernel called SimpleKernel.cl. Emulation and profiling work fine. I am stuck with the final aoc execution command. I have tried setting -board=a10gx and -board=a10gx_hostch (only two options) and always end up with the following error:

aoc SimpleKernel.cl -v -board=a10gx_hostch
aoc: Environment checks are completed successfully.
aoc: If necessary for the compile, your BAK files will be cached here: /var/tmp/aocl/
You are now compiling the full flow!!
aoc: Selected target board a10gx_hostch
aoc: Running OpenCL parser....
aoc: OpenCL parser completed successfully.
aoc: Optimizing and doing static analysis of code...
aoc: Linking with IP library ...
Checking if memory usage is larger than 100%
aoc: First stage compilation completed successfully.
Compiling for FPGA. This process may take a long time, please be patient.
Error (18185): Your design contains IP components that must be regenerated. To regenerate your IP, use the Upgrade IP Components dialog box, available on the Project menu in the Quartus Prime software
Error: Flow failed: ERROR: Current design not found
Error: Quartus Prime Synthesis was unsuccessful. 4 errors, 0 warnings
Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings
Error: Compiler Error, not able to generate hardware

I have already tried upgrading the IP blocks in Quartus Prime. Although the message log indicates that the upgrade was successful, running aoc again ends with the same error, and when I go back to Quartus it states that I need to update the IP blocks again. All of the environment variables are set exactly as outlined in the SDK manual, and the version (17.1.0 build 240) is consistent.

Any suggestions on what to try next would be appreciated.

↧

Arria 10 in quartus prime

March 21, 2018, 10:49 am

Greetings
My issue is related to choosing an Arria 10 device in Quartus Prime (Standard). When I choose an Arria 10 device through the New Project Wizard, Quartus shuts down automatically. I have attached a screenshot of the error. Please guide me in resolving this issue.
Thanking You
Anish

Attached Images

device choose error screen shot.jpg (261.8 KB)

↧

Verifying final channel depth

March 21, 2018, 11:48 am

Is there a way to verify the final, post-compile depth of a channel? I have a design that requires a deep FIFO, which is critical for maximizing throughput. I want to verify that the compiler has in fact set the depth to what I specified with the depth parameter. Am I safe to assume that the implemented channel will never be shallower than the specified depth?

↧

Do Arria A10 LVDS inputs support weak pull ups/downs?

March 21, 2018, 12:25 pm

Does the Arria A10 support internal weak pull-ups/pull-downs on the LVDS input lines? I can't find this info in the datasheet.

↧

DMA Flow Control - Memory to Avalon MM Peripherals

March 21, 2018, 7:10 pm

Hello,

I'm still rather new to FPGA design, and I have a question about implementing a DMA controller.

If a DMA transfer is coordinated from an incrementing memory location to a simple MM-Slave peripheral, like the Altera SPI core, how is flow control properly achieved? I understand that the SPI core can initiate an IRQ event when its TX Ready bit goes high, but what is the proper way to coordinate the SPI (or another simple CSR peripheral) with the DMA transfer activity?

If I am writing a long transfer to a non-incrementing peripheral address, it seems like this would cause TX overflow issues, or the DMA would need to interface with the slave's status registers to coordinate its next word transfer. Surely the CPU shouldn't be servicing an ISR for each write if the point of DMA is to reduce processing overhead? Is it necessary to implement a FIFO on the peripheral slave port?

This is a question I've been encountering a lot lately, and I'd like to understand how this kind of transfer coordination takes place. What is the method for equating a TX ready status bit in the target peripheral to a TREADY-like signal for throttling a DMA transaction?

Thanks a lot!

↧

Free Tools?

March 21, 2018, 7:28 pm

What FREE tools are available for synthesis and simulation?

FREE means:
*gratis (no cost)
*license (if required) lasts forever; never expires
*license cannot be tied to hardware
*anybody and everybody can get and use the tool
*could be closed source, but prefer open source
*must run on Linux
*must be statically linked to avoid library issues in the future

I want to play around with SoC architectures making high-bandwidth data radios for amateur radio.

If it is a hassle to get the tools, or if there are any financial costs, no large community will develop as it has in the Linux world. So it must meet all of the requirements listed above. I'm willing to pay for hardware, but not software.

This is for a hobby: amateur radio.

If there are any such tools that meet these requirements, then what limitations do they have?

Thank you,
Ken Hendrickson

PS As long as FPGA manufacturers charge money for their software tools, there will never develop a maker community experimenting with them as a hobby.

↧

linux support on Intel Cyclone 10 GX FPGA Development Kit

March 21, 2018, 10:41 pm

Hi All,

We have an Intel Cyclone 10 GX FPGA Development Kit and have tested it with uClinux. Now we want to test it with a normal Linux kernel, version 4.15.

So my questions are:
1) Is it possible to boot normal Linux on the Intel Cyclone 10 GX FPGA Development Kit?
2) If so, can you provide us with the BSP for it?

Thanks,
Simit

↧