How does the producer, consumer, and manager strategy in the programming guide work?

March 1, 2018, 8:50 pm

Page 43 of the Intel FPGA SDK for OpenCL Programming Guide describes a producer, consumer and manager strategy.
It also gives reference code for the producer.

__kernel void __attribute__((task)) producer(
    __global const int* restrict src,
    __global volatile int* restrict shared_mem,
    const int iterations)
{
    int base_offset;
    for (int gid = 0; gid < iterations; gid++) {
        int lid = 0xff & gid;
        if (lid == 0) {
            base_offset = read_channel_intel(req);
        }

        shared_mem[base_offset + lid] = src[gid];
        mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_CHANNEL_MEM_FENCE);

        if (lid == 255) {
            write_channel_intel(c, base_offset);
        }
    }
}

I searched around, but I couldn't find any detailed example code for this strategy.

As I understand it, shared_mem is used as the "channel".
It seems to require the producer and consumer to run loops with the identical loop count "iterations".
However, a simple regular OpenCL channel should be able to do the same thing, so I don't see any reason
to use this design.

Suppose the producer does some filtering; for instance, it discards data that is larger than 10 and sends the
rest to the consumer. In that case the loop count in the consumer is not fixed.
Here the basic on-chip channel doesn't work, and the producer, consumer and manager scheme can't be applied either:
the producer sends no feedback signal to the manager, so the manager can't tell whether the consumer has finished all of the
processing.

In general, I have two questions.
1) Is there a way to do on-chip communication between two kernels in OpenCL when the amount of data communicated
is not known at compile time? Take the simple filter as an example: Kernel 0 filters the input data stream and extracts the data
that is larger than 10. Kernel 1 takes the output data of Kernel 0 and does some processing.

2) Will the producer, consumer and manager scheme work for the filter example? If not, when should it be used in place of
the basic on-chip channel based design?
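
To make the filter case concrete, here is a rough software model in Python (not OpenCL). It assumes the channel can carry a reserved end-of-stream marker so that Kernel 1 knows when to stop even though the number of items is only known at run time. The marker, the function names and the doubling step are all made up for illustration; whether this maps well onto real Intel channels is an open question.

```python
from collections import deque

# Hypothetical marker: a real kernel would need a reserved value or a flag field.
END_OF_STREAM = None


def kernel0_filter(src, channel, threshold=10):
    """Model of Kernel 0: forward only the values larger than the threshold."""
    for value in src:
        if value > threshold:
            channel.append(value)
    # Tell the consumer that no more data will follow.
    channel.append(END_OF_STREAM)


def kernel1_consume(channel):
    """Model of Kernel 1: process items until the marker arrives."""
    results = []
    while True:
        value = channel.popleft()
        if value is END_OF_STREAM:
            break
        results.append(value * 2)  # placeholder for the real processing
    return results


channel = deque()  # stands in for the on-chip channel
kernel0_filter([3, 12, 10, 25, 7, 11], channel)
processed = kernel1_consume(channel)
print(processed)  # [24, 50, 22]
```

In this model the consumer never needs to know the loop count in advance; it only needs to recognise the marker.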

Any suggestions will be appreciated.

Regards,
Cheng Liu

↧

where is my post?

March 1, 2018, 8:59 pm

I can't see my post in the forum.
I can't even see any record of it in my own profile.

↧

producer, consumer and manager in Intel OpenCL programming guide.

March 2, 2018, 12:38 am

On page 43 of the Intel FPGA SDK for OpenCL Programming Guide,
there is a producer, consumer and manager model proposed for kernel communication.

I have read the code carefully, but I still don't understand when this method should be used.
I think the same logic can be implemented using a basic on-chip channel.
In addition, there is a memory fence after each src read; will it affect burst memory operations?
Why not put the memory fence inside the "if (lID == 255)" block?

Here is the reference producer code.

__kernel void producer (__global const uint * restrict src,
                        __global volatile uint * restrict shared_mem,
                        const uint iterations)
{
    int base_offset;
    for (uint gID = 0; gID < iterations; gID++)
    {
        // Assume each block of memory is 256 words
        uint lID = 0x0ff & gID;
        if (lID == 0)
        {
            base_offset = read_channel_intel(req);
        }
        shared_mem[base_offset + lID] = src[gID];
        // Make sure all memory operations are committed before
        // sending token to the consumer
        mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_CHANNEL_MEM_FENCE);
        if (lID == 255)
        {
            write_channel_intel(c, base_offset);
        }
    }
}
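
For reference, the token handshake in the producer can be modelled in plain Python. This is only a sketch under assumptions: the excerpt above shows the producer alone, so the consumer and manager behaviour here (the consumer drains a filled block and its offset goes back onto req) is a guess, and the deques merely stand in for the req and c channels.

```python
from collections import deque

BLOCK = 256  # words per block, as in the reference producer


def simulate(src, n_buffers=2):
    """Move src through a small shared buffer using block-offset tokens."""
    shared_mem = [0] * (n_buffers * BLOCK)
    # Assumed manager behaviour: it starts by offering every free block offset.
    req = deque(i * BLOCK for i in range(n_buffers))
    c = deque()  # producer -> consumer: offsets of filled blocks
    out = []
    base_offset = 0
    for gid, word in enumerate(src):
        lid = gid & 0xFF
        if lid == 0:
            base_offset = req.popleft()  # read_channel_intel(req)
        shared_mem[base_offset + lid] = word
        if lid == BLOCK - 1:
            c.append(base_offset)  # write_channel_intel(c, base_offset)
            # Assumed consumer behaviour: drain the block, then release it.
            filled = c.popleft()
            out.extend(shared_mem[filled:filled + BLOCK])
            req.append(filled)
    return out


data = list(range(4 * BLOCK))
result = simulate(data)
print(result == data)  # True
```

With two 256-word buffers, four blocks of data pass through by reusing the same two offsets; only the small offset tokens travel over the channels, while the data itself sits in shared memory.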

↧

MAX10 programmed with POF won't start

March 2, 2018, 3:55 am

Device in question is 10M04SCE144.

I have been using Altera devices for the last 30 years or so, so I am pretty familiar with them. Now I have a new project, using a MAX10 device for the first time. I downloaded Quartus 17.1 Lite and created a test project (blinky), which works when the SOF file is downloaded. However, whatever I do, I can't get it running when I program the device with the POF file (programming and verifying complete without errors, by the way). Please note that the configuration-related pins are disabled (all in user I/O mode).

I guess I am missing something, but I can't figure out what it could be, so any help or suggestion is highly appreciated.

↧

ALT_SLD_FAB error

March 2, 2018, 6:36 am

Hello,

I have been using Quartus Prime v17.0. When I compile the DE1_SoC_SDRAM_Nios_Test project, it gives the following errors.

Error (11176): Set_instance_parameter_value: There is no parameter named DESIGN_HASH on instance alt_sld_fab
Error (11176): Alt_sld_fab.: version not allowed for EModuleProperty, must be in {[DESCRIPTION, NAME, VERSION, MODULE_TCL_FILE, MODULE_DIRECTORY, INTERNAL, HIDE_FROM_SOPC, HIDE_FROM_QSYS, HIDE_FROM_QUARTUS, OPAQUE_ADDRESS_MAP, GROUP, AUTHOR, ICON_PATH, DISPLAY_NAME, DATASHEET_URL, TOP_LEVEL_HDL_FILE, TOP_LEVEL_HDL_MODULE, INSTANTIATE_IN_SYSTEM_MODULE, EDITABLE, VALIDATION_CALLBACK, EDITOR_CALLBACK, ELABORATION_CALLBACK, GENERATION_CALLBACK, COMPOSITION_CALLBACK, PARAMETER_UPGRADE_CALLBACK, OUTDATED_IP_FILE, ANALYZE_HDL, STATIC_TOP_LEVEL_MODULE_NAME, FIX_110_VIP_PATH, SUPPORTED_DEVICE_FAMILIES, REPORT_TO_TALKBACK, ALLOW_GREYBOX_GENERATION, SUPPRESS_WARNINGS, STRUCTURAL_COMPOSITION_CALLBACK, NATIVE_INTERPRETER, PREFERRED_SIMULATION_LANGUAGE, REPORT_HIERARCHY, UPGRADEABLE_FROM]}
Error (12154): Can't elaborate inferred hierarchy "sld_hub:auto_hub"
Error: Quartus Prime Analysis & Synthesis was unsuccessful. 3 errors, 32 warnings
Error: Peak virtual memory: 883 megabytes
Error: Processing ended: Fri Mar 02 17:26:26 2018
Error: Elapsed time: 00:00:52
Error: Total CPU time (on all processors): 00:01:22
Error (293001): Quartus Prime Full Compilation was unsuccessful. 5 errors, 32 warnings

According to Altera's support pages, Quartus uses the sld_hub file for these features: SignalTap II Logic Analyzer, In-System Memory Content Editor, Logic Analyzer Interface, Nios® II JTAG UART, and Nios II On-Chip Debugging (OCD). I'm using two of them.

For the error "Error (12154): Can't elaborate inferred hierarchy "sld_hub:auto_hub"", support says to delete the ./megafunction folder in the archive, but I can't find this folder in the project. The folder is in the Quartus library.
I think there is a problem in the ALT_SLD_FAB files, but I have not been able to solve it yet.
I would be glad if you could help.

↧

Arria 10 GX FPGA dev kit examples: JTAG ID code issue

March 2, 2018, 7:31 am

I'm attempting to load the current example files for the Arria 10 GX FPGA dev kit and it is coming up with:

Error (209015): Can't configure device. Expected JTAG ID code 0x02E860DD for device 1, but found JTAG ID code 0x02E660DD. Make sure the location of the target device on the circuit board matches the device's location in the device chain in the Chain Description File (.cdf).

The ID for the device on my board is correct (0x02E660DD) but the programmer is expecting something else. I have tried loading via the BTS software and the Quartus programmer and both have the same issue.
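
The two ID codes can be compared field by field. The sketch below splits a 32-bit IDCODE using the standard IEEE 1149.1 layout (4-bit version, 16-bit part number, 11-bit manufacturer ID, and a fixed 1 in bit 0); the function name is made up for illustration.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
        "marker": idcode & 0x1,  # always 1 for a valid IDCODE
    }


expected = decode_idcode(0x02E860DD)  # what the programmer expects
found = decode_idcode(0x02E660DD)     # what the board reports

differing = [name for name in expected if expected[name] != found[name]]
print(differing)  # ['part_number']
print(hex(expected["part_number"]), hex(found["part_number"]))  # 0x2e86 0x2e66
```

Only the part number field differs, which suggests the example files were built for a different device variant than the one fitted on the board.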

Has anyone else experienced this?

↧

Documentation discrepancy for MAX10m50 pin outs

March 2, 2018, 8:12 am

Hi All,
I am designing a board with a MAX10m50SCE144. I downloaded the Excel pinout spreadsheet, which shows CLK0n and CLK0p on pins 25 & 26 and CLK1n and CLK1p on pins 27 & 28. But in the Quartus Pin Planner (for the same part) the clock pins are shown on pins 26 & 27 for CLK0 and on pins 28 & 29 for CLK1. Which document is correct?

Both of these documents are the very latest versions, which I downloaded over the last few days, so I don't believe I'm looking at outdated documents.

Thanks,
Scott

↧

Connecting my own MAC design to the DE10-Nano PHY (via HPS pins)

March 2, 2018, 8:24 am

Hi community,

I have a DE10-Nano board where I would like to prototype my own 100 Mbit MAC design instead of the one (EMAC0 or EMAC1) hardwired in the HPS. From what I understand, the MII interface of the on-board PHY is connected to pins which are mapped to the HPS I/O. So how can I route these MII signals directly to the FPGA fabric (without using the HPS EMAC)?

I saw in the Platform Design interface that one is allowed to multiplex these pins via "loan I/Os" or "GPIOs", but I am not sure which method is correct.

Thanks for your help.

Marco

↧

why only one pll? and no internal oscillator

March 2, 2018, 10:02 am

Hi all,

I am tinkering with a schematic design using the Quartus Lite software. I've created a project with a top-level schematic. I tried using the internal oscillator, altufm_osc, wiring its enable high, and then compiling. The fitter reports Error (12024): WYSIWYG primitive "maxii_ufm_block1" is not compatible with the current device family. This is the only library component displayed that remotely resembles the internal oscillator that several pieces of documentation say is available to designs. What gives?

Similarly, I have tried instantiating two PLLs and get a message that there is only one available. According to the docs, there are supposed to be four in the MAX10m50SCE144 part. Is this because the free software doesn't support the chip's full functionality?

Thanks,
Scott

↧

FPGA2HPS SDRAM Bridge Appears to Be Working but data is Zeros

March 2, 2018, 10:41 am

I'm using Quartus 17.1. I have tried this on numerous evaluation boards, but will describe it for the Macnica Sodia.

This same code scheme worked with the Helio Evaluation board which used an older GSRD as the starting point a few years back.

I start with the Golden Design Quartus project AND the SD Card image.

I add a custom IP in which the firmware is responsible for writing data to SDRAM (in SignalTap we can see that the data is being written to the specified location).

I create a new *.rbf and a new *.dtb to link the custom Linux device driver properly.
I also create a new preloader and u-boot, but it makes no difference whether I use the original GOLDEN u-boot and preloader or the ones created for this specific project.

The custom Linux device driver for this IP kmallocs two buffers and uses this address to set up the firmware IP. It then creates virtual addresses for user space for the "copy_to_user" in the read function.

Linux indicates -- ALL THE BRIDGES ARE ENABLED and I actually have the FPGA configured from the EPCQ and can see from the LEDs that it is configured BEFORE u-boot starts. SIGNAL TAP indicates that the data is being moved by the firmware.

BUT - when the device driver does the "copy_to_user" from the virtualAddress the user space code buffer is ALL ZEROS.

ANY ADVICE WOULD BE GREATLY APPRECIATED!

↧

Dev board available for cyclone 10 10CX105 and 10CL120?

March 2, 2018, 11:55 am

Hi, are there development boards available for the "larger" Cyclone 10 devices such as the 10CX105 and 10CL120? Basically, we are looking to test our design on a Cyclone 10 product that has more than 100K logic cells. Any advice or information is appreciated.

Thanks.

↧

how to read/write to dual port ram

March 2, 2018, 1:49 pm

I added a dual-port RAM in on-chip memory, named "dpram", to Qsys, but I don't know how to read from or write to it in Eclipse.
I am using a Cyclone 10 LP and Quartus Prime Lite Edition 17.1.

I tried this simple code:
IOWR(DPRAM_BASE, 0, 1);
int Byte = IORD(DPRAM_BASE, 0); // should read back the 1

The error message says: Symbol "DPRAM_BASE" could not be resolved.

Is there any literature describing how to write C code that accesses dual-port RAMs?

Attached Images

dpram.jpg (20.8 KB)

↧

Best way to dynamically select static length part of a bus

March 2, 2018, 2:47 pm

Hello,

I am trying to figure out how to select a w-bit part of a bus (w is a parameter, shorter than the bus) based on an incoming variable x, with the smallest resource footprint possible, in one clock tick. I have tried two methods so far.
The first uses a shifter:

...
parameter bw=96
parameter w=8
...
input wire [$clog2(bw):0] x,
input wire [bw-1:0] inputWire,
output reg [w-1:0] out
...
always@(posedge clk) out<=inputWire>>x;

The second uses a for loop and multiplexers:

integer i;
always@(posedge clk)begin
    for (i =0; i<bw-w;i=i+1) begin:myforloop
        if(x==i) out<=inputWire[i+:w];
    end
end

I was quite surprised that the second method used far fewer resources (about half) than the first. Is there any reason for this? The only one I can come up with is that the second one covers a smaller range, but that still does not explain why the difference is so large. Also, a single instance of the shifter used around 110 ALMs, which is much more than I anticipated. Simple adders take up about 4; even if the input is as wide as 96 bits it shouldn't take up so much space, should it?
Is there a better/more compact method to dynamically select part of a bus?
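
A quick Python model of the two versions may help when comparing them. It is a behavioural sketch only (nothing here is synthesized), and the function names are made up. One assumption worth stating: with bw=96, x is declared as [$clog2(bw):0], i.e. 8 bits, so the shifter has to produce a result for every x from 0 to 255, while the loop only decodes x = 0..87 and otherwise leaves out unchanged.

```python
BW, W = 96, 8
MASK = (1 << W) - 1


def select_shift(bus, x):
    """Model of out <= inputWire >> x (the assignment keeps the low W bits)."""
    return (bus >> x) & MASK


def select_loop(bus, x, prev=0):
    """Model of the for-loop version; prev is the value out already holds."""
    out = prev
    for i in range(BW - W):  # same bound as the posted loop: i < bw-w
        if x == i:
            out = (bus >> i) & MASK  # inputWire[i +: w]
    return out


# 96-bit test pattern: byte k (bits 8k+7..8k) holds the value k+1.
bus = int.from_bytes(bytes(range(1, 13)), "little")

same_in_range = all(
    select_shift(bus, x) == select_loop(bus, x) for x in range(BW - W)
)
print(same_in_range)          # True
print(select_shift(bus, 88))  # 12
print(select_loop(bus, 88))   # 0
```

The two agree for every offset the loop covers and differ at x = 88, where the loop holds the previous value. That narrower range may account for part of the resource difference, though this model says nothing about how the fitter actually maps either version.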

↧

task vs always

March 3, 2018, 1:41 pm

Hello, I know that an always block executes whenever a signal in its sensitivity list changes.
In what cases does a task execute?
If I have a task block and an always block one after the other, which one executes first?

Thanks

↧

ALTLVDS_RX frame alignment

March 3, 2018, 2:36 pm

Hello,

I am using the ALTLVDS_RX IP Core to deserialize data from TI's ADS5263 EVM (16 bit, quad channel, 100MSps ADC). I am using a DE2-115 board (Cyclone IV E). The ADC provides a DDR bit clock (8*fs) and a frame clock (=fs). When the data is aligned, the frame should read 0xFF00. The problem I am having is that the frame does not stay aligned. In order to read valid data I have to pulse the rx_data_align signal 16 times, essentially running through all of the possible frame alignments, every time. This causes incoming samples to be missed while the frame is not aligned. Does anyone have any ideas as to why this may be?

Some more details: I currently have one data channel and the frame clock as inputs to the ALTLVDS block. I am using fs = 20MHz and 16x deserialization. I saw in the IP Core User Guide that Cyclone devices only support up to 10x deserialization, but Quartus does not complain when I enter 16 as the SERDES factor in the Megafunction Wizard.
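
The alignment search described above can be modelled in a few lines of Python. This is a sketch under assumptions: each rx_data_align pulse is taken to rotate the 16-bit word by one bit position, the rotation direction is chosen arbitrarily, and the function names are made up.

```python
FRAME = 0xFF00  # expected frame pattern when aligned
WIDTH = 16


def rotate_left(word, n):
    """Rotate a WIDTH-bit word left by n positions."""
    n %= WIDTH
    return ((word << n) | (word >> (WIDTH - n))) & 0xFFFF


def slips_needed(observed):
    """Return how many one-bit slips align the frame, or None if none do."""
    for slips in range(WIDTH):
        if rotate_left(observed, slips) == FRAME:
            return slips
    return None


print(slips_needed(0xFF00))  # 0
print(slips_needed(0x0FF0))  # 4
print(slips_needed(0x1234))  # None
```

The model only shows the search itself; it cannot say why the alignment fails to hold on the hardware.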

The rx_data_align pulse is controlled by a state machine. Below is a picture of my Verilog code for the state machine, as well as what I am seeing in Signal Tap.

Any suggestions would be much appreciated.

Thanks,
Hannah

↧

Quartus 2 web edition 15.0

March 4, 2018, 5:03 am

How do I find the area and delay of a design in Quartus II Web Edition?

↧

Bit memorisation in if statement

March 4, 2018, 8:41 am

Hi everyone,

First, I'm sorry: I'm French, so my English may be bad.

My project is to control several servo motors with the DE0-Nano board over UART.
To do that, I send serial data from a terminal, such as "100,120,80,200". There are 4 servos, so 4 commands separated by commas. The comma is recognized as a separator and used to assign the right value (pulse width) to the right servo by incrementing a counter.

For this separator, I've added a "block state" so that the value of the "," character is not assigned to a servo and the logic waits for the next value in my RX buffer.
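
The intended behaviour can be written as a small Python model. It is only a sketch, and it assumes each position arrives as one raw byte (which is how the VHDL below treats rx_buffer) rather than as ASCII digits; the function name is made up, while compteur and the +64 offset follow the VHDL.

```python
COMMA = 0x2C  # "00101100", the comma character


def assign_servos(rx_bytes, n_servos=4):
    """Assign each received byte to a servo; a comma selects the next servo."""
    positions = [0] * n_servos
    compteur = 0
    for byte in rx_bytes:
        if byte == COMMA:
            # The separator only advances the counter; it is never stored.
            compteur = (compteur + 1) % n_servos
        else:
            positions[compteur] = byte
    # Pulse width = position + 64, as in the pwmiS* assignments.
    return [p + 64 for p in positions]


pwm = assign_servos(bytes([100, COMMA, 120, COMMA, 80, COMMA, 200]))
print(pwm)  # [164, 184, 144, 264]
```

Note that the model needs no separate blocking flag, because it handles exactly one byte per loop iteration; the hardware instead sees the same rx_buffer value on many consecutive clock edges.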

My code explains it better than I can:

Code:

sel_servo: process (reset_n, rx_buffer(d_width DOWNTO 1), clk, compteur)
begin
    if (reset_n = '0' OR compteur > 3) then
        compteur <= 0;
    elsif (clk'EVENT AND clk = '1') then
        if((rx_buffer(d_width DOWNTO 1) = "00101100") and block_state = '1') then
            compteur <= compteur + 1;
            block_state <= '0';
        else
            compteur <= compteur;
            block_state <= block_state;
            blockstate <= block_state;    -- just watch the bit on LEDs
        end if;

        case compteur is
            when 0 =>
                if block_state = '0' and (rx_buffer(d_width DOWNTO 1) /= "00101100") then    -- "00101100" is the binary value for comma character
                    block_state <= '1';
                    posiS0 <= rx_buffer(d_width DOWNTO 1);
                    pwmiS0 <= unsigned('0' & posiS0 ) + 64;
                end if;
            when 1 =>
                if block_state = '0' and (rx_buffer(d_width DOWNTO 1) /= "00101100") then    -- the "compteur" value represents the servo
                    block_state <= '1';
                    posiS1 <= rx_buffer(d_width DOWNTO 1);
                    pwmiS1 <= unsigned('0' & posiS1 ) + 64;
                end if;
            when 2 =>
                if block_state = '0' and (rx_buffer(d_width DOWNTO 1) /= "00101100") then
                    block_state <= '1';
                    posiS2 <= rx_buffer(d_width DOWNTO 1);
                    pwmiS2 <= unsigned('0' & posiS2 ) + 64;
                end if;
            when 3 =>
                if block_state = '0' and (rx_buffer(d_width DOWNTO 1) /= "00101100") then
                    block_state <= '1';
                    posiS3 <= rx_buffer(d_width DOWNTO 1);
                    pwmiS3 <= unsigned('0' & posiS3 ) + 64;
                end if;
            when 4 =>
                compteur <= 0;
        end case;
    end if;
end process;

This code compiles successfully.
My problem is that the "block_state" bit seems to fall when I send a comma, but rises again immediately afterwards. It does not hold its state. I have tried many different approaches, but I have not found a solution.

I hope my description of the problem is clear.

Thank you!

↧

Simple Avalon-MM to onchip memory

March 4, 2018, 2:58 pm

I have been struggling to get a simple Avalon-MM master to read from an on-chip memory slave. All the examples I've seen connect a Nios to the on-chip memory, which I don't want. I am putting my process logic directly in the master component.

I've made sure to configure the same latencies and have matched the Avalon waveform signals between master and slave. I suspect my problem is the memory initialization is not occurring, especially due to the fact that I have enabled the in-system memory content editor feature with an instance id yet whenever I attempt to use the in-system memory content editor and select my device, it says "No instances found". Likewise when I try to use "Update Memory Initialization File", it complains that it "Found no valid Memory Initialization File to process." The full file path to a .hex file is definitely set in the onchip memory component. I created and am managing my .hex file directly in Quartus and have also tried a .mif file with the same results. I've even disabled default initialization and enabled "Use checkered pattern as uninitialized RAM content" with a 0101 pattern.

Code:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity new_component is
    port (
        avm_m0_address    : out std_logic_vector(7 downto 0);                     -- avm_m0.address
        avm_m0_readdata   : in  std_logic_vector(31 downto 0) := (others => '0'); --       .readdata
        avm_m0_byteenable : out std_logic_vector(3 downto 0);                     --       .byteenable
        avm_m0_write      : out std_logic;                                        --       .write
        avm_m0_chipselect : out std_logic;                                        --       .chipselect
        avm_m0_writedata  : out std_logic_vector(31 downto 0);                    --       .writedata
        clock_clk         : in  std_logic                     := '0';             --  clock.clk
        reset_reset       : in  std_logic                     := '0';             --  reset.reset
        read_out          : out std_logic_vector(31 downto 0)
    );
end entity new_component;

architecture rtl of new_component is
    signal chipselect : std_logic := '1';
begin
    avm_m0_address <= "00000000";
    avm_m0_byteenable <= "0001";
    avm_m0_write <= '0';

    process(clock_clk, chipselect, avm_m0_readdata)
    begin
        if rising_edge(clock_clk) then
            if chipselect = '0' then
                read_out <= avm_m0_readdata;
            end if;
            avm_m0_chipselect <= not chipselect; -- after 20ns;
            chipselect <= not chipselect; -- after 20.5ns;
        end if;
    end process;
end architecture rtl; -- of new_component

I have clock_clk, reset_reset, and read_out(7 downto 0) connected to my 50 MHz clock, reset, and 8 LEDs in my top-level file. All the LEDs light up every time, which means they are all staying at '0', as opposed to the all-FF value (which should set the LEDs to '1') that I have defined in my .hex file (or the 0101 pattern when the on-chip memory is uninitialized).

↧

can FPGA act as database?

March 4, 2018, 8:06 pm

Hi,
I am doing an image processing project using the DE1-SoC board. My question is how I can use the FPGA as a database. (I want to measure image data from a camera, compare it to predefined stored values, and then display the corresponding result.) Is this possible on this board? If so, please help me and point me to some materials.

Thanks in advance.

↧

Quartus II vs Quartus Prime TimeQuest Timing Analyzer

March 4, 2018, 9:12 pm

Hi all,

I used Quartus II from v8.0 through v15.0 with no problems in timing analysis. I installed Quartus Prime a moment ago, and after compiling a previously built project in it, I found that in the TimeQuest Timing Analyzer section, under the slow model, the section named Datasheet Report no longer exists. Has this report been removed or relocated somewhere else? Can anyone guide me on this? My main intention is to view the tco timing of my design. Previously, in Quartus II, tco was located under the Datasheet Report.

I have attached screenshots of both Quartus II v15.0 and Quartus Prime 17.1 for reference. Thanks!

Attached Images

Capture.jpg (20.5 KB)

↧