Hello all,
I am trying to create a multicore system that I will use for JPEG encoding. I am using a DE2-115 board along with Quartus 12.1, Qsys and associated build tools for Eclipse.
I have some questions about this and would appreciate advice and help with setting it up. I am mostly a newbie when it comes to hardware architecture and the Altera tools.
So let me explain what I have so far, and my requirements.
I have 4 CPUs right now (I intend to add as many as possible), each with its own on-chip memory (4 KB right now) and with data and instruction caches (4 KB / 64 KB). I want as many CPUs as possible working on compressing the BMP image to JPEG, so more CPUs are better.
I have a UART connected to CPU 0. I need this to read the BMP file from the computer and to write the JPEG file back.
I have a timer, connected to cpu 0, to time how long it takes to compress the image.
Finally, I have an SDRAM controller connected to all CPUs, and a mutex for the SDRAM.
Now, the basic idea is to read a bmp file from the computer, compress it using the hardware, then write it back to the pc. I should also time how long it takes to do the compression. I have successfully managed to do this for a single cpu. But now it's time to take it to the realm of several cpus and make it as fast as possible.
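To make the multi-CPU idea concrete, here is a rough sketch of how I imagine dividing the image between the CPUs. It assumes the work is split by rows of 8-pixel blocks; `block_range` and `rows_for_cpu` are names I made up for illustration and are not part of any Altera API. It only covers dividing the work, not how the per-CPU outputs would be merged into one JPEG stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the block rows assigned to one CPU. */
typedef struct {
    size_t first_row;  /* first 8-pixel block row (inclusive) */
    size_t num_rows;   /* number of block rows for this CPU */
} block_range;

/* Split the image's block rows as evenly as possible across num_cpus.
 * The first (total % num_cpus) CPUs each get one extra row. */
static block_range rows_for_cpu(size_t image_height, size_t cpu_id, size_t num_cpus)
{
    assert(num_cpus > 0 && cpu_id < num_cpus);

    size_t total = (image_height + 7) / 8;  /* round up to whole block rows */
    size_t base  = total / num_cpus;
    size_t extra = total % num_cpus;

    block_range r;
    r.num_rows  = base + (cpu_id < extra ? 1 : 0);
    r.first_row = cpu_id * base + (cpu_id < extra ? cpu_id : extra);
    return r;
}
```

Each CPU would call this with its own ID and then only touch its own rows, so the SDRAM mutex would mainly be needed around shared bookkeeping rather than the pixel data itself.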
So, some questions here...
The program including the HAL is pretty big. I'd wager around 60 KB, so it's impossible to store the instructions in on-chip memory. I need some external memory such as SDRAM, SRAM or flash. But here is where I'm not sure.
As far as I understand, I can't put all instructions in a single area (since I intend for all CPUs to run the same code) and have all CPUs read from there. The Nios build tools simply won't allow it. Another concern is, of course, access conflicts. I don't think I can have 4+ CPUs reading from the memory at the same time. So what is the ideal option here? What should I use? SDRAM, SRAM, flash? How do I optimize it so that all CPUs can access the data quickly? Should I use an instruction cache? Obviously, the less on-chip memory I use for caches and the like, the more CPUs I can have, which is nice.
The second question is how to tell the build tools where to place the different sections. I want the heap in SDRAM, the .text section wherever the instructions end up, and maybe some on-chip memory and data cache to hold data locally while it is being processed, before writing it back to SDRAM.
As for the software, I'm guessing that all memory is simply memory mapped, so using pointers to the appropriate locations is a good way to transfer data between SDRAM and on-chip memory (or cache)?
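To show what I mean by using pointers, here is a minimal sketch of copying a block from one memory-mapped region to another. `copy_block` is a name I made up. The base-address macros in the comment are assumptions too, since the real names in system.h depend on what the components are called in Qsys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes between two memory-mapped regions through plain pointers.
 * volatile keeps the compiler from eliding or merging the accesses. */
static void copy_block(volatile uint8_t *dst, const volatile uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* On the target I would expect to call it roughly like this, where
 * SDRAM_BASE and ONCHIP_MEM_BASE are hypothetical macro names:
 *
 *   copy_block((volatile uint8_t *)ONCHIP_MEM_BASE,
 *              (const volatile uint8_t *)SDRAM_BASE, 64);
 */
```

With a data cache in the path, I assume buffers shared between CPUs would also need to be flushed or accessed uncached so that the other CPUs actually see the writes.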
Any insight would be welcome.
Thanks!