Neural Network simulator in FPGA? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 2 years ago.


To learn FPGA programming, I plan to code up a simple Neural Network in FPGA (since it's massively parallel; it's one of the few things where an FPGA implementation might have a chance of being faster than a CPU implementation).

Though I'm familiar with C programming (10+ years), I'm not so sure about FPGA development. Can you provide a guided list of what I should do / learn / buy?

Thanks!


Necroposting, but for others like me who come across this question: there is an in-depth, though old, treatment of implementing neural networks using FPGAs.

It's been three years since I posted this, but it is still being viewed, so I thought I'd add two more papers from last year that I recently found.

The first talks about FPGA Acceleration of Convolutional Neural Networks. Nallatech performed the work. It's more marketing than an academic paper, but still an interesting read, and might be a jumping-off point for someone interested in experimenting. I am not connected to Nallatech in any way.

The second paper came out of the University of Birmingham, UK, written by Yufeng Hao. It presents A General Neural Network Hardware Architecture on FPGA.


Most attempts at building a 'literal' neural network on an FPGA hit the routing limits very quickly; you might get a few hundred cells before place & route takes longer to finish than your problem is worth waiting for. Most of the research into neural networks on FPGAs takes this approach, concentrating on a minimal 'node' implementation and suggesting that scaling up is then trivial.

The way to make a reasonably sized neural network actually work is to use the FPGA to build a dedicated neural-network number-crunching machine. Get your initial node values into one memory chip, have a second memory chip for your next-timestep results, and a third area to store your connection weights. Pump the node values and connection data through, using techniques to keep the memory buses saturated (order node loads by CAS line, read ahead using pipelines). It will take a large number of passes over the previous dataset: pair off weights with previous values, run them through DSP MAC units to evaluate the new node values, then push them out to the result memory area once all connections have been evaluated. Once you have a whole timestep finished, reverse the direction of flow so the next timestep writes back to the original storage area.
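To make that concrete, here is a minimal Verilog sketch of the per-node multiply-accumulate stage described above. The data widths, port names, and the surrounding memory controller that streams in (previous value, weight) pairs are my assumptions for illustration, not part of the answer; one of these would sit on each DSP slice, and the activation function would be applied to the accumulator once the last connection of a node has been summed.

    // Multiply-accumulate stage for one node (sketch; widths are assumptions).
    // The memory controller asserts `clear` at the start of each node, then
    // streams in (prev_value, weight) pairs with `valid` held high.
    module nn_mac #(
        parameter DATA_W = 16,   // fixed-point node values and weights
        parameter ACC_W  = 40    // wide accumulator to avoid overflow
    ) (
        input  wire                     clk,
        input  wire                     clear,
        input  wire                     valid,
        input  wire signed [DATA_W-1:0] prev_value,
        input  wire signed [DATA_W-1:0] weight,
        output reg  signed [ACC_W-1:0]  acc
    );
        wire signed [2*DATA_W-1:0] product = prev_value * weight;

        always @(posedge clk) begin
            if (clear)
                acc <= {ACC_W{1'b0}};       // start a fresh node
            else if (valid)
                acc <= acc + product;       // accumulate one connection
        end
    endmodule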


I want to point out a potential issue with implementing a neural network in an FPGA. FPGAs have a limited amount of routing resources. Unlike logic resources (flops, look-up tables, memories), routing resources are difficult to quantify. Maybe a simple neural network will work, but a "massively parallel" one with mesh interconnects might not.

I'd suggest starting with a simple core from OpenCores.org just to get familiar with the FPGA flow, and then move on to prototyping a neural network. Downloading the free Xilinx ISE WebPack, which includes the ISIM simulator, is a good start. Later on you can purchase a cheap dev board with a small FPGA (e.g. a Xilinx Spartan-3) to run your designs on.
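Before spending money on a board, it's worth getting comfortable with simulation. Below is a minimal sketch of the kind of testbench you would run in ISIM; simple_core is a hypothetical placeholder for whichever OpenCores module you pull down, and its ports are assumptions for illustration only.

    // Hypothetical testbench sketch: clock and reset generation plus a DUT
    // instance, enough to watch waveforms in ISIM.
    `timescale 1ns / 1ps

    module simple_core_tb;
        reg        clk   = 1'b0;
        reg        rst_n = 1'b0;
        wire [7:0] data_out;

        // 50 MHz clock (20 ns period)
        always #10 clk = ~clk;

        // Device under test (name and ports are placeholders)
        simple_core dut (
            .clk      (clk),
            .rst_n    (rst_n),
            .data_out (data_out)
        );

        initial begin
            #100  rst_n = 1'b1;   // release reset after a few cycles
            #10000;               // let the design run for a while
            $display("data_out = %h", data_out);
            $finish;
        end
    endmodule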


A neural network may not be the best starting point for learning how to program an FPGA. I would initially try something simpler, like a counter driving LEDs or a numeric display (a minimal sketch follows the links below), and build up from there. Sites that may be of use include:

  • http://www.fpga4fun.com/ - Excellent examples of simple projects and some boards.
  • http://opencores.org/ - Very useful reference code for many interfaces, etc...
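For what it's worth, here is a minimal Verilog sketch of the counter-driving-LEDs starter project mentioned above. The clock frequency, counter width and LED count are assumptions; the upper counter bits drive the LEDs so the blinking is visible to the eye from a ~50 MHz board clock.

    // Free-running counter driving 8 LEDs (sketch; widths and clock are assumptions).
    module led_counter (
        input  wire       clk,    // board oscillator, e.g. 50 MHz
        input  wire       rst_n,  // active-low reset, often a push button
        output wire [7:0] leds
    );
        reg [31:0] count;

        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)
                count <= 32'd0;
            else
                count <= count + 32'd1;
        end

        // Top bits toggle slowly enough to see (bit 24 flashes at ~1.5 Hz from 50 MHz).
        assign leds = count[31:24];
    endmodule

Pin locations for clk, rst_n and leds go in the vendor's constraints file for your particular board, and the design is small enough to simulate first.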

You may also like to consider using a soft processor in the FPGA to help your transition from C to VHDL or Verilog. That would allow you to move small code modules from one to the other to see the differences in hardware. The choice of language is somewhat arbitrary - I code in VHDL (syntactically similar to Ada) most of the time, but some of my colleagues prefer Verilog (syntactically similar to C). We debate it once in a while, but really it's personal choice.

As for the buyers / learners guide, you need:

  1. Patience :) - The design cycle for FPGAs is significantly longer than for software due to the number of extra 'free parameters' in the build, so don't be surprised if it takes a while to get designs working exactly the way you want.

  2. A development board - For learning, I would buy one from one of the three bigger FPGA vendors: Xilinx, Altera or Lattice. My preference is Xilinx at the moment, but all three are good. Don't buy a board based on the higher-end parts - you don't need them when you're starting out with FPGAs. For Xilinx, get one based on the Spartan series, such as the SP601 (I have one myself). For Altera, buy a Cyclone-based board. The development boards will be significantly cheaper than those for the higher-end parts.

  3. A programming cable - Most companies produce a USB programming cable with a special connector to program the devices on the board (often using JTAG). Some boards have the programming interface built in (such as the SP601 from Xilinx) so you don't need to spend extra money on it.

  4. Build tools - There are many varieties of these but most of the big FPGA vendors provide a solution of their own. Bear in mind that the tools are only free for the smaller lower-performance FPGAs, for example the Xilinx ISE Webpack.

    The flow comprises stages that you may not be familiar with, coming from the software world. The specifics of the tool flow are always changing, but any tool you use should be able to get from your code to your specific device. The last part of this design flow is normally provided by the FPGA vendor because it's hardware-specific and proprietary. To give you a brief example, the software should take your VHDL or Verilog code and (this is the Xilinx terminology):

    • 'Synthesise' it into constructs that match the building blocks available inside your particular FPGA.
    • 'Translate & map' the design into the part.
    • 'Place & route' the logic in the specific device so it meets your timing requirements (e.g. the clock speed you want the design to run at).


Regardless of what Charles Stewart says, Verilog is a fine place to start. It reminds me of C, just as VHDL reminds me of Ada. No one uses Occam in industry, and it isn't common in universities.

For a Verilog book, I recommend Verilog HDL in particular. Verilog does parallel work trivially, unlike C.
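To illustrate that point, here is a minimal sketch (signal names are made up for illustration): each always block below is a separate piece of hardware, and they all operate in the same clock cycle with no threads or scheduling as you would need in C.

    // Two always blocks = two concurrent pieces of logic, not two statements
    // executed in sequence.
    module parallel_demo (
        input  wire        clk,
        input  wire [7:0]  a,
        input  wire [7:0]  b,
        output reg  [8:0]  sum,
        output reg  [15:0] product
    );
        always @(posedge clk)
            sum <= a + b;          // adder, updates every clock

        always @(posedge clk)
            product <= a * b;      // multiplier, updates every clock, in parallel
    endmodule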

To buy, get a relatively cheap Cyclone III eval board from Altera (e.g. a Cyclone III board with NIOS for $449, or a simpler one for $199) or an equivalent from Xilinx.


I'll give you yet a third recommendation: use VHDL. Yes, on the surface it looks like Ada, while Verilog bears a passing resemblance to C. However, with Verilog you only get the types that come with the language out of the box, whereas with VHDL you can define your own types, which lets you program at a higher level (still RTL, of course). I'm pretty sure the free Xilinx and Altera tools support both VHDL and Verilog. "The Designer's Guide to VHDL" by Ashenden is a good VHDL book.

VHDL has a standard fixed-point math package which can make NN implementation easier.
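For contrast, this is roughly the bookkeeping you end up doing by hand without such a package. The sketch below assumes a Q4.12 format (4 integer bits, 12 fractional bits) chosen purely for illustration, with truncation and no saturation.

    // Hand-rolled Q4.12 fixed-point multiply (sketch; format is an assumption).
    module q4_12_mult (
        input  wire signed [15:0] a,   // Q4.12
        input  wire signed [15:0] b,   // Q4.12
        output wire signed [15:0] y    // Q4.12, truncated, no saturation
    );
        wire signed [31:0] full = a * b;   // Q8.24 intermediate result
        assign y = full[27:12];            // realign: keep 4 integer + 12 fraction bits
    endmodule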


It's old, because I haven't thought much about FPGAs in nearly 20 years, and it uses a concurrent programming language that is rather obscure, but Page & Luk, 1991, Compiling Occam into FPGAs covers some crucial topics in a nice way, enough, I think, for your purposes. Two links for trying stuff out:

  1. KRoC is an actively maintained, linux-based Occam compiler, which I know has an active user base.
  2. Roger Peel has a logic synthesis page that has some documentation of his linux-based workflow from Occam code synthesis through to FPGA I/O.

Occam->FPGA isn't where the action is, but it may be a much better place to start than, say, Verilog.


I would recommend looking into Xilinx High-Level Synthesis (HLS), especially if you are coming from a C background. It abstracts away the technical details of using an HDL so the designer can focus on the algorithmic implementation.

There are restrictions on the type of C code you can write. For example, you can't use dynamically sized data structures, as that would imply dynamically sized hardware.

