What is an FPGA?
Do you know what an FPGA is? Our FPGA Team Leader discusses what an FPGA is, how FPGAs work, what their different components are and how we use them at Telesoft.
Hi, I’m Richard from Telesoft, and one of the questions I get asked most often is: what is an FPGA? So let’s discuss.
So FPGA stands for Field Programmable Gate Array: that is, an array of programmable logic gates that you can program in the field. Let’s take a look at what’s inside them, because that might help.
So this is a chip planner view of an Intel FPGA; the things we’re discussing today are representative of most modern FPGAs. First, if we start in the corners of this chip, you can see these transceiver banks. Transceivers are transmitters and receivers, and in modern FPGAs you’re looking at tens of gigabits per second per transceiver. Right behind the transceivers you’ve got the hard IP. I can tell you, just from knowing that they’re right next to the transceivers, that these are going to be PCIe or Ethernet MACs, so very high-speed serial interface protocols.
Other examples of hard IP you commonly get are external memory interfaces, so DDR3 and DDR4 controllers, and even quad-core ARM processors are pretty commonplace in modern FPGAs. Other things that you get inside FPGAs are registers. These are really common, you get millions of them, and all they do is latch from their inputs to their outputs on a clock edge. We’ll discuss why these are important a bit later on. You also get I/O lane circuitry, which controls all the physical pins on the device: it allows you to control single-ended or differential signalling, different voltage levels, open-drain inputs/outputs and so on. This controls all of that.
We’ve also got RAM. This is regular memory, Random Access Memory, and it comes in blocks: you can join the blocks side by side to hold wide data elements, stack them deep to hold lots of data elements, or stack them wide and deep to hold lots of large data elements. They’re very flexible, very programmable, and they can cover most things that you’d like to do inside a chip. Next, DSPs. If I zoom in a little bit on this slide you can see these coloured banks: some of these are block RAMs and some of these are DSPs.
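The wide, deep, and wide-and-deep combinations can be sketched as simple arithmetic. A minimal sketch in Python, assuming a hypothetical block RAM primitive of 1024 words by 36 bits (real block sizes vary by device family):

```python
# Hypothetical block RAM primitive: 1024 words of 36 bits each.
BLOCK_DEPTH, BLOCK_WIDTH = 1024, 36

def blocks_needed(depth, width):
    """How many block RAMs it takes to build a memory of the requested shape."""
    deep = -(-depth // BLOCK_DEPTH)   # blocks stacked for extra depth (ceiling division)
    wide = -(-width // BLOCK_WIDTH)   # blocks placed side by side for extra width
    return deep * wide

print(blocks_needed(1024, 72))   # wider data: two blocks side by side
print(blocks_needed(4096, 36))   # deeper memory: four blocks stacked
print(blocks_needed(4096, 72))   # wide and deep: eight blocks
```

The block size here is purely illustrative; the point is that the compiler composes small fixed-size blocks into whatever shape the design asks for.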
DSP stands for Digital Signal Processing, so these are little blocks of logic that are very good at doing maths very quickly: addition, subtraction, multiplication, multiply-accumulate. The DSP slices are used just for that; they’re very fast at a very specific task.
The final thing we have are LUTs, or Look-Up Tables, and we’ll find out what they are on this slide. If you haven’t done digital logic design before, this is a simple AND gate. It’s called an AND gate because its output goes high when both its inputs, A and B, are high. On the left here you can see a truth table, which shows that when both inputs are high the output goes high. This truth table can be programmed into a look-up table, and this is the basis on which look-up tables implement programmable logic. If we go to a slightly more complicated example, there are a few more logic gates here, but if you want to follow through on the slide, please feel free to work it out: the truth table there should correspond to the logic that’s shown. This one has four inputs. Most FPGAs that we use at Telesoft have six-input look-up tables; FPGAs have anything from four-input LUTs to eight-input LUTs, but six is a pretty standard, middle-of-the-road figure that most chips have, and they’re all programmable. On the Stratix 10 FPGA, for example, there are a million LUTs, which gives you a little idea of the scale of a modern high-end device. You can get more, you can get less; it’s very device-dependent.
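The AND-gate example above can be modelled in software: a LUT is just a tiny memory indexed by its input bits. A minimal Python sketch (purely illustrative — in a real FPGA the table contents come from the programming bitstream):

```python
# A k-input LUT is a 2^k-entry truth table indexed by the input bits.

def make_lut(truth_table):
    """Return a function that looks its inputs up in the programmed table."""
    def lut(*inputs):
        index = 0
        for bit in inputs:          # pack the input bits into an index, MSB first
            index = (index << 1) | bit
        return truth_table[index]
    return lut

# Programme a 2-input LUT as an AND gate: output is 1 only for inputs (1, 1).
and_gate = make_lut([0, 0, 0, 1])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", and_gate(a, b))
```

Reprogramming the same LUT as OR, XOR or any other two-input function is just a matter of loading a different four-entry table, which is exactly what makes the logic “field programmable”.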
Next we can talk about how a compiler works, FPGA compilation in particular. If you’re from a software background, FPGAs can take a surprisingly long time to build, so we’re going to talk about why the process takes the time that it does. The first thing the compiler does is much the same as any other software compiler: a syntax check and elaboration. It’s checking that the code you’ve written is syntactically correct and that all the functions you’ve called are declared somewhere and are also correct. The next thing it does is what’s called synthesis. This takes the syntax-checked code and turns it into the elements inside the FPGA that we’ve discussed, so this is a netlist view: it’s taken all the code and turned it into your logic gates, registers, DSP slices and block RAMs, and this is what you can see on the slide behind me. It looks messy because it is; it’s very, very complicated. The image you’re seeing here is a very small section of a large design, so this does take a little bit of time for the tool to work out.
The next stage is fitting: the tool takes your single synthesised netlist and has to fit it to the device. Fitting comes in two stages: place and route. Granny with a shotgun here is representing the place stage. The first thing the compiler does is create an initialisation, or starting point, for all of your logic on the target device, and it does this with a semi-intelligent shotgun blast: it takes all of your logic and just chucks it onto the device. The initial placement is just an initialisation; the tool then iterates, moving your logic around your LUTs, DSP slices and block RAMs, trying to find a globally optimal placement. The second stage of fitting is routing, which is when the tool has to connect all of the placed elements to the elements they’re connected to. Again, routing is an iterative process that goes through many, many steps trying to find the optimum routing for your entire design; it’s no good having one part that’s really fast if it sacrifices all the paths around it. So the router does take a fair amount of time; even on very high-end processors you’re looking at several hours, or maybe even tens of hours, for a really large design on a modern FPGA.
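The shotgun-then-iterate idea can be caricatured in a few lines of Python. This is a deliberately simplified sketch with a made-up four-element netlist — real placers use far more sophisticated cost functions and annealing schedules — but it shows the shape of the process: random initial scatter, then repeated moves that are kept only when they shorten the total wiring.

```python
import random

random.seed(1)  # deterministic for the example

# Toy netlist: pairs of connected logic elements (hypothetical design).
nets = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
num_elements = 4

def wirelength(placement):
    """Total Manhattan distance over all nets - the cost being minimised."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1])
               for a, b in nets)

# 1. "Shotgun blast" initialisation: scatter the logic across an 8x8 grid.
placement = {e: (random.randint(0, 7), random.randint(0, 7))
             for e in range(num_elements)}
start_cost = wirelength(placement)

# 2. Iterate: try random moves and keep only those that shorten the wiring.
for _ in range(2000):
    e = random.randrange(num_elements)
    old, before = placement[e], wirelength(placement)
    placement[e] = (random.randint(0, 7), random.randint(0, 7))
    if wirelength(placement) >= before:   # not an improvement: undo the move
        placement[e] = old

print(f"wirelength: {start_cost} -> {wirelength(placement)}")
```

A real fitter does this over millions of elements with timing and congestion in the cost function, which is a big part of why builds take hours.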
The final stage of the build process is a lot quicker: the assembler and timing analysis. The assembler takes your placed and routed design and turns it into a programmable bitstream for your target device, and timing analysis is how the tool checks that your design will pass timing.
Now, timing closure, passing timing, timing clean: these are phrases you will start to hear as you work with FPGAs, so we’re going to talk about what that actually means. Timing closure is the tool measuring the paths from every register to every other register, or from the I/O pins on your device. There are a lot of paths for it to measure, but it actually does this quite fast, and it has to measure the total delay. The delay is made up of two things: logic delay and routing delay. Logic delay is the amount of time it takes your signals to get through the look-up tables or RAMs, and routing delay is the amount of time it takes those signals to pass through the routing the compiler has chosen to use. You’re aiming for your logic delay plus your routing delay to be less than your clock period. So if we take the example of an FPGA running at 333 MHz, you have a clock period of three nanoseconds, so you need to get from one register to the next in less than three nanoseconds. This sounds quite fast, but it is very achievable in FPGAs.
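The 333 MHz arithmetic works out like this. A small Python sketch, just to show the sums (the path delays here are hypothetical numbers; real timing analysis is done by the vendor tools):

```python
# Clock period is the reciprocal of the clock frequency.
clock_mhz = 333
period_ns = 1000 / clock_mhz                 # ~3.0 ns per clock cycle

# A register-to-register path passes timing when its total delay
# (logic delay through LUTs/RAMs plus routing delay) fits in one period.
logic_delay_ns = 1.2     # hypothetical example path
routing_delay_ns = 1.5
slack_ns = period_ns - (logic_delay_ns + routing_delay_ns)

print(f"period = {period_ns:.2f} ns, slack = {slack_ns:.2f} ns")
print("timing met" if slack_ns >= 0 else "timing failed")
```

Positive slack on every path is what “timing clean” means; a single path with negative slack is enough to fail timing closure.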
So all of this sounds quite complicated; why do people bother using FPGAs, and what benefits do they bring us? Well, the first thing is they offer massively parallel processing, so let’s think about what that actually means. If you think of an FPGA running at 300 MHz and compare it to a processor running at 3 GHz, you’ll often wonder how an FPGA is any faster; the processor clock is ten times faster. The difference is the parallel processing. A CPU core operates one instruction at a time, so at 3 GHz it’s executing 3 billion instructions per second. An FPGA only runs at 300 MHz, which sounds ten times worse, but the FPGA does everything on every clock cycle. So if you implement an FPGA design with a thousand functions, and each of those functions would take a processor ten clock cycles, you’re looking at 10,000 clock cycles for the processor to do what the FPGA does every single clock cycle. Suddenly running ten times slower is insignificant, because overall you’ve got a thousand-fold improvement in performance. This massively parallel processing gives huge acceleration; it doesn’t matter if you’re accelerating maths functions or fingerprint hashes, anything a processor can do an FPGA will also do really quite quickly, particularly iterative processes. Things like hashing algorithms, compression and decompression, and encryption and decryption are really well suited to hardware implementations.
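The back-of-the-envelope numbers above can be checked directly. A Python sketch using the same hypothetical figures from the example (a thousand parallel functions, ten CPU cycles each):

```python
cpu_hz  = 3_000_000_000   # 3 GHz processor, one instruction per cycle
fpga_hz = 300_000_000     # 300 MHz FPGA fabric

functions = 1000          # functions implemented side by side on the FPGA
cycles_per_function = 10  # cycles each function costs the CPU

# The CPU has to run the functions one after another...
cpu_results_per_sec = cpu_hz / (functions * cycles_per_function)
# ...while the FPGA finishes all thousand of them on every clock cycle.
fpga_results_per_sec = fpga_hz

speedup = fpga_results_per_sec / cpu_results_per_sec
print(f"FPGA speedup: {speedup:.0f}x")
```

Despite the ten-times-slower clock, the parallelism wins by three orders of magnitude, which is the thousand-fold improvement quoted above.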
Now, counter-intuitively, this huge acceleration also gives you massive power efficiency. Consider it this way: if an FPGA is a thousand times faster than a processor, then running a thousand processor cores to match it is going to use a lot more power than one FPGA. So all of this massively parallel processing gives us acceleration, which also gives us huge power savings. Pretty cool.
So in summary what is an FPGA?
It’s transceivers, hard IP, registers, I/O circuitry, block RAMs, DSP slices and LUTs.
What does the compiler do?
It does syntax checks and elaboration, then synthesis, then the fitter, which is place and route, and finally the assembler and timing analysis.
What does timing closure mean?
Well, it’s the delay measured from one register to another, including the routing delay and the logic delay, and you want that total delay to be less than your clock period.
And finally why do people use FPGAs? What benefits do they bring to Telesoft?
It’s massively parallel processing that gives us huge acceleration and, in return, a pretty cool power saving too.
So that’s everything for today I hope you’ve enjoyed it and we’ll see you soon.