A Founding Father of Modern Computing: Part 1
A revised transcript from Part 1 of our interview with Steve Casselman
Colby: I want to touch on how you started, and what the industry looked like at the very beginning, because I hear stories from people about hand drawing designs.
Steve: That's where I started; they were still doing that. They were drawing stuff out by hand. And one guy was telling me they had to print out the huge ASIC design they were doing, a gate array, and he had to crawl around on the floor until he found the one via that was missing in the design. At that time there were very few automation tools for any of this stuff.
Steve: When I started, I was in the CAE lab entering schematics, and Daisy was pretty good. I was good at all that stuff. So one day one of the top guys comes in, looks in the lab, and goes, Casselman, you like weird stuff, come out here and look at this thing with me. And I go, sure. It was actually Monolithic Memories, which was Xilinx's second source so they could get a military contract, so it was a sales guy from Monolithic Memories. We went over it and I was looking at it and I go, this is awesome. If you hit a button for record or play, you could switch the whole FPGA out before anybody would know it. And he goes, yeah, that's probably right.
And then I was walking around, and that was in like '85 or '86, so I went back to work and thought about it. Basically the next day somebody came in and said, oh, they're gonna do a demo of a silicon compiler. I go, wow, those guys already know how to compile down into these FPGA things, that's great. But I went in and saw the demo, and it was basically a form where you would enter your parameters for little components, and then it would build a schematic from that. And I go, that's not a silicon compiler. I know what a silicon compiler is. So that's when I started writing my SBIRs, in '86.
Colby: Were those for the FPGA stuff?
Steve: Yeah, to take FPGAs and make a computer out of them. And at that time there were like 64 lookup tables per device, right? So I was gonna have a big old board with lots of stuff, which I eventually did make. And interestingly, that board, if you look at it, is almost exactly what we have today in FPGAs.
So there's an area where I was planning to put a processor, so it could compile programs and make its own bitstreams while running on the device. And then there was a big array of FPGAs and interconnect chips. I did a full-custom interconnect chip, any pin to any pin, that had two different configuration paths. So you could be using one while you load the next one, and then you could just flip between them. That's important in things like butterflies and FFTs and all that kind of stuff. Just thinking ahead.
And then on the board I also had these really big RAMs, SRAMs, the biggest things at the time. I think they were a megabyte each, and there were like eight megabytes on there. And then there was dual-port RAM on the edges, and it wrapped around. So it's basically the SoC architecture that we have today in FPGAs.
Colby: And did you have a goal for it? Like a main focus? Because now they have FPGAs for RF and pretty much anything you want to do. Or did you just wanna show that you could make one?
Steve: I got an SBIR contract to make a hierarchical database search engine. It was hard to build this thing to begin with, and I was hoping I would get some follow-on money to do the actual application, but that didn't happen.
So I had a little interface board that went from the Sun workstation into this big monster in the background. It had an FPGA on it, and I could configure that on demand, because I had set it up to be able to configure on demand. So I go, oh, maybe I'll just sell this thing. I'd had a bunch of pins going to something else, so I swapped those out for a couple of nice connectors so you could put a daughterboard on top of it, right? And then I just did this easy, what I thought was easy, interface.
And it was pretty funny, because 10 or 15 years later some guy comes up to me and says, I still have one of your boards on my desk, and whenever I want to try something out I pull it out and use it, because it was so easy to set up and talk to.
Colby: So you were actually able to sell it? Did you do it yourself?
Steve: Oh yeah, I sold it. I had a company called Virtual Computer Corporation. I won the SBIR under Imagination Works, which was the name of the first company, but then I wanted something a little more industrial. So I got Virtual Computer Corporation as a name and VCC as a domain. I should have kept that.
But I had four or five people working for me, and I made all sorts of stuff. One of our biggest customers was the government. I'm sure they were doing encryption, though they never told me, but you could see it roll out. They bought one, then they bought four, then 30, then a couple hundred. So that was good.
Colby: And I also wanted to ask, because right now, especially with the supply chain and getting parts and stuff, how was it to piece together? Because now I can pretty much go on Mouser, type in microcontroller, and there are thousands to pick from. How did you select parts for the board?
Steve: The big board is basically RAM, FPGAs, and an interconnect chip. And for the first prototype, I did the full-custom chip myself.
Colby: Oh, wow.
Steve: Yeah. Nobody had asked me if I could do that before. Well, they did ask as part of the proposal, can you do that? And I go, I've never done that before.
Colby: PCB and ASIC design are very different.
Steve: Yeah, doing the ASIC full custom, I dropped a boron layer the first time. It was like, ouch. But that was with MOSIS, which was a great little thing, one of the first multi-project wafer services. It was mostly for students, but anybody could get in on it. So that was very cool. I finally got that thing to work, but in the meantime somebody else came out with a chip where you could load and configure all the routing.
Because in the beginning, the FPGAs weren't so good at routing. They didn't have enough wires. So on my board I had added a whole lot more wires. That's why it's more like today's modern FPGAs. I saw all the flaws, and I designed a board that would cover those flaws and help alleviate them.
So this was a big board, and I turned the interface card into a product. And then I was walking around Xilinx and they go, oh, we've got a PCI interface macro, not PCIe, but PCI, and we're gonna build a board so that you can use it. And they showed me what it was, and it was a terrible, horrible looking board.
I like to think of boards as art. A good board, when you look at it, looks good. You can have a good board that looks ugly, but there's a certain amount of pride that goes into making everything look just right.
And I looked at that and I go, no, let me do the board. So I actually did the board that they shipped with their PCI interface, because I got it to reconfigure on the bus and do all this stuff. So that was a good product. That's about the time I started working on the hardware object stuff, because I was doing a lot of different designs and I was tired of trying to port or build up a new interface for each board, because each board was a little bit different.
So with hardware object technology, what I did is I separated everything that had to do with the actual physical board from all the rest of the stuff you'd need to do. That made it very portable, so I could port things in a day, or my guys could. With hardware object technology, you take the bitstream and turn it into a static array so you can compile it right into a program.
And then for the hardware object part, you make a class called Hot, and the bitstream is a field, but then there's a whole bunch of other stuff where you write your code to use that bitstream. So you've got the bitstream as part of the object, and the code to use the bitstream as part of that same object.
So that's hardware object technology in a nutshell: you're able to make these C++ objects that are bulletproof. They could be Python now, or whatever the favorite language of the day is. But it's really convenient, because it's very fast as far as loading and unloading and doing things like that.
And I developed a whole client-server thing where I could throw a hardware object over to another workstation somewhere, and it would load and run and then return the results. There's all that stuff. It was a plugin technology.
You know, I'd really like to get that open sourced. That's one of the things where I keep looking around and I go, oh, this guy's got this stuff, what does it do? Oh, that's almost nothing, and everybody's all happy with it. The hardware object technology is well thought out, documented, and has all sorts of features.
You basically treat anything inside the FPGA as a memory location. You have a memory map, you know where things are, and then you just use regular C read and write commands. You get a handle, the driver knows what's going on, and it's very simple to do. But nobody's done it yet, because there were no customers for it for 30 years. Now there are all these people doing all sorts of stuff as a hobby and open source. The world could use hardware object technology as open source.
Colby: I'm glad you brought up open source. Because I was wondering, was it a thing people were focused on or knew about when you were first working on these things?
Steve: When I first started working, there were some things that were open source, Linux for example, and some other little tools and things like that. But for any kind of really big job, you paid money for it. Look at Cadence and Synopsys, a hundred thousand dollars a seat, 30 years ago. So are you gonna open source that? I don't think so.
This thing I'm working on now, the state machine, is totally awesome in my opinion. It's unlike any other state machine, because it has way more states at runtime. But it takes more memory, so there's no magic in it.
The way it works, I have these two bits: one bit is the actual state bit, and the other is the clock-enable bit, which is really a latch enable that latches that state bit in. Once you do that, it's like a don't-care. You can make it into a don't-care kind of situation.
In all the state machines I know of, you go to an address and you get a state vector out; go to that address again, and you get the same state vector out. But in mine, you go to the address and get a state vector, and if you go to other places that set the other don't-care bits before you come back, you go to that address and it's a different state vector. No other state machine does that. So this is like a quantum state machine in some ways, because the number of states is exponential, right?
If I have five state bits in an output and I'm only setting one, the other four could be anything. That's two to the four, right? So basically you can have 16 different states at one address. That's the thing, at the same address. So it's extremely versatile, extremely powerful.
I'm just about to release it. I've been working on a website for what seems like a long time; it's been about a month, but I had to learn modern website tools and all that kind of stuff. When you're an entrepreneur, especially if you're a bootstrapper, you've gotta learn all that stuff.
And I'm very close to being done with that. Then I'll release a paper that I wrote maybe 10 years ago, but I've updated it. It's basically: why use FPGAs for computing in the first place? Why are FPGAs more special in some ways than GPUs and CPUs? There are a lot of different reasons for that.
And I outline some of the things that are physical. You've heard it from Xilinx lately, right? Where they say no dark silicon. I was surprised to hear that, because that's the first thing I go into, the dark silicon, and why FPGAs don't suffer from it.
Basically, you can think of a CPU as a tight little mound of silicon working really hard. It gets really hot, and it sucks electrons from the substrate faster than the substrate can deliver them. But in FPGAs, those are all pulled apart into little islands of lookup tables. They don't get nearly as hot, because they're separated, not concentrated.
And then there are other things in the paper, Rent's rule, for example, which not many people understand. If you look at a processor, there's data coming into it on maybe a hundred or 200 wires. You take that same area in an FPGA and you have thousands of wires going into it. So FPGAs are way better than CPUs at delivering data to where the computation has to happen.
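[Editor's note: for reference, the commonly quoted empirical form of Rent's rule, not spelled out in the interview. It relates the number of external terminals $T$ of a logic block to the number of gates $g$ inside it:]

```latex
T = t \, g^{p}
```

where $t$ is the average number of terminals per gate and $p$ is the Rent exponent, typically around 0.5 to 0.8 for logic designs. The point being made above is that FPGA fabric supplies far more wires into a given area than a CPU's fixed interfaces do.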
Colby: Can you touch more on dark silicon? Because that's a phrase I haven't really heard.
Steve: Oh yeah, that's been around for a little while. There's a scaling factor that happens. When you go down in size, those transistors are still going pretty fast, so they're taking more power per area. Where you used to have 20 gates in some area, nowadays you have 10,000 or a million gates in that area; they're just so small.
So what happens is that everything's working, and you can't deliver electricity fast enough to the inside of the ASIC to keep up with the power draw in there. What actually happens is the voltage rails drop, and then it just blows up; you lose your program or whatever you're doing, because it just can't keep up.
So even now, in the processors that have 24 cores and stuff, they have little things they turn off when they're not using them, to make sure the rest can keep going. That's how they're addressing it.
But in an FPGA, all the compute elements are already spread out, right? With the interconnect between them. So you don't get this phenomenon where you're sucking the electricity right out of the voltage rails and dropping the voltage locally.
However, there is a dark-silicon-like problem with FPGAs, and it's configuration. When you configure an FPGA, if you do it too fast, the voltage rails will drop and you'll lose all your configuration bits. So you have to go at a certain speed, and they're limited by how they set things up. Right now it's just like a big frame buffer that goes in, and they have to go really slow. Because the one thing processors have over FPGAs is the ability to change functionality in a heartbeat.
A CPU just pulls in some more instructions, and it's doing something completely different than it was doing before, and that didn't take much time at all. In FPGAs, we've gotta get to that point, or at least within ten times of it; right now we're a thousand times slower.
Addressing small subroutines was very hard unless you could get them all onto the FPGA at once. Partial reconfiguration is hard because it takes a lot of time, and then you have to make up for that time with the speedup you get. Sometimes that just doesn't work.
So then you're stuck having that function over on the CPU, and Amdahl's law just starts to kill you, right? If that 10% can't possibly be put in hardware, then the best you can do is 10x. And if it's more like 50%, then you're getting 2x. Which could still be a good thing.
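[Editor's note: for reference, the usual statement of Amdahl's law behind those numbers. If a fraction $f$ of the runtime can be accelerated by a factor $s$, the overall speedup is]

```latex
S = \frac{1}{(1 - f) + \dfrac{f}{s}}, \qquad \lim_{s \to \infty} S = \frac{1}{1 - f}.
```

With 10% of the work stuck on the CPU, $1 - f = 0.1$ and the ceiling is $1/0.1 = 10\times$; with 50% stuck, it is $1/0.5 = 2\times$, matching the figures above.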