Saturday, September 23, 2017

Stanford Mips-X @IIT

This doc may be helpful to have on hand to understand this post:
http://i.stanford.edu/pub/cstr/reports/csl/tr/86/289/CSL-TR-86-289.pdf

Integrated Information Technology or as that mouthful was abbreviated to: IIT (now 8x8) was founded by Chi-Shin Wang and Y.W. Sing two Weitek veterans.  Its original product line was floating point coprocessors and lived in a niche before the giant sucking sound was heard of Intel vacuuming up peripherals such as floating point. At that point, a new product was needed. From 1989 a small group was working on a Vision Processor (VP). This SIMD(4) engine took a year or so to build. Its first customer was Compression Labs (CLI). The VP had a large reg file 64 16bit regs  times 4 SIMD instances. This was so CLI could execute the 16x16 DCT. An 8x8 DCT is far more modest in reg consumption. Another customer for the part was AT&T with the videophone 2500 in 1992. So at this point, IIT had a reasonable video business with customers that had application processors that would use the VP as a video engine. About this time a search was underway for IIT's very own controller. Chi-Shin found a 'bargain' from Stanford, Mips-X for $15K! What do you get for $15K? You get a tape from Stanford's tech licensing group. Paul Chow (now EECS prof at Toronto) sat with us for a week to go over the tape. I worked with him to understand the SW tools offering. For one item, there was no C compiler for it, there was however an assembler written in Modula-2. Over the years I wrote and maintained our port of GCC to Mips-X. Having 2 delay slots broke the reorg pass of GCC on numerous occasions with the usual email chain from me to Richard Kenner and workarounds to deal with reorg issues. Finally I wrote our own 8x8 pass in GCC that fixed up cases where a hazard was in slot1 and slot2 used it. Rather than use the Modula-2 assembler, I wrote one in C++. (props to Jeff Loomans at a poker game who told me to learn C++, the assembler was my first C++ program). From a HW standpoint, there was no schematic(!). There was a layout, not sent out(!). So, IIT sent the layout to the fab.  It came back and had severe voltage range issues. Also, it was a black box, i.e. provide a clock, take it out of reset, and it fetched a few instructions and was dead, lifeless. Observing the code, it was clear that it was a cache issue, since the first few fetches worked until a loop branched back. Luckily for us, there was a pin to disable the cache, then, it worked.
Management at IIT then reversed the layout and sucked out the design to a schematic. From there, subsequent tapeouts could use a proper core. Work then went on to produce the Vision Controller(VC) which was IIT's version of what CLI and AT&T bolted on around the VP. So this VC had a DMA controller, a video circuit, etc. Externally, it used a brooktree part for the analogue to drive the monitor. VC came back from the fab and we could see colour bars on the screen using a polled interrupt test. When that polling became a real interrupt... all hell broke loose and Mips-X crashed. The standard crash was a vertically rolling screen. Without the processor there to manage the DMA for the active portion of the screen, the DMA just kept shipping out a rolling rectangle. Doug Neubauer who was instrumental in the HW design of VC was about to go on a fishing trip and (paraphrasing) said "I am going fishing, if this is not fixed when I return, its game over". So during his trip a logic analyzer was hauled in and clipped onto the Mips-X code SRAMs. When the crash happened, the processor was in the interrupt handler, never able to leave. Mips-X has an odd interrupt pipeline. Everything is exposed to the programmer, so when you get an interrupt, the 5 stage pipe (IF RF ALU MEM WB) allows MEM and WB to finish (look out for volatiles there btw). You get 3 PC's to restart at IF, RF, ALU and the last 2 will redo the MEM and WBs that were in flight at the interrupt. If an interrupt should come in whilst an interrupt was being serviced, the final code of an interrupt handler will be:

JPCRS
<---------------- New interrrupt.
JPC
JPC

the new interrupt interferes with the pipleline restart and you are back in the handler with a half baked restart of the prior interrupt return. In the PSW is an e bit. The bug we found was that the exception began chain shifting again making the e bit and recovery using the chain regs impossible.

How to then return from interrupts?

Here is where I nominate a reg to the best register I have ever seen. Having worked in this space between HW&SW to nominate one reg to the best means this reg is pretty special. I believe Matt Cressa was the HW engineer who crafted it. Its reason for being even is odd. There was no need for this reg really, nor wrt its semantics. From the ERS:

RIFACE_IRQSUPPRESS    (0x20004014)                       (4 bits RD/WR)
 -----------------------------------------------------------------------------
|    |    |    |    |    |    |   |   |    |    |    |    |    |    |    |    |
|  x | x  | X  | X  | X  |  X | X | X | X  | X  | X  | x  | S3 | S2 | S1 | S0 |
|    |    |    |    |    |    |   |   |    |    |    |    |    |    |    |    |
 -----------------------------------------------------------------------------
  b15                                                                      b0

RIFACE_IRQSUPPRESS - Suppress Interrupts for S3-S0 Instructions
    0x00 = 15 instructions
    0x0e =  1 instruction
    0x0f =  0 instruction

Lets look at that reg closely, observe the inversion of S3..S0, writing 0 gets the maximum suppression. How is that useful?  Lets look at this tail end of an interrupt handler. Observe R0 being assigned to the reg.



7609c: 96c04028 st 0x4014[r27], r0
760a0: 877e0000 ld 0x0[r29], r31
760a4: 60000019 nop
760a8: f8000003 jpcrs
760ac: e8000003 jpc
760b0: e8000003 jpc


The effect is that for the next 15 instructions, interrupts are suppressed. This is enough to escape the interrupt handler without being sent back into the handler in a cooked state,

Without this we cannot get back cleanly from an interrupt. Since the reg's original purpose was unknown as the problem had not manifested itself yet it really was pure serendipity that it exists at all. Also, the inversion... Had it been not inverted and a linear map of write N get N suppress, then that too would have failed. You see at the end of the interrupt handler, the regs have to go back to the values they had at the time of the interrupt, only R0 on the Mips-X, which is hardwired to 0, can be dependably used. Amazing.