Skip to main content

Stanford Mips-X @IIT

This doc may be helpful to have on hand to understand this post:
http://i.stanford.edu/pub/cstr/reports/csl/tr/86/289/CSL-TR-86-289.pdf

Integrated Information Technology or as that mouthful was abbreviated to: IIT (now 8x8) was founded by Chi-Shin Wang and Y.W. Sing two Weitek veterans.  Its original product line was floating point coprocessors and lived in a niche before the giant sucking sound was heard of Intel vacuuming up peripherals such as floating point. At that point, a new product was needed. From 1989 a small group was working on a Vision Processor (VP). This SIMD(4) engine took a year or so to build. Its first customer was Compression Labs (CLI). The VP had a large reg file 64 16bit regs  times 4 SIMD instances. This was so CLI could execute the 16x16 DCT. An 8x8 DCT is far more modest in reg consumption. Another customer for the part was AT&T with the videophone 2500 in 1992. So at this point, IIT had a reasonable video business with customers that had application processors that would use the VP as a video engine. About this time a search was underway for IIT's very own controller. Chi-Shin found a 'bargain' from Stanford, Mips-X for $15K! What do you get for $15K? You get a tape from Stanford's tech licensing group. Paul Chow (now EECS prof at Toronto) sat with us for a week to go over the tape. I worked with him to understand the SW tools offering. For one item, there was no C compiler for it, there was however an assembler written in Modula-2. Over the years I wrote and maintained our port of GCC to Mips-X. Having 2 delay slots broke the reorg pass of GCC on numerous occasions with the usual email chain from me to Richard Kenner and workarounds to deal with reorg issues. Finally I wrote our own 8x8 pass in GCC that fixed up cases where a hazard was in slot1 and slot2 used it. Rather than use the Modula-2 assembler, I wrote one in C. The linker, nld, in C++ (props to Jeff Loomans at a poker game who told me to learn C++, the linker was my first C++ program). From a HW standpoint, there was no schematic(!). There was a layout, not sent out(!). So, IIT sent the layout to the fab.  It came back and had severe voltage range issues. Also, it was a black box, i.e. provide a clock, take it out of reset, and it fetched a few instructions and was dead, lifeless. Observing the code, it was clear that it was a cache issue, since the first few fetches worked until a loop branched back. Luckily for us, there was a pin to disable the cache, then, it worked.
Management at IIT then reversed the layout and sucked out the design to a schematic. From there, subsequent tapeouts could use a proper core. Work then went on to produce the Vision Controller(VC) which was IIT's version of what CLI and AT&T bolted on around the VP. So this VC had a DMA controller, a video circuit, etc. Externally, it used a brooktree part for the analogue to drive the monitor. VC came back from the fab and we could see colour bars on the screen using a polled interrupt test. When that polling became a real interrupt... all hell broke loose and Mips-X crashed. The standard crash was a vertically rolling screen. Without the processor there to manage the DMA for the active portion of the screen, the DMA just kept shipping out a rolling rectangle. Doug Neubauer who was instrumental in the HW design of VC was about to go on a fishing trip and (paraphrasing) said "I am going fishing, if this is not fixed when I return, its game over". So during his trip a logic analyzer was hauled in and clipped onto the Mips-X code SRAMs. When the crash happened, the processor was in the interrupt handler, never able to leave. Mips-X has an odd interrupt pipeline. Everything is exposed to the programmer, so when you get an interrupt, the 5 stage pipe (IF RF ALU MEM WB) allows MEM and WB to finish (look out for volatiles there btw). You get 3 PC's to restart at IF, RF, ALU and the last 2 will redo the MEM and WBs that were in flight at the interrupt. If an interrupt should come in whilst an interrupt was being serviced, the final code of an interrupt handler will be:

JPCRS
<---------------- New interrrupt.
JPC
JPC

the new interrupt interferes with the pipleline restart and you are back in the handler with a half baked restart of the prior interrupt return. In the PSW is an e bit. The bug we found was that the exception began chain shifting again making the e bit and recovery using the chain regs impossible.

How to then return from interrupts?

Here is where I nominate a reg to the best register I have ever seen. Having worked in this space between HW&SW to nominate one reg to the best means this reg is pretty special. I believe Matt Cressa was the HW engineer who crafted it. Its reason for being even is odd. There was no need for this reg really, nor wrt its semantics. From the ERS:

RIFACE_IRQSUPPRESS    (0x20004014)                       (4 bits RD/WR)
 -----------------------------------------------------------------------------
|    |    |    |    |    |    |   |   |    |    |    |    |    |    |    |    |
|  x | x  | X  | X  | X  |  X | X | X | X  | X  | X  | x  | S3 | S2 | S1 | S0 |
|    |    |    |    |    |    |   |   |    |    |    |    |    |    |    |    |
 -----------------------------------------------------------------------------
  b15                                                                      b0

RIFACE_IRQSUPPRESS - Suppress Interrupts for S3-S0 Instructions
    0x00 = 15 instructions
    0x0e =  1 instruction
    0x0f =  0 instruction

Lets look at that reg closely, observe the inversion of S3..S0, writing 0 gets the maximum suppression. How is that useful?  Lets look at this tail end of an interrupt handler. Observe R0 being assigned to the reg.



7609c: 96c04028 st 0x4014[r27], r0
760a0: 877e0000 ld 0x0[r29], r31
760a4: 60000019 nop
760a8: f8000003 jpcrs
760ac: e8000003 jpc
760b0: e8000003 jpc


The effect is that for the next 15 instructions, interrupts are suppressed. This is enough to escape the interrupt handler without being sent back into the handler in a cooked state,

Without this we cannot get back cleanly from an interrupt. Since the reg's original purpose was unknown as the problem had not manifested itself yet it really was pure serendipity that it exists at all. Also, the inversion... Had it been not inverted and a linear map of write N get N suppress, then that too would have failed. You see at the end of the interrupt handler, the regs have to go back to the values they had at the time of the interrupt, only R0 on the Mips-X, which is hardwired to 0, can be dependably used. Amazing.

Comments

Popular posts from this blog

The Air32F103

 Bluepills Its no surprise that Bluepills, such as they were, a smoking sub $2 deal would dry up eventually, taken over by clones, some subtle, some overt. With supply issues still plaguing ST parts for hobbyists, what about these cheap Chinese clone parts? Lets dive into the Air32F103. Air32F103 I bought 5 of these $1.90 boards. They look like this: Some notables vs a stock bluepill: 1) Castellated pins with flat underside for use as a module 2) 3 LEDs R/G/B (vs just the G for a bluepill) 3) Clone of ST's peripherals 4) over clockable to 256Mhz (spec is 216Mhz) (original STM32F103CB is 72Mhz) 5) 32K of ram. But, via secret regs, 97K(!) 6) top and bottom debug pads, top for JLINK, bottom, legacy STLINK SWD 7) USB C vs USB mini 8) BOOT as a button vs jumper 9) 2 12bit DACs (STM32F103CB's don't have that) 10) QSPI (hidden support) 11) Undocumented crypto block from MegaHunt (includes: AES/DES/3DES/SHA/SM[1,3,4,7]) Clocking As mentioned 256Mhz is possible. One Q is how do they...

Ada on a $2 eBay 'Bluepill' board (STM32F103C8T6)

For about $2 a 'bluepill' board can be obtained from eBay. Taking its name from the PCB patina this small board is used in the Arduino community. The SoC itself per ST's early docs on it was 64k flash and 20k data, that said, all STM32F103C8's you buy nowadays are 128k flash. The CPU is a Cortex-M3 which can be run at 72Mhz maximum. There is a good website that shows the schematic and pinout of the bluepill: http://wiki.stm32duino.com/index.php?title=Blue_Pill If you are an Arduino programmer, the link above will take you where you want to go. But suppose you wanted to try programming and using the bluepill a different way? Well that is what this blog entry is about. An Ada port was done via AdaCores Ada_Drivers_Library and the Libre GNAT toolchain to the bluepill. This is preliminary work but it is able to generate working code that the author already is using as of yesterday (a garage parking measurement sensor).  The port of the library is derivative, ...

The STM32MP257F-EV1

 This is the follow on to the STM32MP1 series. This time a dual core Cortex-A35 a Cortex-M33 and a Cortex-M0+. As shipped, they provide a 16G micro SD card with Linux on it. That card is partitioned with a lot of odd small partitions. The important ones are at the end, p10 and p11. Unfortunately p10, the / partition is only 4G so immediately there is a storage crisis when you start populating the usual tools. A quick dd from that card to a 256G card I had lying around and we now have a start of something useful. Next is to run gparted on my RPi5 and first move p11 which is /usr/local to 1MiB from the end. Don't leave that as 0 or the move will fail. Once moved, you can resize p10 to be 200+G. Now we have some useful space. I added a user with sudoers privs. There is no date/time setup when it boots, that needs to get rectified with a ntp callout.  Now... there is no C compiler or binutils. binutils got installed. Lots of apt update -> apt install xyz's. Here is an interesti...