Thursday, March 7, 2019

Shift on Mips-X

Mips-X


Mips-X, as described earlier on the blog, was a Stanford University graduate project: a 32-bit RISC CPU with some unique features. For one, it had two delay slots for control-change instructions (branches and jumps). I am not aware of any other processor that has that. We had a visit from John Hennessy (Stanford Mips project faculty lead and ultimately university president) one day (not Mips-X related), and I asked him, "Why two delay slots?" His paraphrased answer was, "It was a graduate project; we were just trying things out."

The Shifter

Mips-X had a barrel shifter and exposed it to the programmer via these opcodes:

asr    rSRC,rDST,#1..32
rotlb  rSRC1,rSRC2,rDST
rotlcb rSRC1,rSRC2,rDST
sh     rSRC1,rSRC2,rDST,#1..32


By combining the above, all the needed shift operations could be done. Observe, though, that there is no variable shift: every shift amount is a fixed immediate (#) value.
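As a rough illustration of how one shifter primitive can cover several shift flavors, here is a sketch in Python. It assumes sh acts like a generic funnel shift (concatenate two 32-bit registers and pull out a 32-bit window). That is an assumption made for illustration, not the documented Mips-X semantics, and the helper names funnel, lsr, lsl, and rotr are made up here.

```python
MASK = 0xFFFFFFFF  # 32-bit registers


def funnel(hi, lo, n):
    """Hypothetical funnel shift: view hi:lo as 64 bits, return bits n..n+31."""
    assert 0 <= n <= 32
    return ((((hi & MASK) << 32) | (lo & MASK)) >> n) & MASK


def lsr(x, n):
    # Logical shift right: zeros enter from the top, so the high half is 0
    # (on a machine with a zero register, that half could simply be r0).
    return funnel(0, x, n)


def lsl(x, n):
    # Logical shift left by n: take the window 32-n bits up, zeros below.
    return funnel(x, 0, 32 - n)


def rotr(x, n):
    # Rotate right: feed the same value into both halves.
    return funnel(x, x, n)
```

In this model the shift amount is still a fixed argument per call site, which mirrors the point above: the hardware can do any one shift, but picking the amount at run time needs something extra.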

My Shift function

Now here is a good puzzle for the reader: parse my variable shift function for lsr.s.

r0 == 0 -- can be a src or dst.
r24 is the code segment offset (allows for position-independent code off of r24).
r4 is the value to be shifted.
r5 has the shift count (#<shift>).
r2 is the result.
r31 is the return address.

.text
.noreorg
shift_table:
        mov     r4,r2
        lsr     r4,r2,#1
        lsr     r4,r2,#2
        lsr     r4,r2,#3
        lsr     r4,r2,#4
        lsr     r4,r2,#5
        lsr     r4,r2,#6
        lsr     r4,r2,#7
        lsr     r4,r2,#8
        lsr     r4,r2,#9
        lsr     r4,r2,#10
        lsr     r4,r2,#11
        lsr     r4,r2,#12
        lsr     r4,r2,#13
        lsr     r4,r2,#14
        lsr     r4,r2,#15
        lsr     r4,r2,#16
        lsr     r4,r2,#17
        lsr     r4,r2,#18
        lsr     r4,r2,#19
        lsr     r4,r2,#20
        lsr     r4,r2,#21
        lsr     r4,r2,#22
        lsr     r4,r2,#23
        lsr     r4,r2,#24
        lsr     r4,r2,#25
        lsr     r4,r2,#26
        lsr     r4,r2,#27
        lsr     r4,r2,#28
        lsr     r4,r2,#29
        lsr     r4,r2,#30
        lsr     r4,r2,#31
.globl ___lshrsi3
___lshrsi3:
        nop
        add     r24,r5,r1
        jspci   r1,#shift_table,r0
        jspci   r31,#0,r0
        nop
        nop
.end
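Read at a high level, the routine appears to be a computed jump into a table of 32 one-instruction entries, one per shift amount. A minimal Python sketch of that idea follows. The names SHIFT_TABLE and lshrsi3 are stand-ins, and the sketch models only the table lookup, not the delay slots or the r24 offset.

```python
MASK = 0xFFFFFFFF  # 32-bit registers


def make_entry(n):
    # Each table slot is one fixed-amount shift, like "lsr r4,r2,#n".
    return lambda r4: (r4 & MASK) >> n


# Slot 0 plays the role of the plain mov; slots 1..31 are the fixed shifts.
SHIFT_TABLE = [make_entry(n) for n in range(32)]


def lshrsi3(r4, r5):
    # The shift count picks which single table entry runs.
    return SHIFT_TABLE[r5](r4)
```

A count outside 0..31 would land outside the table. The assembly listing shows no range check either, so the caller is presumably expected to pass a valid count.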


Look at the two jspci's above. A jspci in the delay slot of a jspci! What happens? Also observe the nop at function entry. Why is that there? Well, this function's caller could have had a load (ld) of r5 in the second delay slot of its jspci. In that case, if the add were the first instruction, it would read a stale r5, as loads have a one-instruction hazard.


        jspci   r24,#___lshrsi3,r31
        nop
        ld      0[r29],r5
___lshrsi3:
        add     r24,r5,r1

That is a hazard because r5 is still in transit in the pipeline when the add goes to use it. Hence the nop.
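To make both puzzles concrete, here is a toy simulation in Python. It rests on assumptions that go beyond this post: every jump redirects fetch only after the two following instructions have issued, a jump sitting in a delay slot is treated like any other jump, a load's result becomes visible one instruction late, and the link value is pc + 3. The addresses, the build and run helpers, and the instruction tuples are all invented for the sketch; it is not a cycle-accurate Mips-X model.

```python
MASK = 0xFFFFFFFF


def build(entry_nop):
    """Toy layout: shift table at 0..31, routine at 32, caller at 100."""
    prog = {n: ("lsr", n) for n in range(32)}      # entry 0 acts as the mov
    # Jump tuples are ("jump", base_reg, offset, link_reg).
    body = [("add",), ("jump", 1, 0, 0), ("jump", 31, 0, 0), ("nop",), ("nop",)]
    if entry_nop:
        body.insert(0, ("nop",))
    for i, ins in enumerate(body):
        prog[32 + i] = ins
    prog[100] = ("jump", 24, 32, 31)               # call, link kept in r31
    prog[101] = ("nop",)                           # first delay slot
    prog[102] = ("ld", 5)                          # second delay slot loads r5
    prog[103] = ("halt",)                          # the return lands here
    return prog


def run(value, count, entry_nop):
    """Return (r2, trace of executed addresses) for one call."""
    prog = build(entry_nop)
    regs = [0] * 32                                # r5 starts out stale (0)
    regs[4] = value
    pc, jumps, pending_load, trace = 100, [], None, []
    while prog[pc][0] != "halt":
        ins = prog[pc]
        op = ins[0]
        trace.append(pc)
        # A load issued by the previous instruction lands after this one reads.
        landing, pending_load = pending_load, None
        if op == "lsr":
            regs[2] = (regs[4] & MASK) >> ins[1]
        elif op == "add":
            regs[1] = (regs[24] + regs[5]) & MASK
        elif op == "ld":
            pending_load = (ins[1], count)
        # Age the jumps already in flight, then register a new one.
        jumps = [(n - 1, target) for n, target in jumps]
        if op == "jump":
            _, base, offset, link = ins
            jumps.append((2, regs[base] + offset))
            if link != 0:                          # writes to r0 are discarded
                regs[link] = pc + 3
        if landing is not None:
            regs[landing[0]] = landing[1]
        next_pc = pc + 1
        for n, target in jumps:
            if n == 0:                             # both delay slots are done
                next_pc = target
        jumps = [(n, target) for n, target in jumps if n > 0]
        pc = next_pc
    return regs[2], trace


good, good_trace = run(0x80, 3, entry_nop=True)
bad, bad_trace = run(0x80, 3, entry_nop=False)
```

Under these assumptions, the run with the entry nop executes the caller, the routine, exactly one table entry (the one for a shift of 3), and then returns, because the second jspci's two delay slots are the trailing nop and the table entry itself. Dropping the nop makes the add read the stale r5, so table entry 0 runs and the value comes back unshifted.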