Mips-X
Mips-X, as described earlier on the blog, was a Stanford University grad project: a 32-bit RISC CPU with some unique features. For one, it had 2 delay slots for control-change instructions (branches and jumps). I am not aware of any other processor that has that. We had a visit one day from John Hennessy (Stanford Mips project faculty lead and ultimately university president), not Mips-X related, and I asked him, "Why two delay slots?" His paraphrased answer was, "It was a graduate project, we were just trying things out."
The Shifter
Mips-X had a barrel shifter and exposed it to the programmer via these opcodes:
asr rSRC,rDST,#1..32
rotlb rSRC1,rSRC2,rDST
rotlcb rSRC1,rSRC2,rDST
sh rSRC1,rSRC2,rDST,#1..32
Via a combination of the above, all the needed shift operations could be done. Observe, though, that there is no variable shift: the shift count is always a fixed # value encoded in the instruction, never a register.
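To see what the lack of a variable shift implies, here is a minimal Python sketch (not Mips-X code) of one way to get a run-time shift count out of fixed-count shifts only: build one fixed shifter per possible count and pick one by index. The names `make_fixed_lsr`, `FIXED_SHIFTS` and `variable_lsr` are made up for illustration, and a 32-bit logical shift right is assumed.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register

def make_fixed_lsr(n):
    """Return a shifter whose count n is baked in, like an immediate operand."""
    def fixed(value):
        return (value & MASK32) >> n
    return fixed

# One fixed shifter per possible count, 0 through 31.
FIXED_SHIFTS = [make_fixed_lsr(n) for n in range(32)]

def variable_lsr(value, count):
    """Variable logical shift right built only from fixed-count shifts."""
    return FIXED_SHIFTS[count](value)
```

The count never reaches a shifter as data; it only selects which fixed shifter runs.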
My Shift function
Now here is a good puzzle for the reader: parse my variable shift func for lsr.s.
r0 == 0 -- can be a src or dst
r24 is the code segment offset (allows for position independent code off of r24).
r4 is the value to be shifted.
r5 has the #<shift>
r2 is the result.
r31 is the return address
.text
.noreorg
shift_table:
mov r4,r2
lsr r4,r2,#1
lsr r4,r2,#2
lsr r4,r2,#3
lsr r4,r2,#4
lsr r4,r2,#5
lsr r4,r2,#6
lsr r4,r2,#7
lsr r4,r2,#8
lsr r4,r2,#9
lsr r4,r2,#10
lsr r4,r2,#11
lsr r4,r2,#12
lsr r4,r2,#13
lsr r4,r2,#14
lsr r4,r2,#15
lsr r4,r2,#16
lsr r4,r2,#17
lsr r4,r2,#18
lsr r4,r2,#19
lsr r4,r2,#20
lsr r4,r2,#21
lsr r4,r2,#22
lsr r4,r2,#23
lsr r4,r2,#24
lsr r4,r2,#25
lsr r4,r2,#26
lsr r4,r2,#27
lsr r4,r2,#28
lsr r4,r2,#29
lsr r4,r2,#30
lsr r4,r2,#31
.globl ___lshrsi3
___lshrsi3:
nop
add r24,r5,r1
jspci r1,#shift_table,r0
jspci r31,#0,r0
nop
nop
.end
Look at the two jspci's above. A jspci in the delay slot of a jspci! What happens? Also observe the nop at function entry. Why is that there? Well, this func's caller could have had a LD of r5 in the second delay slot of its jspci. In that case, if the add were the first instruction, it would read a stale r5, as LDs have a one-instruction hazard:
jspci r24,#___lshrsi3,r0
nop
ld 0[r29],r5
___lshrsi3:
add r24,r5,r1
That is a hazard, as r5 is still in transit in the pipeline when the add goes to use it. Thus the nop.
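For readers who want to check their answer to the puzzle, here is a toy Python model of the call sequence and the shift function. It is a sketch under stated assumptions, not a description of the real pipeline: memory is word addressed, a jspci takes effect after exactly two more instructions even when it sits in another jspci's delay slot, and a loaded value becomes visible one instruction late. The addresses (`TABLE`, `CALL`, the stack slot at 500) and all function names are hypothetical. r31 is preset by hand and link-register writes are not modeled.

```python
from collections import deque

DELAY_SLOTS = 2                    # jumps take effect two instructions late
TABLE = 100                        # hypothetical word address of shift_table
ENTRY = TABLE + 32                 # ___lshrsi3 follows the 32 table entries
CALL = 200                         # hypothetical address of the caller's jspci
RETURN = CALL + DELAY_SLOTS + 1    # where the caller resumes (preset in r31)

def build_program(entry_nop):
    # Table entry n shifts right by n; entry 0 stands in for "mov r4,r2".
    prog = {TABLE + n: ("lsr", "r4", "r2", n) for n in range(32)}
    body = [("nop",)] if entry_nop else []
    body += [("add", "r24", "r5", "r1"),
             ("jspci", "r1", TABLE),
             ("jspci", "r31", 0),
             ("nop",), ("nop",)]
    caller = [("jspci", "r24", ENTRY), ("nop",), ("ld", "r29", 0, "r5")]
    prog.update({ENTRY + i: ins for i, ins in enumerate(body)})
    prog.update({CALL + i: ins for i, ins in enumerate(caller)})
    return prog

def run(prog, regs, mem, start, stop):
    # upcoming holds the next instruction plus its two delay-slot successors.
    upcoming = deque(start + i for i in range(DELAY_SLOTS + 1))
    pending_load = None
    trace = []
    while upcoming[0] != stop:
        pc = upcoming.popleft()
        trace.append(pc)
        following = upcoming[-1] + 1          # default: sequential fetch
        landing, pending_load = pending_load, None
        op, *args = prog[pc]
        if op == "lsr":
            src, dst, n = args
            regs[dst] = regs[src] >> n
        elif op == "add":
            a, b, dst = args
            regs[dst] = regs[a] + regs[b]
        elif op == "ld":
            base, offset, dst = args
            pending_load = (dst, mem[regs[base] + offset])
        elif op == "jspci":
            base, imm = args
            following = regs[base] + imm      # link register write not modeled
        # "nop" falls through and does nothing
        if landing:
            regs[landing[0]] = landing[1]     # load lands one instruction late
        upcoming.append(following)
    return trace

def call_lshr(value, count, entry_nop=True):
    regs = {"r1": 0, "r2": 0, "r4": value & 0xFFFFFFFF, "r5": 0,
            "r24": 0, "r29": 500, "r31": RETURN}
    mem = {500: count}                        # the caller's ld fetches the count
    trace = run(build_program(entry_nop), regs, mem, CALL, RETURN)
    return regs["r2"], trace
```

In this model, `call_lshr(0xF0, 4)` returns 0x0F, and the trace shows exactly one table entry (address 104) executing as the second delay slot of the inner jspci before control returns. The last nop of the function is never reached. With `entry_nop=False`, the add reads r5 before the load has landed, so the model indexes entry 0 and returns the value unshifted.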