Monday, December 23, 2019

Ada on a CM33

The CM33

About 2 years ago, ARM released the CM33 to silicon designers. It is designed for IoT but it has a myriad of options that the silicon designer (SD) can choose from. So, at its core its just another CM series processor not unlike a CM4/CM4F/CM7. What differentiates it to a SD are the choices that allow the main extension and security extensions (aka TrustZone). When both of these options are in place and selected by the end user, the CPU looks at one moment like two CM4Fs one secure and one not secure. Of course, the program counter can only be one place at one time so whilst it may indeed look like two cores, only one is really there and it is in this management of the view of the SoC from each of these two spaces that the true colour of the CPU shows. For this discussion of Ada on a CM33, we will assume the CPU starts in TrustZone mode + Main extension. This means that there are two SysTick timers, one for secure(S) and one for non-secure(NS). The CPU registers are only banked for some of the registers. R0-R12, LR, PC are not banked. Banking takes place for the SP's. For the usual MSP & PSP there are now new variants that deduce their value from the context the CPU is running in. So we have MSP_NS, PSP_NS & MSP_S, PSP_S. There are also some SP limit regs, also banked. A myriad of internal peripherals are also banked, the E000ED00 - space for example. S always has access to the NS banked world but NS has hardware&software reduced views of memory and registers. There are many complexities in a CM33, as an example, lets look at exception handling. If its a S originating exception to S handler, no issues, the usual frame is maintained.  Similarly NS originating exception to NS handler. If however it was a S originating exception to a NS handler there is the potential of a leak of S register info via this asynchronous 'peek' into the now stopped S side. ARM was clever here and I believe must have given their CPU designers quite a design challenge. The idea is to now push a *big* frame onto the S stack and then zero all the registers and arrive at the NS handler. There is major LR magic going on in a CM33. It has more bits now to indicate S and NS frame info. ARM does not do all the work for you in HW here btw. Once your program is running, lets say its executing in S, and wants to call over to code in NS, this is doable. There is a new instr: BLXNS. This allows you to call to a NS entry point. For the converse, a NS program calling a S function, there are limits in place. The entry point must be an SG instruction, the memory space must be marked Non-secure-callable (NSC). After that, the usual veneer code can be used. For the first example, S->NS, it is the job of SW to wipe out the S registers before the BLXNS. So there is some boilerplate to do that in assembly, the func is __gnu_cmse_nonsecure_call. There are some new C compiler options that trigger this automatically. (--cmse cortex-m security extensions I believe that stands for).

Ada on a CM33

So building on AdaCore's ARM offering via the Ada_Drivers_Library and Ravenscar profiles, we can consider how to get software running on a CM33. The first thing to think about with a port of Ada to a CM33 is what is the use case that is being contemplated? How will a CM33 improve on the already perfectly usable Ada footprint in the CM world. I.e. what is to be gained by moving Ada to a CM33, are there threats that the security modes of a CM33 would help to close?

Secure Booting Ada

One item that looks promising is secure boot. Secure boot is the establishment of a root of trust in a system starting from power on reset and extending out into securing the system during runtime by offering a secure interface that can be leveraged back over to NS components to allow them to securely utilize keys, and other high value assets protected by the root of trust and indirectly visible to the NS side. For example, encrypting a packet with a network key. Perhaps that key, on the S side is used with HW to perform an encryption and the result placed in NS ram. At no time did the NS side have access to the key, the HW or any algorithm associated with the encryption. So this looks quite promising, we can establish a root of trust and then pass control to NS whilst offering a set of S side APIs to the NS side.

Non-secure Ada

 The secure side after establishing the root of trust, passes control to the NS side. From this side, user code can take over and perform system code much as before in the CM4 days but it does so under the watchful eye of SoC hardware that has prepared the memory and peripherals in the SoC in such a way as to sandbox the NS side from overstepping any boundary that secure boot establishes. 

Two programs

The way my Ada port to a CM33 works is to have two somewhat independent programs running on the same CPU. The basic layout is S:glue:NS where S and NS are two Ada programs. glue is not written in Ada but is a mix of C and assembly and it is tasked with joining S and NS bidirectionally so that each can call over to each other in a secure fashion.

Here are the memory maps for S and NS. 

Memory region         Used Size  Region Size  %age Used
           flash:          0 GB       512 KB      0.00%
          sram12:       56512 B       256 KB     21.56%
           sglue:          0 GB        60 KB      0.00%
          nsglue:          32 B         4 KB      0.78%
              ns:          0 GB       128 KB      0.00%

Memory region         Used Size  Region Size  %age Used
           flash:          0 GB       512 KB      0.00%
          sram12:          0 GB       256 KB      0.00%
           sglue:          0 GB        60 KB      0.00%
          nsglue:          0 GB         4 KB      0.00%
              ns:       81840 B       128 KB     62.44%
Currently this is being tested on an STM32L552 Nucleo board with TZONE=1 (that is an almost irreversible OTP option bit that once programmed makes the core Main Extension and Secure Extension). By default an STM32L552 Nucleo ships looking a lot like a fancy CM4F with none of the magic CM33 options activated.

Step One

Step one is a debugger. We want to use arm-eabi-gdb but what to connect it to? I did a port of openocd to ARMv8m to achieve this. I have added 3 CM33 cpu's 

1) LPC5569 (lpc55xx.cfg)
2) nRF5340 (nRF5340.cfg)
3) STM32L552 (st_nucleo_l552.cfg)

Only the ST openocd port supports flashing, but at the moment I only use SRAM for my experiments as by now I am sure I would have worn out the flash in the part.

The code is up on github:

Hard bugs

There are some very interesting and very tough bugs that can come from this type of work. Days can disappear as the clues are fleeting.  To that end, one particularly bad one caused me a two or more day detour whilst I got trace going:


Via trace being captured on a Saleae and then massaged into a Pulseview ITM trace, I could see where the issue was. There are two SysTicks now and two Ada programs. The Ravenscar runtimes will periodically context switch on either side after number of SysTicks has been reached. In a single core world, this is fine. However, in a CM33 the PC can be over on either side when the exception appears. The CPU handles this fine, its the runtime software that has to be made wise to this dual origination. I found that crashing would occur anytime the PC was on the other side when the exception appears for the side in Q. The magic LR value indicates where the saved registers are. The problem is when there is a decision by the runtime to context switch using a Pending SV call, that new direction left the other side in a lurch as the exception frame was not honoured and control was stolen over the the excepting side. The solution I have at the moment that fixes the crash is to XOR between the security states. Only context switch when the exception and context are on the same side. This can still lead to starvation if both sides choose to be in WFI instructions. It doesn't crash but the allocation of CPU time is not equitable between the two sides. For now, I think there is no need to WFI on the S side and use it instead as a deterministic API handler. Other bugs can be frustrating security issues where the SAU status indicates faults due to misaligned calls from NS to S or S to NS when the NS side is not marked NS. Some of these security fails can really open up a rabbit hole of odd registers that need to be flipped to allow S->NS or NS->S.

Complexity

The 3 designs I have looked at, the NXP LPC5569, Nordic nRF5340 and STM32L552 are some of the most complex microcontrollers I have ever worked with. A CM4F/CM7 already can be a handful depending on what you are trying to do. Now take a CM4F and add security via TrustZone, the SAU (security attribution unit), possibly an IDAU (implementation specific device attribution unit), the MPU (now also changed and new in the ARMv8m + banked in S and NS). So via these units you begin to divvy up the ram into units S & NS can work with. Next if you need to share SoC peripherals you begin with two views, the S & NS view of the peripheral. Ada's Ada_Device_Library needed to be taught how to handle this new view. Typically each of these SoCs has S and NS views of a peripheral picked by the top nibble, for example:

   GPIOA_Base : constant System.Address :=
     System'To_Address (16#42020000#);

   SEC_GPIOA_Base : constant System.Address :=
     System'To_Address (16#52020000#);

In the Ada driver we have the device auto choose the address based on execution context:

   function S_NS_Periph (Addr : System.Address) return System.Address
     with Inline;

   GPIO_A : aliased GPIO_Port with Import, Volatile, Address => S_NS_Periph (GPIOA_Base);

It should be mentioned that this is vendor specific, each vendor can choose how they do S/NS mappings. This just adds to the complexity.

Example

An example has been prepared that cycles the 3 user LEDs on the Nucleo board. The LED toggle is done by a secure function which is called by a non-secure function.



No comments:

Post a Comment