Sunday, June 16, 2019

An Ada Client & Server on the STM32WB55


Ada WB55 Client & Server

Following on from the last blog posting, I now have a client and server implementation in Ada. Mine is not quite as fancy as ST's, ST's allows role reversal where the client can run on the larger board (MB1355C) and the server on the USB dongle (MB1293C). I only support server on MB1355C and client on MB1293C.

Getting the code & building

You will need gnat2018 or gnat2019 from AdaCore:

https://www.adacore.com/download

Once that's installed, you will need some library code and the STM32 dir I use:

git clone https://github.com/morbos/Ada_Drivers_Library.git
git clone https://github.com/morbos/embedded-runtimes.git
git clone https://github.com/morbos/STM32.git
mv ../embedded-runtimes Ada_Drivers_Library
cd STM32/WB/WB55/cli_serv_wb55
make

Once make finishes you get 2 ELF32 files in the obj/Debug dir.

admin@ubuntu_1604:/tmp/STM32/WB/WB55/cli_serv_wb55$ make
rm -f obj/Debug/client_wb55x
(export LOADER=ROM_WB55x; gprbuild client_wb55x.gpr)
Link
   [link]         client_wb55x.adb
Memory region         Used Size  Region Size  %age Used
           flash:       93008 B         1 MB      8.87%
           sram1:      102792 B       192 KB     52.28%
          sram2a:          0 GB        32 KB      0.00%
          sram2b:          0 GB        32 KB      0.00%
(cd obj/Debug; arm-eabi-objdump -d client_wb55x >client_wb55x.lst; arm-eabi-objdump -s client_wb55x >client_wb55x.dmp; arm-eabi-gcc-nm -an client_wb55x >client_wb55x.nm; arm-eabi-objcopy -Obinary client_wb55x client_wb55x.bin)
rm -f obj/Debug/server_wb55x
(export LOADER=ROM_WB55x; gprbuild server_wb55x.gpr)
Link
   [link]         server_wb55x.adb
Memory region         Used Size  Region Size  %age Used
           flash:       93752 B         1 MB      8.94%
           sram1:      108056 B       192 KB     54.96%
          sram2a:          0 GB        32 KB      0.00%
          sram2b:          0 GB        32 KB      0.00%
(cd obj/Debug; arm-eabi-objdump -d server_wb55x >server_wb55x.lst; arm-eabi-objdump -s server_wb55x >server_wb55x.dmp; arm-eabi-gcc-nm -an server_wb55x >server_wb55x.nm; arm-eabi-objcopy -Obinary server_wb55x server_wb55x.bin)

Each of these ELF32 files can be flashed on the board.

Openocd

To flash the MB1355C use the ST-Link USB connector (the silkscreen is on the bottom of the board).

You will need openocd-0.10.0 that is on my github.

Modify the st_nucleo_wb.tcl in the tcl/board dir. You want to make sure the v2-1 line is uncommented. 

From:
# source [find interface/stlink-v2-1.cfg]
source [find interface/stlink-v2.cfg]

To:

source [find interface/stlink-v2-1.cfg]
#source [find interface/stlink-v2.cfg]

Then you can attach:

root@pi3:~/openocd-0.10.0/tcl# ../src/openocd -f board/st_nucleo_wb55.cfg Open On-Chip Debugger 0.10.0 Licensed under GNU GPL v2 For bug reports, read http://openocd.org/doc/doxygen/bugs.html Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD adapter speed: 2000 kHz adapter_nsrst_delay: 100 none separate none separate Info : Unable to match requested speed 2000 kHz, using 1800 kHz Info : Unable to match requested speed 2000 kHz, using 1800 kHz Info : clock speed 1800 kHz Info : STLINK v2 JTAG v32 API v2 SWIM v22 VID 0x0483 PID 0x374B Info : using stlink api v2 Info : Target voltage: 3.268721 Info : stm32wb.cpu: hardware has 6 breakpoints, 4 watchpoints

Flashing & Debug

admin@ubuntu_1604:/.share/CACHEDEV1_DATA/Ada/STM32/WB/WB55/cli_serv_wb55$ arm-eabi-gdb obj/Debug/server_wb55x
GNU gdb (GDB) 8.3 for GNAT Community 2019 [rev=gdb-8.3-ref-194-g3fc1095]
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
See your support agreement for details of warranty and support.
If you do not have a current support agreement, then there is absolutely
no warranty for this version of GDB.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=arm-eabi".
Type "show configuration" for configuration details.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from obj/Debug/server_wb55x...
(gdb) target extended-remote 10.0.1.241:3333
Remote debugging using 10.0.1.241:3333
0x0800e02e in system.bb.board_support.interrupts.power_down ()
    at /.share/CACHEDEV1_DATA/Ada/STM32/WB/WB55/cli_serv_wb55/Ada_Drivers_Library/embedded-runtimes/base_runtimes/ravenscar-full/gnarl-common/s-bbbosu.adb:402
402          Asm ("wfi", Volatile => True);
(gdb) monitor reset halt
Unable to match requested speed 2000 kHz, using 1800 kHz
Unable to match requested speed 2000 kHz, using 1800 kHz
adapter speed: 1800 kHz
target halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08010968 msp: 0x2001a620
(gdb) load
Loading section .text, size 0x121d0 lma 0x8000000
Loading section .ARM.extab, size 0xe34 lma 0x80121d0
Loading section .ARM.exidx, size 0xe98 lma 0x8013004
Loading section .rodata, size 0x31e8 lma 0x8013ea0
Loading section .data, size 0x78c lma 0x8017088
Start address 0x8010968, load size 96272
Transfer rate: 22 KB/sec, 10696 bytes/write.
(gdb) monitor reset init
Unable to match requested speed 2000 kHz, using 1800 kHz
Unable to match requested speed 2000 kHz, using 1800 kHz
adapter speed: 1800 kHz
target halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08010968 msp: 0x2001a620
Unable to match requested speed 2000 kHz, using 1800 kHz
Unable to match requested speed 2000 kHz, using 1800 kHz
adapter speed: 1800 kHz
(gdb) b main
Breakpoint 1 at 0x800060a: file b__server_wb55x.adb, line 276.
(gdb) b __gnat_last_chance_handler
Breakpoint 2 at 0x800065a: file /.share/CACHEDEV1_DATA/Ada/STM32/WB/lch_sfp/led/last_chance_handler.adb, line 48.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, main () at b__server_wb55x.adb:276
276       Ensure_Reference : aliased System.Address := Ada_Main_Program_Name'Address;
(gdb) 

To flash the MB1293C attach using the other text line of st_nucleo_wb.tcl and use a similar flashing process except flash the client_wb55


Thursday, June 13, 2019

Ada on the STM32WB

STM32WB

This is a new part family that merges an L series microcontroller with the BlueNRG-MS controller realized now as a single die. The microcontroller can run at 64Mhz has 1MB of flash and 256KB of ram. The wireless portion (I say wireless since its not limited to BT/BLE but also supports Thread and Zigbee), runs on a Cortex-M0+ controller and shares the top of the 1MB of flash for its (encrypted) FW. The M0's FW size seems about 256KB or so. The CM4F and CM0+ communicate by using some of the 256KB of ram as a mailbox. This is supported by HW (IPCC) that signals interrupts from one side to the other when data is ready. In the past, as on the SensorTile, the BlueNRG-MS was connected by SPI and that was the transport, now its done by mailbox and hw signalling.

Ada

Can we do it again? Can we get Ada running usefully on an WB series part with the added burden of the new control over BT/BLE? I will save the readers time here to say, yes, its possible! further, its working. Here is some interesting info, Ada's package system cleanly separates modules from one another so, I was able to smoothly migrate the BT work over from the SensorTile to the WB just as I had envisioned. The trick here was to recognize that ST was not going to re-invent the wheel here, they would use 99% of the working BlueNRG-MS stack over on the WB. That means, opcodes are the same, events are the same. All the data structures that had been done for those were a drop in. This saves me months of weekend dev time. Of course, there are some differences, these are relatively minor wrt the BT messaging.

SVD

I carp about SVD files a bunch. To me I think they are the key to getting Ada going on a a new target. ST has been good over the years at generating SVD files. Why should the WB be any different? Well, it is different. There is no SVD file on their website as of this writing. (odd since they released one for the complex STM32MP157 with hundreds of blocks). I need an SVD file so what are we going to do to get Ada going on the WB55? Well, the reference manual defines all the regs... maybe... well that's what I did I cut and paste into block txt files all the reg defns, bugs and all (yes there are loads of datasheet bugs). Once all the hw blocks were assembled I then hacked a Ruby script to convert them into xml fragments and assembled the whole shebang as the SVD file. Its part of the dir trees called out below. svd2ada is happy with the file and I have been using it smoothly for bringup and for my usual technique of using SVD files to parse raw GDB dumps back to ascii dotted format for easy diffing (see my blog entry on that: http://www.hrrzi.com/2017/07/arm-cortex-svd-files-lot-of-goodness.html  ).

An update on the SVD topic, Pierre Le Corre of ST pointed out the the STM32CubeIDE has as subdir where the SVD file can be found. Indeed its there along with all of the F and L series parts.

STM32CubeIDE_1.0.1/STM32CubeIDE/plugins/com.st.stm32cube.ide.mcu.productdb.debug_1.0.0.201904021149/resources/cmsis/STMicroelectronics_CMSIS_SVD

Hardware & Demo FW

ST sells a nucleo board eval kit for the WB55. It has two boards in it. A nucleo board and a USB dongle. Out of the box they have a client server demo. The bigger board is the server and the dongle is a client. The dongle on a button press scans and connects to the server. Once connected, the button (SW1) toggles the blue LED on the other board. So, each can flip the led on the other one. SW2 on the bigger board changes the rate the radio refreshes. In this mode, the LED takes a little longer to toggle.

Ada client

I crafted a workalike of the ST client that runs on the dongle. Here is the larger nucleo running STs server code communing with my Ada client on the dongle.



The Ada client performs all the functions as stated in the readme ST provides:

 - The Peripheral device (BLE_p2pServer) starts advertising (during 1 minute), the green led blinks for each advertising event.
 - The Central device (BLE_p2pClient) starts scanning when pressing the User button (SW1) on the USB Dongle board. 
   - BLE_p2pClient blue led becomes on. 
   - Scan req takes about 5 seconds. *
   - Make sure BLE_p2pServer advertises, if not press reset button or switch off/on to restart advertising.
 - Then, it automatically connects to the BLE_p2pServer. 
   - Blue led turns off and green led starts blinking as on the MB1355C. Connection is done.
 - When pressing SW1 on a board, the blue led toggles on the other one.
   - The SW1 button can be pressed independently on the GATT Client or on the GATT Server.
 - When the server is located on a MB1355C, the connection interval can be modified from 50ms to 1s and vice-versa using SW2. 
 - The green led on the 2 boards blinks for each advertising event, it means quickly when 50ms and slowly when 1s. 
 - Passing from 50ms to 1s is instantaneous, but from 1s to 50ms takes around 10 seconds.
 - The SW1 event, switch on/off blue led, depends on the connection Interval event. 
   - So the delay from SW1 action and blue led change is more or less fast.

* I should say the Ada client scans faster since it abandons the scan when it finds the server.

Code

Order a WB55 nucleo board and get started with Ada running a BLE stack!

## Building on Linux gnat2018 or gnat2019 needs to be installed first
git clone https://github.com/morbos/Ada_Drivers_Library.git
git clone https://github.com/morbos/embedded-runtimes.git
git clone https://github.com/morbos/STM32.git
mv ../embedded-runtimes Ada_Drivers_Library
cd STM32/WB/WB55/client_wb55
make

Flashing & Debugging

To flash the code to the USB dongle, openocd needs to be used. First there is a hookup:


Four wires from the ST Link V2.0 over to the USB dongle. If you zoom in a bit you can id the wires that need to go where.

Next you need my version of openocd:


I built it on a RaspberryPi3 as so:

./configure --enable-ftdi --enable-stlink --enable-ti-icdi --enable-jlink

Then make

Add other --enable-xyz's if you have other targets not called out above.

Finally run it. I usually cd to the tcl dir:

../src/openocd -f board/st_nucleo_wb55.cfg
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 2000 kHz
adapter_nsrst_delay: 100
none separate
none separate
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : clock speed 1800 kHz
Info : STLINK v2 JTAG v17 API v2 SWIM v4 VID 0x0483 PID 0x3748
Info : using stlink api v2
Info : Target voltage: 3.248645
Info : stm32wb.cpu: hardware has 6 breakpoints, 4 watchpoints

Future work

This is a work in progress.. I plan to swap out the server on the larger nucleo board with an Ada version. Stay tuned.






Thursday, March 7, 2019

Shift on Mips-X

Mips-X


Mips-X as described earlier on the blog was a Stanford University grad project. A 32bit RISC CPU with some unique features for one, it had 2 delay slots for control change instructions, branches and jumps. I am not aware of any other processor that has that. We had a visit from John Hennessy (Stanford Mips project faculty lead and ultimately university president) one day (not Mips-X related) and I asked him, "why two delay slots?" his paraphrased answer was "It was a graduate project, we were just trying things out".

The Shifter

Mips-X had a barrel shifter and exposed it to the programmer via these opcodes:

asr    rSRC,rDST,#1..32
rotlb  rSRC1,rSRC2,rDST
rotlcb rSRC1,rSRC2,rDST
sh     rSRC1,rSRC2,rDST,#1..32


Via a combination of the above, all the needed shift operations could be done. Observe though there is no variable shift, just fixed # shift values.

My Shift function

Now here is a good puzzle for the reader to parse my variable shift func for lsr.s.

r0 == 0 -- can be a src or dst
r24 is the code segment offset (allows for position independent code off of r24).
r4 is the value to be shifted.
r5 has the #<shift>
r2 is the result.
r31 is the return address

.text
.noreorg
shift_table:
        mov     r4,r2
        lsr     r4,r2,#1
        lsr     r4,r2,#2
        lsr     r4,r2,#3
        lsr     r4,r2,#4
        lsr     r4,r2,#5
        lsr     r4,r2,#6
        lsr     r4,r2,#7
        lsr     r4,r2,#8
        lsr     r4,r2,#9
        lsr     r4,r2,#10
        lsr     r4,r2,#11
        lsr     r4,r2,#12
        lsr     r4,r2,#13
        lsr     r4,r2,#14
        lsr     r4,r2,#15
        lsr     r4,r2,#16
        lsr     r4,r2,#17
        lsr     r4,r2,#18
        lsr     r4,r2,#19
        lsr     r4,r2,#20
        lsr     r4,r2,#21
        lsr     r4,r2,#22
        lsr     r4,r2,#23
        lsr     r4,r2,#24
        lsr     r4,r2,#25
        lsr     r4,r2,#26
        lsr     r4,r2,#27
        lsr     r4,r2,#28
        lsr     r4,r2,#29
        lsr     r4,r2,#30
        lsr     r4,r2,#31
.globl ___lshrsi3
___lshrsi3:
        nop
        add     r24,r5,r1
        jspci   r1,#shift_table,r0
        jspci   r31,#0,r0
        nop
        nop
.end


Look at the two jspci's above. A jspci in the delay slot of a jspci! What happens? Also observe the nop at function entry. Why is that there? Well, this func's caller could have had a LD of r5 in the second delay slot of the jspci. In that case, if add were the first instruction, r5 would be stale as LD's have a one instruction hazard.


        jspci   r24,#___lshrsi3,r0
        nop
        ld      0[r29],r5
___lshrsi3:
        add     r24,r5,r1

That is a hazard as r5 is still in transit in the pipeline when the add goes to use it. Thus the nop.

Tuesday, November 20, 2018

California fires + Air Quality + Ada + Bluepill + PMS7003

Camp fire

The Camp fire has had a big effect on the Air Quality Index (AQI) in the Bay Area over the past week or so. AQI is typically measured as an entry in one of 7 bands:


To compute the AQI (typically cited from the PM2.5 #) a formula is used. See the wiki page above.

The PM2.5# is in µg/mfortunately there is a Chinese sensor, the PMS7003 that can provide PM1, PM2.5 and PM10 in those units:


So we have a sensor, we already have a Bluepill running Ada with an SPI screen to display the results. 


The sensor is a $20 eBay purchase. Bluepills are < $2 as documented on this blog, screen is a little over $3. So for $25 or so you can build your own AQI meter.

Code is in: 



Thursday, May 10, 2018

Bluepill+: Diagnosing bad solder joints


If your like me... a FW engineer attempting to do soldering then this might sound familiar. I have been replacing Bluepill STM32F103C8's with STM32L443CC chips. Its a hack as I do it as I use hot air for the remove and then residual pad solder for the stick down of the replacement.

Not surprisingly, some pads don't get good adhesion. Worse, some pads don't show this lack of adhesion until much later when you need the pad to do something useful.

The STM32 series are quite good, I think all user pads can be made into GPIOs. What if we wrote a program that toggled each pad every second and then walked the pins with a meter. So that program was written.

https://github.com/morbos/STM32/blob/master/L/L443/pinny/src/pinny.adb

That's great! Take a look at that code for the abstraction on the GPIOs that the Ada Device Library provides with

type GPIO_Points is array (Positive range <>) of GPIO_Point;

The magic decl in pinny is:

BP_Points : GPIO_Points :=
PB0 & PB1 & PB3 & PB4 & PB5 & PB6 & PB7 & PB8 &
PB9 & PB10 & PB11 & PB12 & PB13 & PB14 & PB15 &
PA0 & PA1 & PA2 & PA3 & PA4 & PA5 & PA6 & PA7 &
PA8 & PA9 & PA10 & PA11 & PA12 & PA15 &
PC13 & PC14 & PC15;


& concatenates items together, in this case of GPIO_Point making the init of the array BP_Points.

After that, we can just say:

Toggle(BP_Points);

The lib has code to accept the array and it will walk each element and toggle each one.

So that is every useful pin on a Bluepill highlighted above in the decl. With the abstraction of the pins to be a single element, initialization and group change becomes trivial and a small amount of user code.

A corridor conversation with a co-worker, Tyson Leistiko followed. As I described my software he mentioned I should group the GPIOs in such a way that neighboring pins toggle asymmetrically and in that way look for non 3v, 0v pins indicating some form of pin bridging. So pinny was adapted easily to add two hand picked sets that reflect the property of being neighbors to the other set.

As so:

Set1 : GPIO_Points :=
PC13 & PC15 & PA1 & PA4 & PA6 & PB0 & PB10 & PB12 &
PB14 & PA8 & PA10 & PA12 & PB3 & PB5 & PB7 & PB9;
and:

Set2 : GPIO_Points :=
PC14 & PA0 & PA2 & PA3 & PA5 & PA7 & PB1 & PB11 &
PB13 & PB15 & PA9 & PA11 & PA15 & PB4 & PB6 & PB8;

Post those decl's we can preload the sets as so:


Clear (Set1);
Set (Set2);

Then both pinny programs go into a loop allowing you to probe all the pins for the 3v -> 0v -> 3v .. pattern.

loop
Toggle (BP_Points);
delay until Clock + Milliseconds (1000);
end loop;

https://github.com/morbos/STM32/blob/master/L/L443/pinny2/src/pinny2.adb

It should be said my friend Eric Schlaepfer mentioned that the EE way of doing this is to probe
the ESD diodes and look for pad to pin connectivity that way. My solution: a FW engineer caused the
problem, a FW way was used to repair the problem 😀

Saturday, April 21, 2018

Tracing a Bluepill

Did you know the lowly STM32F103C8 has non-invasive trace logic along for the ride? Did you also know that via program control:

1) Trace can be enabled out to a pin on the board (PB3).
2) 32 channels can be used to emit 32, 16 or 8bit (char) values from running code. Useful to monitor values from your code.
3) A cycle count can be emitted
4) Data Watchpoints can also be reported. See a picture below.



Showing SysTick exception handling:

Showing a sensor range value being emitted on channel 16 (that's the max 1FFE in mm's btw)

Showing a watchpoint on the read modify write of the Green LED on PC13 (HW watchpoint on: BSRR @0x40011010).

Also an Overflow is seen (first time I have seen that actually). Can be mitigated by bumping the output rate.



Bluepill shown with a VL53L10X range sensor. The trace is the yellow wire connected to TRACESWO PB3 and green wire is connected to GPIO1 from the sensor. That is plumbed up to go low when a new sample is ready. Based on the range value, the green LED is lit or not. (a version of this is used in my garage to detect if the car is far enough along to close the door w/o striking the rear or the vehicle).

Use

So a board that can cost $1.27 on Taobao has all this and more.

One omission on such an inexpensive design is ETM (Embedded Trace Macrocell). This is featured on many of the larger ST designs. Its easy to check if it exists as the register space at E0041000 will have non-zero contents. On a bluepill, that space is all zero.

No matter, Instruction Trace Macrocell (ITM) is available. This can report the PC of the running program at a sub-sampled rated.

ITM also reports when the code dips into the exception handler as shown above.

Usually though when configured, the diet is a steady stream of PC's.

Some observations of this technology.

1) You need background/sample code. A hackaday post in 2015 on the great work by Petteri Aimonen:
a)  Showing how enable the Trace Port Interface Unit TPIU/ITM and ETM
b) Adding .py scripts to Pulseview to permit GUI viewing of trace output as seen above in the images.
2) ETM, Petteri used an F1 Value Line part per his blog. I had no luck getting ETM data out of that part.

Some observations:1) You must have a debug connection to the target to get SWO to emit trace.
2) It's possible to muck around with rate settings. You can get a rate that is unreal, I had a Bluepill+ with trace @80Mhz coupled with a minimally subdivided PC. You get a firehose of PCs coming out. Perhaps one very 20-30instrs or so. Amazing.
3) My code is a translation of Petteri's C example into Ada.

https://github.com/morbos/STM32/tree/master/F/F103/vl53l0x_trace_f103


Using Pulseview

The pretty picture above is done with Petteri's work. You can install Pulseview for Linux or Windows (also you can build the source (as I did to a Raspberry pi3)). The trick to decoding the trace is to have a UART capture of PB3 off of the traceswo port. Get that UART capture into Pulseview. For my path, I use an unsupported logic capture, the Salae logic pro 16. No matter, it exports binary and then its a simple matter to convert it to a .sr file that Pulseview can read:

Hedley@FASTER ~/trace
$ split --numeric-suffixes=1 --suffix-length=1 --bytes=4M untitled.bin logic-1-

Hedley@FASTER ~/trace
$ zip trace.sr version metadata logic-1-*
updating: version (stored 0%)
updating: metadata (deflated 8%)
updating: logic-1-1 (deflated 97%)
updating: logic-1-2 (deflated 97%)
updating: logic-1-3 (deflated 97%)
updating: logic-1-4 (deflated 97%)
updating: logic-1-5 (deflated 97%)
updating: logic-1-6 (deflated 97%)
updating: logic-1-7 (deflated 100%)
updating: logic-1-8 (deflated 100%)
updating: logic-1-9 (deflated 100%)

Hedley@FASTER ~/trace
$ cat metadata
[global]
sigrok version=0.3.0

[device 1]
capturefile=logic-1
total probes=8
samplerate=50 MHz
probe1=SWO
probe2=GPIO1
unitsize=1

So once in Pulseview we can add a decoder to the UART. Just add ARM ITM and set the baudrate to your capture speed (in my case, 8Mhz). Then let Pulseview's decoder show you the trace. 

Notes:

https://hackaday.com/2015/03/09/execution-tracing-on-cortex-m-microcontrollers/