Code to produce video signals on the STM32L Discovery

I’ve finally coded the program that produces a video signal.  There weren’t too many surprises.

One issue I had was that occasionally a line would be drawn slightly to the right.  I changed the interrupt priority from 7 to 0, and disabled ChibiOS’ thread preemption. Changing the interrupt priority alone would probably have been enough, but I had no trouble after that.

A bigger problem is that any vertical line wriggles around on the screen a lot.  I don’t think there’s anything in my code that might cause this.  I came across a question on the STM site, which suggests that an instruction must complete before an interrupt will be triggered.  If the CPU happens to be running a long instruction, the line produced by the following interrupt will be shifted over slightly.  The Arduino TV-Out library doesn’t have this problem, even though the AVR’s instructions can take different amounts of time.  I’m not sure what I can do about this – the timer should still be correct, so maybe I’d need a busy-wait loop that waits for the timer to hit a particular value.  It looks like the TV-Out library does this.  It might need some assembler, which I don’t plan to learn right now (but maybe check this ST forum post).  The author of the RBox, however, suggests it’s because of the wait states.  I don’t know why there would be anything non-deterministic about wait states, unless there was caching involved, which I don’t think the STM32 has.  I suspect it’s ChibiOS running stuff during the timing interrupt, which would cause jitter.
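The busy-wait idea mentioned above could look roughly like the following. This is only a sketch under assumptions: `wait_for_count` is a hypothetical helper (not part of ChibiOS or the TV-Out library), it assumes the timer counts upwards, and it ignores counter wrap-around. The interrupt would fire a little early, then spin until the counter reaches a fixed value before starting the line.

```c
#include <stdint.h>

/* Spin until a free-running, up-counting timer reaches `target`, then
 * return the count that was actually read.
 * `counter` points at the timer's count register; on the STM32L this
 * might be &TIM11->CNT (an assumption, not taken from the original code). */
uint32_t wait_for_count(const volatile uint32_t *counter, uint32_t target)
{
    uint32_t now;
    do {
        now = *counter;      /* volatile read, so it is re-read every pass */
    } while (now < target);
    return now;
}
```

The loop itself takes a few cycles per pass, so this can only reduce the jitter to the length of one pass rather than remove it, which is probably why a cycle-exact version would need assembler.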

I don’t plan to do any more work on this program, since it’s served its purpose of teaching me about ARM processors.  It has the potential to show data it’s capturing, or to interact with an operator.  There’s plenty of memory for a framebuffer for its 400×288 display, and it should be fairly easy to port the TVOut library to it, to add graphics and text rendering capabilities.  The advantage of the ARM chip is that because it uses DMA to write to the screen, the CPU is doing almost nothing while it’s displaying an image.  An AVR needs to work hard while a line is being drawn.

I’ve seen one project where an ARM chip produced colour signals.  That CPU didn’t have DMA, but it was faster than the STM32L.  The Freescale Freedom board looks like a good target (although ChibiOS doesn’t support it yet).  I was thinking about the way that 2D polygons are drawn, and I think it might be possible to render a number of 3D polygons with occlusion as each line is being drawn.  The unusual part of this rendering is that instead of the frame rate slowing when the CPU is busy, the vertical resolution would decrease, because the previous line would keep getting displayed while a new line is still being rendered.

Generating video signals like this is nifty, but maybe a bit pointless since there are chips around with composite output anyway.  The OLinuXino iMX233 would be ideal for this, as its CPU has a complete reference manual available.  It’s designed for running Linux, but some low level programming like I did here would provide an “instant-on” function.  The same could be done with the Raspberry Pi, but since there’s no user manual available, you’d need to rely on its limited documentation and Linux drivers.  I like the idea of porting the RTEMS operating system to the OLinuXino, since that OS provides a POSIX API and BSD networking, so porting other applications would be easier.

Here’s a video of my results.  My code is here: https://github.com/33d/stm32-video

Video signal output circuitry

I’ll need a simple circuit to mix my video signals together. The Arduino TV Out library shows how to do this, but that works with 5V IOs, whereas the STM32L Discovery (and all ARM chips AFAIK) uses 3.3V.

So which resistors will I need?  To produce a white signal, the sync and video lines will be high.  The equivalent circuit looks like this:

(The 75Ω resistor is the resistance inside the TV).

To show a black signal, the sync line will be high, and the video line will be low, which looks like this:

Wikipedia gives the formula for a voltage divider, so the resistors in the first diagram can be calculated with this formula:

$1= {75 \over {75+{1 \over { {1 \over V} + {1 \over S} }}}}\times 3.3$

and in the second:

$0.3= {{1 \over {{1 \over V} + {1 \over 75}}} \over {1 \over {{1 \over V} + {1 \over 75}}} + S} \times 3.3$

I tried solving these, but that’s well beyond my mathematical ability.  Instead I found some online site that could plot the two formulas (edit: I could have used Wolfram Alpha).  The lines crossed at about V=250Ω and S=580Ω. These aren’t standard resistor values, so V=270Ω and S=560Ω is close enough.  They seem to work fine in the circuit.
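As a cross-check on the chosen values, the two divider formulas can be evaluated directly. This is a minimal sketch: the function names are made up for illustration, `v` and `s` are the video and sync resistors in ohms, `vcc` is the 3.3V pin voltage and `load` is the 75Ω input of the TV.

```c
/* Parallel combination of two resistances, in ohms. */
double parallel(double a, double b)
{
    return (a * b) / (a + b);
}

/* Voltage at the TV input when the sync and video pins are both high
 * (white): the two resistors in parallel feed the 75 ohm load. */
double white_level(double v, double s, double vcc, double load)
{
    return vcc * load / (load + parallel(v, s));
}

/* Voltage at the TV input when sync is high and video is low (black):
 * the video resistor now sits in parallel with the load. */
double black_level(double v, double s, double vcc, double load)
{
    double lower = parallel(v, load);
    return vcc * lower / (lower + s);
}
```

With 270Ω and 560Ω this works out to roughly 0.96V for white and 0.31V for black, which is close to the 1V and 0.3V targets.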

DMA on the STM32L Discovery

There’s one more part to my video generator – the picture data, which I want to transfer to the SPI port using DMA. This actually looks fairly straightforward. These are the available settings:

MEM2MEM: I’m transferring from memory to a peripheral, so this should be off.
PL: I’ll make this “very high” priority, because I want to keep the picture stable at all costs. If a program writes to the framebuffer during this DMA transfer, it will be blocked.
MSIZE: I’ve set the SPI port to 8 bits so I’ll stick with that. I don’t think it will make any difference whether it’s 8 or 16.
MINC: I want the memory pointer to increment during the transfer.
PINC: I guess this should be off, because to write SPI you keep sending data to the same memory location.
CIRC: I don’t want the memory pointer to circle around.
DIR: Read from memory.
Interrupts: I won’t need any yet, but eventually I’ll have to turn off the SPI port at the end of the transfer, otherwise I’ll get white bars down the sides of the screen.

DMA_CNDTRx contains how much data to transfer. There are 7 channels, and table 40 of the reference manual says SPI2_TX is on channel 5, so I need DMA_CNDTR5. It needs to be set to the number of pixels / 8, since I’ll have 8 pixels in one byte.  There’s an “auto-reload” setting somewhere which resets this counter value after a transfer; I think this happens in circular mode.
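The transfer count and the start address of each line follow from the packing of 8 pixels per byte. Here is a small sketch of that arithmetic; `line_bytes` and `line_start` are hypothetical helpers, and a framebuffer laid out as consecutive lines is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define PIXELS_PER_BYTE 8u

/* Number of bytes (and so the DMA transfer count) for one line of
 * 1-bit pixels, rounded up to a whole byte. */
uint32_t line_bytes(uint32_t pixels_per_line)
{
    return (pixels_per_line + PIXELS_PER_BYTE - 1u) / PIXELS_PER_BYTE;
}

/* Address of the first byte of a given line, assuming the framebuffer
 * stores lines one after another with no padding. */
const uint8_t *line_start(const uint8_t *framebuffer,
                          uint32_t pixels_per_line, uint32_t line)
{
    return framebuffer + (size_t)line * line_bytes(pixels_per_line);
}
```

For a 400 pixel line this gives a count of 50, and the memory address for the DMA channel would move on by 50 bytes for each new line.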

Table 40 also suggests I must use DMA1 for these transfers.

The peripheral address register should point to the SPI data register (&(SPI2->DR)), and the memory register is the start of the current line of pixels.

That’s all of the available settings!  There’s one more thing to do though: section 10.3.7 says this:

The peripheral DMA requests can be independently activated/de-activated by programming the DMA control bit in the registers of the corresponding peripheral.

I guess this is the TXDMAEN bit in the SPI_CR2 register.

Now for some code… first I’ll make some data to send:

const uint8_t image[] = { 0xAA, 0x55, 0xAA, 0x55 };

Of course later on I’ll have a lot more data…

Now to set the above settings:

  DMA1_Channel5->CCR = DMA_CCR5_PL    // very high priority
                     | DMA_CCR5_MINC  // memory increment mode
                     | DMA_CCR5_DIR;  // read from memory, not peripheral

Section 10.3.3 has this useful bit of information:

The first transfer address is the one programmed in the DMA_CPARx/DMA_CMARx registers. During transfer operations, these registers keep the initially programmed value. The current transfer addresses (in the current internal peripheral/memory address register) are not accessible by software.

This suggests that I only need to set these at the start and shouldn’t need to touch them again.

To set these:

  DMA1_Channel5->CMAR = (uint32_t) image;       // where to read from
  DMA1_Channel5->CPAR = (uint32_t) &(SPI2->DR); // where to write to

Time to try it out… and… nothing!  Maybe there’s another clock setting for DMA, and sure enough there is:

  rccEnableAHB(RCC_AHBENR_DMA1EN, 0); // Enable DMA clock, run at SYSCLK

I still haven’t got anything, so I tried setting the source and destination registers each time before I start a DMA transfer. It looks like now I get a single transfer, but I’m trying to get a transfer on every hsync.

I poked around with the debugger, especially at 0x40026058 which is DMA1_Channel5->CCR (I calculated the address from values in stm32l1xx.h), and noticed that the Enable flag is still set.  Maybe it has to be toggled each time?  Now I get a square wave instead of my data… I then tried decreasing my hsync timer, and decreasing the SPI speed, and I got a reasonable output.  I’m getting some nasty aliasing on my DSO Nano though, maybe I should have borrowed a faster scope!  I think I was triggering the DMA transfers too quickly, which produced that square wave.  Conveniently, I notice the SPI line is now low when it’s idle, which is the output I want.  I’m not sure why it’s gone low, but I’m not complaining.
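The address poked at with the debugger can be rebuilt from the register layout. The sketch below assumes the DMA1 base address and channel offsets as I understand them from the STM32L1 header and reference manual (the macro and function names here are made up); the result for channel 5 matches the address quoted above.

```c
#include <stdint.h>

/* Assumed values: DMA1 base address, offset of channel 1's registers,
 * and the spacing between consecutive channels. */
#define DMA1_BASE_ADDR      0x40026000u
#define DMA_FIRST_CHANNEL   0x08u
#define DMA_CHANNEL_STRIDE  0x14u

/* Address of the CCR register for DMA1 channel 1..7. */
uint32_t dma1_ccr_address(unsigned channel)
{
    return DMA1_BASE_ADDR + DMA_FIRST_CHANNEL
         + DMA_CHANNEL_STRIDE * (channel - 1u);
}
```

In GDB the same register can then be inspected with the x command at the computed address.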

So to sum up:

  rccEnableAHB(RCC_AHBENR_DMA1EN, 0); // Enable DMA clock, run at SYSCLK
  // Configure DMA
  DMA1_Channel5->CCR = DMA_CCR5_PL    // very high priority
                     | DMA_CCR5_MINC  // memory increment mode
                     | DMA_CCR5_DIR;  // read from memory, not peripheral
  DMA1_Channel5->CMAR = (uint32_t) image;       // where to read from
  DMA1_Channel5->CPAR = (uint32_t) &(SPI2->DR); // where to write to
  ...
  SPI2->CR2 = SPI_CR2_SSOE | SPI_CR2_TXDMAEN;

then in my hsync handler:

  // Activate the DMA transfer
  DMA1_Channel5->CCR &= ~DMA_CCR5_EN;
  DMA1_Channel5->CNDTR = sizeof(image);
  DMA1_Channel5->CCR |= DMA_CCR5_EN;

I didn’t need to reset CMAR and CPAR after all.

I think that’s now demonstrated everything I need for the video signal generator! My code needs a big cleanup, and I’d like to use ChibiOS functions where I can (palSetPadMode instead of messing around with memory locations and data structures, etc).

Picture data using SPI

I plan to use SPI to send the picture data for my video generator.

First I need to work out what speed to run the port at.  Each line goes for 52 μs, or 1664 cycles at 32MHz.  I could divide this by 4 for 416 pixels per line or 8 for 208 per line.  This sets the baud rate, so I shouldn’t need to divide this by 8 again to get a bytes per second speed.  It looks (from the clock registers) like SPI1 is connected to the APB2 clock, and SPI2 is connected to APB1.  I’m already running APB1 at the system clock (32MHz), so I’d like to use that if I can.  The speed is set in the CR1 register, by the BR bits, which support dividing by 4 or 8.  I might as well use SPI2.  The datasheet says that SPI2_MOSI can only be on pin B15.  I won’t need the clock output, so I won’t configure a pin for that.
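The pixel counts above, and the BR field value that selects a divider, can be worked out as follows. This is a sketch with hypothetical helper names; it assumes the peripheral clock is a whole number of MHz and that the BR field encodes dividers 2, 4, 8 … 256 as 0 to 7.

```c
#include <stdint.h>

/* Pixels that fit in the visible part of a line when one SPI bit is one
 * pixel.  Assumes pclk_hz is a whole number of MHz. */
uint32_t pixels_per_line(uint32_t pclk_hz, uint32_t divider,
                         uint32_t visible_us)
{
    return (pclk_hz / 1000000u) * visible_us / divider;
}

/* Value of the 3-bit BR field for a power-of-two divider:
 * 0 => /2, 1 => /4, 2 => /8, ... 7 => /256.
 * Returns -1 for a divider the field cannot express. */
int spi_br_field(uint32_t divider)
{
    uint32_t d = 2u;
    for (int field = 0; field < 8; field++) {
        if (d == divider) {
            return field;
        }
        d *= 2u;
    }
    return -1;
}
```

A divider of 4 gives field value 1, which is the SPI_CR1_BR_0 bit on its own, and a divider of 256 gives 7, which is all three BR bits set.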

The CR1 register contains a setting for 8 or 16-bit operation.  This affects the size of the data being written.  Since I plan to use DMA I’ll leave it at 8 bits.

It turns out there are very few settings to get SPI working.  I had to stuff around a lot before I got it working though – eventually I copied the ChibiOS code, and set the SPI_CR1_CPOL, SPI_CR1_SSM, SPI_CR1_SSI and SPI_CR2_SSOE flags even though I wouldn’t have thought I need them, and it suddenly worked!

This was enough to get SPI working:

  rccEnableAPB1(RCC_APB1ENR_SPI2EN, 0); // Enable SPI2 clock, run at SYSCLK
PAL_STM32_OSPEED_HIGHEST);           /* MOSI.    */
SPI2->CR1 = //SPI_CR1_BR_0 // divide clock by 4
    SPI_CR1_CPOL | SPI_CR1_SSM | SPI_CR1_SSI
    | SPI_CR1_BR     // divide clock by 256
    | SPI_CR1_MSTR;  // master mode
SPI2->CR2 = SPI_CR2_SSOE;
SPI2->CR1 |= SPI_CR1_SPE; // Enable SPI

To send data, write bytes to SPI2->DR.  The output appears on PB15.  I think in the future I’ll try using palSetPadMode for configuring the pins, since it’s better than the 8 lines of code I’ve been using previously to do this.  The above code divides the clock by 256 so I could see the output on my DSO Nano, but I’ll change this to 4 later.

The next step will be using DMA to write the data to SPI instead.

Coding PWM on the STM32L

So how about coding up the PWM to produce the hsync signal from my previous post?  Here’s my first attempt:

#include "ch.h"
#include "hal.h"
#include "stm32l1xx.h"

int main(void) {
    halInit();
    chSysInit();

    rccEnableAPB2(RCC_APB2ENR_TIM11EN, 0); // Enable TIM11 clock, run at SYSCLK

    // TIM11 outputs on PA7, PB9 or PB15
    GPIOB->MODER &= ~GPIO_MODER_MODER9;
    GPIOB->MODER |= GPIO_MODER_MODER9_1; // alternate function on pin B9
    TIM11->CR1 = TIM_CR1_ARPE // buffer ARR, needed for PWM (?)
        | TIM_CR1_CEN; // counter enable... probably important!
    TIM11->CCMR1 &= ~(TIM_CCMR1_CC1S); // configure output pin
    TIM11->CCMR1 =
        TIM_CCMR1_OC1M_2 | TIM_CCMR1_OC1M_1 | TIM_CCMR1_OC1M_0;  // output high on compare match
    TIM11->CCER = TIM_CCER_CC1P // active low output
        | TIM_CCER_CC1E; // enable output
    TIM11->ARR = STM32_SYSCLK * 0.000064;   // horizontal line duration
    TIM11->CCR1 = STM32_SYSCLK * 0.0000047; // hsync pulse duration

    TIM11->CR1 |= TIM_CR1_CEN; // enable the counter

    while (1) {
        chThdSleepMilliseconds(500);
        chThdSleepMilliseconds(500);
    }
}
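The two timer values in the listing come from multiplying the clock by a duration. A sketch of the same conversion in integer arithmetic, which avoids any floating-point rounding, could look like this (`timer_ticks` is a hypothetical helper):

```c
#include <stdint.h>

/* Convert a duration in nanoseconds into timer ticks at clock_hz,
 * rounding down.  The 64-bit intermediate avoids overflow. */
uint32_t timer_ticks(uint32_t clock_hz, uint32_t duration_ns)
{
    return (uint32_t)(((uint64_t)clock_hz * duration_ns) / 1000000000u);
}
```

At 32MHz, a 64μs line is 2048 ticks and a 4.7μs sync pulse is 150 ticks. On STM32 timers the counter runs from 0 up to ARR inclusive, so an exact period of N ticks would normally be programmed as ARR = N − 1.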

I’ll attach my DSO Nano to PB9, and… nothing!  I don’t think it’s the DSO; that should be good to a few μs.

So why isn’t it running? Maybe the timer is running, but the pin output isn’t.  I started OpenOCD, and attached GDB to it.  The datasheet says that TIM11 is at 0x40011000, and the reference manual says the counter is at offset 0x24.  With this information, I can look at the contents of the timer – the GDB manual says I can look at memory using the “x” command:

$ arm-none-eabi-gdb build/ch.elf
GNU gdb (GNU Tools for ARM Embedded Processors) 7.3.1.20120613-cvs
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-linux-gnu --target=arm-none-eabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/damien/projects/stm32/video/build/ch.elf...done.
(gdb) target remote :3333
Remote debugging using :3333
ResetHandler () at /opt/ChibiOS_2.4.2/os/ports/GCC/ARMCMx/crt0.c:262
262	asm volatile ("cpsid i");
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x08000588 in _idle_thread (p=<optimized out>) at /opt/ChibiOS_2.4.2/os/kernel/src/chsys.c:62
62	chRegSetThreadName("idle");
(gdb) x 0x40011024
0x40011024:	0x000002ed
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x08000588 in _idle_thread (p=<optimized out>) at /opt/ChibiOS_2.4.2/os/kernel/src/chsys.c:62
62	chRegSetThreadName("idle");
(gdb) x 0x40011024
0x40011024:	0x0000067c
(gdb) x 0x40011024
0x40011024:	0x000005b8
(gdb) x 0x40011024
0x40011024:	0x0000050c
(gdb) x 0x40011024
0x40011024:	0x00000774
(gdb)

The value is changing – that’s a good start! Note that I didn’t have to continue the program to see the timer incrementing – it runs even though the debugger has stopped execution! Also, the timer never seems to set the top 4 bits, which suggests it’s resetting at some value before it overflows its 16 bits. This maximum value should be 32MHz*64μs=0x800, so it’s looking good.

There’s an application note discussing the timers in the STM32 chips, and reading that closely it looks like I’ve been confusing the “output compare” mode and the PWM mode.
I think in the AVR these are essentially the same, but in the STM32 there doesn’t seem to be a way to reset the pin upon overflow when in output compare mode (so this mode doesn’t seem to be that useful to me, apart from one-shot events). The ChibiOS PWM code is worth looking at too.

A few days have passed since I did the above bit… I read the reference on GPIOs a bit more. I would have guessed that since I told the chip to connect to the timer, the timer would look after that pin. But after reading section 6.3.2 of the datasheet about configuring pins for the alternate function, it looks like there’s an AFR register to set. Figure 18 suggests I need to set the AFRH register to “3” to enable the timer on this pin. I added these lines:

  GPIOB->AFRH &= ~GPIO_AFRH_AFRH9;
  GPIOB->AFRH |= 0x3 << 4; // ChibiOS doesn't seem to have constants for these

and finally I get some output! The timing looks right, but there’s some strange stuff happening with the amplitude of this waveform. I’m hoping it’s aliasing with my DSO, so hopefully I’ll get to use a CRO in a few days to confirm this. I made the timings a bit longer, and it didn’t look as bad, but maybe there’s some other aliasing going on with the clocks in the chip.

Another problem is that the waveform is the wrong way around! There seem to be two settings that affect this: the OC1M bits in CCMR1, which say whether the waveform is active when the counter is less than the compare register; the other specifies the polarity of the output. Maybe I only have to change one of these? It seems odd to have two registers which do mostly the same thing. I’ll change the polarity in CC1P. That’s looking better! I swapped CCER and OC1M around, and the output looked the same.

I noticed that the signal doesn’t rise very quickly. Looking through the GPIO registers again, I got the OTYPER setting wrong which made it open drain. Changing this around fixed this problem.
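The shift of 4 used for pin B9 follows from each pin having a 4-bit alternate function field, with pins 0 to 7 in AFRL and pins 8 to 15 in AFRH. A sketch of that calculation, with hypothetical helper names, might be:

```c
#include <stdint.h>

/* Which AFR register holds a pin's field: 0 = AFRL (pins 0-7),
 * 1 = AFRH (pins 8-15). */
unsigned afr_index(unsigned pin)
{
    return pin / 8u;
}

/* Bit position of a pin's 4-bit field within its AFR register. */
unsigned afr_shift(unsigned pin)
{
    return (pin % 8u) * 4u;
}

/* Return a copy of an AFR register value with the pin's field
 * replaced by alternate function number `af` (0-15). */
uint32_t afr_set(uint32_t afr, unsigned pin, uint32_t af)
{
    unsigned shift = afr_shift(pin);
    return (afr & ~((uint32_t)0xFu << shift)) | ((af & 0xFu) << shift);
}
```

For pin 9 this selects AFRH with a shift of 4, so alternate function 3 becomes 0x3 << 4, the same value written in the two lines above.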
So here’s my complete code:

  rccEnableAPB2(RCC_APB2ENR_TIM11EN, 0); // Enable TIM11 clock, run at SYSCLK
  // TIM11 outputs on PA7, PB9 or PB15
  GPIOB->OTYPER &= ~GPIO_OTYPER_OT_9; // Push-pull output
  GPIOB->OSPEEDR |= GPIO_OSPEEDER_OSPEEDR9; // 40MHz
  GPIOB->PUPDR &= ~GPIO_PUPDR_PUPDR9;
  GPIOB->PUPDR |= GPIO_PUPDR_PUPDR9_0; // Pull-up
  GPIOB->MODER &= ~GPIO_MODER_MODER9;
  GPIOB->MODER |= GPIO_MODER_MODER9_1; // alternate function on pin B9
  // Reassign port B9
  GPIOB->AFRH &= ~GPIO_AFRH_AFRH9;
  GPIOB->AFRH |= 0x3 << 4; // ChibiOS doesn't seem to have constants for these
  TIM11->CR1 |= TIM_CR1_ARPE; // buffer ARR, needed for PWM (?)
  TIM11->CCMR1 &= ~(TIM_CCMR1_CC1S); // configure output pin
  TIM11->CCMR1 = TIM_CCMR1_OC1M_2 | TIM_CCMR1_OC1M_1 // output high on compare match
               | TIM_CCMR1_OC1PE; // preload enable
  TIM11->CCER = TIM_CCER_CC1P // active low output
              | TIM_CCER_CC1E; // enable output
  TIM11->ARR = STM32_SYSCLK * 0.000064; // horizontal line duration
  TIM11->CCR1 = STM32_SYSCLK * 0.0000047; // hsync pulse duration
  TIM11->CR1 |= TIM_CR1_CEN; // enable the counter

Next I need an interrupt when the signal goes low, so I can adjust the signal timings for the vertical sync.

Timing video signals from the STM32L Discovery

In my last post, I suggested using ChibiOS to produce video signals from the STM32L Discovery. The configuration for ChibiOS is held in chconf.h. An interesting section is this one:

/**
 * @brief   System tick frequency.
 * @details Frequency of the system timer that drives the system ticks. This
 *          setting also defines the system tick time unit.
 */
#if !defined(CH_FREQUENCY) || defined(__DOXYGEN__)
#define CH_FREQUENCY                    1000
#endif

It looks like the operating system wakes up periodically, checking whether there’s anything to do. It also means that to produce video signals, this number may not be accurate enough. So what is this number used for?
The only interesting reference I can find is in os/hal/platforms/STM32L1xx/hal_lld.c:

  SysTick->LOAD = STM32_HCLK / CH_FREQUENCY - 1;

and of course that’s where the system ticks are initialized. STM32_HCLK looks interesting; there are plenty of references to this in os/hal/platforms/STM32F4xx/hal_lld.h:

/**
 * @brief   AHB frequency.
 */
#if (STM32_HPRE == STM32_HPRE_DIV1) || defined(__DOXYGEN__)
#define STM32_HCLK                  (STM32_SYSCLK / 1)
#elif STM32_HPRE == STM32_HPRE_DIV2
#define STM32_HCLK                  (STM32_SYSCLK / 2)
...

I remember seeing the text AHB before; this is the bus that connects the CPU to the GPIO ports and other peripherals. This code suggests that it’s related to the CPU clock via a prescaler, which the clock tree in the reference manual confirms. This led me here:

/**
 * @brief   System clock source.
 */
#if STM32_NO_INIT || defined(__DOXYGEN__)
#define STM32_SYSCLK                STM32_HSICLK
#elif (STM32_SW == STM32_SW_HSI)
#define STM32_SYSCLK                STM32_HSICLK
...

so STM32_SYSCLK is the system clock, and we can choose the source for this. “HSI” would be the High Speed Internal clock, which is fixed at 16MHz. It’s possible to use the PLL to run the CPU at 32MHz too.

So working backwards, with the default setting, STM32_HCLK is 16MHz, and ChibiOS’ default tick frequency is 1000Hz, so SysTick fires every 16,000 cycles, which is 1ms. For PAL, the sync pulse length is 4.7µs, and the front porch is much shorter than that, so the ChibiOS tick is far too coarse for that. I could raise the tick frequency, but there’s the risk that ChibiOS won’t have enough time to do its scheduling between ticks, and the tick still wouldn’t be accurate enough.

While I’m rummaging around the ChibiOS code, what does it use to trigger its scheduler? SysTick never seems to be read anywhere, but looking at SysTick_Type in os/ports/common/ARMCMx/CMSIS/include/core_cm3.h it looks like some part of the address space, specifically at address 0xE000E010. The datasheet says this is part of the “Cortex-M3 Internal Peripherals”, but that’s all it says.
The CPU manual might be more helpful here, and it says this is part of the “System Control Space”. Section 3.1.1 says this address contains the “SysTick Control and Status Register”, and the following registers correspond to the SysTick variable in ChibiOS. So what is the SysTick for? Section 5.2 suggests that an interrupt can be triggered on the SysTick firing, which might be what ChibiOS uses for its scheduling.

(WordPress didn’t save my draft from here, so I might be missing a few steps.)

So where is the handler for this? Earlier I found that each program starts with an interrupt table. The example there has 4 entries, but it can be longer. The linker script (os/ports/GCC/ARMCMx/STM32L1xx/ld/STM32L152xB.ld) contains a section called “vectors”, which is defined in os/ports/GCC/ARMCMx/STM32L1xx/vectors.c. The SysTick handler is called SysTickVector, which looks like this (from os/ports/GCC/ARMCMx/chcore_v7m.c; I don’t know whether this is an arm6 or an arm7):

CH_IRQ_HANDLER(SysTickVector) {
  CH_IRQ_PROLOGUE();
  chSysLockFromIsr();
  chSysTimerHandlerI();
  chSysUnlockFromIsr();
  CH_IRQ_EPILOGUE();
}

So this is how the SysTick facility works. Now this can’t be used for generating the video signals, since it’s not accurate enough – I’d need to use another timer for that. The timer interrupt would need to be of higher priority than SysTick, otherwise the CPU might be doing something else which would make the image jump around. The ChibiOS docs suggest that interrupt handlers are like a special thread with higher priority than everything else, which is what I want.

All of this suggests that ARMs are a lot trickier than 8-bit CPUs, because of all of the available features. I don’t think I’ve even found all of the relevant documentation – with the AVR, one document contains everything you need to know.

Next I’ll look at how the timers work.
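The tick arithmetic from this post can be sketched as below. The reload expression mirrors the SysTick->LOAD line quoted from hal_lld.c; the helper names are made up for illustration, and CH_FREQUENCY is taken to be a frequency in Hz, as its comment in chconf.h describes it.

```c
#include <stdint.h>

/* Value loaded into the SysTick reload register for a given bus clock
 * and tick frequency, as in STM32_HCLK / CH_FREQUENCY - 1. */
uint32_t systick_reload(uint32_t hclk_hz, uint32_t tick_hz)
{
    return hclk_hz / tick_hz - 1u;
}

/* Length of one system tick in microseconds. */
uint32_t tick_period_us(uint32_t tick_hz)
{
    return 1000000u / tick_hz;
}
```

With a 16MHz clock and a tick frequency of 1000, the reload value is 15999 and each tick lasts 1000μs, which is several hundred times longer than a 4.7μs sync pulse.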
ChibiOS on the STM32L Discovery

I previously used the libraries provided by ST to do something on the STM32L Discovery, now I’ve made a short demo of using ChibiOS. Hopefully by using an OS to write code, I don’t have to mess around with timers myself to process events periodically, which I always found time consuming on the AVR. Here’s what I did:

1. Download and extract ChibiOS. I used version 2.4.2. I extracted the archive to /opt.
2. Make a copy of this directory: ChibiOS_2.4.2/demos/ARMCM3-STM32L152-DISCOVERY. This is what I edited for my demo.
3. Delete the keil and iar directories; they’re for different IDEs and we don’t need them.
4. Replace main.c with this:

#include <ch.h>
#include <hal.h>

int main(void) {
    halInit();
    chSysInit();
    palSetPadMode(GPIOB, 7, PAL_MODE_OUTPUT_PUSHPULL);
    while (1) {
        palSetPad(GPIOB, 7);
        chThdSleepMilliseconds(500);
        palClearPad(GPIOB, 7);
        chThdSleepMilliseconds(500);
    }
}

I adapted this from another blog.
5. In the Makefile, change the CHIBIOS variable to point to where you extracted ChibiOS.
6. Run make, then upload the binary to the Discovery. The correct compiler should be invoked if the Linaro bare-metal compiler is on your PATH, like I did for my first coding attempt.

It’s as simple as that – a light that flashes once a second!

So what does this code do? halInit() and chSysInit() seem to go at the start of any ChibiOS program. The HAL is what tries to abstract out the peripherals on each chip. The document about the architecture explains this some more. pal means “Port Abstraction Layer”, and functions relating to this start with pal. Functions starting with ch relate to the kernel. There are two reference manuals in ChibiOS: one for the kernel (the cross platform stuff, although there’s a separate manual for each compiler), and one for the peripherals (the more chip-specific stuff).

It looks like once you call chSysInit, that function continues to run as a thread.
I would hope because we’re calling a sleep function, the CPU is actually sleeping and not busy-waiting.

I’m surprised how easy this was to get running. The Makefile is very good – you don’t need to keep a copy of the entire operating system in your project directory; it will find it and compile the relevant parts. There’s no code directly accessing the hardware either, so the I/O code is no more complicated than Arduino code. Of course I’m expecting to have to learn more about ChibiOS and the STM32L as I make more complicated things.

I’d like to use ChibiOS to write firmware to drive a television from the Discovery, so I can learn about RTOS scheduling and DMA, which I plan to use to copy the framebuffer to the output. I imagine I’ll need to learn more about how clocks work to control the speed the data is written to the TV.

How the STM32L Discovery demo works

In my previous post, I got a basic program running on a STM32L Discovery board. Now I hope to work out how the program works. The program contained this data structure:

// Define the vector table
unsigned int *myvectors[4]
__attribute__ ((section("vectors"))) = {
    (unsigned int *) STACK_TOP,         // stack pointer
    (unsigned int *) main,              // code entry point
    (unsigned int *) nmi_handler,       // NMI handler (not really)
    (unsigned int *) hardfault_handler  // hard fault handler
};

This is a structure with four pointers. It also has this in front: __attribute__ ((section("vectors"))). The linker script contains a section with a similar name, and while I don’t know anything about linker scripts, it looks like it goes right at the start of the flash memory. In other words, these four 32-bit pointers look like the first 16 bytes of any program.

Is there any documentation that describes this? After suffering through ST’s “product selector”, I found the page for the CPU, where I found the reference manual. This is a bit like an AVR datasheet; it tells you all about the interfaces the chip has.
Since my program doesn’t talk to the outside world yet, this document isn’t terribly helpful; but it does point to a document from ARM about the CPU core. After searching for various terms in this document, I eventually found out that this table is called the “vector table” and is described in section 5.9.1. Although the table in the code is self-explanatory, it’s nice to find the reference to what exactly it does. The document also says there are other vectors that may appear after these, so that may be useful to know one day.

Now that I’ve started to find my way around the documentation, maybe I can go on to making the chip actually do something!

Getting started with an STM32L Discovery with Linux and GCC

I’ve got the “hello world” working on my STM32L Discovery board that I got about 8 months ago. It’s not even the canonical blinking light, but it counts up and you only know that it works by using a debugger! Another site gave me the basic idea, but I needed a few changes to get it working.

1. Download the Linaro bare metal ARM toolchain (it’s near the bottom of the page). Extract it somewhere (I put it in /opt).
2. Download and build OpenOCD. I’m using version 0.6.0. I used Checkinstall so I had a managed package:

tar -zxvf openocd-0.6.0.tar.gz
cd openocd-0.6.0
./bootstrap
./configure --prefix=/usr --enable-jlink --enable-amtjtagaccel --enable-ft2232_libftdi
make
sudo checkinstall make install

3. Now something to compile.
I used this:

// By Wolfgang Wieser, heavily based on:
// http://fun-tech.se/stm32/OlimexBlinky/mini.php

#define STACK_TOP 0x20000800   // just a tiny stack for demo

static void nmi_handler(void);
static void hardfault_handler(void);
int main(void);

// Define the vector table
unsigned int *myvectors[4]
__attribute__ ((section("vectors"))) = {
    (unsigned int *) STACK_TOP,         // stack pointer
    (unsigned int *) main,              // code entry point
    (unsigned int *) nmi_handler,       // NMI handler (not really)
    (unsigned int *) hardfault_handler  // hard fault handler
};

int main(void)
{
    int i=0;

    for(;;)
    {
        i++;
    }
}

void nmi_handler(void)
{
    for(;;);
}

void hardfault_handler(void)
{
    for(;;);
}

4. Build it:

arm-none-eabi-gcc -I. -fno-common -O0 -g -mcpu=cortex-m0 -mthumb -c -o main.o main.c

I believe the -O0 is to stop the compiler optimizing out the counting loop.
5. Now for linking. The script on the other site didn’t seem to work for me – when I started the debugger, it looked like it was trying to run code from memory address 0. From what I’ve seen, the flash actually lives at 0x08000000, which might explain the problem. I found another script at ChibiOS which seemed to work better. Download the script, then run the linker:

arm-none-eabi-ld -v -TSTM32L152xB.ld -nostartfiles -o demo.elf main.o

6. Now extract the binary image from the .elf:

arm-none-eabi-objcopy -Obinary demo.elf demo.bin

My binary is a whopping 52 bytes!
7. Before uploading the binary, the permissions on the Discovery board need changing, because only root can access it at the moment. Put this in /etc/udev/rules.d/90-stm32ldiscovery.rules:

ATTRS{idVendor}=="0483", ATTRS{idProduct}=="3748", MODE="0666"

This will give everyone write access to the Discovery. To apply the rules, run:

sudo service udev restart

8. Now to start OpenOCD, and upload the binary:

$ openocd -f /usr/share/openocd/scripts/board/stm32ldiscovery.cfg
Open On-Chip Debugger 0.6.0 (2012-09-15-16:06)
http://openocd.sourceforge.net/doc/doxygen/bugs.html
srst_only separate srst_nogate srst_open_drain
Info : clock speed 1000 kHz
Info : stm32lx.cpu: hardware has 6 breakpoints, 4 watchpoints

In another terminal:

$ telnet localhost 4444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> poll
background polling: on
TAP: stm32lx.cpu (enabled)
target state: halted
target halted due to breakpoint, current mode: Thread
xPSR: 0x01000000 pc: 0x0800001a msp: 0x200007f0
target state: halted
target halted due to breakpoint, current mode: Thread
xPSR: 0x01000000 pc: 0x0800001a msp: 0x200007f0
> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000010 msp: 0x20000800
> flash probe 0
flash size = 128kbytes
flash size = 128kbytes
flash 'stm32lx' found at 0x08000000
> flash write_image erase demo.bin 0x08000000
auto erase enabled
target state: halted
target halted due to breakpoint, current mode: Thread
xPSR: 0x61000000 pc: 0x20000012 msp: 0x20000800
wrote 4096 bytes from file demo.bin in 0.325034s (12.306 KiB/s)
> reset
target state: halted
target halted due to breakpoint, current mode: Thread
xPSR: 0x01000000 pc: 0x08000010 msp: 0x20000800
> exit
Connection closed by foreign host.

I don’t know what all of those commands do though!

9. Now to see whether the code is actually running:

$ arm-none-eabi-gdb demo.elf
GNU gdb (GNU Tools for ARM Embedded Processors) 7.3.1.20120613-cvs
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-linux-gnu --target=arm-none-eabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
(gdb) target remote :3333
Remote debugging using :3333
main () at main.c:21
21	{
(gdb) cont
Continuing.
^C
main () at main.c:26
26	        i++;
(gdb) print i
$3 = 496378
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
main () at main.c:26
26	        i++;
(gdb) print i
$4 = 903650
(gdb) quit
A debugging session is active.

Inferior 1 [Remote target] will be detached.

Quit anyway? (y or n) y
Ending remote debugging.

Yay, it looks like it’s running!

A program isn’t of much use if it can’t communicate outside of the chip, so driving I/O will be next. There look to be three options:

1. Write to the hardware directly.  This involves looking through the CPU’s user manual, and working out how to access the I/Os.
2. Use another library to access the hardware.  This is much like how you write AVR code – you access all of the I/Os through C library calls.  ST supplies a library; it doesn’t have a particularly nice license, but it’s probably a good starting point.
3. Use an operating system like ChibiOS, which has support for this board.  Having developed stuff for the AVR, I think it would be nice to have the resources of a real operating system – I wouldn’t have to worry about implementing scheduling and interrupts myself.

Hopefully one day I’ll try these out and get around to writing about the results!

My next post describes what the code in this example does.