Posts Tagged dma

DMA on the STM32L Discovery

There’s one more part to my video generator: the picture data, which I want to transfer to the SPI port using DMA. This actually looks fairly straightforward. These are the available settings in the channel configuration register (DMA_CCRx):

MEM2MEM: I’m transferring from memory to a peripheral, so this should be off.
PL: I’ll make this “very high” priority, because I want to keep the picture stable at all costs. As far as I can tell this only ranks the channel against other DMA channels; a program that writes to the framebuffer during the transfer shares the bus with the DMA and may be stalled briefly.
MSIZE: I’ve set the SPI port to 8 bits so I’ll stick with that. I don’t think it will make any difference whether it’s 8 or 16.
MINC: I want the memory pointer to increment during the transfer.
PINC: I guess this should be off, because to write to SPI you keep sending data to the same address (the SPI data register).
CIRC: I don’t want the memory pointer to wrap around.
DIR: Read from memory.
Interrupts: I won’t need any yet, but eventually I’ll have to turn off the SPI port at the end of the transfer, otherwise I’ll get white bars down the sides of the screen.

DMA_CNDTRx holds the number of data items to transfer. There are 7 channels, and table 40 of the reference manual says SPI2_TX is on channel 5. The count needs to be the number of pixels divided by 8, since I’ll have 8 pixels in one byte. There’s an “auto-reload” setting somewhere which resets this counter value after a transfer; I think this happens in circular mode.

Table 40 also suggests I must use DMA1 for these transfers.

The peripheral address register should point to the SPI data register (&(SPI2->DR)), and the memory register is the start of the current line of pixels.

That’s all of the available settings!  There’s one more thing to do though: section 10.3.7 says this:

The peripheral DMA requests can be independently activated/de-activated by programming the DMA control bit in the registers of the corresponding peripheral.

I guess this is the TXDMAEN bit in the SPI_CR2 register.
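
A minimal sketch of that step, assuming TXDMAEN is bit 1 of SPI_CR2 as in the ST headers. The spi_regs_t struct and spi_enable_tx_dma function are hypothetical stand-ins so that the snippet compiles on a PC; on the board the equivalent would presumably be a single line, SPI2->CR2 |= SPI_CR2_TXDMAEN.

```c
#include <stdint.h>

/* Assumption: TXDMAEN is bit 1 of SPI_CR2, matching stm32l1xx.h. */
#define SPI_CR2_TXDMAEN ((uint16_t)0x0002)

/* Hypothetical stand-in for the two SPI control registers used here. */
typedef struct {
    volatile uint16_t CR1;
    volatile uint16_t CR2;
} spi_regs_t;

/* Ask the SPI peripheral to raise a DMA request whenever its transmit
   buffer is empty, leaving the other CR2 bits untouched. */
void spi_enable_tx_dma(spi_regs_t *spi)
{
    spi->CR2 |= SPI_CR2_TXDMAEN;
}
```

Using |= rather than a plain assignment keeps whatever else is already configured in CR2.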

Now for some code… first I’ll make some data to send:

const uint8_t image[] = { 0xAA, 0x55, 0xAA, 0x55 };

Of course later on I’ll have a lot more data…

Now to set the above settings:

  DMA1_Channel5->CCR = DMA_CCR5_PL // very high priority
          | DMA_CCR5_MINC  // memory increment mode
          | DMA_CCR5_DIR;  // read from memory, not peripheral

Section 10.3.3 has this useful bit of information:

The first transfer address is the one programmed in the DMA_CPARx/DMA_CMARx registers. During transfer operations, these registers keep the initially programmed value. The current transfer addresses (in the current internal peripheral/memory address register) are not accessible by software.

This suggests that I only need to set these at the start and shouldn’t need to touch them again.

To set these:

  DMA1_Channel5->CMAR = (uint32_t) image;       // where to read from
  DMA1_Channel5->CPAR = (uint32_t) &(SPI2->DR); // where to write to

Time to try it out… and… nothing!  Maybe there’s another clock setting for DMA, and sure enough there is:

  rccEnableAHB(RCC_AHBENR_DMA1EN, 0); // Enable DMA1 clock (second argument 0: not kept on in low-power mode)

I still haven’t got anything, so I tried setting the source and destination registers each time before I start a DMA transfer. It looks like now I get a single transfer, but I’m trying to get a transfer on every hsync.

I poked around with the debugger, especially at 0x40026058, which is DMA1_Channel5->CCR (I calculated the address from values in stm32l1xx.h), and noticed that the Enable flag is still set. Maybe it has to be toggled each time? Now I get a square wave instead of my data… I then tried slowing down my hsync timer and decreasing the SPI speed, and I got a reasonable output. I’m getting some nasty aliasing on my DSO Nano though, maybe I should have borrowed a faster scope! I think I was triggering the DMA transfers too quickly, which produced that square wave. Conveniently, I notice the SPI line is now low when it’s idle, which is the output I want. I’m not sure why it’s gone low, but I’m not complaining.
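
A sketch of that address calculation. The base addresses and offsets are assumptions written down from my reading of stm32l1xx.h (the macro names DMA1_CHANNEL1_OFF and DMA_CHANNEL_STRIDE and the function are made up for illustration); they do reproduce the 0x40026058 quoted above for channel 5.

```c
#include <stdint.h>

/* Assumed values, as I read them from stm32l1xx.h. */
#define PERIPH_BASE         0x40000000UL
#define AHBPERIPH_BASE      (PERIPH_BASE + 0x20000UL)
#define DMA1_BASE           (AHBPERIPH_BASE + 0x6000UL)
#define DMA1_CHANNEL1_OFF   0x08UL  /* channel blocks start after ISR and IFCR */
#define DMA_CHANNEL_STRIDE  0x14UL  /* CCR, CNDTR, CPAR, CMAR plus one reserved word */

/* Address of the CCR register of DMA1 channel 1..7. */
uint32_t dma1_ccr_address(unsigned channel)
{
    return (uint32_t)(DMA1_BASE + DMA1_CHANNEL1_OFF
                      + (channel - 1u) * DMA_CHANNEL_STRIDE);
}
```

CCR is the first register in each channel block, so the channel’s base address and the address of its CCR are the same.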

So to sum up:

  rccEnableAHB(RCC_AHBENR_DMA1EN, 0); // Enable DMA1 clock (second argument 0: not kept on in low-power mode)
  // Configure DMA
  DMA1_Channel5->CCR = DMA_CCR5_PL // very high priority
          | DMA_CCR5_MINC  // memory increment mode
          | DMA_CCR5_DIR;  // read from memory, not peripheral
  DMA1_Channel5->CMAR = (uint32_t) image;       // where to read from
  DMA1_Channel5->CPAR = (uint32_t) &(SPI2->DR); // where to write to

then in my hsync handler:

    // Activate the DMA transfer
    DMA1_Channel5->CCR &= ~DMA_CCR5_EN;
    DMA1_Channel5->CNDTR = sizeof(image);
    DMA1_Channel5->CCR |= DMA_CCR5_EN;

I didn’t need to reset CMAR and CPAR after all.
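
Why the toggle is needed can be sketched with a toy model. It is not the real peripheral: it only assumes, from my reading of the reference manual, that in non-circular mode the counter stops at zero with EN still set, and that CNDTR can only be written while the channel is disabled. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one DMA channel in non-circular mode. */
typedef struct {
    bool     enabled;  /* models the EN bit in CCR */
    uint16_t cndtr;    /* models the remaining-transfer counter */
} dma_model_t;

/* CNDTR is only writable while the channel is disabled. */
bool dma_write_cndtr(dma_model_t *ch, uint16_t count)
{
    if (ch->enabled)
        return false;  /* write ignored */
    ch->cndtr = count;
    return true;
}

/* Serve one request from the peripheral; returns true if a byte moved. */
bool dma_request(dma_model_t *ch)
{
    if (!ch->enabled || ch->cndtr == 0)
        return false;  /* disabled or finished: request not served */
    ch->cndtr--;
    return true;
}

/* Same order as the hsync handler: disable, reload the count, enable. */
void dma_restart(dma_model_t *ch, uint16_t count)
{
    ch->enabled = false;
    dma_write_cndtr(ch, count);
    ch->enabled = true;
}
```

In this model a finished channel still reads as enabled, which matches what the debugger showed, and a write to the counter without disabling first is simply dropped.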

I think that’s now demonstrated everything I need for the video signal generator! My code needs a big cleanup, and I’d like to use ChibiOS functions where I can (palSetPadMode instead of messing around with memory locations and data structures, etc).


Picture data using SPI

I plan to use SPI to send the picture data for my video generator.

First I need to work out what speed to run the port at. Each line goes for 52 μs, or 1664 cycles at 32MHz. I could divide this by 4 for 416 pixels per line or 8 for 208 per line. The divider sets the bit rate directly, and one bit is one pixel, so there’s no further division by 8 to work out. It looks (from the clock registers) like SPI1 is connected to the APB2 clock, and SPI2 is connected to APB1. I’m already running APB1 at the system clock (32MHz), so I’d like to use that if I can. The speed is set in the CR1 register by the BR bits, which select a power-of-two divider from 2 to 256, so both 4 and 8 are available. I might as well use SPI2. The datasheet says that SPI2_MOSI can only be on pin B15. I won’t need the clock output, so I won’t configure a pin for that.
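
The arithmetic can be checked with a few lines of C. This is only a sketch: the function names are made up, and the constants are the 32MHz clock and 52 μs line time from the paragraph above.

```c
#include <stdint.h>

#define SPI_CLOCK_HZ     32000000u  /* APB1 running at the 32MHz system clock */
#define VISIBLE_LINE_US  52u        /* duration of the visible part of a line */

/* Clock cycles available during the visible part of a line. */
uint32_t cycles_per_line(void)
{
    return (SPI_CLOCK_HZ / 1000000u) * VISIBLE_LINE_US;
}

/* One pixel is one SPI bit, and one bit takes `divider` clock cycles. */
uint32_t pixels_per_line(uint32_t divider)
{
    return cycles_per_line() / divider;
}

/* Bytes per line, with 8 pixels packed into each byte. */
uint32_t bytes_per_line(uint32_t divider)
{
    return pixels_per_line(divider) / 8u;
}
```

With a divider of 4 this gives 416 pixels, or 52 bytes per line; with a divider of 8 it gives 208 pixels, or 26 bytes.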

The CR1 register contains a setting for 8 or 16-bit operation.  This affects the size of the data being written.  Since I plan to use DMA I’ll leave it at 8 bits.

It turns out there are very few settings to get SPI working. I had to stuff around a lot before I got it working though. Eventually I copied the ChibiOS code, and set the SPI_CR1_CPOL, SPI_CR1_SSM, SPI_CR1_SSI and SPI_CR2_SSOE flags even though I wouldn’t have thought I needed them, and it suddenly worked!

This was enough to get SPI working:

  rccEnableAPB1(RCC_APB1ENR_SPI2EN, 0); // Enable SPI2 clock (second argument 0: not kept on in low-power mode)
  palSetPadMode(GPIOB, 15, PAL_MODE_ALTERNATE(5) |
                           PAL_STM32_OSPEED_HIGHEST); // MOSI on PB15
  SPI2->CR1 = SPI_CR1_CPOL  // clock idles high
          | SPI_CR1_SSM     // software slave management
          | SPI_CR1_SSI     // internal slave select held high
          | SPI_CR1_BR      // divide clock by 256 (SPI_CR1_BR_0 alone divides by 4)
          | SPI_CR1_MSTR;   // master mode
  SPI2->CR1 |= SPI_CR1_SPE; // Enable SPI

To send data, write bytes to SPI2->DR.  The output appears on PB15.  I think in the future I’ll try using palSetPadMode for configuring the pins, since it’s better than the 8 lines of code I’ve been using previously to do this.  The above code divides the clock by 256 so I could see the output on my DSO Nano, but I’ll change this to 4 later.
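
A sketch of what sending one byte by hand might look like, waiting for the transmit buffer to empty before each write. It assumes TXE is bit 1 of SPI_SR; spi_tx_regs_t and spi_send_byte are hypothetical stand-ins so the snippet compiles on a PC, where the real code would use SPI2->SR and SPI2->DR.

```c
#include <stdint.h>

/* Assumption: TXE (transmit buffer empty) is bit 1 of SPI_SR. */
#define SPI_SR_TXE ((uint16_t)0x0002)

/* Hypothetical stand-in for the SPI status and data registers. */
typedef struct {
    volatile uint16_t SR;
    volatile uint16_t DR;
} spi_tx_regs_t;

/* Wait until the transmit buffer is empty, then queue one byte. */
void spi_send_byte(spi_tx_regs_t *spi, uint8_t byte)
{
    while ((spi->SR & SPI_SR_TXE) == 0) {
        /* busy-wait: the previous byte has not moved to the shift register yet */
    }
    spi->DR = byte;
}
```

Without the wait, a byte written while the buffer is still full could overwrite one that has not gone out yet.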

The next step will be using DMA to write the data to SPI instead.


More timing video signals on the STM32L Discovery

I’ve looked at how ChibiOS does its timing, and worked out that it’s unsuitable for timing video signals. Now I’ll look at using the timers directly.

The chip has a number of timers, though I can’t work out how many. The ChibiOS HAL manual says it can use timers 2, 3 and 4, so let’s leave those alone for other uses. That leaves timers 9, 10 and 11.

If the timers work like the AVR’s timers, they work by starting at 0 and counting up to some maximum value, where the counter is reset to 0.  There’s also a compare register, and when the timer matches the compare register, something can happen – we can trigger an interrupt, change the state of a pin and so on.  Being able to change a pin is how PWM works.

It would be nice to use PWM to produce the sync signals.  I’ve found an excellent description of these signals, and at the start of each line, there’s always a falling signal.  So if we set the timer’s maximum value to the end of the line, and have the signal go low when it overflows, that’s the signal start taken care of.

The signal end is a bit trickier. The image shows that it varies depending on the line; further, on some of the vertical sync lines it happens twice! This means that we might need to change the value at which the signal goes high while the timer is running. The AVR can do this, but in some modes the timer compare register is double-buffered. If you write a new value to the compare register, it’s only applied the next time the timer resets. They do this so you can’t set a compare value lower than the current counter value, in which case the timer would miss the match and keep counting until it overflows!

So can you adjust the compare registers on the fly on the STM32L, and is it double buffered?  It looks like the compare register is called TIMx_CCR1.  There doesn’t seem to be a CCR2, so maybe these timers only have one output.  In the reference manual, section 17.6.11 says:

It is loaded permanently if the preload feature is not selected in the TIMx_CCMR1 register (bit OC1PE). Else the preload value is copied in the active capture/compare 1 register when an update event occurs.

So if the preload feature is off, the compare register can be updated straight away!  But back in the PWM mode description (section 17.4.9), it says:

You must enable the corresponding preload register by setting the OCxPE bit in the TIMx_CCMRx register

So we’re not so lucky.  We need to use the preload register, and we know it updates on an “update event”.  What’s an update event? Back in section 17.4.1:

The update event is sent when the counter reaches the overflow and if the UDIS bit equals 0 in the TIMx_CR1 register.

So all we need to do is update the compare register one line early!
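
The preload behaviour can be sketched with a toy model on a PC. This is not the real timer, just the rule quoted above: a write only changes the preload copy, and the active compare value picks it up at the next update event. The struct and function names are made up.

```c
#include <stdint.h>

/* Toy model of a timer channel with the preload feature enabled. */
typedef struct {
    uint16_t counter;      /* CNT */
    uint16_t reload;       /* ARR: the counter wraps after reaching this */
    uint16_t ccr_preload;  /* what software writes to CCR1 */
    uint16_t ccr_active;   /* what the comparator actually uses */
} timer_model_t;

/* Software write to the compare register: only the preload copy changes. */
void timer_write_ccr(timer_model_t *t, uint16_t value)
{
    t->ccr_preload = value;
}

/* Advance one tick; returns 1 when an update event (overflow) occurred. */
int timer_tick(timer_model_t *t)
{
    if (t->counter >= t->reload) {
        t->counter = 0;
        t->ccr_active = t->ccr_preload;  /* preload copied on the update event */
        return 1;
    }
    t->counter++;
    return 0;
}
```

In this model, a value written during line N has no effect until the counter wraps, so it ends up controlling the pulse on line N+1.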

What about the vertical sync lines, where there are two pulses?  That shouldn’t be a problem; we simply consider them two separate lines in software, so the lines are numbered like this:

This also shows that the maximum value will need to be changed in the same way.  In the reference manual, they call this value the “auto-reload” value, and it’s kept in the TIMx_ARR register.  Section 17.4.1 suggests you can choose whether this is double-buffered or not.  We might as well use this feature since the compare register needs it.

There’s one more thing to look at.  At the start of each line, I’ll need to start transferring data from memory to an external port using DMA, and configure the compare and maybe the maximum value register for the next line.  I could either do both in a single interrupt, or set the registers on the reset interrupt, and start the DMA on the compare interrupt.

Do we have enough time from the start of the interrupt to do anything useful? The horizontal sync pulse on a PAL signal is 4.7μs. If we assume the CPU runs at 16MHz, this is about 75 cycles. Section 5.5.1 of the ARM manual suggests that it takes 12 cycles to enter an interrupt. In the AVR, it’s up to the programmer to save the register state at the start of an interrupt. This means it’s a good idea to do as little as possible in an interrupt, because the compiler inserts lots of “push” and “pop” instructions around the interrupt. Since the ARM looks after this for you, and takes a fixed amount of time to enter an interrupt, this isn’t a problem. If we assume it takes 12 cycles to leave an interrupt too, that leaves about 50 cycles to do stuff. This stuff is working out how long it should take to raise the signal again, and setting the compare register and maybe the reset register.
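
The cycle budget can be written out as a quick sanity check. The function names are made up, and the 12-cycle exit cost is the same assumption as in the paragraph above.

```c
#include <stdint.h>

/* Cycles in a pulse given in tenths of a microsecond, at cpu_mhz MHz. */
uint32_t cycles_in_pulse(uint32_t tenths_of_us, uint32_t cpu_mhz)
{
    return (tenths_of_us * cpu_mhz) / 10u;
}

/* Cycles left for the handler body after interrupt entry and exit. */
uint32_t handler_budget(uint32_t pulse_cycles, uint32_t entry_cycles,
                        uint32_t exit_cycles)
{
    return pulse_cycles - entry_cycles - exit_cycles;
}
```

For a 4.7μs pulse at 16MHz, cycles_in_pulse(47, 16) is 75 and handler_budget(75, 12, 12) is 51. At 32MHz the same sums give 150 and 126.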

That should be all we need to know about the timers – now I’ll try to use the timer to produce these horizontal sync pulses.
