Subscribe via RSS
5Dec/190

More (Bit)Bang For Your Buck

Arduinos are a great hobbyist platform for projects, but you'll often encounter speed and memory issues when you try to squeeze in too many features. Fortunately, being an open source system, one can easily bypass the standard libraries and push these little units as hard as possible. Of course, once you're past the warranties and disclaimers, you're out on your own.

I've previously had to jump out of the IDE and use 'faster' code whilst using a Dreamcast Controller for my Model Railway, but that was a little more advanced than what we're about to do here. Today's goal is to bitbang the data stream to the Red LED Sign as quickly as possible. If you check that post, you'll see I'm already using digitalWriteFast, but it seems that we can go even quicker with the SPI library.

What's SPI, you ask? To steal from Arduino: Serial Peripheral Interface (SPI) is a synchronous serial data protocol used by microcontrollers for communicating with one or more peripheral devices quickly over short distances. As it turns out, that's exactly what we want to do! We want the display to be a slave and, well, we're just going to hack the right RX/TX lines together to get the data to flow. A fellow Arduino-er has done this here whilst interfacing with a 74HC595 and provides a few hints as to problems we might run in to.

Actually, now that I think of it, the data is to be sent out in bytes, but my screen is no where near the order of a clean multiplication of bytes... will have to work on that. Also... the data I read is the top 'line' of each letter, it's not a clean char in any memory array. Hmmm...

Test Patterns

Without pulling apart my currently-hacked-together-and-functional-display, I used another Uno I had lying around and hooked it up to the two spare LED panels. The whole display came with 5 panels, but I wasn't able to smoothly run them all at once, so these two were not in use.

The hook-up is pretty straight forward. The Wikipedia article on SPI gives a little more information on the bus and, more specifically, dictates which wires we need to use. We'll need the data output wire and the clock, known as MOSI, or Master Out-Slave In, and SCLK, the serial clock. Note that some devices read the data pin when the clock transitions from LOW to HIGH, and some vice-versa. Fortunately, this can all be configured via the SPI library.

With the following code, I managed to get a test pattern on the display...

#include <SPI.h>
void setup() {
  for (int p = 0; p <= 12; p++) {
    pinMode(p, OUTPUT);
    digitalWrite(p, LOW);
  }
  SPI.begin();
  SPI.setDataMode(SPI_MODE2);
  SPI.setBitOrder(LSBFIRST);
  SPI.setClockDivider(SPI_CLOCK_DIV128);
}

bool isEvenRow = false;
void loop() {
  isEvenRow = false;
  for (int row = 0; row < 7; row++) 
  {
    if (isEvenRow) {
      SPI.transfer(B01010101);
      SPI.transfer(B01010101);
      SPI.transfer(B01010101);
      SPI.transfer(B01010101);
    } else {
      SPI.transfer(B10101010);
      SPI.transfer(B10101010);
      SPI.transfer(B10101010);
      SPI.transfer(B10101010);
    }
    digitalWrite(row + 3, HIGH);
    delayMicroseconds(150);
    digitalWrite(row + 3, LOW);
    isEvenRow = !isEvenRow;
  }
}

Well, neat! It worked? Playing with the SPI_CLOCK_DIv variable made the loop even quicker, down to the point where I had to up the delay at the end so that the LEDs would illuminate for a longer timeframe. SPI_CLOCK_DIV determines the speed at which the data is sent out, dividing the main 16mhz clock on the Arduino. DIV2 halves it (8mhz), DIV4 = 1/4 = 4mhz, DIV8 = 1/8 = 2mhz, DIV16 = 1/16 = 1mhz, etc...

Anyway, the test pattern was great... but I needed to output text, in rows of bytes!?

What About Text?

So this is where it gets interesting. My font array has 7 rows of 5 pixels per font. I then have up to 6 characters per LED display unit, equalling a total of 30 bits each. Of course, there's 8 bits to a byte, so the neatly rounded-up value would be 32 bits. I could therefore send out 4 bytes to get each row transmitted over the wire. But do be warned, if you start the data on the first bit out, it'll 'run off the edge' of the screen as the screen is only 30 bits wide...

Bytes 00 01 02 03
Pixels -- -- 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Row 0 00 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 00 00 00
Row 1 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 2 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 3 00 00 00 00 01 00 00 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 4 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
Row 5 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 6 00 00 00 00 01 00 00 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00

As you can see, the text to be displayed is "TEST!" in the cute Arduino 5x7 font. Our display is 30 bits wide, so there's two columns of blank to the left. After the 2 pixel offset, we start throwing out each row of data. This won't be overly-easy either, as we need to get the rows of bits per character per font.

So, based on the current pixel to be pushed out, firstly get the character from the string array. We then need to look up the font pixel array for this character, only getting the relevant row that we're trying to display. Once we have that, we can then just send the bits out to a temporary array of bytes. As that we'll only have 5 bits per font line row, we'll actually need to merge multiple characters into each byte of the array. If we then want to scroll text, we'll have to put a little more effort in. This isn't overly different from when we were sending the data out bit-by-bit; we're just stuffing it into bytes now, making sure we get everything into the correct byte in the output array.

I jammed this calculation into one row below and the output was actually quite surprising! Mainly in the fact that it actually worked!

#include <SPI.h>
#include "myfont.h"

String serialString = "TEST! ";
const int BUFFER_LENGTH = 4;
const int DISPLAY_LENGTH = 30;
byte* datarow = new byte[BUFFER_LENGTH];

void setup() {
  for (int p = 0; p <= 12; p++) {
    pinMode(p, OUTPUT);
    digitalWrite(p, LOW);
  }
  SPI.begin();
  SPI.setDataMode(SPI_MODE2);
  SPI.setBitOrder(LSBFIRST);
  SPI.setClockDivider(SPI_CLOCK_DIV128);
}

void loop() {
  for (int row = 0; row < 7; row++) 
  {
    for (int col = 0; col < DISPLAY_LENGTH; col++) {
      datarow[((col + ((BUFFER_LENGTH * 8) - DISPLAY_LENGTH)) / 8)] |= 
        ((font[serialString[col / 5]][col % 5] & (1 << (6 - row))) 
          ? 1 : 0) << ((col) % 8);
    }
    for (int col = 0; col < BUFFER_LENGTH; col++) SPI.transfer(datarow[col]);
    digitalWrite(row + 3, HIGH);
    delayMicroseconds(500);
    digitalWrite(row + 3, LOW);
  }
}

Everything was actually working nicely... but then I tried to extend it out to 4 displays. 120 pixels out, even at the fastest SPI, caused severe flickering. It dawned on me that I was doing some pretty hefty lifting in my display loop, presumably slowing down the drawing. It's actually a critical loop and it seems that any processing will slow down the line drawing, causing the flickering I was seeing.

With this problem in mind, it became apparent that I'd need to move the 'buffer' calculation out of the main loop. I quickly tested this theory by moving it to a once-off in the setup procedure.

#include <SPI.h>
#include "myfont.h"

String serialString = "TEST! ";
const int BUFFER_LENGTH = 4;
const int DISPLAY_LENGTH = 30;
const int DISPLAY_HEIGHT = 7;
byte* datarow = new byte[DISPLAY_HEIGHT][BUFFER_LENGTH];
void setup() {
  for (int p = 0; p <= 12; p++) {
    pinMode(p, OUTPUT);
    digitalWrite(p, LOW);
  }
  SPI.begin();
  SPI.setDataMode(SPI_MODE2);
  SPI.setBitOrder(LSBFIRST);
  SPI.setClockDivider(SPI_CLOCK_DIV128);

  for (int row = 0; row < DISPLAY_HEIGHT; row++) 
  {
    for (int col = 0; col < DISPLAY_LENGTH; col++) {
      datarow[row][((col + ((BUFFER_LENGTH * 8) - DISPLAY_LENGTH)) / 8)] |= 
        ((font[serialString[col / 5]][col % 5] & (1 << ((DISPLAY_HEIGHT - 1) - row))) 
          ? 1 : 0) << ((col) % 8);
    }
  }
}

void loop() {
  for (int row = 0; row < DISPLAY_HEIGHT; row++) 
  {
    for (int col = 0; col < BUFFER_LENGTH; col++) SPI.transfer(datarow[row][col]);
    digitalWrite(row + 3, HIGH);
    delayMicroseconds(500);
    digitalWrite(row + 3, LOW);
  }
}

My text was static, so I now only built the buffer once. Would you believe it? The rendering was beautiful! I couldn't even notice a flicker. This was then short-lived when I realised I needed to scroll text. Again, there was no need to rebuild the buffer on each loop, so I wrote a procedure and only called it when there was text to move; defined by a delayed scroll rate.

This worked nicely, but there was an obvious jarring delay as the text shifted along. You could feel it re-processing in-between the character movements. I was still happy with it! I went ahead and reconfigured my initial display setup to use two Arduinos: one for the remocon+wifi and the other to receive serial data and light this dispalay.

But Wait, Why Don't I Build The Entire Text Buffer?

Whilst writing this up, it occurred to me that I was only rebuilding the buffer for the text that was visible on the display, not the entire string. If I had the full text buffer already created somewhere in memory, I then would only need to get SPI to send out the data from a specific bit offset. But, how do you start sending data via SPI from half-way through a byte? My fonts/characters were 5 bits wide and each byte had 1.5ish characters in them! I considered just shifting the pointer along the bytes, but the scrolling would be horrible.

So, I want to send out data from half way through a byte? Let's just borrow that table from above again:

Bytes 00 01 02 03
Pixels -- -- 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Row 0 00 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 00 00 00
Row 1 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 2 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 3 00 00 00 00 01 00 00 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 4 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
Row 5 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 6 00 00 00 00 01 00 00 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00

Officially, on the 3rd loop of scrolling, I would want to send out the green bits. I'd also then want to send out five zeroes at the other end so that the scrolling text trails off and doesn't wrap around. I can't tell SPI to start at the 4th bit of a byte? So I'll need to create a new buffer and populate this as quickly as possible, line by line, just as we're about to render it. Thanks to the glory of the internet, there's always someone else who wants to do something similar. Effectively we have an array of bytes (unsigned chars, meaning we get all 8 bits) and we'll need to shift the first one and append the next one. It'll be very much like a catepillar crawling along.

So, if we build the full text buffer array when the string comes in, we can then push the relevant chunk into the display buffer. There'll be no font calculations, just bit for bit copying, and this will hopefully still be fast enough. We can even make sure we only do it line by line, and only as required... if we're on a byte boundary, then we don't need to shift anything!

I mentioned shifting above, and that's what we'll need to do to get the chunks of bytes into the final display buffer. If you look at the table above, you'll notice I coloured the cells in light green and dark green. The first 3 bits are light green and come from the 'end' of the first byte of the text buffer. The next 5 bits are from the 'start' of the second byte in the text buffer. This chain then continues on as we shift the bits. The >> and << C operators will assist us with this task as that's exactly what they're designed to do: shift bits in a variable in a certain direction by a certain amount. Let's try some pseudo-code first....

define a string buffer that has the string to display: "TEST!";
build a buffer for this string: stringBitBuffer[7][4];
define a display buffer that is enough for 1 panel: displayBuffer = byte[7][4];
if our scroll offset is a multiple of 8
  foreach stringBitBuffer[scrolloffset /8] SPI.writeRow();
  write extra zeroes at the end.
else if our scroll offset is NOT a multiple of 8
  foreach byte in stringBitBuffer starting from (scrollOffset/8)
    shift the first block of the text buffer right scrollOffset times
    make anything to the left of the text in that byte zero.
    move the resulting value into the current displayBuffer byte.
    get the next text buffer byte. 
    take the first (8 - scrollOffset) bits and add them to the current displayBuffer byte
  next
end if

Here's an animation showing what bits we want where...

Bytes 00 01 02 03
Pixels -- -- 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Row 0 00 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 00 00 00
Row 1 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 2 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 3 00 00 00 00 01 00 00 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 4 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
Row 5 00 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
Row 6 00 00 00 00 01 00 00 01 01 01 01 01 01 01 01 01 01 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00
TB 00 00 00 00 00 00 00 00 01 01 01 01 01 01 01 01 02 02 02 02 02 02 02 02 03 03 03 03 03 03 03 03
Bytes 00 01 02 03

The table above shows the 8 frames that need to be copied over to the framebuffer between bytes. Once the data has shifted a whole byte to the left, then the drawing routine just needs to start from the next byte, there's no need to send out the initial bytes if they're offscreen. We need to copy the bits in the main area of the table into the buffer bytes listed on top. I've added a new row at the bottom which indicates the first bit/byte of the text buffer (TB). With the duplicated Byte row underneath, you can see how we'll need a part of one text buffer and the rest of the next to fill the gaps.

Back to that forum post I mentioned above, and the description I gave about bit shifting, we can build up our buffer. We need to, based on the scroll offset, take the first (8 - offset) bits from every byte and then the 'offset' bits from the following byte. i.e. if the offset was 3, we want the last 5 bits of byte 1 and the first three of byte 2. If we're on the last byte, then we just fill the end with zeroes. The AND masks for row 2 are shown below. Row 1 is nearly all bits set... so wouldn't be a good test case.

Bytes 00 01
Pixels 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
Row 0 00 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01
Row 1 00 00 00 00 01 00 00 01 00 00 00 00 01 00 00 00
AND #1 00 00 00 01 01 01 01 01 00 00 00 00 00 00 00 00
AND #2 00 00 00 00 00 00 00 00 01 01 01 00 00 00 00 00

And then we just need to shift the first 5 bits to the left... as we're stored in MSB.

#include <SPI.h>
#include "myfont.h"

int ccol = 0;
byte datarow[7][60];
int displayLength = 120;
int bytesRequired = (displayLength + (displayLength % 8) + 8) / 8;
int totalPixels = bytesRequired * 8;
byte frameBuffer[7][16];

unsigned long last_time = millis();
unsigned long scroll_time = millis();
unsigned long scroll_pause = millis();
unsigned long next_text_time = millis();
int charOffset = 0, loop_count = 0, charcol = 0;
String displayString;

void buildTextBuffer() {
  bytesRequired = ((displayString.length() * 5) / 8) + ((displayString.length() * 5) % 8 == 0 ? 0 : 1);
  totalPixels = bytesRequired  * 8;
  for (int row = 0; row < 7; row++) 
  {
    for (int b = 0; b < bytesRequired; b++) datarow[row][b] = 0;
    for (int col = 0; col < totalPixels; col++) {
      ccol = col + 8;
      datarow[row][ccol / 8] |= 
        ((font[displayString[col / 5]][col % 5] & (1 << (6 - row))) 
          ? 1 : 0) << (ccol % 8);
    }
  }
}

void buildFrameBuffer(int offset = 0) {
  int tOffset = offset / 8;
  int cOffset = offset % 8;
  int bb = 0;
  for (int row = 0; row < 7; row++) 
  {
    for (int b = 0; b < 16; b++) {
      bb = b+tOffset;
      frameBuffer[row][b] = (datarow[row][bb] >> cOffset);
      if (cOffset > 0) {
        if (bb <= bytesRequired) {
          frameBuffer[row][b] |= (datarow[row][bb + 1] << (8 - cOffset));
        }
      }
    }
  }
}

void setup() {
  Serial.begin(9600);
        
  for (int p = 0; p <= 12; p++) {
    pinMode(p, OUTPUT);
    digitalWrite(p, LOW);
  }
  
  SPI.begin();
  SPI.setDataMode(SPI_MODE3);
  SPI.setBitOrder(LSBFIRST);
  SPI.setClockDivider(SPI_CLOCK_DIV64);

  displayString = "Warming up ... ";
  buildTextBuffer();
  buildFrameBuffer(0);
}

String newString = "";
void getNextIndex() {
  for (int i = 0; i < 7; i++)
    for (int x = 0; x < 60; x++)
      datarow[i][x] = 0;
      
  if (Serial.available()) {
    newString = Serial.readStringUntil('\n');
    if (!newString.equals(displayString)) {
      displayString = newString;
      next_text_time = millis();
    }
  } else {
    displayString = "";
  }
  if (displayString.length() > 0) buildTextBuffer();
  buildFrameBuffer(0);
}

void loop() {  
  for (int row = 0; row < 7; row++) 
  {
    for (int col = 0; col < 16; col++) SPI.transfer(frameBuffer[row][col]);
    digitalWrite(row + 3, HIGH);
    delayMicroseconds(150);
    digitalWrite(row + 3, LOW);
  }

  if ((millis() - scroll_pause > 2500)) {
    if ((displayString.length() > (displayLength / 5))) {
      if (millis() - scroll_time > 30) {
        charOffset++;
        if ((charOffset / 5) > displayString.length()) {
          charOffset = 0;
          loop_count++;
          scroll_pause = millis();
          if (millis() - next_text_time > 5000) getNextIndex();
        }
        scroll_time = millis();
        buildFrameBuffer(charOffset);
      }
    } else {
      loop_count++;
      if (millis() - next_text_time > 5000) {
        getNextIndex();
        loop_count = 0;
      }
      charOffset = 0;
    }
  }
}

And it all actually worked... beautiful scrolling.

Further Possible Optimisations

We're sending the text into this Arduino from another Arduino over SoftSerial. Why send text when I could send the framebuffer bits instead? I wonder if sending (string-length * 5 * 7 * 8) bits would be quicker than the initial string and calculations on the secondary Arduino's side? Should I test it? Maybe later...

Filed under: Arduino Leave a comment
Comments (0) Trackbacks (0)

No comments yet.


Leave a comment


*

No trackbacks yet.