CAN Bus: remote update

image: Pexels

A scalable system

Being able to update firmware remotely is a necessary step toward a scalable system. Without it, you are limited when rolling out new features and bug fixes and, most importantly, when rolling back to a previous version if necessary.

In my scenario, NXP microcontrollers of the LPC17xx series are connected to each other through the CAN message protocol, and I was not able to update the boards remotely. Two UART ports were enabled: updates were made physically through uart0, and an SBC was connected to uart3 to send external commands to the board. Looking at my system, I saw that I could use the SBC that was already connected to store the firmware and update the boards.


CAN Protocol and CAN Bus

The CAN protocol is a message protocol that lets different boards communicate without wiring every board to every other one. This keeps the wiring organized and reduced: every message is delivered to all boards, and each board is programmed to filter whether a message is addressed to it or not.
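
To make the filtering idea concrete, here is a minimal sketch of how a node could decide whether a received frame is meant for it. The `can_frame_t` layout and the function name are hypothetical and only for illustration; on a real LPC17xx the CAN peripheral's acceptance filter would normally do this work in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame layout for illustration (classic CAN, 11-bit ID). */
typedef struct {
    uint16_t id;       /* 11-bit identifier */
    uint8_t  dlc;      /* number of valid data bytes, 0..8 */
    uint8_t  data[8];  /* payload */
} can_frame_t;

/* Accept the frame when its ID matches this node's filter on every
 * bit selected by the mask. */
bool can_frame_is_for_me(const can_frame_t *frame, uint16_t filter_id, uint16_t mask)
{
    return (frame->id & mask) == (filter_id & mask);
}
```

With a mask of 0x7FF every ID bit has to match; a narrower mask such as 0x7F0 lets one node accept a whole group of IDs.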

You can compare it to a bus line: the bus has a predefined route that connects the neighborhoods (the main wire); all the bus stops of each neighborhood are on the route (the CAN nodes connected to the main wire); the people waiting at a bus stop get on the bus to go somewhere else (a message from one board to another); the bus carries each person to their destination, passing through other neighborhoods that see the passenger but do not receive them (a message being transmitted using the CAN protocol). This whole physical communication system between the boards is called the CAN Bus.

Not using this method would be like connecting every bus stop directly to every other one: it would work, but it would be confusing to understand and organize, and it would spend more resources (gasoline for the bus, wire for the boards).



Continuing with the bus story, let's keep the idea of the route and its organization, but instead of a bus, which can carry a lot of people, we'll now think of a car that can carry only five people. This comparison is closer to what we see when using the CAN Bus to deliver messages.

The information is still delivered, but now it is limited: a CAN message can contain at most 8 bytes of data. For normal communication between the boards this is not a problem: a board can receive an encoded message, decode it and execute the commands. A remote update, however, needs to transmit data on the kB scale, not just bytes (in my case about 256 kB), so the implementation must be very well planned.
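
A quick calculation shows the scale of the problem. The sketch below counts how many 8-byte messages an image needs and gives a lower bound for the transfer time; the function names and the fixed gap between messages are assumptions made for illustration.

```c
#include <stdint.h>

#define CAN_PAYLOAD_BYTES 8u

/* Number of 8-byte messages needed for an image, rounding up. */
uint32_t packets_needed(uint32_t image_bytes)
{
    return (image_bytes + CAN_PAYLOAD_BYTES - 1u) / CAN_PAYLOAD_BYTES;
}

/* Lower bound on the transfer time in milliseconds, assuming a fixed
 * gap between messages and ignoring flash write time. */
uint32_t transfer_time_ms(uint32_t image_bytes, uint32_t gap_ms)
{
    return packets_needed(image_bytes) * gap_ms;
}
```

For a 256 kB image (262144 bytes) this gives 32768 messages; with an assumed gap of 15 ms between messages, that is already about 491 seconds, roughly 8 minutes.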

Remote update

Because of the message size limit, the most important part of this solution is the system architecture: how do we break the file into smaller packages? How and when do we erase the current firmware? How do we reassemble the packages? Do we update one board or several at a time? What if the update goes wrong? These and other questions appear during development, so I'll split the discussion into topics:
  1. Bootloader
  2. Memory Mapping
  3. Splitting firmware file
  4. Rebuild and write the new firmware
  5. Problems that can be found
Some points may be left out of this text, but the main parts of the development will be covered.

Bootloader

The bootloader is responsible for locating the initial address and booting the whole system. When firmware is flashed to your board, the bootloader goes to the initial address in flash memory, reads your code, looks for the main function and starts it. Normally the microcontroller (at least in the NXP LPC17xx series) has a factory-programmed bootloader located in ROM, where we cannot access or change it. Once the firmware has started, the bootloader is not used anymore; it stays idle until the board restarts.

Controlling the boot moment is necessary to decide when to start the main code and when the board has to be prepared to receive new code. This decision has to be made by a bootloader and, since we can't change the one that is already there, the idea is to build a new bootloader that is started by the factory-programmed bootloader and decides the board state.

Factory-programmed bootloader starts the main code  ⟶  Bootloader is called  ⟶  Boot or wait for a new firmware

How the bootloader is started is covered in the Memory Mapping topic but, looking at this workflow, we can see that a decision will be made. Based on what? The update request is received by the main firmware while it is running normally, and it has to be passed on to the bootloader, which is not running at that moment and cannot receive messages. How do they communicate? The answer is: using a flag in memory.

Since the bootloader only runs at boot, the firmware code has to be able to receive the update request and record in this flag, a minimal memory space, that an update will be made. Note that the flag must be in flash memory, not RAM: it is not enough to receive the information and change an internal variable, it has to be written at the correct address (flash memory is non-volatile and keeps its state even after a reboot). The new flow would be:

Factory-programmed bootloader starts the main code  ⟶  Bootloader is called  ⟶  Bootloader reads the flag space to make the decision  ⟶  Boot or wait for a new firmware

A simple piece of code to read the memory space and make this decision would be something like:

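#include <stdbool.h>
#include <stdint.h>

#define FLAG_ADDRESS   0x0007F000  // flash address reserved for the flag
#define FLAG_VALID     0xA5        // value that means "an update was requested"

// returns true when the firmware requested an update before rebooting
bool read_update_flag(void) {
    uint8_t flag = *(volatile uint8_t *)FLAG_ADDRESS;

    return flag == FLAG_VALID;
}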

From the write perspective, the firmware needs to call a function that writes this flag. Flash has to be erased before it can be written, so the process is to erase the flag's sector and only then write to it, something like:

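// aligned_buffer (PAGE_SIZE bytes, word aligned), command[5] and result[5]
// are assumed to be defined elsewhere in the firmware
void write_update_flag(uint32_t value) {
    // IAP copies whole pages, so build a full page in RAM:
    // erased bytes (0xFF) with the flag value in the first byte
    for (int i = 0; i < PAGE_SIZE; i++) {
        aligned_buffer[i] = 0xFF;
    }
    aligned_buffer[0] = (uint8_t)value;

    // you need to define these functions for the sector that holds the flag
    prepare_sector();
    erase_sector();
    prepare_sector();

    command[0] = IAP_COPY_RAM_TO_FLASH;  // or 51 if the IAP library is not included
    command[1] = FLAG_ADDRESS;           // destination address in flash
    command[2] = (uint32_t)aligned_buffer;  // source buffer in RAM
    command[3] = PAGE_SIZE;  // you must define this value
    command[4] = SystemCoreClock / 1000;  // CPU clock in kHz
    iap_entry(command, result);

    reboot_system();  // restart the board so the bootloader reads the flag
}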

The read code belongs in the bootloader and the write code in the firmware; they live in separate files. After changing the flag through the write process, the board has to be restarted so that the bootloader is called and reads the flag.
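
Once the flag has been read, the bootloader's decision can be sketched as a small pure function. The names below are hypothetical, and the extra check for an erased firmware region is my own assumption (erased flash reads as all ones), added so that a board whose update failed halfway stays in the bootloader instead of jumping into empty memory.

```c
#include <stdint.h>

#define FLAG_VALID    0xA5u         /* same value used by the flag examples */
#define ERASED_WORD   0xFFFFFFFFu   /* erased flash reads as all ones */

typedef enum { BOOT_FIRMWARE, WAIT_FOR_UPDATE } boot_action_t;

/* flag: byte read from the flag address.
 * firmware_first_word: first 32-bit word at the firmware start address. */
boot_action_t decide_boot(uint8_t flag, uint32_t firmware_first_word)
{
    if (flag == FLAG_VALID) {
        return WAIT_FOR_UPDATE;   /* the firmware requested an update */
    }
    if (firmware_first_word == ERASED_WORD) {
        return WAIT_FOR_UPDATE;   /* nothing to boot, e.g. after a failed update */
    }
    return BOOT_FIRMWARE;
}
```

Keeping the decision separate from the hardware access makes it easy to test on a PC before flashing the bootloader.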

Memory Mapping

Bootloader, firmware and flag must all live in flash memory, so the memory has to be mapped as efficiently as possible, keeping every region safe: no region may overwrite another, and erasing the wrong sector would crash the system. Using the LPC1768 as an example, we have the following configuration:



Analysing the scheme, we can see a 512 kB space for flash that starts at address 0x0 and ends at 0x80000. The factory-programmed bootloader points to 0x0, the first flash address. Since the custom bootloader must be started first, it is placed at this first address.
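
Erase operations work on whole sectors, so it helps to translate an address into a sector number before erasing anything. The sketch below assumes the LPC1768 layout of sixteen 4 kB sectors followed by fourteen 32 kB sectors; the function name is hypothetical, and the layout should be checked against the user manual for your exact part.

```c
#include <stdint.h>

#define FLASH_SIZE        0x80000u  /* 512 kB */
#define SMALL_SECTOR_END  0x10000u  /* sectors 0..15 are 4 kB each (assumed layout) */

/* Returns the flash sector number for an address, or -1 if out of range. */
int flash_sector_of(uint32_t address)
{
    if (address >= FLASH_SIZE) {
        return -1;
    }
    if (address < SMALL_SECTOR_END) {
        return (int)(address >> 12);                      /* 4 kB sectors */
    }
    return 16 + (int)((address - SMALL_SECTOR_END) >> 15); /* 32 kB sectors */
}
```

Under this assumption, address 0x78000 falls in sector 29, the last sector of the flash.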

The firmware code is placed after the last bootloader section, so you need to check how much space the bootloader uses and reserve the correct size for the firmware. Last but not least comes the space for your flag: a full sector, which I recommend placing at the end to leave more room for the firmware. For example:

System - Location - Size
Bootloader - 0x0 - 0x20000
Firmware - 0x28000 - 0x50000 (a safety gap is left blank after the bootloader)
Flag - 0x78000 - 0x02000
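
A map like this is easy to get wrong when sizes change, so a small check can be worth having. The sketch below uses the values from the table; the `region_t` type and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t start;
    uint32_t size;
} region_t;

/* Values copied from the example table. */
static const region_t memory_map[] = {
    { "bootloader", 0x00000u, 0x20000u },
    { "firmware",   0x28000u, 0x50000u },
    { "flag",       0x78000u, 0x02000u },
};

/* True when every region fits in flash and no two regions overlap. */
bool memory_map_is_valid(const region_t *map, size_t count, uint32_t flash_size)
{
    for (size_t i = 0; i < count; i++) {
        if (map[i].size > flash_size || map[i].start > flash_size - map[i].size) {
            return false;
        }
        for (size_t j = i + 1; j < count; j++) {
            bool disjoint = map[i].start + map[i].size <= map[j].start ||
                            map[j].start + map[j].size <= map[i].start;
            if (!disjoint) {
                return false;
            }
        }
    }
    return true;
}
```

Calling `memory_map_is_valid(memory_map, 3, 0x80000)` returns true for the example values; moving the firmware start below 0x20000 would make it return false.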

A few notes here: the ideal scenario would be a dual-bank scheme with partitions A and B, updating one of them and keeping the old firmware version in the other, so that there is always working code. The problem is that a complex firmware can take more space than a single partition. In that situation the update carries risk, and the structure must be very well built and tested.

To build the firmware and set up this memory mapping I use MCUXpresso IDE, which automatically generates the linker file that holds the memory address information. To edit the build properties, navigate to Project > Properties > C/C++ Build > MCU settings. The process has to be done twice: once for the bootloader code and once for the firmware.

Splitting firmware file

As mentioned, CAN limits the package size, so the whole firmware has to be split into smaller packages on the server side. The only part that deserves attention here is the interval between packages, so that the board is able to receive each one, process it and do something with it. Here is an example of how you could split your file and send it using Python code and libraries:

import logging
import os
import time

import intelhex  # pip install intelhex
import serial    # pip install pyserial


def hex_to_bin(hex_file_path):
    if not os.path.exists(hex_file_path):
        logging.error(f"Hex file not found: {hex_file_path}")
        return None
    ih = intelhex.IntelHex()
    ih.loadhex(hex_file_path)
    return ih.tobinarray()


def send_to_board(port, firmware_data):
    data_size = 8  # bytes of firmware per package
    id_bytes = bytes([0x05, 0x01])

    for i in range(0, len(firmware_data), data_size):
        # pad the last block with zeros so every package carries 8 bytes
        block = bytes(firmware_data[i:i + data_size]).ljust(data_size, b"\x00")

        # you can monitor the sending
        # logging.debug('{:>10}: {} {}'.format('Block sent', id_bytes, block))
        port.write(id_bytes + block)
        time.sleep(0.015)  # give the board time to process the package


# add here the path to your firmware .hex file
hex_path = ""
firmware_data = hex_to_bin(hex_path)
if firmware_data is not None:
    # the port name and baud rate are placeholders, adjust them to your setup
    with serial.Serial("/dev/ttyS0", 115200) as port:
        send_to_board(port, firmware_data)

The code is reduced, but the logic is: take the .hex firmware file, convert it to binary, split it and send the packages. If you already have the .bin file you can skip the first function call.

Rebuild and write the new firmware

If sending the packages is simple on the server side, on the board everything can get very complicated. I will cover the problems in the next topic, but most of them happen here, on the board. Let's take the perfect scenario just for a better understanding of the process: boards ready to update, enough flash space and no errors.

Knowing that the packages arrive with 8 bytes each, you need to define the sectors and addresses where the new firmware will be written, receive the packages and decide how to write them (package by package, in groups, only after receiving everything, etc.).

The approach I chose was not the best one, but it was the feasible one given the memory problems (explained in the next topic). I decided to receive and group some packages and then write them; I had no space to receive everything, check that it was all there and only then write. The strategy was to eliminate as many error sources as possible to get a clean update even with the receive-and-write method. The flow was:

  1. The first package informs how many packages will be sent. This value is stored in a variable;
  2. The whole firmware is erased, keeping only the bootloader code;
  3. Each package is received and grouped in a buffer until 256 bytes have been received;
  4. When the buffer holds 256 bytes, it is written to flash memory;
  5. When everything has been received, the update flag is cleared;
  6. The board restarts, the bootloader reads the (empty) flag and jumps to the firmware code.
Erasing the whole firmware before receiving the new one is not ideal. Not applying a checksum could also be a serious problem. However, this is a first test of remote updating; it works fine for now and is being improved for production.
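
As one possible way to add the missing validation, here is a minimal sketch of a CRC-32 that can be fed package by package, so the board never needs the whole image in RAM. This is not part of my current implementation; the function name is hypothetical, and the idea would be for the server to send the expected value at the end so the board can compare before clearing the flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Feed data into a running CRC-32 (reflected polynomial 0xEDB88320).
 * Start with crc = 0 and pass the previous result for each new chunk. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}
```

The bitwise loop is slow compared to a table-based version, but it needs no extra flash for a lookup table, which matters when space is already tight.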

Problems that can be found

Remote updates are not simple to build. I found a lot of problems and obstacles during development, and I am still working to achieve not only a stable system but also a firmware that can deal with crashes and recover the system on its own. Here are the main problems you may face:
  1. Memory: the LPC1768 has 512 kB of flash memory, which is a lot of space, but for a bigger firmware it is still not enough to implement a dual bank holding both a recovery firmware and the new firmware. Erasing the whole firmware without any guarantee that the update will succeed is a big problem, even if the board stays in boot mode when the update fails;
  2. Recovery: as said before, it is very important to have a recovery mode, and with only the board's internal memory you will not be able to provide one. An external EEPROM could be used for this purpose;
  3. Microcontroller limitation: each LPC17xx model has a different amount of flash. When developing for the LPC1766 and LPC1756, I found that even without a dual bank I had no space for bootloader, firmware and flag, so I was limited to microcontrollers with 512 kB or more. This depends on the firmware size, of course;
  4. Time: you need to find the correct interval between packages when sending, and in the end the update can take longer than expected. In my case, because of the large firmware, it took close to 7 minutes, which is a long time;
  5. Checksum: the code here is a simple remote update, but it is very important to add things like a checksum to validate the packages. The big question is how to implement this without doubling the update time, and which action to take when the checksum fails.
Here is an example of how to receive and write the packages:

// --- consts ---
// define here the firmware address (cannot be 0x0, that is the bootloader)
#define FIRMWARE_ADDRESS 0x20000

// --- variables ---
static uint8_t pageBuffer[256] __attribute__((aligned(4)));
static uint32_t bufferIndex = 0;
static uint32_t currentWriteAddress = FIRMWARE_ADDRESS;

// --- receive, group and write (runs once for each update package) ---
// message is assumed to point at the 8 data bytes of the package
memcpy(pageBuffer + bufferIndex, message, 8);
bufferIndex += 8;
if (bufferIndex >= 256) {
    // the buffer holds a full page: write it and move on to the next page
    int result = WriteToFlash(currentWriteAddress, pageBuffer, 256);
    if (result != 0) {
        inteligencia::Debug::Printf("Error writing firmware to flash: %d\r\n", result);
    }
    currentWriteAddress += 256;
    bufferIndex = 0;
}
SetCurrentFirmwarePacket(GetCurrentFirmwarePacket() + 1);


The number of packages is set by the first package, which is identified by a different id. All packages are sent with an id so that a package that isn't an update package is never written. The code shows the receive-and-write logic; you still need to erase all firmware sectors beforehand and create the function that writes the buffer.
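
One detail the receive-and-write logic has to handle is the last page: a firmware size is rarely a multiple of 256 bytes, so the final page is only partly filled. The sketch below is one possible way to track the announced package count and hand out complete pages. All names are hypothetical, and writing the page to flash is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES    256u
#define PACKAGE_BYTES 8u

typedef struct {
    uint8_t  page[PAGE_BYTES];
    uint32_t fill;       /* bytes currently stored in page */
    uint32_t expected;   /* total packages announced by the first package */
    uint32_t received;   /* packages stored so far */
    bool     ready;      /* page was handed out and can be reused */
} page_collector_t;

void collector_start(page_collector_t *c, uint32_t expected_packages)
{
    memset(c->page, 0xFF, PAGE_BYTES);  /* 0xFF matches erased flash */
    c->fill = 0;
    c->expected = expected_packages;
    c->received = 0;
    c->ready = false;
}

/* Stores one 8-byte payload. Returns true when c->page should be written
 * to flash: the page is full or this was the last announced package. */
bool collector_feed(page_collector_t *c, const uint8_t *payload)
{
    if (c->received >= c->expected) {
        return false;                   /* ignore unexpected packages */
    }
    if (c->ready) {                     /* previous page was handed out */
        memset(c->page, 0xFF, PAGE_BYTES);
        c->fill = 0;
        c->ready = false;
    }
    memcpy(c->page + c->fill, payload, PACKAGE_BYTES);
    c->fill += PACKAGE_BYTES;
    c->received++;
    c->ready = (c->fill == PAGE_BYTES) || (c->received == c->expected);
    return c->ready;
}

bool collector_done(const page_collector_t *c)
{
    return c->received == c->expected;
}
```

Each time `collector_feed` returns true, the caller writes `page` to the next flash address; when `collector_done` is true, the update flag can be cleared and the board restarted.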

Conclusion

Despite the many serious problems that can appear during development, remote update is a very important step that you have to take if you want to build a scalable system. All the code I left here is superficial base code to explain the concepts; implementing this in your system requires understanding the whole architecture described above and adapting it to your scenario.

It's also worth pointing out that the system I'm working on, which will receive the remote update, has a lot of async tasks, interrupts and things happening at the same time, so I had to find out how to deal with each of them to avoid broken or unwanted packages.

From a personal perspective, I think this was the most challenging development I have done so far, but the amount of knowledge I absorbed in the process was incredible, and seeing the whole system working with this initial code was really nice. I am still working on this project, but it was nice to share a basic understanding of what the CAN Bus is and how it works in a remote update scenario.
