The BattOr's file system writes traces to every block on the SD card in sequence before wrapping around to the beginning of the card. It does this because each block of an SD card can only tolerate somewhere around 1000 writes before it fails.
The file system is divided into 512-byte blocks, with the first block reserved as a binary flag indicating whether there's any data on the card. Each BattOr trace has one block reserved as a header, part of which stores the location of the block immediately following the trace.
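A minimal sketch of such a header block, assuming a hypothetical layout: a 4-byte magic value followed by a 4-byte little-endian next-block address, zero-padded to 512 bytes. The actual BattOr header format may differ, and `HEADER_MAGIC`, `pack_header`, and `unpack_header` are illustrative names only.

```python
import struct

BLOCK_SIZE = 512  # the file system uses 512-byte blocks

# Hypothetical header layout (an assumption, not the real BattOr format):
# 4-byte magic number, then the 4-byte address of the block right after
# this trace, both little-endian. The rest of the block is zero padding.
HEADER_FORMAT = "<II"
HEADER_MAGIC = 0xBA770100  # hypothetical marker meaning "this block is a header"


def pack_header(next_block: int) -> bytes:
    """Serialize a trace header into exactly one 512-byte block."""
    fields = struct.pack(HEADER_FORMAT, HEADER_MAGIC, next_block)
    return fields.ljust(BLOCK_SIZE, b"\x00")


def unpack_header(block: bytes) -> int:
    """Return the next-block address stored in a header block."""
    magic, next_block = struct.unpack_from(HEADER_FORMAT, block)
    if magic != HEADER_MAGIC:
        raise ValueError("block is not a trace header")
    return next_block
```

Because each header only knows where the next trace begins, the headers form a singly linked list across the card.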
When starting a new trace, the BattOr has to seek to the first available block, which, because of the linked-list design, requires reading every trace header in turn. Now that the BattOr is running so reliably, we've collected lots of traces on a single BattOr (~5k), and walking this linked list takes longer than our allotted BattOr start time of 4 s.
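A rough model of that seek, assuming an in-memory dict stands in for the SD card, every trace has the same length (`blocks_per_trace` is hypothetical; real traces vary), and a block with no header marks the first free block. Wrap-around is ignored. The point is only to make the cost concrete: one block read per stored trace.

```python
from typing import Callable, Dict, Optional, Tuple


def seek_first_free_block(
    read_next: Callable[[int], Optional[int]], first_header: int
) -> Tuple[int, int]:
    """Walk the linked list of trace headers to the first free block.

    read_next(block) stands in for reading one header block off the SD card:
    it returns the next-block address stored in that header, or None if the
    block holds no trace header (assumed here to mark the first free block).
    Returns (first_free_block, number_of_header_reads).
    """
    block = first_header
    reads = 0
    while True:
        next_block = read_next(block)
        reads += 1  # every step costs one SD card block read
        if next_block is None:
            return block, reads
        block = next_block


def build_card(num_traces: int, blocks_per_trace: int, first: int = 1) -> Dict[int, int]:
    """Simulate a card: map each trace's header block to the block after the trace.

    Block 0 is skipped because it holds the "card has data" flag.
    """
    headers: Dict[int, int] = {}
    block = first
    for _ in range(num_traces):
        headers[block] = block + blocks_per_trace
        block += blocks_per_trace
    return headers
```

For example, `seek_first_free_block(build_card(5000, 10).get, first_header=1)` returns `(50001, 5001)`: one read per trace header plus one for the free block, so start-up time grows linearly with the number of traces on the card.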
In the short term, aschulman@ fixed this by resetting the first block of the SD card and restarting writes from the beginning. In the long term, however, we need a system that lets us run the BattOr for more than a couple of weeks at a time before requiring a restart.
Some sort of skip list might work nicely here: if each header stored not only the address of the next chunk but also the address of the chunk N ahead (for example), we could cut the required iteration time to roughly 1/N of what it is now, at the cost of only one additional write to the SD card.
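A minimal sketch of the skip-pointer idea, under the same simplifying assumptions as a plain linked-list walk (dict as the card, fixed-length traces, no wrap-around). It assumes, hypothetically, that the "one additional write" is a back-patch: when trace i + N is written, the header of trace i is updated to point at it. The names `seek_with_skips` and `build_skip_card` are illustrative.

```python
from typing import Callable, Dict, Optional, Tuple

# A header now carries two addresses: (next_block, skip_block).
# skip_block points at the header of the trace N positions ahead, or is None
# while that trace hasn't been written yet.
Header = Tuple[int, Optional[int]]


def seek_with_skips(
    read_header: Callable[[int], Optional[Header]], first_header: int
) -> Tuple[int, int]:
    """Find the first free block, taking skip pointers whenever they exist.

    read_header(block) returns (next_block, skip_block) for a header block,
    or None if the block holds no header (assumed to mark the first free
    block). Returns (first_free_block, number_of_header_reads).
    """
    block = first_header
    reads = 0
    while True:
        header = read_header(block)
        reads += 1
        if header is None:
            return block, reads
        next_block, skip_block = header
        # Jump N traces ahead when possible; otherwise step to the next trace.
        block = skip_block if skip_block is not None else next_block


def build_skip_card(
    num_traces: int, blocks_per_trace: int, skip_n: int, first: int = 1
) -> Dict[int, Header]:
    """Simulate a card whose headers hold both next and skip-N pointers.

    The skip pointer of trace i is only filled in once trace i + skip_n
    exists (assumed here to be the one extra SD card write per trace).
    """
    def header_addr(i: int) -> int:
        return first + i * blocks_per_trace

    card: Dict[int, Header] = {}
    for i in range(num_traces):
        skip = header_addr(i + skip_n) if i + skip_n < num_traces else None
        card[header_addr(i)] = (header_addr(i + 1), skip)
    return card
```

With 5000 traces and N = 100, `seek_with_skips(build_skip_card(5000, 10, 100).get, 1)` returns `(50001, 150)` instead of 5001 reads: 50 header reads along skip pointers, 99 ordinary steps through the last traces (whose skip pointers aren't filled in yet), and one read of the free block. In general the cost is about T/N + N reads for T traces, so choosing N near √T (about 70 for 5k traces) keeps both terms small.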
Comment 1 by aschulman@chromium.org, Jul 6 2016