
Using the watchdog timer to fix an unstable Raspberry Pi

I’m using a Raspberry Pi to make time-lapse photos using the motion daemon. The camera I use, a generic “GEMBIRD” (VID:PID 1908:2311), works out of the box but causes the Pi to lock up from time to time. After replacing all the polyfuses with either a normal quick-blow 1A fuse (for F3) or wire (USB fuses), the freezing still happened, even with different power adapters that had more than enough current capacity.

I’ve surmised the problem is the driver causing a kernel panic. This means the Pi won’t respond on the network and needs a hard reset to get it working again. I’ve not had time to diagnose the fault, but the BCM2708 has a watchdog timer that allows the freezing problems to be worked around.

After following the above tutorial, watchdog wasn’t using the hardware watchdog, so it was unable to reboot in the event of a kernel panic or other mystery hardware freeze. The cause of the problem is the default heartbeat timeout (60s): the BCM2708’s hardware timer maxes out at roughly 16 seconds, so the 60s default silently fails. Setting it to 15s fixes it. To test that it works, just kill the watchdog daemon and the system will reboot if the hardware timer is enabled.
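For reference, here’s a minimal sketch of the relevant configuration, assuming the standard Debian watchdog package and the bcm2708_wdog kernel module (check the option names against watchdog.conf(5) for your version):

# load the hardware watchdog driver now, and add bcm2708_wdog to /etc/modules
modprobe bcm2708_wdog

# /etc/watchdog.conf
watchdog-device = /dev/watchdog
# the BCM2708 timer maxes out around 16s, so the 60s default silently fails
watchdog-timeout = 15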

Rebooting the system when it hangs is all well and good, but the freezing (or rebooting) could cause corruption on the SD card. To guard against this, the root partition can be kept read-only, so in the event of a crash the system should still remain bootable. A few daemons and other things need to be able to write to parts of the filesystem (/var, /home, parts of /etc). I followed the instructions here and here; here are the steps in full, after creating the new partition and mounting it on /persistent, and taking a backup image of the SD card. I ran the commands as root (sudo -i) rather than with sudo to avoid writes to /home while moving it.

1. Move/copy the /home, /var and /media directories over to /persistent:

mv /home /persistent
cp -a /var /persistent
mv /media /persistent

2. Recreate mount points for each directory:

mkdir /home /var /media

3. Add bind entries for each mount point in /etc/fstab

nano /etc/fstab

Add lines:

/dev/mmcblk0p3    /persistent   ext4    defaults,noatime   0       0
/persistent/media /media        none    bind               0       0
/persistent/home  /home         none    bind               0       0
/persistent/var   /var          none    bind               0       0

4. Link /etc/mtab to /proc/self/mounts:

rm /etc/mtab
ln -s /proc/self/mounts /etc/mtab

5. Move /etc/network/run to /dev/shm

rm -rf /etc/network/run
dpkg-reconfigure ifupdown

6. Delete the contents of /var

rm -r /var/*

7. Mount the replacement partitions:

mount -a

After these steps, everything should be OK. The current root filesystem is still writeable, so packages can still be installed and config files edited. The new /var bind mount worked, so I rebooted the Pi to see if it still came up.

The next test was to remount the / partition as read-only and see if everything still worked. Running mount -r -o remount / worked without any errors, suggesting nothing was still trying to write to the partition. After waiting a little while to see if anything popped up in /var/log/messages, I edited /etc/fstab to add “,ro” to the entry for / and rebooted to make / read-only by default.
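For illustration, the fstab entry for / then looks something like this (assuming the stock Raspbian layout, where the root filesystem is /dev/mmcblk0p2):

/dev/mmcblk0p2    /             ext4    defaults,noatime,ro   0       1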

These changes made the system more likely to survive random reboots, but it would still periodically lock up. I found that lockups only happened when motion was reading from the camera, and they tended to occur just after a reboot, when the motion daemon started. The problem was that watchdog started after motion, leaving a small time window for a lockup to happen without being caught by the watchdog timer.

To fix this, I made motion’s init script depend on all services, and changed watchdog to depend only upon wd_keepalive: I added $all to the # Required-Start: directive in /etc/init.d/motion, and replaced $all with wd_keepalive in /etc/init.d/watchdog (see the sketch below). After editing the init scripts, they have to be refreshed by deleting and re-adding them in chkconfig: sudo chkconfig --del motion && sudo chkconfig --add motion, and the same again for watchdog. This shortens the window in which motion can start (and freeze the system) before the watchdog has a chance to start.
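A sketch of the relevant LSB header lines after the edit (the rest of each header is unchanged, and the exact dependency names vary by distribution):

# /etc/init.d/motion
### BEGIN INIT INFO
# Required-Start:    $all
### END INIT INFO

# /etc/init.d/watchdog
### BEGIN INIT INFO
# Required-Start:    wd_keepalive
### END INIT INFO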

It’s more of a cheap kludge than a fix, but it works.

Visualising Bitcoin Prices with R

Vircurex is an online exchange for various cryptocurrencies. They have an accessible API which is easily queried with a small bit of R code.

Get Data

Pulling data from HTTP URLs is straightforward in R. HTTPS support is a slightly different story, but the RCurl package easily solves the problem, using the getURL function. Querying the various endpoints becomes fairly straightforward:

library(RCurl)  # getURL: fetch HTTPS URLs
library(rjson)  # fromJSON(json_str=...); assuming the rjson package

url <- paste0("https://api.vircurex.com/api/", request_type, ".json", parameters)
fromJSON(json_str=getURL(url))

Almost all API calls follow the same sequence: build a query string and send it. R provides some interesting reflection capabilities that make implementing this particularly easy. Firstly, match.call() gives the call for the function (name and arguments). In addition, the function’s formal arguments and their defaults can be extracted using formals(). Between those two functions, we can determine argument names and values dynamically. These will be the parameters used in the query.

query_trade <- function(base = "BTC", alt="USD")
    request(as.list(match.call())[-1], formals())

This defines a function, query_trade that calls out to the Vircurex API (via request). Since most of the API calls take the same arguments, we can reuse this function for most of the calls by abusing sys.calls() which gives access to the call stack.

caller_details <- sys.calls()[[sys.nframe()-1]]
request_type <- caller_details[[1]]

Included in the call stack are the names of the calling functions. Using the right entry in the stack allows the following definitions to “just work”:

get_last_trade <- query_trade
get_lowest_ask <- query_trade
get_highest_bid <- query_trade
get_volume <- query_trade
get_info_for_1_currency <- query_trade
orderbook <- query_trade
trades <- query_trade
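For completeness, here’s a sketch of what the request helper might look like under those assumptions; the real implementation lives in the Gist linked at the end of this section:

request <- function(args, formal_args) {
    # the name of the calling wrapper (e.g. get_last_trade) is the API endpoint
    caller_details <- sys.calls()[[sys.nframe() - 1]]
    request_type <- caller_details[[1]]
    # fill in any arguments the caller didn't supply from the formals' defaults
    for (name in names(formal_args))
        if (is.null(args[[name]])) args[[name]] <- formal_args[[name]]
    parameters <- paste0("?", paste0(names(args), "=", unlist(args), collapse="&"))
    url <- paste0("https://api.vircurex.com/api/", request_type, ".json", parameters)
    fromJSON(json_str=getURL(url))
}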

Visualising MtGox’s impact on BTC prices

Of the various API calls supported, the trades function gives all the transactions completed in the last 7 days for a given currency pair. The fromJSON function returns a list of lists, which needs to be munged into a data frame. The easiest way to do this is to convert it to a matrix, then to a data frame.

trade_list <- trades()
trade.df <- data.frame(matrix(unlist(trade_list), nrow=length(trade_list), byrow=T))
#Convert columns from factors to numerics:
for(col in 1:ncol(trade.df))
    trade.df[,col] <- as.numeric(levels(trade.df[,col]))[trade.df[,col]]
#Give columns names:
names(trade.df) <- c("time","id","amount","price")
#Convert the time column:
trade.df$time <- as.POSIXct(as.numeric(trade.df$time), origin="1970-01-01")

Now the data is in a data frame with the correct types, it’s straightforward to visualise it using ggplot2:

library(ggplot2)
ggplot(trade.df, aes(time, price, size=amount)) + geom_point() + geom_smooth(aes(weight=amount)) + theme_minimal()

This plots each trade as a dot, scaled by the amount (in Bitcoin). The line is fitted using LOESS regression, weighted by the transaction amount.

Vircurex: BTC USD

There’s an interesting dip that corresponds to Feb 25, about the time that MtGox went offline.

Code for this entry is available in this GitHub Gist.

Torness Power Station Visit

A while ago I came across Charles Stross’s account of visiting the AGR power station at Torness (not far from Edinburgh). Coincidentally, EDF recently opened a visitor’s centre there and now runs tours of the plant. I went on a visit recently and this is my account of what’s there. It’s worth a trip if you can make it; it takes only 90 minutes and costs nothing but a photocopy of your passport (if you’re British). Sadly, the tour isn’t as involved as the one Charles Stross described, but it’s good fun nonetheless.

Public image seems to be at the forefront of the tour. From the visitor’s centre and right through the tour, the virtues of nuclear power are prominently featured. Interestingly enough, the start of this campaign includes a video of a nuclear waste transit container surviving a collision with an ill-fated train (“this is a controlled test”, as the subtitle in the video helpfully adds).

The interesting part starts at a security gate, where you’re given a badge and checked for any verboten items (cameras, USB sticks, firearms) before passing through a secure turnstile. Once on-site, the tour proper begins in a small glass building that leads, via a glass-walled walkway, into the main building.

Once inside, the tour starts at a board showing Torness’ history, from its foundations to EDF’s latest mascot (whose name is Zingy). Emphasis on safety is paramount throughout, starting with reminders to use the handrails on stairs and warnings against walking with hands in pockets or with untied shoelaces. Beyond the timeline, the entrance to the charge hall and the extensive checks for dosage and material leakage are highlighted. Past a muster point stocked with potassium iodide and personal dosimeters, the tour begins, via an elevator and a bright red door that raises an alarm if held open for more than 15 seconds.

Viewing decks cover three of the main parts of the site. The first takes you into the vast charge hall, overlooking the reactors, the spent fuel storage chamber and the fuel disposal chute. It’s not immediately clear why the hall is so large, but this is clarified by the scale of the green “monster”: the machine tasked with assembling fuel assemblies and loading them into the reactors.

Some corridors that wouldn’t look out of place in a 70s school building lead to a viewing area overlooking the control room. Decked out appropriately with a plethora of buttons, gauges and dials, the control room itself looks similarly dated (although well-maintained) — mostly down to the cream colour of the panels.

Red carpets in the control room mark out the areas where a button might find itself accidentally pushed. Sitting atop the two reactor control desks are several monitors showing old HMIs for the plant’s SCADA systems, complete with limited colour palettes and cyan-on-black text. A common offender sits amongst these panels: Windows XP running Excel.

The final stop on the tour is a gallery overlooking the cavernous turbine hall, showcasing the two blue GEC generator and turbine units. The noise emitted by these things is quite ferocious, even after attenuation by the glass.

The most interesting part of the tour was how clean the building is; this is to make spotting anomalies a little easier. The dated decor is also something to behold. It makes an interesting contrast — the building’s in great shape despite its obvious age. Like other AGRs (except the one at Sellafield), Torness will probably see its lifetime extended. As it stands, a lot of the technology at Torness is “obsolete”, down to the use of Windows XP in the control room. Despite that, it still produces electricity that powers approximately 2.5 million homes.

To top off the spectacular tour, the staff there are all friendly, not just those in the visitor’s centre.

JGAP Default Initialisation Configuration

I use JGAP as part of my Crunch clustering tool. For my thesis write-up, I needed to describe the specific parameters of the genetic algorithm it used. This post details how JGAP starts up: where to find the defaults, what they are, and what the values actually mean. These values are probably specific to the version I’m looking at (3.62), although they haven’t changed in roughly 4 years at the time of writing.

Startup

The search parameters are contained in a Configuration object. The configuration is initialised with various parameters and the type of chromosome to produce. The Genotype represents the population, and exposes the GA component to the user (via the evolve and getFittestChromosome methods).

Genotype

Genotypes can be bootstrapped from a Configuration via the randomInitialGenotype method. This constructs the population (which uses randomInitialChromosome to produce random individuals) and returns the initialised Genotype object.
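To make this concrete, here’s a minimal bootstrap sketch against the JGAP 3.x API; the fitness function and gene type are placeholders for illustration, not anything from Crunch:

import org.jgap.*;
import org.jgap.impl.*;

public class Bootstrap {
	public static void main(String[] args) throws InvalidConfigurationException {
		Configuration conf = new DefaultConfiguration();
		conf.setFitnessFunction(new ConstantFitness()); // placeholder fitness function
		// The sample chromosome tells JGAP the shape of each individual
		Gene[] genes = { new IntegerGene(conf, 0, 100) };
		conf.setSampleChromosome(new Chromosome(conf, genes));
		conf.setPopulationSize(50);

		Genotype population = Genotype.randomInitialGenotype(conf);
		population.evolve(); // one generation, driven by the configured IBreeder
		IChromosome best = population.getFittestChromosome();
		System.out.println(best.getFitnessValue());
	}
}

class ConstantFitness extends FitnessFunction {
	protected double evaluate(IChromosome subject) {
		return 1.0d; // a real fitness function would score the individual here
	}
}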

Genotype.evolve()

evolve has three overloads; the two that take arguments internally call the no-argument version.

The evolve method simply obtains an IBreeder object from the Configuration and uses it to construct a new Population, before setting the current population to the newly constructed version.

IBreeder

IBreeders coordinate the evolution step; however, they do not actually perform or store any of the genetic operators. These are kept in the Configuration object’s m_geneticOperators field.

GABreeder

The default Breeder used is the GABreeder. The following stages are applied to the population:

Fitness Evaluation

The fitness of each individual in the population is first calculated, using the bulk fitness function, or via updateChromosomes if one isn’t provided.

Natural Selectors (Before Genetic Operators)

Each NaturalSelector registered to run before the genetic operator chain is applied to the population. The default configuration does not set any of these selectors.

Genetic Operators

Each of the genetic operators registered in the configuration is run, one at a time, in order of addition to the Configuration. Implementation for this step appears in BreederBase.applyGeneticOperators. The genetic operators implement an operate method, which runs over the population. These may only add to the population, however; an operator should never modify an individual in the population. A List of IChromosomes is passed as well as a Population, although this is just a reference to the Population‘s internal list.

DefaultConfiguration uses a Crossover operator, followed by a Mutation operator. Parameters to these are given at the end of this post.

Evaluation

The population is then re-evaluated, as it contains new individuals not present before the first round of selection.

Natural Selectors (After Genetic Operators)

Finally, the new population is produced by applying a NaturalSelector to it. If the selector selects too few individuals, JGAP can optionally fill the remainder with random individuals – change MinimumPopSizePercent to turn this on, but it’s off by default.

The Default JGAP Configuration

DefaultConfiguration sets up the following operator chain:

Genetic Operators

CrossoverOperator:

Rate: 35%, which translates to populationSize * 0.35 crossover operations per generation. Each crossover produces 2 individuals. Crossover point is random.

MutationOperator:

Desired Rate: 12. The rate for this one is a little different; mutation occurs if a random integer between 0 and the rate (12) is 0, so it equates to a probability of 1/12. This operation is applied to each gene in each element of the population (excluding those produced by crossover), so the expected number of mutations per generation is roughly popSize * chromosomeSize / rate. For each gene that is to be mutated, a copy of the whole IChromosome that contains it is made, with the selected Gene mutated via its applyMutation method.

Natural Selector

One selector is defined by default, which is the BestChromosomesSelector. This selector is given the following parameters:

Original Rate: 0.9. 90% of the population size allotted to this selector will actually be used. The selector is elitist, so this will return the top populationSize * 0.9 elements. The remaining 10% is filled by cloning the selected elements, one by one, in order of fitness until the population size reaches the limit.

Doublette Chromosomes Allowed? Yes. Doublettes are equivalent chromosomes (by equals()).
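Pulling those defaults together, here’s a sketch of how an equivalent chain could be assembled by hand (constructor signatures as I read them in JGAP 3.x; illustrative rather than a drop-in replacement for DefaultConfiguration, and these calls all throw InvalidConfigurationException):

Configuration conf = new Configuration();
// operators run in order of addition (see BreederBase.applyGeneticOperators)
conf.addGeneticOperator(new CrossoverOperator(conf, 0.35d)); // 35% rate
conf.addGeneticOperator(new MutationOperator(conf, 12));     // 1-in-12 per gene

BestChromosomesSelector selector = new BestChromosomesSelector(conf, 0.90d);
selector.setDoubletteChromosomesAllowed(true);
conf.addNaturalSelector(selector, false); // false: run after the genetic operators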

Configuration Parameters

The configuration specifies that the pool of selectors will select 100% of the population size. However, the only selector is configured to produce only 90% of the population size each time.

keepPopulationSizeConstant is also set to true, which dictates that the population is trimmed to size at each iteration. Somewhat counter-intuitively, this does not cause the population size to be increased if it falls below the user-supplied value; it doesn’t enforce a minimum, only a maximum.

minimumPopSizePercent is set to zero, so random individuals will never be used to fill the population if it declines below the limit (won’t ever happen under these conditions).

So What is the Default?

The default configuration is an elitist ranking selector that clones the top 90% of the user-specified population size, repeating clones to fill the rest. Crossover is random-point with a rate of populationSize * 0.35. Mutation is random and applied to roughly 1 in 12 genes in the whole population (i.e. the expected number of mutations is dictated by chromosome size * population size / 12).

Restoring sudo access after an Ubuntu upgrade

I recently revived an old mini-ITX box which had an unsupported Ubuntu version on it (last booted well over a year ago). During the upgrade process from Maverick to Natty, one of the scripts asked if I wanted a new /etc/sudoers file. Stupidly, I assumed that my user was in the correct group and took the new one.

On rebooting it turned out that my choice was unwise – I didn’t have sudo access nor did I have the root password for recovery. Unfortunately, the trick of using /bin/bash as an init replacement to get a root shell didn’t work either (it’s a common problem).

The fix was to write a small C program which just executed a script with /bin/sh to replace the sudoers file:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void){
	/* argv[0] must be the shell's own name; the argument list must end in a null pointer */
	execl("/bin/sh", "sh", "-e", "/home/mat/replace_sudoers.sh", (char *)NULL);
	perror("execl"); /* only reached if the exec itself fails */
	return 1;
}

Compile this with gcc: gcc -o replace replace_sudoers.c then create a replace_sudoers.sh with the following contents (with the home directory changed):

echo "Backing up old sudoers"

cp /etc/sudoers /home/mat/old_sudoers
chmod 777 /home/mat/old_sudoers

echo "moving new one"
cp -fr /home/mat/sudoers /etc/sudoers
chown root /etc/sudoers
chmod 0440 /etc/sudoers

echo "moved new sudoers successfully, will halt in 5s"
sleep 5
halt

I replaced my old sudoers with the following bare-bones one:

#/etc/sudoers

Defaults env_reset

root	ALL=(ALL) ALL
mat	ALL=(ALL) ALL
%admin	ALL=(ALL) ALL

Now, at the GRUB boot menu, edit the first kernel entry. Instead of using init=/bin/sh, init=/home/mat/replace can be used, which will launch the script and overwrite the old sudoers file. On rebooting, you’ll have your sudo privileges back.

Reading ROMs and Saving Pokemon

Early Game Boy cartridges that support saved games use battery-backed SRAM to hold data between plays. It has recently come to light that the batteries in them are expected to start dying soon. Critically: a lot of Pokemon from a lot of childhoods are facing extinction.

Luckily, thanks to the Pandocs [1] and a lot of patience from other developers, people have started building their own dumpers. This post details another implementation, built for PIC16 devices instead of the existing Arduino and AVR implementations. It is heavily based on the InsideGadgets Arduino GBCartRead project [2].

There's a ROM dumper under this rat's nest, honest.

The Hardware

I built my dumper around a PIC16F690. It’s got just enough pins to drive a couple of shift registers and a UART, with enough left over for I/O. The most difficult part to get is the cartridge connector – unfortunately, a donor Game Boy is required.

Schematic for the dumper

The schematic is fairly straightforward. The PIC communicates with the PC via a MAX232. The address lines are driven via two 74HC594 shift registers; data and control lines take up the remaining pins. Power is supplied via an LM7805 linear regulator (not shown).

The cartridge requires a connection to GND and +5V, as well as a pull-up resistor from +5V to the /RST pin (30).

The Software

Game Boy cartridges are fairly simple devices. They use a 32-pin connector which consists of power, three control lines, a 16-bit address bus and an 8-bit data bus. The cartridge shares the address bus with other devices and is mapped to the region 0x0000 to 0x7FFF [3].

Talking to a cartridge is fairly simple:

  1. Write the address you want to read or write to the address bus
  2. (optional) Write the data you want to write to the data bus
  3. (optional) Assert MREQ if you’re reading SRAM (Pokemon live there)
  4. Assert RD or WR to read or write respectively
  5. (optional) Read the data from the data bus
  6. Deassert RD, WR and MREQ

The WR, RD and MREQ pins are all active low; asserting a line means driving it to 0.
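To make the sequence concrete, here’s a sketch of a single byte read in C. The helper names (set_address, the pin macros, DATA_BUS) are hypothetical stand-ins for this dumper’s I/O, not the actual firmware; note that the data bus has to be sampled while /RD is still asserted:

#include <stdint.h>

uint8_t cart_read(uint16_t addr, int from_sram)
{
    set_address(addr);        /* step 1: clock the 16-bit address into the '594s */
    if (from_sram)
        MREQ_LOW();           /* step 3: assert /CS to select cartridge RAM */
    RD_LOW();                 /* step 4: assert /RD; the cartridge drives the data bus */
    uint8_t value = DATA_BUS; /* step 5: sample the data pins */
    RD_HIGH();                /* step 6: release the control lines */
    MREQ_HIGH();
    return value;
}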

Cartridges larger than 32KB require bank switching, which is handled by a device called the memory bank controller (MBC). The MBC also controls access to the SRAM, as well as any other peripherals on the cartridge.

The MBC is controlled by writing commands to specific addresses. These addresses vary between MBCs, although most seem to enable RAM when 0x0A is written to 0x0000. The MBC type is readable from the cartridge descriptor, which is at 0x0134-0x0148 [1].
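Before save RAM can be dumped, that enable command has to be sent first. With a hypothetical cart_write helper (the mirror image of cart_read above, driving the data bus and pulsing /WR), it would look like:

void cart_write(uint16_t addr, uint8_t value); /* hypothetical: drive data bus, pulse /WR */

cart_write(0x0000, 0x0A); /* typical MBC RAM-enable command, per the Pandocs [1] */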

The firmware for the PIC is available at GitHub [4].

Rescuing the Pokemon

To use the dumper, just connect to it via the serial port. The interface is fairly straightforward: it waits for confirmation before reading the descriptor, and it’s self-explanatory from there. The most important mode to use first is the diagnostics mode, which needs to be run without a cartridge connected.

On the Importance of Testing

Continuity testing is important!

The diagnostics mode is ideal for catching faults before they have real implications.

Faults like this one: in this case, the WR pin hadn’t been properly soldered. Since the control lines are active low and connected via pull-down resistors, the floating WR line was left asserted, so the attempt to read RAM ended up writing garbage to it. I euthanised my Pokemon.

Dumping Data

Retrieving data (hopefully having not destroyed it) can be done using RealTerm [5]. It features true raw logging, which is necessary because, for example, PuTTY interprets some control characters even in raw logging mode.

Future Work

At the moment the interface is a bit clunky, as is having to use RealTerm to dump things. I’ll be changing it to speak the same protocol as the original Arduino GBCartRead [2] which will make dumping things easier.

Links

[1] : The Pandocs – Everything you wanted to know about Game Boy but were afraid to ask

[2] : GBCartRead

[3] : Game Boy memory map

[4] : GBCartRead-PIC16

[5] : RealTerm

Tracing Java Method Execution with AspectJ

AspectJ is a project to bring aspect-oriented programming to Java. Sidestepping the whole issue of whether or not aspect-oriented programming is a good idea, it can be used to insert code at points you define, and is a very useful tool for dynamic program analysis.

Some terminology needs explaining before we go further. AspectJ calls a point where execution is paused and passed to your code a “pointcut”; the code executed at that point is termed “advice”. Although I use the word “paused”, that’s not strictly correct: AspectJ “weaves” the advice code into matching pointcuts at compile time. Additionally (and perhaps most significantly), AspectJ can also weave at class load time (“load-time weaving”), which is what we’ll be using.

So we have a way of inserting code at arbitrary places in an existing code base; what can we do with it? A lot of horrible things (playing around with returned values and parameters, to name a few), but for now we’ll stick to what we want to do: tracing method calls.

The most obvious way to instrument a program is to insert calls to some logging library at the top of every method. Although this works, it takes a lot of time, either manually logging everything or figuring out how to use a regex/parser to do it for you. With AspectJ we can simply weave in code that logs the current method at every method execution point.

So how do we do that? First we need to get AspectJ working; Eclipse will do that for you, or alternatively you can do everything from the command line (assuming you have AspectJ installed, including the ajc command on your PATH).

We’ll define an aspect containing a pointcut that matches every method execution, as well as some advice to run when execution reaches those points.

package aspects;

import java.util.logging.Level;
import java.util.logging.Logger;

import org.aspectj.lang.Signature;

aspect Trace {

	pointcut traceMethods() : execution(* *(..)) && !cflow(within(Trace));

	before() : traceMethods() {
		Signature sig = thisJoinPointStaticPart.getSignature();
		String line = "" + thisJoinPointStaticPart.getSourceLocation().getLine();
		String sourceName = thisJoinPointStaticPart.getSourceLocation().getWithinType().getCanonicalName();
		Logger.getLogger("Tracing").log(
				Level.INFO,
				"Call from " + sourceName + " line " + line
						+ " to " + sig.getDeclaringTypeName() + "." + sig.getName());
	}

}

So what exactly does this do? Firstly, the pointcut traceMethods() defines a new pointcut called traceMethods. This pointcut matches execution of every method in every class, as long as the control flow isn’t in the current class (Trace). The latter constraint is to stop an infinite loop occurring.

The before(): part of the class defines advice. This is the code that gets inserted just before the execution of the method. Advice can also be given after a pointcut is hit (using the after keyword instead). Our “advice” doesn’t modify execution flow, it just logs some information about the state of the program when the pointcut was hit, but it could quite easily start modifying control flow.

Save this file as Trace.java and compile it using: ajc -outxml -outjar aspects.jar Trace.java.

This compiles the aspect and puts it into a jar ready for use. The -outxml parameter will cause ajc to automatically generate an aop.xml file and save it in the jar’s META-INF directory. The load time weaving agent will read this to determine which aspects to weave with the classes it loads. The aop.xml file can do more than that, it can be used to set namespaces to ignore when weaving (such as java.*) to avoid mucking with the control flow of libraries.
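For reference, a hand-written aop.xml would look something like this sketch; the -outxml-generated one contains just the aspect declaration:

<!-- META-INF/aop.xml -->
<aspectj>
	<aspects>
		<aspect name="aspects.Trace"/>
	</aspects>
	<weaver>
		<!-- keep the weaver away from the JDK's own classes -->
		<exclude within="java..*"/>
	</weaver>
</aspectj>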

So now we have our aspects.jar; how do we use it? The AspectJ weaving agent must be available somewhere on the filesystem (aspectjweaver.jar), and aspects.jar (and the target application) must be on the classpath.

Once that’s sorted, to trace the application, use

java -javaagent:<path to aspectjweaver.jar> -cp <path to aspects.jar>:<path to target jar/folder> <name of main class to run>

Hopefully that will run and you should see a large amount of console output:

INFO: Call from main.RunFile line 206 to main.RunFile.main
Mar 31, 2011 2:52:53 PM aspects.Trace ajc$before$aspects_Trace$1$b314f86e
INFO: Call from main.RunFile line 186 to main.RunFile.runList
Mar 31, 2011 2:52:53 PM main.RunFile main
INFO: Starting clustering of 0 files
Mar 31, 2011 2:52:53 PM aspects.Trace ajc$before$aspects_Trace$1$b314f86e
INFO: Call from main.ExperimentRunner line 59 to main.ExperimentRunner.runExperiments

AspectJ lets you do far more than this; tracing barely scratches the surface of what it can be used for. Using LTW makes for some interesting possibilities; one obvious use would be monkey-patching existing projects to fix bugs where the source isn’t available (and the license permits it, of course).

AspectJ is an interesting piece of work; the official project page has far more resources on the types of pointcuts you can define and the more nitty-gritty details of making it mess with the traced program.
