Managing MySQL Persistence with Docker Containers

I recently started using Docker to manage deployment of some applications I maintain as part of my work. Docker makes developing on a Mac and deploying to a Linux machine much simpler, and obviates a lot of configuration (absolute paths work fine). Moreover, it makes integration tests far less painful; I have Jenkins rebuild my containers and end-to-end test them each time I push to my repositories (they’re small apps so e2e tests are still quick).

Containerising applications is nice but it does have some problems. The religion^Wconvention that all containers are supposed to be ephemeral is great until you want to store anything. My applications typically consist of 2-3 containers: an application server, the database, and optionally some processing backend (abusing the database as a job queue). When it comes to testing these, my containers really are ephemeral, so I can tear them down and rebuild them from scratch.

Unfortunately, I can’t tear down the deployed apps. This means a changed database schema ruins the nice workflow. This post shows how I got around that problem.

MySQL in Docker

The MySQL team has made a MySQL image available on Docker Hub. This makes it really easy to get a database running. Using docker-compose makes knitting the application to it a breeze too. The nice thing about the MySQL Docker images is that they work completely out of the box. There’s no need to layer your own stuff on top of the image. Instead, you can specify some scripts that run (once) to populate your database.

The MySQL image recommends you put the database on a volume. Doing this means when you tear down the container you can link a new one to it. Compose automatically does this for you so you can docker-compose stop database && docker-compose rm -f database at your leisure (assuming you’ve set the volume up in your compose file!).

This design does mean that your init scripts will never run more than once. This is probably a good thing – they probably all start with DROP DATABASE foo. Unfortunately it means there’s no way to make changes to your schema out of the box.

Database Migrations

The solution to the problem is fairly straightforward. Schema changes need to be played on top of the deployed version. Each time you change the schema, create an SQL script with the relevant ALTER TABLE commands. Then, when the application is restarted, run the sequence of changes against the database.

Using docker-compose, this consists of defining a new container that runs the mysql image, connects to the database, and runs each migration in turn. Naming the migrations so they order lexically saves you from having to worry about making each one totally bulletproof. That doesn’t mean the SQL scripts shouldn’t have some defensive checks in them (especially if they transform existing data rather than just adding new, empty fields).
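The runner itself amounts to very little. Here’s a sketch in Python (the real image uses a shell script derived from the MySQL entrypoint; the function and argument names here are illustrative):

```python
import glob
import subprocess

def migration_order(paths):
    """Lexical ordering: 001_create.sql runs before 002_add_column.sql."""
    return sorted(paths)

def run_migrations(migrations_dir, host, db, user, password):
    """Pipe each .sql migration, in order, through the mysql client."""
    for script in migration_order(glob.glob(f"{migrations_dir}/*.sql")):
        with open(script) as sql:
            subprocess.run(
                ["mysql", f"--host={host}", f"--user={user}",
                 f"--password={password}", db],
                stdin=sql, check=True,
            )
```

Prefixing each file with a zero-padded sequence number is what makes the lexical sort equal the intended order.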

I’ve included an example docker-compose.yml file that shows how this works. This uses the current mysql image to run a database and uses an image that includes the script to run the migrations (derived from the original mysql init script). The Dockerfile for the image is available on GitHub and the image is available on Docker Hub.

  database:
    image: mysql
    expose:
      - 3306
    volumes:
      - ./schema:/docker-entrypoint-initdb.d
      - /var/lib/mysql
    environment:
      MYSQL_DATABASE: database
      MYSQL_USER: user
      MYSQL_PASSWORD: password
      MYSQL_ROOT_PASSWORD: root_password

  migration:
    image: mathewhall/mysql_migration
    volumes:
      - ./migrations:/docker-entrypoint-migrations.d
    links:
      - database
    environment:
      MYSQL_HOST: database
      MYSQL_DATABASE: database
      MYSQL_USER: user
      MYSQL_PASSWORD: password

When this compose file is run, the migration container will start and wait for the database to come up. Once it does, it runs the migrations in the host-linked migrations directory. The directory link ensures new changes will be picked up without having to rebuild an image or recreate the container. As long as the migration container is run (either automatically with docker-compose up or manually with docker-compose run migration) the database schema will be up to date.

Cheap Home Air Quality Monitoring

I’ve wanted to monitor room temperature and humidity in my home for quite some time and I recently came across the ESP8266 Wi-Fi microcontrollers that were ideal for the job. Prior to the ESP8266 becoming mainstream, I was considering running a 1wire network over spare phone wires, similar to this weather station setup.


Thankfully, the ESP8266 chips are cheap enough that fitting one in each room of a house is practical. On top of that, people had already made inroads to getting code running on the ESP, including a webserver returning readings from a DHT22 temperature and humidity sensor.

The ESP8266 chip itself includes a Tensilica Diamond Standard core, although documentation is a little sparse. Fortunately, setting up a toolchain is relatively straightforward, and there is a (proprietary) SDK available that allows custom firmware to be built and flashed onto the modules.


The hardware I threw together is very simple: it hooks a DS18B20 1wire temperature sensor and a DHT11 temperature and humidity sensor up to the two GPIOs broken out to pins on the module. Since I wanted a handful of the boards, I opted to design the circuit in Fritzing and got a pack of boards from DirtyPCBs. The schematic and board are below:


Temp Mon PCB

The schematic is straightforward: 5V comes in from a USB socket, passes through a 250mA fuse, and is dropped to 3.3V by a TS1086CZ or AMS1117 module. The switch S1 allows the GPIO pin to be grounded to program the module, with the UART pins broken out to a separate header.

Software – ESP8266

The software is quite simple. Using sprite_tm’s webserver code and building on Martin’s additions for the DHT sensor, my code adds a DS18B20 driver and support for the cheaper DHT11 sensor.

In addition to serving the sensor values over HTTP, the code periodically sends a JSON packet over UDP to a logging host (a Raspberry Pi) which stores it in a MySQL database and an rrdtool database.
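The sending side boils down to serialising a reading and firing it at the logger. A sketch in Python of the packet shape (the actual firmware is C on the ESP8266, and the field names and port here are assumptions, not the real format):

```python
import json
import socket

def build_packet(node, temperature, humidity):
    """Serialise one reading as a JSON UDP payload."""
    reading = {"node": node, "temp": temperature, "rh": humidity}
    return json.dumps(reading).encode("utf-8")

def send_reading(host, port, node, temperature, humidity):
    """Fire a single reading at the logging host; UDP, so fire-and-forget."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(build_packet(node, temperature, humidity), (host, port))
    sock.close()
```

UDP suits this use well: a dropped sample now and then doesn’t matter, and the sender never blocks waiting for the Pi.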

I noticed after a few days of running, the ESP boards can get quite warm. Although I don’t think it affected the sensors, it seemed sensible to avoid leaving the ESP8266 online if it only needed to periodically log data. Fortunately, there is a “deep sleep” function in the SDK, but unfortunately this requires the RESET pin to be soldered to the RTC pin to trigger wakeups. With the chips spending most of the time asleep, they don’t get warm at all.

Software – Logging

A python script listens on the UDP port for incoming packets, deserialises them, and logs them. The readings from the DHT sensor are susceptible to noise (possibly a bug in the driver), so the script includes logic to reject samples that are outside a window.
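The rejection logic can be as simple as comparing each sample against the last accepted one (the window width below is a placeholder, not the value I actually use):

```python
def accept(sample, last_good, window=10.0):
    """Reject readings that jump more than `window` units from the last
    accepted value. The DHT driver occasionally returns wild values; this
    simple filter drops them rather than logging garbage."""
    if last_good is None:
        return True  # nothing to compare against yet
    return abs(sample - last_good) <= window
```

A jump larger than the window is far more likely to be driver noise than a real change in room conditions over one sampling interval.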

RRDTool expects updates for every value at once, so it’s not possible to use a single RRD to store the data from multiple hosts as it comes in at different times. Instead, each host needs its own RRD, so I use a script to create them as I add hosts to the network.
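The creation script boils down to building one `rrdtool create` command per host, something like the following (the data source names, heartbeats, and archive sizes are illustrative guesses, not my actual settings):

```python
def rrd_create_command(host, step=300):
    """Build an `rrdtool create` invocation for one sensor node's RRD."""
    return [
        "rrdtool", "create", f"{host}.rrd", "--step", str(step),
        f"DS:temp:GAUGE:{2 * step}:-40:85",  # DS18B20's rated range
        f"DS:rh:GAUGE:{2 * step}:0:100",     # DHT11 relative humidity
        "RRA:AVERAGE:0.5:1:2016",            # a week of 5-minute averages
    ]
```

The command list can then be passed straight to `subprocess.run` whenever a new host appears.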

Software – Visualising

I initially used a Shiny app to load the data from the SQL database, but this became unworkable quite quickly; without a lot of spare RAM, MySQL resorts to creating temporary tables on the SD card, which is tragically slow.

Shiny Dashboard

For quick at-a-glance readouts, I switched to RRDTool. Although it’s not as pretty as ggplot2, it’s lightweight enough to render graphs regularly and serve them as static images, which fits far more comfortably on the Raspberry Pi. The script just iterates over each host and plots it over various intervals, saving the images to a webserver’s directory.

Bill of Materials & Cost

  • PCBs: $14/12 – £0.78ea
  • USB B Connector: £2.51/5 – £0.50ea
  • Fuse holder: £1.29/10 – £0.13ea
  • Fuse: ~£0.10ea
  • ESP8266: £8.36/4 – £2.09ea
  • DS18B20: £4.99/5 – £1.00ea
  • DHT11: £3.04/5 – £0.61ea
  • AMS1117 module: £2.62/5 – £0.52ea
  • Switch: £1.99/5 – £0.40ea
  • 4.7kR resistor: 2x~£0.02ea
  • Project box: £3.36/5 – £0.67ea

That works out at approximately £6.84 (amortising the cost of the PCBs), or £7.93 each for the 5 nodes I built. I’m ignoring the cost of power supplies as I have at least one USB power source and cable in the rooms I’m monitoring. Given there are more GPIOs on the ESP8266, there’s also scope for adding more sensors later on.
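As a sanity check on the arithmetic, summing the per-unit costs above:

```python
# Per-unit costs in GBP, taken from the bill of materials above
# (PCB cost amortised across the full pack of 12 boards).
parts = {
    "PCB": 0.78,
    "USB B connector": 0.50,
    "fuse holder": 0.13,
    "fuse": 0.10,
    "ESP8266": 2.09,
    "DS18B20": 1.00,
    "DHT11": 0.61,
    "AMS1117 module": 0.52,
    "switch": 0.40,
    "4.7k resistors (x2)": 0.04,
    "project box": 0.67,
}
per_node = sum(parts.values())  # approximately £6.84
```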

Using an Efergy Current Transformer to Monitor Power Consumption

Household power consumption can be estimated non-invasively by attaching a current transformer(s) to the incoming phase(s). There are quite a few commercial kits available that do this, plus EmonTX. I wanted a solution that allowed me to log power consumption over the network, and bought an Efergy “Smart” sensor, designed for Efergy’s E2 and Elite monitors.

Unlike other current transformers, the Efergy sensor contains some more interesting components. There’s a long discussion on the OpenEnergyMonitor forums about this sensor, including internals, and response characteristics, that ultimately concludes the components should be replaced to get it to work with EmonTX’s design. I wanted to keep the sensor as a unit for safety, so opted to embrace the odd design.

In a nutshell, the EmonTX reference design is to take the AC signal from a transformer fitted with an appropriate burden resistor, DC-bias it to lift the whole signal so it’s always positive, and then scale it to a range that the microcontroller on the board can tolerate (5V or 3.3V).

This design is very simple, requires very few components (resistors and a capacitor), and makes estimation easy: sample the waveform, scale the value according to the burden resistance and turns ratio, then compute the RMS of those samples to get an estimate of current. Multiplying this by an estimated (or measured) voltage and power factor generates a power consumption estimate in Watts.
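In code, that estimation step looks something like this (a sketch of the method described above, not the EmonTX implementation; the burden resistance, turns ratio, mains voltage, and power factor are parameters you’d calibrate for your own setup):

```python
import math

def rms(samples):
    """Root-mean-square of a set of waveform samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_power(adc_volts, burden_ohms, turns_ratio,
                   mains_v=230.0, power_factor=1.0):
    """Scale sampled burden voltages to primary current, then to Watts."""
    primary_amps = [v / burden_ohms * turns_ratio for v in adc_volts]
    return rms(primary_amps) * mains_v * power_factor
```

For a pure sine wave the RMS comes out at amplitude/√2, which is why sampling over whole cycles matters before applying the scaling.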

Efergy's current transformer internal circuit (modelled in LTSPICE)

The Efergy sensor’s extra components do a couple of things. Firstly, a half-wave rectifier diode (D2 in the schematic) chops half of the waveform off (in practice, this limits the voltage on one half of the signal to 0.7V). This immediately means sampling the waveform won’t work without extra logic.

The diode D1 serves to decrease the burden resistance for larger loads. This results in a non-linear response to the current through the primary. Both of these components allow more of the ADC’s resolution to be used measuring the waveform at lower (expected) loads.

I threw this circuit into LTSPICE, and simulated it under a sinusoidal current source with an amplitude of 110mA, corresponding to a sinusoidal current through the primary of about 100A with a turns ratio of 1350. The waveform’s shown below:

Efergy current transformer AC response for 100A RMS.

This shows the clipped waveform, limited by D2, and the non-linear “kink” in the output voltage as the current increases, roughly at -52mA on the secondary (about 73A on the primary).

I’m using an MSP430G2553 for this project, which includes a 10-bit ADC. Unfortunately, TI advises against applying voltages outside the region of 0-3.3V to the microcontroller, and the ADC only converts inputs between 0V and 3V.

Like the EmonTX design, this signal requires at least two transforms, one to shift it so it’s all positive, and another to scale it to the safe input range. The shift can be achieved by biasing the circuit from a voltage divider (as EmonTX does), and the scale can also be achieved using a voltage divider.
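Numerically, the two dividers compose like this (the resistor values are placeholders to make the arithmetic concrete, not a tested design):

```python
def condition(signal_v, vcc=3.3, bias_divider=(10e3, 10e3),
              out_divider=(10e3, 22e3)):
    """EmonTX-style conditioning: add a DC bias from one divider, then
    scale the result with another so it stays inside the ADC's range."""
    r1, r2 = bias_divider
    offset = vcc * r2 / (r1 + r2)   # mid-rail bias: 1.65 V with equal resistors
    r3, r4 = out_divider
    return (signal_v + offset) * r4 / (r3 + r4)
```

With a zero input the output sits at the scaled bias point, and the AC signal swings symmetrically around it.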

However, half of the waveform isn’t useful at all, so it doesn’t make sense to waste the ADC’s range on it. The negative voltage still needs to be removed, however, so the sensor needs to be biased by the drop of D2.

To claw back some useful range, and to clamp the negative cycle to (close to) zero, I placed a diode in line with the signal, which drops the signal by about 0.7V. With a 0.8V bias, this takes the 0mA offset for the positive cycle from 0.7V to about 0.4V. This bias also ensures (at least theoretically) that the rectifier diode is biased when the interesting signal is going through it. Putting this together with an output divider:

Efergy monitor circuit design

This produces the following waveforms, measured at the current transformer (red), at the diode D5 (green), and at the output (blue), against current (cyan):

Output Waveforms

This scales the input (roughly) into the 0-3V range. At the top end, the output does increase to around 3.3V, but this occurs when the current drawn at the primary is around 95A; at this point, I have bigger problems than a blown MSP430.

Up to now, I’ve only been simulating the circuit, so practical considerations might change things. I still need to build the hardware and deal with mapping the signal to estimates of instantaneous AC usage. I’ll cover these issues in later posts.

The LTSPICE model for this post is available here.

Using the watchdog timer to fix an unstable Raspberry Pi

I’m using a Raspberry Pi to make time-lapse photos using the motion daemon. The camera I use, a generic “GEMBIRD” (VID:PID 1908:2311), works out of the box but causes the Pi to lock up from time to time. After replacing all the polyfuses with either a normal quick-blow 1A fuse (for F3) or wire (USB fuses), the freezing still happened, even with different adapters with more than enough current capacity.

I’ve surmised the problem is the driver causing a kernel panic. This means the Pi won’t respond on the network and needs a hard reset to get it working again. I’ve not had time to diagnose the fault, but the BCM2708 has a watchdog timer that allows the freezing problems to be worked around.

After setting the watchdog daemon up, it wasn’t using the hardware watchdog, so it was unable to reboot in the event of a kernel panic or other mystery hardware freeze. The cause of the problem is the default heartbeat interval (60s); setting it to 15s fixes it. To test that it works, just kill the daemon and the system will reboot if the hardware timer is enabled.
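For reference, the relevant /etc/watchdog.conf entries look something like this (option names are from watchdog.conf(5); treat the exact values as an assumption based on the fix above rather than a tested config):

```
watchdog-device  = /dev/watchdog
watchdog-timeout = 15    # the BCM2708 hardware timer can't count to 60s
```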

Rebooting the system when it hangs is all well and good, but the freezing (or rebooting) could corrupt the SD card. To guard against this, the root partition can be kept read-only, so in the event of a crash the system should still remain bootable. A few daemons and other things need to be able to write to parts of the filesystem (/var, /home, parts of /etc). I followed the instructions here and here; the steps are below in full, after creating the new partition, mounting it on /persistent, and taking a backup image of the SD card. I ran the commands as root (sudo -i) rather than with sudo to avoid writes to /home while moving it.

1. Move/copy the /home, /var and /media directories over to /persistent:

mv /home /persistent
cp -a /var /persistent
mv /media /persistent

2. Recreate mount points for each directory:

mkdir /home /var /media

3. Add bind entries for each mount point in /etc/fstab

nano /etc/fstab

Add lines:

/dev/mmcblk0p3    /persistent   ext4   defaults,noatime   0   0
/persistent/media /media        none   bind               0   0
/persistent/home  /home         none   bind               0   0
/persistent/var   /var          none   bind               0   0

4. Link /etc/mtab to /proc/self/mounts:

rm /etc/mtab
ln -s /proc/self/mounts /etc/mtab

5. Move /etc/network/run to /dev/shm

rm -rf /etc/network/run 
sudo dpkg-reconfigure ifupdown

6. Delete the contents of /var

rm -r /var/*

7. Mount the replacement partitions:

mount -a

After these steps, everything should be OK. The current root filesystem is still writeable, so packages can still be installed and config files edited. The new /var partition worked, so I rebooted the Pi to see if it still came up.

The next test was to remount the / partition as read-only and see if everything still worked. Running mount -r -o remount / worked without any errors, suggesting nothing was still trying to write to the partition. After waiting a little while to see if anything popped up in /var/log/messages, I edited /etc/fstab to add “,ro” to the entry for / and rebooted to make / read-only by default.

These changes made the system more likely to survive random reboots, but it would still periodically lock up. I found that lockups only happened while motion was reading from the camera, and they occurred just after a reboot, when the motion daemon started. The problem was that watchdog started after motion, leaving a small window for a lockup to happen without being caught by the watchdog timer.

To fix this, I set motion’s init script to depend on all services, and changed watchdog to depend only upon wd_keepalive. I changed /etc/init.d/motion to add $all to the # Required-Start directive, and /etc/init.d/watchdog to replace $all with wd_keepalive. After editing the init scripts, they have to be refreshed by deleting and re-adding them with chkconfig: sudo chkconfig --del motion && sudo chkconfig --add motion and sudo chkconfig --del watchdog && sudo chkconfig --add watchdog. This shortens the window in which motion can start (and freeze the system) before the watchdog has a chance to start.

It’s more of a cheap kludge than a fix, but it works.

Visualising Bitcoin Prices with R

Vircurex is an online exchange for various cryptocurrencies. They have an accessible API which is easily queried with a small bit of R code.

Get Data

Pulling data from HTTP URLs is straightforward in R. HTTPS support is a slightly different story, but the RCurl package easily solves the problem, using the getURL function. Querying the various endpoints becomes fairly straightforward:

url <- paste0("", request_type, ".json", parameters)

Almost all API calls follow the same sequence: build a query string and send it. R provides some interesting reflection capabilities that make implementing this particularly easy. Firstly, match.call() gives the call for the current function (name and arguments). In addition, named arguments can be extracted using formals(). Between those two functions, we can determine argument names and values dynamically. These will be the parameters used in the query.

query_trade <- function(base = "BTC", alt = "USD")
    request(as.list(match.call())[-1], formals())

This defines a function, query_trade that calls out to the Vircurex API (via request). Since most of the API calls take the same arguments, we can reuse this function for most of the calls by abusing sys.calls() which gives access to the call stack.

caller_details <- sys.calls()[[sys.nframe()-1]]
request_type <- caller_details[[1]]

Included in the call stack are the names of the calling functions. Using the right entry in the stack allows the following definitions to “just work”:

get_last_trade <- query_trade
get_lowest_ask <- query_trade
get_highest_bid <- query_trade
get_volume <- query_trade
get_info_for_1_currency <- query_trade
orderbook <- query_trade
trades <- query_trade

Visualising MtGox’s impact on BTC prices

Of the various API calls the exchange supports, the trades function gives all the transactions completed in the last 7 days for a given currency pair. The fromJSON function returns a list of lists, which needs to be munged into a data frame. The easiest way to do this is to convert it to a matrix, then to a data frame.

trade_list <- trades()
trade.df <- data.frame(matrix(unlist(trade_list), nrow=length(trade_list), byrow=T))
#Convert columns from factors to numerics:
for(col in 1:ncol(trade.df))
    trade.df[,col] <- as.numeric(levels(trade.df[,col]))[trade.df[,col]]
#Give columns names:
names(trade.df) <- c("time","id","amount","price")
#Convert the time column:
trade.df$time <- as.POSIXct(as.numeric(trade.df$time), origin="1970-01-01")

Now the data is in a data frame with the correct types, it’s straightforward to visualise it using ggplot2:

ggplot(trade.df, aes(time, price, size=amount)) + geom_point() + geom_smooth(aes(weight=amount)) + theme_minimal()

This plots each trade as a dot, scaled by the amount (in Bitcoin). The line is fitted using LOESS regression, weighted by the transaction amount.

Vircurex: BTC USD

There’s an interesting dip that corresponds to Feb 25, about the time that MtGox went offline.

Code for this entry is available in this GitHub Gist.

Torness Power Station Visit

A while ago I came across Charles Stross’s account of visiting the AGR power station at Torness (not far from Edinburgh). Coincidentally, EDF recently opened a visitor’s centre there and now runs tours of the plant. I went on a visit recently and this is my account of what’s there. It’s worth a trip if you can make it; it only takes 90 minutes and only costs a photocopy of your passport (if you’re British). Sadly, the tour isn’t as involved as the one Charles Stross described, but it’s good fun nonetheless.

Public image seems to be at the forefront of the tour. From the visitor’s centre and right through the tour, the virtues of nuclear power are prominently featured. Interestingly enough, the start of this campaign includes a video of a nuclear waste transit container surviving a collision with an ill-fated train (“this is a controlled test”, as the subtitle in the video helpfully adds).

The interesting part starts at a security gate, where you’re given a badge, checked for any verboten items (cameras, USB sticks, firearms) and pass through a secure turnstile. Once on-site, the tour starts through a small glass building that leads, via a glass-walled walkway, into the main building.

Once inside the large building, the tour starts at a board showing Torness’ history, from its foundations to EDF’s latest mascot (whose name is Zingy). Emphasis on safety is paramount throughout the tour, starting with warnings about using handrails on stairs, walking with hands in pockets, and untied shoelaces. Beyond the timeline, the entrance to the charge hall, and the extensive checks on dosage and material leakage are highlighted. Past a muster point stocked with potassium iodide and personal dosimeters, the tour begins, via an elevator and a bright red door that raises an alarm if held open for more than 15 seconds.

Viewing decks cover three of the main parts of the site. The first takes you over into the vast charge hall, overlooking the reactors, the spent fuel storage chamber and the fuel disposal chute. It’s not immediately clear why the hall is so large, but this is clarified by the scale of the green “monster”: the machine tasked with assembling fuel assemblies and loading them into the reactors.

Some corridors that wouldn’t look out of place in a 70s school building lead to a viewing area overlooking the control room. Decked out appropriately with a plethora of buttons, gauges and dials, the control room itself looks similarly dated (although well-maintained) — mostly down to the cream colour of the panels.

Red carpets in the control room mark out the areas where a button might find itself accidentally pushed. Sitting atop the two reactor control desks are several monitors showing old HMIs for the plant’s SCADA systems, complete with limited colour palette and cyan-on-black text. A common offender sits amongst these panels: Windows XP and Excel.

The final stop on the tour is a gallery overlooking the cavernous turbine hall, showcasing the two blue GEC generator and turbine units. The noise emitted by these things is quite ferocious, even after attenuation by the glass.

The most interesting part of the tour was how clean the building is; this is to make spotting anomalies a little easier. The dated decor is also something to behold. It makes an interesting contrast — the building’s in great shape despite its obvious age. Like other AGRs (except the one at Sellafield), Torness will probably see its lifetime extended. As it stands, a lot of the technology at Torness is “obsolete”, down to the use of Windows XP in the control room. Despite that, it still produces electricity that powers approximately 2.5 million homes.

To top off the spectacular tour, the staff there are all friendly, not just those in the visitor’s centre.

JGAP Default Initialisation Configuration

I use JGAP as part of my Crunch clustering tool. For my thesis writeup, I needed to describe the specific parameters of the genetic algorithm it used. This post details how JGAP starts up, where to find the defaults, what they are, and what the values actually mean. These values are probably specific to the version I’m looking at (3.62), although they haven’t changed in roughly 4 years at the time of writing.


The search parameters are contained in a Configuration object. The configuration is initialised with various parameters and the type of chromosome to produce. The Genotype represents the population, and exposes the GA component to the user (via the evolve and getFittestIndividual methods).


Genotypes can be bootstrapped from a Configuration via the randomInitialGenotype method. This constructs the population (which uses randomInitialChromosome to produce random individuals) and returns the initialised Genotype object.


evolve has three overloads; the two that take arguments internally call the no-argument version.

The evolve method simply obtains an IBreeder object from the Configuration and uses it to construct a new Population, before setting the current population to the newly constructed version.


IBreeders coordinate the evolution step, however they do not actually perform or store any of the genetic operators. These are kept in the Configuration object’s m_geneticOperators field.


The default Breeder used is the GABreeder. The following stages are applied to the population:

Fitness Evaluation

The fitness of each individual in the population is first calculated, using the bulk fitness function, or via updateChromosomes if one isn’t provided.

Natural Selectors (Before Genetic Operators)

Each of the NaturalSelectors registered for application before the genetic operator chain runs is run on the population. The default configuration does not set any of these selectors.

Genetic Operators

Each of the genetic operators registered in the configuration is run, one at a time, in order of addition to the Configuration. Implementation for this step appears in BreederBase.applyGeneticOperators. The genetic operators implement an operate method, which runs over the population. These may only add to the population, however; an operator should never modify an individual in the population. A List of IChromosomes is passed as well as a Population, although this is just a reference to the Population‘s internal list.

DefaultConfiguration uses a Crossover operator, followed by a Mutation operator. Parameters to these are given at the end of this post.


The population is then re-evaluated, as it contains new individuals not present before the first round of selection.

Natural Selectors (After Genetic Operators)

Finally, the new population is produced by applying a NaturalSelector to it. If the selector selects too few individuals, JGAP can optionally fill the remainder with random individuals – change MinimumPopSizePercent to turn this on, but it’s off by default.

The Default JGAP Configuration

DefaultConfiguration sets up the following operator chain:

Genetic Operators


CrossoverOperator — Rate: 35%, which translates to populationSize * 0.35 crossover operations per generation. Each crossover produces 2 individuals. The crossover point is random.


MutationOperator — Desired Rate: 12. The rate for this one is a little different; a gene mutates if a random integer between 0 and the rate (12) is 0, so it equates to a probability of 1/12. The operator is applied to each gene in each element of the population (excluding those produced by crossover), so the rough expected number of mutations is (1/rate) * popSize * chromosomeSize. For each gene that is to be mutated, a copy of the whole IChromosome that contains it is made, with the selected Gene mutated via its applyMutation method.
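To make the expected mutation count concrete, here's a back-of-the-envelope sketch (the population and chromosome sizes are illustrative, only the rate of 12 comes from JGAP's defaults):

```python
def expected_mutations(rate, pop_size, chromosome_size):
    """Rough expected number of gene mutations per generation when each
    gene mutates independently with probability 1/rate."""
    return pop_size * chromosome_size / rate

# 100 individuals of 20 genes at the default rate of 12 gives roughly
# 166.7 gene mutations per generation on average.
```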

Natural Selector

One selector is defined by default, which is the BestChromosomesSelector. This selector is given the following parameters:

Original Rate: 0.9. 90% of the population size allotted to this selector will actually be used. The selector is elitist, so this will return the top populationSize * 0.9 elements. The remaining 10% is filled by cloning the selected elements, one by one, in order of fitness until the population size reaches the limit.
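The selection behaviour can be sketched as follows (an illustration of the description above, not JGAP's actual code):

```python
def select(population, fitness, target_size, original_rate=0.9):
    """Elitist selection in the style of BestChromosomesSelector with
    originalRate = 0.9: keep the top 90% by fitness, then clone the kept
    individuals, in fitness order, until the population is back to size."""
    ranked = sorted(population, key=fitness, reverse=True)
    kept = ranked[: int(target_size * original_rate)]
    clones = []
    i = 0
    while len(kept) + len(clones) < target_size:
        clones.append(kept[i % len(kept)])  # clone best-first, wrapping
        i += 1
    return kept + clones
```

With a population of 10, the top 9 survive and the fittest individual is cloned once to restore the size.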

Doublette Chromosomes Allowed? Yes. Doublettes are equivalent chromosomes (by equals()).

Configuration Parameters

The configuration specifies that the pool of selectors will select 100% of the population size. However, the only selector is configured to produce only 90% of the population size each time.

keepPopulationSizeConstant is also set to true, which dictates that the population is trimmed to size at each iteration. Somewhat counter-intuitively, this does not cause the population size to be increased if it falls below the user-supplied value; it doesn’t enforce a minimum, only a maximum.

minimumPopSizePercent is set to zero, so random individuals will never be used to fill the population if it declines below the limit (won’t ever happen under these conditions).

So What is the Default?

The default configuration is an elitist ranking selector that clones the top 90% of the user-specified population size, repeating it to fill the rest. Crossover is random-point with a rate of populationSize * 0.35. Mutation is random and applied to roughly 1 in 12 genes in the whole population (i.e. the expected number of mutations is chromosome size * population size / 12).