Can Approval Testing and Specification by Example Work Together?

At XP 2013 I attended a workshop by Emily Bache in which we compared different methods of writing tests for an existing piece of legacy code. The three methods were writing coded tests to run in a test framework such as JUnit or RSpec, using Cucumber with pre-written step definitions, and approval testing.

With an approval testing library, you do not write assertions in your test code. Instead, a test passes inputs to the functionality under test and formats the results as text. It passes the text to the approval testing library which compares it against an approved copy from a previous run. If the outputs are different from the approved copy, or this is the first time the test has run and there is no approved copy, the test fails and the user must examine the differences and decide whether to approve the latest outputs. If they do, the latest outputs then become the approved copy to be used in future tests.
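To make that cycle concrete, here is a minimal sketch in Python. It is not the API of any real approval testing library: the `verify` and `approve` functions and the `.approved.txt` / `.received.txt` file names are assumptions I have made for illustration.

```python
from pathlib import Path


def verify(name: str, received: str, approved_dir: Path) -> None:
    """Fail unless `received` matches the approved copy stored for `name`."""
    approved_file = approved_dir / f"{name}.approved.txt"
    received_file = approved_dir / f"{name}.received.txt"

    if approved_file.exists() and approved_file.read_text(encoding="utf-8") == received:
        # Output matches the approved copy: remove any stale received file.
        received_file.unlink(missing_ok=True)
        return

    # Either there is no approved copy yet or the output has changed.
    # Keep the latest output on disk so that a person can diff and review it.
    received_file.write_text(received, encoding="utf-8")
    raise AssertionError(f"{name}: output not approved, review {received_file}")


def approve(name: str, approved_dir: Path) -> None:
    """Promote the latest received output to be the approved copy."""
    received_file = approved_dir / f"{name}.received.txt"
    approved_file = approved_dir / f"{name}.approved.txt"
    received_file.replace(approved_file)
```

A test would format its results as a string and call `verify`. The first run fails because nothing has been approved yet. After someone reviews the received file and calls `approve`, later runs pass until the output changes.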

I was impressed by how quickly one could use approval testing to get a piece of existing code under test compared to the other two methods. It's a powerful technique for locking down behaviour when refactoring legacy code.

However, we found that it wasn't always easy to understand test failures with approval testing. We had to look in at least three places to understand what was being tested and why it might have failed: the test code, a diff between the output of the test and the currently approved copy, and the code under test.

It was worth expending some effort on how results were formatted as text to make failures easier to understand. Helpful output showed the inputs alongside the results and had explanatory text that identified the feature being tested and what each pair of inputs and outputs was testing.

To me, that begins to sound very like Specification by Example. But instead of documentation being written up front and parsed by the tests to extract inputs and expected outputs, documents are generated by the test and then compared against the approved copy. With a diff-friendly document format, such as Markdown, and some string templating, one could use approval testing as a low-overhead way of both testing and documenting a system. The templating would have to generate diff-friendly output too, but it might be enough to just generate Markdown tables from inputs and outputs and ensure that the grid lines are aligned.
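As a rough sketch of the table idea, the Python function below pads every cell so that the grid lines of a Markdown pipe table line up. The name `markdown_table` and the example columns are hypothetical, chosen only to illustrate the technique.

```python
def markdown_table(headers, rows):
    """Render rows as a Markdown pipe table with columns padded to equal width."""
    cells = [[str(h) for h in headers]] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell, with a minimum of three
    # characters so that the separator row always has at least three dashes.
    widths = [max(3, *(len(row[i]) for row in cells)) for i in range(len(headers))]

    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    separator = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([line(cells[0]), separator] + [line(r) for r in cells[1:]]) + "\n"


# Hypothetical inputs and outputs of a pricing calculation.
print(markdown_table(["price", "quantity", "total"], [(10, 1, 10), (10, 12, 120)]))
# | price | quantity | total |
# | ----- | -------- | ----- |
# | 10    | 1        | 10    |
# | 10    | 12       | 120   |
```

One trade-off of aligning the grid: a new value that is wider than its column changes the padding of every row, so the diff will touch lines whose data has not changed.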

Approval testing would also be better than something like Cucumber for testing numerical calculations. The final output for approval does not have to be text. For a numerical function, a test can render a graphical visualisation so one can more easily see calculation errors, such as undesirable discontinuities, than when results are displayed in tabular form.

Approval testing might help avoid a failure mode of Specification by Example / BDD, in which it gradually becomes a document-heavy process with an up-front analysis phase. People notice that requirements were missed or misunderstood when they first see the running system. If it has taken a long time to get that running system in front of people, a common reaction is to try to avoid the problem in the future by spending more effort on requirements analysis and writing more documents. In a workshop at XP Day last year I heard how a team that had adopted BDD now wrote so many Cucumber spec documents that they had stopped using them to automate tests and just used Cucumber's Gherkin language as a standard format for documenting requirements before starting development. But this is a vicious circle: the more time spent writing requirements documents instead of getting running code in front of users, the more critical any mistakes in those documents become, because the time needed to change the software has already been spent on analysis and documentation. Reacting to mistakes by spending even more time writing requirements documents only makes the problem worse.

Agile software development takes the opposite approach: relentlessly increase the rate at which you can put running software in front of users and get feedback about how they really use it. The end result, if you push it hard enough, is continuous delivery and, for online applications with large userbases, the lean startup approach. For bespoke applications, the end result is developers and users working together to change the software, the programmer changing the system during conversations with the user about what it should do. When software can be changed that fast, writing Specification by Example documents for new features feels like an unhelpful overhead. But the documentation and regression tests that you get from Specification by Example are helpful as the system is evolved.

Maybe combining approval testing with Specification by Example would help. Rapidly change the code until it does what is desired, then lock down its behaviour with an approval test that generates easily diffable documentation and pass that documentation through a publishing pipeline as part of the build. For example, the Markdown format and pandoc tool could work well for this.
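The publishing step could be as small as the following Python sketch, which shells out to pandoc for each approved document. The `*.approved.md` naming convention and the `publish` and `pandoc_command` functions are assumptions of mine, not part of any existing tool. The only pandoc features used are `--standalone` and `-o`.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path, target: Path) -> list:
    """Build the pandoc invocation that converts `source` into `target`."""
    # --standalone asks pandoc for a complete document rather than a fragment.
    return ["pandoc", "--standalone", str(source), "-o", str(target)]


def publish(approved_dir: Path, output_dir: Path) -> list:
    """Convert every approved Markdown document to HTML and return the files written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(approved_dir.glob("*.approved.md")):
        target = output_dir / source.name.replace(".approved.md", ".html")
        # check=True makes the build fail if pandoc reports an error.
        subprocess.run(pandoc_command(source, target), check=True)
        written.append(target)
    return written
```

Running `publish` after the approval tests have passed means that only documents that have been reviewed and approved reach the published output.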

Of course, such a tool would not preclude following a normal acceptance-test-driven process for changes that require more time to implement. One can write the golden copy document for the approval test by hand and it will then act like a normal Specification by Example document in an acceptance-test-driven process, reporting where the system does not yet perform the desired behaviour.

Image from VistaICO Toolbar Icons, used under the Creative Commons Attribution 3.0 Unported license.

Mistaeks I Hav Made
