A unit is the smallest testable part of an application.
More often than not, I see unit tests that forget this core tenet. Picture a method or function as a sandbox. When you are writing a unit test, never let a call get outside of your sandbox.
Your unit test should be testing the conditional logic and correctness of the code in your sandbox. Let another test worry about all that external/dependent code. When writing a suite of unit tests we're interested in code coverage. We want to exercise as much of our code base as possible. That's a lot of tests. We need to make sure these tests are not fragile, difficult to write or hard to maintain. And we need to make sure they're quick to execute.
The code in your sandbox really only has one or two purposes:
- To execute some external operation, and/or
- To compute some result
Conditional logic includes:
- If-Then statements
- Ternary operators
- Switch statements
- Boolean expressions (x = y or z)
- Try-catch blocks
- Goto statements (if applicable)
Computing a result includes:
- Checking return values
- Checking thrown exceptions
- Checking set member variables and/or globals
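As a rough sketch of what that looks like in practice, here is a small, hypothetical `Account` class (not from any real code base) with a single branch, plus a Python `unittest` case that checks a return value, a thrown exception and a set member variable:

```python
import unittest


class Account:
    """Hypothetical class under test; the names are illustrative only."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # One branch to cover: the overdraft check.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


class AccountTest(unittest.TestCase):
    def test_withdraw_returns_new_balance(self):
        # Checks the return value.
        self.assertEqual(Account(100).withdraw(30), 70)

    def test_withdraw_updates_member_variable(self):
        # Checks the member variable that was set.
        account = Account(100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_overdraft_raises(self):
        # Checks the thrown exception (the other side of the branch).
        account = Account(10)
        with self.assertRaises(ValueError):
            account.withdraw(30)
        # The balance must be left untouched when the withdrawal is refused.
        self.assertEqual(account.balance, 10)
```

Between them the three tests cover both sides of the one `if`, which is all the conditional logic this sandbox contains.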
How to stay in your sandbox
Nearly all programming languages have some sort of Fake, Mock or Stub library available. The purpose of these libraries is to short-circuit the external code and replace it with your test code, something that:
- Always returns an expected value
- Has no state
- Has no external state dependencies (i.e. a database initialized to known values)
- Returns nearly instantly
- Has no conditional logic in it
In "XUnit Test Patterns", Gerard Meszaros has a nice breakdown of Fakes, Mocks, Stubs and Dummies:
- Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
- Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
- Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it 'sent', or maybe only how many messages it 'sent'.
- Mocks are objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
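To make the four categories concrete, here is a rough Python sketch built around the email gateway example. Every class and function name in it is made up for illustration, and the mock comes from the standard library's `unittest.mock`:

```python
from unittest import mock


def notify(gateway, logger, address):
    """Hypothetical code under test: sends one message through a gateway."""
    gateway.send(address, "hello")


# Dummy: fills the parameter list, never actually used by notify().
dummy_logger = object()


class FakeGateway:
    """Fake: a working implementation with a shortcut (keeps mail in memory)."""

    def __init__(self):
        self.outbox = {}

    def send(self, address, body):
        self.outbox.setdefault(address, []).append(body)


class StubGateway:
    """Stub: canned behaviour that only records how many messages it 'sent'."""

    def __init__(self):
        self.sent_count = 0

    def send(self, address, body):
        self.sent_count += 1


fake = FakeGateway()
notify(fake, dummy_logger, "a@example.com")

stub = StubGateway()
notify(stub, dummy_logger, "a@example.com")

# Mock: the expectation about the call it should receive is the specification.
mock_gateway = mock.Mock()
notify(mock_gateway, dummy_logger, "a@example.com")
mock_gateway.send.assert_called_once_with("a@example.com", "hello")
```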
I try to only use Dummies and Mock objects for unit tests. I don't like storing state in my test doubles ... after a while you spend so much time making an effective stub you don't know if you're testing the stub or the code in the sandbox. That's why I like mock objects. They usually just consist of "return value" or "raise exception" ... simple, maintainable and readable.
If you find yourself putting an If statement in a Mock, you're doing something wrong.
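With Python's standard `unittest.mock`, for instance, a mock of that kind takes a line or two. The `display_name` and `fetch_user` names below are hypothetical:

```python
from unittest import mock


def display_name(fetch_user, user_id):
    """Hypothetical code in the sandbox: fetch_user is the external call."""
    try:
        return fetch_user(user_id)["name"].title()
    except LookupError:
        return "Unknown"


# "Return value": the mock always hands back the same canned answer.
found = mock.Mock(return_value={"name": "ada"})
name_when_found = display_name(found, 1)

# "Raise exception": the mock always raises, so the except branch runs.
missing = mock.Mock(side_effect=LookupError("no such user"))
name_when_missing = display_name(missing, 2)
```

Neither mock contains any logic of its own; each one simply forces the code under test down one path.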
How can you verify an If-Then statement?
Most of the time you just want to be sure that an external method or function was actually called. Do you need to actually call that method? Hell no. Not in a unit test. Many of these Mock libraries also have provisions for "Was Called" semantics. Essentially a flag is set if a Mock method was called. If your library doesn't have one, it's pretty darn simple to set one up.
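A minimal hand-rolled sketch in Python, assuming the external function can be passed in as an argument (the `CalledFlag` and `save_if_valid` names are hypothetical):

```python
class CalledFlag:
    """Stands in for an external function and remembers whether it ran."""

    def __init__(self, return_value=None):
        self.was_called = False
        self.return_value = return_value

    def __call__(self, *args, **kwargs):
        self.was_called = True
        return self.return_value


def save_if_valid(record, save):
    """Hypothetical code under test: only saves records that have a name."""
    if record.get("name"):
        save(record)


# Then-branch: the external save must be called.
save_for_valid = CalledFlag()
save_if_valid({"name": "x"}, save_for_valid)

# Other branch: the external save must not be called.
save_for_invalid = CalledFlag()
save_if_valid({}, save_for_invalid)
```

Checking `was_called` on each double is enough to verify both sides of the If-Then without ever touching the real save.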
So, rather than let your code call outside of your sandbox, replace the external method (via monkey-patching, dependency injection or some other mechanism) to use your mock instead.
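Both mechanisms can be sketched with Python's `unittest.mock`. The `Mailer` class and `welcome` function are invented for the example, and the real `send` deliberately raises so that any escape from the sandbox would be obvious:

```python
from unittest import mock


class Mailer:
    """Hypothetical external dependency; really sending mail is out of bounds."""

    def send(self, address, body):
        raise RuntimeError("would talk to a real mail server")


def welcome(mailer, address):
    """Hypothetical code in the sandbox."""
    if "@" in address:
        mailer.send(address, "Welcome!")
        return True
    return False


# Monkey-patching: swap Mailer.send for a mock only inside the with-block.
with mock.patch.object(Mailer, "send") as fake_send:
    sent = welcome(Mailer(), "a@example.com")
    fake_send.assert_called_once_with("a@example.com", "Welcome!")

# Dependency injection: hand the code a mock instead of a real Mailer.
injected = mock.Mock()
skipped = welcome(injected, "not-an-address")
injected.send.assert_not_called()
```

`mock.patch.object` restores the original method when the block exits, so the patch cannot leak into other tests.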
Does all code need a Unit Test?
No. Consider a function foo() whose entire body is three calls in a row: do_this(), do_that() and do_something_else().
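Reconstructed from the surrounding discussion, foo() presumably looked something like this sketch. The helper bodies and the `calls` list are placeholders, not the author's code:

```python
calls = []  # placeholder side effect so the sketch is observable


def do_this():
    calls.append("do_this")


def do_that():
    calls.append("do_that")


def do_something_else():
    calls.append("do_something_else")


def foo():
    # No conditional logic and no computed result: just three calls in a row.
    do_this()
    do_that()
    do_something_else()
```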
What would a unit test accomplish here? Nothing. Just make sure you have tests around each of the methods within foo().
When Unit Tests aren't sufficient
Let's say we did write a unit test for foo() in the above example. Would it accomplish anything? No, because we aren't checking the semantic ordering of the calls within foo(). Putting logic in to ensure that do_this() was called before do_that() and do_something_else() would just be a waste of time.
What if foo() was changed to this:
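Presumably the change was the same three calls in a different order. A sketch under that assumption, with mocks standing in for the three functions (they are passed in as parameters only to keep the example self-contained), shows why a "was called" unit test cannot tell the difference:

```python
from unittest import mock


def foo(do_this, do_that, do_something_else):
    # Assumed change: do_that() now runs before do_this().
    do_that()
    do_this()
    do_something_else()


# A parent mock records the order of calls made on its child mocks.
parent = mock.Mock()
foo(parent.do_this, parent.do_that, parent.do_something_else)

# The "was called" checks a typical unit test would make all still pass...
parent.do_this.assert_called_once_with()
parent.do_that.assert_called_once_with()
parent.do_something_else.assert_called_once_with()

# ...even though the recorded order is no longer the intended one.
observed_order = [call[0] for call in parent.mock_calls]
```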
Our unit test would still pass just fine. But it would be wrong. What we need here is an Integration Test. Integration tests do check the semantic ordering of statements. With an integration test we allow ourselves to step outside of our sandbox. We may even use test doubles to help us build our integration tests (including Fakes and Stubs).
But now our mandate for these tests has changed: no longer are we concerned with code coverage; instead we're interested in testing specific usage scenarios that are critical to our customers' acceptance of the software. Other than manual testing, integration testing is the only way to ensure we're building software that will work for the customer.
We don't need 100% code coverage; instead we want the 80% of everyday scenarios the user will experience when they use our software. If we were making a word processor, do we need integration tests for mail merge, form builder and the math editor? Probably not. Sure, it would be nice to have, but it's not critical. What we absolutely need are integration tests for: launch, enter some text, setting basic styles like bold, italics, etc, printing, saving and loading, page flow, etc.
Do we need to do integration testing for every edge case where an error might occur? No (again, it would be nice to have) ... but realistically we need to test: disk full, disk failed, out of memory, out of paper, etc. The most common scenarios.
But, if integration testing is so good and so important, why not just do integration testing?
Because integration testing is hard, slow, brittle, hard to read and hard to maintain. Everything unit tests aren't. In other words ... they're a pain in the ass. Integration testing is sometimes frowned upon because of these limitations, but they're being compared to unit tests. Integration tests and unit tests are very different animals. They serve two different purposes and give two different levels of comfort to the developer. I don't even think they're comparable at all. The worst mistake a development team can make (regarding testing) is to mix and match their integration tests with their unit tests. You get the worst of both worlds. Keep them clearly separate.
Also, treat your integration tests like your core code base. Unit tests can be hacky since they're so small. But integration tests are complicated and need to be carefully maintained. They need to be documented properly. They need fantastic logging capabilities with rich output. They need to follow the same coding styles as your core code base. They need to be structured, refactored and updated so that they're always easily readable. Unit tests are rarely refactored: they're so small and atomic that there's usually nothing to change.
So, to grow an effective body of Unit and Integration tests for your application, remember these rules:
- Don't step outside your sandbox for unit tests
- Use Mocks and Dummies for your unit tests
- Check all branching logic in your unit tests
- Go for very high code coverage for your unit tests
- Integration tests are difficult beasts. Go for high-impact user stories in your integration testing. Mostly Happy Day scenarios.
- Don't spend a lot of time on the edge-case failure conditions in your integration tests
- Keep your integration test code clean and maintainable. Refactor as frequently as your core code base.
- Set up a dedicated integration test server that runs on every commit to trunk (they're slow and hard to set up, remember)

3 comments:
Great post, Sandy!
One of the things I would mention here is that the terminology around all of this is so loosely defined as to be stupid.
In a previous life, we simply referred to tests as blackbox versus whitebox. The former was more like integration/functional whereas the latter was more like unit tests. In that same vein, the last couple months have in turn told me that the things I thought of as "integration" tests were truly functional tests, and vice versa. As such, I've personally come to define things as "unit tests" and "not unit tests" for simplicity's sake.
Regarding mocks, I personally believe their value should be evaluated on a case-by-case basis, as they can lead to some incredibly brittle tests, especially if you're enforcing the order of method calls. As for stubs, I've come to believe the only real difference between these and mocks is whether or not you verify the mock was called explicitly, and what parameters the mock actually received. One of the testing frameworks I've used in Ruby made this same distinction: stubs only returned a value necessary for a test, whereas a mock would vet the input and place a hard requirement on whether or not it was called.
Great post Sandy!
You have a great way of breaking this stuff down...
On the mocking side, I've come to love mocking frameworks for doing stubbing instead of trying to create crufty mock implementations to maintain...mockito is my favorite. I find inline mock definitions (i.e. when(mock.getSomething()).thenReturn("something")) much easier to use and maintain.
Good post, Sandy. I disagree on a couple of points, though.
I'm much, much, much more comfortable using fakes that maintain some amount of state than relying on the continued correctness of thousands of ad-hoc mock objects. If the fake exposes more than a couple(!) of methods, I just add tests for the fake. I can run the tests against the real thing and against my fake, and see that they do the same thing. This pattern has served me quite well a number of times now. The amount of ad-hoc mock objects we have in Nova right now gives me the heebie jeebies. A method that mocks out a call to the db layer returning an object with an attribute that has since been moved could go unnoticed for months until some almost entirely unrelated change reveals that something has changed incompatibly. This is a pain to clean up after.
I disagree about your 80%/20% thing. I think unit tests are even *more* important for the things that only 20% of your users use. If the stuff that everyone uses breaks, you'll find out in a jiffy and someone will always be up for contributing tests for that. The exotic stuff is exactly the sort of stuff you need help verifying.
I think your example of a method that doesn't need unit tests is much too general. Sure, I can think of many examples where such a method could acceptably be untested by unit tests, but in many cases, the correct, expected behaviour of that foo() method is exactly to execute those three methods in that exact order. If your mock library of choice doesn't let you verify this easily, I'd certainly whip up an ad-hoc mock that does.