Don't get me wrong. I love me some Python and I love me some unit tests. The problem is, I find strict unit tests in dynamically typed languages aren't nearly as useful as they are in statically typed languages.
I can hopefully better illustrate this with an example. Let's say we have some code like below. Note: don't worry about what's actually happening (I completely made it up); just be aware that decide() calls compute() and both have conditionals, exceptions, loops, etc., which make them good candidates for unit testing, in that they could easily bust our application if someone starts messing with either of them.
Also note that decide() is dependent on a bunch of other functions: get_seed(), get_first()/second()/third() … which we will assume are equally unit tested.
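A minimal sketch of the kind of code described above, for readers who want something concrete to look at. Everything in it is hypothetical: the bodies of compute() and decide(), the SourceUnavailable exception, the retries parameter and the constant return values are assumptions chosen only to match the description (decide() calling compute() and adding 100 to the result, a retry path, a loop with a fence-post case).

```python
class SourceUnavailable(Exception):
    """Hypothetical error the get_*() helpers could raise when a source is down."""


# Stand-ins for the helpers decide() depends on. The constants are made up;
# in a real application these might raise SourceUnavailable.
def get_seed():
    return 2


def get_first():
    return 3


def get_second():
    return 4


def get_third():
    return 5


def compute(first, second):
    """Combine two non-negative numbers and return a number."""
    if first < 0 or second < 0:
        raise ValueError("inputs must be non-negative")
    total = 0
    for step in range(second):  # fence-post case: second == 0 skips the loop
        total += first + step
    return total


def decide(retries=3):
    """Return compute(...) + 100, or fall back to get_third() if lookups keep failing."""
    seed = get_seed()
    for _attempt in range(retries):
        try:
            first = get_first() * seed
            second = get_second()
        except SourceUnavailable:
            continue  # retry the lookups
        # decide() assumes compute() takes two numbers and returns a number.
        return compute(first, second) + 100
    return get_third()
```

The important detail is the last line of the try path: decide() quietly relies on both the signature and the return type of compute().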
So, we write our unit tests and they're really good. They mess with the fence post cases of compute(). They make get_whatever() exception out to test the retry code. All of this is done with Mocks and Fakes so we're not dependent on the other code. These are unit tests after all; we shouldn't be calling external functions/methods. This means that in our decide() unit tests, we've mocked out compute(). We green bar our test suites and all is good.
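A strict, zero-depth unit test for decide() might look roughly like the sketch below. The function bodies are simplified stand-ins (assumptions, not the real code), and patching through decide.__globals__ is only a trick to keep the example in a single file; in a real project you would more likely write something like mock.patch("yourpackage.yourmodule.compute").

```python
import unittest
from unittest import mock


# Simplified, hypothetical stand-ins for the real functions.
def get_seed():
    return 2


def get_first():
    return 3


def get_second():
    return 4


def compute(first, second):
    return first * second


def decide():
    return compute(get_first() * get_seed(), get_second()) + 100


class DecideStrictUnitTest(unittest.TestCase):
    def test_adds_100_to_compute_result(self):
        # Every external call is replaced, including compute().
        fakes = {
            "get_seed": mock.Mock(return_value=2),
            "get_first": mock.Mock(return_value=3),
            "get_second": mock.Mock(return_value=4),
            "compute": mock.Mock(return_value=7),
        }
        with mock.patch.dict(decide.__globals__, fakes):
            self.assertEqual(decide(), 107)
            # The test pins down how decide() *thinks* compute() is called ...
            fakes["compute"].assert_called_once_with(6, 4)
            # ... but the real compute() is never consulted.
```

Because the fake compute() always accepts two arguments and always returns 7, this test keeps passing no matter what happens to the real compute().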
Until someone changes compute() to something like this:
Now it's some weird function that takes a boolean (or something that can be coerced into a boolean) and returns a string. Fortunately the author fixed the unit tests for compute() and everything green bars again.
But the application no longer works. decide() is broken … and we won't discover it until decide() gets called.
Why? Because with most dynamically typed languages, the contracts between functions or methods are decided at runtime. When compute() is called by decide() the virtual machine will try to coerce the values for first and second as best as it can. Likewise, when the response from compute() comes back to decide() the virtual machine will try to apply the +100 to it the best way it can. Maybe it will work, maybe it won't. Almost certainly it won't be what the developers intended.
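To make the failure concrete, here is a hypothetical version of the changed compute() next to a caller still written against the old contract. The function bodies and the decide_partially_updated() name are invented for illustration. In Python specifically, neither mismatch is silently coerced; both surface as a TypeError, but only when decide() actually runs.

```python
def compute(flag):
    """Changed version (hypothetical): takes a truthy/falsy value, returns a string."""
    return "yes" if flag else "no"


def decide():
    # Unchanged caller, still written against the old compute(first, second).
    return compute(6, 4) + 100


def decide_partially_updated():
    # Even with the call fixed to pass one argument,
    # the "+ 100" still assumes a numeric return value.
    return compute(6) + 100


for caller in (decide, decide_partially_updated):
    try:
        caller()
    except TypeError as exc:
        # Nothing complains until the call is actually made.
        print(f"{caller.__name__} failed at runtime: {exc}")
```

The first caller fails on the argument count, the second on adding an integer to a string. Neither problem is visible to a test suite in which compute() has been mocked out.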
If we are using a statically typed language such as C/C++/Java/C#, things are a little easier. Our function would probably look something like:
So, when compute() changes to
… the compiler can catch it and complain long before the unit tests are ever run.
What does this mean for us Python/Ruby/PHP developers? It means we have to start moving towards integration tests in addition to our unit tests. Unit tests alone are not enough to let us sleep at night.
As I've said before, integration tests, while great, are a royal pain in the buttocks. They are fragile and difficult to maintain. Do we need to go full-on end-to-end integration testing? No. A normal unit test has a call-out depth of zero. We only test the function in question. But, what we can do is start to write some 1-depth unit tests (or baby integration tests, if you prefer). A 1-depth unit test would allow decide() to call compute() (and get_first/second/third/seed, etc) but no further. All calls beyond the 1-function call depth would be stubbed out as normal.
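A 1-depth test along these lines could be sketched as follows. The names _read_source() and DecideOneDepthTest, and all function bodies, are assumptions made up for the example. The get_*() helpers and compute() run for real; only the call one level further down is stubbed. As before, patching via decide.__globals__ is just a way to keep the sketch in one file.

```python
import unittest
from unittest import mock


def _read_source(name):
    """Depth-2 dependency (hypothetical): imagine a database or network read."""
    raise RuntimeError("not available in tests")


# Depth-1 functions: allowed to run for real in a 1-depth test.
def get_seed():
    return _read_source("seed")


def get_first():
    return _read_source("first")


def get_second():
    return _read_source("second")


def compute(first, second):
    if first < 0 or second < 0:
        raise ValueError("inputs must be non-negative")
    return first * second


def decide():
    return compute(get_first() * get_seed(), get_second()) + 100


class DecideOneDepthTest(unittest.TestCase):
    def test_decide_runs_real_compute(self):
        values = {"seed": 2, "first": 3, "second": 4}
        # Stub only what lies beyond the 1-call depth.
        fake_read = mock.Mock(side_effect=values.__getitem__)
        with mock.patch.dict(decide.__globals__, {"_read_source": fake_read}):
            # (3 * 2) * 4 + 100, computed by the real compute().
            self.assertEqual(decide(), 124)
```

If compute() were later changed to take a single flag or to return a string, this test would go red with a TypeError, which is exactly the breakage the fully mocked test cannot see.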
How do we decide where to place our 1-depth unit tests? I find that file or module boundaries are good places; places where it's not really easy to scan with your eyes. I will also skip third party libraries. I'm really just worried about the code I can immediately control. Or perhaps you might want to write them between highly dependent and equally complex functions?
Do you need to include every external call? Probably not. Make it a judgment call. Is the return type from a call sufficiently complex that it's likely to change? That's a great place to push your call depth down a level. Is one of your function's parameters a dictionary of assorted types? If so, that sounds suitably fragile. Use common sense; there's no one-size-fits-all rule to this.
But, I would strongly encourage you to place these tests in a separate directory away from your strict unit tests. Developers should know what they're getting into when they open those files.
I look forward to hearing your thoughts on this.
7 comments:
I don't really expect this to bite me very often. The return type is part of the contract, and everyone knows it even if the interpreter doesn't know it, so whoever changed compute() would have known that they needed to inspect the code that calls it.
Also, wouldn't the unit tests of compute() have gone red when they made this change, since the unit tests insisted that the return type be numeric? So they would have had to change those unit tests to no longer insist on that, which certainly should have clued them in that what they are doing is changing something that someone else relies on.
But yeah, static type-checking is great when it is great.
I agree. It doesn't happen that often, but when it does the first thing I think is "Why didn't the tests catch that?"
From the feedback I've been getting on this article I think people define the term "unit test" very loosely. They're willing to call outside of the function being tested and don't strictly mock everything external.
I view that as an integration test. A test that spans multiple components. It sounds like you're in the "willing to span components" camp as well by your assumption that the compute tests should have gone red. But they wouldn't have if all external methods are mocked.
And that's not to say your approach is wrong. Perhaps I'm just overly strict on what I view as a unit test? In other languages it would never be a problem.
I think what I'm trying to do is find the boundary between a pure unit test and a full-on integration test (which I fear creating).
In the process it seems I'm learning more about how other people approach "unit" testing.
When you wrote "What does this mean for us Python/Ruby/PHP developers?" I thought that you were going to continue in another direction entirely. "Switch to a statically typed language" would have been another way to go. :-)
Is there a statically typed Python-like language?
@robinbb ... heh, well let's not go crazy now. I thought you would have enjoyed my subtle little dig of
# 100 lines of C
:)
There's a good summary of hybrid languages here: http://en.wikipedia.org/wiki/Type_system#Combinations_of_dynamic_and_static_typing
The contract of compute() includes both its return type(s) and what it does. In a statically typed world, we're used to leaning on the compiler for the former and using unit tests to enforce the latter. But in a dynamically typed world, we need to explicitly test return types in our contract tests as well.
Arguably, integration tests are a crutch for when you're not completely testing the contract of a service/interface that your other unit tests are mocking/stubbing.
(In the real world, the crutch is often useful or necessary, but I still like to lean on it as little as possible.)
@ryan, I agree, but I think you're missing the point of my post.
Testing the inputs and return types on a unit testing basis won't solve the problem when you're using a dynamically typed language. You *have* to do integration testing.
Otherwise, your tests will pass, but the program will fail in the field (since the inputs and/or output no longer match).
This is a good point, Sandy. It's also a problem when you are doing funky RPC casts that no longer send the type of data the end function is expecting (ran into this today in OpenStack).
Another supporting point for static type-checking is that programmers are lazy and don't always go out of their way to inspect for everything that is calling a function. If someone is reviewing code, it's hard to go behind everyone and make sure they do this.