A checklist of things that are worth testing in pretty much any software system.
…trailing his fingers along the edge of an incomprehensible computer bank, he reached out and pressed an invitingly large red button on a nearby panel. The panel lit up with the words “Please do not press this button again.”
~ Douglas Adams
Software systems are complex and as such exhibit non-deterministic behavior. This is true of any non-trivial system. The behaviors of even a small software product are so varied and unpredictable as to defy complete testing.
However, there are five general areas of interest that are always worth examining because they reveal mistakes with such surprising regularity. Specifically, it’s worthwhile to find out how any system handles inputs, math, text, time and system resources.
If, like me, you are a software developer, then it’s commonly accepted that about 50% of your time should be spent testing rather than writing code. If this seems excessive, think about how much time you spent debugging the last time code you wrote was involved in a production issue. Then think about your level of stress.
In the book The Soul of A New Machine, Tracy Kidder makes a comment to the effect that most career programmers are pack-a-day smokers who eventually drop dead of a heart attack. Don’t be that guy. Time spent testing happens during work hours, within the parameters of an estimated project schedule that you (hopefully) got to sign off on in advance. If you follow the “50% of development time is testing” rule then it’s possible that overall in the course of your career you may spend more time testing than you would have debugging production issues if you hadn’t taken the time to test. But even so, you will spend less time being stressed and less time working on the weekend.
And seriously, tested code is better code. Better code means more reliable products. Reliability in turn leads to better customer experience because reliability engenders trust. Trust in turn is the foundation of the relationship that a product team forms with its customers. Tested code means better customer experience which leads to products that compete more effectively in the marketplace. And that means you keep getting paid, which means you get to keep writing code.
- Minimum and maximum input values are always good to test. For instance, if a password field allows 6 to 128 characters, what actually happens when you submit a six-character password? What about a 128-character password?
- Too-high and too-low values. What happens with a 5-character or 129-character password? Alternately, how does the system respond to inputs equal to the minimum and maximum integer values allowed by the implementation language or platform?
- Invalid values such as NaN. Strings instead of integers, arrays instead of strings.
- Inputs that might break the underlying code. For a Web app examples would include SQL injection and cross-site scripting attacks.
- Empty inputs such as a blank user name field or a transaction record in which none of the fields contain any information. For unit tests, submitting zero or an empty string instead of a valid parameter can sometimes yield interesting results.
- Inputs that are too big, perhaps even too big to conveniently fit into available memory…
- Too many inputs or not enough inputs. For a unit test this is simply a matter of creating an incorrect function signature. For a Web app it might involve submitting too many POST parameters or selectively deleting parts of a URL’s query string.
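The input checks above lend themselves to a table of cases. Here is a minimal sketch in Python, assuming a hypothetical validator called is_valid_password that enforces the 6-to-128-character rule from the password example. The function name and the exact list of cases are my own illustration, not part of any real system.

```python
MIN_LEN, MAX_LEN = 6, 128  # limits taken from the password example above


def is_valid_password(candidate):
    """Hypothetical validator: accept only strings of 6 to 128 characters."""
    if not isinstance(candidate, str):
        return False
    return MIN_LEN <= len(candidate) <= MAX_LEN


# (input, expected result) pairs covering boundaries and invalid inputs
BOUNDARY_CASES = [
    ("a" * 6, True),        # exactly the minimum
    ("a" * 128, True),      # exactly the maximum
    ("a" * 5, False),       # one below the minimum
    ("a" * 129, False),     # one above the maximum
    ("", False),            # empty input
    (None, False),          # missing input
    (123456, False),        # integer instead of string
    (["a"] * 6, False),     # array instead of string
    (float("nan"), False),  # NaN
]


def run_boundary_cases():
    """Return every case whose actual result differs from the expected one."""
    return [
        (value, expected)
        for value, expected in BOUNDARY_CASES
        if is_valid_password(value) != expected
    ]


if __name__ == "__main__":
    print(run_boundary_cases() or "all boundary cases behave as expected")
```

Keeping the cases in one list makes it cheap to add the next surprising input when production hands you one.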
- Decimal math is hard. Verify that integers are treated correctly in a floating-point context, and vice versa.
- Repeating decimals. Does the system treat
- Rounding. If you put 3 * 1.005 into the system, do you get
- Type coercion. Is an input of 23 treated differently than an input of "23"? That is: is a numeric input treated differently than a string containing a numeric value?
- Units of measurement. If you specify that the thrusters should fire with a force of 267 Newtons, does the guidance system actually interpret that value as Newtons? Or is it interpreted as 267 pounds-force? (Hat tip to Sebastian Delmont for pointing out that units of measurement are worth testing.)
- Units of currency. There’s going to be a problem if an input of £23.00 is stored in the database as $23.00.
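To make the math items concrete, here is a small Python sketch of the kind of checks I have in mind. It uses the standard decimal module. The round_half_up helper and the Money class are hypothetical names of my own, and the choice of half-up rounding is an assumption; your system may specify a different rule.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, quantum="0.01"):
    """Round a decimal string half-up to the given quantum (two places by default)."""
    return Decimal(value).quantize(Decimal(quantum), rounding=ROUND_HALF_UP)


@dataclass(frozen=True)
class Money:
    """Hypothetical value type that keeps the currency attached to the amount."""
    amount: Decimal
    currency: str

    def __add__(self, other):
        if self.currency != other.currency:
            raise ValueError(f"cannot add {self.currency} to {other.currency}")
        return Money(self.amount + other.amount, self.currency)


# Binary floats cannot represent 0.1, 0.2 or 1.005 exactly.
float_sum = 0.1 + 0.2              # not exactly 0.3
float_rounded = round(1.005, 2)    # 1.005 is stored slightly low, so this rounds down
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_rounded = round_half_up("1.005")

# Type coercion: in Python a number and a numeric string are different values.
same_value = (23 == "23")
```

The Money class is the same idea applied to units: if the unit travels with the number, mixing pounds and dollars fails loudly instead of silently.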
- User names are perhaps the single most interesting class of text that can be submitted as input to a computer program. At a minimum, the system shouldn’t break when names contain apostrophes, hyphens or spaces.
- Passwords are also interesting. Does the maximum password length allow for enough entropy? Are plain-English passphrases disallowed because they don’t contain numbers? Are passwords stored as salted hashes?
- Are Unicode inputs treated differently than ASCII?
- On the Web, are HTML-encoded entities properly converted to characters and vice versa? What about URL-encoded characters?
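For the encoding questions, a round-trip check is a cheap place to start: encode a value, decode it, and confirm you got back what you put in. The sketch below uses Python's standard html and urllib.parse modules. The sample strings are made up for illustration, and a real system would run them through its own encoding layer rather than calling these functions directly.

```python
import html
from urllib.parse import quote, unquote

# Made-up samples mixing apostrophes, markup characters, Unicode and URL syntax
SAMPLES = [
    "O'Brien & Sons",
    "Zoë <admin>",
    "naïve café",
    "日本語",
    'say "hi"',
    "a+b=c?d#e",
]


def html_round_trip(text):
    """Escape text into HTML entities, then convert the entities back."""
    return html.unescape(html.escape(text))


def url_round_trip(text):
    """Percent-encode every reserved character, then decode again."""
    return unquote(quote(text, safe=""))


def failed_round_trips():
    """Return the samples that do not survive either round trip unchanged."""
    return [
        s for s in SAMPLES
        if html_round_trip(s) != s or url_round_trip(s) != s
    ]
```

If failed_round_trips() ever returns anything, some character is being mangled on the way in or on the way out.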
- Time zones are a bitch. Try switching the system time from GMT to EST and see what happens.
- Test on the first and last day of daylight saving time. The system does allow you to mock out the first and last day of daylight saving time, right?
- Like with unit tests, boundary conditions can reveal interestingness. How does the system behave between 00:01? What about during the hour between
- Be very aware of dates and times that are “special” to your system. For instance, if you have a fake user for testing purposes, how does the system respond when it’s that user’s birthday?
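Two of the time checks above can be sketched in a few lines of Python. I am assuming a fixed-offset EST zone (which deliberately ignores daylight saving) and a hypothetical is_birthday rule that observes February 29 birthdays on February 28 in non-leap years. Both are illustrations, not a recommendation for how your system should define them.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

# Fixed UTC-5 offset; deliberately ignores daylight saving time
EST = timezone(timedelta(hours=-5), "EST")


def local_date(instant, tz):
    """Return the calendar date of an aware datetime as seen in the given zone."""
    return instant.astimezone(tz).date()


def is_birthday(birth_date, today):
    """Hypothetical rule: Feb 29 birthdays fall on Feb 28 in non-leap years."""
    born_on_leap_day = (birth_date.month, birth_date.day) == (2, 29)
    if born_on_leap_day and not calendar.isleap(today.year):
        return (today.month, today.day) == (2, 28)
    return (birth_date.month, birth_date.day) == (today.month, today.day)


# One minute past midnight UTC on New Year's Day is still the old year in EST.
just_after_midnight = datetime(2024, 1, 1, 0, 1, tzinfo=timezone.utc)
date_in_utc = local_date(just_after_midnight, timezone.utc)
date_in_est = local_date(just_after_midnight, EST)
```

Note that both functions take the date or instant as a parameter. Nothing here asks the machine what time it is.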
- What if there’s half as much available memory as the system’s designers expect?
- In a distributed system, what happens if half the nodes become unavailable?
- In a service-oriented architecture, what happens if one of the services becomes unavailable? What if it’s only partially available?
- What happens if the network is slow?
- What happens when the database is down?
- What happens when the database is empty?
- What happens if the cache is disabled? What about the CDN?
- What if load on the system spikes to ten times normal?
- What if load on the system drops to zero?
- For long-running operations, what happens if you power cycle the machine before the operation is complete?
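Most of the resource questions above come down to swapping a real dependency for a fake one that misbehaves on purpose. Here is a rough Python sketch, assuming a hypothetical FlakyService stand-in and a hypothetical fetch_with_retry wrapper. Neither name comes from a real library, and the retry-then-default policy is just one possible behavior to test against.

```python
class FlakyService:
    """Hypothetical stand-in for a remote dependency that fails N times first."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("service unavailable")
        return "payload"


def fetch_with_retry(service, max_attempts=3, default=None):
    """Try the service up to max_attempts times, then fall back to a default."""
    for _ in range(max_attempts):
        try:
            return service.fetch()
        except ConnectionError:
            continue
    return default


# Partially available: two failures, then success on the third attempt.
partial = FlakyService(failures_before_success=2)
partial_result = fetch_with_retry(partial)

# Completely down: every attempt fails, so the caller gets the default.
down = FlakyService(failures_before_success=10)
down_result = fetch_with_retry(down, default="stale cached value")
```

The interesting part is not the retry loop. It is that the test can state exactly what the system should do when the dependency is half up or fully down.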
Two digressions: names and time
When it comes to Web apps, there are two areas that seem to cause more pain than any other: people’s names and the time. These elements are both common, essential to the correct functioning of a system, and shockingly difficult to get right.
There are only two hard problems in Computer Science: cache invalidation, naming things and off-by-one-errors.
~ Phil Karlton
My favorite real-world case of a system finding a user’s name “unacceptable” involved a person whose first name was 9. Not “Nine,” mind you but the numeral “9”.
I have a friend named Sonnet (no middle name, no last name) who is unable to complete the registration flow for most Web sites. I myself have occasionally been rejected by a registration form because I have no middle name.
When I used to build internal tools for Etsy I worked with a plethora of excellently-named hackers such as Michelle D’Netto, Kellan Elliot-McCrea and of course Ramin Bozorgzadeh. Ramin quickly became my test user of choice because his surname was almost always too long for the single line allotted to display it, thus breaking the UI. And in at least one case an intranet tool (which had been around for several years at that point) was brought down hard by the introduction of a user name that contained an apostrophe. If you’re not as fortunate in the naming scheme of your alpha testers then take care to construct your fixtures appropriately.
Patrick McKenzie wrote the canonical blog post on the intricacies of testing user names. Highly recommended (and highly amusing) reading.
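If you want a starting point for name fixtures, here is a minimal Python sketch built from the names mentioned above. It pushes them through an in-memory SQLite table using parameterized queries and checks that they come back unchanged. The table layout and the store_and_load helper are assumptions made for the example, not a description of any tool I worked on.

```python
import sqlite3

# Fixture names drawn from the stories above: an apostrophe, a hyphen,
# a long surname, a single-word name and a numeral.
FIXTURE_NAMES = [
    "Michelle D'Netto",
    "Kellan Elliot-McCrea",
    "Ramin Bozorgzadeh",
    "Sonnet",
    "9",
]


def store_and_load(names):
    """Insert names with bound parameters and read them back in insert order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT NOT NULL)")
        # The ? placeholder lets the driver handle quoting, apostrophes included.
        conn.executemany(
            "INSERT INTO users (name) VALUES (?)", [(n,) for n in names]
        )
        rows = conn.execute("SELECT name FROM users ORDER BY rowid").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

A tool that builds its SQL by string concatenation would choke on the first name in the list, which is exactly why that name belongs in the fixtures.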
Never, ever use the system time in tests
King and villein, lad and lass,
All answer to the hourglass.
Tests that use the system time implicitly test the system clock of whatever machine happens to be running the tests. Speaking from long experience, I can attest that this approach can only lead to unreliable tests and extreme debugging pain. If there is a test that must rely on the system clock then it is better to go without implementing the test than it is to expose yourself to the lost time and frustration that running such a test would surely inflict on you and your team.
So, the system you are testing does allow you to mock out all of the necessary times of day and times of year. Right? I hope so because if you’re using the system time in tests, you are doing it completely wrong.
And in my humble opinion, if you’re using the system time in tests because the system you are testing won’t allow you to mock the time, you aren’t the only one doing it wrong — the system itself is fundamentally broken.
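One common way to make time mockable is to pass the clock in as a parameter. The Python sketch below assumes a hypothetical is_expired check with an injectable clock argument. The names are mine, and this is one approach among several, not the only way to keep the system clock out of your tests.

```python
from datetime import datetime, timedelta, timezone


def utc_now():
    """Production clock: the real current time in UTC."""
    return datetime.now(timezone.utc)


def is_expired(issued_at, lifetime, clock=utc_now):
    """Return True once the clock reaches issued_at + lifetime."""
    return clock() >= issued_at + lifetime


def fixed_clock(moment):
    """Test clock: always reports the same moment."""
    return lambda: moment


# In a test, the clock is pinned to exactly the instants we care about.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
lifetime = timedelta(hours=1)
one_second_early = fixed_clock(issued + lifetime - timedelta(seconds=1))
exactly_on_time = fixed_clock(issued + lifetime)

still_valid = not is_expired(issued, lifetime, clock=one_second_early)
now_expired = is_expired(issued, lifetime, clock=exactly_on_time)
```

Production code calls is_expired without the clock argument and gets the real time. Tests pass a fixed clock and get the same answer on every machine, every day of the year.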
Pretty much every bullet point on each checklist above was drawn from my own direct experience with a mistake that was found either in development or in production. The cost of such knowledge was at the very least some frustration for myself and in other cases a lot of stress and lost time for many people on my team. But as my career has progressed and I’ve moved to larger and larger projects, it’s been really useful to have this information in my head. I like to think I design better software because I’ve been burned in the past.
I hope this checklist helps you to find mistakes in the design and implementation of your own systems as well. I hope you at least will find most of them before they’re caught by your customers in production. Because for us as software engineers, a clean, well-functioning system is the basic foundation of the trust that our users put in us and in the products we deliver.