The Commit Mutex Problem

When I worked at Etsy, the problem of many developers committing to trunk at once was colloquially known as the “commit mutex” problem.

A CI system can effectively become “blocked” if too many people commit and trigger too many builds within a relatively short time period. All nodes in the CI cluster are busy but new builds keep getting queued. The resulting “build queue” means that it takes progressively longer to get feedback from CI. The longer it takes to get feedback, the less useful the CI system.
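The way a build queue forms can be sketched with a toy model. The function below is only an illustration, not a description of any real CI system: the name `simulate_build_queue` and all of its parameters are made up for this example, and it assumes evenly spaced commits, one build per commit, and a fixed build time on every node.

```python
def simulate_build_queue(commits_per_hour, build_minutes, nodes, hours):
    """Return the average wait (in minutes) before a build starts.

    Assumptions (illustrative only): commits arrive evenly spaced, every
    commit triggers exactly one build, and every build takes the same
    amount of time on any node.
    """
    interval = 60.0 / commits_per_hour      # minutes between commits
    node_free_at = [0.0] * nodes            # when each node next becomes idle
    waits = []
    arrival = 0.0
    while arrival < hours * 60:
        # Hand the build to whichever node frees up first.
        i = min(range(nodes), key=node_free_at.__getitem__)
        start = max(arrival, node_free_at[i])
        waits.append(start - arrival)
        node_free_at[i] = start + build_minutes
        arrival += interval
    return sum(waits) / len(waits)


# Four nodes running ten-minute builds can absorb 24 builds per hour.
under_capacity = simulate_build_queue(20, build_minutes=10, nodes=4, hours=8)
over_one_hour = simulate_build_queue(30, build_minutes=10, nodes=4, hours=1)
over_eight_hours = simulate_build_queue(30, build_minutes=10, nodes=4, hours=8)
```

In this model nobody waits at 20 commits per hour. At 30 commits per hour the average wait is already above zero after one hour and is larger still after eight, which is the "progressively longer" feedback delay described above.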


Throwing hardware at the problem is a reasonable solution, as long as you can afford it. In fact, almost every time we had to add hardware to the CI cluster, it was because of the commit mutex. There were a few cases where we added a new tool that made the build slower <cough>code coverage</cough>, but the far more common reason for adding hardware was that more people wanted to commit more often, and they all wanted to run all the tests.

Everyone committing small chunks of code all the time is a Very Good Thing. “Everyone commits to trunk every day” is one of the tenets of continuous integration. Yet there’s still an upper limit on how fast and how concurrently everyone can (or should) deploy.

Avoiding the commit mutex and preventing build queues are in my humble opinion two of the hard problems in continuous deployment.

Hiring Programmers in New York, 2013

It seems like every week I have a long conversation with someone on the topic of hiring programmers. tl;dr: there are no programmers for hire in New York in 2013. The job market is such that all decent programmers are gainfully employed. In order to hire developers it is necessary to convince them to leave their current jobs. This will be true for the foreseeable future.

42% of hiring managers cited finding good people as their primary concern, as of 2012.

Personally I have observed that the number of contacts I receive from recruiters has now surpassed the number of contacts I used to receive during the dot-com boom. Which makes sense. In the late 90s the Internet had potential but only a few million people used it. Today the Internet has become an essential household appliance, like cable TV or the washing machine. And in the intervening ten years, the number of qualified software engineers has not risen to meet demand. It may take a generation before we once again see a reasonable ratio of job opportunities to competent software engineers.

There aren’t enough software engineers to fill all the roles

The most poignant question recruiters ask me lately is: do you know someone else who would be interested in this role?

No. The last time I knew a programmer who was unemployed was in 2010. And they were only out of work for three months. And that was basically because they were being super picky about what kind of early-stage startup they wanted to join.

[Figures: Web Developer, Software Engineer, Senior Web Developer, and Senior Software Engineer salaries in New York, 2012]

So no. No I don’t know any programmers who are looking for work.

And I don’t expect to meet an unemployed programmer again. Unless the Internet stops being a thing. Or unless the American school system starts turning out class after class of qualified, passionate journeyman software engineers.

[Figure: Software Engineer I salaries in New York, 2012]

All the good programmers I know are gainfully employed

I get a lot of recruiter contacts and I read every single one. I am always looking for people’s thoughts as to why I would leave my current job and come to their office to work on their project every day for the next 2-3 years. Unfortunately most job postings are little more than a list of keywords and a salary range.

Still, keywords and salary ranges are interesting. I always take note of which languages are trending among tech recruiters. And did you know that a Web Developer makes less money than a Software Engineer?


Since 2005 I’ve used Salary.com (and more recently GlassDoor) to get a general sense of what various job titles were worth around New York. Combining that with the actual salaries for roles I’ve seen mentioned in recruiter emails gives a pretty accurate picture of real salary ranges.

Let’s take it as read that any decent programmer performs that kind of analysis on a regular basis.


My own personal experience and all available data seem to indicate that recruiting must take place under the assumption that all good programmers are already getting paid what they want, and working in an environment that is relatively awesome.

To recruit decent developers it’s always necessary to have a reasonably interesting narrative about solving problems. In the current job market, it also takes competitive pay and a work environment that is conducive to writing a lot of code. Job postings that fail to communicate those qualities will not be successful.

[Figure: Software Engineer salaries at Google New York, 2012]

Salary ranges from Salary.com and GlassDoor were retrieved on January 27 2013. When reviewing these graphs, keep in mind that demand is high and therefore salaries in the median range are unlikely to draw qualified candidates. Anecdotal evidence suggests that engineers are willing to consider a new role at salaries closer to the 80th percentile.

How to write a bug report

In writing bug reports, I have found it helpful to take the attitude that I am engaged in scientific inquiry in the relatively new field that is software. I’ve gotten the best results when I constructed my communications around bugs in a way that lent itself to application of the Scientific Method (or a rough approximation thereof).

Description Of Problem, Steps To Reproduce, and Expected Resolution

Extraordinary claims require extraordinary proof.
        ~ Marcello Truzzi

Generally, a complete bug report consists of no fewer than three sections containing the following information:

  1. Description of the problem, including at least one screenshot.
  2. Steps to reproduce the problem, in the form of a numbered list.
  3. Expected resolution of the problem, described using at least two complete sentences.

In the rest of this post I’m going to delve at length into some of the techniques I’ve learned for effective communication where bugs are concerned. But even if you stop reading right here, you already know the most important technique: write bug reports that consist of a description of the problem, steps to reproduce, and an expected resolution.

A good bug report consists of three parts roughly corresponding to the beginning, middle and end of a narrative. These three parts are: a description of the problem followed by a numbered list of steps to reproduce and finally the desired resolution.
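The three-part structure can be sketched as a small data type with a completeness check. This is only an illustration: the `BugReport` class, its field names, and the rough sentence counter are hypothetical and not part of any real bug tracker.

```python
import re
from dataclasses import dataclass, field


def count_sentences(text):
    """Rough count: runs of text that end in '.', '!' or '?'."""
    return len(re.findall(r"[^.!?]+[.!?]", text))


@dataclass
class BugReport:
    description: str
    steps_to_reproduce: list
    expected_resolution: str
    screenshots: list = field(default_factory=list)

    def problems(self):
        """Return the reasons this report is incomplete (empty list if none)."""
        found = []
        if count_sentences(self.description) < 2:
            found.append("description needs at least two complete sentences")
        if not self.screenshots:
            found.append("at least one screenshot is required")
        if not self.steps_to_reproduce:
            found.append("steps to reproduce are missing")
        if count_sentences(self.expected_resolution) < 2:
            found.append("expected resolution needs at least two complete sentences")
        return found

    def render(self):
        """Format the report as description, numbered steps, resolution."""
        steps = "\n".join(
            f"  {n}. {step}" for n, step in enumerate(self.steps_to_reproduce, 1)
        )
        return (
            f"Description of problem:\n{self.description}\n\n"
            f"Steps to reproduce:\n{steps}\n\n"
            f"Expected resolution:\n{self.expected_resolution}"
        )


complete = BugReport(
    description=(
        "Feature X is redirecting signed-in users to the cart, but it should "
        "be redirecting to the new holiday promotion. This is happening in "
        "IE and Chrome."
    ),
    steps_to_reproduce=["Open a browser", "Sign in", "Click the promotion banner"],
    expected_resolution=(
        "Signed-in users should land on the holiday promotion page. "
        "Signed-out users should keep seeing the current behavior."
    ),
    screenshots=["redirect.png"],  # hypothetical file name
)
vague = BugReport("The menu is wonky.", [], "")
```

Here `complete.problems()` comes back empty, while `vague.problems()` lists all four gaps. The sentence counter is deliberately crude; it is enough to flag a one-line description but would miscount text full of URLs or abbreviations.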

Provide context, and lots of it.

The problem with communication is the illusion that it has taken place.
        ~ George Bernard Shaw

There’s a phenomenon by which software that was working just fine comes to be seen over time as increasingly buggy. In fact this phenomenon occurs so often that it has a name: bit rot. Very rarely, bit rot occurs due to actual changes in the software. But much more often the apparent “rot” is actually due to the software staying the same while the attitudes and habits of its users slowly change and evolve.

As the existence of bit rot demonstrates, it is not only possible but very likely that, given two different people, one of them might perceive a behavior as buggy while the other person might see the system as working just fine. Not to mention the costs associated with other consequences of miscommunication including:

Overcoming such disconnects requires careful statement of the problem at hand, with sensitivity to the different perspectives of one’s audience.

Describe the problem, using at least two complete sentences.

Usually it’s enough to write two complete sentences that describe the problem in detail. The first sentence should generally be of the form “Feature X is exhibiting behavior Y, but it was expected to exhibit behavior Z.” The second sentence should expand upon or clarify the first.

For example: “Feature X is redirecting signed-in users to the cart, but it should be redirecting to the new holiday promotion. This is happening in IE and Chrome.”

When describing thornier or larger issues, it may be useful to use a problem statement template as a starting point.

Always include screenshots that illustrate the problem.

Screenshots of errors in production are a fundamentally important artifact in the reporting of bugs and the specification of improvements to existing features.

Having a visual picture of a problem is of huge benefit to people who weren’t there to witness the issue firsthand. Because of the deeply visual wiring of our brains, there are certain aspects of most problems that are best communicated via a picture. This tends to be true regardless of the level of detail provided in writing.

Thus most bug reports cannot be considered complete without at least one screenshot.

Present the Steps To Reproduce as a numbered list of discrete actions.

More than any other factor, the ability to reproduce a problem will determine whether or not the problem gets fixed. Failure to provide enough information to reproduce an issue is one of the most insidious and frustrating things that can go wrong with a bug report. In the worst case, everyone who tries to reproduce the issue winds up feeling like they’ve wasted time over nothing while the original reporter of the bug feels like the owner of a singing frog.

Numbered lists are the ideal format for explaining how to reproduce a bug because a list makes it immediately obvious how many steps there are as well as where each step leaves off and the next step begins. Confusion about “minor” distinctions like that can lead to major headaches later on.

Here’s an example of a reasonably detailed “steps to reproduce” section:

  1. Open a browser (latest version of any major browser is fine).
  2. Navigate to foo.example.com/my/awesome/thing
  3. Sign in as the user “HomerJ”
  4. Observe that the text of the menu item in the upper right corner runs off the right edge of the screen.

And by way of counterexample, here’s the same set of steps with all the helpful context removed.

  1. Look at the dev site.
  2. The menu is wonky.

In the best case, a developer who reads the second example is confused but figures it out. In the worst case, she notices that the background color of the menu in question is green, recalls (possibly wrongly) that the background color was meant to be blue, and spends the rest of the day “fixing” that before she realizes she was only meant to be moving the menu a couple of pixels to the left. Or she signs in as a user whose preferences specify a totally different menu look-and-feel. Or she goes to the wrong part of the dev site and looks at the wrong menu. More misunderstandings are possible in this situation but their enumeration is left as an exercise for the reader.

Describe the expected resolution, using at least two complete sentences.

This is where it all comes together. The reader has been informed of the nature of the problem and has walked through the steps to reproduce it. What remains now are the tasks of inventing a fix for the issue, applying it to the production application without breaking anything else, and finally verifying that the fix as implemented actually solves the original problem.

It is generally insufficient to describe the expected resolution of a software bug in fewer than two complete sentences. Even a trivial bug represents at best a failure to properly implement the software as specified. Such failures inevitably stem from misunderstandings of one sort or another. So it is reasonable to approach writing up the desired state of a software feature as an exercise in bridging a pre-existing communication gap.

To this end it’s also best to avoid domain-specific jargon and acronyms. Stating the desired outcome in plain English can be more challenging than it might first appear. But it’s worth the effort.

It is essential to provide as much context as possible when describing an apparent bug in a production system. Bugs are often subjective and the environment required to reproduce and investigate bugs may be very complex. Every ounce of detail helps.

Be quantitative. Engineers respond well to hard evidence.

Most bugs, most of the time, are easily nailed given even an incomplete but suggestive characterization of their error conditions at source-code level. When someone among your beta-testers can point out, “there’s a boundary problem in line nnn”, or even just “under conditions X, Y, and Z, this variable rolls over”, a quick look at the offending code often suffices to pin down the exact mode of failure and generate a fix.
        ~ Eric Raymond

Screenshots are the basic currency of good bug reports. But there are numerous other artifacts which could be helpful in diagnosing an issue. These include but are not limited to:

  • Web server or application log messages.
  • Stack traces of all sorts.
  • Network traffic captures, e.g. from tcpdump or Charles.
  • HTTP Archive files captured with Chrome Inspector.
  • Graphs from tools like Graphite, Ganglia, Cacti, Cloudkick, Google Analytics, etc.

In the case of a Web application it is also extremely helpful to provide hyperlinks pointing directly to the page(s) in production where an issue has been observed.

As a general rule, the more hard data provided, the better. Anecdotes about frustrated users can be a useful tool for understanding why fixing a particular bug might be important. But knowing there’s an error at line 142 (for instance) is much more likely to turn out to be key to actually implementing the fix.

Although bugs in the field are upsetting to users, the most direct route to fixing a bug is often through hard evidence and rigorous analysis of historical data.

A bug report is a story about a problem.

A good story has a beginning, middle and end. A good bug report has a description of the problem, steps to reproduce it, and a statement of the expected resolution. These three parts of a bug report map neatly to the narrative arc that underlies all good stories.

description of problem    establishment of the setting and characters
steps to reproduce        rising action
expected resolution       climax and resolution

Stories are powerful tools for communication. The wetware of the human brain is heavily optimized for the processing of information that is structured as a narrative.

In her talk “How To Test The Inside of Your Head,” Liz Keogh discusses confirmation bias, the Face on Mars at Cydonia, the Martian Canals of Percival Lowell and how we as a species are sometimes predisposed to believe obvious but incorrect explanations. I highly recommend watching the whole talk but here is the really pertinent bit.

Writing bug reports as I’ve described helps prevent misunderstandings and wasted effort. But there’s more to it than that. By providing a complete narrative, a good bug report sets the stage for everyone involved to act within a larger, cohesive context. Human beings feel better and make better decisions when they feel like the world around them makes sense. Stories help us to make sense of the world and a good bug report can go a long way toward making a chaotic situation suddenly feel manageable.

Thus the larger benefit of including complete information in a bug report can be to directly reduce stress and therefore indirectly to increase the capability of the team to rapidly resolve production issues.


This article was written as a general guide, with the novice-to-intermediate practitioner in mind. The techniques described above reflect what has worked well in my limited experience. For other perspectives the reader is referred to the many other fine bug writing guides available on the Internet, including:

If you enjoyed this article, you might also appreciate my slide deck on the origin and nature of software bugs: Software Entomology, or Where Do Bugs Come From?

More falsehoods programmers believe about time; “wisdom of the crowd” edition

A couple of days ago I decided to write down some of the things I’ve learned about testing over the course of the last several years. In the course of enumerating the areas that benefit most from testing, I realized that I had accumulated a lot of specific thoughts about how we as programmers tend to abuse the concept of time.

So I wrote another post called “falsehoods programmers believe about time,” where I included 34 misconceptions and mistakes having to do with both calendar and system time. Most of these were drawn from my immediate experience with code that needed to be debugged (both in production and in test).


A great many of the false assumptions listed were my own. Especially “time stamps are always in seconds since epoch” and “the duration of a system clock minute is always pretty close to the duration of a wall clock minute.” Whoa did I ever live to regret my ignorance in those two cases! But hey, apparently I’m not the only one who has run into (or inadvertently caused) such issues. A lot of people responded and shared similar experiences.

UPDATED: I’d like to say a big thanks to all the Redditors who have been discussing this post recently. I have read every single one of your comments :) and learned about fun stuff like the year zero and International Atomic Time.

I’d like to say an enormous thanks to everyone who contributed to the comment threads on BoingBoing and Hacker News as well as Reddit and MetaFilter and to everyone on Twitter who shared their strange experiences with time. In those thousand or so comments and tweets, there were a lot of suggestions as to “falsehoods 35 to 35+n.”

First and foremost was the omission of the false assumption that “time always moves forward,” as pointed out by Technomancy and many others. I enjoyed reading all the suggested falsehoods. When I was done reading, I realized that taken as a whole, these constitute a whole other blog post. So I collected some of your suggested falsehoods into a post and here it is.

@JackieJack tweeted: 'I figured it out. My brain is a computer, running VMware & my spirit is a hapless programmer'

All of these assumptions are wrong

All of these falsehoods were suggested by people who commented on the original post. Each contributor is credited below.

  1. The offsets between two time zones will remain constant.
  2. OK, historical oddities aside, the offsets between two time zones won’t change in the future.
  3. Changes in the offsets between time zones will occur with plenty of advance notice.
  4. Daylight saving time happens at the same time every year.
  5. Daylight saving time happens at the same time in every time zone.
  6. Daylight saving time always adjusts by an hour.
  7. Months have either 28, 29, 30, or 31 days.
  8. The day of the month always advances contiguously from N to either N+1 or 1, with no discontinuities.
  9. There is only one calendar system in use at one time.
  10. There is a leap year every year divisible by 4.
  11. Non leap years will never contain a leap day.
  12. It will be easy to calculate the duration of x number of hours and minutes from a particular point in time.
  13. The same month has the same number of days in it everywhere!
  14. Unix time is completely ignorant about anything except seconds.
  15. Unix time is the number of seconds since Jan 1st 1970.
  16. The day before Saturday is always Friday.
  17. Contiguous timezones are no more than an hour apart. (aka we don’t need to test what happens to the avionics when you fly over the International Date Line)
  18. Two timezones that differ will differ by an integer number of half hours.
  19. Okay, quarter hours.
  20. Okay, seconds, but it will be a consistent difference if we ignore DST.
  21. If you create two date objects right beside each other, they’ll represent the same time. (a fantastic Heisenbug generator)
  22. You can wait for the clock to reach exactly HH:MM:SS by sampling once a second.
  23. If a process runs for n seconds and then terminates, approximately n seconds will have elapsed on the system clock at the time of termination.
  24. Weeks start on Monday.
  25. Days begin in the morning.
  26. Holidays span an integer number of whole days.
  27. The weekend consists of Saturday and Sunday.
  28. It’s possible to establish a total ordering on timestamps that is useful outside your system.
  29. The local time offset (from UTC) will not change during office hours.
  30. Thread.sleep(1000) sleeps for 1000 milliseconds.
  31. Thread.sleep(1000) sleeps for >= 1000 milliseconds.
  32. There are 60 seconds in every minute.
  33. Timestamps always advance monotonically.
  34. GMT and UTC are the same timezone.
  35. Britain uses GMT.
  36. Time always goes forwards.
  37. The difference between the current time and one week from the current time is always 7 * 86400 seconds.
  38. The difference between two timestamps is an accurate measure of the time that elapsed between them.
  39. 24:12:34 is an invalid time
  40. Every integer is a theoretically possible year
  41. If you display a datetime, the displayed time has the same second part as the stored time
  42. Or the same year
  43. But at least the numerical difference between the displayed and stored year will be less than 2
  44. If you have a date in a correct YYYY-MM-DD format, the year consists of four characters
  45. If you merge two dates, by taking the month from the first and the day/year from the second, you get a valid date
  46. But it will work, if both years are leap years
  47. If you take a w3c published algorithm for adding durations to dates, it will work in all cases.
  48. The standard library supports negative years and years above 10000.
  49. Time zones always differ by a whole hour
  50. If you convert a timestamp with millisecond precision to a date time with second precision, you can safely ignore the millisecond fractions
  51. But you can ignore the millisecond fraction, if it is less than 0.5
  52. Two-digit years should be somewhere in the range 1900-2099
  53. If you parse a date time, you can read the numbers character for character, without needing to backtrack
  54. But if you print a date time, you can write the numbers character for character, without needing to backtrack
  55. You will never have to parse a format like ---12Z or P12Y34M56DT78H90M12.345S
  56. There are only 24 time zones
  57. Time zones are always whole hours away from UTC
  58. Daylight Saving Time (DST) starts/ends on the same date everywhere
  59. DST is always an advancement by 1 hour
  60. Reading the client’s clock and comparing to UTC is a good way to determine their timezone
  61. The software stack will/won’t try to automatically adjust for timezone/DST
  62. My software is only used internally/locally, so I don’t have to worry about timezones
  63. My software stack will handle it without me needing to do anything special
  64. I can easily maintain a timezone list myself
  65. All measurements of time on a given clock will occur within the same frame of reference.
  66. The fact that a date-based function works now means it will work on any date.
  67. Years have 365 or 366 days.
  68. Each calendar date is followed by the next in sequence, without skipping.
  69. A given date and/or time unambiguously identifies a unique moment.
  70. Leap years occur every 4 years.
  71. You can determine the time zone from the state/province.
  72. You can determine the time zone from the city/town.
  73. Time passes at the same speed on top of a mountain and at the bottom of a valley.
  74. One hour is as long as the next in all time systems.
  75. You can calculate when leap seconds will be added.
  76. The precision of the data type returned by a getCurrentTime() function is the same as the precision of that function.
  77. Two subsequent calls to a getCurrentTime() function will return distinct results.
  78. The second of two subsequent calls to a getCurrentTime() function will return a larger result.
  79. The software will never run on a space ship that is orbiting a black hole.
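A few of these are easy to demonstrate. The sketch below uses only Python's standard library; the helper names `is_leap` and `merge_dates` are made up for illustration, and Nepal's UTC+05:45 offset is hardcoded rather than looked up in a time zone database.

```python
import calendar
from datetime import date, datetime, timedelta, timezone


def is_leap(year):
    """Gregorian rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Falsehoods 10 and 70: 1900 is divisible by 4 but was not a leap year.
century_check = (is_leap(1900), is_leap(2000))

# Falsehoods 19, 49 and 57: Nepal is 5 hours 45 minutes ahead of UTC,
# which is neither a whole nor a half hour.
NEPAL = timezone(timedelta(hours=5, minutes=45))  # fixed offset, for illustration
noon_utc = datetime(2013, 1, 27, 12, 0, tzinfo=timezone.utc)
noon_in_nepal = noon_utc.astimezone(NEPAL)


def merge_dates(month_from, day_year_from):
    """Falsehoods 45 and 46: take the month from one date, day/year from another.

    Raises ValueError when the combination is not a real date.
    """
    return day_year_from.replace(month=month_from.month)


try:
    # Both 2012 and 2016 are leap years, yet 31 February still does not exist.
    merge_dates(date(2012, 2, 1), date(2016, 1, 31))
    merge_failed = False
except ValueError:
    merge_failed = True
```

`century_check` comes out as `(False, True)`, noon UTC is 17:45 in Nepal, and the merge raises `ValueError`. For real code a maintained time zone database is a safer choice than hardcoded offsets, since falsehoods 1 through 3 apply to any offset written down by hand.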

Seriously? Black holes?

Hey, if Bruce Sterling says that my software needs to be resilient against time distortions caused by black holes, I’m going to take him at his word.


Corrections

Daniel Morrison pointed out that it’s daylight saving time and not daylight savings time. Thanks, I’ve been saying it wrong my whole life!

Rohan Jayasekera suggested a couple of corrections. Thanks!

Credits

Thanks again to everyone who commented. I read everything that you wrote, even if I didn’t wind up including it above.

I made the list above by going through each of the comment threads on Hacker News, Reddit, MetaFilter and BoingBoing (in that order) and finding all(?) of the places where folks had broken out “falsehood 35 to 35 + n” as a bulleted list. I then selectively copied those lists — in the order that I found them. I made small edits for readability and occasionally I paraphrased (this is noted below).

From Hacker News

1-8: JoshTriplett, 9-10: lambda, 11: hc5, 12: chris_wot, 13: einhverfr, 14: masklinn, 15: rmc, 16: jimfl, 17: einhverfr, 18-20: aardvark179, 21-22: bazzargh, 23: my paraphrase of mikeash’s comment, 24-26: edanm, 27: my paraphrase of Mvandenbergh’s comment, 28: derleth, 29: finnw, 30: michaelochurch, 31: cpeterso, 32-33: dfranke, 34: arohner, 35: TazeTSchnitzel, 36: technomancy, 37: sses, 38: DanWaterworth

From Reddit

39-55: benibela2, 56-64: Darkhack, 65: ericanderton, 66: Taladar

From MetaFilter

67-69: Joe in Australia

From BoingBoing

70-75: Paul

From Twitter

76-79: cmchen

An acknowledgment

This post — like the one before it — owes a great debt to Patrick McKenzie’s canonical blog post about user names, which I have read over and over throughout the years and from which I have shamelessly cribbed both concept and style. If you haven’t yet read this gem, go and do so right now. I promise you’ll enjoy it.

Falsehoods programmers believe about time

Over the past couple of years I have spent a lot of time debugging other engineers’ test code. This was interesting work, occasionally frustrating but always informative. One might not immediately think that test code would have bugs, but of course all code has bugs and tests are no exception.

I have repeatedly been confounded to discover just how many mistakes in both test and application code stem from misunderstandings or misconceptions about time. By this I mean both the interesting way in which computers handle time, and the fundamental gotchas inherent in how we humans have constructed our calendar, daylight saving time being just the tip of the iceberg.

In fact I have seen so many of these misconceptions crop up in other people’s (and my own) programs that I thought it would be worthwhile to collect a list of the more common problems here.

All of these assumptions are wrong

  1. There are always 24 hours in a day.
  2. Months have either 30 or 31 days.
  3. Years have 365 days.
  4. February is always 28 days long.
  5. Any 24-hour period will always begin and end in the same day (or week, or month).
  6. A week always begins and ends in the same month.
  7. A week (or a month) always begins and ends in the same year.
  8. The machine that a program runs on will always be in the GMT time zone.
  9. Ok, that’s not true. But at least the time zone in which a program has to run will never change.
  10. Well, surely there will never be a change to the time zone in which a program has to run in production.
  11. The system clock will always be set to the correct local time.
  12. The system clock will always be set to a time that is not wildly different from the correct local time.
  13. If the system clock is incorrect, it will at least always be off by a consistent number of seconds.
  14. The server clock and the client clock will always be set to the same time.
  15. The server clock and the client clock will always be set to around the same time.
  16. Ok, but the time on the server clock and time on the client clock would never be different by a matter of decades.
  17. If the server clock and the client clock are not in synch, they will at least always be out of synch by a consistent number of seconds.
  18. The server clock and the client clock will use the same time zone.
  19. The system clock will never be set to a time that is in the distant past or the far future.
  20. Time has no beginning and no end.
  21. One minute on the system clock has exactly the same duration as one minute on any other clock
  22. Ok, but the duration of one minute on the system clock will be pretty close to the duration of one minute on most other clocks.
  23. Fine, but the duration of one minute on the system clock would never be more than an hour.
  24. You can’t be serious.
  25. The smallest unit of time is one second.
  26. Ok, one millisecond.
  27. It will never be necessary to set the system time to any value other than the correct local time.
  28. Ok, testing might require setting the system time to a value other than the correct local time but it will never be necessary to do so in production.
  29. Time stamps will always be specified in a commonly-understood format like 1339972628 or 133997262837.
  30. Time stamps will always be specified in the same format.
  31. Time stamps will always have the same level of precision.
  32. A time stamp of sufficient precision can safely be considered unique.
  33. A timestamp represents the time that an event actually occurred.
  34. Human-readable dates can be specified in universally understood formats such as 05/07/11.
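Items 29 and 34 can be shown in a few lines. This sketch uses only the standard library; the variable names are arbitrary, and the choice of which interpretations to try is an assumption made for illustration.

```python
from datetime import datetime, timezone

raw = 1339972628  # the first example value from item 29

# The same integer read as seconds and as milliseconds since the epoch.
as_seconds = datetime.fromtimestamp(raw, tz=timezone.utc)
as_millis = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)

# The date string from item 34, parsed under three common conventions.
text = "05/07/11"
readings = {
    "month/day/year": datetime.strptime(text, "%m/%d/%y").date(),
    "day/month/year": datetime.strptime(text, "%d/%m/%y").date(),
    "year/month/day": datetime.strptime(text, "%y/%m/%d").date(),
}
```

Read as seconds, the value is 17 June 2012; read as milliseconds, it lands in January 1970. The date string parses without error under all three conventions and yields three different days: 7 May 2011, 5 July 2011 and 11 July 2005. Nothing in the data itself says which reading was intended.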

UPDATED: There’s more! Read the rest of the falsehoods…


That thing about a minute being longer than an hour was a joke, right?

No.

There was a fascinating bug in older versions of KVM on CentOS. Specifically, a KVM virtual machine had no awareness that it was not running on physical hardware. This meant that if the host OS put the VM into a suspended state, the virtualized system clock would retain the time that it had had when it was suspended. E.g. if the VM was suspended at 13:00 and then brought back to an active state two hours later (at 15:00), the system clock on the VM would still reflect a local time of 13:00. The result was that every time a KVM VM went idle, the host OS would put it into a suspended state and the VM’s system clock would start to drift away from reality, sometimes by a large margin depending on how long the VM had remained idle.

There was a cron job that could be installed to keep the virtualized system clock in line with the host OS’s hardware clock. But it was easy to forget to do this on new VMs and failure to do so led to much hilarity. The bug has been fixed in more recent versions.
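One consequence of a clock that gets corrected in jumps is that durations computed from wall-clock timestamps can be wildly wrong. The toy model below is a simulation, not a reproduction of the KVM bug: `SimulatedHost` and its methods are hypothetical, and the two-hour correction is borrowed from the example above.

```python
class SimulatedHost:
    """Toy machine with a wall clock and a monotonic clock."""

    def __init__(self, wall_start):
        self.wall = wall_start   # seconds since the epoch; can be stepped
        self.monotonic = 0.0     # only advances as the machine runs

    def run_for(self, seconds):
        # Both clocks advance while the machine is running.
        self.wall += seconds
        self.monotonic += seconds

    def resync_wall_clock(self, correction):
        # A sync job steps the wall clock and leaves the monotonic clock alone.
        self.wall += correction


host = SimulatedHost(wall_start=1339972628)
wall_before, mono_before = host.wall, host.monotonic

host.run_for(5)                      # the task itself takes five seconds
host.resync_wall_clock(2 * 60 * 60)  # the sync job fixes a two-hour lag mid-task

wall_elapsed = host.wall - wall_before       # 7205 seconds
mono_elapsed = host.monotonic - mono_before  # 5 seconds
```

In real Python code, `time.monotonic()` plays the role of the second clock: it is not affected by system clock updates, so it is the usual choice for timing how long something took. Whether a monotonic clock counts time spent suspended depends on the platform, so it is not a cure for the suspend problem itself.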

An acknowledgment

This post owes a great debt to Patrick McKenzie’s canonical blog post about user names, which I have read over and over throughout the years and from which I have shamelessly cribbed both concept and style. If you haven’t yet read this gem, go and do so right now. I promise you’ll enjoy it.

UPDATED: Thanks for your comments and anecdotes!

I’d like to say thanks to everyone who contributed to the comment threads about this post on BoingBoing and Hacker News as well as Reddit and MetaFilter and to everyone on Twitter who shared their strange experiences with time.

You have provided so many interesting edge cases I had forgotten about as well as many oddities of which I wasn’t aware. For instance: in the Jewish calendar, days start at sunset not midnight. And as Bruce Sterling pointed out, I didn’t even think about what happens when the computer is on a spaceship orbiting a black hole.

There’s more than enough material for another (longer!) post about this topic. But first I’ll have to finish reading all >500 of your comments as well as the wealth of awesome research material that has been linked.

I’ve written another post collecting the many other falsehoods that were suggested by your comments at BoingBoing and Hacker News as well as Reddit and MetaFilter and also Twitter.

Thanks again for your enthusiasm and for the mind-boggling level of detail. I learned a lot about time in the last 24 hours. Fellow nerds, I salute you.

Things you should test

A checklist of things that are worth testing in pretty much any software system.

…trailing his fingers along the edge of an incomprehensible computer bank, he reached out and pressed an invitingly large red button on a nearby panel. The panel lit up with the words “Please do not press this button again.”
        ~ Douglas Adams

Software systems are complex and as such exhibit non-deterministic behavior. This is true of any non-trivial system. The behaviors of even a small software product are so varied and unpredictable as to defy complete testing.

However, there are five general areas of interest that are always worth examining because they reveal mistakes with such surprising regularity. Specifically, it’s worthwhile to find out how any system handles inputs, math, text, time and system resources.

If like me you are a software developer then it’s commonly accepted that about 50% of your time should be spent in testing rather than writing code. If this seems excessive think about how much time you spent in debugging the last time code you wrote was involved in a production issue. Then think about your level of stress.

In the book The Soul of a New Machine, Tracy Kidder makes a comment to the effect that most career programmers are pack-a-day smokers who eventually drop dead of a heart attack. Don’t be that guy. Time spent testing happens during work hours, within the parameters of an estimated project schedule that you (hopefully) got to sign off on in advance. If you follow the “50% of development time is testing” rule then it’s possible that overall in the course of your career you may spend more time testing than you would have debugging production issues if you hadn’t taken the time to test. But even so, you will spend less time being stressed and less time working on the weekend.

And seriously, tested code is better code. Better code means more reliable products. Reliability in turn leads to better customer experience because reliability engenders trust. Trust in turn is the foundation of the relationship that a product team forms with its customers. Tested code means better customer experience which leads to products that compete more effectively in the marketplace. And that means you keep getting paid, which means you get to keep writing code.

Peter Griffin pushes the forbidden button

Inputs

  1. Minimum and maximum input values are always good to test. For instance, if a password field allows 6 to 128 characters, what actually happens when you submit a six-character password? What about a 128-character password?
  2. Too-high and too-low values. What happens with a 5-character or 129-character password? Alternately, how does the system respond to inputs equal to the minimum and maximum integer values allowed by the implementation language or platform?
  3. Invalid values such as null and NaN. Strings instead of integers, arrays instead of strings.
  4. Inputs that might break the underlying code. For a Web app examples would include SQL injection and cross-site scripting attacks.
  5. Empty inputs such as a blank user name field or a transaction record in which none of the fields contain any information. For unit tests, submitting zero or an empty string instead of a valid parameter can sometimes yield interesting results.
  6. Inputs that are too big, perhaps even too big to conveniently fit into available memory.
  7. Too many inputs or not enough inputs. For a unit test this is simply a matter of creating an incorrect function signature. For a Web app it might involve submitting too many POST parameters or selectively deleting parts of a URL’s query string.
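Here is a minimal sketch of how the first few items on this list might look as a table of boundary cases. It assumes a hypothetical `is_valid_password` function that enforces the 6-to-128-character rule from the example above. The function name, the constants and the decision to reject non-string input are all assumptions made for illustration.

```python
MIN_LENGTH = 6
MAX_LENGTH = 128


def is_valid_password(candidate):
    """Hypothetical validator: accept only strings of 6 to 128 characters."""
    if not isinstance(candidate, str):
        return False  # None, numbers and lists are rejected rather than crashing
    return MIN_LENGTH <= len(candidate) <= MAX_LENGTH


# Each case pairs an input with the answer we expect at or near a boundary.
BOUNDARY_CASES = [
    ("a" * 6, True),     # exactly the minimum
    ("a" * 128, True),   # exactly the maximum
    ("a" * 5, False),    # one below the minimum
    ("a" * 129, False),  # one above the maximum
    ("", False),         # empty input
    (None, False),       # null instead of a string
    (123456, False),     # an integer instead of a string
]


def run_boundary_cases():
    """Return every input whose result differs from the expectation."""
    return [value for value, expected in BOUNDARY_CASES
            if is_valid_password(value) is not expected]
```

An empty list from `run_boundary_cases()` means every boundary behaved as expected. Keeping the cases in a table makes it cheap to add the next odd input you run into.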

Math

  1. Decimal math is hard. Verify that integers are treated correctly in a floating-point context, and vice versa.
  2. Repeating decimals. Does the system treat 0.666 differently from 0.665?
  3. Rounding. If you put 3 * 1.005 into the system, do you get 3.015 back out? (This is not the default behavior in JavaScript, for instance.)
  4. Type coercion. Is an input of 23 treated differently than an input of "23"? That is: is a numeric input treated differently than a string containing a numeric value?
  5. Units of measurement. If you specify that the thrusters should fire with a force of 267 Newtons, does the guidance system actually interpret that value as Newtons? Or is it interpreted as 267 pounds of force? (Hat tip to Sebastian Delmont for pointing out that units of measurement are worth testing.)
  6. Units of currency. There’s going to be a problem if an input of £23.00 is stored in the database as $23.00.
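The rounding item is easy to demonstrate. The sketch below uses Python rather than JavaScript, but both use the same binary floating-point representation, so the plain multiplication misses 3.015 in the same way. The variable names are illustrative only.

```python
from decimal import Decimal

# Binary floating point cannot represent 1.005 exactly, so the product
# lands just below 3.015 instead of on it.
float_result = 3 * 1.005

# A Decimal built from a string keeps the digits exactly as written.
decimal_result = 3 * Decimal("1.005")

# The comparison a naive test would make, and the one that holds.
float_matches = (float_result == 3.015)
decimal_matches = (decimal_result == Decimal("3.015"))
```

Here `float_matches` is false and `decimal_matches` is true. A test that compares floats for exact equality will fail for reasons that have nothing to do with your code. Either compare within a tolerance or do the arithmetic in a decimal type.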

Text

  1. User names are perhaps the single most interesting class of text that can be submitted as input to a computer program. At a minimum, the system shouldn’t break when names contain apostrophes, hyphens or spaces.
  2. Passwords are also interesting. Does the maximum password length allow for enough entropy? Are plain-English passphrases disallowed because they don’t contain numbers? Are passwords stored as salted hashes?
  3. Are Unicode inputs treated differently than ASCII?
  4. On the Web, are HTML-encoded entities properly converted to characters and vice versa? What about URL-encoded characters?
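One cheap check for the last two items is a round trip: encode a string, decode it again and confirm nothing changed. This sketch uses only the Python standard library. The sample strings are made up for illustration.

```python
import html
from urllib.parse import quote, unquote

# Made-up samples mixing markup characters, an apostrophe and non-ASCII text.
SAMPLES = [
    "Tom & Jerry's <b>café</b>",
    "naïve=yes&next=/a b/",
    "日本語のテキスト",
]


def html_round_trip(text):
    """Escape to HTML entities, then convert the entities back to characters."""
    return html.unescape(html.escape(text))


def url_round_trip(text):
    """Percent-encode as UTF-8, then decode again."""
    return unquote(quote(text))


def broken_round_trips():
    """Return the samples that do not survive either round trip."""
    return [s for s in SAMPLES
            if html_round_trip(s) != s or url_round_trip(s) != s]
```

A round trip only proves that encoding and decoding agree with each other. It is still worth asserting separately that the escaped form contains no raw `<` or `&` where markup could sneak through.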

Time

  1. Time zones are a bitch. Try switching the system time from GMT to EST and see what happens.
  2. Test on the first and last day of daylight saving time. The system does allow you to mock out the first and last day of daylight saving time, right?
  3. Like with unit tests, boundary conditions can reveal interestingness. How does the system behave between 23:59 and 00:01? What about during the hour between 00:00 and 00:59?
  4. Be very aware of dates and times that are “special” to your system. For instance, if you have a fake user for testing purposes, how does the system respond when it’s that user’s birthday?
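To see why switching from GMT to EST matters, consider an event recorded just after midnight. The sketch below uses a fixed UTC-5 offset as a stand-in for EST. That is an assumption made for simplicity, and it deliberately ignores daylight saving time. The event itself is invented.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Fixed offset standing in for EST; a real zone would also handle daylight saving.
EST = timezone(timedelta(hours=-5), "EST")

# An invented event recorded half an hour after midnight UTC on New Year's Day.
event_utc = datetime(2013, 1, 1, 0, 30, tzinfo=UTC)

# The same instant, viewed from a wall clock five hours behind.
event_est = event_utc.astimezone(EST)


def same_calendar_day(a, b):
    """Compare the local calendar dates of two timezone-aware datetimes."""
    return a.date() == b.date()
```

The two values compare equal as instants, but `same_calendar_day(event_utc, event_est)` is false: in EST the event happened on New Year's Eve of the previous year. Any report that groups by day, month or year will disagree depending on which zone it runs in.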

System resources

  1. What if there’s half as much available memory as the system’s designers expect?
  2. In a distributed system, what happens if half the nodes become unavailable?
  3. In a service-oriented architecture, what happens if one of the services becomes unavailable? What if it’s only partially available?
  4. What happens if the network is slow?
  5. What happens when the database is down?
  6. What happens when the database is empty?
  7. What happens if the cache is disabled? What about the CDN?
  8. What if load on the system spikes to ten times normal?
  9. What if load on the system drops to zero?
  10. For long-running operations, what happens if you power cycle the machine before the operation is complete?
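Several of these questions, such as the database being down or empty, can be asked in a unit test if the dependency is passed in rather than hard-wired. Below is a rough sketch with a fake database that can be switched off. Every name in it (`FakeDatabase`, `DatabaseDown`, `load_profile`) is hypothetical, and the "degrade gracefully" behavior is just one possible design.

```python
class DatabaseDown(Exception):
    """Raised by the fake database to simulate an outage."""


class FakeDatabase:
    """Test double: keeps rows in a dict and can be made unavailable."""

    def __init__(self, rows=None, available=True):
        self.rows = dict(rows or {})
        self.available = available

    def get(self, key):
        if not self.available:
            raise DatabaseDown("simulated outage")
        return self.rows.get(key)


def load_profile(db, user_id):
    """Hypothetical handler that reports a status instead of propagating errors."""
    try:
        row = db.get(user_id)
    except DatabaseDown:
        return {"status": "unavailable"}
    if row is None:
        return {"status": "not_found"}
    return {"status": "ok", "name": row}


# Three of the scenarios from the list: healthy, empty and down.
healthy = load_profile(FakeDatabase({"u1": "Ramin"}), "u1")
empty = load_profile(FakeDatabase(), "u1")
down = load_profile(FakeDatabase(available=False), "u1")
```

A fake like this says nothing about slow networks or power cycles. Those need an integration environment, but the fake does cover the cheap cases before anything is deployed.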

A flashing red sign reads: COMPUTER MALFUNCTION

Two digressions: names and time

When it comes to Web apps, there are two areas that seem to cause more pain than any other: people’s names and the time. These elements are both common, essential to the correct functioning of a system, and shockingly difficult to get right.

People’s names

There are only two hard problems in Computer Science: cache invalidation, naming things and off-by-one-errors.
        ~ Phil Karlton

My favorite real-world case of a system finding a user’s name “unacceptable” involved a person whose first name was 9. Not “Nine,” mind you but the numeral “9”.

I have a friend named Sonnet (no middle name, no last name) who is unable to complete the registration flow for most Web sites. I myself have occasionally been rejected by a registration form because I have no middle name.

When I used to build internal tools for Etsy I worked with a plethora of excellently-named hackers such as Michelle D’Netto, Kellan Elliot-McCrea and of course Ramin Bozorgzadeh. Ramin quickly became my test user of choice because his surname was almost always too long for the single line allotted to display it, thus breaking the UI. And in at least one case an intranet tool (which had been around for several years at that point) was brought down hard by the introduction of a user name that contained an apostrophe. If you’re not as fortunate in the naming scheme of your alpha testers then take care to construct your fixtures appropriately.
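As a starting point for such fixtures, here is a small sketch that stores the kinds of names mentioned above and reads them back. It uses an in-memory SQLite database and parameterized queries. The table layout and the function name are assumptions for illustration, not a description of any real tool.

```python
import sqlite3

# Fixture names drawn from the cases described above.
NAME_FIXTURES = [
    "Michelle D'Netto",      # apostrophe
    "Kellan Elliot-McCrea",  # hyphen
    "Ramin Bozorgzadeh",     # long surname
    "Sonnet",                # a single name, nothing else
    "9",                     # a numeral as a first name
]


def store_and_reload(names):
    """Insert names with bound parameters, then read them back in order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT NOT NULL)")
        # The ? placeholder lets the driver handle quoting, so an apostrophe
        # in a name cannot terminate the SQL string early.
        conn.executemany("INSERT INTO users (name) VALUES (?)",
                         [(name,) for name in names])
        rows = conn.execute("SELECT name FROM users ORDER BY rowid").fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

If `store_and_reload(NAME_FIXTURES)` returns the list unchanged, the storage layer at least is not mangling these names. The display layer, where the long surnames do their damage, needs its own check.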

Patrick McKenzie wrote the canonical blog post on the intricacies of testing user names. Highly recommended (and highly amusing) reading.

Never, ever use the system time in tests

King and villein, lad and lass,
All answer to the hourglass.
        ~ trad.

Tests that use the system time implicitly test the system clock of whatever machine happens to be running the tests. Speaking from long experience, I can attest that this approach can only lead to unreliable tests and extreme debugging pain. If there is a test that must rely on the system clock then it is better to go without implementing the test than it is to expose yourself to the lost time and frustration that running such a test would surely inflict on you and your team.

So, the system you are testing does allow you to mock out all of the necessary times of day and times of year. Right? I hope so because if you’re using the system time in tests, you are doing it completely wrong.

And in my humble opinion, if you’re using the system time in tests because the system you are testing won’t allow you to mock the time, you aren’t the only one doing it wrong — the system itself is fundamentally broken.
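One common way to make time mockable is to pass the current time in as a parameter and fall back to the system clock only when nothing is supplied. The sketch below applies that idea to the birthday case from the checklist. The `greeting` function and the dates are hypothetical.

```python
from datetime import date, datetime


def greeting(user_birthday, now=None):
    """Hypothetical feature. `now` is injectable; production callers omit it."""
    current = now if now is not None else datetime.now()
    if (current.month, current.day) == (user_birthday.month, user_birthday.day):
        return "Happy birthday!"
    return "Hello."


# An invented test user and two fixed instants on either side of midnight.
BIRTHDAY = date(1980, 3, 10)
one_minute_before = datetime(2013, 3, 9, 23, 59)
one_minute_after = datetime(2013, 3, 10, 0, 1)

before = greeting(BIRTHDAY, now=one_minute_before)
after = greeting(BIRTHDAY, now=one_minute_after)
```

Because the test supplies `now`, it gives the same answer on any machine, in any time zone, on any day of the year. Only the default branch ever touches the system clock.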

Gromit the dog sits at a control panel filled with blinking lights and buttons.

Good hunting

Pretty much every bullet point on each checklist above was drawn from my own direct experience with a mistake that was found either in development or in production. The cost of such knowledge was at the very least some frustration for myself and in other cases a lot of stress and lost time for many people on my team. But as my career has progressed and I’ve moved to larger and larger projects, it’s been really useful to have this information in my head. I like to think I design better software because I’ve been burned in the past.

I hope this checklist helps you to find mistakes in the design and implementation of your own systems as well. I hope you at least will find most of them before they’re caught by your customers in production. Because for us as software engineers, a clean, well-functioning system is the basic foundation of the trust that our users put in us and in the products we deliver.

Tags: testing

Rapid Infrastructure: Tools You Can Use Even If You Only Have One Line Of Code

There’s an old joke that goes something like this:

Proposition one: all programs have bugs.

Proposition two: all programs can be shortened by one line.

Conclusion: every program can be reduced to one line of buggy code.

Corny, I know ;-)

But hey, there is a point in the life of every piece of software when the entire system consists of one line of code. That time is at the very beginning of a project, when one has just typed the first bit of code into one’s text editor.

Spider - 1

You might be wondering: why are we talking about projects that contain only one line of code? How could automation possibly help there? And wouldn’t it be overkill to set up tooling to support a trivially small, new project?

I’ll explain how two automated tools can help you maintain a project, even at the point where you’ve just typed your first line of code. These tools are code review and static analysis.

Start By Establishing A Culture Of Code Review

Recently I sat and talked with Erik Kastner about his thoughts on code review and testing. Erik’s a thoughtful, experienced guy and after working with him for a couple of years I have found that his opinions have become very important to me. Erik says great stuff like “code review is just reading someone else’s code and understanding it before it ships.”

That’s one of the things I like about Kastner — he does more than just propound his methodology. When Erik talks about how he thinks software engineering should or shouldn’t work, he always qualifies his statements. And there can be some pretty surprising insights wrapped up in those qualifications.

So let’s look at this statement again:

Code review means reading and understanding someone else’s code.

This implies that if you and I are working on a project together, you’re going to read my diffs before I commit or merge them into trunk. We might be doing the review in FishEye or within a GitHub pull request, or you might just be looking at my commits in our SCM.

But I expect you to do more than just read my changesets. I should also expect you to fully comprehend how the diffs I’m showing you are going to change the behavior of the system. Like many of Kastner’s qualifications to software methodologies, this is a subtle but large distinction.

You need to clone out this shadow here

Ideally every changeset I write gets reviewed by someone else before it goes to production. This is the practice at a lot of large, successful organizations like Google and the JPL. Having a human review every changeset does impose an upper limit on how fast you can deploy code to production. For a new, relatively small project, you might feel that reviewing every changeset is too heavyweight. And you might be right. But keep in mind that it’s a lot easier to put this kind of process in place at the beginning than it is to wait until your application is mature — and you’re definitely going to want a code review process in place at that point.

Static Analysis

Now consider the case where I have written the following one line of code and I ask you to review it. This is a trivial case of course, but I hope it’s still illustrative of why you should spend the time to set up these tools before you write a single line of code. Anyway, here’s my changeset, would you review it before I push it to prod?

<?php

echo "hello world''

Did you catch both of the errors in my code? Probably you did. And I’m sure you noticed the missing semicolon immediately. But did it take you just a moment longer to realize there was something wrong with that closing double quote? If it did, then you were experiencing a trivial increase in cognitive load.

As our application gets larger and my changesets grow in complexity, you’re going to have to endure a greater and greater amount of cognitive load every time you review and debug one of my changesets. That’s not great. You’re a good hacker and our project is going to win because you’re using your whole brain to think about solving hard problems. It’s too bad that instead our new code review process is causing you to fill your brain with thoughts about whether or not I got my punctuation right.

Die perfekte Welle

Besides, checking other people’s syntax is boring drudge work and drudgery is evil. So it’s actually really important that we take a little bit of time at the beginning of our project to make sure that code review imposes as little unnecessary cognitive cost as possible.

Both of the errors I made above actually cause the PHP interpreter to barf. So it follows that there must be a way to catch those errors programmatically. And of course there are several open source tools to help us do exactly that. But the simplest option is to just use the PHP interpreter’s built-in syntax checker:

php -l index.php

Parse error: syntax error, unexpected $end, expecting T_VARIABLE or T_DOLLAR_OPEN_CURLY_BRACES or T_CURLY_OPEN in index.php on line 3
Errors parsing index.php

Great. Just by running php -l on my code before you review it, you can now avoid winding up as a human syntax-checker. This saves us both time and frustration as we continue to work on our project. Even better, I could run the syntax check on my own code before I send it over to you for review.

It’s worthwhile for us to informally agree that we won’t bother reviewing any code that doesn’t pass a syntax check.

Is It Worth Automating Static Analysis At This Point?

So we’ve made an agreement to always run static analysis on our code before asking someone else to review it. This implies that any code we deploy to production will have been run through static analysis at least once. Even though our project and our team are small, we’ve managed to put in place some important cornerstones on which we can build a healthy engineering culture.

We could codify our new agreement by writing it down in our wiki (if we have one). Another way to codify our contract would be to set up a CI server and configure it to fail the build if anyone commits a file that doesn’t pass the syntax check. Yet another way to do this would be for each of us to run watchr on our laptops, and configure it to throw up a Growl alert whenever the syntax check fails. We can pick one of these automated solutions, spend a couple of hours setting it up and get its benefit throughout the life of our project. So that seems like a worthwhile thing to do, even though so far we only have one line of code.
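For a sense of how little the CI option involves, here is a rough sketch of a gate script, written in Python for illustration. It assumes that `php` is on the PATH and that `php -l` exits with a non-zero status when a file fails to parse. The function names and the idea of passing in the checker command are my own assumptions, not features of any particular CI server.

```python
import subprocess

# The PHP interpreter's built-in syntax check, as used above.
PHP_LINT = ["php", "-l"]


def files_failing_check(paths, checker=PHP_LINT):
    """Run `checker <path>` for each path; return the paths that exit non-zero."""
    failures = []
    for path in paths:
        result = subprocess.run(list(checker) + [path],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures.append(path)
    return failures


def main(paths):
    """Return an exit status for a CI step: 1 if any file fails, else 0."""
    failures = files_failing_check(paths)
    for path in failures:
        print(f"syntax check failed: {path}")
    return 1 if failures else 0
```

A CI job or pre-commit hook would call `main` with the list of changed files and fail the build on a non-zero result. Taking the checker as a parameter lets the same gate wrap any other command-line linter later on.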

Intrinsic and Incidental Complexity

There is a very convincing argument to be made that feature-complete software is ultimately more valuable than a readable codebase. That there is more value in what an application does than in how it is put together. Perhaps it is fair to consider architecture, including the comprehensibility of a program as source code, to be of some benefit but still orthogonal to a program’s business value? After all, it’s incontrovertibly true that the most important aspect of developing software is shipping it to the customer. So then isn’t it the case that the real value is all in what the software does?

The Morning Line Istanbul

Why is so much value placed on delivering readable code?

Greg Horvath recently showed me a paper on JPL coding standards (PDF) that encouraged eschewing some pretty basic strategies (recursion, dynamic memory allocation) on the grounds that they lead to code that is somewhat more difficult to run through static analysis. The takeaway for me was that NASA cares a lot about being able to tell what the code is intended to do, without actually running it.

So while shipping feature-complete software is obviously important, it seems that shipping readable software is really important, too. Why? I think it’s because comprehensibility of components contributes to the resiliency of a system overall. A prototype that works is good. A prototype that can evolve rapidly is even better.

The Morning Line

Complexity is an inherent property of software. Comprehensibility is not.

It’s interesting to note that Dijkstra espoused a readable-code-over-feature-completeness approach to software architecture. He contrasted the two mindsets as “postulational” and “operational,” respectively. Postulational meaning one can postulate about what a program does just by reading the source. Operational meaning that one bases one’s expectations about what a program will do on (educated) assumptions about what operations will be carried out when that program is executed.

Dijkstra also once pronounced that software is the most complex product ever produced by human effort. When delivering any non-trivial software application, some degree of complexity is intrinsic to the task. And obviously, concepts that are complex do not lend themselves to implementations that are easily readable. So maintaining comprehensibility in the components of a complex system turns out to be a rather difficult problem.

Incidental complexity, once identified, can eventually be factored out, leading to code that is more readable overall. Therefore it is valuable to take the (sometimes considerable amount of) time to distinguish between intrinsic and incidental complexity, and to continually either avoid or remove the incidental complexities that over time can make a codebase harder and harder to read.

The Evening Line

The most interesting part of delivering software is watching what users do with it.

In order to provide a satisfying user experience over the long term, any software needs to be able to adapt iteratively and rapidly to the unpredictable needs and desires of its user base. Resilient systems are best able to adapt, because successful adaptation requires constant readjustment in the face of new circumstances. So there is considerable value in preserving the readability of source code throughout the life of a system.