The Commit Mutex Problem

When I worked at Etsy, the problem of many developers committing to trunk at once was colloquially known as the “commit mutex” problem.

A CI system can effectively become “blocked” if too many people commit and trigger too many builds within a relatively short time period. All nodes in the CI cluster are busy, but new builds keep getting queued. The resulting “build queue” means that it takes progressively longer to get feedback from CI. The longer it takes to get feedback, the less useful the CI system becomes.
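Here is a toy model in Python that makes the arithmetic of a build queue concrete. It is a sketch under simplifying assumptions, not a description of how Etsy's CI actually scheduled work. Every build is assumed to take the same time, and builds are picked up first-come, first-served by whichever node frees up first. `build_wait_times` is a hypothetical name.

```python
import heapq


def build_wait_times(arrivals, build_minutes, nodes):
    """Minutes each build waits in the queue before a CI node picks it up.

    arrivals: sorted commit times in minutes, one build per commit
    build_minutes: how long every build occupies a node (assumed constant)
    nodes: number of CI nodes in the cluster
    """
    free_at = [0] * nodes  # min-heap: the time at which each node next becomes free
    waits = []
    for t in arrivals:
        earliest = heapq.heappop(free_at)  # first node to become available
        start = max(t, earliest)           # FIFO: wait for it if it is still busy
        waits.append(start - t)
        heapq.heappush(free_at, start + build_minutes)
    return waits


# A commit every 4 minutes and 20-minute builds: 4 nodes can only finish
# one build every 5 minutes on average, so the wait keeps growing.
commits = list(range(0, 48, 4))
print(build_wait_times(commits, build_minutes=20, nodes=4))
# [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8]
```

In this model, once commits arrive faster than the cluster can finish builds, each wave of builds waits longer than the last. Rerunning with `nodes=5`, or with `build_minutes=10`, brings every wait back to zero.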


Throwing hardware at the problem is a reasonable solution, as long as you can afford it. In fact, almost every time we had to add hardware to the CI cluster, it was because of the commit mutex. There were a very few cases where we added a new tool that made the build slower (<cough>code coverage</cough>), but the far more common case for adding hardware was: more people want to commit more often, and they all want to run all the tests.

Everyone committing small chunks of code all the time is a Very Good Thing. “Everyone commits to trunk every day” is one of the tenets of continuous integration. Yet there’s still an upper limit on how fast and how concurrently everyone can (or should) deploy.

Avoiding the commit mutex and preventing build queues are, in my humble opinion, two of the hard problems in continuous deployment.

More falsehoods programmers believe about time; “wisdom of the crowd” edition

A couple of days ago I decided to write down some of the things I’ve learned about testing over the course of the last several years. In the course of enumerating the areas that benefit most from testing, I realized that I had accumulated a lot of specific thoughts about how we as programmers tend to abuse the concept of time.

So I wrote another post called “Falsehoods programmers believe about time,” where I included 34 misconceptions and mistakes having to do with both calendar and system time. Most of these were drawn from my immediate experience with code that needed to be debugged (both in production and in test).


A great many of the false assumptions listed were my own, especially “time stamps are always in seconds since epoch” and “the duration of a system clock minute is always pretty close to the duration of a wall clock minute.” Whoa, did I ever live to regret my ignorance in those two cases! But hey, apparently I’m not the only one who has run into (or inadvertently caused) such issues. A lot of people responded and shared similar experiences.

UPDATED: I’d like to say a big thanks to all the Redditors who have been discussing this post recently. I have read every single one of your comments :) and learned about fun stuff like the year zero and International Atomic Time.

I’d like to say an enormous thanks to everyone who contributed to the comment threads on BoingBoing and Hacker News as well as Reddit and MetaFilter and to everyone on Twitter who shared their strange experiences with time. In those thousand or so comments and tweets, there were a lot of suggestions as to “falsehoods 35 to 35+n.”

First and foremost was the omission of the false assumption that “time always moves forward,” as pointed out by Technomancy and many others. I enjoyed reading all the suggested falsehoods. When I was done reading, I realized that, taken together, these constitute a whole other blog post. So I collected some of your suggested falsehoods into a post, and here it is.

@JackieJack tweeted: 'I figured it out. My brain is a computer, running VMware & my spirit is a hapless programmer'

All of these assumptions are wrong

All of these falsehoods were suggested by people who commented on the original post. Each contributor is credited below.

  1. The offsets between two time zones will remain constant.
  2. OK, historical oddities aside, the offsets between two time zones won’t change in the future.
  3. Changes in the offsets between time zones will occur with plenty of advance notice.
  4. Daylight saving time happens at the same time every year.
  5. Daylight saving time happens at the same time in every time zone.
  6. Daylight saving time always adjusts by an hour.
  7. Months have either 28, 29, 30, or 31 days.
  8. The day of the month always advances contiguously from N to either N+1 or 1, with no discontinuities.
  9. There is only one calendar system in use at one time.
  10. There is a leap year every year divisible by 4.
  11. Non leap years will never contain a leap day.
  12. It will be easy to calculate the duration of x number of hours and minutes from a particular point in time.
  13. The same month has the same number of days in it everywhere!
  14. Unix time is completely ignorant about anything except seconds.
  15. Unix time is the number of seconds since Jan 1st 1970.
  16. The day before Saturday is always Friday.
  17. Contiguous timezones are no more than an hour apart. (aka we don’t need to test what happens to the avionics when you fly over the International Date Line)
  18. Two timezones that differ will differ by an integer number of half hours.
  19. Okay, quarter hours.
  20. Okay, seconds, but it will be a consistent difference if we ignore DST.
  21. If you create two date objects right beside each other, they’ll represent the same time. (a fantastic Heisenbug generator)
  22. You can wait for the clock to reach exactly HH:MM:SS by sampling once a second.
  23. If a process runs for n seconds and then terminates, approximately n seconds will have elapsed on the system clock at the time of termination.
  24. Weeks start on Monday.
  25. Days begin in the morning.
  26. Holidays span an integer number of whole days.
  27. The weekend consists of Saturday and Sunday.
  28. It’s possible to establish a total ordering on timestamps that is useful outside your system.
  29. The local time offset (from UTC) will not change during office hours.
  30. Thread.sleep(1000) sleeps for 1000 milliseconds.
  31. Thread.sleep(1000) sleeps for >= 1000 milliseconds.
  32. There are 60 seconds in every minute.
  33. Timestamps always advance monotonically.
  34. GMT and UTC are the same timezone.
  35. Britain uses GMT.
  36. Time always goes forwards.
  37. The difference between the current time and one week from the current time is always 7 * 86400 seconds.
  38. The difference between two timestamps is an accurate measure of the time that elapsed between them.
  39. 24:12:34 is an invalid time.
  40. Every integer is a theoretically possible year.
  41. If you display a datetime, the displayed time has the same second part as the stored time.
  42. Or the same year.
  43. But at least the numerical difference between the displayed and stored year will be less than 2.
  44. If you have a date in a correct YYYY-MM-DD format, the year consists of four characters.
  45. If you merge two dates, by taking the month from the first and the day/year from the second, you get a valid date.
  46. But it will work if both years are leap years.
  47. If you take a W3C-published algorithm for adding durations to dates, it will work in all cases.
  48. The standard library supports negative years and years above 10000.
  49. Time zones always differ by a whole hour.
  50. If you convert a timestamp with millisecond precision to a date time with second precision, you can safely ignore the millisecond fractions.
  51. But you can ignore the millisecond fraction if it is less than 0.5.
  52. Two-digit years should be somewhere in the range 1900-2099.
  53. If you parse a date time, you can read the numbers character for character, without needing to backtrack.
  54. But if you print a date time, you can write the numbers character for character, without needing to backtrack.
  55. You will never have to parse a format like ---12Z or P12Y34M56DT78H90M12.345S.
  56. There are only 24 time zones.
  57. Time zones are always whole hours away from UTC.
  58. Daylight Saving Time (DST) starts/ends on the same date everywhere.
  59. DST is always an advancement by 1 hour.
  60. Reading the client’s clock and comparing to UTC is a good way to determine their timezone.
  61. The software stack will/won’t try to automatically adjust for timezone/DST.
  62. My software is only used internally/locally, so I don’t have to worry about timezones.
  63. My software stack will handle it without me needing to do anything special.
  64. I can easily maintain a timezone list myself.
  65. All measurements of time on a given clock will occur within the same frame of reference.
  66. The fact that a date-based function works now means it will work on any date.
  67. Years have 365 or 366 days.
  68. Each calendar date is followed by the next in sequence, without skipping.
  69. A given date and/or time unambiguously identifies a unique moment.
  70. Leap years occur every 4 years.
  71. You can determine the time zone from the state/province.
  72. You can determine the time zone from the city/town.
  73. Time passes at the same speed on top of a mountain and at the bottom of a valley.
  74. One hour is as long as the next in all time systems.
  75. You can calculate when leap seconds will be added.
  76. The precision of the data type returned by a getCurrentTime() function is the same as the precision of that function.
  77. Two subsequent calls to a getCurrentTime() function will return distinct results.
  78. The second of two subsequent calls to a getCurrentTime() function will return a larger result.
  79. The software will never run on a space ship that is orbiting a black hole.
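A couple of these are easy to demonstrate with Python's standard library. The sketch below checks falsehoods 10 and 70 (leap years) and 45 and 46 (merging dates). The helper names are made up for illustration. Note that `datetime.date` uses the proleptic Gregorian calendar, so the sketch says nothing about falsehood 9.

```python
import calendar
from datetime import date


def naive_is_leap(year: int) -> bool:
    # Falsehoods 10 and 70: "there is a leap year every year divisible by 4".
    return year % 4 == 0


def gregorian_is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except century years, except every 400th year.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def merge_dates(month_from: date, day_and_year_from: date) -> date:
    # Falsehoods 45 and 46: splice the month of one date onto the day/year of another.
    # datetime.date refuses to build a day that does not exist and raises ValueError.
    return date(day_and_year_from.year, month_from.month, day_and_year_from.day)


for year in (1900, 2000, 2012):
    print(year, naive_is_leap(year), gregorian_is_leap(year), calendar.isleap(year))

try:
    # 2012 and 2016 are both leap years, but the result would be "February 30".
    merge_dates(date(2012, 2, 1), date(2016, 1, 30))
except ValueError as err:
    print("invalid merged date:", err)
```

The naive rule says 1900 was a leap year. It wasn't, and `calendar.isleap` agrees with the Gregorian version.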

Seriously? Black holes?

Hey, if Bruce Sterling says that my software needs to be resilient against time distortions caused by black holes, I’m going to take him at his word.


Corrections

Daniel Morrison pointed out that it’s daylight saving time and not daylight savings time. Thanks, I’ve been saying it wrong my whole life!

Rohan Jayasekera suggested a couple of corrections. Thanks!

Credits

Thanks again to everyone who commented. I read everything that you wrote, even if I didn’t wind up including it above.

I made the list above by going through each of the comment threads on Hacker News, Reddit, MetaFilter and BoingBoing (in that order) and finding all(?) of the places where folks had broken out “falsehood 35 to 35 + n” as a bulleted list. I then selectively copied those lists — in the order that I found them. I made small edits for readability and occasionally I paraphrased (this is noted below).

From Hacker News

1-8: JoshTriplett, 9-10: lambda, 11: hc5, 12: chris_wot, 13: einhverfr, 14: masklinn, 15: rmc, 16: jimfl, 17: einhverfr, 18-20: aardvark179, 21-22: bazzargh, 23: my paraphrase of mikeash’s comment, 24-26: edanm, 27: my paraphrase of Mvandenbergh’s comment, 28: derleth, 29: finnw, 30: michaelochurch, 31: cpeterso, 32-33: dfranke, 34: arohner, 35: TazeTSchnitzel, 36: technomancy, 37: sses, 38: DanWaterworth

From Reddit

39-55: benibela2, 56-64: Darkhack, 65: ericanderton, 66: Taladar

From MetaFilter

67-69: Joe in Australia

From BoingBoing

70-75: Paul

From Twitter

76-79: cmchen

An acknowledgment

This post — like the one before it — owes a great debt to Patrick McKenzie’s canonical blog post about user names, which I have read over and over throughout the years and from which I have shamelessly cribbed both concept and style. If you haven’t yet read this gem, go and do so right now. I promise you’ll enjoy it.

Falsehoods programmers believe about time

Over the past couple of years I have spent a lot of time debugging other engineers’ test code. This was interesting work, occasionally frustrating but always informative. One might not immediately think that test code would have bugs, but of course all code has bugs and tests are no exception.

I have repeatedly been confounded to discover just how many mistakes in both test and application code stem from misunderstandings or misconceptions about time. By this I mean both the interesting way in which computers handle time and the fundamental gotchas inherent in how we humans have constructed our calendar, with daylight saving time being just the tip of the iceberg.

In fact I have seen so many of these misconceptions crop up in other people’s (and my own) programs that I thought it would be worthwhile to collect a list of the more common problems here.

All of these assumptions are wrong

  1. There are always 24 hours in a day.
  2. Months have either 30 or 31 days.
  3. Years have 365 days.
  4. February is always 28 days long.
  5. Any 24-hour period will always begin and end in the same day (or week, or month).
  6. A week always begins and ends in the same month.
  7. A week (or a month) always begins and ends in the same year.
  8. The machine that a program runs on will always be in the GMT time zone.
  9. Ok, that’s not true. But at least the time zone in which a program has to run will never change.
  10. Well, surely there will never be a change to the time zone in which a program has to run in production.
  11. The system clock will always be set to the correct local time.
  12. The system clock will always be set to a time that is not wildly different from the correct local time.
  13. If the system clock is incorrect, it will at least always be off by a consistent number of seconds.
  14. The server clock and the client clock will always be set to the same time.
  15. The server clock and the client clock will always be set to around the same time.
  16. Ok, but the time on the server clock and time on the client clock would never be different by a matter of decades.
  17. If the server clock and the client clock are not in synch, they will at least always be out of synch by a consistent number of seconds.
  18. The server clock and the client clock will use the same time zone.
  19. The system clock will never be set to a time that is in the distant past or the far future.
  20. Time has no beginning and no end.
  21. One minute on the system clock has exactly the same duration as one minute on any other clock.
  22. Ok, but the duration of one minute on the system clock will be pretty close to the duration of one minute on most other clocks.
  23. Fine, but the duration of one minute on the system clock would never be more than an hour.
  24. You can’t be serious.
  25. The smallest unit of time is one second.
  26. Ok, one millisecond.
  27. It will never be necessary to set the system time to any value other than the correct local time.
  28. Ok, testing might require setting the system time to a value other than the correct local time but it will never be necessary to do so in production.
  29. Time stamps will always be specified in a commonly-understood format like 1339972628 or 133997262837.
  30. Time stamps will always be specified in the same format.
  31. Time stamps will always have the same level of precision.
  32. A time stamp of sufficient precision can safely be considered unique.
  33. A timestamp represents the time that an event actually occurred.
  34. Human-readable dates can be specified in universally understood formats such as 05/07/11.
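Falsehoods 29 to 31 do the most damage when a raw epoch number crosses a system boundary without its unit. Below is a minimal sketch that assumes the producer can tell you whether a value counts seconds, milliseconds or microseconds. `from_epoch` and the unit labels are hypothetical names. The function uses integer arithmetic so that millisecond and microsecond values are not rounded through a float. 1339972628 is the seconds value from falsehood 29.

```python
from datetime import datetime, timezone

# Hypothetical helper: the caller must state the unit of a raw epoch value,
# because the number of digits alone is not a reliable signal (falsehoods 29-31).
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def from_epoch(value: int, unit: str) -> datetime:
    """Convert an integer count of `unit`s since 1970-01-01T00:00Z to an aware UTC datetime."""
    per_second = UNITS_PER_SECOND[unit]  # KeyError for an unknown unit, on purpose
    seconds, remainder = divmod(value, per_second)
    micros = remainder * 1_000_000 // per_second  # sub-second part, truncated to microseconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


print(from_epoch(1339972628, "s"))      # 2012-06-17 22:37:08+00:00
print(from_epoch(1339972628370, "ms"))  # 2012-06-17 22:37:08.370000+00:00
```

If you read the millisecond value 1339972628370 as seconds, it lands tens of thousands of years in the future, and Python's `datetime`, which stops at year 9999, refuses it outright. Other stacks may silently accept it.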

UPDATED: There’s more! Read the rest of the falsehoods…


That thing about a minute being longer than an hour was a joke, right?

No.

There was a fascinating bug in older versions of KVM on CentOS. Specifically, a KVM virtual machine had no awareness that it was not running on physical hardware. This meant that if the host OS put the VM into a suspended state, the virtualized system clock would keep the time it had at the moment of suspension. For example, if the VM was suspended at 13:00 and then brought back to an active state two hours later (at 15:00), the system clock on the VM would still reflect a local time of 13:00. The result was that every time a KVM VM went idle, the host OS would put it into a suspended state and the VM’s system clock would drift further away from reality, sometimes by a large margin depending on how long the VM had remained idle.

There was a cron job that could be installed to keep the virtualized system clock in line with the host OS’s hardware clock. But it was easy to forget to do this on new VMs and failure to do so led to much hilarity. The bug has been fixed in more recent versions.

An acknowledgment

This post owes a great debt to Patrick McKenzie’s canonical blog post about user names, which I have read over and over throughout the years and from which I have shamelessly cribbed both concept and style. If you haven’t yet read this gem, go and do so right now. I promise you’ll enjoy it.

UPDATED: Thanks for your comments and anecdotes!

I’d like to say thanks to everyone who contributed to the comment threads about this post on BoingBoing and Hacker News as well as Reddit and MetaFilter and to everyone on Twitter who shared their strange experiences with time.

You have provided so many interesting edge cases I had forgotten about, as well as many oddities of which I wasn’t aware. For instance, in the Jewish calendar days start at sunset, not midnight. And as Bruce Sterling pointed out, I didn’t even think about what happens when the computer is on a spaceship orbiting a black hole.

There’s more than enough material for another (longer!) post about this topic. But first I’ll have to finish reading all 500-plus of your comments, as well as the wealth of awesome research material that has been linked.

I’ve written another post collecting the many other falsehoods that were suggested by your comments at BoingBoing and Hacker News as well as Reddit and MetaFilter and also Twitter.

Thanks again for your enthusiasm and for the mind-boggling level of detail. I learned a lot about time in the last 24 hours. Fellow nerds, I salute you.

Things you should test

A checklist of things that are worth testing in pretty much any software system.

…trailing his fingers along the edge of an incomprehensible computer bank, he reached out and pressed an invitingly large red button on a nearby panel. The panel lit up with the words “Please do not press this button again.”
        ~ Douglas Adams

Software systems are complex and, as such, often behave in ways that are hard to predict. This is true of any non-trivial system. The behaviors of even a small software product are so varied and unpredictable as to defy complete testing.

However, there are five general areas of interest that are always worth examining because they reveal mistakes with such surprising regularity. Specifically, it’s worthwhile to find out how any system handles inputs, math, text, time and system resources.

If, like me, you are a software developer, then you’ve probably heard the rule of thumb that about 50% of your time should be spent testing rather than writing code. If this seems excessive, think about how much time you spent debugging the last time code you wrote was involved in a production issue. Then think about your level of stress.

In the book The Soul of a New Machine, Tracy Kidder makes a comment to the effect that most career programmers are pack-a-day smokers who eventually drop dead of a heart attack. Don’t be that guy. Time spent testing happens during work hours, within the parameters of an estimated project schedule that you (hopefully) got to sign off on in advance. If you follow the “50% of development time is testing” rule, then over the course of your career you may well spend more time testing than you would have spent debugging production issues had you not taken the time to test. But even so, you will spend less time being stressed and less time working on the weekend.

And seriously, tested code is better code. Better code means more reliable products. Reliability in turn leads to better customer experience because reliability engenders trust. Trust in turn is the foundation of the relationship that a product team forms with its customers. Tested code means better customer experience which leads to products that compete more effectively in the marketplace. And that means you keep getting paid, which means you get to keep writing code.


Inputs

  1. Minimum and maximum input values are always good to test. For instance, if a password field allows 6 to 128 characters, what actually happens when you submit a six-character password? What about a 128-character password?
  2. Too-high and too-low values. What happens with a 5-character or 129-character password? Alternatively, how does the system respond to inputs equal to the minimum and maximum integer values allowed by the implementation language or platform?
  3. Invalid values such as null and NaN. Strings instead of integers, arrays instead of strings.
  4. Inputs that might break the underlying code. For a Web app, examples include SQL injection and cross-site scripting attacks.
  5. Empty inputs such as a blank user name field or a transaction record in which none of the fields contain any information. For unit tests, submitting zero or an empty string instead of a valid parameter can sometimes yield interesting results.
  6. Inputs that are too big, perhaps even too big to conveniently fit into available memory.
  7. Too many inputs or not enough inputs. For a unit test this is simply a matter of calling a function with the wrong number of arguments. For a Web app it might involve submitting too many POST parameters or selectively deleting parts of a URL’s query string.
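Points 1 to 3 translate directly into a table-driven test. The sketch below assumes a password validator that only checks length. `password_length_ok` and `boundary_lengths` are hypothetical names standing in for your own code.

```python
MIN_LENGTH, MAX_LENGTH = 6, 128  # the password limits from point 1


def password_length_ok(password: str) -> bool:
    """Hypothetical validator under test."""
    return MIN_LENGTH <= len(password) <= MAX_LENGTH


def boundary_lengths(low: int, high: int) -> list[tuple[int, bool]]:
    """(length, expected result) pairs at, just inside and just outside each limit."""
    return [(0, False), (low - 1, False), (low, True), (low + 1, True),
            (high - 1, True), (high, True), (high + 1, False)]


for length, expected in boundary_lengths(MIN_LENGTH, MAX_LENGTH):
    assert password_length_ok("x" * length) is expected, f"length {length}"

# Point 3: a value of the wrong type should fail loudly, not be silently accepted.
try:
    password_length_ok(None)
except TypeError:
    pass
else:
    raise AssertionError("None was accepted as a password")
```

Generating the cases from the limits, rather than hard-coding 5, 6, 128 and 129, keeps the test correct if the limits ever change.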

Math

  1. Decimal math is hard. Verify that integers are treated correctly in a floating-point context, and vice versa.
  2. Repeating decimals. Does the system treat 0.666 differently from 0.665?
  3. Rounding. If you put 3 * 1.005 into the system, do you get 3.015 back out? (This is not the default behavior in JavaScript, for instance.)
  4. Type coercion. Is an input of 23 treated differently than an input of "23"? That is: is a numeric input treated differently than a string containing a numeric value?
  5. Units of measurement. If you specify that the thrusters should fire with a force of 267 Newtons, does the guidance system actually interpret that value as Newtons? Or is it interpreted as 267 foot-pounds? (Hat tip to Sebastian Delmont for pointing out that units of measurement are worth testing.)
  6. Units of currency. There’s going to be a problem if an input of £23.00 is stored in the database as $23.00.
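Points 3 and 6 can be probed with Python's standard `decimal` module. This is a sketch, not a money library. `to_cents`, the half-up rounding rule and the `Money` class are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal

# Point 3: binary floating point cannot store 1.005 exactly, so the product drifts.
float_result = 3 * 1.005
print(float_result, float_result == 3.015)  # 3.0149999999999997 False

# A Decimal built from a *string* keeps the exact decimal value.
decimal_result = 3 * Decimal("1.005")
print(decimal_result)  # 3.015


def to_cents(amount: Decimal) -> Decimal:
    """Round to two places with an explicit rule (half-up is an assumption, not a standard)."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Point 6: keep the currency next to the amount and refuse to mix them.
@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code, e.g. "GBP" or "USD"

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)
```

Adding `Money(Decimal("23.00"), "GBP")` to `Money(Decimal("23.00"), "USD")` raises instead of quietly producing 46 of something.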

Text

  1. User names are perhaps the single most interesting class of text that can be submitted as input to a computer program. At a minimum, the system shouldn’t break when names contain apostrophes, hyphens or spaces.
  2. Passwords are also interesting. Does the maximum password length allow for enough entropy? Are plain-English passphrases disallowed because they don’t contain numbers? Are passwords stored as salted hashes?
  3. Are Unicode inputs treated differently than ASCII?
  4. On the Web, are HTML-encoded entities properly converted to characters and vice versa? What about URL-encoded characters?
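Point 3 has a subtle twist: two strings can look identical and still differ. The sketch below uses only Python's standard library (`unicodedata`, `html`, `urllib.parse`). The sample strings are hypothetical test data.

```python
import html
import unicodedata
from urllib.parse import quote, unquote

# Two spellings of "Zoë" that render identically but compare unequal.
precomposed = "Zo\u00eb"  # ë as a single code point (U+00EB)
decomposed = "Zoe\u0308"  # e followed by COMBINING DIAERESIS (U+0308)

assert precomposed != decomposed
assert len(precomposed) == 3 and len(decomposed) == 4
# Normalizing (NFC here) before comparing or storing makes them equal.
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Point 4: entity encoding and percent-encoding should round-trip without loss.
surname = "D'Netto"
assert html.escape(surname) == "D&#x27;Netto"
assert html.unescape(html.escape(surname)) == surname
assert quote(surname) == "D%27Netto"
assert unquote(quote(surname)) == surname
```

Which normalization form to apply (NFC, NFKC and so on) is a design decision for your system. The point is to pick one and apply it consistently at the boundary.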

Time

  1. Time zones are a bitch. Try switching the system time from GMT to EST and see what happens.
  2. Test on the first and last day of daylight saving time. The system does allow you to mock out the first and last day of daylight saving time, right?
  3. Like with unit tests, boundary conditions can reveal interestingness. How does the system behave between 23:59 and 00:01? What about during the hour between 00:00 and 00:59?
  4. Be very aware of dates and times that are “special” to your system. For instance, if you have a fake user for testing purposes, how does the system respond when it’s that user’s birthday?
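Points 1 and 3 combine naturally: run the same boundary instants through more than one time zone. This is a sketch under stated assumptions. `report_date` is a hypothetical function under test, and EST is modelled as a fixed UTC-5 offset so the example runs without a time zone database. Because of that, it does not exercise point 2. Testing real daylight saving transitions needs a full zone such as `zoneinfo.ZoneInfo("America/New_York")`.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

GMT = timezone.utc
# Fixed UTC-5 offset labelled "EST". It deliberately ignores DST.
EST = timezone(timedelta(hours=-5), "EST")


def report_date(instant: datetime, tz: tzinfo) -> date:
    """Hypothetical function under test: the calendar day an event is filed under."""
    return instant.astimezone(tz).date()


# Boundary instants around midnight UTC (point 3).
boundaries = [datetime(2012, 6, 17, 23, 59, tzinfo=GMT),
              datetime(2012, 6, 18, 0, 0, tzinfo=GMT),
              datetime(2012, 6, 18, 0, 1, tzinfo=GMT)]

for instant in boundaries:
    print(instant.isoformat(), report_date(instant, GMT), report_date(instant, EST))
```

In GMT the event at 00:00 belongs to June 18. In EST all three instants still fall on June 17, which is exactly the kind of disagreement a daily report can get wrong.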

System resources

  1. What if there’s half as much available memory as the system’s designers expect?
  2. In a distributed system, what happens if half the nodes become unavailable?
  3. In a service-oriented architecture, what happens if one of the services becomes unavailable? What if it’s only partially available?
  4. What happens if the network is slow?
  5. What happens when the database is down?
  6. What happens when the database is empty?
  7. What happens if the cache is disabled? What about the CDN?
  8. What if load on the system spikes to ten times normal?
  9. What if load on the system drops to zero?
  10. For long-running operations, what happens if you power cycle the machine before the operation is complete?
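Points 3 to 5 can be rehearsed in a unit test by swapping the real dependency for a test double that fails on demand. Here's a minimal sketch. `FlakyService`, `price_with_fallback`, the retry count and the cached-price fallback are all hypothetical choices, not a recommendation for any particular system.

```python
class FlakyService:
    """Hypothetical test double: fails the first `failures` calls, then succeeds."""

    def __init__(self, failures: int, price: int = 42):
        self.failures = failures
        self.price = price
        self.calls = 0

    def get_price(self, sku: str) -> int:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError(f"price service unavailable (call {self.calls})")
        return self.price


def price_with_fallback(service, sku: str, cached_price: int, retries: int = 2) -> int:
    """Hypothetical code under test: retry a few times, then fall back to a cached value."""
    for _ in range(retries + 1):
        try:
            return service.get_price(sku)
        except ConnectionError:
            continue
    return cached_price


print(price_with_fallback(FlakyService(failures=0), "sku-1", cached_price=40))   # 42
print(price_with_fallback(FlakyService(failures=2), "sku-1", cached_price=40))   # 42, third try
print(price_with_fallback(FlakyService(failures=99), "sku-1", cached_price=40))  # 40, service down
```

Counting calls on the double also lets a test check that the code gives up after a bounded number of attempts, rather than hammering a service that is already down.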


Two digressions: names and time

When it comes to Web apps, there are two areas that seem to cause more pain than any other: people’s names and the time. These elements are both common, essential to the correct functioning of a system, and shockingly difficult to get right.

People’s names

There are only two hard problems in Computer Science: cache invalidation, naming things and off-by-one-errors.
        ~ Phil Karlton

My favorite real-world case of a system finding a user’s name “unacceptable” involved a person whose first name was 9. Not “Nine,” mind you, but the numeral “9.”

I have a friend named Sonnet (no middle name, no last name) who is unable to complete the registration flow on most Web sites. I myself have occasionally been rejected by a registration form because I have no middle name.

When I used to build internal tools for Etsy I worked with a plethora of excellently-named hackers such as Michelle D’Netto, Kellan Elliot-McCrea and of course Ramin Bozorgzadeh. Ramin quickly became my test user of choice because his surname was almost always too long for the single line allotted to display it, thus breaking the UI. And in at least one case an intranet tool (which had been around for several years at that point) was brought down hard by the introduction of a user name that contained an apostrophe. If you’re not as fortunate in the naming scheme of your alpha testers, then take care to construct your fixtures appropriately.
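One way to follow that advice is to keep a small set of deliberately awkward names in your fixtures and run every name-handling function over them. The sketch below is hypothetical throughout. The fixture list is modelled on the names above, and `display_name` with its 12-character limit stands in for whatever UI code you actually have.

```python
# Hypothetical fixtures modelled on the names above; add your own edge cases.
TRICKY_NAMES = [
    ("9", None),                  # a first name that is a numeral
    ("Sonnet", None),             # a single name: no middle name, no surname
    ("Michelle", "D'Netto"),      # apostrophe
    ("Kellan", "Elliot-McCrea"),  # hyphen
    ("Ramin", "Bozorgzadeh"),     # long enough to overflow a narrow UI slot
]


def display_name(first, last, width=12):
    """Hypothetical UI helper: join the parts that exist, then truncate to `width` characters."""
    full = " ".join(part for part in (first, last) if part)
    return full if len(full) <= width else full[: width - 1] + "\u2026"


for first, last in TRICKY_NAMES:
    shown = display_name(first, last)
    assert shown and len(shown) <= 12, (first, last, shown)
    print(repr(shown))
```

Truncating with an ellipsis is only one choice. The useful part is that every fixture goes through the same assertions, so a new awkward name is a one-line addition.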

Patrick McKenzie wrote the canonical blog post on the intricacies of testing user names. Highly recommended (and highly amusing) reading.

Never, ever use the system time in tests

King and villein, lad and lass,
All answer to the hourglass.
        ~ trad.

Tests that use the system time implicitly test the system clock of whatever machine happens to be running the tests. Speaking from long experience, I can attest that this approach can only lead to unreliable tests and extreme debugging pain. If there is a test that must rely on the system clock, then it is better to go without the test than to expose yourself to the lost time and frustration that running such a test would surely inflict on you and your team.

So, the system you are testing does allow you to mock out all of the necessary times of day and times of year. Right? I hope so because if you’re using the system time in tests, you are doing it completely wrong.

And in my humble opinion, if you’re using the system time in tests because the system you are testing won’t allow you to mock the time, you aren’t the only one doing it wrong — the system itself is fundamentally broken.
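Here's a minimal sketch of a system that does allow the time to be mocked. Instead of reading the clock directly, the code under test accepts a clock as a parameter. The names `system_clock`, `FixedClock` and `is_birthday` are hypothetical. Passing the clock in is only one design; patching the time with a mocking library is another.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """Production clock: the only place that reads the real system time."""
    return datetime.now(timezone.utc)


class FixedClock:
    """Test double that returns a chosen instant and only moves when told to."""

    def __init__(self, instant: datetime):
        self.instant = instant

    def __call__(self) -> datetime:
        return self.instant

    def advance(self, delta: timedelta) -> None:
        self.instant += delta


def is_birthday(birthday: date, clock: Clock = system_clock) -> bool:
    """Hypothetical code under test: it asks the injected clock, never the system clock."""
    today = clock().date()
    return (today.month, today.day) == (birthday.month, birthday.day)


clock = FixedClock(datetime(2012, 6, 17, 23, 59, tzinfo=timezone.utc))
print(is_birthday(date(1980, 6, 17), clock))  # True
clock.advance(timedelta(minutes=2))           # step across midnight UTC
print(is_birthday(date(1980, 6, 17), clock))  # False
```

In production, callers simply omit the clock argument. Tests can pin the time to any instant they like, including the day-boundary and fake-user-birthday cases from the Time checklist. This sketch decides "today" in UTC; a real system would need to choose that zone deliberately.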


Good hunting

Pretty much every bullet point on each checklist above was drawn from my own direct experience with a mistake that was found either in development or in production. The cost of such knowledge was at the very least some frustration for myself and in other cases a lot of stress and lost time for many people on my team. But as my career has progressed and I’ve moved to larger and larger projects, it’s been really useful to have this information in my head. I like to think I design better software because I’ve been burned in the past.

I hope this checklist helps you to find mistakes in the design and implementation of your own systems as well. At the very least, I hope you will find most of them before your customers do in production. After all, as software engineers we know that a clean, well-functioning system is the basic foundation of the trust that our users put in us and in the products we deliver.
