When I worked at Etsy, the problem of many developers committing to trunk at once was colloquially known as the “commit mutex” problem.
A CI system can effectively become “blocked” if too many people commit and trigger too many builds within a relatively short time period. All nodes in the CI cluster are busy but new builds keep getting queued. The resulting “build queue” means that it takes progressively longer to get feedback from CI. The longer it takes to get feedback, the less useful the CI system.
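To see why the queue gets progressively worse rather than just staying long, here is a minimal sketch in Python. The function name and every number in it (4 nodes, 10-minute builds, a commit every 2 minutes) are made up for illustration and are not real figures from any CI cluster I have worked on.

```python
import heapq

def feedback_times(num_nodes, build_minutes, commit_interval, num_commits):
    """Return minutes from commit to finished build, one entry per commit.

    Hypothetical model: commits arrive at a fixed interval, every build
    takes the same time, and each build runs on the first node to free up.
    """
    # Each heap entry is the time at which a node next becomes free.
    free_at = [0] * num_nodes
    heapq.heapify(free_at)
    waits = []
    for i in range(num_commits):
        committed = i * commit_interval
        node_free = heapq.heappop(free_at)
        started = max(committed, node_free)  # the build queues if every node is busy
        finished = started + build_minutes
        heapq.heappush(free_at, finished)
        waits.append(finished - committed)
    return waits

# 4 nodes finish one build every 2.5 minutes on average,
# but a new commit arrives every 2 minutes.
waits = feedback_times(num_nodes=4, build_minutes=10, commit_interval=2, num_commits=40)
print(waits[0], waits[-1])
```

In this toy model the first commit gets feedback in 10 minutes and every later batch of commits waits a little longer than the one before it. Adding a fifth node brings capacity level with demand and the wait stays flat, which is the "throw hardware at it" fix in miniature.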
Throwing hardware at the problem is a reasonable solution, as long as you can afford it. In fact almost every time we had to add hardware to the CI cluster, it was because of the commit mutex. There were a very few cases where we added a new tool that made the build slower <cough>code coverage </cough> but the far more common case for adding hardware was: more people want to commit more often, and they all want to run all the tests.
Everyone committing small chunks of code all the time is a Very Good Thing. “Everyone commits to trunk every day” is one of the tenets of continuous integration. Yet there’s still an upper limit on how fast and how concurrently everyone can (or should) deploy.
Avoiding the commit mutex and preventing build queues are in my humble opinion two of the hard problems in continuous deployment.
It seems like every week I have a long conversation with someone on
the topic of hiring programmers. tl;dr: there are no programmers for hire in
New York in 2013. The job market is such that all decent programmers
are gainfully employed. In order to hire developers it is necessary
to convince them to leave their current job. This will be true for
the foreseeable future.
Personally I have observed that the number of contacts I receive from
recruiters has now surpassed the number of contacts I used to receive
during the dot-com boom. Which makes sense. In the late 90s the
Internet had potential but only a few million people used
it. Today the Internet has become an essential household appliance,
like cable TV or the washing machine. And in the intervening ten
years, the number of qualified software engineers has not risen to meet demand.
It may take a generation before we once again see a reasonable ratio
of job opportunities to competent software engineers.
There aren’t enough software engineers to fill all the roles
The most poignant question recruiters ask me lately is: do you know
someone else who would be interested in this role?
No. The last time I knew a programmer who was unemployed was
in 2010. And they were only out of work for three months. And that
was basically because they were being super picky about what kind of
early-stage startup they wanted to join.
So no. No I don’t know any programmers who are looking for work.
And I don’t expect to meet an unemployed programmer again. Unless the
Internet stops being a thing. Or unless the American school system
starts turning out class after class of qualified,
passionate journeyman software engineers.
All the good programmers I know are gainfully employed
I get a lot of recruiter contacts and I read every single one. I am
always looking for people’s thoughts as to why I would leave my
current job and come to their office to work on their project
every day for the next 2-3 years. Unfortunately most job postings are
little more than a list of keywords and a salary range.
Still, keywords and salary ranges are interesting. I always take
note of which languages are trending among tech recruiters. And did
you know that a Web Developer makes less money than a Software
Engineer?
Since 2005 I’ve used Salary.com (and more recently
GlassDoor) to get a general sense of what various job
titles were worth around New York. Combining that with the actual
salaries for roles I’ve seen mentioned in recruiter emails gives a
pretty accurate picture of real salary ranges.
Let’s take it as read that any decent programmer performs that kind of
analysis on a regular basis.
To recruit decent developers it’s always necessary to have a
reasonably interesting narrative about solving problems. In the current job market, it also takes competitive pay and a work
environment that is conducive to writing a lot of code. Job postings
that fail to communicate those qualities will not be successful.
Salary ranges from Salary.com and GlassDoor were retrieved on January 27 2013. When reviewing these graphs, keep in mind that demand is high and therefore salaries in the median range are unlikely to draw qualified candidates. Anecdotal evidence suggests that engineers are willing to consider a new role at salaries closer to the 80th percentile.
In writing bug reports, I have found it helpful to take the attitude
that I am engaged in scientific inquiry in the relatively
new field that is software. I’ve gotten the best results when
I constructed my communications around bugs in a way that lent itself
to application of the Scientific Method (or a
rough approximation thereof).
Description Of Problem, Steps To Reproduce, and Expected Resolution
Steps to reproduce the problem, in the form of a numbered list.
Expected resolution of the problem, described using at least
two complete sentences.
In the rest of this post I’m going to delve at length into some of the
techniques I’ve learned for effective communication where bugs are
concerned. But even if you stop reading right here, you already know
the most important technique: write bug reports that consist of a
description of the problem, steps to reproduce, and an expected
resolution.
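If your team files bugs through a script or an internal tool, the same three-part shape can be enforced in code. What follows is only a sketch: the BugReport class, its field names and the wording of the sample expected resolution are my own inventions for illustration, not part of any real tracker.

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    """Hypothetical container for the three parts of a bug report."""
    description: str
    steps_to_reproduce: list
    expected_resolution: str

    def render(self):
        # A report with no steps cannot be reproduced, so refuse to render it.
        if not self.steps_to_reproduce:
            raise ValueError("at least one step to reproduce is required")
        steps = "\n".join(
            f"{number}. {step}"
            for number, step in enumerate(self.steps_to_reproduce, start=1)
        )
        return (
            f"Description of problem:\n{self.description}\n\n"
            f"Steps to reproduce:\n{steps}\n\n"
            f"Expected resolution:\n{self.expected_resolution}"
        )

report = BugReport(
    description=(
        "The menu item in the upper right corner runs off the right edge "
        "of the screen, but it was expected to fit on the page. "
        "This happens when signed in as the user HomerJ."
    ),
    steps_to_reproduce=[
        "Open a browser (latest version of any major browser is fine).",
        "Navigate to foo.example.com/my/awesome/thing",
        "Sign in as the user HomerJ",
        "Observe that the menu item text runs off the right edge of the screen.",
    ],
    # Illustrative wording only.
    expected_resolution=(
        "The full text of the menu item is visible without scrolling. "
        "Long text should wrap or truncate rather than overflow."
    ),
)
print(report.render())
```

The numbering is generated rather than typed, so steps can be inserted or reordered without anyone having to renumber the list by hand.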
Provide context, and lots of it.
The problem with communication is the illusion that it has taken
place. ~ George Bernard Shaw
There’s a phenomenon by which software that was working just fine,
comes to be seen over time as increasingly buggy. In fact this
phenomenon occurs so often that it has a name: bit rot.
Very rarely, bit rot occurs due to actual changes in the software. But
much more often the apparent “rot” is actually due to the software
staying the same while the attitudes and habits of its users slowly
change and evolve.
As the existence of bit rot demonstrates, it is not only possible but
very likely that, given two different people, one of them might
perceive a behavior as buggy while the other person might see the
system as working just fine. Not to mention the costs associated with
other consequences of miscommunication including:
Overcoming such disconnects requires careful statement of the problem
at hand, with sensitivity to the different perspectives of one’s
audience.
Describe the problem, using at least two complete sentences.
Usually it’s enough to write two
complete sentences that describe the problem in
detail. The first sentence should generally be of the form “Feature X
is exhibiting behavior Y, but it was expected to exhibit behavior Z.”
The second sentence should expand upon or clarify the first.
For example: “Feature X is redirecting signed-in users to the
cart, but it should be redirecting to the new holiday promotion. This
is happening in IE and Chrome.”
When describing thornier or larger issues, it may be useful to use a
problem statement template as a starting
point.
Always include screenshots that illustrate the problem.
Screenshots of errors in production are a
fundamentally important artifact in the reporting of bugs and the
specification of improvements to existing features.
Having a visual picture of a problem is of huge benefit to people who
weren’t there to witness the issue firsthand. Because of the
deeply visual wiring of our brains, there are certain
aspects of most problems that are best communicated via a picture. This
tends to be true regardless of the level of detail provided in writing.
Thus most bug reports cannot be considered complete without at least
one screenshot.
Present the Steps To Reproduce as a numbered list of discrete actions.
More than any other factor, the ability to
reproduce a problem will determine whether or not
the problem gets fixed. Failure to provide enough information to
reproduce an issue is one of the most insidious and frustrating things
that can go wrong with a bug report. In the worst case, everyone who
tries to reproduce the issue winds up feeling like they’ve wasted time
over nothing while the original reporter of the bug feels like the
owner of a singing frog.
Numbered lists are the ideal format for explaining how to reproduce a
bug because a list makes it immediately obvious how many steps there
are as well as where each step leaves off and the next step
begins. Confusion about “minor” distinctions like that can lead to
major headaches later on.
Here’s an example of a reasonably detailed “steps to reproduce”
section:
1. Open a browser (latest version of any major browser is fine).
2. Navigate to foo.example.com/my/awesome/thing
3. Sign in as the user “HomerJ”
4. Observe that the text of the menu item in the upper right corner
   runs off the right edge of the screen.
And by counterexample, here’s the same set of steps with all the
helpful context removed.
1. Look at the dev site.
2. The menu is wonky.
In the best case, a developer who reads the second example is confused
but figures it out. In the worst case, she notices that the background
color of the menu in question is green, recalls (possibly wrongly)
that the background color was meant to be blue, and spends the rest of
the day “fixing” that before she realizes she was only meant to be
moving the menu a couple of pixels to the left. Or she signs in as a
user whose preferences specify a totally different menu
look-and-feel. Or she goes to the wrong part of the dev site and
looks at the wrong menu. More misunderstandings are possible in this
situation but their enumeration is left as an exercise for the reader.
Describe the expected resolution, using at least two complete sentences.
This is where it all comes together. The reader has been informed of
the nature of the problem and has walked through the steps to reproduce
it. What remains now are the tasks of inventing a fix for the issue,
applying it to the production application without breaking anything
else, and finally verifying that the fix as implemented actually
solves the original problem.
It is generally insufficient to describe the expected resolution of a
software bug in fewer than
two complete sentences. Even a trivial bug
represents at best a failure to properly implement the software as
specified. Such failures inevitably stem from misunderstandings of one
sort or another. So it is reasonable to approach writing up the
desired state of a software feature, as an exercise in bridging a
pre-existing communication gap.
To this end it’s also best to avoid domain-specific jargon and
acronyms. Stating the desired outcome in plain English can be more
challenging than it might first appear. But it’s worth the effort.
Be quantitative. Engineers respond well to hard evidence.
Most bugs, most of the time, are easily nailed given even an
incomplete but suggestive characterization of their error conditions
at source-code level. When someone among your beta-testers can point
out, “there’s a boundary problem in line nnn”, or even just “under
conditions X, Y, and Z, this variable rolls over”, a quick look at
the offending code often suffices to pin down the exact mode of
failure and generate a fix. ~ Eric Raymond
Screenshots are the basic currency of good bug reports. But there are
numerous other artifacts which could be helpful in diagnosing an
issue. These include but are not limited to:
Graphs from tools like Graphite, Ganglia,
Cacti, Cloudkick, Google Analytics, etc.
In the case of a Web application it is also extremely helpful to
provide hyperlinks pointing directly to the page(s) in production
where an issue has been observed.
As a general rule, the more hard data provided, the
better. Anecdotes about frustrated users can be a useful tool for
understanding why fixing a particular bug might be important. But
knowing there’s an error at line 142 (for instance) is much more
likely to turn out to be key to actually implementing the fix.
A bug report is a story about a problem.
A good story has a beginning, middle and end. A good bug report has a
description of the problem, steps to reproduce the problem, and concludes
by stating the expected resolution. These three parts of a bug report
map neatly to the narrative arc that underlies all good stories.
description of problem → establishment of the setting and characters
steps to reproduce → rising action
expected resolution → climax and resolution
Stories are powerful tools for communication. The wetware of the human
brain is heavily optimized for the processing of information that is
structured as a narrative.
In her talk “How To Test The Inside of Your Head,” Liz Keogh discusses
confirmation bias, the Face on Mars at Cydonia, the Martian Canals of
Percival Lowell and how we as a species are sometimes predisposed to
believe obvious but incorrect explanations. I highly recommend
watching the whole talk but here is the really pertinent bit.
Writing bug reports as I’ve described helps prevent misunderstandings
and wasted effort. But there’s more to it than that. By providing a
complete narrative, a good bug report sets the stage for everyone
involved to act within a larger, cohesive context. Human beings feel
better and make better decisions when they feel like the world around
them makes sense. Stories help us to make sense of the world and a
good bug report can go a long way toward making a chaotic situation
suddenly feel manageable.
Thus the larger benefit of including complete information in a bug
report can be to directly reduce stress and therefore indirectly to
increase the capability of the team to rapidly resolve production
issues.
This article was written as a general guide, with the
novice-to-intermediate practitioner in mind. The techniques described
above reflect what has worked well in my limited experience. For other
perspectives the reader is referred to the many other fine bug writing
guides available on the Internet, including:
Google Chrome bug life cycle and reporting guidelines
A couple of days ago I decided to
write down some of the things I’ve learned about testing
over the course of the last several years. In the
course of enumerating the areas that benefit most from testing, I
realized that I had accumulated a lot of specific thoughts about how
we as programmers tend to abuse the concept of time.
So I wrote another post called
“falsehoods programmers believe about time,” where
I included 34 misconceptions and mistakes having to do with both
calendar and system time. Most of these were drawn from my immediate
experience with code that needed to be debugged (both in production
and in test).
A great many of the false assumptions listed were my own. Especially “time stamps are always in seconds since epoch” and “the duration of a system clock minute is always pretty close to the duration of a wall clock minute.” Whoa did I ever live to regret my ignorance in those two cases! But hey, apparently I’m not the only one who has run into (or inadvertently caused) such issues. A lot of people responded and shared similar experiences.
UPDATED: I’d like to say a big thanks to all the Redditors who have been discussing this post recently. I have read every single one of your comments :) and learned about fun stuff like the year zero and International Atomic Time.
I’d like to say an enormous thanks to everyone who contributed to
the comment threads on BoingBoing and
Hacker News as well as Reddit and
MetaFilter and to everyone on Twitter who
shared their strange experiences with time. In those thousand or so comments
and tweets, there were a lot of suggestions as to “falsehoods 35 to 35+n.”
First and foremost was the omission of the false assumption that “time
always moves forward,” as pointed out by Technomancy and many others.
I enjoyed reading all the suggested falsehoods. When I was done
reading, I realized that taken as a whole,
these constitute a whole other blog post. So I collected some of your suggested falsehoods
into a post and here it is.
All of these assumptions are wrong
All of these falsehoods were suggested by people who commented on the
original post. Each contributor is credited below.
The offsets between two time zones will remain constant.
OK, historical oddities aside, the offsets between two time zones won’t change in the future.
Changes in the offsets between time zones will occur with plenty of advance notice.
Daylight saving time happens at the same time every year.
Daylight saving time happens at the same time in every time zone.
Daylight saving time always adjusts by an hour.
Months have either 28, 29, 30, or 31 days.
The day of the month always advances contiguously from N to either N+1 or 1, with no discontinuities.
There is only one calendar system in use at one time.
It will be easy to calculate the duration of x number of hours and minutes from a particular point in time.
The same month has the same number of days in it everywhere!
Unix time is completely ignorant about anything except seconds.
Unix time is the number of seconds since Jan 1st 1970.
The day before Saturday is always Friday.
Contiguous timezones are no more than an hour apart. (aka we don’t need to test what happens to the avionics when you fly over the International Date Line)
Two timezones that differ will differ by an integer number of half hours.
Okay, quarter hours.
Okay, seconds, but it will be a consistent difference if we ignore DST.
If you create two date objects right beside each other, they’ll represent the same time. (a fantastic Heisenbug generator)
You can wait for the clock to reach exactly HH:MM:SS by sampling once a second.
If a process runs for n seconds and then terminates, approximately n seconds will have elapsed on the system clock at the time of termination.
Weeks start on Monday.
Days begin in the morning.
Holidays span an integer number of whole days.
The weekend consists of Saturday and Sunday.
It’s possible to establish a total ordering on timestamps that is useful outside your system.
The local time offset (from UTC) will not change during office hours.
Thread.sleep(1000) sleeps for 1000 milliseconds.
Thread.sleep(1000) sleeps for >= 1000 milliseconds.
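One of the falsehoods above, “Unix time is the number of seconds since Jan 1st 1970,” is easy to check for yourself. A leap second was inserted at the end of 31 December 2016, so that UTC day was 86,401 seconds long. The sketch below (Python, variable names my own) asks how many seconds Unix time thinks elapsed across that day.

```python
from datetime import datetime, timezone

# Midnight UTC on either side of the day that ended with a leap second.
before = datetime(2016, 12, 31, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, tzinfo=timezone.utc)

unix_seconds = after.timestamp() - before.timestamp()
print(unix_seconds)  # 86400.0, one second short of what actually elapsed
```

Unix time counts every day as exactly 86,400 seconds, so the leap second simply does not appear in the difference.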
Thanks again to everyone who commented. I read everything that you
wrote, even if I didn’t wind up including it above.
I made the list above by going through each of the comment threads on
Hacker News, Reddit, MetaFilter and BoingBoing (in that order) and
finding all(?) of the places where folks had broken out “falsehood 35
to 35 + n” as a bulleted list. I then selectively copied those
lists — in the order that I found them. I made small edits for
readability and occasionally I paraphrased (this is noted below).
This post — like the one before it — owes a great debt to
Patrick McKenzie’s canonical blog post about user names,
which I have read over and over throughout the years and from
which I have shamelessly cribbed both concept and style. If you
haven’t yet read this gem, go and do so right now. I promise you’ll
enjoy it.
Over the past couple of years I have spent a lot of time debugging
other engineers’ test code. This was interesting work, occasionally
frustrating but always informative. One might not immediately think
that test code would have bugs, but of course all code has bugs and
tests are no exception.
I have repeatedly been confounded to discover just how
many mistakes in both test and application code stem from
misunderstandings or misconceptions about time. By this I mean both
the interesting way in which computers handle time, and the
fundamental gotchas inherent in how we humans have constructed our
calendar — daylight savings being just the tip of the iceberg.
In fact I have seen so many of these misconceptions crop up in other
people’s (and my own) programs that I thought it would be worthwhile
to collect a list of the more common problems here.
All of these assumptions are wrong
There are always 24 hours in a day.
Months have either 30 or 31 days.
Years have 365 days.
February is always 28 days long.
Any 24-hour period will always begin and end in the same day (or week, or month).
That thing about a minute being longer than an hour was a joke, right?
No.
There was a fascinating bug in older versions of KVM on CentOS.
Specifically, a KVM virtual machine had no awareness that it was not
running on physical hardware. This meant that if the host OS put the
VM into a suspended state, the virtualized system clock would retain
the time that it had had when it was suspended. E.g. if the VM was
suspended at 13:00 and then brought back to an active state two hours
later (at 15:00), the system clock on the VM would still reflect a
local time of 13:00. The result was that every time a KVM VM went
idle, the host OS would put it into a suspended state and the VM’s
system clock would start to drift away from reality, sometimes by a
large margin depending on how long the VM had remained idle.
There was a cron job that could be installed to keep the virtualized
system clock in line with the host OS’s hardware clock. But it was
easy to forget to do this on new VMs and failure to do so led to much
hilarity. The bug has been fixed in more recent versions.
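Going back to the list for a moment: a few of the calendar assumptions can be disproved with nothing more than the standard library. This is a minimal Python sketch, the helper names are my own, and it only knows about the Gregorian calendar.

```python
import calendar

def days_in_february(year):
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, 2)[1]

def days_in_year(year):
    return 366 if calendar.isleap(year) else 365

for year in (1900, 2000, 2012, 2013):
    print(year, days_in_february(year), days_in_year(year))
```

Note that 1900 is not a leap year while 2000 is, which is exactly the kind of case a “divisible by four” shortcut gets wrong.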
An acknowledgment
This post owes a great debt to
Patrick McKenzie’s canonical blog post about user names,
which I have read over and over throughout the years and from
which I have shamelessly cribbed both concept and style. If you
haven’t yet read this gem, go and do so right now. I promise you’ll
enjoy it.
UPDATED: Thanks for your comments and anecdotes!
I’d like to say thanks to everyone who contributed to the comment threads about this post on BoingBoing and Hacker News as well as Reddit and MetaFilter and to everyone on Twitter who shared their strange experiences with time.
You have provided so many interesting edge cases I had forgotten about as well as many oddities of which I wasn’t aware. For instance: in the Jewish calendar, days start at sunset not midnight. And as Bruce Sterling pointed out, I didn’t even think about what happens when the computer is on a spaceship orbiting a black hole.
There’s more than enough material for another (longer!) post about this topic. But first I’ll have to finish reading all >500 of your comments as well as the wealth of awesome research material that has been linked.
Thanks again for your enthusiasm and for the mind-boggling level of detail. I learned a lot about time in the last 24 hours. Fellow nerds, I salute you.
A checklist of things that are worth testing in pretty much any software system.
…trailing his fingers along the edge of an incomprehensible
computer bank, he reached out and pressed an invitingly large red
button on a nearby panel. The panel lit up with the words “Please do
not press this button again.” ~ Douglas Adams
Software systems are complex and as such exhibit non-deterministic
behavior. This is true of any non-trivial system. The behaviors of
even a small software product are so varied and unpredictable as to
defy complete testing.
However there are five general areas of interest that are always
worth examining because they reveal mistakes with such surprising
regularity. Specifically it’s worthwhile to find out how any system
handles inputs, math, text, time and system
resources.
If like me you are a software developer then it’s commonly accepted
that about 50% of your time should be spent in testing rather than
writing code. If this seems excessive think about how much time you
spent in debugging the last time code you wrote was involved in a
production issue. Then think about your level of stress.
In the book The Soul of A New Machine, Tracy Kidder makes a
comment to the effect that most career programmers are pack-a-day
smokers who eventually drop dead of a heart attack. Don’t be that
guy. Time spent testing happens during work hours, within the
parameters of an estimated project schedule that you (hopefully) got
to sign off on in advance. If you follow the “50% of development time
is testing” rule then it’s possible that overall in the course of
your career you may spend more time testing than you would have
debugging production issues if you hadn’t taken the time to test. But
even so, you will spend less time being stressed and less time working
on the weekend.
And seriously, tested code is better code. Better code means more
reliable products. Reliability in turn leads to better
customer experience because reliability engenders trust. Trust
in turn is the foundation of the relationship that a product team
forms with its customers. Tested code means better customer
experience which leads to products that compete more effectively in
the marketplace. And that means you keep getting paid, which means
you get to keep writing code.
Inputs
Minimum and maximum input values are always good to test. For
instance, if a password field allows 6 to 128 characters, what actually
happens when you submit a six-character password? What about a 128-character password?
Too-high and too-low values. What happens with a 5-character or
129-character password? Alternately, how does the system respond
to inputs equal to the minimum and maximum integer values
allowed by the implementation language or platform?
Invalid values such as null and NaN. Strings instead of
integers, arrays instead of strings.
Empty inputs such as a blank user name field or a transaction
record in which none of the fields contain any information. For
unit tests, submitting zero or an empty string instead of a valid parameter
can sometimes yield interesting results.
Too many inputs or not enough inputs. For a unit test this is
simply a matter of creating an incorrect function signature. For a
Web app it might involve submitting too many POST parameters or
selectively deleting parts of a URL’s query string.
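The password example above translates almost directly into a table of test cases. Here is a sketch in Python. The validator is hypothetical and exists only to give the tests something to exercise; the 6 and 128 limits are the ones from the example.

```python
MIN_LENGTH = 6
MAX_LENGTH = 128

def is_valid_password(password):
    """Hypothetical validator: accepts strings of 6 to 128 characters."""
    if not isinstance(password, str):
        return False
    return MIN_LENGTH <= len(password) <= MAX_LENGTH

cases = [
    ("a" * 6, True),      # minimum allowed length
    ("a" * 128, True),    # maximum allowed length
    ("a" * 5, False),     # one below the minimum
    ("a" * 129, False),   # one above the maximum
    ("", False),          # empty input
    (None, False),        # null
    (123456, False),      # integer instead of string
    (["a"] * 6, False),   # array instead of string
]

for value, expected in cases:
    assert is_valid_password(value) is expected, repr(value)
```

Keeping the cases in one list makes it cheap to add the next weird input you run into.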
Math
Decimal math is hard. Verify that integers are treated correctly
in a floating-point context, and vice versa.
Repeating decimals. Does the system treat 0.666 differently from
0.665?
Rounding. If you put 3 * 1.005 into the system, do you
get 3.015 back out? (This is not the default behavior
in JavaScript, for instance.)
Type coercion. Is an input of 23 treated differently than an input of
"23"? That is: is a numeric input treated differently than
a string containing a numeric value?
Units of currency. There’s going to be a problem if an input of
£23.00 is stored in the database as $23.00.
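The rounding and type coercion items are easy to demonstrate. The sketch below is in Python, which uses the same binary floating-point representation as JavaScript, so the rounding surprise is the same. Using Decimal is one possible fix that I am assuming here, not the only one.

```python
from decimal import Decimal

# Binary floating point cannot represent 1.005 exactly.
float_total = 3 * 1.005
print(float_total == 3.015)                 # False

# Decimal arithmetic on string inputs keeps the digits as written.
decimal_total = 3 * Decimal("1.005")
print(decimal_total == Decimal("3.015"))    # True

# A number and a string containing that number are different inputs.
print(23 == "23")                           # False
print(int("23") == 23)                      # True
```

If the system under test handles money, it is worth finding out early which of these two kinds of arithmetic it is doing.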
Text
User names are perhaps the single most interesting class of text
that can be submitted as input to a computer program. At a
minimum, the system shouldn’t break when names contain apostrophes,
hyphens or spaces.
Passwords are also interesting. Does the maximum password length
allow for enough entropy? Are plain-English
passphrases disallowed because they don’t contain
numbers? Are passwords stored as salted hashes?
Are Unicode inputs treated differently than ASCII?
On the Web, are HTML-encoded entities properly converted to
characters and vice versa? What about URL-encoded characters?
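A cheap way to check the encoding questions above is a round-trip test: encode a value, decode it again, and make sure nothing changed along the way. This sketch uses Python’s standard library. The list of names is illustrative, and a real test would push them through your own application rather than calling these functions directly.

```python
import html
from urllib.parse import quote, unquote

# Illustrative names with an apostrophe, a hyphen, spaces, markup
# characters and a non-ASCII letter.
names = [
    "Michelle D'Netto",
    "Kellan Elliot-McCrea",
    "Ramin Bozorgzadeh",
    "Smith & Sons <admin>",
    "Zoë",
]

for name in names:
    # HTML entities: escape then unescape should give back the original.
    assert html.unescape(html.escape(name)) == name
    # URL encoding: quote then unquote should give back the original.
    assert unquote(quote(name)) == name

print(html.escape("Michelle D'Netto"))
```

The interesting failures tend to show up when a value is encoded twice or decoded twice, so it is also worth asserting on the intermediate form.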
Time
Test on the first and last day of daylight savings time. The
system does allow you to mock out the first and last day of
daylight savings time, right?
As with unit tests, boundary conditions can reveal
interestingness. How does the system behave between 23:59 and
00:01? What about during the hour between 00:00 and 00:59?
Be very aware of dates and times that are “special” to your
system. For instance, if you have a fake user for testing
purposes, how does the system respond when it’s that user’s birthday?
System resources
What if there’s half as much available memory as the system’s designers expect?
In a distributed system, what happens if half the nodes become unavailable?
In a service-oriented architecture, what happens if one of the
services becomes unavailable? What if it’s only partially available?
What happens if the network is slow?
What happens when the database is down?
What happens when the database is empty?
What happens if the cache is disabled? What about the CDN?
What if load on the system spikes to ten times normal?
What if load on the system drops to zero?
For long-running operations, what happens if you power cycle the machine before the operation is complete?
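Several of the questions above (database down, database empty) can be asked in a unit test by swapping in a fake dependency that fails on demand. This is a sketch only: FlakyDatabase, DatabaseDown and load_profile are hypothetical names standing in for whatever client and code path your system really has.

```python
class DatabaseDown(Exception):
    """Raised by the fake when it is pretending to be unavailable."""

class FlakyDatabase:
    """Hypothetical stand-in for a real database client."""

    def __init__(self, rows, available=True):
        self.rows = rows
        self.available = available

    def fetch(self, key):
        if not self.available:
            raise DatabaseDown("database is unavailable")
        return self.rows.get(key)

def load_profile(db, user_id):
    """Hypothetical code under test: it should degrade, not crash."""
    try:
        row = db.fetch(user_id)
    except DatabaseDown:
        return {"status": "unavailable"}
    if row is None:
        return {"status": "not_found"}
    return {"status": "ok", "profile": row}

# What happens when the database is down?
print(load_profile(FlakyDatabase({}, available=False), "homerj"))
# What happens when the database is empty?
print(load_profile(FlakyDatabase({}), "homerj"))
```

The same trick works for a slow network or a missing cache: give the fake a knob for the failure and turn it in the test.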
Two digressions: names and time
When it comes to Web apps, there are two areas that seem to cause more
pain than any other: people’s names and the time. These elements are
both common, essential to the correct functioning of a system, and
shockingly difficult to get right.
People’s names
There are only two hard problems in Computer Science: cache
invalidation, naming things and off-by-one-errors. ~ Phil Karlton
My favorite real-world case of a system finding a user’s name
“unacceptable” involved a person whose first name was 9. Not “Nine,”
mind you but the numeral “9”.
I have a friend named Sonnet (no middle name, no last name) who is
unable to complete the registration flow for most Web sites. I myself
have occasionally been rejected by a registration form because I have no middle
name.
When I used to build internal tools for Etsy I worked
with a plethora of excellently-named hackers such as
Michelle D’Netto, Kellan Elliot-McCrea and of
course Ramin Bozorgzadeh. Ramin quickly became my test user
of choice because his surname was almost always too long for the
single line allotted to display it, thus breaking the UI. And in at
least one case an intranet tool (which had been around for several
years at that point) was brought down hard by the introduction of a
user name that contained an apostrophe. If you’re not as fortunate in
the naming scheme of your alpha testers then take care to construct
your fixtures appropriately.
Time
King and villein, lad and lass,
All answer to the hourglass. ~ trad.
Tests that use the system time implicitly test the system clock of
whatever machine happens to be running the tests. Speaking from long
experience, I can attest that this approach can only lead to
unreliable tests and extreme debugging pain. If there is a test that
must rely on the system clock then it is better to go without
implementing the test than it is to expose yourself to the lost time
and frustration that running such a test would surely inflict on you and
your team.
So, the system you are testing does allow you to mock out all of
the necessary times of day and times of year. Right? I hope so
because if you’re using the system time in tests, you are doing it
completely wrong.
And in my humble opinion, if you’re using the system time in tests
because the system you are testing won’t allow you to mock the time,
you aren’t the only one doing it wrong — the system itself is
fundamentally broken.
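One common way to make the time mockable is to pass a clock into the code instead of letting the code ask the operating system. Here is a minimal sketch in Python. SystemClock, FakeClock and is_birthday are hypothetical names of my own, and this is one possible design rather than the only one.

```python
from datetime import date, datetime, timedelta

class SystemClock:
    """Used in production: asks the real system clock."""

    def now(self):
        return datetime.now()

class FakeClock:
    """Used in tests: time only moves when the test says so."""

    def __init__(self, start):
        self._now = start

    def now(self):
        return self._now

    def advance(self, **kwargs):
        # Accepts the same keyword arguments as timedelta, e.g. minutes=2.
        self._now += timedelta(**kwargs)

def is_birthday(birthday, clock):
    """Hypothetical code under test: never reads the system time itself."""
    today = clock.now().date()
    return (today.month, today.day) == (birthday.month, birthday.day)

# Walk a fake test user across midnight into their birthday.
clock = FakeClock(datetime(2013, 1, 27, 23, 59))
birthday = date(1956, 1, 28)
print(is_birthday(birthday, clock))
clock.advance(minutes=2)
print(is_birthday(birthday, clock))
```

Because the test controls the clock, it gives the same answer at any hour on any machine, which is the whole point.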
Good hunting
Pretty much every bullet point on each checklist above was drawn from
my own direct experience with a mistake that was found either in
development or in production. The cost of such knowledge was at the
very least some frustration for myself and in other cases a lot of stress and
lost time for many people on my team. But as my career has progressed
and I’ve moved to larger and larger projects, it’s been really useful
to have this information in my head. I like to think I design better
software because I’ve been burned in the past.
I hope this checklist helps you to find mistakes in the design and
implementation of your own systems as well. I hope you at least will
find most of them before they’re caught by your customers in
production. Because for us as software engineers, a clean, well-functioning
system is the basic foundation of the trust that our users put in us
and in the products we deliver.
There’s an old joke that goes something like this:
Proposition one: all programs have bugs.
Proposition two: all programs can be shortened by one line.
Conclusion: every program can be reduced to one line of buggy code.
Corny, I know ;-)
But hey, there is a point in the life of every piece
of software when the entire system consists of one line of code.
That time is at the very beginning of a project, when one has just typed
the first bit of code into one’s text editor.
You might be wondering: why are we
talking about projects that contain only one line of code? How could
automation possibly help there? And wouldn’t it be overkill to set up
tooling to support a trivially small, new project?
I’ll explain how two automated tools can help you maintain a
project, even at the point where you’ve just typed your first line of
code. These tools are code review and static analysis.
Start By Establishing A Culture Of Code Review
Recently I sat and talked with Erik Kastner
about his thoughts on code review and testing. Erik’s a thoughtful,
experienced guy and after working with him for a couple of years I
have found that his opinions have become very important to me. Erik
says great stuff like “code review is just reading someone else’s code
and understanding it before it ships.”
That’s one of the things I like about Kastner — he does more than just propound his
methodology. When Erik talks about how he thinks software engineering
should or shouldn’t work, he always qualifies his statements. And
there can be some pretty surprising insights wrapped up in those
qualifications.
So let’s look at this statement again:
Code review means reading and understanding someone else’s code.
This implies that if you and I are working on a project together, you’re
going to read my diffs before I commit or merge them into trunk.
We might be doing the review in FishEye or
within a GitHub pull request,
or you might just be looking at my commits in our SCM.
But I expect you to do more than just read my changesets. I also
expect you to fully comprehend how the diffs I’m showing you are
going to change the behavior of the system. Like many of Kastner’s
qualifications to software methodologies, this is a subtle but large
distinction.
Ideally every changeset I write gets reviewed by someone else before
it goes to production. This is the practice at a lot of large,
successful organizations like Google and the JPL. Having a human
review every changeset does impose an upper limit on how fast you can
deploy code to production. For a new, relatively small project, you
might feel that reviewing every changeset is too heavyweight. And you
might be right. But keep in mind that it’s a lot easier to put this
kind of process in place at the beginning than it is to wait until
your application is mature — and you’re definitely going to want a
code review process in place at that point.
Static Analysis
Now consider the case where I have written the following one line of
code and I ask you to review it. This is a trivial case of course,
but I hope it’s still illustrative of why you should spend the time to
set up these tools before you write a single line of code. Anyway,
here’s my changeset, would you review it before I push it to prod?
<?php
echo "hello world''
Did you catch both of the errors in my code? Probably you did. And
I’m sure you noticed the missing semicolon immediately. But did it
take you just a moment longer to realize there was something wrong
with that closing double quote? If it did, then you were experiencing
a trivial increase in cognitive load.
As our application gets larger and my changesets grow in complexity,
you’re going to have to endure a greater and greater amount of
cognitive load every time you review and debug one of my changesets. That’s not
great. You’re a good hacker and our project is going to win because
you’re using your whole brain to think about solving hard problems.
It’s too bad that instead our new code review process is causing you
to fill your brain with thoughts about whether or not I got my
punctuation right.
Besides, checking other people’s syntax is boring drudge work and drudgery is evil.
So it’s actually really important that we take a little bit of time at
the beginning of our project to make sure that code review imposes
as little unnecessary cognitive cost as possible.
Both of the errors I made above actually cause the PHP interpreter to
barf. So it stands to reason that there must be a way to catch those errors
programmatically. And of course there are several
open source tools to help us do exactly that. But the simplest option is to
just use the PHP interpreter’s built-in syntax checker:
php -l index.php
Parse error: syntax error, unexpected $end, expecting T_VARIABLE or T_DOLLAR_OPEN_CURLY_BRACES or T_CURLY_OPEN in index.php on line 3
Errors parsing index.php
Great. Just by running php -l on my code before you review it, you can
now avoid winding up as a human syntax-checker. This saves us both
time and frustration as we continue to work on our project. Even
better, I could run the syntax check on my own code before I send it
over to you for review.
It’s worthwhile for us to informally agree that we won’t bother
reviewing any code that doesn’t pass a syntax check.
Is It Worth Automating Static Analysis At This Point?
So we’ve made an agreement to always run static analysis on our code
before asking someone else to review it. This implies that any code
we deploy to production will have been run through static analysis at
least once. Even though our project and our team are small, we’ve
managed to put in place some important cornerstones on which we can
build a healthy engineering culture.
We could codify
our new agreement by writing it down in our wiki (if we have one).
Another way to codify our contract would be to set up a CI server
and configure it to fail the build if anyone commits a file that
doesn’t pass the syntax check. Yet another way to do this would be
for each of us to run watchr on
our laptops, and configure it to throw up a Growl alert whenever the
syntax check fails. We can pick one of these automated solutions,
spend a couple of hours setting it up and get its benefit throughout
the life of our project. So that seems like a worthwhile thing to do,
even though so far we only have one line of code.
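As a rough sketch of the CI option, a small script could run the syntax checker over every file in a commit and report failure if any file doesn't pass. This assumes that `php -l` exits with a non-zero status when it finds a parse error, which is worth confirming on your PHP version. The function names here are hypothetical, and the script is in Python only to keep it short.

```python
import subprocess
import sys
from typing import Iterable, List, Sequence


def files_failing_check(
    paths: Iterable[str],
    check_cmd: Sequence[str] = ("php", "-l"),
) -> List[str]:
    """Return the paths for which the checker exits with a non-zero status.

    Assumes the checker signals a syntax error through its exit status.
    Raises FileNotFoundError if the checker itself is not installed.
    """
    failures = []
    for path in paths:
        result = subprocess.run(
            [*check_cmd, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(path)
    return failures


def syntax_gate(paths: Iterable[str], check_cmd: Sequence[str] = ("php", "-l")) -> int:
    """Return the exit status a build step should use: 0 for pass, 1 for fail."""
    failures = files_failing_check(paths, check_cmd)
    for path in failures:
        print(f"syntax check failed: {path}", file=sys.stderr)
    return 1 if failures else 0
```

A build step would pass the changed files to `syntax_gate` and exit with its return value, so a single file that fails the check fails the whole build. The same function could just as easily sit behind a file watcher on a laptop.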
There is a very convincing argument to be made that feature-complete
software is ultimately more valuable than a readable codebase.
That there is more value in what an application does than in how it
is put together. Perhaps it is fair to consider architecture,
including the comprehensibility of a program as source code, to be of
some benefit but still orthogonal to a program’s business value?
After all, it’s incontrovertibly true that the most important aspect
of developing software is
shipping it to the customer.
So then isn’t it the case that the
real value is all in what the software does?
Why is so much value placed on delivering readable code?
Greg Horvath recently showed me a
paper on JPL coding standards (PDF)
that encouraged eschewing some
pretty basic strategies (recursion, dynamic memory allocation) on the
grounds that they lead to code that is somewhat more difficult to run
through static analysis. The takeaway for me was that NASA cares
a lot about being able to tell what the code is intended to do,
without actually running it.
So while shipping feature-complete software is obviously important, it
seems that shipping readable software is really important,
too. Why? I think it’s because comprehensibility of components
contributes to the resiliency of a system overall.
A prototype that
works is good. A prototype that can evolve rapidly is even better.
Complexity is an inherent property of software. Comprehensibility is not.
It’s interesting to note that Dijkstra espoused
a readable-code-over-feature-completeness approach to software
architecture. He contrasted the two mindsets as “postulational” and
“operational,” respectively. Postulational meaning one can
postulate about what a program does just by reading the source.
Operational meaning that one bases one’s expectations about what a
program will do on (educated) assumptions about what operations will
be carried out when that program is executed.
Dijkstra also once pronounced
that software is the most complex product ever produced by human
effort. When delivering any non-trivial software application, some
degree of complexity is intrinsic to the task. And obviously,
concepts that are complex do not lend themselves to implementations
that are easily readable. So maintaining comprehensibility in
the components of a complex system turns out to be a rather difficult
problem.
Incidental complexity is different: it comes from how a solution happens to
be built, not from the problem itself. Once identified, it can eventually be
factored out, leading to
code that is more readable overall. Therefore it is valuable to
take the (sometimes considerable amount of) time to distinguish between intrinsic and incidental complexity,
and to continually either avoid or remove the incidental complexities
that over time can make a codebase harder and harder to read.
The most interesting part of delivering software is watching what users do with it.
In order to provide a satisfying user experience
over the long term, any software needs to be able to adapt
iteratively and rapidly to the unpredictable needs and desires of its
user base. Resilient systems are best able to adapt, because successful
adaptation requires constant readjustment in the face of new
circumstances. So there is considerable value in preserving the
readability of source code throughout the life of a system.