I’m surprised there is no commonly-available solution for viewing Git logs as JSON documents.
It is very useful to convert logs to JSON, because JSON is immediately consumable by almost all general-purpose data visualization tools: everything from jQuery UI to Matplotlib “speaks” JSON. So with Git logs converted to JSON, it becomes possible to perform all sorts of ad hoc historical analysis of source code repositories.
However, there is to my knowledge no simple, stand-alone tool to do the conversion! In practice, converting Git logs to JSON requires either relying on some large, third-party library that has already implemented git-log-to-JSON functionality, or writing regular expressions that turn out to be a bit of a pain in the ass.
Since there’s no commonly-available simple tool, I wonder how many people wind up putting off their Git log analysis because of the time overhead involved in JSON output conversion. If this were the case, it would really be too bad, because code churn metrics are the best predictors of bugs, as discussed in this video from GTAC, which details a recent study of the Eclipse editor code base:
Making it easy to get the Git log as a JSON document!
Here’s a gist I wrote, which hopefully takes the mystery out of converting Git logs to JSON. It has since been referenced from a popular question on StackOverflow. I’m glad someone else found this useful!
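The gist itself isn’t reproduced here, so what follows is only a minimal sketch of one way to do the conversion in Python, and not necessarily the approach the gist takes. It assumes git is on the PATH and asks git log for a custom format in which fields are separated by the ASCII unit separator (%x1f) and commits by the record separator (%x1e), which sidesteps regular expressions entirely. The function names and the choice of fields are illustrative assumptions.

```python
import json
import subprocess

# Separators as they appear in git's output.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"

# Illustrative choice of fields; the names on the left are made up for this sketch.
FIELDS = ["commit", "author_name", "author_email", "date", "subject"]

# %H hash, %an author name, %ae author email, %aI ISO 8601 author date, %s subject.
GIT_FORMAT = "%x1f".join(["%H", "%an", "%ae", "%aI", "%s"]) + "%x1e"


def parse_git_log(raw):
    """Turn separator-delimited git log output into a list of dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")  # git puts a newline between commits
        if not record:
            continue
        commits.append(dict(zip(FIELDS, record.split(FIELD_SEP))))
    return commits


def git_log_as_json(repo_path=".", max_count=None):
    """Run git log in repo_path and return the history as a JSON string."""
    cmd = ["git", "-C", repo_path, "log", f"--pretty=format:{GIT_FORMAT}"]
    if max_count is not None:
        cmd.append(f"--max-count={max_count}")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.dumps(parse_git_log(result.stdout), indent=2)
```

Calling print(git_log_as_json(".", max_count=5)) from inside a repository should print the five most recent commits as a JSON array. Because json.dumps does the escaping, quotes and backslashes in commit subjects need no special handling.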
[Matt Heusser](http://twitter.com/mheusser) and I have been discussing the role of config flags in continuous delivery for some time, so it was great that he asked me to get some of my thoughts down on paper. Always a pleasure working with Matt!
What does it really mean to "teach yourself technical QA skills?"
The other day I published the table of contents
for a forthcoming book
on technical QA. Since then I have several times heard the criticism
that many QA analysts could not get started with this process as
I agree that setting up a simple Web site with PHP is beyond the
technical skills of many people who would otherwise like to learn how
to perform technical QA.
But I’d say to everyone who objects: are you seriously telling
me that you think someone can “learn to be a technical tester”
without also getting to know how the Web stack works?
Learning to automate browsers well is hard enough without having to learn to program at the same time. Learn to program first.
The mailing lists for open source test tools are full of frustrated
people who are legitimately experts in black-box testing, deduction,
bug finding and problem-solving. But they are trying to extend and
apply their limited or nonexistent software engineering expertise to,
for example, Selenium.
I love Selenium. I’ve been a user since 2006, when the tool was
use —and believe most people in Web dev should or could
But Selenium is one of the most complex web
development tools out there. Selenium is a tool composed of a
program-inside-a-server-inside-a-Web-browser, and that’s before you’ve
written a line of your own test code, let alone picked a test harness
and gotten your changes to run in CI.
It’s true that what I’m describing is a learning process that at first glance may appear totally
beyond the capability of most QA analysts. But it’s also the case
that… this is the only road I know —that anyone
knows— to actually being a contributing member of a software
test engineering team.
The challenge then is to work out how a Web site’s “non-technical staff” (e.g., QA analysts) can bridge the software engineering knowledge gap.
On self-teaching technical QA: I do hear your concerns!
The other day I published the table of contents
for a forthcoming guide to self-teaching the “hard skills” associated with technical QA. Since then I have several times heard the criticism
that many QA analysts could not get started with this process as
I agree that setting up a simple Web site with PHP is beyond the
technical skills of many people who would otherwise like to learn how
to perform technical QA.
Testing software is a difficult problem
Let’s face it: automated GUI-driven browser testing is one of
the most complex and badly-understood domains available to the
Software engineers don’t understand or implement automated testing
very well yet. The discipline is young. The first known examples of
what we now call “developer testing” only go back to the late
1960s. In much of the literature of computer science, testing is not
mentioned at all.
In order to successfully operate as a QA engineer, one must first
become a software engineer. And not just any software engineer, but
one who focuses on testing which I’ve just argued is a hard area to
Dear novice automators: stop trying to learn Java!
I don’t think that anything I’ve said above devalues or detracts from
the real concern that managers and senior engineers feel about asking
people to take on work that will be very difficult to complete.
But that’s why the title of my table of contents contains that magical
phrase teach yourself. I’m not writing a technical manual to be used
in the enterprise. In order to follow the program I’m outlining,
someone would need to dedicate a lot of time, on an ongoing basis,
for a year or more. It would require giving up weekend and evening
time to study. It might require volunteering for risky projects at
work in order to get to work with new technology in order to learn
about it. Anyone who tried it would have to be slightly obsessed, to be
But hey, that approach worked for me, for the most part, so far ;-)
So bear with me. I do hear your concerns. But also, this is hard
stuff. It’s not for everybody.
But I’m writing about the topic because I think we as engineers
exclude too many talented people who would be excellent QA engineers
—excellent engineers— given the proper opportunity to
self-teach. I know there’s a middle ground here. I just need (a lot
of) your help to find it.
Almost no one would accept a text editor or IDE that lacked colorful syntax highlighting for source code. Color-coded fonts make program text dramatically more readable and thus greatly reduce overall cognitive load for engineers.
I spend a lot of time in the shell and I view a lot of files via less. Over time I became increasingly frustrated that less — my tool-of-choice for reading code — lacked the ability to colorize text. I’d consider that a dealbreaker in my editor — why accept the constraint in the former context but not in the latter?
That last part is pretty cool: different-and-better than most IDEs, less will apply proper syntax-highlighting to files lacking an extension as long as the file starts with a shebang line.
Configuring less to colorize source code
I’ve put together a gist (embedded below) which contains everything you need to know in order to get syntax highlighting working with less.
Pro tip: line numbers in less
You can use less -N to enable line numbers when viewing source code with less. Combined with typing ‘/’ to search by regular expression, less is a code reader whose capabilities are on par with the best code editors.
How to teach yourself to be a technical tester: some thoughts.
This is the first draft of the table of contents of a book that I have been writing.
It’s worth noting that this entire program can be worked through
without spending a penny on proprietary software (with the optional
exception of Charles). It also does not require a powerful
computer: you could do all this on a five-year-old MacBook.
A self-teaching model for learning to use Selenium, assuming no prior professional programming experience
Use PHP to put up your own Web site. Spend a couple of weeks making
it a reasonably decent looking site. It doesn’t matter what the
site is or whether it has any transactional functionality. Just
some pages that display photos and content is fine.
Chrome Web inspector while building your
site. You don’t have to do anything fancy just try to use inspector
Use lynx --dump to retrieve the contents of your Web site. Just
hardcode all the page URLs. Redirect all the content to flat files,
then use grep to look for patterns in your content. Start by
looking for mistakes you commonly make. Save your greps in a file.
Do your greps do what you want now? You might want to stop and read
the Mastering Regular Expressions book. You will never once in your
long future career ever regret taking the time to read this book.
Use wget to retrieve source files from your site. Save the files in
a directory. Then write shell scripts to apply jshint, html tidy
and csslint respectively. What did you learn?
Write a grep wrapper for the static analysis performed in the
previous step. Filter out boring lines and/or highlight interesting
lines in red. Save it all and check it in to source
control. Seriously, if you are still not in source control at this
point, STOP NOW and figure it out. Otherwise, congratulations, you
are just about to lose several hours of valuable work. Prepare to
go back to step 3 and start over! Sorry to be harsh, but you WILL
lose work. CHECK YOUR SITE INTO SOURCE CONTROL TOO.
Use curl to retrieve source instead of wget. What is different?
Which one do you like better? Check it into source control.
Now go back to your wget script and use your text editor to
batch-replace the end of every wget line with &. Now all your wgets
run at once. What happens? Which strategy do you like better?
Create a branch called “concurrent,” check in your new code and
push it to GitHub.
Now use GNU Parallel to do the same thing. Spend some time
squelching wget log noise if necessary.
Now start your script, open a separate window and open htop. What
does it all mean? What did you learn once you understood it? Now
repeat with iotop and iftop. Check your bash history, see if
there’s any lines in there worth remembering (spoiler alert: yes
there are). Drop those lines in a new lib/snippets.sh and check it
in to source control.
Is your wget script overpowering your CPU? You can go back and
tweak GNU Parallel to run fewer concurrent jobs. Check your config
change into source control.
Implement the same script with curl. What is different? What is
Extract the GNU Parallel runner into its own file, so that there
is no duplicated code between the curl and wget versions. Learn how
to use source <file>
Now merge your “concurrent” branch back into master. You can
always go back to just before the merge commit if you want to play
with the old single-threaded version. Tag the revision if you want,
and/or just put in a comment you can remember to search for later.
Now use wget, lwp-request and curl to examine HTTP
headers. These won’t make sense unless you go and read Wikipedia
and StackOverflow and ServerFault and get a good understanding for
yourself of what happens during the life of a Web request. You will
never once regret having spent the time to learn this information.
Now use wget/curl to examine the headers for your Web site. Does
it return the headers you would expect? What about for pages that
don’t exist or have moved? Fix your site if the headers aren’t
Save your favorite header checks in SCM.
Wrap your header checks in greps that filter out boring lines.
Write a for loop that wraps the wrapper you wrote in the previous step.
Go to your Web server and start a GNU screen session. You will
have to spend some time reading up on what screen is and how to
control it. Once you have this information you can pretty much
always set up a CI server of your own on a Linux box, regardless of
the other parameters of the environment.
Start your for-loop-wrapper inside the screen
session. Congratulations! You are now running a production
You could use cron instead of a for loop and then you would get
email notifications for free (assuming your ops person has
configured your host with a mail account). Go and learn how cron
works, it’s simple and you will use this knowledge about five times
an hour once you start dealing with software systems at Web
scale. Cron runs basically everything on the Web.
Now use Chrome Web Inspector Net tab to look at HTTP traffic on
your site. It is assumed you learned the basics of Chrome inspector
during the initial 2 weeks you spent building your site.
Use Charles to view the same traffic (optional if you don’t want
to pay for Charles).
Use tcpdump to view network traffic. You just have to learn to
filter tcpdump down to your web server traffic, then you can
stop. tcpdump is huge; you should not try to learn all of it
now. It’s OK if the commands that work for you seem somewhat obscure
and magic. You can come back and fill in those gaps in your
knowledge later. For now you need to be thinking about TCP packets.
Go learn how TCP conversations work during the life of a Web
request. You will use this information every day for the rest of
your career as a Web developer.
Now go back and look at tcpdump again. Use Wireshark to visualize and filter.
Save your tcpdump scripts in SCM.
Go and read your web server logs and PHP logs. What is
interesting? What needs to be fixed? Write and save greps for the
log lines that are interesting.
Use cut and histo to graph log messages in real time. Save it
in SCM, leave it running inside screen. Congrats, you built a log
Take a day to fix your worst log boo boos. Watch the graphs of
“bad” messages go down. Take screenshots, celebrate! Now you have
built a “refactoring dashboard” like the one
Ross Snyder talked about at Surge 2011.
Now you are probably ready to get your head around the Selenium
stack, one of the most complex application stacks in the Web
First install wd (pronounced “would”) and open a wd shell. Follow the steps in the
tutorial and get them to actually work. This will take a couple of
days, probably, and that’s OK. Go slowly and bookmark all the
helpful Selenium articles you find. You will come back to these
again and again.
Now go back and implement all of the wget, curl, lynx projects
above with Selenium. This will take a month or so, probably. You
don’t have to use wd, use whatever driver/language makes sense to
As you go, install the selenium headless environment on your
server and get your new scripts running headlessly inside your
for-loop (or your cron if you went that way). You can leave your
old scripts running too, in fact this is recommended so you can
cross check output between different versions of the same script.
Congratulations, you now have an incredibly powerful Web testing
and monitoring tool at your disposal.
Take a week to pay down your technical debt. This is the only time
you will stop all production for a week like this, but it’s worth
it. After this initial debt is paid, you can just fix technical
debt as part of your normal work day, occasionally making a project
out of a big task.
Now select a login form whose action you would like to
automate. Get it working. This will take a while. Use your local
Selenium meetup, #selenium and StackOverflow. Bookmark everything
and blog what you learn. This is a huge opportunity for you to
demonstrate that you have achieved beginner status in one of
the most complex and least-understood technologies around. Don’t
skimp on the time required to make your first good impression as a
Selenium hacker! And welcome to the community!
Add your login form automation script to your CI cron. Watch it
for a couple of days. What did you learn?
Pick another login form or similar single-page form. Automate
it. Put it in your CI cron.
Read the Selenium API documentation carefully. Think about what
kinds of capabilities exist that you could use to automate a
slightly more complex flow (think: 3 or 4 page form such as credit
card and shipping flow).
Automate that slightly more complex flow. You will have to read
the Selenium/WebDriver source to fully understand how the APIs
work. That is normal — you cannot master Selenium without reading
the source, because the project evolves so fast that the
documentation lags behind as a matter of course. Use your list of
Selenium resources to back you up as you read the code. Ask
whatever questions you need to, and try to do it on
StackOverflow. If you are wondering about it and asking, then a lot
of other people are probably out there getting stuck on it but not
Get it working.
Put it in your CI cron.
That’s it. If you have done all of the above you are a qualified
journeyman Selenium developer. Welcome to the world of tomorrow! I
mean, the world of automated QA!
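To make the log-graphing step above a little more concrete, here is a rough Python sketch of the same idea as a cut-and-histogram pipeline: count log lines by severity and draw a text bar chart. It assumes PHP-style error log lines containing markers such as “PHP Warning:”. The function names and the sample format are assumptions, so adjust the pattern to whatever your own logs actually look like.

```python
import re
from collections import Counter

# Assumed log format: lines that contain e.g. "PHP Warning:" or "PHP Notice:".
LEVEL_PATTERN = re.compile(r"PHP (Fatal error|Parse error|Warning|Notice|Deprecated):")


def count_levels(lines):
    """Count how many log lines mention each severity level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def render_histogram(counts, width=40):
    """Draw one row of '#' characters per level, most frequent level first."""
    if not counts:
        return ""
    biggest = max(counts.values())
    label_width = max(len(label) for label in counts)
    rows = []
    for label, count in counts.most_common():
        bar = "#" * max(1, round(width * count / biggest))
        rows.append(f"{label.ljust(label_width)} {bar} {count}")
    return "\n".join(rows)
```

Feeding it the lines of a log file, for example render_histogram(count_levels(open(path))) with path pointing at your PHP error log, gives a quick picture of which kind of message dominates. Run it again after a day of fixes and the bars should get shorter.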
Colorful diffs for files where all the code is on one very long line
It happens. Sometimes the code is all on One Very Long Line™
Sure, it’s easy to insert line breaks with perl or sed. But how sure are you that your regex inserts the right line breaks?
No, even though it’s a bit gross, at times it would be really convenient to be able to see what differences there are between two somewhat similar very long lines of text. Fortunately that is easy with wdiff.
Use wdiff (short for word-diff) to diff very long lines of text
Install and use colordiff in conjunction with wdiff
colordiff is a command line tool that colorizes diff output. colordiff understands and will colorize the “word diff” output from wdiff. It also understands all the output formats from regular old diff as well — colordiff is awesome :)
You should be able to install both wdiff and colordiff using your favorite package manager. You may also wish to install colordiff from source if it is not immediately available to your package manager.
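If wdiff isn’t available, the idea behind a word diff is easy to approximate. The sketch below is not wdiff itself: it uses Python’s standard difflib module to compare two long lines word by word, and marks removals and additions in wdiff’s default [-...-] and {+...+} notation. The word_diff function name is a made-up helper for illustration.

```python
import difflib


def word_diff(old, new):
    """Compare two lines word by word and mark the changes wdiff-style."""
    old_words = old.split()
    new_words = new.split()
    # autojunk=False keeps difflib from ignoring very frequent words on long lines.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(old_words[i1:i2]))
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)
```

Note that this splits on whitespace, so runs of spaces are collapsed in the output. For minified code with no spaces at all, you would need to tokenize on something else, such as semicolons.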
In this post, text inside code blocks can be pasted into the terminal. It will all run as written, at least as of OSX Mountain Lion.
If a step fails, run it again. Usually failures of package management tools are due to missing dependencies, which (hopefully) won’t be missing on the next run. Repeat until it returns a good exit status, or until you get bored.
This post described the monthly maintenance steps I perform to keep my primary development MacBook up to date! I wrote it up mainly for my own use, but if you read this far, maybe it was helpful to you too. I hope so!
Git does very nice syntax highlighting for diffs when browsing the revision history. Colorful fonts make diffs more readable and thus reduce overall cognitive load for engineers.
It would be great if such syntax highlighting were available throughout the filesystem, not just within Git repos. And on OSX and Linux it’s easy to enable pretty-colored syntax highlighting for all your diffs and patches!
Pro tip: cat a patch through colordiff and it will work
colordiff parses and colors diff output — it doesn’t matter whether the output is on STDOUT or in a file. In the latter case, simply cat or echo the contents of your patch file, pipe it to colordiff and enjoy the resulting easier-to-read diffs!
Continuous Improvement: How Rapid Release Cycles Are Changing QA And Testing
Here are two videos and two slide decks from "Continuous Improvement," my talk on what I learned setting up QA process and test infrastructure for large software organizations like Etsy and Barnes & Noble.
For about a year now I have been giving this talk for mixed audiences of testing, dev and ops professionals. The topics under discussion seem to resonate across all three of those groups, which is good to see. But in the future I hope to also address the roles of Product Design and Customer Support (it’s hard to fit everything into 40 slides and 25 minutes!).
In the past few months, it seems like a lot of people are suddenly talking about “devops QA.” Maybe it’s just me :) but perhaps the devops cultural movement has attained enough maturity that its adherents are now facing much more specific (and subtle) challenges when it comes to moving fast and breaking things.
Whatever the cause, it’s gratifying to hear more people talking about these issues. I find the whole topic fascinating and look forward to continuing to travel around and talk about systems thinking in the context of continuous integration and deployment.
When I worked at Etsy, the problem of many developers committing to trunk at once was colloquially known as the “commit mutex” problem.
A CI system can effectively become “blocked” if too many people commit and trigger too many builds within a relatively short time period. All nodes in the CI cluster are busy but new builds keep getting queued. The resulting "build queue" means that it takes progressively longer to get feedback from CI. The longer it takes to get feedback, the less useful the CI system.
Throwing hardware at the problem is a reasonable solution, as long as you can afford it. In fact almost every time we had to add hardware to the CI cluster, it was because of the commit mutex. There are a very few cases where we added a new tool that made the build slower <cough>code coverage </cough> but the far more common case for adding hardware was: more people want to commit more often, and they all want to run all the tests.
Everyone committing small chunks of code all the time is a Very Good Thing. "Everyone commits to trunk every day" is one of the tenets of continuous integration. Yet there’s still an upper limit on how fast and how concurrently everyone can (or should) deploy.
Avoiding the commit mutex and preventing build queues are, in my humble opinion, two of the hard problems in continuous deployment.
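A back-of-the-envelope model shows why the queue grows. The Python sketch below is a deliberately crude simulation with made-up numbers (they are not Etsy’s): every hour some number of commits arrive, each triggering one build, the cluster can finish a fixed number of builds per hour, and whatever is left over waits in the queue. The function names and the one-build-per-node assumption are hypothetical.

```python
def simulate_build_queue(commits_per_hour, nodes, build_minutes):
    """Return the number of builds still queued at the end of each hour."""
    # Assumption: each node runs one build at a time, back to back.
    capacity_per_hour = nodes * 60 / build_minutes
    backlog = 0.0
    history = []
    for arrivals in commits_per_hour:
        # Builds that cannot be served this hour roll over to the next one.
        backlog = max(0.0, backlog + arrivals - capacity_per_hour)
        history.append(backlog)
    return history


def hours_to_drain(backlog, nodes, build_minutes):
    """Extra feedback delay if no further commits arrive."""
    return backlog / (nodes * 60 / build_minutes)
```

With four nodes and twelve-minute builds the cluster clears 20 builds an hour, so two busy hours of 30 commits each leave 20 builds waiting, which is a full hour of extra delay before the newest commit gets feedback. Adding nodes raises the capacity line, which is exactly the “throw hardware at it” fix described above.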
It seems like every week I have a long conversation with someone on
the topic of hiring programmers. tl;dr: there are no programmers for hire in
New York in 2013. The job market is such that all decent programmers
are gainfully employed. In order to hire developers it is necessary
to convince them to leave their current job. This will be true for
the foreseeable future.
There aren’t enough software engineers to fill all the roles
The most poignant question recruiters ask me lately is: do you know
someone else who would be interested in this role?
No. The last time I knew a programmer who was unemployed was
in 2010. And they were only out of work for three months. And that
was basically because they were being super picky about what kind of
early-stage startup they wanted to join.
So no. No I don’t know any programmers who are looking for work.
And I don’t expect to meet an unemployed programmer again. Unless the
Internet stops being a thing. Or unless the American school system
starts turning out class after class of qualified,
passionate journeyman software engineers.
All the good programmers I know are gainfully employed
I get a lot of recruiter contacts and I read every single one. I am
always looking for people’s thoughts as to why I would leave my
current job and come to their office to work on their project
every day for the next 2-3 years. Unfortunately most job postings are
little more than a list of keywords and a salary range.
Still, keywords and salary ranges are interesting. I always take
note of which languages are trending among tech recruiters. And did
you know that a Web Developer makes less money than a Software
Since 2005 I’ve used Salary.com (and more recently
GlassDoor) to get a general sense of what various job
titles were worth around New York. Combining that with the actual
salaries for roles I’ve seen mentioned in recruiter emails gives a
pretty accurate picture of real salary ranges.
Let’s take it as read that any decent programmer performs that kind of
analysis on a regular basis.
To recruit decent developers it’s always necessary to have a
reasonably interesting narrative about solving problems. In the current job market, it also takes competitive pay and a work
environment that is conducive to writing a lot of code. Job postings
that fail to communicate those qualities will not be successful.
Salary ranges from Salary.com and GlassDoor were retrieved on January 27 2013. When reviewing these graphs, keep in mind that demand is high and therefore salaries in the median range are unlikely to draw qualified candidates. Anecdotal evidence suggests that engineers are willing to consider a new role at salaries closer to the 80th percentile.
In writing bug reports, I have found it helpful to take the attitude
that I am engaged in scientific inquiry in the relatively
new field that is software. I’ve gotten the best results when
I constructed my communications around bugs in a way that lent itself
to application of the Scientific Method (or a
rough approximation thereof).
Description Of Problem, Steps To Reproduce, and Expected Resolution
In the rest of this post I’m going to delve at length into some of the
techniques I’ve learned for effective communication where bugs are
concerned. But even if you stop reading right here, you already know
the most important technique: write bug reports that consist of a
description of the problem, steps to reproduce, and an expected
Provide context, and lots of it.
“The problem with communication is the illusion that it has taken
place.” (George Bernard Shaw)
There’s a phenomenon by which software that was working just fine,
comes to be seen over time as increasingly buggy. In fact this
phenomenon occurs so often that it has a name: bit rot.
Very rarely, bit rot occurs due to actual changes in the software. But
much more often the apparent “rot” is actually due to the software
staying the same while the attitudes and habits of its users slowly
change and evolve.
As the existence of bit rot demonstrates, it is not only possible but
very likely that, given two different people, one of them might
perceive a behavior as buggy while the other person might see the
system as working just fine. Not to mention the costs associated with
other consequences of miscommunication including:
Overcoming such disconnects requires careful statement of the problem
at hand, with sensitivity to the different perspectives of one’s
Describe the problem, using at least two complete sentences.
Usually it’s enough to write two
complete sentences that describe the problem in
detail. The first sentence should generally be of the form "Feature X
is exhibiting behavior Y, but it was expected to exhibit behavior Z."
The second sentence should expand upon or clarify the first.
For example: "Feature X is redirecting signed-in users to the
cart, but it should be redirecting to the new holiday promotion. This
is happening in IE and Chrome."
Always include screenshots that illustrate the problem.
Screenshots of errors in production are a
fundamentally important artifact in the reporting of bugs and the
specification of improvements to existing features.
Having a visual picture of a problem is of huge benefit to people who
weren’t there to witness the issue firsthand. Because of the
deeply visual wiring of our brains, there are certain
aspects of most problems that are best communicated via a picture. This
tends to be true regardless of the level of detail provided in writing.
Thus most bug reports cannot be considered complete without at least
Present the Steps To Reproduce as a numbered list of discrete actions.
More than any other factor, the ability to
reproduce a problem will determine whether or not
the problem gets fixed. Failure to provide enough information to
reproduce an issue is one of the most insidious and frustrating things
that can go wrong with a bug report. In the worst case, everyone who
tries to reproduce the issue winds up feeling like they’ve wasted time
over nothing while the original reporter of the bug feels like the
owner of a singing frog.
Numbered lists are the ideal format for explaining how to reproduce a
bug because a list makes it immediately obvious how many steps there
are as well as where each step leaves off and the next step
begins. Confusion about “minor” distinctions like that can lead to
major headaches later on.
Here’s an example of a reasonably detailed “steps to reproduce”
1. Open a browser (latest version of any major browser is fine).
2. Navigate to foo.example.com/my/awesome/thing
3. Sign in as the user “HomerJ”
4. Observe that the text of the menu item in the upper right corner,
   runs off the right edge of the screen.
And by way of counterexample, here’s the same set of steps with all the
helpful context removed.
1. Look at the dev site.
2. The menu is wonky.
In the best case, a developer who reads the second example is confused
but figures it out. In the worst case, she notices that the background
color of the menu in question is green, recalls (possibly wrongly)
that the background color was meant to be blue, and spends the rest of
the day “fixing” that before she realizes she was only meant to be
moving the menu a couple of pixels to the left. Or she signs in as a
user whose preferences specify a totally different menu
look-and-feel. Or she goes to the wrong part of the dev site and
looks at the wrong menu. More misunderstandings are possible in this
situation but their enumeration is left as an exercise for the reader.
Describe the expected resolution, using at least two complete sentences.
This is where it all comes together. The reader has been informed of
the nature of the problem and has walked through the steps to reproduce
it. What remains now are the tasks of inventing a fix for the issue,
applying it to the production application without breaking anything
else, and finally verifying that the fix as implemented actually
solves the original problem.
It is generally insufficient to describe the expected resolution of a
software bug in fewer than
two complete sentences. Even a trivial bug
represents at best a failure to properly implement the software as
specified. Such failures inevitably stem from misunderstandings of one
sort or another. So it is reasonable to approach writing up the
desired state of a software feature as an exercise in bridging a
pre-existing communication gap.
To this end it’s also best to avoid domain-specific jargon and
acronyms. Stating the desired outcome in plain English can be more
challenging than it might first appear. But it’s worth the effort.
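The three-part structure is simple enough to check mechanically. As a rough illustration only, here is a Python sketch of a bug report template that renders the steps as a numbered list and flags a description or expected resolution shorter than two sentences. The BugReport class, its field names and the sentence-counting heuristic are all invented for this example, and the heuristic is crude: it just counts sentence-ending punctuation.

```python
import re
from dataclasses import dataclass


def count_sentences(text):
    """Crude heuristic: count '.', '!' or '?' followed by whitespace or the end."""
    return len(re.findall(r"[.!?](?=\s|$)", text.strip()))


@dataclass
class BugReport:
    description: str
    steps: list
    expected: str

    def problems(self):
        """List the ways this report falls short of the guidelines above."""
        issues = []
        if count_sentences(self.description) < 2:
            issues.append("description needs at least two complete sentences")
        if not self.steps:
            issues.append("steps to reproduce are missing")
        if count_sentences(self.expected) < 2:
            issues.append("expected resolution needs at least two complete sentences")
        return issues

    def render(self):
        """Format the report as plain text with numbered steps."""
        lines = ["Description of problem:", self.description, ""]
        lines.append("Steps to reproduce:")
        lines += [f"{n}. {step}" for n, step in enumerate(self.steps, start=1)]
        lines += ["", "Expected resolution:", self.expected]
        return "\n".join(lines)
```

A report consisting of “The menu is wonky.” with no steps would come back from problems() with three complaints, which is about what a developer reading it would say too.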
Be quantitative. Engineers respond well to hard evidence.
Most bugs, most of the time, are easily nailed given even an
incomplete but suggestive characterization of their error conditions
at source-code level. When someone among your beta-testers can point
out, “there’s a boundary problem in line nnn”, or even just “under
conditions X, Y, and Z, this variable rolls over”, a quick look at
the offending code often suffices to pin down the exact mode of
failure and generate a fix. Eric
Screenshots are the basic currency of good bug reports. But there are
numerous other artifacts which could be helpful in diagnosing an
issue. These include but are not limited to:
Graphs from tools like Graphite, Ganglia,
Cacti, Cloudkick, Google Analytics, etc.
In the case of a Web application it is also extremely helpful to
provide hyperlinks pointing directly to the page(s) in production
where an issue has been observed.
As a general rule, the more hard data provided, the
better. Anecdotes about frustrated users can be a useful tool for
understanding why fixing a particular bug might be important. But
knowing there’s an error at line 142 (for instance) is much more
likely to turn out to be key to actually implementing the fix.
A bug report is a story about a problem.
A good story has a beginning, middle and end. A good bug report opens with a
description of the problem, continues with steps to reproduce it, and concludes
by stating the expected resolution. These three parts of a bug report
map neatly to the narrative arc that underlies all good stories.
Stories are powerful tools for communication. The wetware of the human
brain is heavily optimized for the processing of information that is
structured as a narrative.
In her talk “How To Test The Inside of Your Head,” Liz Keogh discusses
confirmation bias, the Face on Mars at Cydonia, the Martian Canals of
Percival Lowell and how we as a species are sometimes predisposed to
believe obvious but incorrect explanations. I highly recommend
watching the whole talk but here is the really pertinent bit.
Writing bug reports as I’ve described helps prevent misunderstandings
and wasted effort. But there’s more to it than that. By providing a
complete narrative, a good bug report sets the stage for everyone
involved to act within a larger, cohesive context. Human beings feel
better and make better decisions when they feel like the world around
them makes sense. Stories help us to make sense of the world and a
good bug report can go a long way toward making a chaotic situation
suddenly feel manageable.
Thus the larger benefit of including complete information in a bug
report can be to directly reduce stress and therefore indirectly to
increase the capability of the team to rapidly resolve production
This article was written as a general guide, with the
novice-to-intermediate practitioner in mind. The techniques described
above reflect what has worked well in my limited experience. For other
perspectives the reader is referred to the many other fine bug writing
guides available on the Internet, including:
So I wrote another post called
“falsehoods programmers believe about time,” where
I included 34 misconceptions and mistakes having to do with both
calendar and system time. Most of these were drawn from my immediate
experience with code that needed to be debugged (both in production
and in test).
A great many of the false assumptions listed were my own. Especially “time stamps are always in seconds since epoch” and “the duration of a system clock minute is always pretty close to the duration of a wall clock minute.” Whoa did I ever live to regret my ignorance in those two cases! But hey, apparently I’m not the only one who has run into (or inadvertently caused) such issues. A lot of people responded and shared similar experiences.
I’d like to say an enormous thanks to everyone who contributed to
the comment threads on BoingBoing and
Hacker News as well as Reddit and
MetaFilter and to everyone on Twitter who
shared their strange experiences with time. In those thousand or so comments
and tweets, there were a lot of suggestions as to “falsehoods 35 to 35+n.”
First and foremost was the omission of the false assumption that “time
always moves forward,” as pointed out by Technomancy and many others.
I enjoyed reading all the suggested falsehoods. When I was done
reading, I realized that taken as a whole,
these constitute a whole other blog post. So I collected some of your suggested falsehoods
into a post and here it is.
All of these assumptions are wrong
All of these falsehoods were suggested by people who commented on the
original post. Each contributor is credited below.
The offsets between two time zones will remain constant.
OK, historical oddities aside, the offsets between two time zones won’t change in the future.
Changes in the offsets between time zones will occur with plenty of advance notice.
Daylight saving time happens at the same time every year.
Daylight saving time happens at the same time in every time zone.
Daylight saving time always adjusts by an hour.
Months have either 28, 29, 30, or 31 days.
The day of the month always advances contiguously from N to either N+1 or 1, with no discontinuities.
There is only one calendar system in use at one time.
Thanks again to everyone who commented. I read everything that you
wrote, even if I didn’t wind up including it above.
I made the list above by going through each of the comment threads on
Hacker News, Reddit, MetaFilter and BoingBoing (in that order) and
finding all(?) of the places where folks had broken out “falsehood 35
to 35 + n” as a bulleted list. I then selectively copied those
lists — in the order that I found them. I made small edits for
readability and occasionally I paraphrased (this is noted below).
This post — like the one before it — owes a great debt to
Patrick McKenzie’s canonical blog post about user names,
which I have read over and over throughout the years and from
which I have shamelessly cribbed both concept and style. If you
haven’t yet read this gem, go and do so right now. I promise you’ll
Over the past couple of years I have spent a lot of time debugging
other engineers’ test code. This was interesting work, occasionally
frustrating but always informative. One might not immediately think
that test code would have bugs, but of course all code has bugs and
tests are no exception.
I have repeatedly been confounded to discover just how
many mistakes in both test and application code stem from
misunderstandings or misconceptions about time. By this I mean both
the interesting way in which computers handle time, and the
fundamental gotchas inherent in how we humans have constructed our
calendar — daylight savings being just the tip of the iceberg.
In fact I have seen so many of these misconceptions crop up in other
people’s (and my own) programs that I thought it would be worthwhile
to collect a list of the more common problems here.
All of these assumptions are wrong
There are always 24 hours in a day.
Months have either 30 or 31 days.
Years have 365 days.
February is always 28 days long.
Any 24-hour period will always begin and end in the same day (or week, or month).
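A few of these are easy to check for yourself. The snippet below is only a sketch using Python’s standard library; the dates are arbitrary examples I picked, and it doesn’t touch the time zone cases at all.

```python
import calendar
from datetime import datetime, timedelta

# "February is always 28 days long" and "Years have 365 days":
# 2012 was a leap year.
feb_2012_days = calendar.monthrange(2012, 2)[1]
days_in_2012 = 366 if calendar.isleap(2012) else 365

# "Any 24-hour period will always begin and end in the same day (or month)":
# start at noon on January 31 and add exactly 24 hours.
start = datetime(2012, 1, 31, 12, 0)
end = start + timedelta(hours=24)

print(feb_2012_days)           # 29
print(days_in_2012)            # 366
print(start.month, end.month)  # 1 2
```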
That thing about a minute being longer than an hour was a joke, right?
There was a fascinating bug in older versions of KVM on CentOS.
Specifically, a KVM virtual machine had no awareness that it was not
running on physical hardware. This meant that if the host OS put the
VM into a suspended state, the virtualized system clock would retain
the time that it had had when it was suspended. E.g. if the VM was
suspended at 13:00 and then brought back to an active state two hours
later (at 15:00), the system clock on the VM would still reflect a
local time of 13:00. The result was that every time a KVM VM went
idle, the host OS would put it into a suspended state and the VM’s
system clock would start to drift away from reality, sometimes by a
large margin depending on how long the VM had remained idle.
There was a cron job that could be installed to keep the virtualized
system clock in line with the host OS’s hardware clock. But it was
easy to forget to do this on new VMs and failure to do so led to much
hilarity. The bug has been fixed in more recent versions.
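A related defensive habit (my own suggestion, not part of the KVM fix): when code needs to measure a duration, read a monotonic clock instead of subtracting two wall-clock timestamps. Python documents time.monotonic() as unaffected by system clock updates. Here is a minimal sketch, where timed is a hypothetical helper name:

```python
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds).

    Elapsed time comes from time.monotonic(), which cannot go backwards
    and is not affected by adjustments to the system clock.
    """
    started = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - started
    return result, elapsed

result, elapsed = timed(sum, range(1000))
print(result)          # 499500
print(elapsed >= 0.0)  # True
```

Whether a monotonic clock keeps counting while a machine is suspended depends on the platform, so this is not a cure for the suspended-VM problem itself. It only protects a measurement from the clock being reset underneath it.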
This post owes a great debt to
Patrick McKenzie’s canonical blog post about user names,
which I have read over and over throughout the years and from
which I have shamelessly cribbed both concept and style. If you
haven’t yet read this gem, go and do so right now. I promise you’ll
There’s more than enough material for another (longer!) post about this topic. But first I’ll have to finish reading all >500 of your comments as well as the wealth of awesome research material that has been linked.
A checklist of things that are worth testing in pretty much any software system.
…trailing his fingers along the edge of an incomprehensible
computer bank, he reached out and pressed an invitingly large red
button on a nearby panel. The panel lit up with the words “Please do
not press this button again.” ~ Douglas Adams
Software systems are complex and as such exhibit non-deterministic
behavior. This is true of any non-trivial system. The behaviors of
even a small software product are so varied and unpredictable as to
defy complete testing.
However, there are five general areas of interest that are always
worth examining because they reveal mistakes with such surprising
regularity. Specifically it’s worthwhile to find out how any system
handles inputs, math, text, time and system
If like me you are a software developer then it’s commonly accepted
that about 50% of your time should be spent in testing rather than
writing code. If this seems excessive think about how much time you
spent in debugging the last time code you wrote was involved in a
production issue. Then think about your level of stress.
In the book The Soul of A New Machine, Tracy Kidder makes a
comment to the effect that most career programmers are pack-a-day
smokers who eventually drop dead of a heart attack. Don’t be that
guy. Time spent testing happens during work hours, within the
parameters of an estimated project schedule that you (hopefully) got
to sign off on in advance. If you follow the “50% of development time
is testing” rule then it’s possible that overall in the course of
your career you may spend more time testing than you would have
debugging production issues if you hadn’t taken the time to test. But
even so, you will spend less time being stressed and less time working
on the weekend.
And seriously, tested code is better code. Better code means more
reliable products. Reliability in turn leads to better
customer experience because reliability engenders trust. Trust
in turn is the foundation of the relationship that a product team
forms with its customers. Tested code means better customer
experience which leads to products that compete more effectively in
the marketplace. And that means you keep getting paid, which means
you get to keep writing code.
Minimum and maximum input values are always good to test. For
instance, if a password field allows 6 to 128 characters, what actually
happens when you submit a six-character password? What about a 128-character password?
Too-high and too-low values. What happens with a 5-character or
129-character password? Alternately, how does the system respond
to inputs equal to the minimum and maximum integer values
allowed by the implementation language or platform?
Invalid values such as null and NaN. Strings instead of
integers, arrays instead of strings.
Empty inputs such as a blank user name field or a transaction
record in which none of the fields contain any information. For
unit tests, submitting zero or an empty string instead of a valid parameter
can sometimes yield interesting results.
Too many inputs or not enough inputs. For a unit test this is
simply a matter of creating an incorrect function signature. For a
Web app it might involve submitting too many POST parameters or
selectively deleting parts of a URL’s query string.
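To make the input checklist concrete, here is a rough sketch of what those cases might look like as a table-driven test. The validator is_acceptable_password is hypothetical; I made it up to stand in for the 6-to-128-character password field mentioned above.

```python
MIN_LENGTH = 6
MAX_LENGTH = 128

def is_acceptable_password(candidate):
    """Hypothetical validator: accept only strings of 6 to 128 characters."""
    if not isinstance(candidate, str):
        return False
    return MIN_LENGTH <= len(candidate) <= MAX_LENGTH

# Boundary, out-of-range, empty and wrong-type inputs with expected results.
cases = [
    ("a" * 6, True),     # minimum allowed
    ("a" * 128, True),   # maximum allowed
    ("a" * 5, False),    # too short
    ("a" * 129, False),  # too long
    ("", False),         # empty
    (None, False),       # null
    (123456, False),     # integer instead of string
    (["a"] * 6, False),  # array instead of string
]

for candidate, expected in cases:
    assert is_acceptable_password(candidate) == expected
```

Keeping the cases in one list makes it cheap to add the next odd input you run into.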
Decimal math is hard. Verify that integers are treated correctly
in a floating-point context, and vice versa.
Repeating decimals. Does the system treat 0.666 differently from
Rounding. If you put 3 * 1.005 into the system, do you
get 3.015 back out? (This is not the default behavior
Type coercion. Is an input of 23 treated differently than an input of
"23"? That is: is a numeric input treated differently than
a string containing a numeric value?
Units of currency. There’s going to be a problem if an input of
£23.00 is stored in the database as $23.00.
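Here is a small sketch of the rounding and coercion points in Python. It assumes the system under test could use the standard decimal module for money-like values; that is my assumption for illustration, not a claim about any particular system.

```python
from decimal import Decimal, ROUND_HALF_UP

# The float literal 1.005 is stored as the nearest binary fraction,
# which is slightly less than 1.005.
print(Decimal(1.005) < Decimal("1.005"))  # True

# Decimal values built from strings are exact, so the arithmetic is too.
exact = Decimal("1.005") * 3
print(exact)                              # 3.015

# Rounding to two places with an explicit rounding rule.
cents = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cents)                              # 3.02

# Type coercion: the number 23 and the string "23" are not equal in Python.
print(23 == "23")                         # False
```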
User names are perhaps the single most interesting class of text
that can be submitted as input to a computer program. At a
minimum, the system shouldn’t break when names contain apostrophes,
hyphens or spaces.
Passwords are also interesting. Does the maximum password length
allow for enough entropy? Are plain-English
passphrases disallowed because they don’t contain
numbers? Are passwords stored as salted hashes?
Are Unicode inputs treated differently than ASCII?
On the Web, are HTML-encoded entities properly converted to
characters and vice versa? What about URL-encoded characters?
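One cheap way to probe the encoding questions is a round-trip check: encode a string, decode it again, and make sure you get back what you started with. This sketch uses Python’s standard html and urllib.parse modules; the sample strings are arbitrary ones I chose.

```python
import html
from urllib.parse import quote, unquote

samples = ["O'Brien", "Smith-Jones", "Tom & Jerry <admin>", "naïve café", "名前"]

for text in samples:
    # HTML-encode, then decode: the result should be the original string.
    assert html.unescape(html.escape(text)) == text
    # URL-encode, then decode (quote and unquote use UTF-8 by default).
    assert unquote(quote(text)) == text

print(html.escape("Tom & Jerry <admin>"))  # Tom &amp; Jerry &lt;admin&gt;
print(quote("O'Brien & co"))               # O%27Brien%20%26%20co
```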
Test on the first and last day of daylight savings time. The
system does allow you to mock out the first and last day of
daylight savings time, right?
As with unit tests, boundary conditions can reveal
interesting behavior. How does the system behave between 23:59 and
00:01? What about during the hour between 00:00 and 00:59?
Be very aware of dates and times that are “special” to your
system. For instance, if you have a fake user for testing
purposes, how does the system respond when it’s that user’s birthday?
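The simplest way I know to make these time checks testable is to pass the current time in as an argument rather than reading the system clock inside the code. The two functions below are hypothetical examples of that pattern, not code from any real system.

```python
from datetime import date, datetime

def is_birthday(birth_date, today):
    """Return True when `today` falls on the anniversary of `birth_date`.

    `today` is supplied by the caller, so tests never read the system clock.
    (People born on February 29 are deliberately not handled in this sketch.)
    """
    return (birth_date.month, birth_date.day) == (today.month, today.day)

def report_date(now):
    """Hypothetical time-dependent function: the calendar day a report belongs to."""
    return now.date().isoformat()

fake_user_birthday = date(1980, 3, 15)

# Pin "today" to the special date instead of waiting for it to arrive.
assert is_birthday(fake_user_birthday, date(2012, 3, 15))
assert not is_birthday(fake_user_birthday, date(2012, 3, 16))

# Exercise both sides of midnight with fixed timestamps.
assert report_date(datetime(2012, 12, 31, 23, 59)) == "2012-12-31"
assert report_date(datetime(2013, 1, 1, 0, 1)) == "2013-01-01"
```

In production the caller passes datetime.now(); in tests it passes whatever moment the test needs.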
What if there’s half as much available memory as the system’s designers expect?
In a distributed system, what happens if half the nodes become unavailable?
In a service-oriented architecture, what happens if one of the
services becomes unavailable? What if it’s only partially available?
What happens if the network is slow?
What happens when the database is down?
What happens when the database is empty?
What happens if the cache is disabled? What about the CDN?
What if load on the system spikes to ten times normal?
What if load on the system drops to zero?
For long-running operations, what happens if you power cycle the machine before the operation is complete?
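Some of these failure modes can be rehearsed in a unit test by swapping in a stand-in for the dependency. The sketch below is illustrative only: the classes and load_display_name are names I invented, and a real system would have its own client and error types.

```python
class UnavailableService:
    """Test double that behaves like a dependency that is down."""
    def fetch(self, key):
        raise ConnectionError("service unavailable")

class EmptyService:
    """Test double that is reachable but holds no data."""
    def fetch(self, key):
        return None

def load_display_name(service, user_id, default="(unknown user)"):
    """Hypothetical caller: fall back to a default when the lookup fails or is empty."""
    try:
        name = service.fetch(user_id)
    except (ConnectionError, TimeoutError):
        return default
    return name if name else default

print(load_display_name(UnavailableService(), 42))  # (unknown user)
print(load_display_name(EmptyService(), 42))        # (unknown user)
```

This only covers the "down" and "empty" cases. Slow networks, load spikes and power cycling need a real environment to test properly.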
Two digressions: names and time
When it comes to Web apps, there are two areas that seem to cause more
pain than any other: people’s names and the time. These elements are
both common, essential to the correct functioning of a system, and
shockingly difficult to get right.
There are only two hard problems in Computer Science: cache
invalidation, naming things and off-by-one-errors. ~ Phil Karlton
My favorite real-world case of a system finding a user’s name
“unacceptable” involved a person whose first name was 9. Not “Nine,”
mind you but the numeral “9”.
I have a friend named Sonnet (no middle name, no last name) who is
unable to complete the registration flow for most Web sites. I myself
have occasionally been rejected by a registration form because I have no middle name.
When I used to build internal tools for Etsy I worked
with a plethora of excellently-named hackers such as
Michelle D’Netto, Kellan Elliot-McCrea and of
course Ramin Bozorgzadeh. Ramin quickly became my test user
of choice because his surname was almost always too long for the
single line allotted to display it, thus breaking the UI. And in at
least one case an intranet tool (which had been around for several
years at that point) was brought down hard by the introduction of a
user name that contained an apostrophe. If you’re not as fortunate in
the naming scheme of your alpha testers then take care to construct
your fixtures appropriately.
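As a sketch of what such fixtures might look like, here are the names from the paragraphs above run through an in-memory SQLite table (typed with a plain ASCII apostrophe). I don’t know exactly how the intranet tool failed, so the assumption here is mine: a common way for an apostrophe to break things is SQL built by string concatenation, and query placeholders avoid that.

```python
import sqlite3

# Names from the stories above, used as test fixtures.
fixture_names = [
    "9",
    "Sonnet",
    "Michelle D'Netto",
    "Kellan Elliot-McCrea",
    "Ramin Bozorgzadeh",
]

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT NOT NULL)")

# Placeholders (?) let the driver handle quoting, so an apostrophe in a
# name cannot break the SQL statement.
connection.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(name,) for name in fixture_names],
)

stored = [row[0] for row in connection.execute("SELECT name FROM users ORDER BY rowid")]
assert stored == fixture_names
```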
King and villein, lad and lass,
All answer to the hourglass. ~ trad.
Tests that use the system time implicitly test the system clock of
whatever machine happens to be running the tests. Speaking from long
experience, I can attest that this approach can only lead to
unreliable tests and extreme debugging pain. If there is a test that
must rely on the system clock then it is better to go without
implementing the test than it is to expose yourself to the lost time
and frustration that running such a test would surely incur on you and
So, the system you are testing does allow you to mock out all of
the necessary times of day and times of year. Right? I hope so
because if you’re using the system time in tests, you are doing it
And in my humble opinion, if you’re using the system time in tests
because the system you are testing won’t allow you to mock the time,
you aren’t the only one doing it wrong — the system itself is
Pretty much every bullet point on each checklist above was drawn from
my own direct experience with a mistake that was found either in
development or in production. The cost of such knowledge was at the
very least some frustration for myself and in other cases a lot of stress and
lost time for many people on my team. But as my career has progressed
and I’ve moved to larger and larger projects, it’s been really useful
to have this information in my head. I like to think I design better
software because I’ve been burned in the past.
I hope this checklist helps you to find mistakes in the design and
implementation of your own systems as well. I hope you at least will
find most of them before they’re caught by your customers in
production. Because as software engineers, a clean, well-functioning
system is the basic foundation of the trust that our users put in us
and in the products we deliver.
Rapid Infrastructure: Tools You Can Use Even If You Only Have One Line Of Code
There’s an old joke that goes something like this:
Proposition one: all programs have bugs.
Proposition two: all programs can be shortened by one line.
Conclusion: every program can be reduced to one line of buggy code.
Corny, I know ;-)
But hey, there is a point in the life of every piece
of software when the entire system consists of one line of code.
That time is at the very beginning of a project, when one has just typed
the first bit of code into one’s text editor.
You might be wondering: why are we
talking about projects that contain only one line of code? How could
automation possibly help there? And wouldn’t it be overkill to set up
tooling to support a trivially small, new project?
I’ll explain how two automated tools can help you maintain a
project, even at the point where you’ve just typed your first line of
code. These tools are code review and static analysis.
Start By Establishing A Culture Of Code Review
Recently I sat and talked with Erik Kastner
about his thoughts on code review and testing. Erik’s a thoughtful,
experienced guy and after working with him for a couple of years I
have found that his opinions have become very important to me. Erik
says great stuff like “code review is just reading someone else’s code
and understanding it before it ships.”
That’s one of the things I like about Kastner — he does more than just propound his
methodology. When Erik talks about how he thinks software engineering
should or shouldn’t work, he always qualifies his statements. And
there can be some pretty surprising insights wrapped up in those
So let’s look at this statement again:
Code review means reading and understanding someone else’s code.
This implies that if you and I are working on a project together, you’re
going to read my diffs before I commit or merge them into trunk.
We might be doing the review in FishEye,
within a GitHub pull request,
or you might just be looking at my commits in our SCM.
But I expect you to do more than just read my changesets. I also
expect you to fully comprehend how the diffs I’m showing you are
going to change the behavior of the system. Like many of Kastner’s
qualifications to software methodologies, this is a subtle but large
Ideally every changeset I write gets reviewed by someone else before
it goes to production. This is the practice at a lot of large,
successful organizations like Google and the JPL. Having a human
review every changeset does impose an upper limit on how fast you can
deploy code to production. For a new, relatively small project, you
might feel that reviewing every changeset is too heavyweight. And you
might be right. But keep in mind that it’s a lot easier to put this
kind of process in place at the beginning than it is to wait until
your application is mature — and you’re definitely going to want a
code review process in place at that point.
Now consider the case where I have written the following one line of
code and I ask you to review it. This is a trivial case of course,
but I hope it’s still illustrative of why you should spend the time to
set up these tools before you write a single line of code. Anyway,
here’s my changeset, would you review it before I push it to prod?
echo "hello world''
Did you catch both of the errors in my code? Probably you did. And
I’m sure you noticed the missing semicolon immediately. But did it
take you just a moment longer to realize there was something wrong
with that closing double quote? If it did, then you were experiencing
a trivial increase in cognitive load.
As our application gets larger and my changesets grow in complexity,
you’re going to have to endure a greater and greater amount of
cognitive load every time you review and debug one of my changesets. That’s not
great. You’re a good hacker and our project is going to win because
you’re using your whole brain to think about solving hard problems.
It’s too bad that instead our new code review process is causing you
to fill your brain with thoughts about whether or not I got my
Besides, checking other people’s syntax is boring drudge work and drudgery is evil.
So it’s actually really important that we take a little bit of time at
the beginning of our project to make sure that code review imposes
as little unnecessary cognitive cost as possible.
Both of the errors I made above actually cause the PHP interpreter to
barf. So by induction, there must be a way to catch those errors
programmatically. And of course there are several
open source tools to help us do exactly that. But the simplest option is to
just use the PHP interpreter’s built-in syntax checker:
php -l index.php
Parse error: syntax error, unexpected $end, expecting T_VARIABLE or T_DOLLAR_OPEN_CURLY_BRACES or T_CURLY_OPEN in foo.php on line 3
Errors parsing index.php
Great. Just by running php -l on my code before you review it, you can
now avoid winding up as a human syntax-checker. This saves us both
time and frustration as we continue to work on our project. Even
better, I could run the syntax check on my own code before I send it
over to you for review.
It’s worthwhile for us to informally agree that we won’t bother
reviewing any code that doesn’t pass a syntax check.
Is It Worth Automating Static Analysis At This Point?
So we’ve made an agreement to always run static analysis on our code
before asking someone else to review it. This implies that any code
we deploy to production will have been run through static analysis at
least once. Even though our project and our team are small, we’ve
managed to put in place some important cornerstones on which we can
build a healthy engineering culture.
We could codify
our new agreement by writing it down in our wiki (if we have one).
Another way to codify our contract would be to set up a CI server
and configure it to fail the build if anyone commits a file that
doesn’t pass the syntax check. Yet another way to do this would be
for each of us to run watchr on
our laptops, and configure it to throw up a Growl alert whenever the
syntax check fails. We can pick one of these automated solutions,
spend a couple of hours setting it up and get its benefit throughout
the life of our project. So that seems like a worthwhile thing to do,
even though so far we only have one line of code.
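For the CI option, the check can be a very small script. What follows is a sketch rather than a finished build step: files_failing_check and main are names I made up, and it assumes php is on the PATH and that php -l exits with a non-zero status when it reports a parse error.

```python
import subprocess
import sys

def files_failing_check(paths, command=("php", "-l")):
    """Run `command <path>` for each path; return the paths that exit non-zero."""
    failures = []
    for path in paths:
        completed = subprocess.run(
            [*command, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if completed.returncode != 0:
            failures.append(path)
    return failures

def main(paths):
    """Report failing files and return an exit status suitable for a CI job."""
    failed = files_failing_check(paths)
    for path in failed:
        print("syntax check failed:", path)
    # A non-zero exit status is what tells most CI servers to fail the build.
    return 1 if failed else 0
```

To use it as a build step, end the script with sys.exit(main(sys.argv[1:])) and have the CI job pass in the changed files. The command parameter is there so the same loop can wrap a different checker.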
There is a very convincing argument to be made that feature-complete
software is ultimately more valuable than a readable codebase:
that there is more value in what an application does than in how it
is put together. Perhaps it is fair to consider architecture,
including the comprehensibility of a program as source code, to be of
some benefit but still orthogonal to a program’s business value?
After all, it’s incontrovertibly true that the most important aspect
of developing software is
shipping it to the customer.
So then isn’t it the case that the
real value is all in what the software does?
Why is so much value placed on delivering readable code?
Greg Horvath recently showed me a
paper on JPL coding standards (PDF)
that encouraged eschewing some
pretty basic strategies (recursion, dynamic memory allocation) on the
grounds that they lead to code that is somewhat more difficult to run
through static analysis. The takeaway for me was that NASA cares
a lot about being able to tell what the code is intended to do,
without actually running it.
So while shipping feature-complete software is obviously important, it
seems that shipping readable software is really important,
too. Why? I think it’s because comprehensibility of components
contributes to the resiliency of a system overall.
A prototype that
works is good. A prototype that can evolve rapidly is even better.
Complexity is an inherent property of software. Comprehensibility is not.
It’s interesting to note that Dijkstra espoused
a readable-code-over-feature-completeness approach to software
architecture. He contrasted the two mindsets as “postulational” and
“operational,” respectively. Postulational meaning one can
postulate about what a program does just by reading the source.
Operational meaning that one bases one’s expectations about what a
program will do on (educated) assumptions about what operations will
be carried out when that program is executed.
Dijkstra also once pronounced
that software is the most complex product ever produced by human
effort. When delivering any non-trivial software application, some
degree of complexity is intrinsic to the task. And obviously,
concepts that are complex do not lend themselves to implementations
that are easily readable. So maintaining comprehensibility in
the components of a complex system turns out to be a rather difficult
Incidental complexity, once identified, can eventually be
factored out, leading to
code that is more readable overall. Therefore it is valuable to
take the (sometimes considerable amount of) time to distinguish between intrinsic and incidental complexity,
and to continually either avoid or remove the incidental complexities
that over time can make a codebase harder and harder to read.
The most interesting part of delivering software is watching what users do with it.
In order to provide a satisfying user experience
over the long term, any software needs to be able to adapt
iteratively and rapidly to the unpredictable needs and desires of its
user base. Resilient systems are best able to adapt, because successful
adaptation requires constant readjustment in the face of new
circumstances. So there is considerable value in preserving the
readability of source code throughout the life of a system.