Infinite Undo!


Hello, I'm Noah Sussman. I am a scientist studying technosocial systems in New York.


How to read a manpage

A bunny playing xbox -- super cute!

In this article I will explain how to get help on the command line. It’s a basic skill that everyone needs to learn once they set out to master the shell.

If you are using Mac or Linux, then this article contains everything you need to know to browse the built-in help pages for all those “command-line commands” =D like ls, cp, cd and grep.

Getting help

The quickest and briefest way to read the help for a shell command is to type <command> --help where in place of <command> you substitute the name of the actual unix command that you want to know about.

For example:

$ ls --help
ls: illegal option -- -
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

So <command> --help will print a short block of text that shows what parameters and options a command can take.

Or, that might not work.

Unfortunately shell commands aren’t standardized in this regard so if <command> --help doesn’t work, you will have to try the following variations, one of which should work instead:

<command> -h
<command> -help
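Since the flag varies from tool to tool, the guessing can be automated with a tiny shell function that tries each variant in turn. This is just a sketch; show_help is a made-up name, not a standard utility.

```shell
# Try the common help flags in order until one of them exits
# successfully, then print that help text.
show_help() {
  cmd="$1"
  for flag in --help -h -help; do
    if "$cmd" "$flag" >/dev/null 2>&1; then
      "$cmd" "$flag" 2>&1
      return 0
    fi
  done
  echo "no help flag worked for $cmd; try: man $cmd" >&2
  return 1
}

show_help grep   # prints grep's usage summary on most systems
```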

However the “short help” is only helpful if you know what the command does and what the common use cases are. That information is to be found in the manpage.

Reading the manpage

Just type

man <command>

Then use the arrow keys to move up and down.

Type g to go to the top of the page. Type G to go to the bottom of the page.

Type q to quit.

To search for text in a manpage:

  1. type /
  2. type the string you want to search for and hit ENTER
  3. type n to go to the next match
  4. type N (shift-n) to go to the previous match
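When you already know the keyword you are after, you do not have to search interactively at all. When man's output is piped rather than displayed, it becomes plain text that you can grep like any other stream (exact formatting varies between systems):

```shell
# Search a manpage without opening the pager: print the first five
# lines of "man ls" that mention sorting, with line numbers.
man ls | grep -in "sort" | head -n 5
```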

And don’t forget to read man man!


Why is it so hard to design automated GUI-driven functional tests?

Google search suggestions for why is Jenkins

Google wants to correct flaky tests to flaky SELENIUM tests. LOL.

It’s really, really difficult to design automated tests that are useful at Web scale. The Web is littered with blog posts, Quora posts and StackOverflow questions about how to deal with the consequences of test systems that were built without the benefit of rigorous analysis.

What sort of analytics and algorithms are useful in designing tests?

To write effective automated tests, engineers have to switch their minds over to thinking about the whole system rather than a single unit of functionality. This is one reason why expert testers are so useful: they don’t have to context switch from “developer mode” in order to get work done.

The expert tester designs her tests by implicitly referencing a state diagram of a whole system. But, if you draw the state diagram of a system like, say, login and authentication — that’s a very complex set of paths.

So the expert tester also decomposes the state diagram of each feature into multiple paths. A state diagram is then implicitly constructed for each of those paths. She then prioritizes the paths and assesses the feasibility of actually implementing a computer program that can traverse those that are most important.

It’s never the case that all of the highest priority paths can actually be tested given our current technology. Figuring that out up front is essential, otherwise it’s very easy to blow all the available development time on a test that looks superficially easy but can’t actually be implemented.

How is test design done? And how specifically is it different from “customer-facing” software architecture?

Note that I keep saying “implicitly constructs a state diagram.” That’s because the expert tester is a domain expert who’s been building her skills for years. Plus, she’s gotten very good at state and path analysis. She doesn’t need to draw diagrams and perform explicit path reduction. She can do that stuff in her head. She doesn’t need explicit mathematical models any more than a front end engineer needs to see an explicit graph of the DOM.

That’s a longwinded way of saying: test design can look easy — when it’s being done by highly trained professionals. But everyone has to spend years climbing a very painful learning curve in order to get good at test design. In that respect test design is just like any other programming skill.

So it’s important to approach large-scale automated testing projects knowing that the work involved is not intuitive or obvious to engineers who design client-facing features for a living! There is a lot of specialized analysis that needs to happen before the first line of code is written. The expert tester does this analysis implicitly, and can get the job done very fast. But for engineers who don’t think about testing 24-7, it won’t hurt to grab a conference room and chart out some finite state diagrams on the whiteboard!

Remember: you cannot write a useful test unless you can represent the scenario as a finite state machine. If you can’t do that then the set of paths you have chosen is too complex and/or non-deterministic; and you need to re-apply the path reduction heuristic described above.

Even a simple test case contains a surprisingly large number of execution paths


Counting The Number Of Ways You Can Put Items Into A Shopping Cart: A Combinatorial Explosion Problem

knots represent complexity and interconnectedness aka entanglement

This problem illustrates how common, everyday Web functionality presents the test designer with a staggeringly large number of execution paths.

Imagine you have a shopping cart which can contain up to 100 different items in varying quantities. There are about 150 different items to choose from.

Now imagine you have been asked to design an automated test that exercises all possible combinations of items, in all valid quantities including zero.

How many variables do we need to think about here? There are 3 variables:

  1. capacity of the cart,
  2. number of different items, and
  3. how many of each item is available

How would you describe the complexity of this test? The number of paths that need to be exercised exhibits factorial, or O(n!), growth as we put more items into the cart.

How long would you expect this test to take to run? There are over 10^45 ways (that’s the sum of 150 choose i as i grows from 1 to 100) to fill the cart with just one of each item. That’s just one of each item, we haven’t even considered different quantities of items yet.

the sum of 150 choose i from 1 to 100

Assuming one test per microsecond, runtime exceeds the age of the universe (only about 4 * 10^17 seconds). And that still is just a fraction of the number of paths that would be encountered in the full test case as described above; where not just all possible combinations of items, but also all possible combinations in all valid quantities from 0 to 100, would be tested. I didn’t bother to do the math on that one, but it’s a lot of paths :)
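As a sketch of the arithmetic behind that claim, using the numbers already given above:

```latex
\sum_{i=1}^{100} \binom{150}{i} \;\approx\; 2^{150} \;\approx\; 1.4 \times 10^{45}
\qquad\text{and}\qquad
\frac{1.4 \times 10^{45}\ \text{tests}}{10^{6}\ \text{tests per second}}
\;\approx\; 1.4 \times 10^{39}\ \text{seconds}
```

which dwarfs the roughly 4 × 10^17 seconds the universe has existed.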

This is a bad test. Besides being impossible to implement, what are we proving by testing every combination that we can’t prove by just testing a few edge cases?


Counting the Number of Paths Through A Series of Drop-Down Menus: A Combinatorial Explosion Problem

A complex system of nodes connected by edges.

This problem illustrates how a test scenario that originally was simple, can “explode” into numerous cases through the addition of “just a little bit more” functionality to the application under test!

Imagine a form with two drop-down menus. The first menu has two options and the second menu has three options. You must choose one option from each menu in order to submit the form.

How much time would you expect that it would take you to write and execute a test plan for this form?

    2 * 3 = 6

How does the answer change when we add a third drop-down menu that has four options?

    2 * 3 * 4 = 24

How does it change if I add two more menus with five and six options, respectively?

    2 * 3 * 4 * 5 * 6 = 720

How would your expectations about a manual test plan change when I add another drop-down menu with 7 options?

    2 * 3 * 4 * 5 * 6 * 7 = 5040

the number of paths grows factorially as menus are added

This test will not scale well because it exhibits factorial growth. Additionally the test’s design is bad: why do we need to test every single item on every single menu? Is every item handled by a different subroutine?
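The products above are consecutive integers, which is why the growth is literally factorial: with n menus of sizes 2 through n+1, the number of paths is

```latex
\underbrace{2 \times 3 \times \cdots \times (n+1)}_{n\ \text{menus}} \;=\; (n+1)!
```

so six menus already give 7! = 5040 paths.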

A pragmatic question to ask in this case would be: “what don’t we cover by just testing one menu in isolation?”


Testing A Multiple Select Menu: A Combinatorial Explosion Problem

a complex structure with many nodes and edges

This problem is illustrative of how the scope of a test scenario can “explode” through the introduction of “just a few more edge cases.”

Let’s say we have a multiple-select menu with 3 options. In order to submit this menu, you need to choose at least one option. Other than that you can choose as many options as you want, up to and including all of them.

  1. foo
  2. bar
  3. baz

How many tests do you need to perform in order to test every combination of items that a user could choose? In other words, how many different paths can a user take through this menu?


There are 7 paths: the number of ways you can choose up to 3 menu items is the sum of 3 choose i as i grows from 1 to 3

Now what if I double the number of menu items?

  1. foo
  2. bar
  3. baz
  4. homer
  5. marge
  6. maggie

Now there are 63 paths — exponential growth has occurred!


What about with 20 options?

Now there are over one million paths through the test case as described above.


Such a test will not scale to even a moderately large number of menu items, because it exhibits exponential growth.

O(2^n) growth for larger n

So this is a bad test. Besides being impossible to implement, what are we proving by testing every combination that we can’t prove by just testing a few edge cases?
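For reference, the counts above all come from the same formula: the number of non-empty subsets of an n-item menu.

```latex
\sum_{i=1}^{n} \binom{n}{i} \;=\; 2^{n} - 1
\qquad\qquad
2^{3}-1 = 7, \quad 2^{6}-1 = 63, \quad 2^{20}-1 = 1{,}048{,}575
```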


Do you do code review?

Marie Antoinette doing code review, apparently.

I’m always interested to hear from organizations who are trying to put automated deployment and testing in place, but have not given a single thought to instituting a code review process.

In such cases I always want to ask: if no one reviews anyone’s code then how the hell do you know what you are testing? The answer (if there is one) is always some hand-waviness about how actually some teams do code review and/or “that’s on the road map.”

No code review? Really? Really?!?

Code review is not optional.

Forget about the pyramid of testing — implementing any kind of deployment pipeline automation without code review is game over. Why? Because code quality is all about comprehensible code. Code review is the stage in the deploy pipeline where comprehensibility (or lack thereof) is evaluated by the simple expedient of… having at least one other human being look at your code and decide if it makes sense, and if it does make sense then is it good enough to deploy to production?

Without code review, testing cannot tell you anything meaningful. Of course code that’s never been reviewed has bugs. Duh.

What is code review?

Refer to the Joel Test. The last question is: “Do you do hallway usability testing?”

All “hallway UX testing” means is: do you get at least one other human being to look at your designs before you start implementing? Code review is similar.

All “code review” means is: do you get at least one other human being to look at your code before you integrate it?

How does code review work?

Here is a simple recipe for code review:

  1. Turn to your left.
  2. Do you see another programmer? Tap them on the shoulder.
  3. Show them your monitor.
  4. Say: does this make sense? Because I’m about to push it to prod.
  5. Listen to what the other programmer says.
  6. Make your own decision about whether to change the code and/or push to prod.

That’s it.

Now, I can sell you lots of sophisticated code review tools and I can then spend a bazillion hours customizing those tools. But all that code review tools can do is facilitate the process described above.

Simple tools for code review

Here are the simple code review tools I’d recommend if your organization employs fewer than 5,000 software engineers:

  1. less
  2. diff, or wdiff where line breaks are lacking
  3. GitHub pull requests

That’s it. There are tons of code review tools out there and I’ve never heard one engineer say they loved any of them. As far as I’m concerned, less, diff and pull requests will scale pretty well as long as your primary use case is asking another human to verify your code is not insane.
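A minimal session with those tools might look like this. The file names and contents here are made up for illustration; the point is that plain diff and a pager are enough to ask “does this change make sense?”

```shell
# Two versions of a (hypothetical) config file, and a line-by-line
# review of the proposed change.
printf 'timeout=30\nretries=3\n' > current.conf
printf 'timeout=60\nretries=3\n' > proposed.conf

diff -u current.conf proposed.conf | less
```

For files with few line breaks (configs, minified assets), substitute wdiff for diff to compare word by word.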

If your code review process is so complex that it requires a ton of tooling, that’s almost certainly because you’ve acquired some unnecessary ceremony that you could remove. Bring the pain to me on Twitter if you disagree ;-D

Gandalf with a laptop


How to write tests for a system that doesn’t have any

knit graffiti

Divide and conquer the system by extracting parameters.

Note that I use the term “parameters” here very loosely to mean “any input that the system will accept, in whatever form.” A binary file might be considered a “parameter” for testing purposes. Especially if the system you want to test can only consume binary files and has no programmatic interface!

For each parameter so extracted, it should be possible to write several meaningful tests around the interesting values that can be assigned to the newly extracted parameter.

Software is arbitrarily decomposable so in practice it simply takes time and smart engineers to locate appropriate inputs and outputs for meaningful testing.

This approach is described in gory detail in Working Effectively With Legacy Code. tl;dr: locate the meaningful inputs and outputs first and only then begin to design specific tests. Because the available inputs and outputs are a hard constraint on test design.

Do Whatever It Takes(tm) to get the system under test.

All systems have inputs and outputs. These can include:

  • accepting and sending messages over various TCP protocols
  • making database queries
  • writing and reading files from the filesystem

And all systems have domain languages including:

  • API specifications (documented or not)
  • "obviously intuitive" parameter orderings
  • expected file formats

Although these hidden languages are rarely elegant, identifying and using them makes the difference between safely adding tests to a system, and not.
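One concrete way to use those inputs and outputs is a golden-file test: feed the system a known input and diff its output against a saved known-good copy. Everything here is a sketch; legacy_tool and the fixture file names are hypothetical stand-ins for the real system under test.

```shell
# Golden-file test sketch for a black-box program that reads stdin
# and writes stdout. ./legacy_tool is a stand-in for the real system.
run_golden_test() {
  input="$1" golden="$2"
  actual=$(mktemp)
  ./legacy_tool < "$input" > "$actual" 2>&1
  if diff -u "$golden" "$actual" >/dev/null; then
    echo "PASS: $input"
  else
    echo "FAIL: $input (output differs from $golden)"
  fi
  rm -f "$actual"
}
```

Usage: `run_golden_test fixtures/in.bin fixtures/golden.bin`, once per interesting input.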

Oh we can’t test that part of the system because it is a black box.



Boris Beizer in 1990, detailing the phenomenon Michael Feathers later described as “pinch points.”

Call them what you will, such points-of-separation in a monolithic system represent opportunities to imbue a monolithic legacy system with modern attributes!


How software systems learn: what happens after they’re built

How Buildings Learn by Stewart Brand

As software designers we have to look beyond our own discipline for insight into how to handle the complexity which we create. By only looking at software we would be limited to 60 years or so of direct experience. The insight provided by adapting lessons learned from IRL buildings opens up the possibility of applying tens of thousands of years of knowledge to the software engineering discipline. Such order-of-magnitude gains cannot be ignored!

Software Architecture concepts such as design patterns and “system maintenance” are in fact borrowed concepts from the field of IRL architecture! Brian Foote (Brian is credited with coining the phrase “Inversion of Control”) does a great job of explaining this in the canonical paper on legacy systems, Software Systems As Big Balls Of Mud (PDF).

While I first read Christopher Alexander (because he and his team created the idea of architectural design patterns) I think that the most directly useful “buildings architecture” :) book I’ve read so far is "How Buildings Learn: What Happens After They’re Built" by Stewart Brand, editor of The Whole Earth Catalog and co-founder of proto-Web community the WELL.

Here is the relevant excerpt from Software Systems As Big Balls Of Mud by Brian Foote. I come back to this passage over and over in the course of learning to move beyond “old world thinking,” or as it is lovingly called in my niche of the industry: “old school QA.”

When designers are faced with a choice between building something elegant from the ground up, or undermining the architecture of the existing system to quickly address a problem, architecture usually loses. Indeed, this is a natural phase in a system’s evolution [Foote & Opdyke 1995]. This might be thought of as messy kitchen phase, during which pieces of the system are scattered across the counter, awaiting an eventual cleanup. The danger is that the clean up is never done. With real kitchens, the board of health will eventually intervene. With software, alas, there is seldom any corresponding agency to police such squalor. Uncontrolled growth can ultimately be a malignant force. The result of neglecting to contain it can be a BIG BALL OF MUD.

In How Buildings Learn, Brand [Brand 1994] observed that what he called High Road architecture often resulted in buildings that were expensive and difficult to change, while vernacular, Low Road buildings like bungalows and warehouses were, paradoxically, much more adaptable. Brand noted that Function melts form, and low road buildings are more amenable to such change. Similarly, with software, you may be reluctant to desecrate another programmer’s cathedral. Expedient changes to a low road system that exhibits no discernable architectural pretensions to begin with are easier to rationalize.

In the Oregon Experiment [Brand 1994][Alexander 1988] Alexander noted:

Large-lump development is based on the idea of replacement. Piecemeal Growth is based on the idea of repair. … Large-lump development is based on the fallacy that it is possible to build perfect buildings. Piecemeal growth is based on the healthier and more realistic view that mistakes are inevitable. … Unless money is available for repairing these mistakes, every building, once built, is condemned to be, to some extent unworkable. … Piecemeal growth is based on the assumption that adaptation between buildings and their users is necessarily a slow and continuous business which cannot, under any circumstances, be achieved in a single leap.

Alexander has noted that our mortgage and capital expenditure policies make large sums of money available up front, but do nothing to provide resources for maintenance, improvement, and evolution [Brand 1994][Alexander 1988]. In the software world, we deploy our most skilled, experienced people early in the lifecycle. Later on, maintenance is relegated to junior staff, when resources can be scarce. The so-called maintenance phase is the part of the lifecycle in which the price of the fiction of master planning is really paid. It is maintenance programmers who are called upon to bear the burden of coping with the ever widening divergence between fixed designs and a continuously changing world. If the hypothesis that architectural insight emerges late in the lifecycle is correct, then this practice should be reconsidered.

Brand went on to observe Maintenance is learning. He distinguishes three levels of learning in the context of systems. The first is habit, where a system dutifully serves its function within the parameters for which it was designed. The second level comes into play when the system must adapt to change. Here, it usually must be modified, and its capacity to sustain such modification determines its degree of adaptability. The third level is the most interesting: learning to learn. With buildings, adding a raised floor is an example. Having had to sustain a major upheaval, the system adapts so that subsequent adaptations will be much less painful.

Again, the above passage is excerpted from Software Systems As Big Balls of Mud — thanks for reading all the way to the bottom! =D


Outages and service degradations are a NORMAL part of the software life cycle


Preventing all possible risks is mathematically impossible, and even if it were not, cost and time constraints would make it practically impossible.

Recovering from failure by contrast is a quality that contributes to the survival of the organization. In the words of Vince Lombardi: it is not how many times you get knocked down, but how many times you get back up, that determines the outcome.

Optimizing for rapid recovery and spending less time on preventing failures is therefore a very pragmatic attitude. Recovery is something that can potentially be practiced on a daily basis. And the ability to recover is easily quantifiable (e.g. did the site come back up or not?).

Risk prevention by contrast cannot be measured in any meaningful way — how do you assess the value of outages that were prevented? If an outage did not occur, how do you determine that the outage was prevented versus just… not occurring… ?

In The ETTO Principle, Erik Hollnagel suggests that we should study success with the same alacrity with which we study failure. John Allspaw echoes this idea in many of his talks and the idea forms part of the foundation for the practice of holding blameless post mortems around production incidents.

It is worth noting that studying the daily minutiae of everyday success (aka normal operation without any significant failures) already exists as an academic discipline: it is called Cultural Anthropology.

Implications for Software Quality Assurance

If, as Hollnagel says, failures are special cases of success, then that invalidates the entire concept of QA and/or Release Engineering as “the cop.” All strategies that rely on enforcement go out the window.

Enforcement is no longer an option because (as Hollnagel points out) the nature of complex systems is such that it is not possible to provide documented formal procedures for managing such systems. Success is necessarily an ad hoc affair. In general we accept this — except in the case of failure, where by convention we place blame on the first human who can be shown to have violated the documented procedures.

But if failure is instead systemic, then identifying a human as “responsible” for a failure doesn’t make sense. Modern software quality investigations must therefore move beyond the concept of human error and root cause.

Two kittens cooperating to deploy new code to production. Kitten power!


Using static analysis to quickly become familiar with a large codebase

business cat says sort every file in the codebase from a to z and start reading

Static analysis tools automate the process of reading code. Any decent static analyzer can quickly process a large codebase and help you to find interesting entry points into reading the source code.

The point of static analysis is to help you find potentially interesting places to start reading the code.

—Sebastian Bergmann

Something I get asked to do a lot is look at suites of automated tests, with an eye toward improving architecture. In such a case I not only have to familiarize myself with the tests but with the codebase under test as well. And I have to be able to do that regardless of the size of the codebase. Static analyzers are the first tool I reach for in such cases.

Outlined below are the heuristics I have found effective for quickly figuring out where the important code is located, no matter how large or complex the codebase.

Start with source control.

  1. Check out a fresh copy of the project’s source code repository (repo).
  2. How long does the repo take to download?
  3. How big is the repo?
  4. How many branches does the repo have?
  5. How many committers?
  6. Who were the recent committers?
  7. When was the first commit made (how far back does this repo go)?
  8. What files were edited recently?
  9. How many files tend to be touched by a single commit?
  10. Does the volume of files touched follow a pattern over time?
  11. How much code has been deleted from the repo over time?
  12. Who has made recent significant deletions of code?
  13. Run gitk on the repo and look at the output.
  14. Run gource on the repo and watch the movie.
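Many of the questions above reduce to one-liners run from inside the checked-out repo. Here is one way to bundle them, using only standard git subcommands; the “2 weeks” window is an arbitrary example.

```shell
# Run from inside a checked-out repo: a quick profile of the project
# using only standard git subcommands.
repo_overview() {
  git count-objects -vH                             # how big is the repo?
  git branch -r | wc -l                             # how many remote branches?
  git shortlog -sn HEAD                             # committers, ranked by commits
  git log --reverse --format=%ad HEAD | head -n 1   # date of the first commit
  # files touched in the last two weeks, most-edited first:
  git log --since='2 weeks ago' --name-only --format= HEAD \
    | sort | uniq -c | sort -rn | head
}
```

Usage: `cd some-repo && repo_overview`.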

PEDANTIC SIDE NOTE: I’ve grouped these heuristics under the umbrella of static analysis because each heuristic can be performed (and is useful) without running any code. Technically, analysis of the Git log is Code Churn analysis and not static analysis. Meh.


Next, look at project layout.

  1. First, what project assets are not checked in to source control?
  2. What is the top-level directory structure of the project?
  3. If there is a README file, what is in it?
  4. What’s in the CHANGELOG, if there is one?
  5. What file extensions are in the project?
  6. How much static client side code is there (how many JavaScript, CSS and HTML files)?
  7. How many application code files are there? Eg for a PHP project, how many .php files?
  8. What is the overall directory structure of the project? In other words — spend some time paging through the output of tree | less
  9. How many files and directories are there in all?
  10. Are there tests?
  11. Do the tests use an open-source xUnit test runner or BDD framework?
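The file-extension and file-count questions can be answered from the project root with find and a little text munging. This is a sketch; it simply skips the .git directory and histograms whatever extensions it sees.

```shell
# Histogram of file extensions, most common first, skipping .git:
find . -path ./.git -prune -o -type f -print \
  | sed -n 's/.*\.\([A-Za-z0-9]*\)$/\1/p' \
  | sort | uniq -c | sort -rn

# Total number of files:
find . -path ./.git -prune -o -type f -print | wc -l
```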

Look for trivially obvious smells.

  1. How many template files are there?
  2. Is there business logic in the templates?
  3. Is there any SQL in the templates?
  4. In the Git log, the code and the comments, what is the distribution of swear words?
  5. What is the distribution of words like “bug,” “kludge” and “hack”?
  6. In general, what is the distribution of sentiment words? (Eg does the word “confusing” occur a lot?)
  7. How many lines of code are in the codebase?
  8. How many lines of code tend to be in a file?
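The word-distribution questions reduce to grep. The word list below is only an example; extend it with whatever sentiment words you care about.

```shell
# Count lines in the codebase mentioning a "smell" word, skipping
# the .git directory:
grep -rni -E 'hack|kludge|fixme|todo|confusing' --exclude-dir=.git . | wc -l

# The same idea works on commit messages (run inside the repo):
#   git log --format='%s %b' | grep -ci -E 'hack|kludge|revert'
```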

Look for smells that can be trivially detected by an open-source tool.

  1. For each language in the code base, does every file in that language pass the language interpreter’s lint check?
  2. What kinds of and how many style violations are found when running the style checker for each language in its default configuration?
  3. What is the distribution of cyclomatic complexity of files?
  4. What is the distribution of npath complexity?
  5. How many unique tokens tend to be in a file?
  6. What is the ratio of comments to code?
  7. How many lines of code tend to be in a method?
  8. How many lines of code in a class?
  9. Duplicated code — how much and what is it? More importantly, why?
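The lint question can be automated per language. Shell scripts are shown here as the example; swap in the interpreter for each language in the codebase (php -l for PHP, python -m py_compile for Python, and so on). The -r flag assumes GNU xargs.

```shell
# Does every shell script in the project pass the parser's check?
find . -name '*.sh' -print0 | xargs -0 -r -n1 sh -n \
  && echo "all shell scripts parse"
```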

a cat that fell asleep trying to read the entire internet


What is the difference between testing and debugging?

woman riding on the edge of a surfboard

Debugging starts with broken code and works backward from there to get to working code.

Testing assumes the code works and builds on that.

Testing represents a real competitive advantage over debugging because testing is a practice that can be applied to working code rather than broken code. And large, successful Web applications are universally “grown organically” from smaller, working Web applications; again by applying a continuous stream of small improvements to working code.

The idea that large systems are grown from smaller systems is supported by Fred Brooks in No Silver Bullet:

One always has, at every stage in the process, a working system. I find that teams can grow much more complex entities… than they can build.

Brian Foote has also recognized this in his "Keep It Working" design pattern. And chapter 2 of Refactoring by Martin Fowler, makes explicit the relationship between well-tested code and working code.

This is not to say that the practice of debugging isn’t useful!

Code does break. In such cases, debugging skills are invaluable. Rather than eliminate debugging, the point is to use testing to minimize the amount of time spent debugging.

Additionally it’s worth noting that the class of tools called debuggers is useful for more than strictly debugging broken code. Debuggers are useful for stepping through code, inspecting the values held in memory, and many other activities that contribute to the central software engineering goal of checking one’s assumptions in every way that is feasible and as soon as is convenient.

The whole time I am programming, I’m testing my assumptions… I use whatever tools are available.

— Rasmus Lerdorf

an armadillo lizard biting its tail like an orobourous, to represent the idea of a feedback loop


How to be an open source hacker in the enterprise and get away with it

oakland raiders / tuskan raider t-shirt from watch your back nyc, hat tip to

I was able to successfully build a career as an open source software consultant while working full-time for enterprise software organizations.

This guide is based upon over ten years of my own personal experience working in enterprise Web shops in New York while contributing to the open source community and open source software projects. In all my years of doing this, I’ve never once been questioned on it by my employer. Those employers who did bring it up saw my open source contributions as a pure competitive advantage and in some cases they even asked me how other engineers could replicate my success :)

Following these six simple rules has led to my repeated successes in getting enterprise software organizations to pay me to contribute to open source.

  1. I never, ever worked on an open-source project that was in any way related to my employer’s core business. For instance, if you work for an eCommerce company (as I generally did), do not get publicly involved with open-source eCommerce platforms.
  2. Always in everything, I remembered Grace Hopper’s advice: "it is better to ask forgiveness than to ask permission."
  3. Seriously: go back and read the previous rule again. I never, ever asked my employer to support me in my efforts to contribute to open source.
  4. I never asked my employer if it was OK for me to contribute to open source.
  5. I never mentioned or advertised to my employer that I was contributing to open source.
  6. In the eventuality that my employer discovered I had made contributions to open source, I acknowledged the fact, tried to downplay it, and let the conversation move on.

Note that while rule 1 states not to get involved with projects that might in some way be perceived as competitive with your employer, I did focus on contributing to projects that produce tools for creating components of software systems similar to my employer’s. That is: I contributed to infrastructure tools for building eCommerce applications.

Infrastructure is a good area to contribute to in terms of open source, because no one knows how to directly monetize tools like test runners and IRC robots. So no executive will ever come around after the fact asking why The Company didn’t get a chance to market said system before it was open-sourced. But at the same time, such systems are very valuable and can attain a widespread installed base.

In conclusion I’d emphasize that I followed all the above practices — no half-measures. I attribute my success to strict and painstaking adherence to strategy.

DISCLAIMER: I am not a lawyer. Your ability to participate in open source projects may be more severely constrained than I was, due to different agreements you have with your employer. Where doubt exists, it is always advisable to have a lawyer review any relevant contracts.


How I number my releases

space shuttle launch

This is the numbering scheme I use for releases and milestones on my own projects. If you use any of my code then you might encounter this version numbering system.

v00.00 <- this format will always sort filesystem assets correctly into
       the expected numerically ascending order, regardless of
       whether the sort algorithm in use is numerical or lexical

v00.00 <- prototype

v00.01 <- odd numbers indicate dev milestones

v00.13 <- example of another dev milestone

v00.02 <- even numbers indicate public milestones (eg: showed
       prototype to stakeholder, added some comments with UX research
       findings notes)

v00.24 <- example of another public milestone

v01.00 <- dev major version (eg: refactored ORM, added a staging
       environment, stopped using hotfix branches, etc.)

v02.00 <- public major release

Implicit is the assumption there is only one major dev milestone between each public release. Which is as it should be!
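The sorting property is easy to demonstrate: thanks to the zero padding, even a plain lexical sort puts the version strings in the right order.

```shell
# Lexical sort agrees with numeric order because of the zero padding.
printf 'v00.13\nv01.00\nv00.02\nv02.00\nv00.01\n' | sort
# → v00.01, v00.02, v00.13, v01.00, v02.00
```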