Please stop trying to sell me on management positions while offering a “QA Engineer” title and salary
Recently I was sent a job posting that embodied what I will call the Pernicious Myth of the QA Automation Engineer. Here’s just one line:
Own automated testing capabilities across Web, mobile and production processes.
An engineer cannot by definition take responsibility (that’s what “ownership” means — right?) for features that cut across multiple groups. In all but the smallest companies, Web and mobile would be managed by different teams. With different MANAGERS.
Engineers don’t take responsibility for the behavior of MULTIPLE managers across different teams.
This is a Director’s job.
Managing software capabilities across multiple teams is the job of a mid-level Engineering manager such as a director or (in larger organizations) a VP.
Why not? Because decades of computer engineering experience show that it never works.
Management (with all its wondrous hierarchical levels) is responsible for behavior of people within and across teams. Engineers and designers are responsible for the behavior and organization of the product. Not the people. People are a management problem. Especially at scale. Organizations that forget this fail.
Takeaway: do not sign on for a director’s job at an engineer’s salary
I’m not saying no one should take responsibility for the “tests and testability” of an application or service.
What I am saying is that someone should be explicitly responsible for testing across the whole organization and that person should be at the director or executive level. Never at the engineer or team lead level. Ever.
There is a widespread misconception that QA should own testing. This is false. Not only is it false but it doesn’t even make sense. You cannot have a discrete group that “owns” testing because testing is integral to the activities of both engineers and designers. Organizations often make a choice to ignore this, but the choice is based on cultural inertia rather than science.
Engineering and Product always own testing because testing is a design process. If QA performs all testing and devises test plans, then that’s an explicit decision by Engineering and Product groups to try not to own testing.
As an engineer-or-designer, you can try not to own testing, but it doesn’t work. Because testing is part of the engineering and design process. Testing is not an activity in itself. Rather it is an aspect of the larger activity of designing and building software systems.
I’ve been studying the design and maintenance of software systems for a while now, and these 14 articles have helped me to attain a higher level of understanding of the domain. I thought I’d share them with you.
Any "IT Crisis" then can be re-understood as hitting the steep rightward end of the diseconomies of scale curve (shown below) with costs spiking at the beginning and end of the life of the organization. The spike at the end is due to communication costs, and, again, CI mitigates communication costs, at least for an engineering team.
Preventing risks entirely can be shown to be mathematically impossible, besides which it would be practically impossible due to cost and time constraints.
Recovering from failure by contrast is a quality of organizations that incontrovertibly contributes to the survival of the organization. In the words of Vince Lombardi: it is not how many times you get knocked down, but how many times you get back up, that determines the outcome.
Optimizing for rapid recovery
Optimizing for rapid recovery — even at the expense of being less effective at preventing failures — is a most pragmatic attitude. Recovery is something that can potentially be practiced on a daily basis.
And as an added bonus: the ability to recover is easily quantifiable (eg: did the site come back up or not?). Risk prevention by contrast cannot be measured in any meaningful way: how do you assess the value of outages that were prevented? If an outage did not occur, how do you determine that the outage was prevented versus just… not occurring… ? You don’t. You just… guess. Optimizing for recovery instead removes the guessing and allows your team to focus on quantifiably improving the robustness of the system!
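To make "easily quantifiable" concrete, here is a minimal sketch of one way to put a number on recovery: the mean time between an outage starting and the site coming back up. The outage log and the function name `mean_time_to_recovery` are hypothetical, invented for illustration; a real team would pull these timestamps from its own monitoring.

```python
from datetime import datetime, timedelta

# Hypothetical outage log: (time the site went down, time it came back up).
outages = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 45)),
    (datetime(2024, 2, 11, 14, 30), datetime(2024, 2, 11, 14, 50)),
    (datetime(2024, 3, 7, 22, 10), datetime(2024, 3, 7, 22, 35)),
]


def mean_time_to_recovery(records):
    """Average duration between failure and recovery."""
    durations = [up - down for down, up in records]
    return sum(durations, timedelta()) / len(durations)


mttr = mean_time_to_recovery(outages)
print(mttr)
```

A number like this can be tracked release over release, which is exactly what cannot be done with a count of outages that never happened.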
…my contention [is] that automatic computers represent a radical novelty and that only by identifying them as such can we identify all the nonsense, the misconceptions and the mythology that surround them. Closer inspection will reveal that it is even worse, viz. that automatic computers embody not only one radical novelty but two of them.
The first radical novelty is a direct consequence of the raw power of today’s computing equipment. We all know how we cope with something big and complex; divide and rule, i.e. we view the whole as a compositum of parts and deal with the parts separately. And if a part is too big, we repeat the procedure. The town is made up from neighbourhoods, which are structured by streets, which contain buildings, which are made from walls and floors, that are built from bricks, etc. eventually down to the elementary particles. And we have all our specialists along the line, from the town planner, via the architect to the solid state physicist and further. Because, in a sense, the whole is “bigger” than its parts, the depth of a hierarchical decomposition is some sort of logarithm of the ratio of the “sizes” of the whole and the ultimate smallest parts. From a bit to a few hundred megabytes, from a microsecond to a half an hour of computing confronts us with completely baffling ratio of 10^9! The programmer is in the unique position that his is the only discipline and profession in which such a gigantic ratio, which totally baffles our imagination, has to be bridged by a single technology. He has to be able to think in terms of conceptual hierarchies that are much deeper than a single mind ever needed to face before. Compared to that number of semantic levels, the average mathematical theory is almost flat. By evoking the need for deep conceptual hierarchies, the automatic computer confronts us with a radically new intellectual challenge that has no precedent in our history.
Again, I have to stress this radical novelty because the true believer in gradual change and incremental improvements is unable to see it. For him, an automatic computer is something like the familiar cash register, only somewhat bigger, faster, and more flexible. But the analogy is ridiculously shallow: it is orders of magnitude worse than comparing, as a means of transportation, the supersonic jet plane with a crawling baby, for that speed ratio is only a thousand.
The second radical novelty is that the automatic computer is our first large-scale digital device. We had a few with a noticeable discrete component: I just mentioned the cash register and can add the typewriter with its individual keys: with a single stroke you can type either a Q or a W but, though their keys are next to each other, not a mixture of those two letters. But such mechanisms are the exception, and the vast majority of our mechanisms are viewed as analogue devices whose behaviour is over a large range a continuous function of all parameters involved: if we press the point of the pencil a little bit harder, we get a slightly thicker line, if the violinist slightly misplaces his finger, he plays slightly out of tune. To this I should add that, to the extent that we view ourselves as mechanisms, we view ourselves primarily as analogue devices: if we push a little harder we expect to do a little better. Very often the behaviour is not only a continuous but even a monotonic function: to test whether a hammer suits us over a certain range of nails, we try it out on the smallest and largest nails of the range, and if the outcomes of those two experiments are positive, we are perfectly willing to believe that the hammer will suit us for all nails in between.
It is possible, and even tempting, to view a program as an abstract mechanism, as a device of some sort. To do so, however, is highly dangerous: the analogy is too shallow because a program is, as a mechanism, totally different from all the familiar analogue devices we grew up with. Like all digitally encoded information, it has unavoidably the uncomfortable property that the smallest possible perturbations —i.e. changes of a single bit— can have the most drastic consequences. [For the sake of completeness I add that the picture is not essentially changed by the introduction of redundancy or error correction.] In the discrete world of computing, there is no meaningful metric in which “small” changes and “small” effects go hand in hand, and there never will be.
“Hierarchical systems seem to have the property that something considered as an undivided entity on one level, is considered as a composite object on the next lower level of greater detail; as a result the natural grain of space or time that is applicable at each level decreases by an order of magnitude when we shift our attention from one level to the next lower one. We understand walls in terms of bricks, bricks in terms of crystals, crystals in terms of molecules etc. As a result the number of levels that can be distinguished meaningfully in a hierarchical system is kind of proportional to the logarithm of the ratio between the largest and the smallest grain, and therefore, unless this ratio is very large, we cannot expect many levels. In computer programming our basic building block has an associated time grain of less than a microsecond, but our program may take hours of computation time. I do not know of any other technology covering a ratio of 10^10 or more: the computer, by virtue of its fantastic speed, seems to be the first to provide us with an environment where highly hierarchical artifacts are both possible and necessary.”—Edsger Dijkstra
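Dijkstra's remark about single-bit perturbations is easy to try for yourself. The sketch below is my own illustration, not anything from his text: the helper `flip_bit` is a hypothetical name, and it assumes the usual 64-bit IEEE 754 encoding of a Python float. It inverts exactly one bit of the number 1.0, first at the low end of the encoding and then near the high end.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding inverted."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


original = 1.0
low = flip_bit(original, 0)    # lowest bit of the mantissa
high = flip_bit(original, 62)  # highest bit of the exponent

print(low)
print(high)
```

Both changes are the same "size" (one bit), yet the first nudges the value by about 2.2e-16 and the second turns 1.0 into infinity. No pencil or hammer behaves like that.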
When you write comments in source code, the audience is yourself in the future because what you know now is so complex, you may never be able to remember it again. Programming is the only professional discipline that universally encourages the practice of writing notes addressed to one’s future self.
Conway's Law and how no one wants to talk about it
"organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations"
—Mel Conway, 1968
Frequently, the organization of the system reflects the sprawl and history of the organization that built it (as per Conway’s Law) and the compromises that were made along the way.
— Brian Foote & Joseph Yoder, 1999
If you want a product with certain characteristics, you must ensure that the team has those characteristics before the product’s development.
— Jim and Michele McCarthy, 2002
No software can be better organized than the team that creates it.
I’ve been a consultant and a Web developer in many, many organizations. I can tell you one thing no one anywhere wants to hear is that the organization is straight-up incapable of delivering certain features.
Certain kinds of organizations can’t deliver certain kinds of software systems.
Conway’s law says that if you have known “really bad communication problems” in your organization, then you are constrained such that your organization cannot produce a software system that has efficient architecture when it comes to communication between services and/or the components that make up those services.
Corollary To Conway’s Law
It’s too bad people have so much trouble admitting that Conway’s Law is their actual constraint. If you want to work around architectural problems in your software, you can look for direct analogues in how the people in your organization work around the communication problems. Sometimes by solving communication problems in your organization, you can cause problems in the software to disappear without any explicit fix being applied in code.
“More than the act of testing, the act of designing tests is one of the best bug preventers known. The thinking that must be done to create a useful test can discover and eliminate bugs before they are coded — indeed, test-design thinking can discover and eliminate bugs at every stage in the creation of software, from conception to specification, to design, coding and the rest.”—Boris Beizer
The Pesticide Paradox as explained by Boris Beizer
The following is excerpted from Software Testing Techniques, 2d. Ed. by Boris Beizer.
First Law: The Pesticide Paradox
Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffectual.
That’s not too bad, you say, because at least the software gets better and better. Not quite!
Second Law: The Complexity Barrier
Software complexity (and therefore that of bugs) grows to the limits of our ability to manage that complexity.
Corollary to the First Law: Test suites wear out.
Yesterday’s elegant, revealing, effective test suite will wear out because programmers and designers, given feedback on their bugs, do modify their programming habits and style in an attempt to reduce the incidence of bugs they know about. Furthermore, the better the feedback, the better the QA, the more responsive the programmers are, the faster those suites wear out. Yes, the software is getting better, but that only allows you to approach closer to, or to leap over, the previous complexity barrier. True, bug statistics tell you nothing about the coming release, only the bugs of the previous release — but that’s better than basing your test technique strategy on general industry statistics or myths. If you don’t gather bug statistics, organized into some rational taxonomy, you don’t know how effective your testing has been, and worse, you don’t know how worn out your test suite is. The consequences of that ignorance is a brutal shock. How many horror stories do you want to hear about the sophisticated outfit that tested long, hard, and diligently — sent release 3.4 to the field, confident that it was the best tested product they had ever shipped — only to have it bomb more miserably than any prior release?
As I found out later, these aphorisms are Ha-Ha Only Serious. That is: they sound funny. But take them seriously and you can design complex systems that grow and evolve and continue to deliver value over the long term.
As I later found out at my own expense, these aphorisms are Ha-Ha Only Serious. Jokes like this exist in the hacker culture because complex systems exhibit behavior that is so counterintuitive that… well sometimes you just have to laugh.
There are lots of fancy “code review tools” available, but all Code Review really means is: another human being reads your code and you can talk to them about it, before you merge into the mainline.
Half a dozen command-line code review tips
For most teams, code review can begin with diff and less — the simple command line tools that everyone already uses to read code. Add a couple of static analyzers on top of an ad hoc code review culture, and you could build a process that can scale to several thousand engineers before you actually need a dedicated code review tool!
Gresham’s Law states that bad money drives out good: counterfeit or debased currency will tend to be exchanged by otherwise honest actors, who hold on to the sound currency. What does that have to do with software engineering? In the programming world, code is the currency of exchange. So Gresham’s Law in the programming world is: bad code will tend to get written by otherwise intelligent engineers.
How does Gresham’s law apply to test coverage?
Consider the case where engineers are asked by management to contribute unit tests such that code coverage remains at/above a numerical target such as 80%. There is by definition no direct business benefit to providing these tests, since tests are never seen by the customers. Therefore if it is possible to fake test contributions by gaming test coverage metrics, then engineers will tend to regard this subversion as the only ethically viable choice. Time not spent on test coverage is time spent increasing business ROI.
There is a mismatch between what science knows and what business does
Psychologist Dan Pink is very interesting to me because he makes the scholarly argument for working the way I work!
Autonomy, mastery and purpose.
How do I work? Well, I don’t worry about money other than paying rent and other basic necessities. I don’t worry about career advancement at any one company because I follow my own interests and those cut across organizations.
Dan Pink makes a great argument for working the way I work!
I focus entirely on doing work in which I am interested because, as long as the work remains interesting, I will work hard without having to force myself to do it, I will actually enjoy it, and in the course of enjoying my work I will (experience shows) work with clients who are optimally satisfied with my work.
The way I work makes sense once you look at the science!
Here are my notes on Dan Pink’s TED Talk. I would suggest watching the whole thing (embedded above) before reading my response — but if you want to treat this post like a skimmable cliffs-notes-version of the talk, that’s OK too! I’ve provided a transcript of the relevant bits of Dan’s talk, below.
When you encourage people to chase rewards like bonuses then you actually rob them of their creativity
Extrinsic rewards cause us to narrowly focus our attention. This works for repetitive tasks such as an assembly line. But narrow focus is ineffective for creative work.
“Financial incentives can have a negative impact on performance … Organizations are making decisions based on information that is outdated, unexamined, and rooted more in folklore than in science”
engagement > compliance
"Traditional management is great if you want compliance. But for engagement, self-direction works better." And engagement is more important than compliance when it comes to knowledge workers.
It shouldn’t matter where, when or how people do their work as long as the work gets done. Meetings are optional. In these situations so far, it appears that productivity goes up and turnover goes down.
intrinsic motivation > extrinsic motivators
Carrots and sticks are ineffective motivators when it comes to software engineers (and knowledge workers in general). What does motivate us is: being in an environment where we have autonomy, mastery and purpose.
Conditional motivators (“if you do this then you will be rewarded with that”) are not effective with creative tasks. In fact they make performance worse and this has been replicated in many experiments over several decades.
If-then rewards work really well for those sorts of tasks, where there is a simple set of rules and a clear destination to go to. Rewards, by their very nature, narrow our focus, concentrate the mind; that’s why they work in so many cases. And so, for tasks like this, a narrow focus, where you just see the goal right there, zoom straight ahead to it, they work really well. …
Think about your own work. Are the problems that you face, … do they have a clear set of rules, and a single solution? No. The rules are mystifying. The solution, if it exists at all, is surprising and not obvious. …
"Do the problems that you face have a clear set of rules and a single solution?"
Ladies and gentlemen of the jury, some evidence: Dan Ariely, one of the great economists of our time, he and three colleagues, did a study of some MIT students. They gave these MIT students a bunch of games, games that involved creativity, and motor skills, and concentration. And they offered them, for performance, three levels of rewards: small reward, medium reward, large reward. Okay? If you do really well you get the large reward, on down. What happened? As long as the task involved only mechanical skill, bonuses worked as they would be expected: the higher the pay, the better the performance. Okay? But once the task called for even rudimentary cognitive skill, a larger reward led to poorer performance. …
There is a mismatch between what science knows and what business does. And what worries me, as we stand here in the rubble of the economic collapse, is that too many organizations are making their decisions, their policies about talent and people, based on assumptions that are outdated, unexamined, and rooted more in folklore than in science. And if we really want to get out of this economic mess, and if we really want high performance on those definitional tasks of the 21st century, the solution is not to do more of the wrong things, to entice people with a sweeter carrot, or threaten them with a sharper stick. We need a whole new approach.
"There is a mismatch between what science knows and what business does."
And the good news about all of this is that the scientists who’ve been studying motivation have given us this new approach. It’s an approach built much more around intrinsic motivation. Around the desire to do things because they matter, because we like it, because they’re interesting, because they are part of something important. And to my mind, that new operating system for our businesses revolves around three elements: autonomy, mastery and purpose. Autonomy: the urge to direct our own lives. Mastery: the desire to get better and better at something that matters. Purpose: the yearning to do what we do in the service of something larger than ourselves. These are the building blocks of an entirely new operating system for our businesses. …
There is a mismatch between what science knows and what business does. And here is what science knows. One: Those 20th century rewards, those motivators we think are a natural part of business, do work, but only in a surprisingly narrow band of circumstances. Two: Those if-then rewards often destroy creativity. Three: The secret to high performance isn’t rewards and punishments, but that unseen intrinsic drive — the drive to do things for their own sake. The drive to do things cause they matter.
Here are some of my observations on the research of psychologist Dan Ariely.
Ariely’s work has remained important to me because in large part he studies what motivates us (human beings) in our day-to-day decisions at work. I’d recommend watching the whole talk (above) before reading this post — but if you want to treat this as a sort of skimmable “cliff’s notes” version of the TED talk then that’s ok too :) I’ve even provided a transcript of the relevant bits of Ariely’s talk, below.
Here are my notes…
People particularly did not work well when also watching their previous work getting torn down at the same time.
Interestingly this is a characteristic experience for many teams deploying software:
each release immediately exhibits unintended behavior in production. One is faced with a mounting production bug count accrued from the previous releases.
At the same time, one is trying to pour all of one’s efforts into the next release, which one knows will be equally buggy.
Repeat until developers are bored of the cycle and begin to quit and/or find other projects…
…hire new developers and repeat.
The research that is described here goes some way toward explaining the why of the phenomenon described above :-\
It was not a very meaningful task [that we used in our experiments] but even a little bit of meaning made a [significant] difference.
People are motivated by the sense that they are building something. What they’re building does not seem to be particularly important.
What does seem to matter is that people have a consistent sense that each problem they solve somehow “builds on” the set of problems they have solved in the past. With this “sense of meaning” intact, people will happily work very hard so long as their other basic needs are taken care of.
Without the “sense of meaning,” no amount of coercion or money will get people to solve problems in a creative fashion.
Additionally it seems from the experimental evidence that in the absence of a “sense of meaning,” increasing the size of the reward or punishment actually makes workers’ output significantly worse when it comes to creative breakthroughs.
A theme throughout Ariely’s research is the difference between assembly-line tasks (that is, simple tasks with an obvious or predetermined solution) versus “knowledge worker” tasks; that is: tasks requiring creative problem-solving, and/or for which no solution is known to exist.
All of the various work roles that fall under the umbrella of “the tech industry” unquestionably fall in the category of knowledge worker tasks.
It is the migration of “industrial-age” tasks to the third world and to mechanical automation that is responsible for the growing interest in what motivates knowledge workers. Knowledge work is now the primary source of high-paying employment that we can reasonably expect to be a growing sector in the West in the near future. Most if not all middle-class people in the West will be knowledge workers within less than a generation; arguably this transition has already taken place. Therefore it is possible to make a fairly convincing case that organizations compete best when they foster a conducive environment for knowledge work.
Conversely it can be argued that organizations clinging to “industrial age” or “assembly-line” practices, will find themselves unable to compete no matter what other factors may be in their favor (eg great funding, good product concept, widespread brand identity, etc. — it won’t matter if all their programmers are bored and quitting after a year to 18 months).
Ignoring or refusing to acknowledge the existence of the results of people’s work had about the same negative impact on productivity as destroying that work in front of them.
In the “shredder condition” people could have done worse work and made more money but they didn’t. Instead their motivation dropped. In other words, when faced with a Sisyphean task people lost even their motivation to game the system. They simply wanted to get out of the situation as fast as possible.
People are motivated by the perception that the work they are doing is meaningful to others.
There was another condition. This other condition was inspired by David, my student. And this other condition we called the Sisyphic condition. And if you remember the story about Sisyphus, Sisyphus was punished by the gods to push the same rock up a hill, and when he almost got to the end, the rock would roll over, and he would have to start again. And you can think about this as the essence of doing futile work. You can imagine that if he pushed the rock on different hills, at least he would have some sense of progress. Also, if you look at prison movies, sometimes the way that the guards torture the prisoners is to get them to dig a hole and when the prisoner is finished, they ask him to fill the hole back up and then dig again. There’s something about this cyclical version of doing something over and over and over that seems to be particularly demotivating. So in the second condition of this experiment, that’s exactly what we did. We asked people, “Would you like to build one Bionicle for three dollars?” And if they said yes, they built it. Then we asked them, “Do you want to build another one for $2.70?” And if they said yes, we gave them a new one, and as they were building it, we took apart the one that they just finished. And when they finished that, we said, “Would you like to build another one, this time for 30 cents less?” And if they said yes, we gave them the one that they built and we broke. So this was an endless cycle of them building and us destroying in front of their eyes.
Now what happens when you compare these two conditions? The first thing that happened was that people built many more Bionicles — they built 11 versus seven — in the meaningful condition versus the Sisyphus condition. And by the way, we should point out that this was not a big meaning. People were not curing cancer or building bridges. People were building Bionicles for a few cents. And not only that, everybody knew that the Bionicles would be destroyed quite soon. So there was not a real opportunity for big meaning. But even the small meaning made a difference.
Now we had another version of this experiment. In this other version of the experiment, we didn’t put people in this situation, we just described to them the situation, much as I am describing to you now, and we asked them to predict what the result would be. What happened? People predicted the right direction but not the right magnitude. People who were just given the description of the experiment said that in the meaningful condition people would probably build one more Bionicle. So people understand that meaning is important, they just don’t understand the magnitude of the importance, the extent to which it’s important.
The next experiment was slightly different. We took a sheet of paper with random letters, and we asked people to find pairs of letters that were identical next to each other. That was the task. And people did the first sheet. And then we asked them if they wanted to do the next sheet for a little bit less money and the next sheet for a little bit less money, and so on and so forth. And we had three conditions. In the first condition, people wrote their name on the sheet, found all the pairs of letters, gave it to the experimenter. The experimenter would look at it, scan it from top to bottom, say “uh huh” and put it on the pile next to them. In the second condition, people did not write their name on it. The experimenter looked at it, took the sheet of paper, did not look at it, did not scan it, and simply put it on the pile of pages. So you take a piece, you just put it on the side. And in the third condition, the experimenter got the sheet of paper and directly put it into a shredder. What happened in those three conditions?
In this plot I’m showing you at what pay rate people stopped. So low numbers mean that people worked harder. They worked for much longer. In the acknowledged condition, people worked all the way down to 15 cents. At 15 cents per page, they basically stopped these efforts. In the shredder condition, it was twice as much — 30 cents per sheet. And this is basically the result we had before. You shred people’s efforts, output, you get them not to be as happy with what they’re doing. But I should point out, by the way, that in the shredder condition, people could have cheated. They could have done not so good work, because they realized that people were just shredding it. So maybe the first sheet you would do good work, but then you see nobody is really testing it, so you would do more and more and more. So in fact, in the shredder condition, people could have submitted more work and gotten more money and put less effort into it. But what about the ignored condition? Would the ignored condition be more like the acknowledged or more like the shredder, or somewhere in the middle? It turns out it was almost like the shredder.
Now there’s good news and bad news here. The bad news is that ignoring the performance of people is almost as bad as shredding their effort in front of their eyes. Ignoring gets you a whole way out there. The good news is that by simply looking at something that somebody has done, scanning it and saying “uh huh,” that seems to be quite sufficient to dramatically improve people’s motivations. So the good news is that adding motivation doesn’t seem to be so difficult. The bad news is that eliminating motivations seems to be incredibly easy, and if we don’t think about it carefully, we might overdo it. So this is all in terms of negative motivation or eliminating negative motivation.
I have long been fascinated by the phenomenon of software teams pursuing a hard numerical target for code coverage. Therefore I have always made it a point to find out about this practice whenever I visit a software development shop.
Over the years it has always proved interesting to hear engineers’ responses to the following eleven questions:
Eleven weird old questions that will reveal whether your code coverage efforts are useful or just well-intentioned?
I try to ask these questions of engineers whenever discussing a new or existing test coverage project.
What is the specific, day-to-day benefit of covering every single line of code with a unit test?
What would be the specific, day-to-day benefit of achieving 80% code coverage?
How would a codebase (and a system) with 80% coverage behave differently than it does today?
How much worse would it be to achieve, say 60% coverage instead?
What about 79% coverage?
Why only 80% coverage as a goal — why not 90%?
What are the factors that contribute to system determinism?
Specifically how would increased test coverage contribute to system determinism?
When you talk about “code coverage” do you mean line coverage, branch coverage or statement coverage, or a combination of some-or-all of these?
Does your current code coverage metric include files that have no tests at all?
In other words, does your test coverage metric include all of your untested code, or do you only measure how well you have covered the code for which unit tests exist?
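The last two questions matter more than they might seem, because the answer changes the number itself. The sketch below uses made-up figures: the file names and line counts are hypothetical, and the `coverage` function is a plain ratio for illustration, not the calculation of any particular coverage tool. It computes the same codebase's coverage twice, once counting only the files that tests touch, and once counting every file.

```python
# Hypothetical per-file counts: (executable lines, lines run by the tests).
files = {
    "billing.py": (200, 180),
    "auth.py": (100, 90),
    "legacy_import.py": (700, 0),  # no test ever loads this file
}


def coverage(records):
    """Percentage of executable lines that were run by the tests."""
    records = list(records)
    total = sum(lines for lines, _ in records)
    hit = sum(covered for _, covered in records)
    return 100 * hit / total


# Only files that at least one test touched (what a team may be quoting).
touched_only = coverage(v for v in files.values() if v[1] > 0)

# Every file in the codebase, including the one with no tests at all.
whole_codebase = coverage(files.values())

print(touched_only, whole_codebase)
```

With these numbers the team can honestly report 90% or 27% for the same code on the same day, depending on which denominator the metric uses. That is why it is worth asking which one is meant before discussing any target.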
It is important to listen carefully to how these questions are answered. Does the team in fact have a reasoned answer for each of the 11 questions? Do the coverage metrics that are in use actually make sense from a business perspective? Has the team examined low-cost code quality strategies such as code review and static analysis? Is there an explicit mapping of test automation benefit to widespread organizational benefit, at least within the engineering team? Are the problems the team is facing actually soluble via the route of adding test coverage?
It is unfortunately very easy for humans to place undue faith in numerical targets. This is afaict a consequence of our psychology. That numerical targets are intrinsically deceptive is not a problem to be solved, rather it is a serious limitation that must be considered when designing test infrastructure.