Infinite Undo!


 
 

Hello, I'm Noah Sussman. I am a scientist studying technosocial systems in New York.

Mar 27th, Thu

Why is it so hard to design automated GUI-driven functional tests?

Google search suggestions for why is Jenkins

Google wants to correct “flaky tests” to “flaky SELENIUM tests.” LOL.

It’s really, really difficult to design automated tests that are useful at Web scale. The Web is littered with blog posts, Quora posts and StackOverflow questions about how to deal with the consequences of test systems that were built without the benefit of rigorous analysis.

What sort of analytics and algorithms are useful in designing tests?

To write effective automated tests, engineers have to switch their minds over to thinking about the whole system rather than a single unit of functionality. This is one reason why expert testers are so useful: they don’t have to context switch from “developer mode” in order to get work done.

The expert tester designs her tests by implicitly referencing a state diagram of a whole system. But, if you draw the state diagram of a system like, say, login and authentication — that’s a very complex set of paths.

So the expert tester also decomposes the state diagram of each feature into multiple paths. A state diagram is then implicitly constructed for each of those paths. She then prioritizes the paths and assesses the feasibility of actually implementing a computer program that can traverse those that are most important.

It’s never the case that all of the highest priority paths can actually be tested given our current technology. Figuring that out up front is essential, otherwise it’s very easy to blow all the available development time on a test that looks superficially easy but can’t actually be implemented.

How is test design done? And how specifically is it different from “customer-facing” software architecture?

Note that I keep saying “implicitly constructs a state diagram.” That’s because the expert tester is a domain expert who’s been building her skills for years. Plus, she’s gotten very good at state and path analysis. She doesn’t need to draw diagrams and perform explicit path reduction. She can do that stuff in her head. She doesn’t need explicit mathematical models any more than a front end engineer needs to see an explicit graph of the DOM.

That’s a longwinded way of saying: test design can look easy — when it’s being done by highly trained professionals. But everyone has to spend years climbing a very painful learning curve in order to get good at test design. In that respect test design is just like any other programming skill.

So it’s important to approach large-scale automated testing projects knowing that the work involved is not intuitive or obvious to engineers who design client-facing features for a living! There is a lot of specialized analysis that needs to happen before the first line of code is written. The expert tester does this analysis implicitly, and can get the job done very fast. But for engineers who don’t think about testing 24/7, it won’t hurt to grab a conference room and chart out some finite state diagrams on the whiteboard!

Remember: you cannot write a useful test unless you can represent the scenario as a finite state machine. If you can’t do that then the set of paths you have chosen is too complex and/or non-deterministic; and you need to re-apply the path reduction heuristic described above.
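
To make the finite state machine idea concrete, here is a minimal sketch (not from the original post) in Python. The login-flow states and transitions are hypothetical; the point is that once a scenario is written down as states and transitions, enumerating the paths a test could traverse becomes a mechanical exercise.

    # A hypothetical login flow expressed as a finite state machine:
    # state -> {action: next_state}
    from collections import deque

    LOGIN_FSM = {
        "logged_out":  {"submit_valid": "logged_in", "submit_invalid": "error_shown"},
        "error_shown": {"retry": "logged_out", "reset_password": "reset_sent"},
        "reset_sent":  {"follow_link": "logged_out"},
        "logged_in":   {},  # terminal state for this scenario
    }

    def enumerate_paths(fsm, start, goal, max_depth=5):
        """Breadth-first enumeration of action sequences from start to goal."""
        paths, queue = [], deque([(start, [])])
        while queue:
            state, actions = queue.popleft()
            if state == goal:
                paths.append(actions)
                continue
            if len(actions) < max_depth:
                for action, nxt in fsm[state].items():
                    queue.append((nxt, actions + [action]))
        return paths

    for path in enumerate_paths(LOGIN_FSM, "logged_out", "logged_in"):
        print(" -> ".join(path))

Each printed path is a candidate test case; prioritizing and pruning those paths is the path reduction step described above.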

Even a simple test case contains a surprisingly large number of execution paths

Mar 20th, Thu

Counting The Number Of Ways You Can Put Items Into A Shopping Cart: A Combinatorial Explosion Problem

knots represent complexity and interconnectedness aka entanglement

This problem illustrates how common, everyday Web functionality presents the test designer with a staggeringly large number of execution paths.

Imagine you have a shopping cart which can contain up to 100 different items in varying quantities. There are about 150 different items to choose from.

Now imagine you have been asked to design an automated test that exercises all possible combinations of items, in all valid quantities including zero.

How many variables do we need to think about here? There are 3 variables:

  1. capacity of the cart,
  2. number of different items, and
  3. how many of each item are available

How would you describe the complexity of this test? The number of paths that need to be exercised exhibits factorial, or O(n!), growth as we put more items into the cart.

How long would you expect this test to take to run? There are over 10^45 ways (that’s the sum of 150 choose i as i grows from 1 to 100) to fill the cart with just one of each item. And that’s with just one of each item; we haven’t even considered different quantities of items yet.


Assuming one test per microsecond, the runtime exceeds the age of the universe (only about 4 * 10^17 seconds). And that is still just a fraction of the number of paths in the full test case as described above, where not just all possible combinations of items, but all possible combinations in all valid quantities from 0 to 100, would be tested. I didn’t bother to do the math on that one, but it’s a lot of paths :)
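
For the skeptical, the arithmetic above is easy to reproduce. This is a minimal sketch in Python; the catalog size, cart capacity, and one-test-per-microsecond rate are the figures already assumed in this post.

    import math

    # Number of ways to choose which distinct items go into the cart
    # (1 to 100 items, quantity one of each), out of a 150-item catalog.
    combos = sum(math.comb(150, i) for i in range(1, 101))
    print(f"{combos:.3e}")                      # roughly 1.4e+45 paths

    # At one test per microsecond, how does the runtime compare with
    # the age of the universe (~4e17 seconds)?
    seconds_needed = combos * 1e-6
    print(f"{seconds_needed / 4e17:.1e} universe-lifetimes")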

This is a bad test. Besides being impossible to implement, what are we proving by testing every combination that we can’t prove by just testing a few edge cases?


Counting the Number of Paths Through A Series of Drop-Down Menus: A Combinatorial Explosion Problem

A complex system of nodes connected by edges.

This problem illustrates how a test scenario that originally was simple, can “explode” into numerous cases through the addition of “just a little bit more” functionality to the application under test!

Imagine a form with two drop-down menus. The first menu has two options and the second menu has three options. You must choose one option from each menu in order to submit the form.

How much time would you expect that it would take you to write and execute a test plan for this form?

    2 * 3 = 6

How does the answer change when we add a third drop-down menu that has four options?

    2 * 3 * 4 = 24

How does it change if I add two more menus with five and six options, respectively?

    2 * 3 * 4 * 5 * 6 = 720

How would your expectations about a manual test plan change when I add another drop-down menu with 7 options?

    2 * 3 * 4 * 5 * 6 * 7 = 5040

the number of paths grows factorially as menus are added

This test will not scale well because it exhibits factorial growth. Additionally the test’s design is bad: why do we need to test every single item on every single menu? Is every item handled by a different subroutine?
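
A quick sketch of the same arithmetic in Python. math.prod gives the path count directly, and itertools.product will happily start enumerating the concrete cases that a “test every combination” plan implies, which is exactly why such a plan stops scaling.

    import math
    from itertools import islice, product

    menu_sizes = [2, 3, 4, 5, 6, 7]
    print(math.prod(menu_sizes))            # 5040 paths through six menus

    # Enumerating the first few concrete cases (option indexes per menu):
    menus = [range(n) for n in menu_sizes]
    for case in islice(product(*menus), 3):
        print(case)                         # (0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 1), ...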

A pragmatic question to ask in this case would be: “what don’t we cover by just testing one menu in isolation?”


Testing A Multiple Select Menu: A Combinatorial Explosion Problem

a complex structure with many nodes and edges

This problem is illustrative of how the scope of a test scenario can “explode” through the introduction of “just a few more edge cases.”

Let’s say we have a multiple-select menu with 3 options. In order to submit this menu, you need to choose at least one option. Other than that you can choose as many options as you want, up to and including all of them.

  1. foo
  2. bar
  3. baz

How many tests do you need to perform in order to test every combination of items that a user could choose? In other words, how many different paths can a user take through this menu?


There are 7 paths: the number of ways you can choose at least one of 3 menu items is the sum of 3 choose i as i grows from 1 to 3.

Now what if I double the number of menu items?

  1. foo
  2. bar
  3. baz
  4. homer
  5. marge
  6. maggie

Now there are 63 paths — exponential growth has occurred!


What about with 20 options?

Now there are over one million paths through the test case as described above.


Such a test will not scale to even a moderately large number of menu items, because it exhibits exponential growth.
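
The counts above are easy to verify: choosing at least one of n options gives the sum of n choose i for i from 1 to n, which is 2^n - 1. A minimal check in Python:

    import math

    for n in (3, 6, 20):
        total = sum(math.comb(n, i) for i in range(1, n + 1))
        assert total == 2**n - 1
        print(n, total)                     # 3 -> 7, 6 -> 63, 20 -> 1048575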

O(2^n) growth for larger n

So this is a bad test. Besides being impossible to implement, what are we proving by testing every combination that we can’t prove by just testing a few edge cases?


How to write tests for a system that doesn’t have any

knit graffiti

Divide and conquer the system by extracting parameters.

Note that I use the term “parameters” here very loosely to mean “any input that the system will accept, in whatever form.” A binary file might be considered a “parameter” for testing purposes. Especially if the system you want to test can only consume binary files and has no programmatic interface!

For each parameter so extracted, it should be possible to write several meaningful tests around the interesting values that can be assigned to the newly extracted parameter.

Software is arbitrarily decomposable, so in practice it simply takes time and smart engineers to locate appropriate inputs and outputs for meaningful testing.

This approach is described in gory detail in Working Effectively With Legacy Code. tl;dr: locate the meaningful inputs and outputs first, and only then begin to design specific tests, because the available inputs and outputs are a hard constraint on test design.
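
As a sketch of what “testing around an extracted parameter” can look like, here is a hypothetical characterization test in the spirit of Working Effectively With Legacy Code. The legacy_report() function, its module, the fixture files, and the captured digest are all stand-ins rather than real APIs; the shape of the test is the point.

    import hashlib
    import unittest

    from legacy_system import legacy_report   # hypothetical module under test

    class LegacyReportCharacterizationTest(unittest.TestCase):
        def test_known_input_produces_known_output(self):
            # The binary fixture file is the extracted "parameter"; the expected
            # digest would be captured from a run of the current, working system.
            output = legacy_report("fixtures/known_good_input.bin")
            digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
            self.assertEqual(digest, "<digest captured from the working system>")

        def test_empty_file_is_rejected(self):
            with self.assertRaises(ValueError):
                legacy_report("fixtures/empty.bin")

    if __name__ == "__main__":
        unittest.main()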

Do Whatever It Takes(tm) to get the system under test.

All systems have inputs and outputs. These can include:

  • accepting and sending messages over various TCP protocols
  • making database queries
  • writing and reading files from the filesystem

And all systems have domain languages including:

  • API specifications (documented or not)
  • "obviously intuitive" parameter orderings
  • expected file formats

Although these hidden languages are rarely elegant, identifying and using them makes the difference between safely adding tests to a system, and not.
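
For instance, if the system’s only “language” is a line-oriented TCP protocol, that protocol itself can become the test seam. The host, port, and STATUS command below are assumptions about a hypothetical service, sketched in Python:

    import socket

    def send_command(command, host="localhost", port=9999, timeout=5):
        """Send one newline-terminated command and read one line of response."""
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(command.encode("utf-8") + b"\n")
            return conn.makefile("r", encoding="utf-8").readline().strip()

    def test_status_command_reports_ok():
        assert send_command("STATUS") == "OK"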

Oh we can’t test that part of the system because it is a black box.

REDACTED

Boris Beizer in 1990, detailing the phenomenon Michael Feathers later described as “pinch points.”

Call them what you will, such points of separation in a monolithic system represent opportunities to imbue a legacy system with modern attributes!


Mar 19th, Wed

Outages and service degradations are a NORMAL part of the software life cycle

Godzilla

Preventing every possible failure can be shown to be mathematically impossible; even if it were not, cost and time constraints would make it impractical.

Recovering from failure, by contrast, is a quality that contributes to the survival of the organization. In the words of Vince Lombardi: it is not how many times you get knocked down, but how many times you get back up, that determines the outcome.

Optimizing for rapid recovery and spending less time on preventing failures is therefore a very pragmatic attitude. Recovery is something that can potentially be practiced on a daily basis. And the ability to recover is easily quantifiable (e.g., did the site come back up or not?).

Risk prevention, by contrast, cannot be measured in any meaningful way — how do you assess the value of outages that were prevented? If an outage did not occur, how do you determine that the outage was prevented versus just… not occurring… ?

In The ETTO Principle, Erik Hollnagel suggests that we should study success with the same alacrity with which we study failure. John Allspaw echoes this idea in many of his talks, and the idea forms part of the foundation for the practice of holding blameless post mortems around production incidents.

It is worth noting that the study of the daily minutiae of everyday success (aka normal operation without any significant failures) already exists as an academic discipline: it is called Cultural Anthropology.

Implications for Software Quality Assurance

If, as Hollnagel says, failures are special cases of success, then that invalidates the entire concept of QA and/or Release Engineering as “the cop.” All strategies that rely on enforcement go out the window.

Enforcement is no longer an option because (as Hollnagel points out) the nature of complex systems is such that it is not possible to provide documented formal procedures for managing such systems. Success is necessarily an ad hoc affair. In general we accept this — except in the case of failure, where by convention we place blame on the first human who can be shown to have violated the documented procedures.

But if failure is instead systemic, then identifying a human as “responsible” for a failure doesn’t make sense. Modern software quality investigations must therefore move beyond the concept of human error and root cause.

Two kittens cooperating to deploy new code to production. Kitten power!

Mar 18th, Tue

Using static analysis to quickly become familiar with a large codebase

business cat says sort every file in the codebase from a to z and start reading

Static analysis tools automate the process of reading code. Any decent static analyzer can quickly process a large codebase and help you to find interesting entry points into reading the source code.

The point of static analysis is to help you find potentially interesting places to start reading the code.

—Sebastian Bergmann

Something I get asked to do a lot is look at suites of automated tests, with an eye toward improving architecture. In such a case I not only have to familiarize myself with the tests but with the codebase under test as well. And I have to be able to do that regardless of the size of the codebase. Static analyzers are the first tool I reach for in such cases.

Outlined below are the heuristics I have found effective for quickly figuring out where the important code is located, no matter how large or complex the codebase.

Start with source control.

  1. Check out a fresh copy of the project’s source code repository (repo).
  2. How long does the repo take to download?
  3. How big is the repo?
  4. How many branches does the repo have?
  5. How many committers?
  6. Who were the recent committers?
  7. When was the first commit made (how far back does this repo go)?
  8. What files were edited recently?
  9. How many files tend to be touched by a single commit?
  10. Does the volume of files touched follow a pattern over time?
  11. How much code has been deleted from the repo over time?
  12. Who has made recent significant deletions of code?
  13. Run gitk on the repo and look at the output.
  14. Run gource on the repo and watch the movie.
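
Most of the source-control questions above can be answered with a few lines of scripting rather than by hand. Here is a rough sketch in Python that shells out to Git; it assumes it is run from inside the freshly checked-out repo, and it only covers a handful of the questions.

    import subprocess
    from collections import Counter

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    # How many committers, and who has been most active?
    authors = Counter(git("log", "--format=%an").splitlines())
    print("committers:", len(authors))
    print("most active:", authors.most_common(5))

    # How far back does this repo go?
    first = git("log", "--reverse", "--format=%ad", "--date=short").splitlines()[0]
    print("first commit:", first)

    # How many files tend to be touched by a single commit?
    per_commit = [
        len([name for name in chunk.splitlines() if name.strip()])
        for chunk in git("log", "--name-only", "--format=%x00").split("\x00")
        if chunk.strip()
    ]
    print("median files per commit:", sorted(per_commit)[len(per_commit) // 2])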

PEDANTIC SIDE NOTE: I’ve grouped these heuristics under the umbrella of static analysis because each heuristic can be performed (and is useful) without running any code. Technically, analysis of the Git log is Code Churn analysis and not static analysis. Meh.

Anyway…

Next, look at project layout.

  1. First, what project assets are not checked in to source control?
  2. What is the top-level directory structure of the project?
  3. If there is a README file, what is in it?
  4. What’s in the CHANGELOG, if there is one?
  5. What file extensions are in the project?
  6. How much static client side code is there (how many JavaScript, CSS and HTML files)?
  7. How many application code files are there? Eg for a PHP project, how many .php files?
  8. What is the overall directory structure of the project? In other words — spend some time paging through the output of tree | less
  9. How many files and directories are there in all?
  10. Are there tests?
  11. Do the tests use an open-source xUnit test runner or BDD framework?

Look for trivially obvious smells.

  1. How many template files are there?
  2. Is there business logic in the templates?
  3. Is there any SQL in the templates?
  4. In the Git log, the code and the comments, what is the distribution of swear words?
  5. What is the distribution of words like “bug,” “kludge” and “hack”?
  6. In general, what is the distribution of sentiment words? (Eg does the word “confusing” occur a lot?)
  7. How many lines of code are in the codebase?
  8. How many lines of code tend to be in a file?
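
The word-distribution smells are equally scriptable. Here is a small sketch in Python that counts a handful of illustrative words in the commit messages and in the source tree; the word list and the *.php glob are placeholders to adjust for the codebase at hand.

    import re
    import subprocess
    from collections import Counter
    from pathlib import Path

    SMELL_WORDS = {"hack", "kludge", "bug", "confusing", "wtf", "todo", "fixme"}

    def count_smells(text):
        words = re.findall(r"[a-z]+", text.lower())
        return Counter(w for w in words if w in SMELL_WORDS)

    log_text = subprocess.run(["git", "log", "--format=%s %b"],
                              capture_output=True, text=True, check=True).stdout
    source_text = "\n".join(p.read_text(errors="ignore")
                            for p in Path(".").rglob("*.php"))

    print("commit messages:", count_smells(log_text).most_common())
    print("source files:   ", count_smells(source_text).most_common())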

Look for smells that can be trivially detected by an open-source tool.

  1. For each language in the code base, does every file in that language pass the language interpreter’s lint check?
  2. What kinds of and how many style violations are found when running the style checker for each language in its default configuration?
  3. What is the distribution of cyclomatic complexity of files?
  4. What is the distribution of npath complexity?
  5. How many unique tokens tend to be in a file?
  6. What is the ratio of comments to code?
  7. How many lines of code tend to be in a method?
  8. How many lines of code in a class?
  9. Duplicated code — how much and what is it? More importantly, why?
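
And a few of the size questions (lines per file, a rough comment-to-code ratio) need nothing beyond the standard library. The *.py glob and the "#" comment convention below are assumptions; swap in whatever languages the codebase actually uses.

    from pathlib import Path
    from statistics import mean, median

    line_counts, comment_lines, code_lines = [], 0, 0
    for path in Path(".").rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        line_counts.append(len(lines))
        for line in lines:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("#"):
                comment_lines += 1
            else:
                code_lines += 1

    if line_counts:
        print("files:", len(line_counts))
        print("lines per file: mean", round(mean(line_counts)), "median", median(line_counts))
        print("comment-to-code ratio:", round(comment_lines / max(code_lines, 1), 3))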

a cat that fell asleep trying to read the entire internet


What is the difference between testing and debugging?

woman riding on the edge of a surfboard

Debugging starts with broken code and works backward from there to get to working code.

Testing assumes the code works and builds on that.

Testing represents a real competitive advantage over debugging because testing is a practice that can be applied to working code rather than broken code. And large, successful Web applications are universally “grown organically” from smaller, working Web applications; again by applying a continuous stream of small improvements to working code.

The idea that large systems are grown from smaller systems is supported by Fred Brooks in No Silver Bullet:

One always has, at every stage in the process, a working system. I find that teams can grow much more complex entities… than they can build.

Brian Foote has also recognized this in his "Keep It Working" design pattern. And chapter 2 of Refactoring, by Martin Fowler, makes explicit the relationship between well-tested code and working code.

This is not to say that the practice of debugging isn’t useful!

Code does break. In such cases, debugging skills are invaluable. Rather than eliminate debugging, the point is to use testing to minimize the amount of time spent debugging.

Additionally it’s worth noting that the class of tools called debuggers is useful for more than strictly debugging broken code. Debuggers are useful for stepping through code, inspecting the values held in memory, and many other activities that contribute to the central software engineering goal of checking one’s assumptions in every way that is feasible and as soon as is convenient.
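
Here is a small illustration of that “check your assumptions” use of a debugger, with Python’s built-in breakpoint(). The parse_order() function and its input are hypothetical; the point is pausing working code to inspect values, not chasing a known failure.

    def parse_order(raw):
        """Parse a hypothetical 'SKU:QUANTITY' order string."""
        sku, quantity = raw.split(":")
        return {"sku": sku, "quantity": int(quantity)}

    order = parse_order("WIDGET-7:3")
    breakpoint()   # drops into pdb; try `p order` or `pp order` at the prompt
    print(order)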

The whole time I am programming, I’m testing my assumptions… I use whatever tools are available.

— Rasmus Lerdorf

an armadillo lizard biting its tail like an ouroboros, to represent the idea of a feedback loop

Mar 8th, Sat

Liz Keogh on systems thinking.

Money quotes:

You can take apart a bicycle and put it back together again and it will work fine. But you can’t do that with a frog. A frog is complex.

An “adaptive complex system” — which is most human systems — is one where as soon as people start looking at it, they start changing the behavior of the system as well. So the system is changing all the time.

Einstein said that the definition of madness is doing the same thing again and again and expecting a different result. In the complex space, doing the same thing again and again and expecting the same result, is mad.

Liz Keogh

Other points of note:

  1. Domains have fuzzy borders. Don’t mistake “quadrants,” or any set of explicit partitions, for the set of domains. There is no “set of domains” per se, as every domain bleeds into every other domain at the borders.
Mar 6th, Thu

Interview questions for Quality Assurance Engineers

The Voight-Kampff Test — exploratory technical testing at its most dramatic!

I recently stumbled across this list of interview questions from back when I was hiring Software Engineers In Test for a team I was leading.

"Technical QA" means a lot of different things to a lot of different people. I found that no matter how I wrote my job postings, I always wound up interviewing candidates with an astoundingly wide range of skills and experience ranging somewhere along the gamut from senior engineer to human-computer interaction researcher.

This is a list of ideas. I’d never ask all of these questions in an interview. I’m also the kind of person who likes to skip around and I wrote this list with that in mind.

However, the questions are ordered roughly according to the level of technical knowledge required to answer, starting with the least technical.

Here are 41 interview questions to ask when hiring QA Automation Engineers.

  1. When did you first start using a computer?
  2. What was the first code you ever wrote?
  3. When did you become interested in QA? (“Everyone has a different answer.” —Stephen Donner)
  4. Why are you interested in QA now?
  5. What blogs do you read to keep up with the QA industry?
  6. Describe the role of QA in the software life cycle: what does QA do and when does it happen?
  7. Why does software have so many bugs?
  8. Would it be a good idea to automate all the tests and dispense entirely with QA?
  9. How does a file system work? what is a file? how does a directory “contain” files?
  10. What happens when you empty the trash?
  11. Imagine you are testing the validation of a new user sign up form for a web site. what kinds of names should the validation reject? what about email addresses?
  12. How do Web sites protect credit cards? from being intercepted in transit? from being stolen off the server?
  13. What is daylight saving time?
  14. How do blind people use the Web?
  15. How good are you at figuring out how to do things using Google?
  16. What’s an example of a sophisticated thing you’ve taught yourself to do using only resources you found on the internet? eg play ukulele, knit a sock, provision an ec2 instance
  17. How good are you at communicating with other people? with engineers?
  18. How good at command line are you?
  19. How would you find a file on a Mac? using only the shell?
  20. How good of a programmer are you?
  21. What programming languages are you comfortable with?
  22. What is html? what is valid html? what is a table-based layout?
  23. What is css? the cascade? the box model?
  24. What is JavaScript?
  25. How can you tell if there is a JavaScript error on a web page?
  26. Imagine there is a JavaScript error — what can you learn about it, how much detail can you provide?
  27. Describe what happens after I type a URL in my browser and hit return. Explain how the browser and server get from that point to a fully loaded web page.
  28. Have you used Charles and/or the Net tab in Firebug/Chrome Inspector?
  29. What are “boundary conditions” aka “edge cases”?
  30. Why do people use “boundary” and “edge” to describe these types of problems? edge of what?
  31. What is a regular expression? what is a search pattern? a wildcard?
  32. How would you make the same change to two different files? to 2000 files?
  33. Have you used an SCM (for instance: git, svn, cvs, vss, ClearCase, Mercurial, Perforce)?
  34. Imagine you found two copies of the same file on your machine. how do you tell if the copies really are the same, or if one has some changes that didn’t make it into the other copy?
  35. How do engineers typically use source control branches?
  36. What does it mean to patch a web application?
  37. How does Google search work?
  38. How do password reset emails work?
  39. What are some ways that passwords get stolen? how would you advise me to keep my users’ passwords safe?
  40. What is a botnet? what is malware?
  41. How does malware get onto computers?

Test Automation Lessons, Learned The Hard Way

CEILING CAT IS WATCHING YOU DEPLOY

In 2009, I switched from front-end development to focusing on systems. I’ve since had the opportunity to help develop a range of tools including automation for new platforms and continuous integration systems for entire organizations. More importantly, I’ve gotten to use the tools I’ve built in production, and I’ve learned things. And by “learn things,” I mean I’ve made a lot of mistakes. Which I learned from.

Here are 11 things I learned the hard way about implementing test automation for Web scale systems.

  1. Nondeterministic automated tests are worse than no automated tests at all.
  2. If you don’t know what a test does, it’s useless.
  3. Tests that make network connections are intrinsically intermittent.
  4. Simple tests help more than complicated tests.
  5. Every failed test is a context switch.
  6. Test failures must be actionable. Alerts for test failures that are not actionable are harmful.
  7. Humans under pressure cannot be relied upon to properly interpret a stack trace.
  8. Automation adds complexity.
  9. Writing tests is harder (and a lot more expensive) than not writing tests.
  10. "Automated testing" actually consists of two distinct features: the test codebase itself and the testability of the application under test.
  11. If you expect engineers to maintain the automated tests for a software application, implement the tests in the same language(s) as the application.
Feb 20th, Thu
Automated alerts, such as those from Nagios, affect the lives of real people.

It is important but also challenging to design monitoring systems that are capable of emitting, for the most part, only actionable alerts. Designing such systems requires working with undecidable problems such as the Halting Problem.
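
One common tactic for keeping alerts actionable is to refuse to page anyone until a check has failed several times in a row, since a single blip is rarely something a human can act on. The sketch below is an illustration of that idea only; the threshold of three consecutive failures is arbitrary, not a recommendation.

    class ConsecutiveFailureAlert:
        """Only page a human after `threshold` consecutive failed checks."""

        def __init__(self, threshold=3):
            self.threshold = threshold
            self.failures = 0

        def record(self, check_passed):
            """Record one check result; return True if a page should be sent."""
            if check_passed:
                self.failures = 0
                return False
            self.failures += 1
            return self.failures >= self.threshold

    alert = ConsecutiveFailureAlert(threshold=3)
    results = [True, False, False, True, False, False, False]
    print([alert.record(ok) for ok in results])   # pages only on the third straight failure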


Recently someone smart (can’t remember who) pointed out that the two inflection points on the McConnell-Stigler cost curve are probably due in part to people, particularly software engineers, leaving the organization over time.

Holding on to employees is a hard problem and I’ve found it very interesting to consider how engineer attrition might have contributed to rising costs in organizations where I have worked in the past.

But do note the emphasis above on “in part.” Cost curves are very high level abstractions that contain a lot of nuance. Attrition can only ever be one of many factors that affect the life cycle of an organization.


The Pyramid of Testing should be revised to include expert testing.

The idea behind the “pyramid” motif is the same as the "food pyramid" which is used in the United States for teaching about nutrition: you need a lot of vegetables (bottom of the pyramid, which is large) and should eat just a little candy (top of the pyramid, which has the smallest area).

Instead of food, the original Testing Pyramid identifies three rough classes of tests: small, medium and large, and then shows the relative amount of effort that should be put into each “testing tier.”

Given the flexibility and power of contemporary testing tools, it now makes sense to fully include expert testers in the feedback loop. The tools are so powerful and the quick wins of automation are so simple to implement, that it no longer makes sense to leave manual testers out of the loop.

Or, said another way: the complexity of the test tools themselves has reached a level where designing a non-trivial automated test is now a job that requires a domain expert, if not several different domain experts. It absolutely does not make sense to leave expert testers out of such design discussions.

One final thing I’d point out about the diagram above is that it’s really really not to scale. The organization-specific scale of the diagram is… organization-specific :) But generally, measured by hours-per-worker, almost all effort should go into unit testing, with the other three disciplines taking up a tiny portion of the very top of the pyramid.

Note that above I am talking very specifically about measuring testing effort in hours per worker. It’s important to keep in mind what that means, so I will give an example that illustrates why I think orders-of-magnitude more worker-hours should be expended on unit tests than on any other testing activity. It is actually very simple: in most organizations there are many more engineers than either QA analysts or technical testers. As long as every engineer is putting in some time writing tests each week, the long-term output from engineering should dwarf the effort from other groups.

For instance, if there are 10 engineers on the team and they write tests concurrently with code, and we assume something like 5 hours-per-week-per-engineer, then that is 50 hours of unit test development every week. The QA team for such an engineering team would at most be 3 people, who also have to attend meetings, perform manual tests, and do other non-automation-related activities every week. The amount of time a 3-person QA team spends actually writing test code probably hovers around 20 hours weekly, far less than the potential output from engineering; even though the engineers spend less time-per-person on testing activities overall.
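
The arithmetic behind that comparison is simple enough to write down. The head counts and hours per week are the same illustrative figures used above:

    engineers, hours_per_engineer_per_week = 10, 5
    qa_team_test_coding_hours_per_week = 20   # total across a 3-person QA team

    print("engineering:", engineers * hours_per_engineer_per_week, "hours of test code per week")  # 50
    print("QA team:    ", qa_team_test_coding_hours_per_week, "hours of test code per week")       # 20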
