Problem Checking

Michael Bolton’s recent article on test focus has important implications for automated check design:

So rather than focusing on correctness, I’d recommend that you focus on problems and risk instead.

…if you focus on correctness, that’s what you’ll tend to see. You will feel an urge… [to] demonstrate success.

…if you focus on risk and search diligently for problems and you don’t find problems, you’ll be able to make reasonable inferences about the correctness part for free!

This article is pure gold for all kinds of testing, but I’d like to expand on something Bolton didn’t cover in detail.

You can design tens of thousands of automated UI checks (Selenium, Cypress, etc.) and not be able to prove correctness. You can, however, design just a handful of checks which would prove incorrectness. Here’s how.

Identify critical functions: things that would really matter if they failed. For the time being, ignore the kinds of functions whose failure would be tolerable, as long as they were fixed fairly fast. For each critical function, design the minimum number of interactions with the system necessary to reveal a serious error in that function. Often, for me, this means checking for some value in a response after a web form is submitted. You don’t need to bother writing a locator for the exact element that value is supposed to appear in: if there were an important failure, that value wouldn’t be returned at all.

When “setting the stage” for this check, bear in mind that “the stage” itself is not the test subject. You’re writing a check for a problem in a specific critical feature. This check should not fail if there’s a problem in some other feature. Whatever UI elements and features are necessary to set up this case should be the subjects of different checks. Keep UI interactions with them to a minimum.

This approach reduces the number of locators you need, decreases the length of each check, and keeps the checks isolated and independent from each other. Nikolay Advolodkin calls this pattern “Atomic test design” and explains it further here.
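Here is a minimal sketch of the shape of such a check, in Python. The functions are stand-ins, and all names are hypothetical (a real check would set up state through an actual API and drive the UI with Selenium or Cypress): set the stage outside the UI, perform the one critical interaction, and look for the expected value anywhere in the response rather than in a specific element.

```python
def create_order_via_api(item: str, qty: int) -> dict:
    """Stage setup bypasses the UI entirely (hypothetical API helper)."""
    return {"id": 101, "item": item, "qty": qty, "unit_price": 19.99}

def submit_checkout_form(order: dict) -> str:
    """Stand-in for the single UI interaction under test; returns the page source."""
    total = order["qty"] * order["unit_price"]
    return f"<html><body>Order {order['id']} confirmed. Total: ${total:.2f}</body></html>"

def checkout_shows_total(page_source: str, expected_total: str) -> bool:
    # No locator for a specific element: if checkout fails seriously,
    # the confirmation value won't appear anywhere in the response.
    return expected_total in page_source

order = create_order_via_api("widget", 2)
page = submit_checkout_form(order)
assert checkout_shows_total(page, "$39.98")
```

Note that the check knows nothing about the page structure: a markup change won’t break it, but a serious failure in the critical function will.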

The Quality Relationship: What is it?

Now that we have an idea who is involved in the quality relationship, it’s time to ask exactly what this relationship is.

As mentioned before, the quality relationship is between a user and a product. What determines the status of this relationship? How can it be judged good or bad? I believe these attributes of the user and the product largely determine the status of the relationship:

  1. Goals
  2. Feelings
  3. Product properties

Goals

Goals can be objective, subjective, or somewhere in between. Objective goals are like those toward the bottom of Maslow’s “Hierarchy of Needs.” Others are totally subjective, as “in matters of taste there can be no disputes.” There is also a huge middle ground of semi-objective goals implied by a particular cultural system. This system is a set of structures for solving existential problems while affording opportunity for play, science, and the arts. These “Secondary Existential Goals” are necessary to function within a cultural system, and the system in turn satisfies the primary existential goals. One of the most common secondary goals in any cultural system is to acquire money. Another common secondary goal where I live is to be able to travel very quickly.

Product Properties

Products are solutions for goals. For a mass-produced product to be successful, I assume the goals it’s designed for must be shared by a large number of people. If true, this simple fact reveals that a product’s quality is not just its relationship with any one individual. Quality in this context is founded on the relationship between goals and specific product properties which assist or detract from those goals.

Objective lines can be drawn between product properties and the satisfaction of those goals, abstracted from any specific user. These connections are objective manifestations of quality. As such, they are identifiable. If those connections are identifiable, they can be identified in some amount or to some degree. If they can be identified to some degree, that degree can be higher or lower. If so, it is measurable. Doug Hubbard calls this the “Clarification Chain” in his book, “How to Measure Anything.”

Feelings

The objective aspects of quality as defined above do not paint the whole picture. Using a product to fulfill a goal can elicit feelings anywhere on the emotional spectrum for a variety of unpredictable reasons. These feelings lend a certain subjectivity to all goals, and render a comprehensive objective assessment of quality impossible. Thankfully, people share enough of a common human experience to make educated guesses about how a product will make its users feel, although these guesses aren’t always right.

In Conclusion

I believe this understanding of the quality relationship makes rough, fallible, ordinal measurement of quality possible through analysis of the relationship between product properties and goals. In fact, I believe I use this measurement constantly as a software tester. I hope my attempt in this article to make this tacit measurement more explicit is useful. In my next post I hope to showcase a measurement system I’ve designed, based on this understanding of the quality relationship, to help assess product quality.

Who is in the Quality Relationship?

Quality is a Relationship

Quality is described in the Rapid Software Testing school as a relationship rather than objective properties of a product.

It took me a while, but I’m beginning to understand: quality often seems objective because we are able to infer the goals and desires of users. We naturally identify properties of a product which seem likely to detract from those goals or desires. These properties then become representative of the quality. We possess this ability to infer quality from our shared human experience.

At the moment I’m interested in refining the definition of quality a little more specifically. My first question is, “Who is in the Quality Relationship?”

Who is in the Quality Relationship?

If shareholders are those responsible for producing a product, and users are those who use that product for their own goals, is the quality relationship between all three members? Are there multiple quality relationships between the different members? Or is the quality relationship specifically between users and a product?

Case One:

For instance: a user might find a product of extremely high quality for their purposes.

However, the company that made that product goes under because the cost of that quality was too high.

I don’t think it’s practical to say that the quality of the product was “bad” because the company went under.

Case two:

Another instance: a company releases a product in order to meet a deadline. The product is not suitable for the users’ purposes, and they want their money back.

I don’t think it’s practical to say that the quality of the product was “good” because the company released it on a certain date.

Variation of Case two:

Let’s say this company had great customer service, and those customers were satisfied in the end: enough to continue doing business with them.

In this case I believe we have a fourth member, a second product: the customer service. Even so, the relationship is still between users and a product.

Preliminary Conclusion

Quality is a relationship between a product and users.

What other cases or arguments are there? Please let me know!

How to Test

Premise

We can never know if there is no risk since we can never test all software behaviors. I wrote about this idea in the Tenets of Testing.

A huge part of testing is identifying which areas of the product pose significant risk and which pose acceptable risk.

Within those areas, we must choose specific software behaviors (actions, data, conditions) to check, or “sample.” Those sampled behaviors/conditions represent the potential risk for the product.

To estimate the risk areas and create accurate samples we apply sensemaking, estimation, and our gut instinct to what knowledge we have available.

Interactions with the product continually add to this available knowledge, allowing continually more refined estimation and risk assessment.

Below I describe how a tester might test a new work item or “ticket.”

Define the Mission

  1. Goal: what is the goal of this ticket in terms meaningful to me, my understanding of the product, the business, the real world?
  2. Risk: what risks might there be as a result of the changes (i.e. business logic, code, infrastructure) necessary to meet the goal?
  3. Scout: can I get my hands dirty immediately and witness successful (or unsuccessful) fulfillment of major requirements? The information I can quickly gather in this exercise will make further test planning much more accurate.

Product Elements in Scope

What product elements are in scope, that is, what product elements seem to be connected in some way to the goal?

  1. Features
  2. Actions
  3. Business logic
  4. Data
  5. Settings/Contexts
  6. Devices
  7. Real world conditions

Some elements will be connected much more directly than others. However, in limiting testing to obviously connected features and behavior, we may miss a large range of bugs which occur in less obvious connections within the product.

Variations of Elements

What variations of those product elements should I definitely make sure to check?

List out and organize your ideas. I like to first list the product elements and then list the variations under each element. We’re creating a Product Coverage Outline.

  1. Positive and negative values
  2. Boundaries
  3. Different sequences/workflows/locations
  4. Devious values, circumventing interfaces
  5. Much more
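As a sketch, such an outline can even live in code as a nested structure. The product elements and variations below are purely illustrative, not taken from any real product:

```python
# A hypothetical Product Coverage Outline: product elements map to the
# variations worth checking under each one.
coverage_outline = {
    "search": {
        "data": ["empty string", "255-char boundary", "unicode", "SQL metacharacters"],
        "workflows": ["from home page", "from results page", "via URL parameter"],
    },
    "checkout": {
        "data": ["zero quantity", "negative quantity", "max-int boundary"],
        "conditions": ["expired session", "concurrent edit by another user"],
    },
}

def variation_count(outline: dict) -> int:
    """A rough signal of planned coverage breadth."""
    return sum(len(vs) for element in outline.values() for vs in element.values())
```

Keeping the outline as data rather than prose makes it easy to review, diff, and extend as testing reveals new elements.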

Are there variations/product elements that seem a little more removed from the goal, but which appeal to my gut instinct or logic as being a good idea to quickly check?

Interaction Design

How can I interact with the product to efficiently cover combinations of the product element variations that I think are important?

As I execute these interactions, how does the information I’m gathering help redefine my current knowledge?

As I test, I’m gathering 3 main categories of information:

  1. Potentially curious/negative behaviors
  2. Information about my chosen scope (I could observe things about the product which logically or intuitively make me realize they also deserve to be checked)
  3. Information about the effectiveness of my testing: is my uncertainty increasing or decreasing? Why?

Definition of Done

This cyclical process continues until I “feel” confident that the risk is low.

How do I know if my feeling is accurate? To test myself, I try to make a sound argument that I witnessed a sample of system behaviors truly representative of the risk. I can be fooled, but the ability to accurately assess when you’re done is a big part of what makes a good tester.

Oftentimes making this argument to myself reveals new test ideas which should also be checked.

Production Bugs

Production bugs are invaluable “tests” of our testing process.

By isolating the ticket that introduced a production bug and reviewing the testing on the older ticket in light of the new bug, we gain immense insight into holes in our thinking and aspects of product behavior which will make future testing better.

Tenets of Software Testing

  1. A software product’s behavior is exhibited by interactions.
  2. There is potentially an infinite number of possible behaviors in software.
  3. Some of those behaviors are potentially negative, that is, would detract from the objectives of the software company or users.
  4. The potentiality for that negative behavior is risk.
  5. It’s impossible to guarantee a lack of risk as it’s impossible to experience an infinite number of behaviors.
  6. Therefore a subset of behaviors must be sampled to represent the risk.
  7. The ability to take an accurate sample, representative of the true risk, is a testing skill.
  8. A code change to an existing product may also affect the product in an infinite number of ways.
  9. It is possible to infer that some behaviors are more likely to be affected by that change than others.
  10. The risk of that change is higher within the set of behaviors more likely to be affected by it.
  11. The ability to accurately estimate a scope of affected behavior is another testing skill.
  12. The scope and sampling ideas alone are meaningless without empirical evidence.
  13. Empirical evidence is gathered through interactions with the product, observation of resultant behavior, and assessment of those observations.
  14. The accuracy and speed of scope estimation, behavior sampling, and gathering of evidence are key performance indicators for the tester.
  15. Heuristics for the gathering of such evidence, the estimation of scope, and the sampling of behavior are defined in the Heuristic Test Strategy Model.

These tenets were inspired by James Bach’s “Risk Gap” and Doug Hubbard’s book “How to Measure Anything.” Both Bach and Hubbard discuss a very similar idea from different spaces. Hubbard suggests that by defining our uncertainty, we can communicate the value of reducing the uncertainty. Bach describes the “knowledge we need to know” as the “Risk Gap.” This Risk Gap is our uncertainty, and in defining it, we can compute the value of closing it. In testing, I realized we have three primary areas of uncertainty: 1) what is the “risk gap,” or knowledge we need to find out, 2) how can we know when we’ve acquired enough of that unknown knowledge, and 3) how can we design interactions with the program to efficiently reveal this knowledge.

In my own testing experience, I am in a constant cyclical process of defining and reducing uncertainty.

Another Expression of the Risk Gap Technique

Close the risk gap: find out what we need to know

“The Risk Gap” – courtesy of James Bach, www.satisfice.com

This three-step strategy aims to provide a framework, or story, within which to apply the catalog of heuristics found in the HTSM.

1. Estimate the gap

Estimate testing scope based on current knowledge

We need to know the boundaries of the gap in order to close it. Those boundaries can be expressed in terms of testing scope.

  • Scope is four-dimensional:
    • Product areas
    • Functions and logic
    • Data
    • Time
  • Estimation is done via extrapolation of current knowledge in light of the system and real world:
    • Claims
    • Code
    • Domain knowledge
    • Testing skill
  • Estimate becomes more accurate over time, as current knowledge increases via testing
  • The estimation itself is an important skill

2. Explore the gap

Design and execute experiments

Design interactions, or just jump right in, to evaluate the accuracy of the estimation and shed light on system behavior.

  • Play and exploration
  • Paths, workflows, data variations, other conditions
  • Design and preparation of conditions

Testing artifacts (plans, mind maps, test cases, and more) exist as aids to keep our minds organized and to communicate our findings to others.


Experiment results

Results increase current knowledge, enabling more accurate estimation, until the estimates transform into sufficiently certain conclusions.

  • Is the scope estimate accurate? Is it wider than we thought? Narrower? Deeper? Shallower?
  • Are there bugs?
  • Is our technique working well enough? Will a modification to our approach work better?
  • Is the product testable? Why or why not?

3. Close the gap: create a logical evidence-based argument

We repeat the above 2 steps, estimation and exploration, until we can close the risk gap, that is, have discovered everything we’ve inferred we need to know.

To close the gap means to create an evidence and reason-based argument that the level of risk in the system is sufficiently low. A good argument can stand up to serious scrutiny.

Logic is necessary to incorporate results of tests within a holistic system model so that each test provides meaning towards the risk evaluation. We must apprehend the system behavior during testing, make a judgement on its meaning, and infer the significance of that meaning to adjust our system model.

The system model is largely composed of:

  • the real-world goals and the users we offer solutions to
  • the processes and abilities we implement to allow our users to achieve their goals
  • the business logic that seeks to reduce errors and human effort
  • the data we process and its relationship to the real world
  • changes made to the code and other parts of the system
  • real-world circumstances, such as those within which we work and those within which users will engage the product

As part of an agile team, I focus testing almost exclusively around changes made to the system as tracked in each ticket in each release cycle, relying on the results of previous testing to establish that the general level of risk in the system before a given change is already sufficiently low. This trust in the results of previous testing allows a greater focus on testing changes, which creates a beneficial feedback loop: as each change is tested more thoroughly, those areas are demonstrated to be more stable than would otherwise be possible if more time were taken in regression testing.

Production bugs are tests of my risk gap evaluations. They mean that my estimates missed an important set of conditions or area of the product that was affected by a change I tested. Just as a developer fixes bugs in code, I can learn from these bugs in my risk estimations to make future estimates more accurate.

Testing is a process by which we transform from estimation to conclusion, less certain to more certain. Testing is like both a Fermi problem and the scientific method.

Risk Gap Model

The Risk Gap is what we need to know before we ship. Testing closes the gap via defining its dimensions (testing scope) and filling those dimensions with data (test results) via application of testing techniques.

These are rough notes which, when filled out, I hope will describe a general testing practice that is applicable to many modern software development teams. These principles are abstractions of how I perceive I actually test, whether I consciously think through them or just execute them subconsciously.

How do we determine what it is we need to know?

Estimation of scope via extrapolation on current knowledge

“If X is changed, then I know I need to check behaviors A and B under conditions N and M. I might also need to check condition O. My memory of behavior A is a little fuzzy, testing it might reveal that more behaviors could be affected, like D and E.”

– subconscious thought process occurring within seconds of reading a ticket description
  • This estimation can happen almost instantly: as soon as we read a ticket description, we can think of a couple of tests that could reveal serious errors.
  • It can also take longer to determine, and is strongly affected by actual knowledge gained through interactions, meaning interactions often shouldn’t be postponed until a great plan is ready.
  • Estimation becomes more accurate as more knowledge is available.
  • Knowledge categories:
    1. Interactions with the product
      • Real time interaction is the most certain and up to date form of knowledge available
    2. Claims
      • Conversations
      • What the application is telling you through its interface
      • Documentation
      • Requirements
      • Comments
    3. Code
      • Extent of code change
        • Number of files
        • Number of changed lines in files
        • Context of change within files
        • Object names within the code
        • Conditions within the code
        • Files in context of application structure: common, specific names, folder locations
    4. Domain and Real World Experience
      • Purposes of application workflows in connection with real users and their real goals
      • Usage of data in various locations and processes
      • Historic logic
      • Infrastructure, databases, APIs, models of system architecture
      • Interconnection of settings and variations based on data
    5. Testing Skill and Experience
      • Experience of “gotchas”
      • Modeling
      • Knowledge of techniques (HTSM)
      • Diving right in

Design of tests to shed light on and in the scope

  • Design tests to reveal information within the estimated scope.
  • Again, interactions with the product don’t necessarily need to wait long, and the knowledge gained through them can quickly improve the plan.
  • Many test techniques on HTSM
  • Quickly reveal knowledge through interaction: we don’t need to wait for a great plan; the sooner we have test results, the sooner we have feedback and the sooner we can make the plan better
    • Cursory skimming of an item to test can reveal enough information to start testing right away and find bugs immediately
    • Immediate interaction with the product provides context for the claims in the ticket
  • Aspects of testing
    • Ability to absorb what’s on the page, meaning of words, available functions, possibilities for data variations, extrapolations on those meanings
    • Test techniques such as combinatorial, functional, user, etc. See HTSM
    • Usage of Dev Tools and other various tools
    • Knowledge of HTTP and how to manipulate the network
    • Ability to rapidly interact and absorb information while creating a mental model and getting inspired
    • Discipline to stay focused and get through occasional drudgery

Refinement of estimation given new knowledge

  • Testing reveals three important types of knowledge
    • Whether the scope is accurate: it may need to be expanded, contracted, made more specific, etc.
    • Finding bugs: things that violate pertinent oracles and reveal possibility of higher risk and deeper testing around the risk
    • Feedback on success of testing technique and testing plan
  • The product reveals many things to us: we connect meanings from the ticket to meanings gained from interaction with the product and assimilate both into our system model

How do we know when the risk gap is closed?

We can’t. We can estimate when enough is enough. When our estimation of scope is confirmed and when we logically perceive the scope is filled with sufficient test data, we can start to consider we’re done. Both scope and completion of the scope must be ascertained with logic inextricably tied to the specific project context, aspects of the product we’re working on, and assessment of the changes being made.

  • Estimating remaining risk tells us when we can reasonably be done for now
    • Estimation based on logic applied to facts (previous knowledge combined and assimilated with new knowledge gained through testing)
    • Can the facts gathered be organized together with a sufficiently strong logical argument that the level of risk is low enough for the particular situation?
  • Actual production bugs reported tell us over time how well we’re doing
  • Finding bugs when going back to areas we tested tells us how well we’re doing
  • Testing skill and domain knowledge build up over time, reminding us of instances of problems and circumstances which we might recognize again, allowing us to act on them

Risk-Gap Test Strategy

Introduction

James Bach defined the “risk gap” as what we need to know before we ship a product. Testing seeks to close this gap, to find what we need to know. Well, how do you find out what this gap even is? How do you find out what you don’t even know you need to know? We can never quite know for certain: but we can gain a level of confidence through estimation and assessment, reasoning and evidence.

Summary of the Method

“The purpose of testing is to close the risk gap.”

In order to close the gap, we have to have some idea of its boundaries. Therefore, testing can be described in terms of 1) identifying the dimensions or boundaries of the risk-gap (also known as scope) and 2) determining if the gap is sufficiently closed.

Without defining boundaries for the risk gap we can’t determine when it’s closed. In a similar way, if I wanted to measure a garden, I’d need to know where the edges of the garden are. In the beginning those boundaries are rough estimates. We can make them more accurate as we gain new knowledge through testing.

To close the gap we need to determine the questions (tests) which will give us sufficient evidence and reason-based confidence that the gap we defined is accurate and actually closed to a sufficient degree (where “sufficient” depends on context, client, usage, etc.).

Steps

1) Estimate the dimensions of the risk gap (testing scope) given current knowledge.

Using our current knowledge we estimate a testing scope. What are we going to test and how? What features, data, and conditions do we think we need to check? This scope is the boundaries or dimensions of the risk-gap. The scope, or gap, becomes more accurate as we gather new information through testing.

Current knowledge comes in several primary buckets:

  1. Claims
  2. Code
  3. Historic knowledge, like domain expertise
  4. Testing skill

Using that knowledge, we estimate what dimensions of the product need to be tested. We know immediately we’ll need to check the claims, and probably will need to check them under different conditions. We might be able to expand or narrow the scope from analyzing the code change. Domain expertise includes knowledge of how the product is structured, causing us to estimate that if a certain feature is affected, another feature which involves the same data could be affected. Our testing skill will help us recognize the relevancy of different product dimensions and how they relate to claims and code. If you have little domain knowledge, you can start out with a playful product survey to get your bearings.

Examples of high-level dimensions which form the risk gap are:

  1. Area (parts of the product, different features)
  2. Functions given to the user by the application
  3. Data
  4. Conditions: users, permissions, time, etc.
  5. See HTSM for a list of many more dimensions

2) Design your test strategy to quickly shed light on the gap

What interactions with the product will help us rapidly assess the accuracy of our estimates? What interactions will reveal the knowledge we need to know before we ship?

The HTSM also lists many test strategies which, through experience and trial and error, a tester will know to apply to certain situations.

Keep in mind that testing has 3 purposes, all related to gathering knowledge:

  1. Shedding light on the accuracy of the estimated risk gap/scope
  2. Investigating and defining potential violations of pertinent oracles (finding bugs)
  3. Providing feedback on the efficacy of the test strategies being used (and testability of the product)

3) Reassess and Repeat with new knowledge

As your knowledge increases through testing you can make the risk-gap boundaries more accurate. (wider than you thought, smaller than you thought, more specific than you thought, etc.)

As you experience the product you will notice potential violations of pertinent oracles, which will indicate higher risk and require more testing.

Similarly your experience testing will indicate the effectiveness of your strategy and the testability of the product.

When combined, all 3 categories of knowledge gained through testing will make your subsequent test activities more effective (i.e. gathering more, and more pertinent, information).

You repeat this cycle until your reason demonstrates that the evidence gathered is sufficient to close the risk gap given the context and purpose of the testing.

4) Constantly assess your confidence on the dimensions and status of the risk-gap.

You need both evidence and reason to judge the dimensions and status.

My gut often gives me a feeling that more needs to be done or that I’ve tested enough.

You need to qualify that feeling by logically connecting the evidence gathered to the nature of the product/product-change or reason that you’re testing.

5) Testing skill improves this process

Over time as you get more experience testing, you become better at estimating scope, at recognizing test techniques to use, at using tools to help you, at using the smallest number of tests to extract the largest amount of information, at recognizing common problems or ways that you can be fooled.

Artifacts of Testing

Notes, mind maps, test cases, etc. are all means to document and track our journey to help us stay organized and focused. Early on when the amount of unknown is great, it may be a waste of time to create extensive documentation. A few early tests may reveal information which invalidates our documents or reveals whole other aspects that require us to start over anyway.

Testing Through Risk Estimation

Enrico Fermi and Estimation

Enrico Fermi, the Italian physicist who created the world’s first nuclear reactor, taught students to estimate using examples like the number of piano tuners in Chicago. He even estimated the yield of the first nuclear explosion using paper confetti. His estimate was 10 kilotons of TNT, whereas other measurements produced estimates of 5-10, 18, 21, and, as recently as 2016, 22.1 kilotons.

He created what we now know as a “Fermi Problem” or “Fermi Estimation.”

He was known to ask the following questions of his class in Chicago (the version I heard was slightly different from the referenced article):

Fermi: How many piano tuners are there in Chicago?

Students: We can’t possibly know that! We’d have to make a random guess.

Fermi: What’s the population of Chicago?

Students: 3 million.

Fermi: What do you think the average household size is?

Students: 3.

Fermi: How many households do you think have a piano?

Students: 1 in 10.

Fermi: How often do you think a piano needs to be tuned?

Students: once a year.

Fermi: How many pianos do you think a tuner can tune a day?

Students: 4-5.

Fermi: How many days out of the year does a piano tuner work?

Students: 250.

Fermi: Then how many tuners would it take to tune every piano in Chicago once a year?

Students: between 50 and 200 tuners, depending on the values above.

The real answer, I believe, was 48, which is nearly within that range.
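The chain of estimates above reduces to a short calculation, taking 4.5 tunings per day as the midpoint of the students’ 4-5:

```python
# Fermi's piano-tuner estimate, using the students' answers.
population = 3_000_000
household_size = 3
pianos_per_household = 1 / 10          # 1 in 10 households
tunings_per_piano_per_year = 1
tunings_per_day = 4.5                  # midpoint of 4-5
working_days_per_year = 250

pianos = population / household_size * pianos_per_household
tunings_needed = pianos * tunings_per_piano_per_year
tuner_capacity = tunings_per_day * working_days_per_year
tuners_needed = tunings_needed / tuner_capacity   # about 89
```

That works out to roughly 89 tuners, comfortably inside the students’ 50-200 range.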

What this method reveals to us is that we often have much more knowledge about a problem than we think we have, and an educated guess is probably better than nothing at all. Using this method of breaking down a problem into pieces we can make a very educated guess. Even more importantly, this method reveals which factors would be instrumental in making that guess more accurate. We can then determine the feasibility of gathering more data for those factors. As I’ll describe later, this technique can be applied to software testing in risk assessment and estimating test scope.

S. S. Stevens and Measurement scales

S. S. Stevens wrote a paper, published in Science in the 1940s, discussing four different measurement scales: how each scale carries a different meaning, and what math can validly be applied to each. One of those scales is the ordinal.

Ordinal: whether one object has more or less of some attribute than another

Movie ratings are examples of ordinal scales. You cannot perform arithmetic on these scales: for instance, four one-star movies are not equal to one four-star movie. But they do give us a rough, subjective idea of whether one movie is better than another.

Another example is Mohs scale of mineral hardness. This scale just tells us if a mineral is harder or softer, but not by how much.
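Both examples can be captured in a few lines: integer encodings support ranking, but arithmetic on them is meaningless. (The absolute-hardness comparison in the comment is approximate.)

```python
# Mohs hardness is ordinal: the numbers rank minerals but carry no ratio
# meaning. Diamond (10) is not "ten times harder" than talc (1); on an
# absolute (Rosiwal-style) scale the gap is on the order of a thousandfold.
mohs = {"talc": 1, "calcite": 3, "quartz": 7, "diamond": 10}

def harder(a: str, b: str) -> bool:
    """Valid ordinal question: is mineral a harder than mineral b?"""
    return mohs[a] > mohs[b]
```

Comparisons like `harder("quartz", "calcite")` are legitimate uses of the scale; sums, differences, and ratios of the numbers are not.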

Ordinal Risk Measurement for Software Testing

Using Fermi’s estimation technique we can describe a generic testing tactic which is tracked with an ordinal measurement system.

Begin by assessing current knowledge

Fermi cycled through what the students knew, or could make educated guesses about, to build a case for what they thought they knew nothing about. In a similar way, we can design a test strategy around a software change. The strategy is designed to reduce our uncertainty, just as Fermi’s students might have researched the actual proportion of households that own a piano, how many pianos a tuner can tune in a day, and so on. What do we know?

  1. People’s claims
  2. Code
  3. Domain knowledge
  4. Testing skill

These are the four main categories of knowledge we bring to a software testing strategy. People’s claims might be requirements or conversations. Code includes both existing and changed code, hopefully trackable and visible to you as the tester. Your domain knowledge helps you tie those claims and code changes to business functions, flows, data, and users. Your toolbox of testing skills helps you reduce uncertainty about how those code changes affect the product. Your test ideas will be information-gathering forays that reduce the uncertainty and support a more accurate risk estimation.

Assess risk and scope through estimation

  1. General danger of the change as related to business model and product design.
  2. General areas of the product that might be affected, and how they might be affected.
  3. Testing time necessary to sufficiently reduce uncertainty.

Domain knowledge helps us estimate whether a change touches an important revenue-generating business workflow or is probably harmless; that assessment indicates how much test effort is warranted. The ability to read code, together with the claims and our domain knowledge, helps us estimate testing scope and choose initial tests that rapidly reduce uncertainty and cover the risky areas that could be affected. These estimates in turn help us estimate the necessary testing effort.

Strategically test to reduce the uncertainty as quickly as possible

Remember how working through the Chicago tuners problem identified which pieces of information would make the guess more accurate? Design your testing to quickly reveal the pieces of information that will better inform your estimation of risk and scope. Your goal should be evidence- and logic-based confidence that your estimated testing scope is right and that the risk is low. You can track your confidence on subjective ordinal scales, like movie ratings, applied to the product at large, to individual features, and to individual claims.

As you test, you’re always on the lookout for information which will make those risk and scope estimations more accurate.

Reassess, restrategize, and repeat until your Fermi risk estimation is low

Testing is a cyclical process of knowledge assessment geared towards estimating risk and reducing uncertainty. It uses strategy to rapidly reduce that uncertainty, and uses the knowledge gathered through testing to constantly refine that strategy.

Like the hardness scale or a movie rating system, an ordinal scale for estimating or measuring risk can help determine when we can stop testing and why we might want to keep testing. For instance, I might use a scale of 1-4 for general risk on tickets, individual scales for claims on those tickets, and even scales for areas of the application. A risk level of 1 means I know very little; whether you count up or down doesn’t matter. By the time I get to 3-4, I’m feeling more confident that it could be released.

If my gut tells me the risk is at a 3, then I can ask myself, like Fermi, why isn’t it a 4? Just as his students might have needed to find out how many pianos can be tuned in a day, I ask how my next tests can get me to the next confidence level. Will those tests be cost-effective, or should I be satisfied with a 3? Can I use a couple of cheap tests to reduce that uncertainty somewhat and bring in more information to reassess? I’ll use Fermi’s approach of breaking a problem into pieces of what we do and do not know to craft a strategy that reduces the uncertainty as rapidly as possible. As I execute the strategy, I’ll frequently reassess both the risk and the strategy in light of what I’m learning from the testing, and update the estimation level to track progress and compartmentalize what I still need to learn.
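The bookkeeping behind this process is simple enough to sketch. The claims and levels below are hypothetical examples of my own, and the helper function is just one way to pick the next target; the idea is only that the least-known claim is where the next tests reduce the most uncertainty.

```python
# A minimal sketch of tracking testing confidence on a 1-4 ordinal scale:
# 1 = know very little, 4 = confident enough to release.
# The claims and levels here are hypothetical examples.

confidence = {
    "checkout total is correct":  2,
    "discount codes still apply": 1,
    "order email is sent":        3,
}

def next_targets(confidence, release_level=4):
    """Claims still below the release threshold, least-known first."""
    return sorted(
        (claim for claim, level in confidence.items() if level < release_level),
        key=confidence.get,
    )

# The least-known claim is where the next tests pay off most.
print(next_targets(confidence)[0])  # discount codes still apply
```

After each testing session, the levels get revised up (or down, if testing reveals new problems), and the list of targets shrinks until everything sits at the release threshold or a conscious decision is made to stop short of it.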

Tool Effectiveness

Recently I’ve seen quite a few questions about automation tool effectiveness. What are some KPIs? How do I know if a tool is efficient?

This question really got me thinking: how do we determine the effectiveness of any tool?

That question, in turn, got me thinking: what’s the most effective tool of all time?

A hammer.

A hammer works so well because it amplifies human effort. Its application is limited only by our imagination: it works for any task that needs a sudden jolt of precise force. One very common use is to drive a nail, which led me to think of a nail gun.

Have you seen a construction crew framing a house, sheathing a roof with plywood, or installing molding in a room? Nail guns turn tasks that would take minutes into tasks that take seconds. They are portable, convenient, easy to use, and enormously reduce human effort.

How does the nail gun accomplish that reduction in effort?

Instead of amplifying human effort, it replaces that effort. It needs an energy source. It needs a feeding mechanism for the nails. It needs safety features to prevent lawsuits. And on top of requiring all these features, it can be used for a much more limited set of tasks than a hammer.

We see here a correlation between tool complexity, task specificity, and human replacement. As a tool aims to amplify human effort, it can often stay simpler and be applied to a broader range of tasks. As a tool aims to replace human effort, it necessarily increases in complexity and in specificity of task.

Tools are balanced between replacement and enhancement of the human effort.

The balance between replacement and enhancement of human effort is very important to consider in tool design. The more of the human element you want to replace, the more exactly you must define the task and the more complex you must make the tool.

The following chart maps the relationship and plots a couple of sample tools. Their exact placement on the diagram is somewhat arbitrary, intended only to give a general idea.

Tool Design