Enrico Fermi and Estimation
Enrico Fermi, Italian physicist and the creator of the world’s first nuclear reactor, taught students to estimate using examples like the number of piano tuners in Chicago. He even estimated the yield of the first nuclear explosion by dropping scraps of paper and watching how far the blast wave carried them. His estimate was about 10 kilotons of TNT (10,000 tons), whereas other measurements produced estimates of 5-10, 18, 21, and, most recently in 2016, 22.1 kilotons.
He created what we now know as a “Fermi Problem” or “Fermi Estimation.”
He was known to ask his class in Chicago the following questions (the version I heard was slightly different from the referenced article):
Fermi: How many piano tuners are there in Chicago?
Students: We can’t possibly know that! We’d have to make a random guess.
Fermi: What’s the population of Chicago?
Students: 3 million.
Fermi: What do you think the average household size is?
Students: 3.
Fermi: How many households do you think have a piano?
Students: 1 in 10.
Fermi: How often do you think a piano needs to be tuned?
Students: Once a year.
Fermi: How many pianos do you think a tuner can tune a day?
Students: 4-5.
Fermi: How many days out of the year does a piano tuner work?
Students: 250.
Fermi: Then how many tuners would it take to tune every piano in Chicago once a year?
Students: Between 50 and 200 tuners, depending on the values above.
The real answer was just under that range; I believe it was 48.
What this method reveals is that we often know much more about a problem than we think we do, and that an educated guess is usually better than nothing at all. By breaking a problem down into pieces, we can make a very educated guess. Even more importantly, this method reveals which factors would do the most to make that guess more accurate. We can then determine the feasibility of gathering more data for those factors. As I’ll describe later, this technique can be applied to software testing in risk assessment and estimating test scope.
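The arithmetic behind the dialogue can be sketched in a few lines of Python. This is only an illustration: the function name `tuners_needed` is hypothetical, the point guesses come from the students' answers, and the low and high ranges are invented here to show how varying one factor at a time exposes the guess that matters most.

```python
def tuners_needed(population, household_size, piano_share,
                  tunings_per_year, pianos_per_day, work_days):
    """Fermi estimate: yearly tunings demanded / yearly tunings one tuner can do."""
    pianos = population / household_size * piano_share
    return (pianos * tunings_per_year) / (pianos_per_day * work_days)


# Point guesses from the dialogue (4 is the low end of "4-5" pianos per day).
guesses = {
    "population": 3_000_000,
    "household_size": 3,
    "piano_share": 0.10,
    "tunings_per_year": 1,
    "pianos_per_day": 4,
    "work_days": 250,
}

# Hypothetical (low, high) ranges, invented for this sketch. Only the
# pianos_per_day range comes from the dialogue itself.
ranges = {
    "piano_share": (0.05, 0.20),
    "pianos_per_day": (4, 5),
    "household_size": (2, 4),
}

baseline = tuners_needed(**guesses)
print(f"baseline: {baseline:.0f} tuners")

# Vary one factor at a time; the widest spread marks the factor
# most worth researching further.
spreads = {}
for name, (low, high) in ranges.items():
    results = [tuners_needed(**{**guesses, name: value}) for value in (low, high)]
    spreads[name] = (min(results), max(results))
    print(f"{name}: {spreads[name][0]:.0f} to {spreads[name][1]:.0f} tuners")
```

Under these assumed ranges, uncertainty about the share of households with a piano moves the answer far more than uncertainty about how many pianos a tuner handles in a day, so that is the number worth looking up first.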
S. S. Stevens and Measurement Scales
S. S. Stevens published a paper in Science in 1946 describing four measurement scales (nominal, ordinal, interval, and ratio), explaining that each scale carries a different meaning and permits different mathematical operations. One of those scales is the ordinal.
Ordinal: a scale that tells us whether one object has more or less of some quality than another, but not by how much.
Movie ratings are examples of ordinal scales. You cannot meaningfully add or multiply values on these scales; for instance, four one-star movies are not equal to one four-star movie. But the ratings do give us a rough subjective idea of whether one movie is better than another.
Another example is the Mohs scale of mineral hardness. This scale tells us only whether one mineral is harder or softer than another, not by how much.
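To make the restriction concrete, here is a small Python sketch of an ordinal type. The `Stars` class and the sample ratings are hypothetical; the point is only that ordering operations (comparison, maximum, median) are defined while addition is deliberately left undefined.

```python
from enum import Enum
from statistics import median_low


class Stars(Enum):
    """A hypothetical four-star movie rating: ordered, but not additive."""
    ONE = 1
    TWO = 2
    THREE = 3
    FOUR = 4

    def __lt__(self, other):
        # Ordering is the only relationship an ordinal scale supports.
        if not isinstance(other, Stars):
            return NotImplemented
        return self.value < other.value


ratings = [Stars.ONE, Stars.FOUR, Stars.THREE, Stars.THREE, Stars.TWO]

best = max(ratings)            # ranking is meaningful
typical = median_low(ratings)  # a median needs only ordering, not arithmetic

try:
    Stars.ONE + Stars.ONE      # no __add__ is defined, so this fails
    addition_allowed = True
except TypeError:
    addition_allowed = False

print(best, typical, addition_allowed)
```

A plain `Enum` is used rather than `IntEnum` because `IntEnum` members behave like integers and would happily add four one-star ratings together.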
Ordinal Risk Measurement for Software Testing
Using Fermi’s estimation technique, we can describe a generic testing tactic that is tracked with an ordinal measurement system.
Begin by assessing current knowledge
Fermi cycled through what the students knew or could make educated guesses about to build a case for what they thought they knew nothing about. In a similar way we can design a test strategy around a software change. The strategy will be designed to reduce our uncertainty, in the same way that Fermi’s students might have researched the real fraction of households that have a piano, how many pianos a tuner can tune in a day, and so on. What do we know?
- People’s claims
- Code
- Domain knowledge
- Testing skill
These are the four main categories of knowledge we have for a software testing strategy. People’s claims might be requirements or conversations. Code could be existing as well as changed code, hopefully trackable and visible to you as the tester. Your domain knowledge will help you tie those claims and code changes to business functions, flows, data, and users. Your toolbox of testing skills will help you reduce the uncertainty of how those code changes affected the product. Your test ideas will be information-gathering forays to reduce the uncertainty and make a more accurate risk estimation.
Assess risk and scope through estimation
- General danger of the change as related to business model and product design.
- General areas of the product that might be affected, and how they might be affected.
- Testing time necessary to sufficiently reduce uncertainty.
Domain knowledge helps estimate whether a change touches an important revenue-generating business workflow or is probably harmless. This assessment helps to indicate test effort. The ability to read code, together with people’s claims and domain knowledge, helps us estimate testing scope. These also help us choose initial tests that rapidly reduce uncertainty and cover risky areas that could be affected. Those estimates in turn help us estimate the necessary testing effort.
Strategically test to reduce the uncertainty as quickly as possible
Remember how working through the Chicago tuners problem helped identify which pieces of information would make a guess more accurate? Design your testing to quickly reveal the pieces of information that will better inform your estimation of risk and scope. Your goal should be to have evidence- and logic-based confidence in your estimated testing scope and in your judgment that the risk is low. You can track your confidence through subjective ordinal scales, like movie ratings. You can apply these scales to the product at large, individual features, and individual claims.
As you test, you’re always on the lookout for information which will make those risk and scope estimations more accurate.
Reassess, restrategize, and repeat until your Fermi risk estimation is low
Testing is a cyclical process of knowledge assessment geared towards estimation of risk and reduction of uncertainty. It uses strategies to rapidly reduce that uncertainty and knowledge gathered through testing to constantly refine the strategies.
Like the hardness scale or movie rating system, an ordinal scale for estimating or measuring risk can help determine when we can stop testing and why we might want to keep testing. For instance, I might use a scale of 1-4 for general risk on tickets, with individual scales for the claims on those tickets and even for areas of the application. A level of 1 means I know very little. Which direction you count, up or down, doesn’t matter as long as you are consistent. By the time I get to 3-4, I’m feeling more confident, like it could be released.
If my gut tells me the level is a 3, then I can ask myself, like Fermi, why isn’t it a 4? Just as his students might have needed to find out how many pianos can be tuned in a day, I ask how my next tests can get me to the next confidence level. Will those tests be cost effective, or should I be satisfied with a 3? Can I use a couple of cheap tests to reduce that uncertainty somewhat and bring in more information to reassess? I’ll use Fermi’s approach of breaking problems down into what we do and do not know to craft a strategy that reduces the uncertainty as rapidly as possible. As I execute the strategy, I’ll frequently reassess the risk and the strategy in light of what I’m learning from the testing, and update the estimation level to track progress and help me compartmentalize what I need to learn.
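One way to keep track of these levels is a small record per ticket. The sketch below is hypothetical throughout: the `Ticket` class, the claim texts, and the rule that a ticket is only as good as its weakest claim are all assumptions made for illustration. Taking the minimum is used because it relies only on ordering, which is all an ordinal scale allows.

```python
from dataclasses import dataclass, field

LEVELS = (1, 2, 3, 4)  # 1 = I know very little, 4 = confident enough to release


@dataclass
class Ticket:
    name: str
    claims: dict = field(default_factory=dict)  # claim text -> confidence level

    def rate(self, claim, level):
        """Record or update the confidence level for one claim."""
        if level not in LEVELS:
            raise ValueError(f"level must be one of {LEVELS}")
        self.claims[claim] = level

    def overall(self):
        # Assumption: the ticket is only as trustworthy as its weakest claim.
        return min(self.claims.values())

    def weakest_claims(self):
        # Claims at the lowest level are candidates for the next cheap test.
        lowest = self.overall()
        return [claim for claim, level in self.claims.items() if level == lowest]


ticket = Ticket("hypothetical checkout change")
ticket.rate("discount applies to cart total", 3)
ticket.rate("tax is recalculated after discount", 1)
ticket.rate("receipt email shows new total", 2)
before = (ticket.overall(), ticket.weakest_claims())
print(before)

# After a round of testing, reassess the claim that was holding the ticket back.
ticket.rate("tax is recalculated after discount", 3)
after = (ticket.overall(), ticket.weakest_claims())
print(after)
```

Each reassessment changes which claim is weakest, which is the prompt for the next question: why isn’t this one a level higher, and what is the cheapest test that would move it?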