How to Design Automated Checks

“Confidence through knowledge through checks through a tool”

We want automated checks to inspire some confidence in less time than multiple testers would need to reach the same confidence.

  • Is the time really shorter?
  • How do we create a report which communicates what was checked and what failed?
  • How long does it take to understand a failure and act on it?
  • Do the resources spent on the checks exceed those of testers spending time performing the same regression?
  • Is an increase in cost acceptable if the return on the investment is a large reduction in the execution time necessary to inspire confidence?
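The cost questions above can be roughed out with simple arithmetic. The sketch below is only an illustration: the function name `break_even_runs`, its parameters, and the hour figures are assumptions, and it compares cost alone, ignoring the value of faster feedback raised in the last question.

```python
import math
from typing import Optional


def break_even_runs(build_hours: float,
                    upkeep_hours_per_run: float,
                    manual_hours_per_run: float) -> Optional[int]:
    """Smallest number of regression runs at which automation costs less.

    Automation cost after n runs: build_hours + upkeep_hours_per_run * n
    Manual cost after n runs:     manual_hours_per_run * n
    Returns None when automation never pays off on cost alone.
    """
    saved_per_run = manual_hours_per_run - upkeep_hours_per_run
    if saved_per_run <= 0:
        return None
    # First whole number of runs strictly past the break-even point.
    return math.floor(build_hours / saved_per_run) + 1


# Hypothetical figures: 80 hours to build, 0.5 hours of upkeep and triage
# per run, 8 hours for testers to perform the same regression by hand.
print(break_even_runs(80, 0.5, 8))
```

If the result is `None`, the checks cost more than the manual regression, and the remaining question is whether the shorter execution time justifies that increase.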

What knowledge do we need to inspire confidence?

The value of an automated checking framework relates directly to the knowledge it imparts.

  • Do we want general, high level, highest risk knowledge?
  • Do we want specific knowledge on features?
  • Do we want specific knowledge on end to end workflows for different user types?

What checks are necessary to create that knowledge?

The checks we use depend on what knowledge we hope to gain.

The effectiveness of the check directly relates to its design and the design of its framework.

Design is therefore the ground level of the value of an automated solution.

  • The design needs to map directly to the knowledge it’s supposed to impart, and needs effective methods for reporting that knowledge.
  • The design depends on context: the app, the feature, the available data, and the environment, to name a few.
  • A good design requires both system and business knowledge.
  • Design should be as simple as possible to support the checks.
  • Simplicity keeps complexity down, and increased complexity brings greater risk of failure.
  • A simple design is quicker to build, cheaper, and easier to maintain, fix, extend, and update.
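To make the first point concrete, here is a minimal sketch of a check that carries the knowledge it is meant to impart, so the report can state that knowledge directly. The names `Check` and `run_checks` and the two example checks are hypothetical, not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """A single automated check tied to the knowledge it should impart."""
    name: str
    knowledge: str            # what a pass tells us, in plain language
    run: Callable[[], bool]   # returns True when the check passes


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check and return one report line per check."""
    report = []
    for check in checks:
        try:
            passed = check.run()
            detail = ""
        except Exception as exc:  # a crash is reported as a failure, not hidden
            passed = False
            detail = f" (error: {exc})"
        status = "PASS" if passed else "FAIL"
        report.append(f"{status}: {check.name} -> {check.knowledge}{detail}")
    return report


# Hypothetical checks; real ones would exercise the system under test.
checks = [
    Check("login", "a registered user can sign in", lambda: True),
    Check("checkout_total", "cart totals include tax", lambda: 100 + 8 == 107),
]

for line in run_checks(checks):
    print(line)
```

Each report line says what was checked and what a pass would have meant, which shortens the time needed to understand a failure and act on it.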

What tools would be appropriate in our context to build those checks?

If check design is ground level, the tool is the basement.

  • Its goal is to support the design of the check.
  • It needs to be the easiest and most maintainable possible solution that will support the necessary checks’ reporting and execution.
  • It should be extensible for future needs. Sufficiently simple, but not limited.


I came across this article today. As someone who was homeschooled from preschool to college, I felt compelled to respond:

From the article:

Children should [not be homeschooled because they will not] “grow up exposed to…democratic values, ideas about nondiscrimination and tolerance of other people’s viewpoints.”

By “some estimates” 90% of homeschooling is driven by conservative Christian beliefs.

“…something ought to be done.”

If 90% of homeschooling were not driven by conservative Christian beliefs, would something still need to be done?

Isn’t this article demonstrating intolerance of other people’s viewpoints?

Prioritizing Workload

I don’t like having to figure out what I should be working on. This is the deduction process I use to find the highest-priority issue at any given time.

Others’ Needs

  1. Existing release date takes priority, unless you need to change it per items below
  2. Written or verbal communication implying a desired release date
  3. If the above are missing, then check if it’s a production issue (if yes, then automatically assume a release date of ASAP, and prioritize)
  4. If a production issue is especially urgent, then prioritize above other production issues
    1. Ideally, the urgency is already communicated via factors 1 and 2 above.
    2. If not, you may notice red flags indicating you should take the initiative to escalate
    3. Red flags include:
      1. Size of impact (Lots of users, lots of data)
      2. Location of impact (Primary workflow, data affecting privacy or money)
      3. Who is impacted (Personal prestige of client or user, user type)
      4. Nature of impact (Irreparable damage, no workaround, $$$)
      5. Context of impact (Impact prevents achieving business or client deadline)
      6. The above are determined using explicit and implicit business knowledge and company culture. The examples listed are not exhaustive.
  5. Outside the above, any written or verbal communication demonstrating a heightened interest in receiving test feedback
  6. Some part of the testing on any given project, whether a priority or not, may need to be scheduled to accommodate others’ various needs (need to bring down a service, need two people, need to remove some data, etc.)


If the factors above indicate a priority that the environment does not permit, then you should work to prepare the environment.

Some factors you can change, others you have to wait on.

  1. Is the code ready and deployed?
  2. Are the conditions in the test environment ready?
  3. Is data available?

Your Needs

  1. What do you feel like working on?
  2. Does your test strategy for any of your projects have any requirements that influence when it can be executed?


By following this list, you can almost always arrive at the optimal thing to work on at any given point.
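The “Others’ Needs” part of the deduction can be sketched as a sort key. This is a simplification under stated assumptions: the `Task` fields and `priority_key` are hypothetical names, red flags are reduced to a simple count even though weighing them takes business knowledge, and the environment and “Your Needs” factors are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Task:
    name: str
    release_date: Optional[date] = None   # explicit, or implied by communication
    is_production_issue: bool = False
    red_flags: int = 0                    # number of red flags noticed
    feedback_requested: bool = False      # heightened interest in test feedback


def priority_key(task: Task, today: date) -> Tuple[date, int, int]:
    """Smaller keys sort first, meaning higher priority."""
    if task.release_date is not None:
        due = task.release_date
    elif task.is_production_issue:
        due = today        # production issue with no date: assume ASAP
    else:
        due = date.max     # no date known: sorts after everything dated
    return (due, -task.red_flags, 0 if task.feedback_requested else 1)


today = date(2024, 1, 15)  # fixed date so the example is repeatable
tasks = [
    Task("new feature", release_date=date(2024, 2, 1)),
    Task("prod bug", is_production_issue=True),
    Task("prod data loss", is_production_issue=True, red_flags=3),
    Task("refactor", feedback_requested=True),
    Task("cleanup"),
]
ordered = sorted(tasks, key=lambda t: priority_key(t, today))
print([t.name for t in ordered])
```

The tuple ordering mirrors the list: dates come first, red flags break ties between production issues, and a request for feedback breaks any remaining ties.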

Elements of Testing

With this model, I aim to describe the fundamental elements that comprise any test.

It represents what I’ve learned over the past 3 years as a tester, learning to think about what I’m doing, avidly reading about testing online, taking the Rapid Software Testing Explored course with James Bach, and joining the Automation Guild with Joe Colantonio.


Testing is the evaluation of system behavior in response to interactions. Figuring out what to test, how to test, what to look for, how to interpret results, and how to apply the knowledge you gain during this process makes testing a difficult, highly skilled, and indispensable activity. 


To evaluate, you have to perceive behavior, decipher its meaning within your context using what oracles are available, and make decisions applying the knowledge gained from the evaluation. Those decisions could be an ad-hoc test idea to be immediately executed, a modification to your test strategy (such as a whole new category of tests), the creation of a bug report, etc. Finally, we need to evaluate if we have gathered enough evidence to demonstrate that the level of risk present in the system is sufficiently low. 


Observation is ongoing throughout the entire process of testing and can instantly trigger evaluation at any point. We need to notice many details and patterns, as any of them might open a new door to an area of testing and risk. Our experience as humans greatly aids us in observation. The consumers of a system are frequently human, and often share similar experiences with us testers. What sticks out to us will probably stick out to them as well, even if it doesn’t contradict a requirement. We can gain additional insight by studying our users and their contexts. Our experience as testers and the heuristics under our belts also allow for wiser, broader, and deeper observation.

System Behavior

I classify system behavior into two parts: visible outputs and state changes within the system. Visible outputs are usually easier to spot and are obvious candidates for testing. State changes within the system are not always externally apparent and therefore take a higher level of skill to identify, but can be as risky as any incorrect output. For instance, suppose you write your name on a piece of paper with a pen, and the pen is the system under test. The decrease in ink inside the pen is a state change. Wear and tear on the tip is another. Flow of ink out of the well and through the tip is a temporary dynamic state which is not (usually) present after the interaction is complete. Some outputs might be the application of ink and indentation from the pressure of the pen on the surface of the paper. There might be many more manifestations of behavior in this example. Using observation and evaluation, you have to figure out which ones are important.
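The pen example can be turned into a toy model to show the difference between the two kinds of behavior. Everything here is an assumption made for illustration: the `Pen` class, the rule of one ink unit per letter, and the `tip_wear` counter.

```python
class Pen:
    """Toy system under test: a pen with a finite ink supply."""

    def __init__(self, ink_units: int = 10) -> None:
        self.ink_units = ink_units   # internal state, not visible on the paper
        self.tip_wear = 0            # internal state: wear and tear on the tip

    def write(self, text: str) -> str:
        """Return the marks left on the paper (the visible output)."""
        marks = []
        for ch in text:
            if ch != " ":            # assume spaces cost no ink
                if self.ink_units == 0:
                    break            # out of ink: nothing more reaches the paper
                self.ink_units -= 1
                self.tip_wear += 1
            marks.append(ch)
        return "".join(marks)


pen = Pen(ink_units=10)
output = pen.write("Ada")

print("visible output:", output)           # what a reader of the paper sees
print("ink remaining:", pen.ink_units)     # state change inside the pen
print("tip wear:", pen.tip_wear)           # another state change
```

A check that looks only at `output` would pass here and keep passing until the ink runs out. Inspecting `ink_units` and `tip_wear` reveals the state changes before they turn into a wrong output.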

Interactions with the System

Interactions with systems go on all around us. The pen lying on the desk is an interaction between it and the surface of the desk, gravity, air, etc. Grasping the pen between our hands and bending it until it breaks is another interaction. As testers with limited time in a fast-paced work environment, we must wisely choose interactions to study. In order to do so, we can draw upon many resources: looking at a code change, talking to team members, reading comments, tickets, or other documentation, and crucially, starting to interact with the system itself, allowing it to inform and inspire us directly.


We test to try to spot problems which might matter to end users or the business as early in the product development cycle as possible. As we test, the knowledge gained through our constant evaluation of observations allows us to continually fine-tune our strategy. Much of the time, evaluation is going on subconsciously. We see it come to the surface when an alarm bell goes off in our heads as we spot something curious which was not expressly looked for. Conscious evaluation then takes over to interpret our observation. The curiosity might be a change in system state or an output which is unexplained within our Current Understanding of the System, or mental model. Conscious evaluation is especially prevalent near milestones of testing activities. During the initial stages of testing, evaluation might be triggered by looking at a set of requirements, contemplating the application as defined within our mental model, and planning scenarios which we imagine would quickly put those requirements through the wringer. It takes place again when determining if the system behavior lines up with relevant oracles, such as the expectations of the team. Sometimes we can perform this entire activity within our minds and spot inconsistencies in the requirements even during planning stages.


Automation in testing is the scripting of human interaction, observation, and evaluation, so as to be performed by a computer. An automation system has the potential to execute vast numbers of interactions and evaluations within short periods of time. In order to do so, the interactions, observation, and evaluation must all be explicitly scripted and maintained. Scripting and maintenance are development activities which can take away large portions of time from test activities. The maintenance of a test server, services, integration with the cloud, etc., are DevOps activities which also take away time from testing. In addition, since every aspect of interaction, observation, and evaluation must be so minutely and explicitly scripted, system behavior which occurs outside of those narrow scripts will either prevent their execution or go unnoticed. During the automation system’s interaction with the system under test, the enormous opportunity for unscripted rational observation and evaluation is removed from the equation. Finally, a human must still decide which interactions to automate. To be most effective, those decisions require the testing skills discussed above. Given the constraints and large investment, opportunities for automation should be carefully reviewed by experienced testers and implemented with the same care as the business’s other products.
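A small sketch shows how behavior outside a narrow script goes unnoticed. The function `apply_discount` and both checks are hypothetical, and the bug is planted on purpose for the illustration.

```python
def apply_discount(cart: dict, percent: int) -> dict:
    """Hypothetical system under test with an unintended side effect."""
    cart["total"] = cart["total"] * (100 - percent) // 100
    cart["items"].clear()   # planted bug: the discount also empties the cart
    return cart


def scripted_check() -> bool:
    """Evaluates only what was explicitly scripted: the total."""
    cart = {"total": 200, "items": ["book", "pen"]}
    result = apply_discount(cart, 10)
    return result["total"] == 180


def broader_check() -> bool:
    """Same interaction, with one more scripted observation."""
    cart = {"total": 200, "items": ["book", "pen"]}
    result = apply_discount(cart, 10)
    return result["total"] == 180 and result["items"] == ["book", "pen"]


print("scripted check passed:", scripted_check())
print("broader check passed:", broader_check())
```

The first check passes even though the cart has been emptied, because nothing in the script looks at the items. A person performing the same interaction would likely notice the empty cart at once. The second check catches it only because someone decided in advance that the items were worth observing.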


My perception of what people usually mean when they say “Artificial Intelligence” is a neural network system which defines behavior through analysis of multitudinous interactions and/or data. It is purported to overcome the handicaps of automation by providing a tool which can self-script to interact, observe, and evaluate. The elements of testing are by nature permeated with decision making and evaluation in accordance with implicit oracles, often doused in vagueness, directly related to our human interpretations of meaning. For that reason, I believe the makers of tools claiming to use “AI” should do so in a way which expressly seeks to aid, rather than replace, the tester. In other words, they should create tools that increase the tester’s power to interact with and observe the software. We see this empowering usage already in the gaming industry, where AI-enhanced image recognition can help automate regression testing suites in complex game-worlds.