John Geer

How to Tell When You Have Enough Data to Act


We rely on statistics to help us make many decisions.

However, the timing of our decisions can matter almost as much as which decision we make. This article goes over a few concepts to help us figure out when we have enough information to take a chance.

In this article, I’ll focus on a conceptual understanding of how statistics reveal the world and what that means for the timing of our actions. For the moment, let’s hold off on the mathematical details.

Bit by Bit

Before we can work out when we have enough data to act, it is helpful to understand how statistics tend to reveal information.

Statistics can gradually reveal the world around us, much like a picture coming into focus. At first, we know very little. Maybe we have some information about broad areas of light and dark. However, as we receive more information, the picture starts to clear up a little. Perhaps we can now see that we are looking at a planet. As we get more information, we can see that this planet has rings.

With more data, the picture becomes more precise. However, even with very little data we have some information about what is in front of us.

Based on the public domain image from NASA / JPL

As we incorporate more and more data, we can get a more and more precise picture. This could be a literal picture (like with astronomical data) or a forecast of the impact of certain decisions.

Interestingly, the first few bits of information tell us the most about the situation. Those initial data points greatly narrow the realm of possibilities. With our picture analogy, we are able to go from “it could be anything, including a city, a forest, or an entity in space” to “it is probably an image in space.” Later information can still be important but it doesn’t narrow the possibilities quite as much. Perhaps our estimates go from:

It is a planet with rings.

to:

It is a planet with rings and atmospheric bands.
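To put rough numbers on this diminishing return, here is a minimal sketch. It assumes we are estimating an average from independent measurements, so the uncertainty shrinks with the square root of the number of data points. The spread `sigma` and the sample sizes are hypothetical values chosen only for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Uncertainty of a sample mean built from n independent measurements."""
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical spread of a single measurement
sizes = [1, 4, 16, 64, 256]  # hypothetical sample sizes, each 4x the last
errors = [standard_error(sigma, n) for n in sizes]

for n, se in zip(sizes, errors):
    print(f"n={n:>3}  uncertainty ~ {se}")
```

Under these assumptions, going from 1 to 4 data points removes more uncertainty than going from 64 to 256, even though the second step needs far more data.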

What About Statistical Significance?

This “gradual focusing” analogy may seem a bit at odds with the concept of statistical significance. But “significance” is just a tool for a different situation.

Let’s say we are testing a new variety of wheat. If someone tells us that this strain of wheat is “significantly better” (statistically), that is a shorthand way of saying:

If we assume this strain is the same as our baseline, it is very difficult to explain its higher performance with random variation alone.

Importantly, this doesn’t mean the difference matters. If we have enough data, a difference of 0.0001% can be significant, but that doesn’t mean it is worth switching to this new strain of wheat.
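As a rough illustration of how a tiny difference can become significant, the sketch below runs a simple two-sample z-test on the same small difference at two sample sizes. The difference, the spread, and the sample sizes are all hypothetical, and the test assumes a known, equal spread in both groups, which is a simplification.

```python
import math


def two_sided_p_value(diff: float, sigma: float, n: int) -> float:
    """Two-sided p-value for a two-sample z-test.

    Assumes n observations per group and a known, equal spread (sigma).
    """
    standard_error = sigma * math.sqrt(2.0 / n)
    z = diff / standard_error
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


tiny_diff = 0.01  # hypothetical yield difference, in units of sigma
sigma = 1.0       # hypothetical spread of individual plots

p_small_sample = two_sided_p_value(tiny_diff, sigma, n=100)
p_huge_sample = two_sided_p_value(tiny_diff, sigma, n=1_000_000)

print(f"n=100 per group:       p = {p_small_sample:.3g}")
print(f"n=1,000,000 per group: p = {p_huge_sample:.3g}")
```

The difference itself never changes here. Only the amount of data does, and with enough of it the p-value drops far below the usual 0.05 threshold.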

In situations where the impact is small, significance just tells us how much it doesn’t matter. In many situations, the probability that the change is large enough to be relevant to the decision is much more important than whether it is significant. Happily, we can often work out that probability with much less data.
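One way to sketch the "probability that the change is large enough" is shown below. It assumes our belief about the true difference can be treated as a normal distribution centered on the observed difference, which is a simplifying assumption and not the only way to do this. The observed difference, its standard error, and the relevance threshold are hypothetical numbers.

```python
from statistics import NormalDist


def prob_exceeds(observed_diff: float, standard_error: float, threshold: float) -> float:
    """Probability that the true difference is above the threshold.

    Assumption: belief about the true difference is normal, centered on
    the observed difference with the given standard error.
    """
    belief = NormalDist(mu=observed_diff, sigma=standard_error)
    return 1.0 - belief.cdf(threshold)


observed_diff = 5.0   # hypothetical: new strain yields 5% more so far
standard_error = 3.0  # hypothetical: still quite uncertain
threshold = 1.0       # hypothetical: gains above 1% justify switching

p_relevant = prob_exceeds(observed_diff, standard_error, threshold)

# The usual two-sided significance test against "no difference" on the same data.
z = observed_diff / standard_error
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"probability the gain is relevant: {p_relevant:.2f}")
print(f"p-value against no difference:    {p_value:.3f}")
```

With these made-up numbers, the result is not yet significant at the usual 0.05 level, but the chance that the gain clears the relevance threshold is already high.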

When to Act

With knowledge that gets progressively more precise, deciding when to act is about balancing the value of improved knowledge with the value of a quicker decision. Another way of thinking of this trade-off is between:

Inaccuracy: the cost of making the wrong decision
Inaction: the opportunity cost of not making a decision yet

When it is appropriate to act depends a lot on the specific decision. For some decisions, inaction will be more costly than inaccuracy; for others, it can be the reverse.

Expectation

Let’s work through an example together.

Imagine you have a retirement investment that is reliably earning $2.50 a month. You are considering switching to another investment, but you don’t yet know how much it earns. With the data they presently have, your advisors say there is a 60% chance it will earn $10 a month and a 40% chance it will lose $10 a month.

With this information, you work out the expected monthly earnings for this new investment. You do this by multiplying $10 by the probability it will happen (60%) and -$10 by the probability it will happen (40%). Adding these together gives:

$$\begin{align*} \text{Expected Monthly Earnings} &= 10.00 \times 0.6 - 10.00 \times 0.4 \\
&= 2.00
\end{align*}$$

So, each month this new investment is expected to earn $2.00 and your present investment is earning $2.50. In this case, inaccuracy is more costly than inaction. This suggests that you would be better off sticking with your present investment until there is stronger evidence that this new investment will make $10 a month rather than lose $10.
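The same arithmetic can be written as a short sketch. The function and variable names are mine, chosen for illustration, and the comparison ignores risk just as the example does.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * probability for value, probability in outcomes)


# The advisors' estimate: 60% chance of +$10, 40% chance of -$10 per month.
new_investment = [(10.00, 0.6), (-10.00, 0.4)]
current_earnings = 2.50  # the present investment's reliable monthly earnings

expected_new = expected_value(new_investment)
should_switch = expected_new > current_earnings

print(f"expected monthly earnings of new investment: {expected_new:.2f}")
print(f"switch now? {should_switch}")
```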

But what if your situation were slightly different? What if inaction were a little more costly? Let’s say your knowledge of the new investment is the same, but your present investment is only earning $1 a month. In this case, the $2.00 expected monthly earnings from the new investment would already be more than you are making. Now inaction is more costly than inaccuracy. Assuming the additional risk of the new investment isn’t a concern, these same probabilities suggest it is wise to switch investments now rather than waiting for stronger evidence.
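Another way to look at these two scenarios is to ask how confident we would need to be before switching. The sketch below solves for the break-even probability of the good outcome, using the same gain and loss of $10 a month. The function name is hypothetical, and like the example it ignores risk.

```python
def break_even_probability(current: float, gain: float, loss: float) -> float:
    """Probability of the good outcome at which switching breaks even.

    Solves p * gain - (1 - p) * loss = current for p, where loss is the
    size of the monthly loss (a positive number).
    """
    return (current + loss) / (gain + loss)


advisor_probability = 0.60  # the advisors' current estimate

needed_at_2_50 = break_even_probability(current=2.50, gain=10.0, loss=10.0)
needed_at_1_00 = break_even_probability(current=1.00, gain=10.0, loss=10.0)

print(f"earning $2.50 now: need more than {needed_at_2_50:.3f}")
print(f"earning $1.00 now: need more than {needed_at_1_00:.3f}")
print(f"advisors currently estimate {advisor_probability:.2f}")
```

The advisors' 60% sits between the two thresholds, which is why the same evidence is enough to act in one situation and not in the other.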

In reality, the math is usually more complicated, but the same principles apply¹. We want to choose the path with the best expected outcome. To identify this option, it is helpful to consider both the value and likelihood of possible outcomes.

Summary

Incorporating more data often leads to an increasingly precise estimate, similar to a picture coming into focus over time.

When we make decisions in the context of these improving estimates, we are balancing the costs of inaccuracy and inaction. We can improve accuracy by waiting for more data, but that comes at the cost of inaction.² We can act quickly, but that would be with less accuracy than we might have later. How we balance these depends on both the probability and value of the different possible outcomes.

By clarifying what we know, statistics can help inform our decisions. It can also help us discover when it is the right time to take a chance and when it is better to wait.

¹ The math is often complicated by the costs and benefits themselves being only partially known. There are some great mathematical techniques for working with this and balancing inaction and inaccuracy, but I’ll save the details of those for another day.

² This cost also usually goes up; it takes more and more data to improve accuracy by the same amount.
