Understand your bugs, understand your product

As a product manager, I have dedicated more time than I’d care to admit to understanding, prioritising, and resolving bugs. I know that I am far from alone in that, and yet bugs rarely feature prominently in product management thought pieces. Perhaps this is because they are considered a chore that is simply part of every PM’s life, or maybe people do not want to draw attention to the fact that all software has bugs. I would like to offer a different perspective: bugs are essential for truly understanding your product.

Bugs as a window into the inner world of your product

In the now classic book The Design of Everyday Things, Don Norman makes the point that frustration with a product arises when the mental model of the user does not match the mental model of the designer, and that this mismatch becomes more likely as the product grows more complex. Good design goes a long way to bridge this gap, and the book explores this in detail. Nevertheless, a gap tends to remain, and over time neither party may have an accurate model of how the system actually works. This leaves a lot of room for bugs to creep in.

I want to stress that it is normal for a system to become more complex with time, and anyway people are far from flawless in coming up with mental models of how the world works. (It’s perhaps also not surprising that there is a whole area of study about how people’s intuitive notions of physics are off.) Furthermore, the people building the product are naturally people too, which helps explain why leap years cause software issues around the world every four years.
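The leap year case is a compact illustration of a mental model drifting away from the real rule. The sketch below is illustrative only: the function names are made up for this example, and it simply compares the intuitive "every fourth year" model with the Gregorian calendar rule to show where the two disagree.

```python
def is_leap_naive(year: int) -> bool:
    """The intuitive mental model: every fourth year is a leap year."""
    return year % 4 == 0


def is_leap_gregorian(year: int) -> bool:
    """The Gregorian rule: century years only count if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def disagreements(start: int, end: int) -> list[int]:
    """Years in [start, end] where the two models give different answers."""
    return [
        year
        for year in range(start, end + 1)
        if is_leap_naive(year) != is_leap_gregorian(year)
    ]


if __name__ == "__main__":
    # The simple model is right for most years, so the gap stays hidden
    # until a century year that is not divisible by 400 comes along.
    print(disagreements(1896, 2104))
```

The simple model holds for almost every year a team will ever test against, which is why the gap between it and the real rule can go unnoticed for a very long time.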

Bugs are in a way your honest broker. They may be hard to understand, and sometimes hard to notice, but they will tell you things about your product that no amount of documentation will ever be able to capture. You should treat them like your users: not all of them will be equally impactful, but listen to what they tell you. You never know when you might find avenues of value to explore, and they can act as canaries in the coal mine for deeper issues which you should get ahead of.

Debugging the brain: Examples from cognitive neuroscience

If it seems odd to leverage failures of a system for discovery, it is good to be aware that we have learned a lot about how the brain works by seeing examples of when it doesn’t. The literature is filled with cases of people whose strokes, lesions, or other brain injuries have led to a better understanding of how the brain works. In 1848 an iron rod was driven through Phineas Gage’s frontal lobe in a railroad construction accident, and the changes in his behaviour pointed to the role the frontal lobes play in cognitive functions and impulse control. In 1953, the removal of most of Henry Molaison’s hippocampi taught us the crucial role they play in the formation of long-term memories. Meanwhile, Patient DF’s carbon monoxide poisoning revealed the presence of two separate visual processing streams, one for object recognition and one for action guidance. These and other anomalous cases have spearheaded an improved understanding in these different areas of psychology.

That said, one could argue that most (if not all) psychological experiments tend to focus on finding the right parameters that cause ‘failures’ of the mind. If a task is too easy and all your participants are scoring 100%, then you won’t be finding out anything interesting about how the underlying system is working. The parameters of the experiment must therefore be pushing against some limits of the system it aims to study. Attention is often measured by what participants failed to detect, while a lot of social psychology investigates ‘failures’ in human reasoning. Optical illusions in turn tell us about how the visual system constructs what we see.

Unsurprisingly, the brain is another system that is not built in an intuitive way. For instance, did you know that you are essentially blind while your eyes are moving? Or that we are terrible at noticing changes in our environment (see this three-minute video if you want a neat demonstration)? Or how unreliable eyewitness testimony is due to the malleability of human memory? Then there are the many biases and fallacies people regularly fall prey to, such as the sunk cost fallacy, anchoring, availability bias, and many more.

All this and more has been figured out from ‘failures’ in the brain’s output. Perhaps anyone building software products should find some solace in the fact that, although the brain works as well as it does, it still isn’t bug-free.

Debugging software

In some ways bugs aren’t really bugs at all, at least not in the sense that something is broken. Instead, they are merely a reflection of how the system is currently put together. This is very much like the quirks of behaviour that we see in psychology. Because of this, your software will never be completely bug-free, as your users will be testing its limits.

That is not necessarily an approach you should take when dealing with an irate user who has encountered one of your bugs (“The product isn’t actually broken if you really think about it”). Nevertheless, the perspective is still valuable, because it lets you use bugs to understand your current product better.

In research, experimentation is a key tool of discovery, and the methods are not dissimilar to those that one can use for debugging. When I was conducting my experiments, I would design them so as to figure out whether I could modify a certain behaviour taking place by modifying a specific variable, while keeping other parameters constant. I would have chosen this variable based on a theory of how the behaviour comes about, and if this manipulation failed, then I would have to update my mental model of said behaviour and try to modulate it in a different way. Furthermore, I would also have to work hard to rule out alternative explanations.

With bugs my approach is often very similar. Replicating the bug is of course the first step. I then make my assumptions about why the bug is taking place explicit, try to disprove each of my theories, and check whether the issue persists in different variations. This may include trying to replicate the bug in different parts of the application, or within different accounts with different data or settings.

(Here I am thinking of the more complicated bugs which are harder to replicate and are not that clear, and not those where a button has stopped working due to a missing semicolon in the code. The latter would be similar to discovering that someone who loses an eye can no longer see with it. This discovery may be true and uncontroversial, but you are not learning anything new about the system.)
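The idea of checking whether a bug persists across variations, while changing one thing at a time, can be sketched in a few lines. Everything in this example is hypothetical: the factor names (`account`, `page`, `currency`) and their values are invented for illustration, and in practice the list of conditions would come from your own theories about the bug.

```python
# Hypothetical baseline in which the bug was first reproduced.
BASELINE = {"account": "A", "page": "reports", "currency": "EUR"}

# Hypothetical alternative values to try for each factor.
ALTERNATIVES = {
    "account": ["B"],
    "page": ["dashboard"],
    "currency": ["USD", "GBP"],
}


def one_factor_variations(baseline, alternatives):
    """Build test conditions that differ from the baseline in exactly one factor."""
    variations = []
    for factor, values in alternatives.items():
        for value in values:
            # Copy the baseline and override a single factor.
            variations.append({**baseline, factor: value})
    return variations


def changed_factors(baseline, variation):
    """List the factors whose value differs from the baseline."""
    return [key for key in baseline if baseline[key] != variation[key]]


if __name__ == "__main__":
    for condition in one_factor_variations(BASELINE, ALTERNATIVES):
        print(changed_factors(BASELINE, condition), condition)
```

If the bug disappears in exactly one of these conditions, the factor that changed is a strong lead; if it persists in all of them, the theory behind those factors needs updating.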

There are a lot of bugs that can be triaged and sorted pretty quickly, which means that as time passes your backlog is populated by more and more of the pernicious bugs. More than once I have spent days or longer trying to figure out why an issue was occurring, and sometimes it has only become clear after encountering the issue a few times (and there are some bugs I never got around to cracking). These have included issues caused by not factoring in currency exchange rates, by inconsistent data, or by unclear editing functionality. The good news is that bugs do become easier to diagnose as you keep encountering them in different variations, in some cases even when you move from one product to the next. For instance, I have repeatedly encountered the effect that currency conversion rates can have on your data, and it is a quirk that becomes a lot easier to recognise each time.
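To show the kind of quirk that currency conversion can introduce, here is a minimal sketch. The invoices and exchange rates are invented for illustration and do not reflect real rates or any particular product; the point is only that adding raw amounts across currencies gives a plausible-looking but wrong total.

```python
from decimal import Decimal

# Hypothetical invoices and exchange rates, invented for this example.
INVOICES = [
    {"amount": Decimal("100.00"), "currency": "EUR"},
    {"amount": Decimal("100.00"), "currency": "USD"},
    {"amount": Decimal("100.00"), "currency": "GBP"},
]
RATES_TO_EUR = {
    "EUR": Decimal("1.00"),
    "USD": Decimal("0.90"),
    "GBP": Decimal("1.20"),
}


def total_ignoring_currency(invoices):
    """The buggy version: adds raw amounts as if they shared one currency."""
    return sum((inv["amount"] for inv in invoices), Decimal("0"))


def total_in_eur(invoices, rates):
    """Converts each amount to EUR before adding it to the total."""
    return sum(
        (inv["amount"] * rates[inv["currency"]] for inv in invoices),
        Decimal("0"),
    )


if __name__ == "__main__":
    print("Ignoring currency:", total_ignoring_currency(INVOICES))
    print("Converted to EUR: ", total_in_eur(INVOICES, RATES_TO_EUR))
```

Neither total looks obviously wrong on a dashboard, which is part of what makes this class of bug hard to spot until you have seen it before.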

There is another bit of good news. When you manage to understand a particularly insidious bug, it usually helps explain another behaviour somewhere else in your product. Furthermore, there is also a chance that by fixing one you may also tackle the root cause of both, and thus fix multiple bugs for the price of one. For this reason, I recommend grouping your issues by type, so you start seeing the forest rather than the trees. This will help you avoid having to play bug-whack-a-mole, where you jump from one bug to the next without rhyme or reason. Instead, it’s better to tackle a cluster of bugs, preferably one which is more relevant to your current business objectives, or which is otherwise tied to your product’s core value proposition.
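Grouping issues by type does not need special tooling. The sketch below assumes a hypothetical backlog in which each bug already carries a `category` label; the IDs, titles, and categories are invented, and a real backlog would more likely come from an export of your issue tracker.

```python
from collections import defaultdict

# Hypothetical backlog entries, invented for illustration.
BACKLOG = [
    {"id": "BUG-1", "title": "Report total off for US accounts", "category": "currency"},
    {"id": "BUG-2", "title": "Duplicate rows after import", "category": "data consistency"},
    {"id": "BUG-3", "title": "Dashboard revenue differs from export", "category": "currency"},
    {"id": "BUG-4", "title": "Edits lost when two users save", "category": "editing"},
    {"id": "BUG-5", "title": "Invoice sum wrong after rate change", "category": "currency"},
]


def cluster_by_category(backlog):
    """Group bug IDs by category, with the largest clusters first."""
    groups = defaultdict(list)
    for bug in backlog:
        groups[bug["category"]].append(bug["id"])
    # Sort by cluster size (descending), then by name so the order is stable.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for category, bug_ids in cluster_by_category(BACKLOG):
        print(f"{category}: {len(bug_ids)} bug(s) -> {', '.join(bug_ids)}")
```

Sorting by cluster size is only one possible ordering; weighting clusters by how closely they relate to current business objectives would fit the advice above just as well.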

Some notable bugs in the real world

It is worth underscoring, though, that bugs can have severe consequences, especially when they go undiagnosed for a long time. A flagrant example is the Horizon accounting system used by the UK Post Office (part of the Royal Mail group for much of the period in question), whose faults went unacknowledged for years. As a result, hundreds of subpostmasters were unjustly accused of stealing money from their post office accounts, and some of them went to prison. In total, around 700 cases were brought to court between 1999 and 2015. The fact that the problems persisted for so long, and that the consequences were so severe, really makes you question how well the organisation understood the product it was using.

Another case worth mentioning is a more recent one (2024), and though it is not actually a bug, it nevertheless showcases the value of debugging. A software developer at Microsoft was trying to figure out why a certain process was taking longer than expected, and while doing so stumbled upon a malicious backdoor that had been deliberately planted in a widely used open-source library. If this engineer had not been digging around trying to understand the quirks of his system, it is anybody’s guess how long this backdoor would have gone undetected, just as it is anybody’s guess how many others may still be out there.

Parting thoughts

I was once chatting to a product manager at another company, and he told me he did not really have to trouble himself with bugs, as there was a platform team that dealt with them. I have also seen product managers who misunderstood the bugs they had encountered, and whose primary concern was for the problem to go away. There are of course time constraints, but I would still recommend that they invest the time to understand the bugs in their product. Doing so is not only a well-established way of gaining a deeper understanding of a system, but also an effective means of pinpointing areas for significant improvement.

People take it for granted when things just work, and are irate when they don’t. This can make fixing bugs seem like a relatively thankless task, as people understandably expect a bug-free product to be table stakes (unrealistic as that might be). However, the knowledge of your system that your bugs give you will enable you to build on a solid foundation and to push the limits of your product ever further with innovative features. And if you only end up with the solid foundation and don’t actually innovate… well, there are worse things than having a product that people describe by saying “It just works.”

Comments

One response to “Understand your bugs, understand your product”

Juan Jimenez

9. April 2024

Fantastic piece and nice encouragement not to treat every bug as a worthless chore. To all that, I would add two further aspects that would deserve a few paragraphs each: 1) what was the quality of the models (assuming they even existed) used as the basis to construct the system and 2) assuming the models reflected nearly perfectly how the business works and the specific domain the system covers, how good was the communication with everyone involved? Humans being as they are, even if these two items were perfect, individuals process information differently and often arrive at different conclusions, and therefore the actual implementation may not be perfect. We then collectively name these “mismatches or lack of models” simply a bug.