A new product is seldom without flaws. There is an old saying about new houses: when you buy a new house, just finished by the builder, let your enemy live in it the first year. Let a friend live in it the second year, and move in yourself the third year. This wisdom dates from long before anyone had heard of alpha testers and beta testers.
Since 1954, it has been possible to recall vehicles in the USA based on their VIN. Even the best testing could not find all shortcomings in a product, and when a shortcoming was safety related, the NHTSA could order a recall. Again, this is not beta testing the cars before general availability. It is acknowledgement that even the best tested products can contain shortcomings that are only found after a long time and extensive use.
My father was a real car guy. He and his friends knew no greater pleasure than taking apart a car and rebuilding it in what they thought was a better way. This was in the 1930s in Paris. The story about a Hispano-Suiza they cut in half to make it half a meter longer was retold every time they met.
Another story that was popular among them was about a prototype. When the developers thought it perfect, it was produced in a small series and given to the mechanics who had built it, to test in real life. After a few months without complaints, real production was started. It was a disaster. It turned out the mechanics were proud of their work and had maintained and repaired the cars as only the people who built them could.
The next time a prototype needed to be tested, the company chose farmers who only knew horses for transport. They shifted the gears without engaging the clutch. They put their big wooden shoes on both the brake and accelerator at the same time. They broke everything that could break. They did it in many different ways. The result was a car that stayed in production for nearly half a century. It was famous for being simple, robust, and a mechanic’s dream. All the bolts were the same size, so you needed just a single wrench. That car was the Citroën 2CV. It was perhaps my father’s favorite car.
This testing is also known as: “Is this product foolproof?” No, nothing ever is. But the best way to come as close as possible is to use it in the worst possible way.
I worked in software development most of my working life. It was mostly financial software. One of my jobs was the testing of a system that paid about a hundred thousand teachers each month. Another was international money transfers with the SWIFT system during the introduction of the euro. If there is a bug in such a system, it is expensive. It is hard to get the money back when it is paid to the wrong person or the wrong amount is paid. The testing was very rigorous. There were bug fixes each month. No system is bug free. Banking systems with millions of users and hundreds of subsystems are no exception. Neither are very well defined and tested salary systems — they contain bugs too. Both the bank and salary company had a dedicated team that could patch the system in production.
The case for beta testing software
When a system gets more complex, it will have more bugs. When a system gets used by more people, more bugs will be found. When a system gets deployed to more environments, more situations that nobody envisioned will occur.
The first round of testing is by the developers. Did they make what they think they made? But you can’t really test your own product, because you know it is good.
The second round is by the QA (Quality Assurance) department. Based on the specs, QA can write very large test sets that test every condition in the specs. But QA cannot test the specs themselves. As far as their imagination allows, they can create other tests, but that does not go far.
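A minimal sketch of what such a spec-driven test set can look like. The pay rule, the function `monthly_pay_cents`, and the table of expected values are all made up for illustration and are not taken from any real salary system. The test walks through every combination of the conditions named in the spec and compares the code with the values the spec prescribes.

```python
from itertools import product


def monthly_pay_cents(base_cents, senior, part_time):
    """Hypothetical pay rule: +10% for seniority, halved for part-time."""
    pay = base_cents * 110 // 100 if senior else base_cents
    return pay // 2 if part_time else pay


# Expected results copied by hand from the (made-up) spec,
# for a base salary of 2000.00, keyed by (senior, part_time).
SPEC_TABLE = {
    (False, False): 200_000,
    (True, False): 220_000,
    (False, True): 100_000,
    (True, True): 110_000,
}


def run_spec_tests(base_cents=200_000):
    """Check every combination of conditions; return the list of failures."""
    failures = []
    for senior, part_time in product([False, True], repeat=2):
        actual = monthly_pay_cents(base_cents, senior, part_time)
        if actual != SPEC_TABLE[(senior, part_time)]:
            failures.append((senior, part_time, actual))
    return failures
```

An empty list from `run_spec_tests()` only shows that the code agrees with the spec. If the spec itself forgot a condition or got a rule wrong, every test still passes.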
The third round is users testing in a controlled environment, doing what they would do normally and reporting on all that is not to their liking. This reveals the omissions and mistakes in the specs.
The findings in each round go back to the developers and the whole circus starts again. The next round starts only once the previous round is error free.
The fourth round is the integration into the operational world. Can it function and perform without disturbing any other system? Is it immune to disturbances from any other system? This sounds simple, but the real world is very complex. I worked mostly on a single mainframe with a limited number of interfaces. Microsoft Office works on over a hundred million computers with a nearly endless number of interfaces and configurations — astounding.
This fourth round starts in a test lab with a few thousand differently configured computers running test scripts. Recordings from use cases that made previous versions break are used. Perhaps a hundred thousand known problematic situations are tested. The software is as stable as they can make it.
But this is the end of what is possible to test for the development department and the QA gurus. And there are still too many bugs in the system to release it to the general public. Experience and statistics have taught many this ugly truth.
In the early days, the software industry launched a version ##.1 and all experienced users knew that it would be buggy. Some innovators and early adopters would use it out of curiosity and for adventure. After a few months, version ##.2 would be released with most bugs removed. Later, version ##.23 or ##.31 would be the version in general use.
These first users of the systems became a community that got a peek at the software before it was released. The current practice of beta testing was born.
And with this little history, it is clear what beta-software is. It is the software that is the best the development team can produce, but not good enough for wide use. It is great for users who can tolerate imperfections and the occasional bug, but for those who think it should just work, it is not yet good enough.
The advantage of using a beta release only for selected users instead of calling it release ##.1 is that it is kept away from the unaware. There is no misunderstanding that release 23.1 must be very good because they are at the 23rd version.
Beta testing with real users is the only way to cross the gap between what development can produce and what the general public expects. A good beta testing program starts with a small group of testers, people selected for their skills in using the product and their willingness, and hopefully proven ability, to be critical and describe what improvements need to be made.
With the software becoming more mature, the group of beta testers can be enlarged. Some beta tests involved millions of users. The longer the beta test phase is, the better the product gets. Some companies (e.g., Google) keep their software in beta for many years.
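How the group is enlarged is not described here. One common approach, shown as a sketch under that assumption, is to hash each user ID into one of 100 buckets and admit every user whose bucket falls below the current rollout percentage. The function name `in_beta` and the user IDs are hypothetical. The first small group of hand-picked testers would still be selected by people, not by a hash.

```python
import hashlib


def in_beta(user_id: str, percent: int) -> bool:
    """Place a user in one of 100 buckets; admit buckets below `percent`."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Enlarging the beta is a matter of raising the percentage over time.
users = [f"user-{i}" for i in range(1000)]
early_group = [u for u in users if in_beta(u, 5)]
later_group = [u for u in users if in_beta(u, 25)]
```

Because the bucket depends only on the user ID, everyone admitted at 5 percent is still in the group at 25 percent, so nobody is thrown out of the beta as it grows.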
With a neural net–based artificial intelligence (AI) system, we have another problem. There are no specifications. We cannot build a test set based on every condition specified in the design. The neural net (NN) was given a few million situations and solutions. It wrote the code to recognize and solve them itself. How can you test this?
It is basically the same way the testing of other software is done, only with a greater demand for creativity from the testers. The AI can be fed millions of situations in a virtual environment, for each of which the correct solution is known. I wrote about this when discussing Dojo.
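The idea of feeding situations with known solutions can be sketched in a few lines. Everything here is a toy assumption: `toy_model` stands in for a trained net, and the four scenarios and their expected answers are invented. Real scenarios would be far richer than one distance value.

```python
def evaluate(model, scenarios):
    """Return the pass rate and the scenarios the model got wrong."""
    failures = [(s, expected) for s, expected in scenarios if model(s) != expected]
    pass_rate = 1 - len(failures) / len(scenarios)
    return pass_rate, failures


def toy_model(scenario):
    """Toy stand-in for a trained net: brakes only when closer than 30 m."""
    return "brake" if scenario["obstacle_m"] < 30 else "continue"


# Each entry pairs a situation with the answer known to be correct.
SCENARIOS = [
    ({"obstacle_m": 10}, "brake"),
    ({"obstacle_m": 50}, "continue"),
    ({"obstacle_m": 30}, "brake"),  # edge case the toy model gets wrong
    ({"obstacle_m": 100}, "continue"),
]

rate, failed = evaluate(toy_model, SCENARIOS)
```

The list of failures matters more than the pass rate. It tells the testers which kinds of situations to multiply in the next round, and that is where their creativity comes in.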
In the case of Tesla’s Full Self Driving (FSD) software, this method can get the AI to a competent level. The cases are based on millions of cases uploaded by Tesla drivers. But these virtual cases can never replace real-world testing.
I wrote an article about driving examiners being the best beta testers Tesla could find. This can perhaps better be called alpha testing. It is the last check systematically performed under the auspices of the development team. These people are ideally placed to decide whether the software is really feature complete and can handle all normal, foreseeable traffic situations. (What they cannot do is help make the software foolproof.)
In the end, though, only normal users, with all their quirks and misunderstandings, can find all the situations where the software is not good enough.
Recently, executives of a competing autonomous driving company criticized Tesla for using the public for beta testing. They were using their own employees to do that, the executives said. Apart from the fact that employees are often less motivated and more easily distracted, as long as you can improve your product by testing it yourself, you are not ready for beta testing. That is the point. But they will learn.