The Evolution of Testing With Your Codebase

The Startup Testing Evolution Story and Some of the Lessons Learned

Joy Ebertz
Austin Startups

--

We often talk about robust testing or the best testing strategy as if there were a single answer, but it's not quite that simple. There is a strategy that is typically best if you want to release code with very few bugs as efficiently as possible, but it is not always the strategy that makes the most sense for a particular company. What makes sense for a small two-person startup may not make sense for a larger game app, and both might be totally different from what a large banking app needs.

Having worked at a labs group, a tiny startup and two slightly larger startups (one of which I stayed with until we were rather large), I've seen a variety of approaches to testing. In general, though, there seems to be a common evolution in how companies approach testing over time. That evolution is logical, but it is also more reactive than is typically good for the company.

The Startup Testing Evolution

When you're in a labs group, or even a really small startup, robust testing doesn't really make sense. If you just need a demo to work to show investors, or you're pivoting what you're building every week, adding lots of tests provides very little value for how much it slows you down. Additionally, if there are only a couple of developers working on a small codebase, they will know the code well, and the chance of introducing errors is lower because everyone knows all of the features and how things work together. At this size, speed matters above all else. After all, if you build the most robust piece of software but can't build enough of it to convince anyone before you run out of money, it really doesn't matter. Therefore, few or no tests are written, and in most cases that is actually the right choice.

Then you become a little bigger. Instead of just doing a few manual tests to verify your work, you find that it's important to add some automated tests, at least for the most important features. You hopefully have at least a couple of real customers who will not be happy if something fundamental breaks, and manually testing everything vital each time is becoming inefficient. Since, at this point, you're most focused on the main use cases and you want a limited number of tests that cover broad swaths of functionality, the right choice is usually end to end tests, and usually UI end to end tests. In this way, you're mostly just automating some of the manual testing you were doing before. You're making sure that your most important scenarios work, and hopefully you're able to add these tests without too much time and effort. Again, this is often the right tradeoff between test robustness and speed.
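
To make that concrete, here's a rough sketch of what one of those first UI end to end tests might look like, written with Playwright. The app, URLs and selectors are all invented for illustration; the point is that a single test walks through the product the same way a manual tester would.

```typescript
// e2e/checkout.spec.ts: a minimal UI end to end sketch using Playwright.
// The URLs, selectors and flow are hypothetical.
import { test, expect } from '@playwright/test';

test('a signed-in user can complete a basic checkout', async ({ page }) => {
  // Log in through the real UI, just like a manual tester would.
  await page.goto('https://staging.example.com/login');
  await page.fill('#email', 'test-user@example.com');
  await page.fill('#password', process.env.TEST_PASSWORD ?? 'changeme');
  await page.click('button[type="submit"]');

  // Add the first product to the cart and check out.
  await page.goto('https://staging.example.com/products');
  await page.click('[data-testid="add-to-cart"]');
  await page.click('[data-testid="checkout"]');

  // Assert only on the end state the customer actually cares about.
  await expect(page.locator('[data-testid="order-confirmation"]')).toBeVisible();
});
```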

Slowly, over time, testing becomes more and more important; you can no longer get away with breaking as many things. You’ve increased the number of features as well as the number of developers working on them and it’s quite likely that not everyone knows everything anymore. In fact, you might even have a feature that no current employees understand. Up until this point, end to end tests have been the main form of testing, so companies tend to add more and more of them.

The Testing Crisis: Speeding Up Tests

Unfortunately, end to end tests are the slowest, and therefore the most resource intensive and expensive to run. As more tests are added, test runs stretch to hours, if not days. Merging code now takes large amounts of resources, computing power and time. Additionally, because UI tests tend to be flaky and brittle, lots of time is also spent rerunning or fixing them. By this point, enough developers have usually been added, and testing has been emphasized enough, that the number of these tests is exploding, bringing these problems suddenly into sharp focus. The company is now forced to confront its testing situation.

Mike Cohn’s original test pyramid

Around this time, someone with a larger-company background will usually bring up the testing pyramid. The testing pyramid visually represents the idea that unit tests are the fastest, most robust tests, and therefore we should write the most of those, while UI end to end tests are the slowest and most likely to break, so although they still have an important place, we should write the fewest of those. The company then tries to reverse the tide of test writing, to stop the influx of new end to end tests and start a culture of unit tests.
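
For contrast, a test at the base of the pyramid looks more like the sketch below (Jest, with an invented pricing rule). Nothing here touches a browser, a network or a database, which is exactly why these tests stay fast and stable.

```typescript
// pricing.test.ts: a minimal unit test sketch (Jest).
// The pricing rule and function are hypothetical.
function discountedTotal(subtotal: number, isMember: boolean): number {
  // Invented business rule: members get 10% off orders over $100.
  return isMember && subtotal > 100 ? subtotal * 0.9 : subtotal;
}

describe('discountedTotal', () => {
  it('applies the member discount over $100', () => {
    expect(discountedTotal(200, true)).toBeCloseTo(180);
  });

  it('does not discount non-members', () => {
    expect(discountedTotal(200, false)).toBe(200);
  });
});
```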

One company I was at abruptly stopped running any end to end tests one day. A good number of those tests should have been replaced, but because we stopped so abruptly, we ended up with much of our (now much larger) codebase completely untested. We also failed to keep even a few of the most valuable end to end tests. In the backlash against end to end tests, we didn't take the time to think about which ones actually made sense and added value. While it is important to make this shift, adding more unit tests while limiting or even removing end to end tests, it is also important to keep some balance.

Often, the company will slowly introduce one or two more test types instead of trying to introduce a comprehensive strategy. This tends to lead to more over-correcting before they can finally land on a good balance. For example, at one place I worked, my team, which had little experience with unit testing at the time, was asked to be the first to stop using end to end tests. Because we didn't have a good definition of a true unit test, and because our feature was actually pretty tightly tied to some complex DB logic, we ended up writing almost entirely integration tests. This alone might have been okay, but we were touted as an example to the rest of engineering, and because our tests were basically the only non-end to end tests, everyone copied what we had done. Pretty soon we had a HUGE number of integration tests and still virtually zero true unit tests. This, again, led to problems.

Then, when true unit tests were introduced, we were basically taught that unit tests were always better than integration tests. We then swung toward adding only unit tests. It wasn't until much later that we were given the knowledge and tools to effectively write a variety of tests. We had eliminated all of our end to end tests, and years later we found ourselves re-introducing them as we realized that there were things they could do that other tests just couldn't.
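
To illustrate the distinction we eventually learned to draw, here is a rough sketch of a true unit test, again with made-up names. Instead of hitting a real database the way our early integration tests did, the DB-facing dependency is swapped out for a tiny in-memory fake.

```typescript
// userService.test.ts: a sketch of a "true" unit test (Jest).
// UserService and UserRepository are hypothetical; the repository is
// replaced with a hand-rolled fake so the test never opens a DB connection.
interface UserRepository {
  findByEmail(email: string): Promise<{ id: string; email: string } | null>;
}

class UserService {
  constructor(private readonly repo: UserRepository) {}

  async isRegistered(email: string): Promise<boolean> {
    return (await this.repo.findByEmail(email)) !== null;
  }
}

describe('UserService.isRegistered', () => {
  it('returns true when the repository finds a user', async () => {
    const fakeRepo: UserRepository = {
      findByEmail: async () => ({ id: '1', email: 'a@example.com' }),
    };
    await expect(new UserService(fakeRepo).isRegistered('a@example.com')).resolves.toBe(true);
  });

  it('returns false when no user exists', async () => {
    const fakeRepo: UserRepository = { findByEmail: async () => null };
    await expect(new UserService(fakeRepo).isRegistered('missing@example.com')).resolves.toBe(false);
  });
});
```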

Even though we had the testing pyramid preached to us early on, we largely weren't given the knowledge of which tests to write, or even how to write more than one or two types. This made it feel very much like the pyramid was an ideal, but not what was done in practice. On top of that, effectively implementing an actual pyramid of tests on an already large codebase with many contributors and unclear ownership boundaries can be difficult. While it's easy to add unit tests to new code, adding them to code that no one knows is both challenging and hard to justify to other departments. Even worse, if no one owns the code, already stretched teams don't want to take on ownership, and then who is responsible for adding or modifying tests?

Lessons Learned

Starting with few or no tests often makes sense. Adding end to end tests as your first automated tests likewise makes sense. However, don't wait until you have problems with the speed of test runs before introducing other types of testing. Also make sure to introduce multiple test types — at the very least, unit tests, integration tests and end to end tests — at the same time. Don't villainize any of these types; instead, make sure everyone understands the pros and cons of each and how to write each. Strive to make all types easy to write — we all tend toward preferring whatever is easiest, but if they're all easy, we will instead pick what's best. If you are introducing unit tests for the first time and trying to limit end to end tests, make sure that the team is set up for success with unit tests: they are easy to write, frameworks have been chosen, examples exist, etc. If a team is told to write unit tests but doesn't know how, they likely won't. Stopping end to end test runs entirely, while effective at limiting end to end tests, is not the best way to end up with robust tests.

It is easier to build high test coverage into a codebase early than to increase it later. However, that doesn't mean improving later is impossible. To that end, ramping up testing earlier can be a good thing. Additionally, finding ways to encourage always improving the tests (even if only by a very small amount) can make a big difference over time. For example, just because your code coverage is abysmal now, that isn't a reason not to put in a (hopefully rising) bar to prevent it from getting worse.
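
As one hypothetical example of that rising bar, most test runners can fail the build when coverage drops below a floor. The sketch below assumes a Jest setup, and the numbers are invented; the idea is to set the floor at whatever coverage you have today and ratchet it up over time.

```typescript
// jest.config.ts: a sketch of a coverage "ratchet", assuming a Jest setup.
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      // Hypothetical starting floors: begin at whatever your coverage is
      // today, even if it's low, then raise these numbers over time.
      lines: 42,
      branches: 35,
    },
  },
};

export default config;
```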

Finally, take into account your company’s appetite for risk, development speed, release speed, resource cost and code complexity to come up with a test mix that makes sense in your context. Even if you can’t immediately get to that ideal, knowing what you’re aiming for is valuable. If you don’t have a goal, it’s easy to be swayed by the feeling of the day.

Conclusion

It is natural and common for a startup to evolve its approach to testing as it grows. There's nothing wrong with this. The problem often comes when that startup isn't proactive in its testing approach and is instead reactive. This often leads to overcorrecting and gut responses instead of a thoughtful testing strategy. The good news is that, with a little more thinking before things become a problem and proper education for the development team, most of these problems can be avoided.
