Learning big codebases is hard

Saul Delgado

Feb 19, 2024

One of the realities of software engineering is the need to get up to speed with relatively big repos, sometimes in the hundreds of thousands of lines of code. Getting familiar with these repos can take weeks, getting comfortable can take months and becoming an expert can take years. Senior engineers know this and actually expect you to struggle with it. Sometimes we hear horror stories about awful leads/seniors demanding that someone hit the ground running, and when this doesn't happen things go south. Thankfully that's not generally the case and you should have enough time to familiarize yourself with a new codebase.

Here are some guidelines I have found useful for closing the gap in a relatively timely manner. Your mileage may vary.

Understand the business

I cannot stress enough how important this is. You need to understand the problem your company is trying to solve before you start spitting out code. This is usually conveyed to you during onboarding, and you will hopefully get a tour of the product, but this is not always the case. If you are not 100% sure what your company does, I would suggest you take the time to figure it out.

Once you have a high-level understanding of what your company does, you can start looking at how the existing platform tries to solve the problem. Again, your mileage may vary, but depending on the size of your platform you could be looking at a large number of microservices that each solve a different aspect of the problem. Each one of them will (hopefully) specialize in a very specific area of knowledge, and it's very common to have teams dedicated to one or more of them.

Even if you are going to be working on a very specific area of the platform, it is always a good idea to get a good sense of what the platform does as a whole. You might not have to know exactly what other parts of the platform do, but having a good sense of orientation about these areas can be very helpful, especially when troubleshooting integrations.

Do not try to figure it out all by yourself

Unless you landed in a very hostile environment or you find yourself as the sole developer for a large codebase, always try to get help from your peers. After many years I have yet to find a place where I could not find a single soul willing to share their knowledge with me. This is also a good opportunity to start getting to know your team!

Yes, you can figure it out by yourself if you are really forced to or no one else is available, but it will not be time-effective. You will spend many hours playing with the code before you can be productive. Having help will speed up the process, but it will still take many hours of learning and testing, so try to recruit some help as soon as you can!

I would like to point out that, being an introvert myself, I understand how difficult it can be to reach out to peers because of anxiety and fear of interaction. It is real, I've been there many times. However, asking for help is a really valuable skill to master and I wish I had learned it sooner.

Play with the code!

One of the best ways to learn how the code works is by taking it for a spin. If you can spin up a local environment on your machine, by all means learn how to do it and have fun with it!

Most complex platforms, however, have so many moving parts that running everything locally is not practical, and usually you will only have one or very few test environments at your disposal. Learn how to safely deploy your feature branches to them, follow your team's guidelines on what's cool and what's not, and start from there. Once you learn your team's etiquette and practices for testing, don't be afraid to break things: as long as you can safely and easily revert your changes, experimenting is fine.

I remember a few years ago I was terrified of breaking our dev environment because we all shared one single environment and errors would get immediately posted to our Slack dev channel for everyone to see. I would stress, overthink and deploy only after many hours of running the code in my head hundreds of times, only to predictably fail at the one scenario I didn't think of. I brought this to a 1:1 with one of my leads and she laughed it off and told me that this was actually desirable: it is how everyone on the team knows the code is being tested. It was a feature, not a bug!

Nowadays I shamelessly deploy and test my code thoroughly, but it took me years to get comfortable knowing others would see my code fail. As long as it fails in dev, all good. What's not cool is failing to test in dev and having your code break production.

Study the unit tests

Yup, a lot of people hate writing unit tests. I am one of those. I understand how valuable they are to catch errors before they reach production but I really dislike writing them. I see them as a necessary evil.

In this case, however, we want to read the test cases written by others, see the data they feed the code and see how it behaves. See what makes the code fail and what the expected behavior is. You can get a good sense of the input and output of the codebase you are studying by just spending some time going through the test cases... unless there are no test cases (good luck with that).
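To make this concrete, here is a minimal sketch of what reading a test file can tell you. Everything in it is made up for illustration: `parse_quantity` and its tests are hypothetical and do not come from any real codebase. The point is that each test name and assertion documents one input and the behavior the team expects from it.

```python
import unittest


def parse_quantity(text: str) -> int:
    """Hypothetical helper: turn user input like ' 3 ' into a positive int."""
    value = int(text.strip())  # int() raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


class ParseQuantityTest(unittest.TestCase):
    # Reading these tests tells you the accepted input and the failure modes
    # without having to open the implementation first.

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(parse_quantity(" 3 "), 3)

    def test_rejects_zero(self):
        with self.assertRaises(ValueError):
            parse_quantity("0")

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            parse_quantity("three")
```

If this lived in a file of its own, running `python -m unittest` against that file would execute the three cases. Changing an input and watching which assertion breaks is a cheap way to confirm what you think the code does.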

Office hours

If your team leads/seniors have office hours available, use them! These can be very informal or very structured, depending on your organization's culture, but whatever the case, they are one of the most powerful learning tools I know.

Office hours give you access to the developers who wrote the code you are trying to learn or, in the worst case, to developers who learned it from the original authors and are passing that information on to you. They will have insight into why something was done a certain way or the pitfalls they were trying to avoid when implementing X, Y and Z.

A nice byproduct of this is also the fact that you will start to be known as someone who doesn't cut corners and is trying to genuinely understand the platform. Don't book office hours just for this though.

Follow the rabbit hole

Finally, the most time-consuming approach, but also the most valuable in my book. Identify the entry points of your code and follow the different paths it can take. This requires a good sense of orientation and knowing when to backtrack once you reach a dead end. Pair this with lots of testing to confirm your findings and you are on to something.

This will probably take days of your time, but the reward is reaching parts of the code you would otherwise be unaware of. Use any help you can get from your IDE and step into the definitions of the functions your code uses. This will help you gain a deeper understanding rather than thinking of functions only as abstractions/black boxes. Yes, you will probably not need to know everything or be an expert on the obscure helper functions your system uses, but it doesn't hurt to explore and put in the hours early on. Down the line you will feel more comfortable coming back to areas of the code you need to fix or enhance.
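When the IDE is not enough, a small tracer can show which functions an entry point actually reaches. The sketch below is only one possible approach, assuming a Python codebase. It uses the standard `sys.settrace` hook, and the `handle_request`, `load_order`, `apply_discount` and `round_total` functions are hypothetical stand-ins for real code.

```python
import sys


def trace_calls(entry_point, *args, **kwargs):
    """Run entry_point and return the names of the Python functions it calls, in order."""
    called = []

    def tracer(frame, event, arg):
        if event == "call":  # fired once for every new Python-level frame
            called.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        entry_point(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return called


# Hypothetical code path standing in for a real platform.
def load_order(order_id):
    return {"id": order_id, "total": 40}


def round_total(value):
    return round(value, 2)


def apply_discount(order):
    order["total"] = round_total(order["total"] * 0.9)
    return order


def handle_request(order_id):  # the entry point we start from
    return apply_discount(load_order(order_id))


print(trace_calls(handle_request, 7))
# ['handle_request', 'load_order', 'apply_discount', 'round_total']
```

Built-in functions implemented in C, such as `round`, do not show up in the list, so treat the output as a map of your own code rather than a complete call graph.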

A calculus professor during my college years used to say that calculus can only be learned through the tip of the pencil. Although we don't use pencil and paper to write software (at least not that I know of), the same can be said about learning a codebase. You need to put in the hours and practice a lot with it before you can safely say you know it.

Rome wasn't built in a day

Learning a big platform will not happen in a day or a week. Part of succeeding at cramming a lot of knowledge into a reasonable amount of time is knowing what reasonable means. Take as much time as you need, and communicate with your leadership if you feel you are taking longer to catch up. Trying to run before you can walk is counterproductive and your productivity will take a dip.

Hopefully these guidelines will steer you in the right direction.