Getting to Know Monorepo
As a concept, Monorepos have been around for more than a decade. Google, Facebook and Microsoft have been using this architecture for ages. But it’s only now, as better tooling hits the market, that startups and open source projects are jumping on the bandwagon.
...Meaning now’s the time to get familiar with what a Monorepo is, when it’s fitting to use it and how your project or organization can benefit from it.
You’ll find all that and more in this article. Let’s dive right in.
First, it’s important to understand the opposite approach: Multirepo.
Multirepo - a.k.a. Manyrepo or Polyrepo - is a type of software architecture in which a project is divided into smaller projects, each living in its own repository. It's one of the go-to choices for developing software at scale. The reasoning behind it relates mostly to better code-sharing and avoiding code duplication.
In a Multirepo layout, each project has its own repository in a Version Control System (e.g. Git), along with its own deployment, configuration etc. Some benefits of this approach are:
- Teams or individuals can work independently. Each repository is autonomous and can grow separately.
- Onboarding new developers is easier. The smaller the codebase, the easier it is to get familiar with it.
- Simpler tooling. Projects typically do one thing and use a single language, so tooling - CI/CD and configuration - is as simple as it gets.
But as the number of repositories grows, things can get messy. Changing a shared piece of code starts to take multiple pull requests. Figuring out the correct order to release all the components takes up a big chunk of developer time. Large-scale refactors become complicated, and therefore never get done. Teams or individuals become isolated, lost in their own projects, and so on.
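To make the release-order problem concrete: at its core it is a topological sort over the dependency graph. Here's a minimal sketch, assuming four made-up components (the names and the graph are hypothetical, not taken from any real project) and Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each one maps to the set of components it depends on.
DEPENDENCIES = {
    "web-app": {"ui-kit", "api-client"},
    "api-client": {"shared-types"},
    "ui-kit": {"shared-types"},
    "shared-types": set(),
}


def release_order(dependencies):
    """Order components so that every dependency comes before its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # "shared-types" comes first and "web-app" last; the two in the
    # middle have no ordering constraint between them.
    print(release_order(DEPENDENCIES))
```

In a Multirepo, this graph usually lives in people's heads or in a wiki page, and every step in the order is a separate release.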
Monorepos help us deal with most of these problems.
Just like Multirepos, Monorepos are a type of software architecture. But instead of breaking a large project into multiple repositories, all code goes into the same place. Every artifact - whether it’s a library, project, component, app etc. - shares the same repository and VCS. These artifacts may or may not be connected (e.g., a website and an Android app), and they don’t necessarily use the same programming language. Adding new artifacts can be as simple as creating a new folder at the top level of the Monorepo.
Please note that Monorepo doesn't mean that all components are tightly coupled and mixed together in a big bowl of spaghetti. Things are still separated according to what they pertain to; they just happen to live in the same Git repository.
Compared to Multirepos, Monorepos have many benefits:
- Easier to update shared code. Updating a common piece of code and all the places where it is used can be done in a single step - a single commit. This operation is known as an atomic commit.
- Better discoverability. Monorepos give us a 360° view of the codebase. This makes actions like estimating the impact of a change much easier to do. You can also leverage tools such as a global Find & Replace for quick refactors.
- Development culture. Sharing knowledge becomes easier when everyone is working on the same codebase and can talk about the code at the same level. Growing pains are shared, helping promote a sense of empathy among the team.
- Flexible architecture. The process of isolating (or merging) code in a Monorepo is less costly than in a Multirepo. There's no need to set up a whole new repository, configuration and deployment. This makes it easier to revert bad abstractions or over-engineered parts of a system.
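To illustrate what "estimating the impact of a change" can look like when everything sits in one repository, here's a rough sketch. It assumes a hand-written dependency map between hypothetical projects (real Monorepo tools typically derive this from manifests or import statements) and walks the graph in reverse to find everything that depends on the changed project.

```python
from collections import deque

# Hypothetical dependency map: project -> projects it depends on.
DEPENDS_ON = {
    "website": ["ui-kit", "utils"],
    "android-app": ["utils"],
    "ui-kit": ["utils"],
    "utils": [],
    "docs": [],
}


def affected_by(changed, depends_on):
    """Return the changed project plus everything that depends on it,
    directly or transitively."""
    # Invert the map: project -> projects that depend on it.
    dependents = {}
    for project, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(project)

    # Breadth-first walk over the inverted map.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    print(sorted(affected_by("utils", DEPENDS_ON)))
    # ['android-app', 'ui-kit', 'utils', 'website']
```

Because all of these projects share one repository, the projects in that result can be updated together with the shared code, in the same commit.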
WHEN TO GO MONOREPO...
Monorepos are a great choice as a general architecture, but there are scenarios where they really stand out.
If you need to share code across multiple places, Monorepos are a safe bet. When projects are released separately - a common thing in non-Monorepo setups - you need to bring some sort of versioning system into play. You must keep track of breaking changes so that consumers can install updates safely, on their own time. Monorepos allow us to update everything in a single commit, decreasing the need for a versioning system. And since there's just one repository, the latest version is represented by the latest commit hash.
Monorepos are also a good choice when projects need to be released together. The more projects that need to be deployed simultaneously, the greater the coordination effort. With Multirepos, figuring out the correct release order and testing all parts together can take up a lot of development time. With Monorepos, changing, reviewing, testing and releasing changes to multiple artifacts can all be done in the same development cycle.
Monorepos can also bring value to less mature code bases. We've all been there: when starting a project, you tend to make abstractions and separate things in a way that makes the most sense at the time. As the project grows and business requirements change, those abstractions are suddenly not the best representation of the system anymore. In order to keep things moving forward, refactoring the existing code becomes inevitable. Monorepos help reduce the cost of those refactors by making it easier to find, change and re-organize code from multiple projects at once.
...AND WHEN GOING MONOREPO PROVES CHALLENGING
We all know that there are no silver bullets when it comes to software development. Monorepos are no exception. While this software architecture may work well in most cases, there are some trade-offs to consider.
[Access Control] Restricting access, for example, is still a challenge with Monorepos. Since everything lies together in the same repository, everyone is able to see the code in full. This can be a problem if you have sensitive areas in your code that you only want a handful of people to see.
[Complex tooling] Tooling with Monorepos can get quite complex. The more unique and sophisticated your Monorepo - different languages, platforms, runtimes etc. - the more effort will be required when setting up, for example, a CI/CD workflow. Tools like TravisCI and CircleCI can work with Monorepos just as well as with Multirepos, but be ready to hit a roadblock when dealing with edge cases.
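One small example of the tooling a Monorepo CI/CD workflow tends to need is working out which projects a change actually touched, so that only those get built and tested. The sketch below is one possible approach, not how any particular CI tool does it. It assumes each project lives in its own top-level folder and that the list of changed files has already been obtained (for instance from `git diff --name-only`); the function and folder names are made up for illustration.

```python
from pathlib import PurePosixPath


def projects_to_build(changed_files, known_projects):
    """Map changed file paths to the top-level project folders they belong to."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Files at the repository root (e.g. README.md) belong to no project.
        if len(parts) > 1 and parts[0] in known_projects:
            touched.add(parts[0])
    return sorted(touched)


if __name__ == "__main__":
    changed = [
        "website/src/index.ts",
        "android-app/build.gradle",
        "README.md",
        "website/package.json",
    ]
    print(projects_to_build(changed, {"website", "android-app", "ui-kit"}))
    # ['android-app', 'website']
```

A real setup would also have to account for dependencies between projects and for shared configuration at the repository root, which is where the edge cases start to show up.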
Version Control Systems might slow a team down, too. When multiple developers are pushing code to the same repository every day, tools like Git come under strain. Simple commands like 'git status' or 'git commit' might take longer than usual to run if your Monorepo is several gigabytes in size. Tools such as VFS for Git help deal with the performance of large Git repositories.
Whether you choose a Monorepo or a Multirepo architecture for your project (or organization), you’ll have to deal with trade-offs in both approaches.
Multirepos will bring challenges related to coordination: managing dependencies, doing large-scale refactors, releasing multiple artifacts at once etc. Monorepos will solve most of these problems, but come with greater complexity when it comes to tooling and managing the scalability of the repository.
With discipline, however, both approaches will work fine at any scale.
And one last note: No matter which architecture you choose, always take into account the engineering culture of your team. Building software is a social activity and different teams have different ways of working and communicating. Always be smart and carefully evaluate your architecture choice based on your specific situation.