How You Structure Your Codebase Is a Business Decision — And Most Teams Get It Wrong

The architecture choice that silently kills engineering velocity at 20, 50, and 200 engineers.
There's a decision that every engineering team makes — usually without realizing it — that will define how fast they can ship, how painful their CI/CD pipeline becomes, and whether new hires can be productive within their first week or their first month.
It's not your database. It's not your cloud provider. It's not even your programming language.
It's how you organize your code repositories.
Monorepo. Polyrepo. Hybrid. These terms get thrown around in architecture debates as if they're purely technical choices. They're not. They're organizational decisions with technical consequences, and getting them wrong at the wrong stage of company growth is one of the most expensive silent taxes an engineering team can pay.
This article gives you the complete picture: what each approach actually looks like in practice (with real directory trees), who uses what at scale, and the honest trade-offs that most blog posts skip over.
The Three Approaches, Defined Plainly
Monorepo — Every project, service, app, and shared library lives in a single Git repository. One clone. One CI pipeline. One place to search.
Polyrepo — One repository per service or project. Each team owns their repo independently. Deploy, release, and version on their own schedule.
Hybrid — A middle path. Multiple repositories, but each one is itself a monorepo for a specific domain. Your frontend team has one repo. Your backend team has another. Your data platform is a third.
None of these is universally correct. The right answer depends entirely on your team size, your tooling maturity, your deployment model, and — critically — where you are in your company's growth curve.
Who Actually Uses What (Real-World Data)
Before we go deeper, let's ground this in what companies at scale actually do:
| Company | Approach | Scale |
|---|---|---|
| Google | Monorepo (Piper) | ~86TB, 2B+ files, 40K commits/day |
| Meta | Monorepo (Mercurial) | Billions of lines |
| Microsoft | Monorepo (VFS for Git) | Windows codebase |
| Stripe | Monorepo | Large, multi-language |
| Vercel / Next.js | Monorepo (Turborepo — they built it) | Mid-large |
| Airbnb | Hybrid (mono frontend, poly backend) | Large |
| Netflix | Polyrepo | Hundreds of microservices |
| Amazon | Polyrepo | Thousands of services |
| Shopify | Rails monolith → modular monorepo | Large |
| Uber | Hybrid (monorepo per language) | Very large |
The pattern that jumps out: the largest tech companies are predominantly monorepo. Netflix and Amazon are the most famous counter-examples — and both have invested hundreds of engineering-years into platform tooling to make polyrepo work at their scale.
An important data point: in a 2023 Nx community survey, approximately 77% of organizations using monorepos reported improved code sharing across teams. Turborepo, the leading JavaScript monorepo tool, now sees around 4 million downloads per week on npm — a number that reflects just how mainstream monorepo tooling has become in the last three years.
The trend is clear. The tooling has caught up. The monorepo renaissance is real.
The Honest Pros and Cons
Monorepo
What it does well:
A monorepo's biggest superpower is the atomic cross-service change. When you rename a shared type, update a library, or refactor a core abstraction, you do it in a single PR. Every consumer of that code is updated simultaneously. No versioning. No coordination. No three-week migration where half your services are on v1 and half are on v2.
It also eliminates the hidden tax of code duplication. In a polyrepo world, every service eventually grows its own `logger.ts`, its own `auth.ts`, its own `formatDate()` utility. These diverge quietly over months. Bugs get fixed in some but not others. In a monorepo, shared utilities are shared by default.
Onboarding is simpler. One clone, one README, one dev setup script. A new engineer doesn't need tribal knowledge about which of the 30 repos they need to clone to run the system locally.
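To make the "shared by default" point concrete, here is a hedged sketch of what a shared utility module might look like in this world. The package name `@myapp/utils` and the function names are illustrative, not from any real codebase:

```typescript
// Hypothetical packages/utils/src/formatters.ts -- the kind of helper
// that would otherwise be copy-pasted into every service and silently diverge.

export function formatDate(date: Date): string {
  // One canonical rule for the whole repo: UTC date in YYYY-MM-DD form.
  return date.toISOString().slice(0, 10);
}

export function formatCents(cents: number): string {
  // Money is stored as integer cents; formatting happens only at the edge.
  return `$${(cents / 100).toFixed(2)}`;
}

// Every app imports the same copy, e.g.:
//   import { formatDate } from "@myapp/utils";
console.log(formatDate(new Date(Date.UTC(2024, 0, 15)))); // 2024-01-15
console.log(formatCents(1999)); // $19.99
```

A bug fix here lands in one file and reaches every consumer in the same commit, which is exactly what the polyrepo version of this utility cannot do.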
Where it creates pain:
CI gets slow. Without smart tooling — affected-only builds, remote caching — a monorepo's CI pipeline will balloon to 30-60 minutes and teams will revolt. Tooling is not optional; it's load-bearing.
Git performance degrades at extreme scale. Google had to build an entire custom VCS (Piper) because Git couldn't handle their monorepo. Microsoft built VFS for Git (now available as open source) for the same reason. Most teams won't hit this problem, but it's real at the top of the scale.
Access control is coarser. Giving a contractor access to one service in a monorepo requires careful configuration. In a polyrepo, you just add them to one repo.
The "noisy neighbor" problem is real: a broken commit in an unrelated service can block CI for everyone until it's fixed.
Polyrepo
What it does well:
Team autonomy is the core value proposition. Each team ships on their own schedule, with their own CI, their own dependencies, their own tooling preferences. A failure in one service doesn't block another team's deployment.
For security-sensitive environments, fine-grained access control is straightforward. You share exactly the repos you want to share, nothing more.
Each repo is small, fast, and focused. CI pipelines are simple. Git history is clean and scoped.
Where it creates pain:
Cross-service changes are brutal. Updating a shared library means opening PRs across N repositories, coordinating merges, and — in practice — waiting for different teams to adopt your update on their own timeline. In organizations with dozens of services, it's common for critical fixes to sit un-adopted in downstream services for months.
Dependency management becomes a full-time job. Tools like Renovate and Dependabot help automate updates, but they don't solve the root coordination problem. They just make it slightly less manual.
Discoverability suffers badly. Finding where a specific piece of logic lives requires tribal knowledge or a very good internal search tool. Backstage (built by Spotify) exists largely because polyrepo organizations needed a developer portal to make their own codebase navigable.
Standards drift. Without a shared lint config, shared test setup, and shared CI template, every repo slowly evolves its own conventions. Onboarding becomes inconsistent. Code quality varies service by service.
Hybrid
What it does well:
The hybrid approach maps technology to org structure in a natural way. Your frontend team has one repo. Your backend team has another. Your data team has a third. Each domain has the cohesion of a monorepo internally, with the autonomy of a polyrepo between domains.
It's also the most pragmatic migration path. Companies moving from polyrepo to monorepo can consolidate domain by domain rather than attempting a big-bang migration.
Where it creates pain:
Cross-domain changes still suffer from all the same polyrepo coordination problems. If your frontend needs a change in your backend repo, you're back to multi-PR coordination.
The hybrid approach requires discipline to define domain boundaries correctly upfront. Wrong cuts early on create friction that compounds over years. And without active enforcement, it can quietly devolve into "polyrepo with extra steps."
The Full Directory Trees
This is where most articles skip the details. Let's not.
Structure 1: Turborepo (JS/TS Monorepo) — The Modern Standard for Startups to Scale-ups
This is the right default for the majority of product companies today.
my-app/
├── .github/
│ ├── workflows/
│ │ ├── ci.yml
│ │ ├── deploy-staging.yml
│ │ └── deploy-prod.yml
│ └── CODEOWNERS
├── apps/
│ ├── web/ # Next.js frontend
│ │ ├── app/
│ │ │ ├── (auth)/
│ │ │ │ ├── login/page.tsx
│ │ │ │ └── register/page.tsx
│ │ │ ├── (dashboard)/
│ │ │ │ ├── layout.tsx
│ │ │ │ └── page.tsx
│ │ │ ├── api/
│ │ │ │ └── [...route]/route.ts
│ │ │ └── layout.tsx
│ │ ├── components/
│ │ ├── public/
│ │ ├── next.config.js
│ │ ├── tsconfig.json
│ │ └── package.json
│ ├── api/ # Express / Fastify / Hono backend
│ │ ├── src/
│ │ │ ├── routes/
│ │ │ │ ├── auth.ts
│ │ │ │ ├── users.ts
│ │ │ │ └── index.ts
│ │ │ ├── middleware/
│ │ │ │ ├── auth.ts
│ │ │ │ └── rateLimit.ts
│ │ │ ├── services/
│ │ │ │ ├── user.service.ts
│ │ │ │ └── email.service.ts
│ │ │ ├── db/
│ │ │ │ ├── schema.ts
│ │ │ │ └── client.ts
│ │ │ └── index.ts
│ │ ├── tsconfig.json
│ │ └── package.json
│ ├── mobile/ # React Native / Expo
│ │ ├── app/
│ │ ├── components/
│ │ ├── app.json
│ │ └── package.json
│ └── docs/ # Internal docs site
│ ├── docs/
│ ├── docusaurus.config.js
│ └── package.json
├── packages/
│ ├── ui/ # Shared component library
│ │ ├── src/
│ │ │ ├── components/
│ │ │ │ ├── Button/
│ │ │ │ │ ├── Button.tsx
│ │ │ │ │ ├── Button.test.tsx
│ │ │ │ │ └── index.ts
│ │ │ │ ├── Input/
│ │ │ │ ├── Modal/
│ │ │ │ └── index.ts
│ │ │ └── index.ts
│ │ ├── tsconfig.json
│ │ └── package.json # name: "@myapp/ui"
│ ├── types/ # Shared TypeScript types
│ │ ├── src/
│ │ │ ├── api.ts # Request/response contracts
│ │ │ ├── models.ts # User, Order, etc.
│ │ │ └── index.ts
│ │ └── package.json # name: "@myapp/types"
│ ├── config/ # Shared configs
│ │ ├── eslint/
│ │ │ └── index.js
│ │ ├── typescript/
│ │ │ ├── base.json
│ │ │ ├── nextjs.json
│ │ │ └── react-library.json
│ │ └── package.json # name: "@myapp/config"
│ ├── utils/ # Shared utility functions
│ │ ├── src/
│ │ │ ├── formatters.ts
│ │ │ ├── validators.ts
│ │ │ └── index.ts
│ │ └── package.json # name: "@myapp/utils"
│ └── db/ # Shared DB client + schema (Drizzle/Prisma)
│ ├── src/
│ │ ├── schema/
│ │ │ ├── users.ts
│ │ │ ├── orders.ts
│ │ │ └── index.ts
│ │ ├── migrations/
│ │ └── client.ts
│ └── package.json # name: "@myapp/db"
├── infra/
│ ├── terraform/
│ │ ├── modules/
│ │ │ ├── vpc/
│ │ │ ├── ecs/
│ │ │ └── rds/
│ │ ├── environments/
│ │ │ ├── staging/
│ │ │ └── prod/
│ │ └── main.tf
│ └── docker/
│ ├── api.Dockerfile
│ └── web.Dockerfile
├── tools/
│ ├── scripts/
│ │ ├── seed.ts
│ │ ├── migrate.ts
│ │ └── generate-types.ts
│ └── generators/
│ ├── component/
│ └── service/
├── turbo.json
├── package.json # Root workspace
├── pnpm-workspace.yaml
├── .env.example
└── README.md
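The `turbo.json` at the root of that tree might look something like the following minimal sketch (Turborepo 2.x syntax, where the task graph lives under `tasks`; 1.x called it `pipeline`). Task names and output globs here are illustrative:

```json
{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": [".next/**", "!.next/cache/**", "dist/**"]
    },
    "test": {
      "dependsOn": ["build"]
    },
    "lint": {},
    "dev": {
      "cache": false,
      "persistent": true
    }
  }
}
```

`"dependsOn": ["^build"]` means "build my workspace dependencies first," which is what lets Turborepo compute the task graph. With remote caching enabled, CI can then run only what changed: `turbo run build --filter=...[origin/main]` rebuilds just the packages affected since `main`, plus their dependents.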
Structure 2: Nx Monorepo — Enterprise / Polyglot with Enforced Module Boundaries
Nx adds a layer that Turborepo doesn't: architectural enforcement. You define which libraries are allowed to import from which other libraries. Violations are caught at lint time, not six months later when everything is tangled.
my-platform/
├── apps/
│ ├── shell/ # Host app (Module Federation)
│ ├── dashboard/ # Remote MFE
│ ├── admin/ # Remote MFE
│ ├── api-gateway/ # NestJS
│ └── worker/ # Background job processor
├── libs/
│ ├── shared/
│ │ ├── ui/ # tag: type:ui
│ │ ├── data-access/ # tag: type:data-access
│ │ ├── utils/ # tag: type:util
│ │ └── types/ # tag: type:types
│ ├── auth/
│ │ ├── feature/ # Smart components + state
│ │ ├── data-access/ # API calls + state management
│ │ └── ui/ # Presentational components
│ ├── users/
│ │ ├── feature/
│ │ ├── data-access/
│ │ └── ui/
│ └── orders/
│ ├── feature/
│ ├── data-access/
│ └── ui/
├── tools/
│ ├── executors/
│ └── generators/
├── nx.json
├── workspace.json
└── package.json
The tagging system (`type:ui`, `type:data-access`, `type:feature`) is what makes this scale. You configure rules like "feature libs can import from data-access and ui, but ui libs cannot import from feature." Nx enforces this at lint time. Circular dependencies are caught before they can form.
Structure 3: Python / Polyglot Monorepo (Pants)
For data-heavy teams with Python, Go, or Java services alongside each other, Pants is the right tool. Bazel is more powerful but has a steep learning curve that only pays off at Google scale.
platform/
├── services/
│ ├── auth-service/
│ │ ├── src/
│ │ │ └── auth/
│ │ │ ├── __init__.py
│ │ │ ├── routes.py
│ │ │ └── models.py
│ │ ├── tests/
│ │ ├── BUILD # Pants build file
│ │ └── Dockerfile
│ ├── data-pipeline/
│ │ ├── src/
│ │ │ └── pipeline/
│ │ │ ├── ingest.py
│ │ │ ├── transform.py
│ │ │ └── load.py
│ │ ├── tests/
│ │ └── BUILD
│ └── ml-service/
│ ├── src/
│ │ └── model/
│ │ ├── train.py
│ │ └── serve.py
│ ├── notebooks/
│ └── BUILD
├── libs/
│ ├── common/
│ │ ├── src/
│ │ │ └── common/
│ │ │ ├── db.py
│ │ │ ├── auth.py
│ │ │ └── logging.py
│ │ └── BUILD
│ └── schemas/
│ ├── src/
│ │ └── schemas/
│ │ ├── user.py
│ │ └── order.py
│ └── BUILD
├── infra/
│ ├── k8s/
│ │ ├── base/
│ │ │ ├── deployment.yaml
│ │ │ └── service.yaml
│ │ └── overlays/
│ │ ├── staging/
│ │ └── prod/
│ └── terraform/
├── pants.toml
└── BUILD
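A minimal `pants.toml` for this layout might look like the sketch below (assuming Pants 2.x; the version pin, backends, and source roots are illustrative). Each `BUILD` file then declares targets such as `python_sources()` and `python_tests()`, which Pants uses to build its dependency graph and run only affected tests:

```toml
# pants.toml -- illustrative sketch for a Pants 2.x Python monorepo
[GLOBAL]
pants_version = "2.21.0"            # pin one Pants version for the whole repo
backend_packages = [
  "pants.backend.python",           # Python sources, tests, packaging
  "pants.backend.docker",           # Dockerfile targets for the services
]

[python]
interpreter_constraints = ["==3.11.*"]

[source]
root_patterns = ["/services/*/src", "/libs/*/src"]
```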
Structure 4: Microservices Polyrepo (Netflix / Amazon Style)
Each service is its own repo. Shared code lives in versioned packages published to an internal registry.
# Each service is an independent repository:
order-service/
├── src/
│ ├── controllers/
│ ├── services/
│ ├── repositories/
│ ├── entities/
│ └── main.ts
├── test/
│ ├── unit/
│ └── e2e/
├── .github/workflows/
├── Dockerfile
├── docker-compose.yml # Local dev only
├── k8s/
│ ├── deployment.yaml
│ └── service.yaml
└── package.json
# Shared code lives in separately versioned packages:
@company/shared-types # Versioned npm package
@company/shared-middleware # Versioned npm package
@company/proto-definitions # protobuf / gRPC contracts
The critical thing to understand here: this works well when you have a dedicated platform team maintaining those shared packages. Without that investment, updating `@company/shared-types` requires PRs and version bumps across every service repo. At 20+ services, this becomes a coordination tax that never goes away.
Structure 5: Hybrid (Domain Monorepos)
# Org level — multiple repos, each a domain monorepo:
github.com/company/
├── frontend/ # Turborepo: all web/mobile apps + design system
├── backend/ # Nx: all services + shared libs
├── data-platform/ # Pants: pipelines, ML models, notebooks
├── infra/ # Terraform + K8s configs
└── design-system/ # Component library + Storybook
This is the natural landing spot for companies with 75–200 engineers that have grown beyond a single monorepo but haven't committed to the platform investment required for intentional polyrepo.
The Right Choice by Team Size
Here's the framework that the research and real-world patterns support:
1–5 Engineers: Don't Over-Engineer
At this stage you don't have a distribution problem. You have a speed problem. Build fast, ship fast, learn fast.
A Next.js monolith handling both frontend and backend via `/app/api` routes is the correct architecture for 95% of early SaaS products. Stripe, Linear, and Vercel all started as monoliths.
my-startup/
└── web/ # Single Next.js app, /app/api handles backend
Add a `packages/` folder only when you have a mobile app that needs to share components or types. Don't structure for the team you'll have in three years. Structure for the team you have today.
Tooling: plain pnpm workspaces, no Turborepo yet.
CI: one GitHub Actions file, ~2 min builds.
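For reference, the entire workspace config at this stage can be a few lines. A hypothetical `pnpm-workspace.yaml` (the `web` directory name matches the tree above; the commented-out line is what you'd add later):

```yaml
# pnpm-workspace.yaml -- plain pnpm workspaces, no task runner yet
packages:
  - "web"
  # - "packages/*"   # add only when something actually needs to be shared
```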
5–20 Engineers: Monorepo, Add Tooling Now
This is where code sharing starts to matter and where polyrepo starts to create friction you'll feel every week.
At this size, you're probably running a separate backend service, maybe a mobile app, definitely a shared component library. TypeScript types shared via `@myapp/types` eliminate an entire class of API contract bugs that are invisible until production.
Turborepo with remote caching is the right call here. The setup cost is a few days. The CI time savings are immediate and compound.
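The shared-types idea can be sketched as follows. The package path, interface, and guard are hypothetical; the point is that one contract is imported by both the API (to type its responses) and the web app (to type its fetches):

```typescript
// Hypothetical packages/types/src/api.ts -- one contract for both sides.
export interface UserResponse {
  id: string;
  email: string;
  createdAt: string; // ISO-8601 string; Date objects don't survive JSON
}

// A narrow runtime guard for the trust boundary (data arriving off the wire).
export function isUserResponse(value: unknown): value is UserResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.email === "string" &&
    typeof v.createdAt === "string"
  );
}
```

If the backend renames `email` to `emailAddress`, every consumer fails to compile in the same PR, instead of failing at runtime in production weeks later.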
Teams at this size that go polyrepo typically spend the next 18 months discovering why they shouldn't have.
According to 2023 community data, approximately 68% of teams under 20 engineers reported monorepos as their primary structure. Polyrepo adoption sat below 15% at this size.
Build times with Turborepo remote cache: 5–8 minutes on affected-only builds.
20–75 Engineers: Monorepo with Enforced Boundaries
A monorepo is still the right call here, but it can no longer be unstructured. Without module boundary enforcement, a monorepo at this scale becomes a big ball of mud within 12–18 months. Any engineer can import anything from anywhere. Circular dependencies form. Shared packages become implicit global singletons.
The solution is Nx module boundaries (or ESLint import restrictions as a lighter alternative) combined with CODEOWNERS files that assign explicit team ownership to every package.
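As a sketch, the Nx flavor of this is the `@nx/enforce-module-boundaries` ESLint rule configured with `depConstraints`. The tags mirror the ones from Structure 2; the exact shape may vary by Nx version:

```json
{
  "overrides": [
    {
      "files": ["*.ts", "*.tsx"],
      "rules": {
        "@nx/enforce-module-boundaries": [
          "error",
          {
            "depConstraints": [
              {
                "sourceTag": "type:feature",
                "onlyDependOnLibsWithTags": ["type:data-access", "type:ui", "type:util"]
              },
              {
                "sourceTag": "type:ui",
                "onlyDependOnLibsWithTags": ["type:ui", "type:util"]
              },
              {
                "sourceTag": "type:util",
                "onlyDependOnLibsWithTags": ["type:util"]
              }
            ]
          }
        ]
      }
    }
  ]
}
```

An import from a `type:ui` lib into a `type:feature` lib now fails lint in CI, which is what turns the architecture diagram from a wish into a constraint.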
The other critical addition at this stage: a dedicated platform engineer. Monorepo tooling doesn't maintain itself. Someone needs to own it.
Key additions at this stage:
- CODEOWNERS on every package (non-negotiable)
- Tag-based module boundary rules
- Remote caching with 80%+ cache hit rate as a target
- Per-team CI pipelines triggered only on changed domains
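One way to wire the last item — per-team CI triggered only on changed domains — is a GitHub Actions `paths` filter per domain workflow. This is a hedged sketch against the Turborepo layout from Structure 1; the workflow name, package paths, and filter target are illustrative:

```yaml
# .github/workflows/backend-ci.yml -- hypothetical per-domain pipeline:
# runs only when backend code, or a shared package it depends on, changes.
name: backend-ci
on:
  pull_request:
    paths:
      - "apps/api/**"
      - "packages/db/**"
      - "packages/types/**"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm turbo run test --filter=api
```

Frontend-only PRs never queue behind backend tests, which directly blunts the "noisy neighbor" problem described earlier.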
75–200 Engineers: Hybrid or Heavily Structured Monorepo
Teams are now semi-autonomous. Different domains have genuinely different deployment cadences, different tooling preferences, different release rhythms.
Two paths work here. Some companies invest in the tooling to maintain a single monorepo at this scale — sparse checkout, domain-scoped CI, a small internal platform team. Others move to hybrid: one monorepo per domain.
The wrong move at this stage is an unplanned migration under time pressure. Several companies have attempted a mid-growth migration from polyrepo to monorepo and abandoned it halfway through, ending up with a hybrid they didn't design and can't maintain well.
200+ Engineers: Committed Investment Either Way
At this scale, both monorepo and polyrepo work — but only if you've invested accordingly.
Committed monorepo (Google, Meta, Stripe model): Requires Bazel or a custom build system, VFS or sparse checkout, and a dedicated build/platform team of 10–20 engineers. Returns: atomic changes across millions of lines, unified standards, no dependency coordination.
Intentional polyrepo (Netflix, Amazon model): Requires a service mesh (Envoy, Istio), an internal package registry (Artifactory), a developer portal (Backstage), and Renovate for automated dependency updates. Returns: total team autonomy, clean service isolation, independent deployment.
The key phrase is intentional polyrepo. Netflix didn't end up with hundreds of repos by accident. They built an entire platform engineering organization around making that model work. Arriving at polyrepo without that investment is just organizational chaos with better naming.
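On the dependency-update side of that platform investment, a minimal `renovate.json` might group and auto-merge internal package bumps so the coordination tax is at least automated. The `@company/` scope is the hypothetical internal registry scope from Structure 4:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "packageRules": [
    {
      "matchPackagePatterns": ["^@company/"],
      "groupName": "internal packages",
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    }
  ]
}
```

This doesn't solve the root coordination problem, as noted earlier, but it turns N manual version-bump PRs into one reviewed (or auto-merged) PR per repo.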
The Biggest Mistakes by Stage
These are the patterns that repeat, consistently, across engineering organizations at each size:
1–5 engineers: Building microservices on day one. This appears frequently in founder retrospectives as a regret. The overhead exceeds the benefit by years.
5–20 engineers: Starting polyrepo because it "feels like what real companies do," then spending 30% of engineering time on dependency management within 18 months.
20–75 engineers: Running a monorepo without module boundaries. The codebase becomes unmaintainable faster than expected. By the time you notice, the circular dependencies are everywhere and the refactor takes quarters, not weeks.
75–200 engineers: Not hiring a platform engineer early enough. The monorepo tooling degrades. Cache hit rates fall. CI times balloon. Teams lose trust in the build system and start working around it.
200+ engineers: Attempting to migrate from polyrepo to monorepo without a dedicated migration team and a 6+ month runway. Several organizations have failed this migration and abandoned it midway.
The Tooling Landscape at a Glance
| Tool | Best For | Standout Feature |
|---|---|---|
| Turborepo | JS/TS, startups to mid-scale | Remote caching, best DX |
| Nx | JS/TS, enterprise | Module boundary enforcement, plugin ecosystem |
| Bazel | Polyglot, Google-scale | Hermetic builds, extreme performance |
| Pants | Python/Go/Java, data teams | Simpler than Bazel, great for ML orgs |
| Moon | Newer polyglot option | Fast, modern configuration |
| Renovate | Polyrepo dependency updates | Automated PRs across repos |
| Backstage | Polyrepo discoverability | Developer portal, service catalog |
A Practical Decision Framework
Are you JS/TS only?
└── Yes → Turborepo
└── Need enterprise module boundaries? → Add Nx on top
Are you polyglot (Python, Go, Java)?
└── Small to mid scale → Pants
└── Google-scale → Bazel
Are you a 2–5 person team?
└── Single Next.js app with /api routes
└── Extract services only when you feel the pain
Do teams own completely separate products?
└── Hybrid: one monorepo per domain
Are you migrating from polyrepo?
└── Consolidate domain by domain, not all at once
The Bottom Line
Codebase structure is not a one-time architecture decision. It's an evolving strategy that should be revisited as your team grows, your tooling matures, and your organizational boundaries solidify.
The most expensive mistake is making the decision based on what looks impressive in a job listing rather than what fits your current stage. Microservices at five engineers. Monorepo at five hundred engineers without platform investment. These are the failure modes, repeated constantly across the industry.
Start simple. Add structure when you feel the pain. Invest in tooling before the tooling becomes the bottleneck. And document your boundaries so the next engineer who joins understands not just what the structure is, but why you chose it.
At Oniyore, we help high-growth companies design, build, and scale engineering systems that actually match their stage — from early-stage architecture decisions to complex infrastructure for large teams. If your codebase structure is becoming a bottleneck, or you're planning a migration you want to get right, let's talk.
Oniyore builds custom web applications, AI-powered systems, and business growth infrastructure for companies that can't afford to get it wrong. We specialize in the kind of complex, high-stakes technical work that most agencies won't touch.