Mono-Repo Migration: Boosting Developer Productivity While Keeping Git History Intact

Large software projects require modular codebases to keep the application scalable and maintainable. Modularity, in turn, allows us to handle a growing number of customer requests and deliver features on tight schedules.

While modularization is a powerful tool, it can also hurt developer productivity. This is the case when the application is split into packages that are managed in separate repositories. When a feature requires changes in the main project as well as in other in-house dependencies, the overhead of managing multiple pull requests becomes significant.

In this article, we discuss how to overcome these issues by transitioning to a Mono-Repo and present a Python script that automates the migration.

Advantages of a Mono-Repo for Projects with Multiple In-House Dependencies

When changes often affect the project as well as its in-house dependencies, a Mono-Repo may be preferred. Developers can include changes in a single pull request and gain a better understanding of the changes required among packages, hence leading to better cross-team awareness.

As an example, consider a change that affects the main project as well as two of its dependencies. With a Multi-Repo, three pull requests are necessary to apply the change, whereas the Mono-Repo requires only a single pull request.

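Even though Git operations may become slower as the repository grows, a Mono-Repo pays off in particular when additional constraints such as stale approval dismissal are in place. If approvals are dismissed whenever new commits are pushed, every follow-up change has to be re-approved in each affected pull request.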

Since a Mono-Repo migration has a significant impact on the entire codebase, the transition needs to be well thought through and, ideally, automated using a script. A script lets us run the migration locally without affecting any remote repository. In addition, all changes are reviewed via pull requests before they are integrated into the main and develop branches.

Let's briefly discuss the overall concept of how to approach a Mono-Repo migration.

Concept

For demonstration purposes, we consider a Main repository, which depends on two in-house repositories: Dependency 1 and Dependency 2. All repositories follow Git Flow, using main and develop as their primary branches. Feature branches are created from develop and merged back into it upon completion.

The goal is to consolidate the Git history of all three repositories into the Main repository. This is made possible by Git’s ability to merge branches with unrelated histories. By adding each dependency as a remote, we can directly merge their key branches (such as main and develop) into the corresponding branches of the Main repository.
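
To see this mechanism in isolation, the following minimal sketch creates two throwaway repositories in a temporary directory and merges one into the other. The helper functions git, make_repo and merge_unrelated_demo as well as the file names are assumptions made for illustration and are not part of the migration script. A local Git installation that supports --allow-unrelated-histories is assumed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Identity passed on the command line so the sketch does not depend on a global git config.
GIT = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com"]

def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its trimmed standard output."""
    result = subprocess.run([*GIT, *args], cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()

def make_repo(root: Path, name: str, file_name: str) -> Path:
    """Create a repository with a single commit (hypothetical helper)."""
    repo = root / name
    repo.mkdir()
    git(repo, "init", "--quiet")
    (repo / file_name).write_text(f"{name}\n")
    git(repo, "add", "-A")
    git(repo, "commit", "--quiet", "-m", f"Initial commit of {name}")
    return repo

def merge_unrelated_demo() -> list[str]:
    """Merge Dependency1 into Main and return the commit subjects visible in Main."""
    if shutil.which("git") is None:
        raise RuntimeError("This sketch requires a local git installation.")
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        main = make_repo(root, "Main", "main.txt")
        dependency = make_repo(root, "Dependency1", "dependency.txt")
        # The default branch name depends on the local git configuration.
        branch = git(dependency, "rev-parse", "--abbrev-ref", "HEAD")
        git(main, "remote", "add", "Dependency1", str(dependency))
        git(main, "fetch", "--quiet", "Dependency1")
        git(main, "merge", "--quiet", "--no-ff", "--no-edit",
            "--allow-unrelated-histories", f"Dependency1/{branch}")
        return git(main, "log", "--format=%s").splitlines()
```

Calling merge_unrelated_demo() should return the merge commit together with the initial commits of both repositories, which shows that the dependency's history has become part of Main.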

Before merging, a preparation step is required to minimize potential conflicts. This involves:

  • Removing redundant or duplicate files common across repositories (e.g., .gitignore), keeping only those from the Main repository.
  • Restructuring the folder layout of each dependency to prevent file path collisions.
  • Migrating and renaming existing tags from the dependencies by prefixing them with the respective project names to retain traceability.
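
The tag prefix matters because several repositories may use the same tag name, for example 1.0.0. As a rough sketch of the renaming rule, the following functions build the Git commands for one dependency. The function names prefixed_tag and tag_rename_commands are hypothetical and only illustrate the idea; the migration script shown later uses a shell loop instead.

```python
def prefixed_tag(project: str, tag: str) -> str:
    """Return the tag name used after the migration, e.g. '1.0.0' -> 'Dependency1/1.0.0'."""
    return f"{project}/{tag}"

def tag_rename_commands(project: str, tags: list[str]) -> list[str]:
    """Build the git commands that rename every tag of a dependency."""
    commands = []
    for tag in tags:
        new_tag = prefixed_tag(project, tag)
        commands.append(f"git tag {new_tag} {tag}")  # create the prefixed tag for the same object
        commands.append(f"git tag -d {tag}")         # delete the original tag
    return commands

print(tag_rename_commands("Dependency1", ["1.0.0"]))
# ['git tag Dependency1/1.0.0 1.0.0', 'git tag -d 1.0.0']
```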

Once the dependencies are prepared, we merge each dependency's main branch into its develop branch. This avoids merge conflicts later on: otherwise, conflicts would arise after the integration as soon as we try to merge main into develop in the Main repository.

Next, the preparation branches are merged into dedicated integration branches, from which pull requests are created to review the changes introduced during the mono-repo transition. Maintaining the full Git history of each dependency is essential for traceability. This is accomplished by using the --allow-unrelated-histories flag during the merge process. The migration concludes with the final merge of these pull requests into the Main repository.

After outlining the migration concept, let's introduce the Python script designed to assist with the migration.

Migration Script

The migration script is crucial to ensure that all necessary steps are performed in the right order. While manual migration might be feasible for smaller projects, it becomes too error-prone and impractical for larger codebases.

Consider the following repositories to demonstrate how the migration script is built:

  • Repositories:
    • MonoRepoMigration_Main
    • MonoRepoMigration_Dependency1
    • MonoRepoMigration_Dependency2

We consider the Main repository as well as its in-house dependencies Dependency1 and Dependency2. When the migration is done, we expect both dependencies to be integrated into the Main repository according to the following structure:

  • Structure:
    • ./...
    • ./Sources/Dependency1/...
    • ./Sources/Dependency2/...

Since the source code of Dependency1 is located in its root directory, the folder structure needs to be adjusted such that all top-level files and directories are moved into ./Sources/Dependency1. We proceed in the same way with Dependency2. This way, we avoid merge conflicts when integrating the dependencies into the Main repository.
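
The restructuring can be sketched on a plain directory as follows. This is a simplified illustration that uses shutil instead of git mv; the names move_into_sources and demo are hypothetical. Everything is first moved into a uniquely named staging directory, so that a dependency which already contains a top-level Sources folder does not collide with the new one.

```python
import shutil
import tempfile
import uuid
from pathlib import Path

def move_into_sources(repo_root: Path, package_name: str,
                      excluded: frozenset = frozenset({".git", ".gitignore"})) -> Path:
    """Move all top-level entries of `repo_root` into Sources/<package_name>."""
    staging = repo_root / str(uuid.uuid4())  # unique name, cannot clash with existing entries
    staging.mkdir()
    for entry in list(repo_root.iterdir()):
        if entry.name in excluded or entry == staging:
            continue
        shutil.move(str(entry), str(staging / entry.name))
    target = repo_root / "Sources" / package_name
    target.parent.mkdir()  # safe: an existing Sources folder was moved into staging
    shutil.move(str(staging), str(target))
    return target

def demo() -> list[str]:
    """Restructure a throwaway directory and return the resulting relative file paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / ".git").mkdir()
        (root / "Sources").mkdir()
        (root / "Sources" / "Feature.swift").write_text("// code\n")
        (root / "Package.swift").write_text("// manifest\n")
        move_into_sources(root, "Dependency1")
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

print(demo())
# ['Sources/Dependency1/Package.swift', 'Sources/Dependency1/Sources/Feature.swift']
```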

Steps

Let's begin by defining an abstract Python class to represent a step within the migration:

from abc import ABC, abstractmethod

class MigrationStep(ABC):
    def __init__(self, title: str):
        self.title = title

    @abstractmethod
    def execute(self) -> None:
        """Execute migration step."""
        pass

Any MigrationStep consists of a title as well as the action to perform. Using this abstract class, we can structure the migration into discrete steps.
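
As a small illustration of how such a step is used, the sketch below defines a hypothetical RecordingStep that only records its title when executed. The abstract class is repeated to keep the example self-contained; RecordingStep is not part of the migration script.

```python
from abc import ABC, abstractmethod

class MigrationStep(ABC):
    """Same interface as the abstract class above, repeated to keep the sketch self-contained."""
    def __init__(self, title: str):
        self.title = title

    @abstractmethod
    def execute(self) -> None:
        """Execute migration step."""

class RecordingStep(MigrationStep):
    """Hypothetical step that only records that it ran."""
    def __init__(self, title: str, log: list[str]):
        super().__init__(title)
        self.log = log

    def execute(self) -> None:
        self.log.append(self.title)

log: list[str] = []
steps: list[MigrationStep] = [RecordingStep("First", log), RecordingStep("Second", log)]
for step in steps:
    step.execute()

print(log)  # ['First', 'Second']

try:
    MigrationStep("Abstract")  # abstract classes cannot be instantiated directly
except TypeError as error:
    print(f"TypeError: {error}")
```

Because every step exposes the same execute method, the migration can later iterate over a list of steps without knowing what each one does.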

Cleanup

First, the CleanupStep ensures that all artifacts from previous migrations are removed before the migration starts. Otherwise, artifacts of a previous run may interfere with the current run. To ensure a clean context, it suffices to remove all repository directories.

from config import Repository
from utils import remove_directory
from steps.migration_step import MigrationStep

class CleanupStep(MigrationStep):
    def __init__(self, title: str, repositories: list[Repository]):
        self.repositories = repositories

        super().__init__(title)

    def execute(self) -> None:
        for repo in self.repositories:
            remove_directory(repo.path)
            print(f'Removed {repo.name} repository.')

Clone Repositories

Next, the CloneRepositoriesStep clones all repositories passed to its initializer. In this case, the Main, Dependency1 and Dependency2 repositories are cloned into the directory from which migration_script.py is run.

from typing import List
from config import REPOSITORIES, Repository
from steps.migration_step import MigrationStep
from utils import git_clone, process_items_with_breaks

class CloneRepositoriesStep(MigrationStep):
    def __init__(self, title: str, repositories: List[Repository] = REPOSITORIES):
        super().__init__(title)
        self.repositories = repositories

    def execute(self) -> None:
        """Clone all repositories."""
        process_items_with_breaks(self.repositories, git_clone)

Determine Submodule References

Before preparing the packages, we need to determine the commit hashes of the submodules that the main repository references on each branch. If we simply took the latest commit of each dependency instead, its source code might not match the state the main repository expects, leading to compilation issues.

from config import MONO_REPO, PACKAGES, BRANCHES
from utils import process_items_with_breaks, run_in_repo
from steps.migration_step import MigrationStep
from typing import Dict, Any, Callable

class DetermineSubmoduleReferencesStep(MigrationStep):
    def __init__(self, title: str, completion: Callable[[Dict[str, Dict[str, str]]], None]):
        self.completion = completion
        super().__init__(title)

    def execute(self):
        references: Dict[str, Dict[str, str]] = {
            'develop': {},
            'main': {}
        }

        def process_branch(branch: str):
            run_in_repo(MONO_REPO, f'git checkout {branch}')
            for package in PACKAGES:
                commit_hash = run_in_repo(
                    MONO_REPO,
                    f'git ls-tree {branch} {package.submodulePath} | grep commit | awk \'{{print $3}}\'',
                    print_output=False
                )
                references[branch][package.name] = commit_hash
                print(f'[{branch}] Found {package.name} Git-Submodule Reference: {commit_hash}')

        process_items_with_breaks(BRANCHES, process_branch)
        self.completion(references)

Given a branch and the submodule's path, git ls-tree prints the tree entry of the submodule, from which grep and awk extract the referenced commit hash. All hashes are stored in the references dictionary so that the preparation step can access them later.
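
For readers who prefer to avoid the grep and awk pipeline, the same extraction can be sketched in plain Python. The function parse_submodule_commit is a hypothetical helper, and the commit hash in the sample line is made up. The sketch relies on the default git ls-tree output format, where mode, object type and object name are separated by spaces and followed by a tab and the path.

```python
from typing import Optional

def parse_submodule_commit(ls_tree_output: str) -> Optional[str]:
    """Return the commit hash of the first submodule entry, or None if there is none."""
    for line in ls_tree_output.splitlines():
        metadata, _, _path = line.partition("\t")
        fields = metadata.split()
        # Submodule entries have the object type "commit" (mode 160000).
        if len(fields) == 3 and fields[1] == "commit":
            return fields[2]
    return None

# Made-up sample line in the format printed by `git ls-tree <branch> <path>`.
sample = "160000 commit 3f2a9c1d5e7b4a6f8091a2b3c4d5e6f708192a3b\tSources/Dependency1"
print(parse_submodule_commit(sample))
# 3f2a9c1d5e7b4a6f8091a2b3c4d5e6f708192a3b
```

Returning None for a missing entry allows the caller to decide how to handle a branch that does not reference the submodule.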

Prepare Packages

The PreparePackagesStep is responsible for restructuring the dependencies to ensure seamless integration into the Main repository. This involves relocating all files from the root directory into a subdirectory to minimize merge conflicts during integration. Furthermore, tags are prefixed with the dependency's name so that they can be easily identified within the Main repository. In addition, redundant files such as SwiftLint and SwiftFormat configurations or .gitignore files are removed, since only the versions from the Main repository are kept. Finally, the prepared main branch is merged into the prepared develop branch, as described in the concept.

import os
import uuid
from config import PACKAGES, BRANCHES, BRANCH_NAMES
from utils import process_items_with_breaks, run_in_repo, git_merge, git_commit_all, remove_directory
from steps.migration_step import MigrationStep
from typing import Dict, Any

class PreparePackagesStep(MigrationStep):
    def __init__(self, title: str, submodule_references: Dict[str, Any]):
        self.submodule_references = submodule_references

        super().__init__(title)

    def _cleanup_package_files(self, package):
        """Remove package-specific configuration files."""
        paths_to_remove = [
            '.swiftformat',
            '.swiftlint.yml',
            '.gitignore'
        ]

        for path in paths_to_remove:
            full_path = os.path.join(package.path, path)
            if os.path.exists(full_path):
                try:
                    remove_directory(full_path)
                    git_commit_all(package, f'Remove {path}')
                    print(f'Removed {path}')
                except Exception as e:
                    print(f'Warning: Could not remove {path}: {e}')
            else:
                print(f'Skipping removal of {path} - file does not exist')

    def execute(self) -> None:
        def prepare_package(package):
            print(f'Preparing {package.name}...')
            run_in_repo(package, 'git fetch --all')
            run_in_repo(package, f'''git tag -l | while read t; do n="{package.name}/$t"; git tag $n $t; git tag -d $t; done''')

            for branch in BRANCHES:
                run_in_repo(package, 'git config advice.detachedHead false')
                commit_hash = self.submodule_references[branch][package.name]
                print(f'Checkout Commit {commit_hash} in {package.name}')
                run_in_repo(package, f'git checkout {commit_hash}')

                preparation_branch = BRANCH_NAMES.preparation(branch)
                run_in_repo(package, f'git checkout -b {preparation_branch}')

                def introduceSourcesSubDirectory():
                    target_dir = str(uuid.uuid4())
                    run_in_repo(package, f'mkdir {target_dir}')
                    excluded = [target_dir, '.git', '.gitignore']
                    for file_name in [file_name for file_name in os.listdir(package.path) if file_name not in excluded]:
                        run_in_repo(package, f'git mv {file_name} {target_dir}/{file_name}')
                    run_in_repo(package, f'git mv {target_dir} {package.name}')
                    run_in_repo(package, f'mkdir Sources')
                    run_in_repo(package, f'git mv {package.name} Sources/{package.name}')

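                # Remove redundant root-level files first. Once everything has been moved
                # into Sources/<package>, they would no longer be found in the root directory.
                self._cleanup_package_files(package)

                introduceSourcesSubDirectory()
                print(f'Introduced Sources/{package.name} Sub-Directory')
                git_commit_all(package, message=f'Introduced subdirectory /Sources/{package.name}')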

            git_merge(package, BRANCH_NAMES.preparation('main'), BRANCH_NAMES.preparation('develop'))

        process_items_with_breaks(PACKAGES, prepare_package)

Add Remotes

Once all packages are prepared, we add each in-house dependency as a remote of the Main repository. This connection makes it possible to merge the prepared main and develop branches of each dependency into their respective counterparts in the Main repository. Fetching from each remote makes its preparation branches available in the Main repository.

from config import MONO_REPO, PACKAGES
from utils import run_in_repo
from steps.migration_step import MigrationStep

class AddPackageRemotesStep(MigrationStep):
    def __init__(self, title: str):
        super().__init__(title)

    def execute(self) -> None:
        for package in PACKAGES:
            run_in_repo(MONO_REPO, f'git remote add {package.name} ../{package.path}')
            run_in_repo(MONO_REPO, f'git fetch {package.name}')
            print(f'Added remote and fetched branches: {package.name}')

Merge Packages

Next, integration branches are created in the Main repository, which are later merged into the repository’s primary branches. These intermediate branches allow us to review all changes before finalizing the migration. Before merging, the existing submodules are removed so that their paths are free for the merged sources. By merging with the --allow-unrelated-histories flag, we preserve each dependency’s commit history.

from config import MONO_REPO, PACKAGES, BRANCHES, BRANCH_NAMES
from utils import process_items_with_breaks, run_in_repo, git_commit_all
from steps.migration_step import MigrationStep

class MergePackagesStep(MigrationStep):
    def __init__(self, title: str):
        super().__init__(title)

    def _remove_submodules(self):
        """Remove all submodules from the mono repository for all branches."""
        print('Removing submodules from mono repository...')

        for package in PACKAGES:
            try:
                submodule_exists = run_in_repo(
                    MONO_REPO,
                    f'git config --file .gitmodules --get submodule.{package.submodulePath}.url',
                    print_output=False
                )

                if not submodule_exists:
                    print(f'Submodule {package.name} does not exist, skipping...')
                    continue

                run_in_repo(MONO_REPO, f'git submodule deinit -f {package.submodulePath}')
                run_in_repo(MONO_REPO, f'rm -rf .git/modules/{package.submodulePath}')
                run_in_repo(MONO_REPO, f'git rm -f {package.submodulePath}')
                git_commit_all(MONO_REPO, f'Remove submodule {package.name}')
                print(f'Removed submodule: {package.name}')
            except Exception as e:
                print(f'Warning: Could not remove submodule {package.name}: {e}')
                continue

    def execute(self) -> None:
        def merge_branch(branch: str):
            print(f'Processing {branch} branch...')
            integration_branch = BRANCH_NAMES.integration(branch)
            run_in_repo(MONO_REPO, f'git checkout {branch}')
            run_in_repo(MONO_REPO, f'git checkout -b {integration_branch}')

            self._remove_submodules()

            for package in PACKAGES:
                run_in_repo(
                    MONO_REPO,
                    f'git merge {package.name}/{BRANCH_NAMES.preparation(branch)} --quiet --no-ff --allow-unrelated-histories'
                )
                print(f'Merged: {package.name}/{BRANCH_NAMES.preparation(branch)} -> {integration_branch}')

        process_items_with_breaks(BRANCHES, merge_branch)

Remove Remotes

Once the dependencies are integrated, their remotes are removed from the Main repository.

from config import MONO_REPO, PACKAGES
from utils import run_in_repo
from steps.migration_step import MigrationStep

class RemovePackageRemotesStep(MigrationStep):
    def __init__(self, title: str):
        super().__init__(title)

    def execute(self) -> None:
        for package in PACKAGES:
            run_in_repo(MONO_REPO, f'git remote remove {package.name}')
            print(f'Removed remote: {package.name}')

Migration

Having outlined all the steps required for a Mono-Repo migration, we can now implement a dedicated MonoRepoMigration class responsible for executing them in sequence. While most steps operate solely on their input, the DetermineSubmoduleReferencesStep takes a completion handler to communicate the discovered submodule references. The handler updates the submodule_references dictionary in place, so the PreparePackagesStep, which receives the same dictionary on construction, sees the references by the time it is executed. It’s important to note that the migration process does not push any changes to the remote repository. Instead, all operations are performed locally. Once everything has been verified and works as expected, the integration branches can be pushed to the remote to create pull requests and finalize the migration.

#!/usr/bin/env python3

from typing import Dict, List
from config import REPOSITORIES, PACKAGES
from utils import withExecutionTimeMeasurement, perform_action
from steps.migration_step import MigrationStep
from steps import (
    CleanupStep,
    CloneRepositoriesStep,
    DetermineSubmoduleReferencesStep,
    PreparePackagesStep,
    AddPackageRemotesStep,
    MergePackagesStep,
    RemovePackageRemotesStep
)

class MonoRepoMigration:
    def __init__(self):
        self.submodule_references: Dict[str, Dict[str, str]] = {}

    def run(self):
        """Execute the complete migration process."""
        def execute_migration():
            # Define steps
            steps: List[MigrationStep] = [
                CleanupStep(
                    title='Pre-Migration Cleanup',
                    repositories=REPOSITORIES
                ),
                CloneRepositoriesStep(
                    title='Cloning Repositories',
                    repositories=REPOSITORIES
                ),
                DetermineSubmoduleReferencesStep(
                    title='Determining Submodule References',
                    completion=lambda value: self.submodule_references.update(value)
                ),
                PreparePackagesStep(
                    title='Preparing Packages',
                    submodule_references=self.submodule_references
                ),
                AddPackageRemotesStep(
                    title='Adding Package Remotes'
                ),
                MergePackagesStep(
                    title='Merging Packages'
                ),
                RemovePackageRemotesStep(
                    title='Removing Package Remotes'
                ),
                CleanupStep(
                    title='Post-Migration Cleanup',
                    repositories=PACKAGES
                )
            ]

            # Perform each step
            for step in steps:
                perform_action(step.title, step.execute)

        withExecutionTimeMeasurement(
            action_name='Migration',
            action=execute_migration
        )

if __name__ == "__main__":
    migration = MonoRepoMigration()
    migration.run()
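
One detail of this script is easy to miss: all steps are constructed before any of them runs, so PreparePackagesStep receives a dictionary that is still empty. The following simplified sketch, with the hypothetical stand-in classes Producer and Consumer and a made-up hash, shows why this still works when the dictionary is updated in place.

```python
from typing import Callable, Dict

class Producer:
    """Stand-in for DetermineSubmoduleReferencesStep (hypothetical, simplified)."""
    def __init__(self, completion: Callable[[Dict[str, str]], None]):
        self.completion = completion

    def execute(self) -> None:
        self.completion({"Dependency1": "abc123"})  # made-up hash

class Consumer:
    """Stand-in for PreparePackagesStep (hypothetical, simplified)."""
    def __init__(self, references: Dict[str, str]):
        self.references = references  # keeps a reference to the dictionary, not a copy

    def execute(self) -> str:
        return self.references["Dependency1"]

shared: Dict[str, str] = {}
producer = Producer(completion=lambda value: shared.update(value))
consumer = Consumer(references=shared)  # the dictionary is still empty here

producer.execute()
print(consumer.execute())  # abc123
```

If the completion handler assigned a new dictionary instead of calling update, the consumer would keep pointing at the old, empty one.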

Below you can find all constants that are used by the migration script. This way, we can define the Main repository as well as its in-house dependencies and the primary branches involved in the migration.

#!/usr/bin/env python3

from dataclasses import dataclass
from typing import Callable

@dataclass
class BranchNames:
    preparation: Callable[[str], str]
    integration: Callable[[str], str]

@dataclass
class Repository:
    name: str
    url: str
    path: str
    submoduleName: str | None
    submodulePath: str | None

# Branch naming configuration
BRANCH_NAMES = BranchNames(
    preparation=lambda branch: f'feature/Prepare-Repository-{branch}',
    integration=lambda branch: f'feature/Integrate-Repository-{branch}'
)

# Repository configurations
MONO_REPO = Repository(
    name='Main',
    url='git@github.com:LinkAndreas/MonoRepoMigration_Main.git',
    path='MonoRepoMigration_Main',
    submoduleName=None,
    submodulePath=None
)

PACKAGES = [
    Repository(
        name='Dependency1',
        url='git@github.com:LinkAndreas/MonoRepoMigration_Dependency1.git',
        path='MonoRepoMigration_Dependency1',
        submoduleName='Dependency1',
        submodulePath='Sources/Dependency1'
    ),
    Repository(
        name='Dependency2',
        url='git@github.com:LinkAndreas/MonoRepoMigration_Dependency2.git',
        path='MonoRepoMigration_Dependency2',
        submoduleName='Dependency2',
        submodulePath='Sources/Dependency2'
    )
]

# Combined repositories list
REPOSITORIES = [MONO_REPO] + PACKAGES

# Branch configurations
BRANCHES = ["develop", "main"]
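
To illustrate how the branch naming configuration is used by the steps, the short sketch below repeats the BranchNames definition from above and prints the preparation and integration branch derived for each primary branch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BranchNames:
    preparation: Callable[[str], str]
    integration: Callable[[str], str]

# Same naming scheme as in the configuration above.
BRANCH_NAMES = BranchNames(
    preparation=lambda branch: f'feature/Prepare-Repository-{branch}',
    integration=lambda branch: f'feature/Integrate-Repository-{branch}'
)

for branch in ["develop", "main"]:
    print(f'{branch}: {BRANCH_NAMES.preparation(branch)} -> {BRANCH_NAMES.integration(branch)}')
# develop: feature/Prepare-Repository-develop -> feature/Integrate-Repository-develop
# main: feature/Prepare-Repository-main -> feature/Integrate-Repository-main
```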

For the sake of completeness, the following table lists all utility functions used throughout the migration:

Method                             | Purpose
run_command(...)                   | Runs a shell command and returns the output.
run_in_repo(...)                   | Runs a command inside a given Git repo.
process_items_with_breaks(...)     | Processes items, with optional line breaks.
remove_directory(...)              | Deletes a directory and its contents.
perform_action(...)                | Logs and runs a titled action.
git_clone(...)                     | Clones a Git repo with submodules.
git_merge(...)                     | Merges one branch into another.
git_commit_all(...)                | Stages and commits all repo changes.
withExecutionTimeMeasurement(...)  | Times and logs how long an action takes.

from datetime import datetime
from subprocess import Popen, PIPE
from typing import List, Callable, Any
from config import Repository

def run_command(command: str, print_output: bool = True) -> str:
    """Execute a shell command and return its output."""
    # `set -E` and `-o pipefail` are bash options, so run the command with bash explicitly
    # instead of relying on /bin/sh, which may be a different shell.
    command = f'set -Eeuo pipefail && {command}'
    process = Popen(command, stdout=PIPE, shell=True, text=True, executable='/bin/bash')
    lines = []

    # Read until EOF. Stopping at the first empty line would truncate output with blank lines.
    for raw_line in process.stdout:
        line = raw_line.rstrip()
        lines.append(line)
        if print_output:
            print(line)

    output = "\n".join(lines)
    process.wait()

    if process.returncode != 0:
        raise Exception(f"Command failed with exit code {process.returncode}: {command}")

    return output

def run_in_repo(repo: Repository, command: str, print_output: bool = True) -> str:
    """Execute a command in the context of a repository."""
    return run_command(f'(cd {repo.path} && {command})', print_output)

def process_items_with_breaks(items: List, action: Callable, show_breaks: bool = True) -> None:
    """Process a list of items with optional line breaks between them."""
    for index, item in enumerate(items):
        action(item)
        if show_breaks and index < len(items) - 1:
            print('')

def remove_directory(path: str) -> None:
    """Safely remove a directory and its contents."""
    run_command(f'rm -rf {path}')

def perform_action(title: str, action: Callable) -> Any:
    """Execute an action with formatted logging."""
    print(f"\n=============== {title} ===============\n")
    result = action()
    print("=" * (len(title) + 32))
    return result

def git_clone(repo: Repository) -> None:
    """Clone a git repository."""
    run_command(f'git clone {repo.url} --progress --recursive')

def git_merge(repo: Repository, source: str, destination: str) -> None:
    """Merge branches in a repository."""
    run_in_repo(repo, f'git checkout {destination}')
    run_in_repo(repo, f'git merge {source} --strategy=ours --quiet --no-ff --no-edit')
    print(f'Merged {source} -> {destination}')

def git_commit_all(repo: Repository, message: str) -> None:
    """Commit all changes in a repository."""
    run_in_repo(repo, f'git add -A')
    run_in_repo(repo, f'git commit --quiet -m "{message}"')

def withExecutionTimeMeasurement(action_name: str, action: Callable) -> Any:
    """Measure and print execution time of an action"""
    start_time = datetime.now()
    print(f'Starting {action_name}...\n')

    result = action()

    duration = (datetime.now() - start_time).total_seconds()
    print(f'\nFinished {action_name} in {duration:.2f} seconds.')
    return result
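
Note that the utilities interpolate paths into shell commands without quoting, which works for the repository names used here but would break for paths containing spaces. As a possible hardening, sketched with the hypothetical helper command_in_directory, the standard library's shlex.quote can be applied to the path.

```python
import shlex

def command_in_directory(path: str, command: str) -> str:
    """Build a shell command that runs `command` inside `path`, quoting the path."""
    return f'(cd {shlex.quote(path)} && {command})'

print(command_in_directory('MonoRepoMigration_Main', 'git status'))
# (cd MonoRepoMigration_Main && git status)
print(command_in_directory('My Repo', 'git status'))
# (cd 'My Repo' && git status)
```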

Before concluding this article, we present a structured approach for executing the mono-repo migration in a coordinated and controlled manner.

Procedure

While the migration script enables local testing, completing the migration requires a coordinated effort among team members.

Below, you’ll find the key steps to follow before, during, and after the migration. Although the exact procedure may vary depending on your specific context, this guide can serve as a blueprint to help you successfully initiate and manage your own migration process.

Pre-migration

  1. Identify a suitable time window for the migration
    • If you’re working in sprints, the beginning of a sprint is often ideal, as there’s typically less delivery pressure compared to the end.
  2. Request all developers to merge their open pull requests by a fixed deadline
    • This ensures a clean state and minimizes merge conflicts during the migration.

Migration

  1. Notify the team that the migration is about to begin.
  2. Temporarily freeze all merges to main and develop during the migration process.
  3. Merge the latest changes from main into develop
    • This ensures that any hotfixes or bugfixes are synchronized with the ongoing development branch.
  4. Run the migration script on a developer machine with access to all required repositories
    • After execution, review the logs to ensure there were no errors.
    • Verify the migration result by checking the branch and commit history in your Git client.
  5. Create pull requests from the integration branches generated by the script.
  6. Have your team review the changes introduced during the migration.
  7. Merge the integration branches into develop and main.
  8. Push any migrated tags to the main repository.
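
Part of the verification in step 4 can be automated. The sketch below checks whether every submodule commit referenced before the migration is reachable from the corresponding integration branch, using git merge-base --is-ancestor. The functions git_is_ancestor and missing_references are hypothetical helpers and not part of the migration script; the references argument is assumed to have the same shape as the dictionary produced by DetermineSubmoduleReferencesStep.

```python
import subprocess
from typing import Callable, Dict, List

def git_is_ancestor(repo_path: str, commit: str, branch: str) -> bool:
    """Return True if `commit` is reachable from `branch` in the repository at `repo_path`."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", commit, branch],
        cwd=repo_path,
        capture_output=True,
    )
    # Exit code 0 means "is an ancestor"; any other code is treated as "not contained".
    return result.returncode == 0

def missing_references(
    references: Dict[str, Dict[str, str]],
    integration_branch: Callable[[str], str],
    is_ancestor: Callable[[str, str], bool],
) -> List[str]:
    """List all package references that are not part of their integration branch."""
    missing = []
    for branch, packages in references.items():
        for package, commit in packages.items():
            if not is_ancestor(commit, integration_branch(branch)):
                missing.append(f"{package}@{branch}")
    return missing
```

In practice, is_ancestor could be supplied as lambda commit, branch: git_is_ancestor('MonoRepoMigration_Main', commit, branch), and an empty result would indicate that all referenced commits were carried over.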

Post-migration

  1. Announce to the team that the migration is complete.
  2. Archive or restrict access to the now-obsolete repositories.
  3. Re-enable merges into main and develop.

Conclusion

In this article, we examined the trade-offs between Mono- and Multi-Repo and introduced a structured approach to a Mono-Repo migration. Consolidating repositories can significantly enhance developer productivity by minimizing cross-repository coordination and reducing the number of required pull requests. The provided migration script automates key steps, enabling repeatable and locally testable migrations without impacting the remote repository. Once validated, changes are submitted through dedicated pull requests, ensuring all modifications remain transparent, reviewable, and easy to track.

Happy Coding 🚀