Combining repositories

It is sometimes necessary (and apparently not particularly rare) to combine two repositories into a single one (I’ll avoid the term “merge” for this), so that the original two can be deleted and all work can continue in the new combined repository.

This is discussed, for example, in this blog post.

You’ll see a question I just asked on that blog and I’m repeating that question here:


This is an interesting post and I have a question: does your solution cover the most general case? Here’s what I mean.

I have repo_a and repo_b.

  • repo_a has branches master, gui_project and api_project
  • repo_b has branches master, mem_project and api_project

In this scenario api_project has the same name in each repo because it is the same project, some code in repo_a and some in repo_b.

I want to combine these and end up with repo_c that has 4 branches:

master, gui_project, mem_project and api_project.

where:

  • master contains all commits from master in repo_a and repo_b
  • api_project contains all commits from api_project in repo_a and repo_b
  • mem_project contains all commits from repo_b
  • gui_project contains all commits from repo_a

The repos are completely different: their folder structures are unrelated, so no folder, path or file hierarchy appears in more than one repo. That is, no code or file exists in both repos.

This is my challenge: repos with a lot of branches, some unique to each repo and some with the same name in both because the work on them pertains to the same project.


Now I know that GitHub doesn’t provide such a capability (it would be great if it did, as this need arises from time to time as projects or organizations grow and mature), but has anyone done this in the general case that I outline above?

In reality there are quite a lot more branches than just four, but this should be academic if we have a general solution.

I want to attempt this, but I also want to gather as much input as I can from others and git gurus before I actually begin. I’d hate to spend an hour or two and then hit some roadblock simply because of a lack of due diligence.

Thanks


:rotating_light:  WARNING: At no point in what I describe below did I need to use git push --force. If you find yourself with the urge to use git push --force, then you are at risk of losing data and you should back out and start over. git push --force is a potentially destructive command; use it at your own risk. :rotating_light:

Let me start by saying that you should protect yourself from data loss by using scratch repositories to perform your experiments. A scratch repository is one that is disconnected from all other repos so that you can’t possibly mess up anything but the scratch repo, but the various data sources that you’re using to build the scratch one are guaranteed intact. One of the nice things about git though is that it makes this pretty easy. If at any time you mess something up, just delete the entire local repository and start over.

Here’s what I did. You can see the results in https://github.com/lee-dohm/merged-repo, which is a combination of https://github.com/lee-dohm/test-repo and https://github.com/lee-dohm/test-repo-2. I took your requirements:

  • n repositories
  • No duplicate paths between the repositories on any combination of branches
  • At least one branch name common between m repositories
  • At least one branch that is unique among all repositories

To satisfy these requirements, I created the following scenario:

  • From lee-dohm/test-repo, I would take branches master and lee-dohm-patch-4
  • From lee-dohm/test-repo-2, I would take branches master and branch-that-does-not-exist-in-test-repo
  • The master branches from each would need to be merged
  • The unique branches would each need to contain all contents from the merged master and the unique contents from their source repo

First, I created a scratch repo:

mkdir merged-repo
cd merged-repo
git init

Then, I hooked up the remotes:

git remote add test-repo https://github.com/lee-dohm/test-repo.git
git remote add other-repo https://github.com/lee-dohm/test-repo-2.git

Then, I needed to bring in all the source refs from the two repositories:

git fetch test-repo master
git fetch other-repo master
git fetch test-repo lee-dohm-patch-4
git fetch other-repo branch-that-does-not-exist-in-test-repo
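When there are many more branches than this, it may help to work out up front which branch names are shared between the sources and which are unique. The following is only a sketch under stated assumptions: it builds two throwaway local repositories (repo_a and repo_b, with the branch names from the question) in a temporary directory instead of contacting GitHub, and the helper functions new_repo and branches are made up for illustration.

```shell
#!/bin/sh
set -eu
export LC_ALL=C   # stable sort order, which comm relies on

# Placeholder identity for the throwaway commits below.
export GIT_AUTHOR_NAME="Scratch" GIT_AUTHOR_EMAIL="scratch@example.invalid"
export GIT_COMMITTER_NAME="Scratch" GIT_COMMITTER_EMAIL="scratch@example.invalid"

work=$(mktemp -d)

# new_repo DIR BRANCH...: one empty commit on master, plus the named branches.
new_repo() {
    dir=$1; shift
    git init -q "$dir"
    git -C "$dir" symbolic-ref HEAD refs/heads/master
    git -C "$dir" commit -q --allow-empty -m "Initial commit"
    for b in "$@"; do git -C "$dir" branch "$b"; done
}

new_repo "$work/repo_a" gui_project api_project
new_repo "$work/repo_b" mem_project api_project

# branches REPO: print the branch names REPO advertises, one per line, sorted.
branches() {
    git ls-remote --heads "$1" | sed 's|.*refs/heads/||' | sort
}

branches "$work/repo_a" > "$work/a.branches"
branches "$work/repo_b" > "$work/b.branches"

echo "shared:";         comm -12 "$work/a.branches" "$work/b.branches"
echo "only in repo_a:"; comm -23 "$work/a.branches" "$work/b.branches"
echo "only in repo_b:"; comm -13 "$work/a.branches" "$work/b.branches"
```

With real repositories, the two local paths passed to branches would be replaced by the remote names or URLs. Running git fetch with just a remote name and no branch fetches every branch from that remote in one step.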

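At this point, you can protect yourself from altering the source repos by disabling pushes to them. Don’t remove the remotes yet: git remote remove also deletes the remote-tracking refs (test-repo/master and so on) that the steps below rely on. Here no_push is just a deliberately invalid placeholder URL:

git remote set-url --push test-repo no_push
git remote set-url --push other-repo no_push

Once the local branches have been created, the remotes can be removed with git remote remove test-repo and git remote remove other-repo.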

Here’s how I merged the history of the master branches:

git merge test-repo/master
git merge --allow-unrelated-histories other-repo/master

The key here, of course, being the --allow-unrelated-histories parameter to git merge. Now that the two histories have been joined, we can work on the other branches (whether their names are shared or unique).
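To try out --allow-unrelated-histories without touching any real repository, here is a minimal self-contained sketch. It assumes two throwaway local repositories in a temporary directory standing in for the GitHub sources; the names repo_a, repo_b, a.txt, b.txt, the remotes a and b, and the helper new_repo are all hypothetical.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the throwaway commits below.
export GIT_AUTHOR_NAME="Scratch" GIT_AUTHOR_EMAIL="scratch@example.invalid"
export GIT_COMMITTER_NAME="Scratch" GIT_COMMITTER_EMAIL="scratch@example.invalid"

work=$(mktemp -d)

# new_repo DIR FILE: create a repo at DIR whose master has one commit adding FILE.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/repo_a" a.txt
new_repo "$work/repo_b" b.txt

# The scratch repo that will hold the combined history.
git init -q "$work/combined"
cd "$work/combined"
git symbolic-ref HEAD refs/heads/master
git remote add a "$work/repo_a"
git remote add b "$work/repo_b"
git fetch -q a
git fetch -q b

# The first merge simply points the still-empty master at repo_a's history.
git merge -q a/master
# The second merge joins repo_b's history, which shares no ancestor with ours.
git merge -q --allow-unrelated-histories -m "Merge master from repo_b" b/master

git log --oneline --graph
```

Without the flag, Git 2.9 and later refuse the second merge because the two histories have no common ancestor. Since the sources share no paths, the merge should not produce conflicts.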

git checkout -b lee-dohm-patch-4 test-repo/lee-dohm-patch-4
git rebase master
git checkout -b branch-that-does-not-exist-in-test-repo other-repo/branch-that-does-not-exist-in-test-repo
git rebase master

Rebasing each of the unique branches onto the local master means that they are now based on the shared history from both repositories.
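The commands above cover branches that exist in only one source. For a branch name that exists in both (api_project in the question), one possible approach is to start the local branch from one source and merge the other source's branch into it. This is a sketch of that idea, not something taken from the steps above; the local repositories, file names, remote names a and b, and the new_repo helper are hypothetical stand-ins.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the throwaway commits below.
export GIT_AUTHOR_NAME="Scratch" GIT_AUTHOR_EMAIL="scratch@example.invalid"
export GIT_COMMITTER_NAME="Scratch" GIT_COMMITTER_EMAIL="scratch@example.invalid"

work=$(mktemp -d)

# new_repo DIR NAME: master gets NAME.txt; api_project adds NAME-api.txt on top.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2.txt"
    git -C "$1" add .
    git -C "$1" commit -q -m "$2: master work"
    git -C "$1" checkout -q -b api_project
    echo "$2 api" > "$1/$2-api.txt"
    git -C "$1" add .
    git -C "$1" commit -q -m "$2: api_project work"
}

new_repo "$work/repo_a" a
new_repo "$work/repo_b" b

git init -q "$work/repo_c"
cd "$work/repo_c"
git symbolic-ref HEAD refs/heads/master
git remote add a "$work/repo_a"
git remote add b "$work/repo_b"
git fetch -q a
git fetch -q b

# Join the two master histories first, as in the steps above.
git merge -q a/master
git merge -q --allow-unrelated-histories -m "Join master histories" b/master

# Shared branch name: start from one source, then merge the other source in.
# The two api_project branches have no common ancestor, so the flag is needed.
git checkout -q -b api_project a/api_project
git merge -q --allow-unrelated-histories -m "Join api_project histories" b/api_project

git log --oneline --graph api_project
```

The resulting api_project contains every commit reachable from api_project in either source. If it should also sit on top of the joined master, a follow-up git merge master on that branch would be one way to get there.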

Then I created a new GitHub repository to host the merged results and pushed everything up:

git remote add origin https://github.com/lee-dohm/merged-repo.git
git checkout master
git push origin master
git checkout lee-dohm-patch-4
git push origin lee-dohm-patch-4
git checkout branch-that-does-not-exist-in-test-repo
git push origin branch-that-does-not-exist-in-test-repo
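With many branches, pushing them one at a time gets tedious. As a hedged alternative, git push accepts --all to push every local branch, and a branch does not need to be checked out to be pushed. The sketch below assumes a throwaway bare repository in a temporary directory standing in for the new GitHub repository; the branch names feature-a and feature-b are made up.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the throwaway commit below.
export GIT_AUTHOR_NAME="Scratch" GIT_AUTHOR_EMAIL="scratch@example.invalid"
export GIT_COMMITTER_NAME="Scratch" GIT_COMMITTER_EMAIL="scratch@example.invalid"

work=$(mktemp -d)

# A local bare repository plays the role of the hosted destination.
git init -q --bare "$work/origin.git"

# A stand-in for the finished scratch repo, with three local branches.
git init -q "$work/merged"
cd "$work/merged"
git symbolic-ref HEAD refs/heads/master
echo one > one.txt
git add one.txt
git commit -q -m "master work"
git branch feature-a
git branch feature-b

git remote add origin "$work/origin.git"

# Push every local branch in one command; no checkout is required.
git push -q origin --all

# List the branches that now exist on the destination.
git ls-remote --heads origin
```

Note that --all pushes every local branch, so any temporary branches in the scratch repo should be deleted first if they are not meant to be published.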

I hope that helps!


Lee, I greatly appreciate this. I will run a similar experiment following your directions, and if that works out I’ll then try it with the two real repos.

Many thanks.

