Project setup question with public and private "branches" #22158

cobohub · 2019-06-30T15:36:00Z

cobohub
Jun 30, 2019

I am completely new to GitHub, although I have been programming for 40 years and using git for about 2 years now. In very general terms I must first admit that I still have great trouble getting the “git logic” and “git language” into my head, much more than with the many programming languages (and also CVS and SVN…) that I have already loaded into my head, so I would be very happy if somebody could explain things in language that is not too “git-ish”!

My question is about proper project setup with git and GitHub - which is related but not identical. So far I have it in a local git repository that is accessible only for myself, on an external computer with backup, so it should not get lost. It is all written in C++, with CMake as the build system and Qt for the GUI parts.

Now I have a “base” version of the software that I want to publish as Open Source, but then I also have an “extended” version that I can only sell for a license fee for some reasons. In order to get the code published I did some effort during the last months to make it two projects: the “base” that is in one git repository, and the “extended” as a separate second project that depends on “base” through some CMake constructs. The advantage is that I can now publish “base” freely while “extended” remains completely hidden. Still - I did not find time yet to really do the publication.

Instead I found a customer for whom I am writing another “custom extension” for the “base” version. So far I simply made it a “branch” of “base”, so I can switch between “master” and “customer”, and if there is something that should be shared I can “merge” between the branches - very handy and helpful!

However: Now the “base” repository is again not in a state that I can publish it - because it contains in a branch the “customer” code that I am not supposed to publish!

In other words: I am back in the same trouble - and it would again be very tedious to split the codes…

Ideally I would even merge the “extended” version back into the main repository and have then three “branches”: public “master”, and private “extended” and “customer”, but this is not possible! As far as I understand I have to either publish the entire repository (and expose also the “private branches”), or I have to do a huge effort to separate the open and closed parts of the code.

Now I was reading about some ways to deal also with such kind of problems with both Github and Gitlab, because you cannot only “branch” but also have some kind of “more separate branch” that still allows to also merge code from the one to the other - first of all from the public “master” to the private “children”.

So finally the question: Is this true - and if yes: What is the name of this technology, and where can I find some instructions to make it happen?

I would very much like to understand the procedure in advance, before I jump into reorganizing my entire code again (always very error prone!).

Many thanks in advance for any helpful answers!!

lee-dohm · 2019-07-03T22:16:45Z

lee-dohm
Jul 3, 2019

Thank you for all the context to help me understand your particular problem better. Let me start off by suggesting @jwiegley’s Git from the Bottom Up e-book as, in my opinion, the best way to “load git into your head”.

To answer your specific question, “is it possible to have private branches?” Yes, it is possible, though not the way you seem to be thinking about them.

You are correct that it is not possible to have a published repository that contains a branch foo where the contents of branch foo are not accessible to people with read access to the repository. However, it is possible to have two instances of a single repository where one instance is public and the other is private. You can publish the branches that need to remain private to the private instance and publish the branches that can be public to the public instance.

With that said, I would not normally recommend this kind of setup, because one would need to be highly disciplined to avoid accidentally publishing private branches to the public repository instance. You could create custom tooling to make it less likely that mistakes would occur, but even if you eliminated the possibility that private branches would be exposed, there would still be the chance that code would be added to the wrong (public) branch and leaked in that way.
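As one illustration of what such custom tooling might look like, here is a minimal sketch of a git pre-push hook that refuses to send anything except master to a remote named public. The remote names, the branch names, and the use of local bare repositories in place of hosted ones are all assumptions made so the sketch can run offline; it is not a complete safeguard.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Local bare repositories stand in for the hosted public and private instances.
git init -q --bare public.git
git init -q --bare private.git

git init -q local
cd local
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add public "$work/public.git"
git remote add private "$work/private.git"

echo 'puts "public"' > script.rb
git add script.rb
git commit -q -m "Add public code"

# Hypothetical hook: only master -> master may be pushed to the remote "public".
mkdir -p .git/hooks
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# git passes the remote name as $1 and, on stdin, one line per ref:
#   <local ref> <local sha> <remote ref> <remote sha>
remote="$1"
[ "$remote" = "public" ] || exit 0
while read -r local_ref local_sha remote_ref remote_sha; do
    if [ "$local_ref" != "refs/heads/master" ] || [ "$remote_ref" != "refs/heads/master" ]; then
        echo "pre-push: refusing to push $local_ref to $remote_ref on '$remote'" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x .git/hooks/pre-push

git checkout -q -b private-branch
echo 'puts "private"' >> script.rb
git commit -q -a -m "Add private code"

git push -q private private-branch        # allowed: the hook ignores other remotes
if git push -q public private-branch 2>/dev/null; then
    echo "unexpected: private branch reached public"
else
    echo "blocked as intended"
fi
git push -q public master                 # allowed: master -> master
```

A hook like this lives only in the local clone, can be skipped with `git push --no-verify`, and cannot tell whether private code was committed to master by mistake, so it reduces the risk described above rather than removing it.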

There’s no name for the feature or technology, because it is a consequence of how distributed version control systems, like git, are designed, specifically the “distributed” part. For the purpose of illustration, here’s how this could be achieved. I’m going to start from an empty repository because it’s easier 😀

mkdir sample-split-repo
cd sample-split-repo
git init
touch README.md
git add --all .
git commit -m "Initial commit"

At this point, we have a basic repository with the one README.md file in it. Let’s add a file for code:

touch script.rb
# In an editor add some simple statements to the file
git add --all .
git commit -m "Add public code"

Now let’s create the two repositories. I’ve created lee-dohm/split-repo-public and lee-dohm/split-repo-private for the purposes of this demonstration. Then we add both repositories as “remote” repositories of our local copy:

git remote add public https://github.com/lee-dohm/split-repo-public.git
git remote add private https://github.com/lee-dohm/split-repo-private.git

And then we push the current version of the master branch (containing only public code) to both repositories:

git push public master
git push private master

Now, let’s create some private code on a separate private branch:

git checkout -b private-branch
# Make some changes to script.rb to add private code
git commit -a -m "Add private code"
git push private private-branch

So, at this point, we have two repositories containing mostly the same stuff. Both are public so that you can see how this works, but in practice the split-repo-private repository would actually be private. But as you can see here:

split-repo-public only has one branch
split-repo-private has two branches, the public master branch and the private private-branch branch

Now, at any time, you can “promote” private code to public code by simply merging the code from the private-branch branch into master and pushing the latest copy of the master branch to the public repository.
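The promotion step could be sketched as follows. This is a self-contained rerun of the walkthrough under a few assumptions: local bare repositories (public.git and private.git in a temporary directory) stand in for the two GitHub repositories, and the file contents are placeholders.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Local bare repositories stand in for the two GitHub repositories.
git init -q --bare public.git
git init -q --bare private.git

git init -q sample-split-repo
cd sample-split-repo
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add public "$work/public.git"
git remote add private "$work/private.git"

echo 'puts "public"' > script.rb
git add --all .
git commit -q -m "Add public code"
git push -q public master
git push -q private master

git checkout -q -b private-branch
echo 'puts "formerly private"' >> script.rb
git commit -q -a -m "Add private code"
git push -q private private-branch

# Promotion: merge the private branch into master, then publish master.
git checkout -q master
git merge -q private-branch
git push -q public master
git push -q private master
```

Note that a regular merge carries the commits of private-branch along with it, so their messages and intermediate states become public too. If only part of the branch is meant to be published, it needs to be separated out before merging.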

I hope that helps and let me know if you have any questions.

cobohub · 2019-07-05T20:21:52Z

cobohub
Jul 5, 2019
Author

Thank you very much for your friendly and very clear explanation - almost a little tutorial: I appreciate it very much!

Actually you are using the feature of having several “remotes” for one and the same local repository which I knew in principle, but somehow never really “trusted” - because I did not really know what I am doing. Maybe the best thing is indeed that I just play a little with your example, or with a similar one that I create locally - and where I cannot do any damage. Until I feel comfortable with it!

There was another track that I was considering, with “branches” a bit more separate - but I don’t know if I can do all the necessary steps in that setup:

Start with one local repository and move it to Github (or possibly Gitlab - no idea yet).
Then make some kind of remote “clone” of the public repository and make that one private. Now I would also have two repositories on Github, both with the same content, but in some kind of “parent-child” relation.
Now I can generate also a local version of that second Github repository, so also locally I would have two of them: one where I manage the private project, and one for the public.

But now the QUESTION is in this case: are there ways to “merge” changes in the public repository to the private clone? Because that is the intention: updates of the public project should be taken over also in the private.

The other way round would not normally happen - which is basically a question of adding every update to the right project locally and pushing it to the right Github remote.

For me this setup would look still a bit more “safe” than your proposal that needs a bit more awareness of always doing the right thing! But ok, also in my potential setup such kind of awareness is of course required.

Only the question would be: Is there an easy way to do such kind of “merge from the public”?

Actually it should be the case because I know that there are lots of public repositories on github, licensed in such a way that I am allowed to derive “closed” projects from them - and I assume that also there should be a way to “pull” public updates into my private project (although in “git language” the verb “pull” has a very specific meaning, I am using the verb now in the more general “english language”!).

Any comment on that idea? Is it a) possible and b) advisable in your eyes?

lee-dohm · 2019-07-07T21:43:17Z

lee-dohm
Jul 7, 2019

The short answer is: Yes, it is possible to do what you describe. But because git is a distributed version control system, there is no material difference between your solution and my solution.
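For illustration, the two-repository variant could look roughly like this: the private working copy keeps the public project as an extra remote and merges from it whenever the public side moves on. The directory names and the remote name upstream are assumptions chosen for this sketch, and plain local directories stand in for the hosted repositories.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# The public project.
git init -q public
cd public
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "base v1" > base.txt
git add base.txt
git commit -q -m "Add base"
cd ..

# The private project starts out as a clone of the public one.
git clone -q public private-work
cd private-work
git config user.name "Example User"
git config user.email "user@example.com"
git remote rename origin upstream         # "upstream" is just a conventional name
echo "secret" > extended.txt
git add extended.txt
git commit -q -m "Add private extension"
cd ..

# Later, the public project receives an update.
cd public
echo "base v2" >> base.txt
git commit -q -a -m "Update base"
cd ..

# Take the public update over into the private project.
cd private-work
git fetch -q upstream
git merge -q -m "Merge public updates" upstream/master
```

In practice the private project would also have its own private remote to push to; the care needed is the same as before, namely never pushing the private history to the public remote.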

The version control systems that you’re used to, you mentioned CVS and SVN specifically, are designed in a client/server configuration. There is one central repository that is the single source of truth that acts as the server. Then there are many clients that interact with the server to update it with new content or changes. Clients can’t exchange new content or changes between each other except through the central server. As a matter of fact, the clients never have a full copy of the repository locally.

Distributed version control systems, on the other hand, are designed so that every instance of the repository is a full “clone” of the entire repository. There is no “server” except by agreed-upon convention and any instance can exchange new content or changes with any other instance. All clones of any repository are peers. It’s only that everyone’s used to client/server version control architectures that makes it so that people typically describe networks of git repositories in the client/server way to make it easier for new people to understand.

For example, let’s say that there are three members of a team working on a project together: Alice, Bob, and Clarice. They have a git repository on a local server and a cloned backup of it off-site that is kept up-to-date by a scheduled task. With this setup, all of these workflows are possible using git (or theoretically any other distributed version control system):

Get the latest changes from the central server on Alice’s, Bob’s, or Clarice’s machine
Clone a new copy of the repository from the server on the new employee’s machine
When the server’s hard drive dies, everyone can use the backup server as the new central “source of truth” by adding a backup remote to their local repository with the backup server’s address
When Alice and Bob are at a café together with no access to a network, they can exchange branches directly between their machines by one of them cloning their local repo to a thumb drive. Then each in turn can “push” or “pull” changes from their local repo to the thumb drive before passing the thumb drive to the other person.
When Clarice goes to the client’s site where the app is to be deployed, she can push the latest code directly from her laptop to the client’s git server
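The thumb-drive exchange from the list above could be sketched like this. The directory thumb.git stands in for a repository on the mounted drive, and the names alice, bob, and thumb are assumptions for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Alice's local repository.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
cd ..

# Bob already has the same history on his machine.
git clone -q alice bob

# The "thumb drive": a bare clone that either of them can push to or pull from.
git clone -q --bare alice thumb.git

# Alice commits a change and pushes it to the drive.
cd alice
git remote add thumb "$work/thumb.git"
echo "v2" >> notes.txt
git commit -q -a -m "Update notes"
git push -q thumb master
cd ..

# Bob plugs in the drive and pulls the change into his repository.
cd bob
git remote add thumb "$work/thumb.git"
git pull -q thumb master
```

No central server is involved at any point: the drive is simply one more peer that both repositories know as a remote.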

I hope those examples illustrate the possibilities that are available with this kind of system. It doesn’t matter which repository is created “first” or “last”, they’re all equal partners in the network. It’s only a question of which instance has which changes.

So, no matter what topology you choose for the network, the same level of care is needed to prevent mistakes.

I hope that answers your question. Let me know if you have any more.

cobohub · 2019-07-08T07:10:32Z

cobohub
Jul 8, 2019
Author

Thanks again for your explanations! I know that learning one thought is always easier than a “way of thinking”, and git is in that sense a “way of thinking”, not just another version control system!

One thing I realize from your last explanations - which are not completely new for me, but it was not sufficiently in the “core” of my reasonings: If all the repositories - “local”, “remote”, whatever… - are more or less equivalent, the “natural thing” would always be to keep them all aligned as much as possible. Like when adding ink to a pot with water, the blue will initially be local, but eventually it would spread all over the pot and no differentiations remain.

The “unnatural” thing is then the intention to keep some “private” extension of a “public” repository separate. If it is just two projects - no problem: it is like having two pots with water with different amounts of ink that is spreading. But if you want to keep a differentiation within one single pot, the physical comparison does not fit any more because in the case of water and ink it is impossible!

And in the case of git repositories, this is the point where “taking very much care” comes into the game!

For me the bottom line is: I have to keep this in mind, do some little local “playing around” with mini repos with mini changes, until I feel safe to do the same with my real code.

I also learn from this that there might be less of “git extension” if I go to Github (or also gitlab) than I thought: All these “pull request” etc. things that you find there are then more a certain communication strategy between users than a fundamental change in the overall logic - which is still “git logic”.

lee-dohm · 2019-07-08T22:59:11Z

lee-dohm
Jul 8, 2019

Distributed version control systems, like git, do require a certain mental paradigm shift to really grok, it is true. But I think you’re getting it now. At least, from your description, I think you understand the risk I was trying to convey in my first message 😀

Please do feel free to reach out with any more questions!

cobohub · 2019-07-09T06:42:11Z

cobohub
Jul 9, 2019
Author

Thank you for all - you really helped me a good deal further! I am right now doing some little local tests with only few files and several repositories, and I am studying “Git from the Bottom Up” - which is exactly the kind of thing that I missed: So far most of my explanations were “do this - do that - it is all very easy” … and at the end I felt like I understood a lot but nothing really at depth…

Only one more question - if you happen to have an answer: You initially proposed a setup that would solve my problem - saying at the same time that you would not recommend it! So the question still is whether there is a git setup that you would recommend indeed!? Because - the problem that I need to solve is real! And I believe that I am not the only one with a similar setup:

[public base project] -> [private extended project]

Actually my public base project is again derived from a public project, which is Paraview (see paraview.org), written in C++ and managed with CMake (where the makers of CMake happen to be the same as the makers of Paraview). They are solving the problem at the CMake level, not at the git level: You have a base project (Paraview), and then you have a separate project that “imports” the base project with the means of CMake. So at the git level there is no interference at all, and you can change the version of the underlying Paraview for your project, do some adaptations to the new version and recompile.

However, with this setup you still have to duplicate quite a number of source files and kind of “settings files”, depending on the level of “intrusiveness” of the derived project of course (which then need to be reworked manually in the case of a version update of Paraview), and also the project setup is sometimes overly complex, just in order to fit the pattern. (Which does not mean that the Paraview people have not done fantastic work: I am really really impressed day after day!)

But this is basically the reason why I was looking for my own solution at the git level - in order not to add even more complexity to this setup, where the first “derivation” is given (CMake based) and I am trying to do the second with the means of git:

[Paraview] -> [base project] -> [extended project]

Anyway, I am not stuck, so no real need to answer in detail - because I think I understand now both the shortcomings and strengths of the “CMake solution” and the “git solution”. Except you still have another hint or remark that I did not think of so far!

lee-dohm · 2019-07-09T22:20:29Z

lee-dohm
Jul 9, 2019

You’ve definitely got a good understanding of the choice in front of you.

I do want to take the time to point out that this is an issue that lots of software projects face. But it isn’t really a “CMake problem” or a “git problem” as much as it is a problem of managing the varied dependencies of software projects. This is why many, many modern software systems are being built on top of dependency management systems: npm for JavaScript, RubyGems for Ruby, Maven for Java, Hex for Erlang and Elixir, etc. These dependency or package managers create a framework for making reusable components that can be easily composed into a higher-level system and they encourage cleaner abstractions than were common in the past.

By using a package manager, one can take several open- or closed-source components and combine them into a larger project. Each of those components is also probably composed of several lower-level components and so on. This allows the decision of “should this project be open or closed source?” to be made at the project level, rather than trying to remember which pieces of a project are supposed to be public or private.

I’m not sure if the project you’re trying to build lends itself to that kind of methodology or if a package or dependency manager is available to you in the toolchain or language ecosystem you’ve chosen (or are restricted to for various other reasons). But using a system based on the “CMake solution” with the help of a dependency/package manager is what I have seen be successful most often for the kind of problem you’re describing.

I made a lot of assumptions about your situation and the tools you’re using. So if I’m stating the obvious, I apologize. I felt that a discussion of this problem space wasn’t complete without mentioning dependency management. In any case, I hope that’s helpful!

cobohub · 2019-07-09T22:28:23Z

cobohub
Jul 9, 2019
Author

Right, that’s exactly the point! And CMake basically _is_ the dependency manager that I am using!

lee-dohm · 2019-07-09T23:01:33Z

lee-dohm
Jul 9, 2019

I apologize. I haven’t used CMake before and didn’t do the right amount of research before I spoke.

cobohub · 2019-08-19T06:28:01Z

cobohub
Aug 19, 2019
Author

A final word for people who find this thread, with similar questions and looking for a good solution:

I understood that with the means of git it is quite possible to achieve a setup like I need to build, i.e. an open source and a closed source project, with two “remote” repositories (public and private) and only one working repository locally. The disadvantage is that you need a lot of discipline to always push the right thing to the right remote.

(And at this point also thanks for contributing to my better understanding of git - which I am using for quite a while already but my understanding is growing only slowly…)

I went now for the other possible solution: I split the project into two, one being the open source project and nothing else, and the other one consisting of the closed source code and “including” the open source project. All this I am doing with the means of CMake that I am using as a “meta make” or project management utility - another very powerful tool (like git) that can do “almost everything” but needs some time to grow into it.

HoodFeelGood · 2020-12-22T17:43:44Z

HoodFeelGood
Dec 22, 2020

I’m sorry for replying to this “old” thread, but I had some questions that are directly in line with the discussion. But first, I wanted to say thank you – this thread has helped me tremendously. I am somewhat new to coding and extremely new to version-control.

I too want to have a private path and public path. I made this a bit easier by storing almost all the data I want to remain private in a single configuration file. When I wanted to share publicly, I removed the private information from the config file and had to mess with my repo to create an orphan branch in order to flatten all the commits so that the private information wasn’t visible in the commits. My end result is that my master branch has all my code and the config file with my private information. I have a branch off of master called “Public” that contains the same code and the same config file with all the private information stripped.

Would there be a way to do a slight deviation of what’s described above where there is master, a public branch, and a private branch? The private branch contains the config file with private information. The public branch contains the config file with private information stripped. And the master branch is where the bulk of my code lives. When I update code in the master branch it will somehow be merged into both the public and private branches?

HoodFeelGood · 2020-12-24T03:57:53Z

HoodFeelGood
Dec 24, 2020

This is what I’m thinking so far.

Create directory and initialize git.
Create bulk of code in master branch.
Create public branch for public only code.
Create private branch for private only code.
Update main code in master branch – merge to both public/private when needed.
If something in public needs to get into private branch - switch to private and run "git checkout public .\file_name".
If something in private needs to get into public branch - switch to public and run "git checkout private .\file_name".

Through experimenting – I’ve noticed that if I work on files in one branch, say the private branch, and then copy the file to the public branch, the commits will not follow…meaning that in the private branch there will be a tracked history of changes to the file, but in the public branch the file will be brand-new without tracked history.

Even further, if the file originated in the public branch, and had several commits, and then I copied to the private branch and had several more commits, when I copy file back over to the public branch, the latest version will be a single commit on top of the original commits in the public branch (i.e. only the latest version without the additional commits made in the private branch will come over to the public branch).

airtower-luna · 2020-12-24T09:35:34Z

airtower-luna
Dec 24, 2020

As you described it the checkout/add procedure will lose history. Checkout just gives you the file content (blob) from the given ref, not its history.
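A small experiment makes this visible. The sketch below uses hypothetical branch and file names (public, private, tool.txt): it builds up two commits for a file on the private branch, copies the file to the public branch with checkout, and then counts the commits that touch the file on each branch.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "shared" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch public

# Two commits touching tool.txt on the private branch.
git checkout -q -b private
echo "step 1" > tool.txt
git add tool.txt
git commit -q -m "private: step 1"
echo "step 2" >> tool.txt
git commit -q -a -m "private: step 2"

# Copy only the current content of the file over to the public branch.
git checkout -q public
git checkout private -- tool.txt          # stages the file as it is on "private"
git commit -q -m "Add tool.txt"

# Count the commits that touch tool.txt on each branch.
private_commits=$(git rev-list --count private -- tool.txt)
public_commits=$(git rev-list --count public -- tool.txt)
echo "private: $private_commits commits, public: $public_commits commit"
```

The public branch ends up with the latest content in a single new commit, while the two original commits stay on the private branch only.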

An important catch to consider: You cannot have private branches in a public repository. If you host your master & public branches on Github and push the private branch as well, it’ll be public. You can keep a branch private by never pushing it (so it lives only in your local repository), but that seems risky to me: a mistyped push command can happen quickly.

In general a better approach is the one described in the post above yours: Have a repository with the public code only, and use that from your private code (which might live in another repository). Depending on the structure of your code including the public code as a submodule might make sense.
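A minimal sketch of the submodule variant, assuming hypothetical repository names base (public) and extended (private). Both are plain local directories here so the example runs offline; the `-c protocol.file.allow=always` option is only needed because the submodule is added from a local path, which recent git versions refuse by default.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# The public repository containing only public code.
git init -q base
cd base
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "public code" > base.txt
git add base.txt
git commit -q -m "Add public code"
cd ..

# The private repository with its own history.
git init -q extended
cd extended
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "private code" > extended.txt
git add extended.txt
git commit -q -m "Add private code"

# Include the public repository as a submodule in the directory "base".
git -c protocol.file.allow=always submodule --quiet add "$work/base" base
git commit -q -m "Add public base project as a submodule"
```

The private repository then records only which commit of the public project it uses, and the public repository never contains any reference to the private one.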

HoodFeelGood · 2020-12-24T21:28:01Z

HoodFeelGood
Dec 24, 2020

Airtower, is there a command that gives you the file content (blob) and the history?
In my case, when taking a private file from a private branch and copying it to a public branch, having the file content without the history is desired, due to the possibility that the file history in the private branch would have private information.

What I was trying to describe was – the private branch would be connected to a private repository on GitHub, while the public branch would be connected to a public repository on GitHub. The Master branch would not be remotely connected to anything, and would only serve as a merging source for the public and private branches. Not sure if this is doable…I’ll be experimenting some more.

HoodFeelGood · 2020-12-25T02:53:55Z

HoodFeelGood
Dec 25, 2020

I guess I realize my thinking is a bit one-dimensional – I’m a lone programmer, a beginner, with no thoughts of collaborating with anyone via Git or GitHub. I think what I’m proposing might not work well if I have to play nice with others…
