Project setup question with public and private "branches" #22158
-
I am completely new to GitHub, although I have been programming for 40 years and using git for about 2 years now. In very general terms, I must admit that I still have great trouble getting the “git logic” and “git language” into my head, much more than with the many programming languages (and also CVS and SVN…) that I have already loaded into my head. So I would be very happy if somebody could explain this in language that is not too “git-ish”!

My question is about proper project setup with git and GitHub, which are related but not identical. So far I have the project in a local git repository that is accessible only to me, on an external computer with backup, so it should not get lost. It is all written in C++, with CMake as the build system and Qt for the GUI parts.

Now I have a “base” version of the software that I want to publish as open source, but I also have an “extended” version that, for various reasons, I can only sell for a license fee. In order to get the code published, I put some effort over the last months into making it two projects: the “base”, which is in one git repository, and the “extended”, a separate second project that depends on “base” through some CMake constructs. The advantage is that I can now publish “base” freely while “extended” remains completely hidden.

Still, I have not yet found the time to actually publish it. Instead I found a customer for whom I am writing another “custom extension” for the “base” version. So far I simply made it a “branch” of “base”, so I can switch between “master” and “customer”, and if there is something that should be shared I can “merge” between the branches. Very handy and helpful! However, now the “base” repository is again not in a state in which I can publish it, because it contains, in a branch, the “customer” code that I am not supposed to publish!

In other words: I am back in the same trouble, and it would again be very tedious to split the code… Ideally I would even merge the “extended” version back into the main repository and then have three “branches”: a public “master”, and private “extended” and “customer”. But this is not possible! As far as I understand, I have to either publish the entire repository (and also expose the “private branches”), or make a huge effort to separate the open and closed parts of the code.

Now I have been reading that there are ways to deal with this kind of problem on both GitHub and GitLab, because you can not only “branch” but also have some kind of “more separate branch” that still allows merging code from one to the other, first of all from the public “master” to the private “children”.

So finally the question: Is this true, and if yes, what is the name of this technology, and where can I find instructions to make it happen? I would very much like to understand the procedure in advance, before I jump into reorganizing my entire code again (always very error-prone!).

Many thanks in advance for any helpful answers!!
Replies: 15 comments
-
Thank you for all the context; it helps me understand your particular problem better. Let me start off by suggesting @jwiegley’s Git from the Bottom Up e-book as, in my opinion, the best way to “load git into your head”.

To answer your specific question, “is it possible to have private branches?”: yes, it is possible, though not in the way you seem to be thinking about them. You are correct that it is not possible to have a published repository that contains a private branch. What is possible is to keep one local repository with two remote repositories, one public and one private, and to push the private branches only to the private one.

With that said, I would not normally recommend this kind of setup, because one would need to be highly disciplined to avoid accidentally publishing private branches to the public repository instance. You could create custom tooling to make it less likely that mistakes would occur, but even if you eliminated the possibility that private branches would be exposed, there would still be the chance that code would be added to the wrong (public) branch and leaked in that way.

There’s no name for the feature or technology, because it is a consequence of how distributed version control systems, like git, are designed, specifically the “distributed” part.

For the purpose of illustration, here’s how this could be achieved. I’m going to start from an empty repository because it’s easier 😀
At this point, we have a basic repository with the one
Now let’s create the two repositories. I’ve created: for the purposes of this demonstration. Then we add both repositories as “remote” repositories of our local copy:
And then we push the current version of the
Now, let’s create some private code on a separate private branch:
So, at this point, we have two repositories containing mostly the same stuff. Both are public so that you can see how this works, but in practice the split-repo-private repository would actually be private. But as you can see here:
Now, at any time, you can “promote” private code to public code by simply merging the code from the

I hope that helps and let me know if you have any questions.
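The command listings from this reply did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of the setup it describes: one local repository with two remotes, where `master` goes to both and the private branch goes only to the private one. Everything named here is an assumption for illustration: the remote names `public` and `private`, the branch name `customer` (borrowed from the question), the file names, and the two local bare repositories that take the place of the GitHub repositories.

```shell
set -e
demo=$(mktemp -d)                        # scratch area; delete it when done

# Two empty bare repositories stand in for the public and private GitHub repos.
git init -q --bare "$demo/public.git"
git init -q --bare "$demo/private.git"

# The local working repository.
git init -q "$demo/local"
cd "$demo/local"
git symbolic-ref HEAD refs/heads/master  # fix the branch name regardless of local defaults
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base code" > base.txt
git add base.txt
git commit -q -m "Add base code"

# Register both repositories as remotes and publish master to each.
git remote add public "$demo/public.git"
git remote add private "$demo/private.git"
git push -q public master
git push -q private master

# Private work lives on its own branch and is pushed to the private remote only.
git checkout -q -b customer
echo "customer code" > customer.txt
git add customer.txt
git commit -q -m "Add customer extension"
git push -q private customer

# Compare which branches each remote knows about.
echo "public:";  git ls-remote --heads public
echo "private:"; git ls-remote --heads private
```

Promoting code later would then be a matter of merging it into `master` and pushing `master` to `public`. Keep in mind that a merge brings along every commit of the merged branch, so this only suits code that is meant to become public in full.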
-
Thank you very much for your friendly and very clear explanation, almost a little tutorial: I appreciate it very much! You are using the feature of having several “remotes” for one and the same local repository, which I knew in principle but somehow never really “trusted”, because I did not really know what I was doing. Maybe the best thing is indeed that I play a little with your example, or with a similar one that I create locally, where I cannot do any damage, until I feel comfortable with it!

There was another track that I was considering, with “branches” that are a bit more separate, but I don’t know if I can do all the necessary steps in that setup:
But now the QUESTION in this case is: are there ways to “merge” changes from the public repository into the private clone? Because that is the intention: updates of the public project should also be taken over in the private one. The other way round would not normally happen, which is basically a question of adding every update to the right project locally and pushing it to the right GitHub remote.

To me this setup looks a bit “safer” than your proposal, which needs more awareness of always doing the right thing! But OK, in my potential setup that kind of awareness is of course also required. The only question would be: is there an easy way to do such a “merge from the public”? There should be, because I know that there are lots of public repositories on GitHub that are licensed in such a way that I am allowed to derive “closed” projects from them, and I assume that there should also be a way to “pull” public updates into my private project (in “git language” the verb “pull” has a very specific meaning, but I am using it here in the more general English sense!).

Any comment on that idea? Is it a) possible and b) advisable in your eyes?
-
The short answer is: Yes, it is possible to do what you describe. But because git is a distributed version control system, there is no material difference between your solution and my solution.

The version control systems that you’re used to (you mentioned CVS and SVN specifically) are designed in a client/server configuration. There is one central repository that is the single source of truth and acts as the server. Then there are many clients that interact with the server to update it with new content or changes. Clients can’t exchange new content or changes with each other except through the central server. As a matter of fact, the clients never have a full copy of the repository locally.

Distributed version control systems, on the other hand, are designed so that every instance of the repository is a full “clone” of the entire repository. There is no “server” except by agreed-upon convention, and any instance can exchange new content or changes with any other instance. All clones of any repository are peers. It’s only because everyone is used to client/server version control architectures that people typically describe networks of git repositories in the client/server way, to make them easier for new people to understand.

For example, let’s say that there are three members of a team working on a project together: Alice, Bob, and Clarice. They have a git repository on a local server and a cloned backup of it off-site that is kept up-to-date by a scheduled task. With this setup, all of these workflows are possible using git (or theoretically any other distributed version control system):
I hope those examples illustrate the possibilities that are available with this kind of system. It doesn’t matter which repository is created “first” or “last”, they’re all equal partners in the network. It’s only a question of which instance has which changes. So, no matter what topology you choose for the network, the same level of care is needed to prevent mistakes.

I hope that answers your question. Let me know if you have any more.
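For readers who want to try the “separate private clone” variant asked about above, here is a minimal sketch of taking public updates over into a private repository. The directory names, the file names, and the remote name `upstream` are assumptions for illustration, and a plain local repository stands in for the public GitHub project. In practice the private clone would additionally have its own private remote to push to.

```shell
set -e
demo=$(mktemp -d)                        # scratch area; delete it when done

# The public project (stands in for the public GitHub repository).
git init -q "$demo/public"
cd "$demo/public"
git symbolic-ref HEAD refs/heads/master  # fix the branch name regardless of local defaults
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > base.txt
git add base.txt
git commit -q -m "Base v1"

# The private project starts life as a clone of the public one.
git clone -q "$demo/public" "$demo/private"
cd "$demo/private"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote rename origin upstream        # the public project is only a source of updates
echo "secret" > extended.txt
git add extended.txt
git commit -q -m "Add private extension"

# Some time later the public project moves on.
cd "$demo/public"
echo "v2" > base.txt
git commit -q -am "Base v2"

# Take the public update over into the private project.
cd "$demo/private"
git fetch -q upstream
git merge -q --no-edit upstream/master

git log --oneline --graph
```

The changes only flow in one direction here because nothing is ever pushed to `upstream`; that is a convention the private clone has to keep, not something git enforces.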
-
Thanks again for your explanations! I know that learning one thought is always easier than learning a “way of thinking”, and git is in that sense a “way of thinking”, not just another version control system!

One thing I realize from your last explanations, which is not completely new to me but was not sufficiently at the “core” of my reasoning: if all the repositories (“local”, “remote”, whatever…) are more or less equivalent, the “natural thing” would always be to keep them all aligned as much as possible. It is like adding ink to a pot of water: the blue will initially be local, but eventually it spreads all over the pot and no differentiation remains.

The “unnatural” thing is then the intention to keep some “private” extension of a “public” repository separate. If it is just two projects, no problem: it is like having two pots of water with different amounts of ink spreading in them. But if you want to keep a differentiation within one single pot, the physical comparison does not fit any more, because with water and ink it is impossible! And in the case of git repositories, this is the point where “taking very much care” comes into play!

For me the bottom line is: I have to keep this in mind and do some local “playing around” with mini repos and mini changes until I feel safe doing the same with my real code.

I also learn from this that GitHub (or GitLab) may add less of a “git extension” than I thought: all the “pull request” and similar things that you find there are more a communication strategy between users than a fundamental change in the overall logic, which is still “git logic”.
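Part of that care can be delegated to tooling, as the first reply hints. The following is one possible sketch, not an established recipe: a `pre-push` hook that refuses to push anything except `master` to a remote named `public`. The remote name, the branch names, and the choice of `master` as the only publishable branch are assumptions for illustration.

```shell
set -e
demo=$(mktemp -d)                        # scratch area; delete it when done
git init -q --bare "$demo/public.git"    # stands in for the public GitHub repository
git init -q "$demo/local"
cd "$demo/local"
git symbolic-ref HEAD refs/heads/master  # fix the branch name regardless of local defaults
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add public "$demo/public.git"

# Install the guard. Git runs this hook before every push.
mkdir -p .git/hooks
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# Git passes the remote name as $1 and one line per pushed ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
remote="$1"
[ "$remote" = "public" ] || exit 0       # only guard the public remote
while read -r local_ref local_sha remote_ref remote_sha; do
    if [ "$remote_ref" != "refs/heads/master" ]; then
        echo "pre-push: refusing to push $remote_ref to '$remote'" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x .git/hooks/pre-push

echo "base code" > base.txt
git add base.txt
git commit -q -m "Add base code"
git checkout -q -b customer
echo "customer code" > customer.txt
git add customer.txt
git commit -q -m "Add customer extension"

git push -q public master                # passes the hook

# Simulate the mistyped push and record whether the hook stopped it.
if git push -q public customer 2>/dev/null; then
    blocked=no
else
    blocked=yes
fi
echo "push of 'customer' to 'public' blocked: $blocked"
```

Such a hook only reduces the risk. It lives in one local clone, it can be bypassed with `git push --no-verify`, and it does nothing about private code that is committed to `master` by mistake.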
-
Distributed version control systems, like git, do require a certain mental paradigm shift to really grok, it is true. But I think you’re getting it now. At least, from your description, I think you understand the risk I was trying to convey in my first message 😀

Please do feel free to reach out with any more questions!
-
Thank you for everything, you really helped me a good deal further! Right now I am doing some small local tests with only a few files and several repositories, and I am studying “Git from the Bottom Up”, which is exactly the kind of thing that I was missing. So far most of the explanations I found were “do this, do that, it is all very easy”… and at the end I felt like I understood a lot but nothing really in depth…

Only one more question, if you happen to have an answer: you initially proposed a setup that would solve my problem, saying at the same time that you would not recommend it! So the question still is whether there is a git setup that you would indeed recommend. Because the problem that I need to solve is real! And I believe that I am not the only one with a similar setup:

[public base project] -> [private extended project]

Actually my public base project is itself derived from a public project, which is Paraview (see paraview.org), written in C++ and managed with CMake (the makers of CMake happen to be the same as the makers of Paraview). They solve the problem at the CMake level, not at the git level: you have a base project (Paraview), and then you have a separate project that “imports” the base project by means of CMake. So at the git level there is no interference at all, and you can change the version of the underlying Paraview for your project, make some adaptations to the new version and recompile.

However, with this setup you still have to duplicate quite a number of source files and “settings files”, depending of course on the level of “intrusiveness” of the derived project (and these then need to be reworked manually when Paraview is updated to a new version). The project setup is also sometimes overly complex, just in order to fit the pattern. (Which does not mean that the Paraview people have not done fantastic work: I am really, really impressed day after day!)

This is basically the reason why I was looking for my own solution at the git level: in order not to add even more complexity to this setup, where the first “derivation” is given (CMake based) and I am trying to do the second by means of git:

[Paraview] -> [base project] -> [extended project]

Anyway, I am not stuck, so there is no real need to answer in detail, because I think I now understand both the shortcomings and the strengths of the “CMake solution” and the “git solution”. Unless you still have another hint or remark that I have not thought of so far!
-
You’ve definitely got a good understanding of the choice in front of you. I do want to take the time to point out that this is an issue that lots of software projects face. But it isn’t really a “CMake problem” or a “git problem” as much as it is a problem of managing the varied dependencies of software projects.

This is why many, many modern software systems are being built on top of dependency management systems: npm for JavaScript, RubyGems for Ruby, Maven for Java, Hex for Erlang and Elixir, etc. These dependency or package managers create a framework for making reusable components that can be easily composed into a higher-level system, and they encourage cleaner abstractions than were common in the past. By using a package manager, one can take several open- or closed-source components and combine them into a larger project. Each of those components is probably also composed of several lower-level components, and so on. This allows the decision of “should this project be open or closed source?” to be made at the project level, rather than trying to remember which pieces of a project are supposed to be public or private.

I’m not sure if the project you’re trying to build lends itself to that kind of methodology, or if a package or dependency manager is available to you in the toolchain or language ecosystem you’ve chosen (or are restricted to for various other reasons). But using a system based on the “CMake solution” with the help of a dependency/package manager is what I have seen be successful most often for the kind of problem you’re describing.

I made a lot of assumptions about your situation and the tools you’re using. So if I’m stating the obvious, I apologize. I felt that a discussion of this problem space wasn’t complete without mentioning dependency management. In any case, I hope that’s helpful!
-
Right, that’s exactly the point! And CMake basically _is_ the dependency manager that I am using!
-
I apologize. I haven’t used CMake before and didn’t do the right amount of research before I spoke.
-
A final word for people who find this thread, with similar questions and looking for a good solution:
(And at this point also thanks for contributing to my better understanding of git - which I am using for quite a while already but my understanding is growing only slowly…)
-
I’m sorry for replying to this “old” thread, but I have some questions that are directly in line with the discussion. But first, I wanted to say thank you: this thread has helped me tremendously.

I am somewhat new to coding and extremely new to version control. I too want to have a private path and a public path. I made this a bit easier by storing almost all the data I want to keep private in a single configuration file. When I wanted to share publicly, I removed the private information from the config file and had to mess with my repo to create an orphan branch in order to flatten all the commits, so that the private information wasn’t visible in the commits. My end result is that my master branch has all my code and the config file with my private information. I have a branch off of master called “Public” that contains the same code and the same config file with all the private information stripped.

Would there be a way to do a slight variation of what’s described above, where there is master, a public branch, and a private branch? The private branch contains the config file with the private information. The public branch contains the config file with the private information stripped. And the master branch is where the bulk of my code lives. When I update code in the master branch, would it somehow be merged into both the public and private branches?
-
This is what I’m thinking so far.
Through experimenting, I’ve noticed that if I work on files in one branch, say the private branch, and then copy a file to the public branch, the commits do not follow. That is, in the private branch there is a tracked history of changes to the file, but in the public branch the file is brand-new, without tracked history. Even further, if the file originated in the public branch and had several commits, and I then copied it to the private branch and made several more commits, then when I copy the file back over to the public branch, the latest version becomes a single commit on top of the original commits in the public branch (i.e. only the latest version comes over to the public branch, without the additional commits made in the private branch).
-
As you described it, the checkout/add procedure will lose history. Checkout just gives you the file content (blob) from the given ref, not its history.

An important catch to consider: you cannot have private branches in a public repository. If you host your master and public branches on GitHub and push the private branch as well, it’ll be public. You can keep a branch private by never pushing it (so it lives only in your local repository), but that seems risky to me: a mistyped push command can happen quickly.

In general, a better approach is the one described in the post above yours: have a repository with the public code only, and use that from your private code (which might live in another repository). Depending on the structure of your code, including the public code as a submodule might make sense.
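The difference between copying a file out of another branch and merging that branch can be tried out in a throwaway repository. This is a minimal sketch; the branch names `public`, `private`, `public-copy` and `public-merge` and the file names are assumptions for illustration.

```shell
set -e
demo=$(mktemp -d)                        # scratch area; delete it when done
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/public  # name the first branch "public"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "start" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Two commits that touch tool.txt, made on the private branch.
git checkout -q -b private
echo "one" > tool.txt
git add tool.txt
git commit -q -m "tool: first version"
echo "two" >> tool.txt
git commit -q -am "tool: second version"

# Variant A: take only the file content from the private branch.
git checkout -q -b public-copy public
git checkout -q private -- tool.txt      # copies the content, not the commits
git commit -q -m "Copy tool.txt from private"
copy_commits=$(git log --oneline -- tool.txt | wc -l | tr -d ' ')

# Variant B: merge the private branch, which brings its commits along.
git checkout -q -b public-merge public
git merge -q --no-edit private
merge_commits=$(git log --oneline -- tool.txt | wc -l | tr -d ' ')

echo "commits touching tool.txt after copy:  $copy_commits"
echo "commits touching tool.txt after merge: $merge_commits"
```

A merge carries over every commit of the merged branch, which is exactly what must not happen when that branch also contains private material. `git cherry-pick` can bring over selected commits instead, at the price of creating new copies of them.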
-
Airtower, is there a command that gives you the file content (blob) and the history?

What I was trying to describe was this: the private branch would be connected to a private repository on GitHub, while the public branch would be connected to a public repository on GitHub. The master branch would not be remotely connected to anything, and would only serve as a merging source for the public and private branches. Not sure if this is doable… I’ll be experimenting some more.
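The layout described here can be sketched in a scratch directory as follows. It is only an illustration under assumptions: the remote names `public-remote` and `private-remote`, the file names, and the placeholder secret are invented, and local bare repositories stand in for the two GitHub repositories. The shared code reaches both branches through `git merge`, which carries the commits and therefore the history along.

```shell
set -e
demo=$(mktemp -d)                        # scratch area; delete it when done
git init -q --bare "$demo/public-remote.git"
git init -q --bare "$demo/private-remote.git"

git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master  # fix the branch name regardless of local defaults
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add public-remote "$demo/public-remote.git"
git remote add private-remote "$demo/private-remote.git"

# master holds the shared code and is never pushed anywhere.
echo "code v1" > app.txt
git add app.txt
git commit -q -m "App v1"

# The private branch adds the real configuration.
git checkout -q -b private master
echo "token=REAL-SECRET" > config.ini
git add config.ini
git commit -q -m "Add private config"

# The public branch adds a stripped configuration.
git checkout -q -b public master
echo "token=" > config.ini
git add config.ini
git commit -q -m "Add stripped config"

# New shared code is committed on master...
git checkout -q master
echo "code v2" > app.txt
git commit -q -am "App v2"

# ...and merged into both branches, each pushed to its own remote only.
for branch in private public; do
    git checkout -q "$branch"
    git merge -q --no-edit master
    git push -q "${branch}-remote" "$branch"
done

git log --oneline public -- app.txt      # history of the shared file on the public branch
```

This only stays safe as long as merges go from `master` into the two branches and never from `private` into `master` or `public`, and as long as `config.ini` is never committed on `master`.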
-
I guess I realize my thinking is a bit one-dimensional: I’m a lone programmer, a beginner, with no thoughts of collaborating with anyone via Git or GitHub. I think what I’m proposing might not work well if I have to play nice with others…