Before you know it, your GitHub organization is a mess: thousands of repositories with different naming conventions, belonging to different departments and applications, written in different programming languages (or containing just config files), and used at different levels of the organization. And of all the people who have access, who is primarily responsible? I’m looking for ways to bring some order to the chaos.
What would be good ways to add relevant metadata to the repositories, so that it can be indexed by a bot of some sort? Let me share some solutions I encountered and why I feel they are not the right fit:
A lot can be added to a README, with the benefit that it shows up at first glance. But it won’t be very structured, which makes it difficult to parse.
GitHub topics allow you to set labels on the repository. They are great for making repositories more discoverable, but they are not structured, and the UI gets crowded after a couple of topics.
The good old MAINTAINERS file. It helps clarify who is responsible, but doesn’t allow for much more. It is also free-format, although the format can of course be checked.
The CODEOWNERS file is sort of the newer and more advanced version of a MAINTAINERS file. But it also includes information on scope within the repository, making it more difficult to parse. And entire groups can be referenced, making it more difficult to pinpoint an exact person. Like the MAINTAINERS file, its scope is limited to the maintainers and it is not fit for other information.
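To illustrate the points about scope and groups for a CODEOWNERS-style file, here is a rough Python sketch of what a bot would have to do. It is a simplification, not a full implementation of GitHub's matching rules: it only splits each rule into a path pattern and its owners, and then separates individual owners from team references. The sample content, the names `parse_codeowners` and `split_owners`, and the `example-org` teams are all made up for illustration.

```python
# Hypothetical sample; real files live in .github/CODEOWNERS, CODEOWNERS or docs/CODEOWNERS.
SAMPLE = """\
# Default owners
*       @alice @example-org/platform
/docs/  docs@example.com  # inline comment
*.tf    @example-org/infra
"""


def parse_codeowners(text):
    """Return a list of (pattern, owners) tuples, one per rule line."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Simplification: treat " #" as the start of an inline comment.
        line = line.split(" #", 1)[0]
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def split_owners(rules):
    """Separate individual owners (users, emails) from @org/team references."""
    people, teams = set(), set()
    for _pattern, owners in rules:
        for owner in owners:
            if owner.startswith("@") and "/" in owner:
                teams.add(owner)
            else:
                people.add(owner)
    return sorted(people), sorted(teams)


rules = parse_codeowners(SAMPLE)
people, teams = split_owners(rules)
print("people:", people)
print("teams:", teams)
```

Even in this toy version, the team entries cannot be resolved to a person without an extra lookup of the team membership, and the owner depends on which pattern matches a given path.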
publiccode.yml (publiccodeyml/publiccode.yml on GitHub), a metadata standard for public software, seems a good fit. Although it is intended for open source projects, it would also work inside an internal organization. It is structured, so it is easy to parse. But as there might be different needs, it might be necessary to fork it into a custom solution. And it still requires tooling to parse these files.
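To make the "custom solution" idea a bit more concrete, this is a minimal sketch of the kind of check a bot could run on a structured metadata file. It assumes the file has already been loaded into a Python dict by some YAML library. The field names (`name`, `owner`, `department`, `languages`, `lifecycle`) and the allowed lifecycle values are purely hypothetical and are not taken from the publiccode.yml standard.

```python
# Hypothetical internal schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "name": str,
    "owner": str,
    "department": str,
    "languages": list,
}
# Hypothetical set of allowed values for the optional "lifecycle" field.
ALLOWED_LIFECYCLE = {"experimental", "production", "deprecated"}


def validate_metadata(record):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    lifecycle = record.get("lifecycle")
    if lifecycle is not None and lifecycle not in ALLOWED_LIFECYCLE:
        problems.append(f"unknown lifecycle: {lifecycle}")
    return problems


good = {
    "name": "billing-api",
    "owner": "alice",
    "department": "finance",
    "languages": ["python"],
    "lifecycle": "production",
}
bad = {"name": "legacy-tool", "languages": "perl", "lifecycle": "retired"}

print(validate_metadata(good))
print(validate_metadata(bad))
```

A bot could run a check like this across all repositories and open an issue wherever the list of problems is not empty.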
So, my question: how are others taking care of this? What tooling is commonly used?