Evaluating GDPR implications when creating an app that pulls user metadata from the GitHub API?

Sorry, I’m not sure whether this question belongs here or in the GitHub API Development and Support category. If I’ve posted in the wrong place, please let me know!

First of all, thank you to this community for helping me learn so much about using GitHub so far! :pray:t5: I feel like I should better introduce what I’m working on in relation to my main question about the GDPR.

I’m a small part of a European Union research project called “Open!Next” which, among many goals, seeks to gather information on the health of open source hardware projects hosted on platforms such as GitHub.

To that end, we hope to develop data-mining scripts that pull metadata from the GitHub API to get information on how members of an open source project interact with each other.

For example, for a given repository we might visualise user interactions in a network graph where each node represents a user, each connection between nodes represents an interaction between users (e.g. having committed to the same files, commented on the same issues, etc.), and the size of a node or the thickness of a connection represents how often those interactions happen. By inspecting the network and calculating its metrics, we might be able to discern whether a project is highly centralized, with just one or a few members doing all the work, or a more “modular” community where the visualisation shows clusters of users who work on different parts of the project. For instance, the graph might reveal that one group of users is the documentation team, with frequent commits to the doc directory of their repository.
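As a rough illustration of the kind of graph described above, here is a minimal Python sketch using only the standard library. It does not call the GitHub API: the `events` list is made-up sample data, and the function names (`build_interaction_graph`, `weighted_degree`, `top_share`) are hypothetical names chosen for this example. It assumes interactions have already been reduced to (user, item) pairs, where an item is a file or an issue, and it uses each user's share of total interaction weight as one simple, assumed indicator of centralization.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_interaction_graph(events):
    """Build weighted edges from (user, item) pairs.

    Two users are connected once for every item (file, issue, ...)
    that both of them touched; the edge weight counts shared items.
    """
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    edge_weights = Counter()
    for users in users_by_item.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(users), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


def weighted_degree(edge_weights):
    """Sum the weights of all edges attached to each user."""
    degree = Counter()
    for (a, b), weight in edge_weights.items():
        degree[a] += weight
        degree[b] += weight
    return degree


def top_share(degree):
    """Fraction of total weighted degree held by the busiest user."""
    total = sum(degree.values())
    return max(degree.values()) / total if total else 0.0


# Made-up sample data: placeholder user names and file paths.
events = [
    ("alice", "src/main.c"), ("bob", "src/main.c"),
    ("alice", "doc/intro.md"), ("carol", "doc/intro.md"),
    ("alice", "doc/build.md"), ("carol", "doc/build.md"),
]

graph = build_interaction_graph(events)
degree = weighted_degree(graph)
print(dict(graph))
print(dict(degree))
print(top_share(degree))
```

A value of `top_share` close to its maximum would suggest a single user sits at the centre of most interactions, while lower values suggest the work is spread more evenly. A real analysis would likely use a graph library and richer metrics; this is only meant to make the idea concrete.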

To do these things, we will inevitably need metadata about the users themselves. This information is clearly publicly visible just by browsing the pages of a GitHub repository, and is also accessible via the GitHub API.

However, we would be pulling that data about the users out of GitHub and into our own software (which will include a publicly accessible web frontend that shows these metrics and visualisations) as part of our research output.

To be clear, this post is not to seek professional legal advice, and I understand that giving professional legal advice might be difficult and/or inadvisable here. Instead, I am hoping to hear from others who have built apps with the GitHub API and can share general experiences on how to think about the GDPR implications of running a project like this.

What are the important questions to ask? Is what I described reasonably achievable? If so, what are the broad strokes (or specifics, if you know them) on making sure we are GDPR-compliant?

And of course, if we manage to pull this off, we’ll be happy to share the results of our research with the GitHub community!

Thank you so much in advance for your knowledge. :hugs: