GitHub Community AMA: State of the Octoverse Report

What: The State of the Octoverse explores a year of change with new deep dives into developer productivity, security, and how we build communities on GitHub. During this AMA, Senior Data Scientist Derek Jedamski @DJedamski will answer questions and share insights about the report, from build to execution.

Where: The AMA will take place in this event topic. The topic will open 10 minutes before the start time.

About our Speaker:

Derek Jedamski is a Senior Data Scientist on the Data Science team at GitHub. He was one of the primary contributors to the analysis behind the State of the Octoverse report in 2020. The Data Science team at GitHub supports a broad array of product areas; Derek currently focuses on supporting the Security product space with all of its data needs. Prior to GitHub, he worked at various tech and fintech companies, primarily focused on machine learning.

@DJedamski, is your map wrong, or did you count the European part of Russia in the Asian stats?

[image] vs [image], with the majority of the population living in the European part: [image]

There is a graph of the top languages in the report. Is it possible to see the data for ALL languages recognized by GitHub (github/linguist)?

Hiya, I hope you join us on the 26th so we can dig into your questions.
Thanks for being here!

Welcome to the GitHub Community AMA! We are excited to be joined by Senior Data Scientist @djedamski. We are standing by live to answer your questions!

To submit a question, click “Reply”.

Hey everybody! I’m Derek, a Senior Data Scientist at GitHub. I’ve been at GitHub for almost four years now, and I was one of the primary contributors to the analysis for the State of the Octoverse report in 2020.

Looking forward to spending the next hour with you! Feel free to ask me anything about Octoverse, GitHub, or whatever else comes to mind!

is your map wrong or did you count the European part of Russia as Asian stats?

Hi @rimutaka, you are right, good catch! We did count Russia entirely under the stats for Asia. Without leveraging granular location data for users, we can’t appropriately assign usage patterns to the European part of Russia vs. the Asian part. That said, as you rightly point out, in the future we should perhaps consider allocating these usage patterns based on where the majority of the population resides (instead of where the majority of the land mass lies). Thanks for the feedback!

Nice to see you here @DJedamski :wave:

Curious about:

  • Something upcoming you’re excited for
  • A challenging task/problem/concept you’ve worked on lately
  • A personal read/watch/play list item you’ve enjoyed lately

Unfortunately, we have not made that data public at this point. If you have a question about a specific language, I might be able to help. :slight_smile:

Is there a reason why it’s not public? I’d think it’s quite an interesting piece of info.

Derek, since no one else is asking anything, do you mind if I keep firing questions at you and you answer the ones you like? :slight_smile:

Absolutely! Fire away! :grinning_face_with_smiling_eyes:

I am working on a side project studying all sorts of developer interactions via GitHub. So far I have downloaded 6,392,268 repos, which is a minuscule amount. Is there a better way of accessing the code other than downloading it?
You keep your data on AWS S3, I presume. Is it possible to get direct read access to it, with requester pays?

That’s a great question. Unfortunately, I do not have a very satisfying answer for you, but I can tell you it is something we are thinking a lot about. We take the privacy of our users’ data very seriously, but at the same time, we recognize the value of some of this aggregated data to the public. The challenge is finding the right line between the two.

:smile_cat: thanks for being here @rimutaka! Looks like you’ve got some awesome 1:1 time with @DJedamski

Having direct EC2 <-> S3 access would allow others to analyze all sorts of aspects of the community and the code without incurring the cost of storing it, and would also improve latency. EC2 to S3 is lightning fast compared to going via an API.

Are we seeing usage of open source trending up overall?

That’s a lot of repos! I would be interested to hear what specifically you are studying in regards to these developer interactions.

Sadly, there is not a better path to accessing this data at this point though I will say, we certainly feel your pain.

I suppose I will just reiterate that we are thinking a lot about these exact pain points you are mentioning. We are working through possible solutions that would allow access to some of this fully secure/anonymized data in a way that would be useful to users who want to analyze it, such as yourself.

So sadly, the answer is: there is not a better way right now but we feel your pain and we are thinking through how we can address this. :slight_smile: