Fetching events from Public event API

Hello,

I’m wondering about the usage of the https://developer.github.com/v3/activity/events API, mainly the “List public events” section.
According to the docs, querying this API returns the last 300 public events triggered on GitHub, with a 5-minute delay.
The docs also state that the API returns an “X-Poll-Interval” header, which tells you how often you are allowed to poll the API.
By default this is set to 60 seconds, and it can only increase during high server load.
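
For anyone following along, here is a minimal Python sketch of how a client might honour that header. The function name `next_poll_delay` and the fallback behaviour are my own assumptions for illustration, not something taken from the docs; the 60-second default is the value mentioned above.

```python
def next_poll_delay(headers, default_seconds=60):
    """Return the number of seconds to wait before polling again.

    `headers` is any mapping of response header names to values.
    Hypothetical helper: falls back to `default_seconds` when the
    X-Poll-Interval header is missing or not a positive integer.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-poll-interval")
    try:
        interval = int(raw)
    except (TypeError, ValueError):
        return default_seconds
    return interval if interval > 0 else default_seconds
```

A polling loop would call this after each response and sleep for the returned number of seconds before the next request.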

From my short experience testing this API, the number of events received is enormous (as there is no filter on the request, you receive every available event type).
In my tests, the only way to go over ALL public events without missing any was to query the API every second and go through all 300 events received.

Furthermore, the following research article describes the same problem (see the bottom of the “What Didn’t Work” section).
It indicates that querying at a 5-second interval leads to missed events about 80% of the time.
It also mentions that crawlers such as GH Archive are able to go over all events by querying the API every 0.75 seconds!

Can you please shed some light on how this API should be used?
Since there is no filter on the polled events, and querying the API at the rate given by the “X-Poll-Interval” header will surely lead to missed events, what is the suggested procedure for using this API to find specific public events without missing any?

Thanks in advance.


Hello @ilia-cy

I’ve been going back over our API category for items we missed, and this one stood out to me. This concern has been brought to GitHub’s attention more than a few times, and it looks like you also opened a Support ticket with us. In that Support ticket, some of the feedback suggested that our GraphQL API would be better suited for future development in this area, mainly for filtering.

Still, I think it would be helpful for others looking into this to have some word from GitHub on the status of the existing v3 REST API, along with a quick note about GraphQL.

The quick note re: GraphQL: it currently still does not provide an endpoint with similar functionality.

If you (or others) would like to see parity in GraphQL rather than improvements or modifications to our REST API for public events, please use our feedback form here:

https://support.github.com/contact/feedback


For the existing /events endpoint…

There is no easy way for you to continually poll this endpoint, and ensure that you are getting 100% of the events that are occurring, without hitting the rate limit.

If you are making authenticated requests, you have 5000 requests per hour. That’s around 83 per minute. Let’s say 1 per second for ease of use.
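
To make that arithmetic concrete, here is a small Python sketch. The function name `request_budget` is made up for illustration; the only input is the 5000 requests per hour figure quoted above.

```python
def request_budget(requests_per_hour=5000):
    """Turn an hourly request allowance into per-minute and per-second rates.

    Returns (per_minute, per_second, min_interval_seconds), where
    min_interval_seconds is the shortest gap between requests that
    still stays inside the hourly allowance.
    """
    per_minute = requests_per_hour / 60
    per_second = requests_per_hour / 3600
    min_interval_seconds = 3600 / requests_per_hour
    return per_minute, per_second, min_interval_seconds


per_minute, per_second, min_interval = request_budget()
# 5000 / 60 is roughly 83.3 per minute, which is roughly 1.39 per second,
# so requests can be no closer together than 0.72 seconds on average.
```

Any other API calls your application makes come out of the same allowance, so the usable polling rate will be lower in practice.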

You can request:

https://docs.github.com/en/rest/reference/activity#list-public-events

…with 100 items, per page:

https://docs.github.com/en/rest/overview/resources-in-the-rest-api#pagination

We can’t guarantee that there won’t be more than 100 public events in that 1 second window, and as you have rightly pointed out there are usually more than this.
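
As a rough illustration of why gaps are hard to rule out, here is a Python sketch that merges pages from one poll, drops events already seen, and flags a poll that may have missed something. Everything here is an assumption for illustration: `collect_new_events` and `fetch_page` are hypothetical names, and `fetch_page(page, per_page)` stands in for whatever HTTP call you use to get one page of the feed as a list of event dicts, newest first. The sketch relies only on each event having an `id` field.

```python
def collect_new_events(fetch_page, seen_ids, per_page=100, max_pages=3):
    """Collect unseen events from one poll of the feed.

    fetch_page: hypothetical callable (page, per_page) -> list of event
                dicts, newest first.
    seen_ids:   set of event ids from earlier polls; updated in place.
    Returns (new_events, possibly_missed).
    """
    previously_seen = set(seen_ids)  # snapshot of ids from earlier polls
    new_events = []
    overlapped = False
    for page in range(1, max_pages + 1):
        events = fetch_page(page, per_page)
        if not events:
            break
        for event in events:
            event_id = event["id"]
            if event_id in previously_seen:
                # We have reached events from an earlier poll.
                overlapped = True
            elif event_id not in seen_ids:
                # Also skips duplicates that shift between pages mid-poll.
                seen_ids.add(event_id)
                new_events.append(event)
        if overlapped:
            break
    # With history but no overlap, the feed moved further than we could
    # page through, so some events in between were probably never seen.
    possibly_missed = bool(previously_seen) and not overlapped
    return new_events, possibly_missed
```

If `possibly_missed` comes back true, the only remedies are polling more often or accepting the gap, which is the limitation described above.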

This event feed can then be used to give you an impression of activity, but shouldn’t be treated as a 100% source of truth.

The primary use cases are people wanting events within their own scope:

  • Users looking to track their organisation activity
  • Users looking to track their repository activity
  • Users looking to track their friends’ activity
  • Users looking to track their overall activity

Hopefully this information is helpful to you and others!

Thanks a lot for the detailed response!
