Yes, webhook delivery honors round-robin DNS. The advantage of round-robin DNS is that the DNS system itself handles the rotation. The application layer isn’t aware of the work going on under the hood.
I guess my question wasn’t precise enough. Some background:
Round-robin DNS with multiple A records for a domain name was historically meant for load balancing. The DNS server hands out one of the IPs at random. These days, some applications also use it for availability: if one IP does not respond, they try the others. Chrome, for example, does this. If one server is down, it tries the next (similar to what multiple MX records have been doing for mail all along). This isn’t handled by the network layer though, and it isn’t “transparent”. It’s typically an application layer feature provided by the HTTP library (so that it can e.g. react to a 50x error and just try the next IP).
If you do
you should see 5 IPs. Chrome will try them all in turn if the first fails to respond. (I think most browsers do.)
on the other hand will only try the first IP, and will try the same IP with every further invocation.
So my question was really: Does the code underlying GitHub’s webhook delivery backend try alternate IPs if one is not responding (like Chrome), or does it just try one (like ping)?
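To make the distinction concrete, here is a rough Python sketch of the "try every IP" behaviour I mean. It is only an illustration of the client-side logic, not a claim about how GitHub's delivery backend works. The function names `resolve_all` and `connect_with_failover` are made up for this post, and the `connect` parameter exists only so the loop can be exercised without a real network.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP the resolver reports for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ips = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in ips:
            ips.append(ip)
    return ips


def connect_with_failover(ips, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each IP in turn; return (ip, connection) for the first success.

    A "ping-like" client would stop after ips[0]; this one keeps going
    until one address accepts the connection or the list is exhausted.
    """
    last_error = None
    for ip in ips:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:  # refused, timed out, unreachable, ...
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

Usage would be something like `connect_with_failover(resolve_all("example.org", 443), 443)`. For what it's worth, Python's own `socket.create_connection` already does this when given a hostname instead of an IP, so whether a given client fails over really does come down to the library it is built on.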
I doubt we have a load problem. On the scale of things, we are very very very tiny. It’s an availability problem. The bot uses aiohttp and has multiple threads (IIRC), so it should have no issue with a lot of concurrent requests. But it may block due to programmer error, or simply because it is rebooting. If the webhook delivery side does try multiple IPs, I can simply sidestep the whole “pull missed events after restart” issue by having n>1 instances and be done with it.
It’s good to hear that this isn’t common, though. It means I need to look at our app to see why.