Python Notebook cell execution failed due to timeout

Until a few days ago, my GitHub Actions for daily testing of Python Notebooks worked just fine for this GitHub repository:

https://github.com/SimFin/simfin-tutorials

Normally the entire test of all the Notebooks takes around 50 minutes to run, and it has been working fine on GitHub Actions for a few months now. But for the past 3 days a few of the Notebooks have failed. I have made a clean install on my own PC of this GitHub repo with the same Python version running on GitHub Actions, and everything works fine when I run it on my own PC. So I wonder if something changed on GitHub’s servers around Friday May 8th, 2020?

This is the error that I get for the Notebooks that fail on GitHub’s servers (only the cell-number changes):

    Notebook cell execution failed
    Cell 19: Timeout of 2000 seconds exceeded while executing cell. Failed to interrupt kernel in 5 seconds, so failing without traceback.
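
One way to narrow down a per-cell timeout like this is to run each Notebook as its own process with a wall-clock limit and record how long it takes, so that a hang can be told apart from a run that is merely slow. The following is a minimal sketch under assumptions: the helper name `run_with_timeout` is made up for illustration, and because this thread does not say which command executes the Notebooks, the example uses stand-in Python commands instead.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds).

    status is "ok", "failed" (non-zero exit code), or "timeout".
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        status = "timeout"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in commands (assumptions); replace them with whatever runs one Notebook.
    quick = [sys.executable, "-c", "print('done')"]
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    for name, cmd, limit in [("quick", quick, 30), ("slow", slow, 1)]:
        status, elapsed = run_with_timeout(cmd, limit)
        print(f"{name}: {status} after {elapsed:.1f}s")
```

Comparing the elapsed times from a local run and from a GitHub Actions run would show whether the failing Notebooks are only slower on the hosted runner or never finish at all.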

Here are the logs from GitHub Actions:

https://github.com/SimFin/simfin-tutorials/runs/662507646?check_suite_focus=true
https://github.com/SimFin/simfin-tutorials/runs/660389867?check_suite_focus=true
https://github.com/SimFin/simfin-tutorials/runs/658614648?check_suite_focus=true

These are all the logs:

https://github.com/SimFin/simfin-tutorials/actions?query=workflow%3A%22Test+All+Notebooks%22

Any idea why it doesn’t work on GitHub Actions anymore?

Thanks!

@hvass-labs 

I forked your repo and changed on: schedule to on: push. The workflow run passed.

Could you please trigger it again and show me the result?

Thanks for the quick reply!

I have tried 3 things:

  • I let the test run on the cron schedule for several days more.

  • In the log of a run that failed, I tried clicking the button “Re-run jobs” in the top-right corner to manually restart the test.

  • I tried changing the file test.yml from on: schedule to on: push, which triggered instantly when I pushed the file test.yml.

In all these cases, the test still fails in the same Notebook cells.

You can see all the logs here: https://github.com/SimFin/simfin-tutorials/actions

Since this works on my local computer, and it works when you forked it to your own GitHub account, I wonder if maybe the free monthly time-limit for GitHub Actions has been used up for this GitHub repo? But then it would be strange that so much of the test is allowed to run. So it’s a very strange problem. Do you have any idea what is going wrong here?

Thanks again for your help!

Have you tried installing a self-hosted runner on your PC? Does the Python Notebook cell execution pass there?

I have already tried running the tests on my own PC and they work fine here. But do you want me to set up the GitHub Action to somehow trigger a test to run on my local PC? That sounds like a really big job to set up and I wouldn’t know how to do it, but I also can’t see that this is where the problem lies. Clearly there is a problem with the GitHub Action running on your servers. It worked fine for several months, and then suddenly it stopped working. Do you have access to more detailed logs than the ones I can see on my GitHub page? Can you see what goes wrong? Thanks!

Sorry that you are having to deal with this.
I am trying to get some help from senior engineers. Your patience will be appreciated.

@hvass-labs

Sorry for the delayed response.
The only significant changes that we made to the images before the first workflow failure were the addition of the Haveged tool and an update of the Python build versions. However, we forked your workflow, tested it on a downgraded image, and got the same timeout error.
We also tried to run your project on a clean ubuntu-18.04 image and got the same error.
Perhaps some dependencies that the project requires were already preinstalled on your local machine?
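
One way to check for such a difference is to save the output of `pip freeze` on the local machine and in the workflow run, then compare the two lists. The sketch below assumes both environments are managed with pip and only handles plain `name==version` lines; the function names and the sample package lists are made up for illustration and are not taken from the project.

```python
def parse_freeze(text):
    """Parse `pip freeze` style output into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not pinned as name==version.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_environments(local_text, ci_text):
    """Return (only_local, only_ci, changed) for two `pip freeze` outputs."""
    local, ci = parse_freeze(local_text), parse_freeze(ci_text)
    only_local = sorted(set(local) - set(ci))
    only_ci = sorted(set(ci) - set(local))
    changed = {
        name: (local[name], ci[name])
        for name in sorted(set(local) & set(ci))
        if local[name] != ci[name]
    }
    return only_local, only_ci, changed


if __name__ == "__main__":
    # Illustrative data only, not the real environments from this thread.
    local_freeze = "numpy==1.18.4\npandas==1.0.3\nlxml==4.5.0\n"
    ci_freeze = "numpy==1.18.4\npandas==1.0.4\n"
    only_local, only_ci, changed = diff_environments(local_freeze, ci_freeze)
    print("only on local machine:", only_local)
    print("only on CI:", only_ci)
    print("different versions:", changed)
```

Packages that appear only on the local machine, or with a different version there, would be the first candidates to look at.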

Thanks very much for all the effort, I greatly appreciate that!