I had some CI configured so that a certain process runs for x minutes and its results are then inspected. I did this by starting the process forked (in the background) in one run step, sleeping in another run step (with a duration that was dynamic depending on the base ref), and then killing it in a run step after that. All worked fine until about yesterday.
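In workflow terms, the setup looked roughly like this (a sketch: the workflow name, step names, and the 300-second sleep are illustrative; in the real workflow the sleep duration was computed from the base ref):

```yaml
name: fork_sleep_kill
on:
  - push
jobs:
  example:
    runs-on: ubuntu-latest
    steps:
      - name: start forked process
        run: ping google.ca > /tmp/ping.out &   # backgrounded, so the step returns immediately
      - name: wait
        run: sleep 300   # duration was dynamic depending on the base ref
      - name: stop and inspect
        run: |
          pkill ping
          cat /tmp/ping.out
```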
Here's the simplest workflow example:
```yaml
name: broken_ci
on:
  - push
jobs:
  example:
    runs-on: ubuntu-latest
    container:
      image: ubuntu
    steps:
      - uses: actions/checkout@v1
      - name: forked job
        run: |
          apt update && apt install -y iputils-ping
          ping google.ca > /tmp/ping.out &
          echo "this will echo"
      - name: this is never reached
        run: echo "i dont get echoed ever"
```
This issue is specific to running in containers. Without a container it works as expected, which is also how it used to work even in containers.
One workaround I currently have is adding
sleep 60 && pkill ping &
to the same run block as the forked process (basically at the end of the "forked job" step).
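With that in place, the "forked job" step from the example above looks roughly like this (a sketch; only the `sleep 60 && pkill ping &` line is new):

```yaml
- name: forked job
  run: |
    apt update && apt install -y iputils-ping
    ping google.ca > /tmp/ping.out &
    sleep 60 && pkill ping &   # self-terminating timer: kills ping after 60s so the background process does not outlive the step
    echo "this will echo"
```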
Thank you for reporting this issue. I could reproduce it on my side. I have directed this scenario to the appropriate engineering team for further evaluation. Let's wait for their response.
Could this be related to the Docker environment not having a real init process, and zombie processes not getting reaped? I had a similar issue with a test script written in Python that (simplified) does the following:

1. start an Apache web server as a child process,
2. run some tests against it,
3. tell the server to shut down,
4. wait for the server process to exit.
Step four never completed, because the server double forks during startup by default, making the main service a zombie process on exit; without an init process to reap it, it remained in the process table indefinitely. With the Apache server I could work around that issue by telling it not to fork (the "-DFOREGROUND" option), so my Python script can do a proper wait() and avoid the zombie process issue. The similarity is that your ping will become an orphan process when the shell running the "run:" block exits.
Maybe you can do something similar as a workaround, e.g. a Python script like this?
```python
import subprocess
import time

# start the long-running command as a direct child of this script
process = subprocess.Popen(['your', 'command', 'here'])

# let it run for the desired duration
time.sleep(your_timeout)

# stop it, then reap the child so it does not linger as a zombie
process.terminate()
process.wait()
```
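The final process.wait() is the important part: terminate() only sends the signal, while wait() reaps the exited child from the process table, so no zombie is left behind even without a real init process.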
The issue is that the behaviour is inconsistent. It worked fine for weeks and months, and only a few days ago it started acting like this, effectively breaking a bunch of my CI unless I hack around it. It is not a Docker environment issue: my local Docker environment behaves the way things used to work, and so does the non-container GitHub Actions environment.
Sorry to tell you that I haven't heard back from the GitHub Actions team yet. Please try adding
sleep 60 && pkill ping &
as a temporary workaround. I will try to contact the GitHub Actions team again. Thank you for your understanding.