Copilot Lvl 3
Message 1 of 7

GitHub Actions on macOS randomly kill my test program

I documented all the details at https://github.com/fortran-lang/fpm/issues/16. I run `cargo test` and sometimes all tests pass. Other times, macOS sends SIGKILL (signal 9) to my test program.

 

Is that a bug at GitHub, or in my code?

6 Replies
GitHub Partner
Message 2 of 7

Re: GitHub Actions on macOS randomly kill my test program

Hi @certik ,

 

Glad to see you in the GitHub Community Forum!

I forked your repo, but 'cargo test' always passed in my workflow. Could you please confirm whether the issue is gone?

If the issue persists, please export the detailed debug log from the workflow; it will help with further investigation.

 

Thanks.

Copilot Lvl 3
Message 3 of 7

Re: GitHub Actions on macOS randomly kill my test program

Thank you @weide-zhou, I appreciate you trying it.

 

Unfortunately the bug is still there, but I don't know how to best debug it. Here are our Actions:

 

https://github.com/fortran-lang/fpm/actions

 

There are 3 failures just on the first page:

 

https://github.com/fortran-lang/fpm/runs/391281163

https://github.com/fortran-lang/fpm/runs/389637164

https://github.com/fortran-lang/fpm/runs/392603247

 

I instrumented the code using ExitStatus::signal() to retrieve the signal that macOS used to kill the program, and it reported number 9, which is SIGKILL. This comes from the following line in the log:

##[error]thread 'test_2' panicked at 'INTERRUPTED with signal: 9', tests/cli.rs:35:21
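For readers following along, here is a minimal sketch of how a signal can be read from a child's exit status with ExitStatus::signal(). It is not the code from tests/cli.rs; the `describe` helper and the self-killing `sh` child are assumptions made for illustration, and the extension trait is Unix-only.

```rust
use std::os::unix::process::ExitStatusExt; // Unix-only: provides ExitStatus::signal()
use std::process::{Command, ExitStatus};

/// Hypothetical helper: describe how a child process ended,
/// either with a normal exit code or with a terminating signal.
fn describe(status: ExitStatus) -> String {
    match (status.code(), status.signal()) {
        // code() is Some only when the process exited on its own.
        (Some(code), _) => format!("exited with code {}", code),
        // signal() is Some when the process was terminated by a signal.
        (None, Some(sig)) => format!("INTERRUPTED with signal: {}", sig),
        (None, None) => String::from("ended in an unknown way"),
    }
}

fn main() {
    // Stand-in for the real test binary (an assumption): a shell that
    // sends SIGKILL to itself, so the parent sees signal 9.
    let status = Command::new("sh")
        .arg("-c")
        .arg("kill -9 $$")
        .status()
        .expect("failed to spawn sh");
    println!("{}", describe(status));
}
```

Note that the status only says which signal arrived, not who sent it, so this alone cannot tell whether the kill came from the OS, the runner, or something else.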

 

How do I export the detailed debug log?

 

I would love to help debug it, but I have run out of ideas. My best idea is to see whether I can reproduce this reliably, which could mean instructions like "run this 20 times, and it will fail 2 times out of 20".
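The "run this 20 times" idea could be sketched as a small harness that runs a command repeatedly and counts how many runs were killed by a signal. This is only an illustration: `count_signal_kills` is a hypothetical helper, the `sh -c "exit 0"` command is a placeholder for the real test binary, and a local loop may well not trigger a failure that so far has only appeared on the CI runners.

```rust
use std::os::unix::process::ExitStatusExt; // Unix-only: provides ExitStatus::signal()
use std::process::Command;

/// Hypothetical helper: run `program` with `args` `runs` times and
/// return how many of those runs were terminated by a signal.
fn count_signal_kills(program: &str, args: &[&str], runs: u32) -> u32 {
    let mut killed = 0;
    for i in 1..=runs {
        let status = Command::new(program)
            .args(args)
            .status()
            .expect("failed to spawn command");
        // signal() is Some only when the child was terminated by a signal.
        if let Some(sig) = status.signal() {
            eprintln!("run {}: killed by signal {}", i, sig);
            killed += 1;
        }
    }
    killed
}

fn main() {
    let runs = 20;
    // Placeholder command (an assumption); substitute the binary under test.
    let killed = count_signal_kills("sh", &["-c", "exit 0"], runs);
    println!("{} of {} runs were killed by a signal", killed, runs);
}
```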

Copilot Lvl 3
Message 4 of 7

Re: GitHub Actions on macOS randomly kill my test program

A little update: In the PR:

 

https://github.com/certik/fpm/pull/1

 

I wrote a simple script (https://github.com/certik/fpm/pull/1/files#diff-6ef86427d4f1b23499bc6659d557b5b5) that makes a change to the README and pushes it to GitHub to trigger a CI run. It pushes 10 commits, and each one is tested both as a PR and as a push, so effectively there are 20 CI runs. Of those 20 CI runs, exactly 2 failed, both due to the above bug. So the bug seems to happen in roughly 10% of cases, and I think I can now reproduce it fairly reliably.

GitHub Partner
Message 5 of 7

Re: GitHub Actions on macOS randomly kill my test program

Hi @certik ,

 

Today I triggered the workflow 20+ times, but it always passed on my repo.

Please try the settings below to collect debug info:

1. Change the `RUST_BACKTRACE` env value to 'full' in the workflow file.

2. Enable debug logging; please check the doc here.

 

Thanks.

Copilot Lvl 3
Message 6 of 7

Re: GitHub Actions on macOS randomly kill my test program

@weide-zhou thank you! I spent the whole day yesterday trying to make it more reproducible:

 

https://github.com/certik/fpm/pull/2

 

So far I have had to make 416 commits.

 

1. I enabled the diagnostics, here is an example of a failure with diagnostics on: https://github.com/certik/fpm/runs/395497656

 

2. The Rust executable that fails is now super simple:

fn main() {
    println!("Command: --help");
}

3. I spent the whole day yesterday trying to simplify the rest of the repository. For example, I tried removing the installation of Python or cmake, neither of which is used anymore, but when I do that, I can no longer trigger the failure.

 

Do you see anything suspicious from the logs from the failure?

GitHub Partner
Message 7 of 7

Re: GitHub Actions on macOS randomly kill my test program

Hi @certik ,

 

Thanks for checking! Sorry, I didn't find anything suspicious in the log.

I can't reproduce the issue on my side, but it looks like the combination of Rust with Python or cmake could somehow be unstable.

Splitting the workflow into two jobs, separating the Rust commands from the 'python/cmake' commands, could be an option.

 

Thanks.