-
I have builds set up for Windows (windows-latest) and macOS - it’s a matrix, so 2 builds run for each. Sometimes one of the Windows ones fails and cuts off messages like this:
…but the other Windows build works. Then I re-run everything using the “re-run” button and they all pass (or they swap and the other Windows one fails this time). Seems like something is killing the process randomly but don’t have access to any additional info - just the cut-off logs. This is happening frequently enough that it’s annoying because the macOS ones randomly get “cancelled by user” when nobody has touched it as well :slight_smile: Windows builds take ~20-25 minutes, so manually re-running all the time is not a good option. Any idea how to troubleshoot this? |
Beta Was this translation helpful? Give feedback.
Replies: 16 comments
-
Could you please enable runner diagnostic logging and enable step debug logging in your workflow? Let’s check whether there are more information in detail. And if you don’t mind, please share a trouble workflow run with debug logs here? Please note that, the debug logging will be enabled for all the workflows in the current repository, after you getting a debug log for a failure workflow run, you could disable it. |
Beta Was this translation helpful? Give feedback.
-
Thank you for the reply. I don’t have admin access to set that up. I’ll have to try to get the organization admin to do it. In the meantime, here’s one that failed like this (without the extra debug information). And here is the workflow YAML. When I re-run these, they usually pass the first time, but sometimes it takes a couple of tries. I notice that it’s (often? always?) when it’s doing cmake, so I wonder if it’s a bug in the Windows cmake that’s installed. I’ll update when I can get that extra information. |
Beta Was this translation helpful? Give feedback.
-
Since I can’t seem to attach anything here in the forum I uploaded the logs over here. I took a look through them and can’t see anything obvious. The one that failed this time is Windows Latest MSVC SCALAR_DOUBLE=OFF in step 7_Configure CMake.txt (sometimes it’s Windows Latest MSVC SCALAR_DOUBLE=ON that fails though).
I don’t see anything in the additional Runner and Worker logs that looks strange, but I don’t know what I’m looking for exactly. |
Beta Was this translation helpful? Give feedback.
-
I have forked your repo to my org. I met the same error as yours. I am trying to involve github engineering team to further investigate this issue. There might be some time delay. Appreciate your patience. |
Beta Was this translation helpful? Give feedback.
-
Great - thank you for your time. This was a very large time sink yesterday as I had to re-run dozens of times. If you (or they) have any suggestions on how to reduce the build time on Windows (~25 minutes) I’d love to hear that as well :slight_smile: |
Beta Was this translation helpful? Give feedback.
-
I have another run that keeps failing - I’ve tried it at least a dozen times today. I know it should pass because it passed the PR request stage, was merged, and now the merged CI keeps failing :slight_smile: I turned on the logging and found this in one of the “worker” logs - maybe it will help?
|
Beta Was this translation helpful? Give feedback.
-
Thank you for looking deep into this issue . I am still trying to contact to senior engineer , your patience would be appreciated. |
Beta Was this translation helpful? Give feedback.
-
Thank you. We have had to re-run these things dozens of times (and you can’t just re-run the ones that fail - it’s all-or-nothing), so not only has this been very frustrating, it has been a major time sink and a waste of build resources at GitHub. Combine this with the macOS runners randomly deciding someone cancelled them, and it shakes my confidence in Actions as a CI system. It kind of goes without saying that we need to be able to trust this and rely on it to be rock-solid. |
Beta Was this translation helpful? Give feedback.
-
Hi @asmaloney ! Thank you for your patience. Could you please check the investigation results below and see if it helps to solve the issue? It turned out that by default CMake is not displaying its internal errors, and in order to find a reason of the fail we changed the workflow to show the CMake logs and saw there’s a compilation error from the C++ build itself
The diff to show logs: dsame/CloudCompare@e8248d4 (idea is to surround cmake… with try…catch and output all the logs) |
Beta Was this translation helpful? Give feedback.
-
Thanks for looking into it Alena. …but the build you’re showing didn’t fail. What you’re showing here (and in the log) looks like the standard way CMake tests for compiler attributes. In that case it’s supposed to fail. (The code it points at is cmake’s code for testing compilers for specific atttributes - not our code.) The build works on a real Windows machine (the rest of the team are using them). Right now with master it usually fails several times before it builds properly. Here’s a more recent example: https://github.com/CloudCompare/CloudCompare/runs/618602537 If I keep re-running that one it will pass eventually. So maybe try re-running it until it fails? Did you see my comment above about the worker logs and pipes? That seems more likely to be an issue. |
Beta Was this translation helpful? Give feedback.
-
Your comment makes sense. We were actually able to reproduce the issue from your forked repo, and the only errors we were able to get are the ones in the build I provided in the answer above (that build I attached did not fail because of applied logic in try…catch):
Could you please add the suggested patch to your code, and provide the logs you will get? it will help us to understand what is happening in your build. |
Beta Was this translation helpful? Give feedback.
-
> did not fail because of applied logic in try…catch I’m sorry, but that’s not the case. As I mentioned above, the things you are showing as errors here will not cause the build to fail. They are normal - it’s the way that cmake determines what’s available on the machine. So I don’t think you have reproduced the error with the try…catch yet. It fails in the cmake config process, so it never starts the build step. What about the other error I mentioned with the runner? Again - it seems a likely candidate. I will see what else I can do at my end, but please try to reproduce it again. The symptom is (I think) that cmake is trying to produce a warning but instead something (runner?) produces an error which kills the build. It doesn’t happen all the time, but it happens very frequently. I’ve had to run them 16 or 17 times in some cases to get it to pass. |
Beta Was this translation helpful? Give feedback.
-
The reason builds fails is either CMake or another tool invoked by CMake writes to stderr. PowerShell triggers terminating exception then. The reason build fails randomly is a race condition we did not identified exactly becaues the build system is too complex and not obvious. In order to identify the exact the problem of CMake fail i suggest you to apply the code of this PR: CloudCompare/CloudCompare#1232 which intended to intercept the stderr stream and to do not let it miss the logs and PowerShell output. With this changes the CMake Configure will not break on the warnings but print them to the build log, so you can analyze them and find out the unexpected behaviour. The sample build: https://github.com/dsame/CloudCompare/runs/694200554?check_suite_focus=true The messages cause PowerShell to fail becaues of CMake writes to stder are marked with “ERR” prefix: Please, review the suggested PR and confirm it can help effectively to solve the problem |
Beta Was this translation helpful? Give feedback.
-
There’s a slightly modified build.yml which fails in case if there is any sdterr otput from the build. It seems to be too strict because but you might find it usefull Please try to apply it your build in order to see the stderr output that was lost in your build and find out the differences in success and failed builds. Alternatively, as the recent Windows images have bash installed i’d advise to switch to it from PowerShell: With the bash you should not have the problem with CMake output to stderr |
Beta Was this translation helpful? Give feedback.
-
I like the idea of switching to bash since I’m completely unfamiliar with PowerShell anyways. It seems like the easiest option. I’ll do that and report back. Thanks for your help Sergey! |
Beta Was this translation helpful? Give feedback.
-
After switching from PowerShell to bash as suggested by @dsame, we have not had any of the random failures on Windows we were seeing before. Solution: Don’t use PowerShell. Thanks @dsame, @alenasviridenko and @yanjingzhu for looking into it. |
Beta Was this translation helpful? Give feedback.
After switching from PowerShell to bash as suggested by @dsame, we have not had any of the random failures on Windows we were seeing before.
Solution: Don’t use PowerShell.
Thanks @dsame, @alenasviridenko and @yanjingzhu for looking into it.