Standard_DS2_v2 vcpu parallel/multithread performance

Hi,

I ran the following trivial CPU performance test on the Ubuntu workers:

int k = 0;
for (int i = 1; i < 2000000000; i++)
  k = k + (1 % i); // 1 % i is 0 when i == 1 and 1 otherwise

Running this test twice sequentially takes about 5 seconds per run, for a wall time of 10 seconds.
Running this test twice in parallel takes about 13 seconds per run, for a wall time of 13 seconds (each run is a separate Linux process, using mvn -T 2.0C package).
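Turning the timings above into a speedup (sequential wall time divided by parallel wall time) makes the gap explicit; the ideal value assumes two fully independent cores:

```latex
S = \frac{T_{\text{seq}}}{T_{\text{par}}} = \frac{2 \times 5\,\text{s}}{13\,\text{s}} \approx 0.77,
\qquad
S_{\text{ideal}} = \frac{2 \times 5\,\text{s}}{5\,\text{s}} = 2
```

In other words, each run takes about 13/5 = 2.6 times as long when the two run side by side, so the parallel build ends up slower overall instead of faster.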

I expected the parallel run to be almost exactly twice as fast as the sequential one (that's what happens on my computer).
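To reproduce both modes without Maven, here is a minimal sketch that runs the same loop twice sequentially and then twice in parallel on two threads inside one JVM, and prints the speedup. It is an assumption-laden approximation: the class and method names are hypothetical, it uses threads rather than the separate processes the Maven build starts, and it is a rough wall-clock measurement, not a JMH benchmark (the sequential pass also serves as JIT warm-up).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SpinBenchmark {

    // Results are written here so they are used, making it less likely the JIT discards the loop.
    static volatile int sink;

    // The loop from the question: pure integer arithmetic, almost no memory traffic.
    static int spin(int limit) {
        int k = 0;
        for (int i = 1; i < limit; i++) {
            k = k + (1 % i);
        }
        return k;
    }

    // Runs `runs` copies of spin(limit) one after another; returns wall time in ms.
    static long sequentialMillis(int runs, int limit) {
        long start = System.nanoTime();
        int total = 0;
        for (int r = 0; r < runs; r++) {
            total += spin(limit);
        }
        sink = total;
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs `runs` copies of spin(limit) at the same time, one thread each; returns wall time in ms.
    static long parallelMillis(int runs, int limit) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(runs);
        try {
            long start = System.nanoTime();
            List<Future<Integer>> futures = new ArrayList<>();
            for (int r = 0; r < runs; r++) {
                futures.add(pool.submit(() -> spin(limit)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // waits for each thread to finish
            }
            sink = total;
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int limit = 2_000_000_000; // same bound as the original loop
        int runs = 2;
        System.out.println("availableProcessors: " + Runtime.getRuntime().availableProcessors());
        long seq = sequentialMillis(runs, limit); // also warms up the JIT
        long par = parallelMillis(runs, limit);
        System.out.printf("sequential: %d ms, parallel: %d ms, speedup: %.2f%n",
                seq, par, (double) seq / Math.max(par, 1));
    }
}
```

On two independent cores the printed speedup should be close to 2; a value near or below 1 would match what the Ubuntu worker shows with separate processes.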

What is the expected multithreaded performance of the Standard_DS2_v2 vCPUs? Did anyone get a noticeable speedup from parallelization? Is there anything that can be done?

Cheers,

Jon

PS: It works as expected on the macOS workers.

How do you run this test twice sequentially? Do you add two test steps to your workflow?

And how do you run it twice in parallel? Do you use two jobs? A workflow run is made up of one or more jobs, and jobs run in parallel by default.

To help us understand your scenario, could you share the link here if you are using a public repo? Then we can look at your workflow YAML file and workflow run logs.

Hi,

Thanks for taking the time to reply. Here's the link to my tests:

https://github.com/jonenst/temptestactionmvn/pull/1/

You can see the two different check runs:

https://github.com/jonenst/temptestactionmvn/pull/1/checks?check_run_id=346119561 << parallel

https://github.com/jonenst/temptestactionmvn/pull/1/checks?check_run_id=346117935  << sequential

> How do you run this test two times sequentially, add two test steps in your workflow?
> And how do you run this test two times in parallel ? Use two jobs?

It's always a single step: "Build with maven". Only the Maven command changes between the two runs: with or without -T 2.0C. Without -T 2.0C, Maven runs the tests sequentially; with -T 2.0C, it runs them in parallel. In both cases, each test runs in its own separate Linux process.

Thanks in advance,

Jon

PS: If you want, I can clean up the example.
PS2: What I'd like to know is what guarantees we have regarding the vCPUs. What code can they execute in parallel? Do they share any hardware components (the FPU …?)
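One way to check on a Linux worker whether the two vCPUs are exposed as hyper-threads of a single physical core is to compare the `siblings` and `cpu cores` fields of /proc/cpuinfo: on x86, `siblings` greater than `cpu cores` indicates SMT (`lscpu` reports the same thing as "Thread(s) per core"). This is a minimal sketch assuming an x86 Linux guest; the class name `CpuTopology` is hypothetical, and inside a VM it only shows what the hypervisor chooses to report, which is not necessarily the host's real topology.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CpuTopology {

    // Returns {logical CPUs, siblings, cpu cores}; -1 means the field was not present.
    // Assumes every package reports the same values, so the last one seen is kept.
    static int[] parse(String cpuinfo) {
        int processors = 0;
        int siblings = -1;
        int cores = -1;
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator lines between CPU entries
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "processor": processors++; break;                     // one entry per logical CPU
                case "siblings": siblings = Integer.parseInt(value); break; // logical CPUs per package
                case "cpu cores": cores = Integer.parseInt(value); break;   // physical cores per package
                default: break;
            }
        }
        return new int[] {processors, siblings, cores};
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("/proc/cpuinfo")), StandardCharsets.UTF_8);
        int[] s = parse(text);
        System.out.printf("logical CPUs: %d, siblings: %d, cpu cores: %d%n", s[0], s[1], s[2]);
        if (s[1] > 0 && s[2] > 0) {
            System.out.println(s[1] > s[2]
                    ? "siblings > cpu cores: the vCPUs are reported as hyper-threads sharing cores"
                    : "siblings == cpu cores: the vCPUs are reported as separate cores");
        } else {
            System.out.println("siblings / cpu cores not reported by this guest");
        }
    }
}
```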

Note: the code in this question and the repo with my tests are just a reduced test case. In my real use case, I have Java unit tests that get a 2x speedup when I run them in parallel on my computer or on the macOS GitHub worker, but not when I run them in parallel on the Ubuntu GitHub worker.

Thank you for your sample project and other information. It helped me a lot to investigate this issue.

Standard_DS2_v2 has a 2-core CPU. When you use mvn -T 2.0C, Maven uses 4 threads (2 threads per CPU core).
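To show where the 4 threads come from, here is a small sketch of the arithmetic, assuming Maven treats a -T value ending in C as "threads per core", multiplies it by the processor count, truncates to an integer and never goes below 1. `threadsFor` is a hypothetical helper, not Maven's actual code; check the Maven source for the exact rule in your version.

```java
public class MavenThreads {

    // Hypothetical approximation of how a -T value is turned into a thread count.
    // Assumption: "NC" means N * cores (truncated, at least 1); a plain number is used as-is.
    static int threadsFor(String t, int cores) {
        if (t.endsWith("C")) {
            float perCore = Float.parseFloat(t.substring(0, t.length() - 1));
            return Math.max((int) (perCore * cores), 1);
        }
        return Integer.parseInt(t);
    }

    public static void main(String[] args) {
        int cores = 2; // the core count given in this thread for Standard_DS2_v2
        System.out.println("-T 2.0C -> " + threadsFor("2.0C", cores)); // 4
        System.out.println("-T 1C   -> " + threadsFor("1C", cores));   // 2
    }
}
```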

Since you only have two tests, I tested on my side using mvn -T 1C, which creates two parallel threads to run them.

The next image shows the result. Because the two tests run in parallel, the total build time is also about 24s.

When I remove -T 1C, the tests run in sequence, and their execution times add up to approximately the total build time.

When you only have a few small tests, running them in parallel has no advantage: running tests in parallel adds overhead beyond the tests themselves, such as creating multiple threads.

If your tests take a long time to execute, you could choose to run them in parallel.
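The thread-creation overhead mentioned above can be put in numbers. This sketch (hypothetical class `ThreadStartCost`) times starting and joining empty threads and compares the average with the ~5 s that one run of the loop takes; it is a rough wall-clock measurement, not a rigorous benchmark. The builds in this thread actually start separate processes, which are more expensive to start than threads, but that cost is likewise paid once per run.

```java
public class ThreadStartCost {

    // Starts and joins `n` empty threads one at a time; returns the average cost per thread in microseconds.
    static double averageStartJoinMicros(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> { });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag and give up
                return Double.NaN;
            }
        }
        return (System.nanoTime() - start) / 1_000.0 / n;
    }

    public static void main(String[] args) {
        double micros = averageStartJoinMicros(1_000);
        double runMicros = 5_000_000.0; // ~5 s per run, from the timings in the question
        System.out.printf("start+join: about %.1f us per thread%n", micros);
        System.out.printf("relative to one 5 s run: %.6f%%%n", 100.0 * micros / runMicros);
    }
}
```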

Hi,

I disagree with your analysis. The exact same test case shows a 2x performance improvement on the macOS workers (and on my computer). Using 4 threads or 2 threads doesn't make any difference. A ~10-second test is more than long enough to make thread creation overhead negligible.

It would be nice to know exactly what can be executed in parallel on these 2-vCPU machines.

What system does your computer run?

According to the official documentation (https://help.github.com/en/actions/automating-your-workflow-with-github-actions/virtual-environments-for-github-hosted-runners#cloud-hosts-for-github-hosted-runners), GitHub uses MacStadium to host the macOS runners, not Standard_DS2_v2 virtual machines.

In the "Build with Maven" step, you can see how many threads Maven used for the build. In my case:

I use a hosted Ubuntu runner and run mvn -B -T 1C package --file pom.xml.

On a 2-core CPU machine, it creates two threads, and these two threads run in parallel.

MacStadium and its macOS runners show the expected behavior; Azure and its Standard_DS2_v2 VMs show the bad behavior. This proves that the build correctly starts multiple "threads" (actually processes) that do benefit from multicore systems. But on Standard_DS2_v2 systems, the two parallel processes each run about twice as slowly, as if they were sharing the same core. That's the problem I'm trying to understand.

Is there any news on this? This is still a problem as of 2020-05-05.
