Code Modernization: Weighing Pros and Cons Of OpenMP

I am an OpenMP evangelist. I use it, and I love it. This semester I spent one week in my advanced architecture class showing how it contributes to the continuance of Moore’s Law. I have also spent a lot of time here at Go Parallel talking about OpenMP, and showing how to get the most out of it.

On the flip side, though, OpenMP can be a performance drag if you are not careful. That is because there is overhead that OpenMP introduces when it spins up threads to do your work. This blog talks about the potential downside of OpenMP, and what you need to do to weigh the overhead versus the performance gains that OpenMP can provide.

Quick Explanation of OpenMP

Let’s take a look at a simple OpenMP example. Suppose we call a function called doGnarlyMath() 10,000,000 times. If the function is CPU intensive, and is relatively slow, you can boost performance by parallelizing the loop. First, though, take a look at the following code. It loops 10,000,000 times and for each iteration calls the doGnarlyMath() function.
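The original listing is not reproduced here, so the following is only a minimal sketch of what such a serial loop might look like. The body of doGnarlyMath() is a made-up stand-in, and the helper name runSerial() and the results vector are assumptions added for illustration.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for doGnarlyMath(); the article does not show its body.
double doGnarlyMath(double x) {
    return std::sqrt(x) * std::sin(x) + std::cos(x);
}

// Serial baseline: call doGnarlyMath() once per iteration and keep each result.
// (runSerial and the results vector are illustrative names, not from the article.)
void runSerial(std::vector<double>& results) {
    const long n = static_cast<long>(results.size());
    for (long i = 0; i < n; ++i) {
        results[i] = doGnarlyMath(static_cast<double>(i));
    }
}
```

To mirror the article's loop count, the caller would pass a vector sized to 10,000,000 elements.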

When this code runs on my development machine, it takes 719 milliseconds to run. But we can reduce the execution time by adding OpenMP directives as the following code shows.
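Since the original listing is missing, here is a hedged sketch of the same loop with an OpenMP directive added. As before, doGnarlyMath() is a made-up stand-in and runParallel() is a hypothetical helper name. Each iteration writes to its own element, so the iterations are independent and can be split across threads safely.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for doGnarlyMath(); the article does not show its body.
double doGnarlyMath(double x) {
    return std::sqrt(x) * std::sin(x) + std::cos(x);
}

// Same loop as the serial version, with the iterations divided among threads.
// (runParallel is an illustrative name, not from the article.)
void runParallel(std::vector<double>& results) {
    const long n = static_cast<long>(results.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Each iteration touches only results[i], so there is no data race.
        results[i] = doGnarlyMath(static_cast<double>(i));
    }
}
```

OpenMP has to be enabled at compile time (for example, -fopenmp with GCC or Clang, /openmp with Visual C++). Without that switch the pragma is ignored and the loop simply runs serially.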

On my development machine, this was a big win. Execution time went from 719 milliseconds to 156 milliseconds. But I asked myself: what happens for small loops? Does OpenMP help or hurt?

Experimenting With The Loops

I experimented by reducing the loop count to 16. Without OpenMP the execution time was 0 milliseconds. This is because the loop finishes in less time than the granularity of the GetTickCounter() function can resolve. Adding OpenMP still showed an execution time of 0 milliseconds.
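A rough sketch of this experiment is shown below. It is not the original code: std::chrono::steady_clock is swapped in for the article's timer so the example is portable, the result is truncated to whole milliseconds to imitate a coarse timer, and doGnarlyMath() and timeSixteenIterationsMs() are hypothetical names.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical stand-in for doGnarlyMath(); the article does not show its body.
double doGnarlyMath(double x) {
    return std::sqrt(x) * std::sin(x) + std::cos(x);
}

// Times one pass over a 16-iteration loop and returns whole milliseconds.
// `results` must point to at least 16 doubles.
long long timeSixteenIterationsMs(double* results) {
    const int loopEnd = 16;
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (int i = 0; i < loopEnd; ++i) {
        results[i] = doGnarlyMath(static_cast<double>(i));
    }
    const auto stop = std::chrono::steady_clock::now();
    // Truncating to milliseconds hides anything shorter than 1 ms.
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A loop this short usually finishes well inside one timer tick, which is why a millisecond-level measurement cannot tell the two versions apart.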

In order to test OpenMP for short loops, I added a wrapper loop. Without OpenMP this code executed in 12,875 milliseconds. You might predict that adding OpenMP to the inner loop would enhance the performance. But that thought does not take into account the OpenMP overhead.
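The wrapper-loop listing is missing, so this is a sketch of the idea only. The outer repetition count is not stated in the article and is left as a parameter; doGnarlyMath() is a made-up stand-in, timeShortLoopSerial() is a hypothetical name, and std::chrono replaces the article's timer.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for doGnarlyMath(); the article does not show its body.
double doGnarlyMath(double x) {
    return std::sqrt(x) * std::sin(x) + std::cos(x);
}

// Repeats a short inner loop `outerCount` times so the total run time becomes
// large enough to measure. The inner loop length is results.size() (16 in the article).
long long timeShortLoopSerial(long outerCount, std::vector<double>& results) {
    const int innerCount = static_cast<int>(results.size());
    const auto start = std::chrono::steady_clock::now();
    for (long outer = 0; outer < outerCount; ++outer) {
        for (int i = 0; i < innerCount; ++i) {
            results[i] = doGnarlyMath(static_cast<double>(outer + i));
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```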

The following code shows the previous loop with the addition of an OpenMP pragma. It executed in 17,000 milliseconds. By adding OpenMP, the performance was reduced and execution took more time.
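In place of the missing listing, here is a sketch of the wrapper loop with the pragma on the inner loop. The names doGnarlyMath() (stand-in body) and timeShortLoopOpenMP() are assumptions, as is the use of std::chrono for timing. The point to notice is that the parallel region is entered once per outer iteration, so the cost of handing work to the threads and waiting for them is paid every time around.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for doGnarlyMath(); the article does not show its body.
double doGnarlyMath(double x) {
    return std::sqrt(x) * std::sin(x) + std::cos(x);
}

// Same wrapper loop as the serial version, but the short inner loop is
// parallelized. The OpenMP overhead is incurred on every outer iteration.
long long timeShortLoopOpenMP(long outerCount, std::vector<double>& results) {
    const int innerCount = static_cast<int>(results.size());
    const auto start = std::chrono::steady_clock::now();
    for (long outer = 0; outer < outerCount; ++outer) {
        #pragma omp parallel for
        for (int i = 0; i < innerCount; ++i) {
            results[i] = doGnarlyMath(static_cast<double>(outer + i));
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```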

Given the performance degradation, I wanted to find the point at which OpenMP starts to pay off, so I varied the number of inner-loop iterations to locate the break-even point. When the inner loop count went from 4 to 5, the code without OpenMP was still faster. But when it went from 5 to 6, the OpenMP code became faster.
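One way to automate this kind of search is sketched below. It is an illustration under assumptions, not the procedure used for the numbers above: doGnarlyMath() is a made-up stand-in, timeLoops() and findBreakEven() are hypothetical helpers, and timing uses std::chrono in microseconds. Single measurements are noisy, so the answer can shift between runs and between machines.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for doGnarlyMath(); the article does not show its body.
double doGnarlyMath(double x) {
    return std::sqrt(x) * std::sin(x) + std::cos(x);
}

// Times `outerCount` repetitions of the inner loop, in microseconds.
// The inner loop length is results.size(); useOpenMP selects the variant.
long long timeLoops(long outerCount, std::vector<double>& results, bool useOpenMP) {
    const int innerCount = static_cast<int>(results.size());
    const auto start = std::chrono::steady_clock::now();
    for (long outer = 0; outer < outerCount; ++outer) {
        if (useOpenMP) {
            #pragma omp parallel for
            for (int i = 0; i < innerCount; ++i) {
                results[i] = doGnarlyMath(static_cast<double>(outer + i));
            }
        } else {
            for (int i = 0; i < innerCount; ++i) {
                results[i] = doGnarlyMath(static_cast<double>(outer + i));
            }
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Returns the smallest inner loop count (1..maxInner) at which the OpenMP
// variant measured faster than the serial one, or -1 if it never did.
int findBreakEven(long outerCount, int maxInner) {
    for (int inner = 1; inner <= maxInner; ++inner) {
        std::vector<double> results(inner);
        const long long serialUs = timeLoops(outerCount, results, false);
        const long long parallelUs = timeLoops(outerCount, results, true);
        if (parallelUs < serialUs) {
            return inner;
        }
    }
    return -1;
}
```

A larger outerCount gives steadier timings at the cost of a longer run. If the program is built without OpenMP enabled, both variants run serially and the result is meaningless.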

The explanation for this is probably related to my development computer’s processors. I have eight processors, so when the inner loop came close to that (in this case six), the performance gain was positive.

Conclusion

OpenMP is almost always a performance gain. Be careful, though: loops with only a few iterations may not deliver the boost you expect, because the overhead of spinning up threads can outweigh the work being divided. Experimentation is the best way to determine what works for your code.

Posted on March 8, 2017 by Rick Leinecker, Slashdot Media Contributing Editor