Using Numba to Accelerate Python Execution


Python is an excellent language, but because it is interpreted rather than compiled it is slower than compiled languages. There are various ways to speed it up, including Cython, PyPy (an alternative interpreter with a JIT compiler) and Numba. Numba speeds up your Python applications by just-in-time compiling Python code with the LLVM compiler, producing optimized machine code that can be targeted to run on either the CPU or GPU.

Intel Python includes Numba among the packages installed alongside Python, so I installed it for this article. The idea is to compare standard Python 3 with Intel Python 3 accelerated by Numba. Once it is installed on Linux, read the release notes. If you installed it in the default location (/opt/intel/intelpython3) then you should use a virtual environment; the installer, which uses Conda, provides one by default. So after installation you can switch to a virtual environment.

Verify with

That should show something like:
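The original verification command and its output aren't reproduced here; a minimal check (a sketch of my own) is to print the interpreter's version string, which for Intel Python identifies the distribution:

```python
import sys

# The version banner identifies the distribution; Intel Python's
# version string includes an Intel Corporation tag alongside the
# Python version number.
print(sys.version)
```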

Using Numba

Numba is cross-platform, with both 32-bit and 64-bit support except on Mac OS X, which is 64-bit only. The recommended way forward is to let Numba decide what to optimize by adding the decorator @jit to a function.
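As a minimal sketch of the decorator in use (the function and its name are my own, and the try/except fallback is only there so the snippet also runs where Numba isn't installed):

```python
try:
    from numba import jit
except ImportError:            # fall back to plain Python if Numba is absent
    jit = lambda func: func

@jit                           # Numba compiles this on first call
def count_odds(n):
    # Count the odd numbers below n with a simple numeric loop,
    # the kind of code Numba compiles well.
    total = 0
    for i in range(n):
        if i % 2 == 1:
            total += 1
    return total

print(count_odds(10))          # prints 5
```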

I've created the following computationally intense program: it generates a list of the odd numbers up to n, then checks each number in the list to see whether it is prime.

As 2 is prime, it gets inserted at the start of oddnumbers. The prime number check uses an interesting property: every prime greater than 3 sits next to a multiple of 6 (that is, it has the form 6k ± 1).
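The original listing isn't reproduced here; a sketch of such a program under those assumptions (the function and variable names are my own, and the try/except fallback just lets it run without Numba) might look like this:

```python
try:
    from numba import jit
except ImportError:            # run as plain Python if Numba is absent
    jit = lambda func: func

@jit
def is_prime(num):
    if num < 2:
        return False
    if num < 4:                # 2 and 3 are prime
        return True
    if num % 2 == 0 or num % 3 == 0:
        return False
    # Every prime > 3 has the form 6k +/- 1, so only test divisors
    # of that form, up to the square root of num.
    i = 5
    while i * i <= num:
        if num % i == 0 or num % (i + 2) == 0:
            return False
        i += 6
    return True

def count_primes(n):
    # 2 inserted at the start, followed by the odd numbers below n
    oddnumbers = [2] + list(range(3, n, 2))
    return sum(1 for num in oddnumbers if is_prime(num))

print(count_primes(1000))      # prints 168
```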

When I first ran this, as a sanity check I compared the results against another page on the primes site (about halfway down the page): the counts for n = 1,000, 10,000 and 100,000 are 168, 1,229 and 9,592, and the program correctly produced these.

How Fast?

The reason I wanted a computationally intense task is that there's a price to pay for using Numba. It has to compile the code each time you run it, though it is possible to cache the compiled code. On my PC (an i7-5930K) this overhead appears to be about 36 ms. The times given below are for Intel Python with and without Numba. Please don't take these as an accurate benchmark; they are merely indicative. All runs were done under the same conditions.
The figure on the left is the number of runs, as specified by a line in the program; the time comes from executing the script in IPython with %time %run.

It turns out that somewhere around 50,000 is the point at which the time with or without Numba is roughly the same. Below that, it's faster without Numba because of the compilation overhead, but above it Numba makes a huge difference: 6.5× faster for a million primes, and for ten million it was 10× faster (8 seconds versus 81).

If I rerun the tests with Numba but under normal Python, i.e. CPython, is there as big a difference? Luckily I had Numba installed from Anaconda in another virtual machine (everything was run under VirtualBox on a Windows 10 PC with 64 GB of RAM), so I was able to rerun it. As before, I ran everything within IPython using %time %run.

Other Features of Numba

I've only touched on using Numba. You can, for instance, give the @jit decorator an explicit type signature. For example:
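A sketch of an eagerly compiled, explicitly typed function (the function itself is my own example; the try/except fallback just lets the snippet run where Numba isn't installed):

```python
try:
    from numba import jit, float64

    # Signature: returns a float64, takes two float64 arguments,
    # so Numba compiles the function immediately rather than lazily.
    @jit(float64(float64, float64))
    def hypot_squared(x, y):
        return x * x + y * y
except ImportError:                   # plain-Python fallback when Numba is absent
    def hypot_squared(x, y):
        return x * x + y * y

print(hypot_squared(3.0, 4.0))        # prints 25.0
```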

This lets the compiler generate more optimal machine code. Other types include void, intp/uintp for pointer-sized integers, intc/uintc for C's int and unsigned int, int8, int16 and so on, float32/float64, and array types such as float32[:] for one-dimensional float arrays.
There are also more advanced features, such as caching the generated machine code, targeting CUDA (for GPUs), compiling Python classes and creating universal functions (ufuncs).
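As one illustration, a universal function can be sketched with the @vectorize decorator (the function is my own example, with a plain-Python scalar fallback so the snippet runs without Numba):

```python
try:
    from numba import vectorize, float64

    # @vectorize builds a NumPy ufunc from a scalar function, so it can
    # be applied element-wise to whole arrays as well as to scalars.
    @vectorize([float64(float64, float64)])
    def rel_diff(a, b):
        return 2.0 * (a - b) / (a + b)
except ImportError:                       # scalar fallback when Numba is absent
    def rel_diff(a, b):
        return 2.0 * (a - b) / (a + b)

print(rel_diff(3.0, 1.0))                 # prints 1.0
```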


Even once you factor in the overhead of compilation, Numba can make your code run faster; just don't use it for one-off, very quick computations. It's like a van (without Numba) racing a train (with Numba): over short distances the van will always win thanks to its faster acceleration, but beyond a certain point the train pulls ahead due to its greater speed.
Tellingly, both sets of figures showed that Intel Python was faster, and as the amount of computation increased, Numba made more of a difference.
Numba is a very handy way of speeding up your code. Given the right type of computation it can make a massive difference.

Posted on March 14, 2017 by David Bolton, Slashdot Media Contributing Editor