Greg Tyrelle made a very important comment regarding exploiting multiple cores and Python (which will surely be included in my next part on bioinformatics and multi-core computing).


First, my understanding of python threads is that they are not separate system level processes, but some kind of fake process that are python specific ? Trouble is that I see two separate process when I launch two Blast runs via threading ?

The other aspect of threading that I’m still not entirely clear about is how the global interpreter lock (GIL) fits into the picture. I get resource locking to prevent race conditions, but is the GIL also invoked each time an action that manipulates memory takes place in a thread ? I’ve heard this property of python makes it unsuitable for multi-core programming ?

I will trade formal correctness for clarity of explanation (namely I won’t discuss that much the difference between thread and process, as it would make this too techy and confusing).

Python uses real (i.e. native) threads. Ruby uses the so called green threads, those are “fake” (simulated). Ruby 2.0 will use native threads.

So, in theory, Python is OK in multi-core architectures. In practice there is a problem, a serious one, identified by Greg: the Global Interpreter Lock (GIL). The GIL makes it impossible for more than one thread to be executing Python code at a time. When you are dealing with Python code, even if you have many threads with many cores, only one thread can be executing Python code. This is not as serious as it looks, there are 4 ways to live with this:

  • If you use a thread to start an external process, that process is not under the control of the GIL (it is a separate process), so it can run concurrently (think BLASTing something) as it is running outside Python, that is, it will be using a different core. So I think it covers one fundamental use case in bioinformatics: using external, computationally intensive, programs. In fact you can start as many instances of external programs as the number of cores you have (or even more, in case you think it will be advantageous). Note that the thread that calls the external application will block (well… depends, but for simplicity lets assume it), but your other Python threads can continue in concurrency with the application.
  • This is subtle, but important: If you use CPython (the standard implementation), and you do your computationally intensive stuff in C (which makes sense - and is a common strategy - as Python is quite slow) then the C code, as long as it is not interacting with Python objects, can release the GIL and therefore make use of multiple cores. The Python code uses only one core, the C part might be using all the remaining available ones. This approach is not valid for Ruby because of the green threads issue (I am a simple Ruby newbie, so take my words with a grain of salt).
  • Now… this GIL problem (or the green threads issue in Ruby) disappears if you use Jython or JRuby, as they use the JVM native concurrency mechanisms which have no notion of acquiring an exclusive lock for execution. By the way you can also use JVM based interpreters to call native (non-JVM) applications (think BLAST again, from inside Java). To put this point in another way: the GIL/green threads problem is not a language limitation, it is a limitation of the standard (C based) implementations that other implementations might not share (and the Java implementations, in fact, DO NOT).
  • If you think about grids (and not multiple cores) then the problem disappears as we are then talking of different processes (even more, running on different hardware).

I am afraid of being too techy with this post (I am probably labeled as 100% computer nerd by now ;) ), but I think Greg’s point is fundamental and required some discussion.

In my defense ;) I would like to say that I am only writing too much about programming because I am in some sort of professional unclear phase, as soon as things get back on track I want to focus more on the biological part of things… Until there I will be writing of the issue that I know better, and that is, for better or worse, informatics…

Comments, especially constructive criticism, is, as always, welcome…

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

3 Comments to "Python, Ruby, Java and Threads"

Please share your thoughts