A few years ago, one expectation was implicit when buying a new computer: that performance would increase more or less linearly with CPU clock frequency. That is, if you had an old 450 MHz computer and bought a new 900 MHz one, you would expect the new one to be more or less twice as fast as the old one.

If you had a computationally intensive bioinformatics application, you would then expect that it would run at twice the speed. So if it took 4 weeks to complete a certain task, it would now complete in only 2 weeks. Nice!

For technical reasons (limitations is a better word), CPU manufacturers like Intel, AMD and IBM have, for a few years now, stopped increasing CPU frequency and instead provide more cores. That is, when you have a dual core CPU with the same CPU frequency(*), a single task will still take the same time to complete, but you can run two tasks simultaneously and they will not steal CPU time from one another.

So:

Imagine that you have 2 tasks of four weeks each (on a single core 1 GHz computer):

On that machine the whole operation would take 8 weeks (2 tasks * 4 weeks / 1 core).

On a single core 2 GHz machine, the whole operation would take 4 weeks (2 tasks * 2 weeks / 1 core).

On a dual core 1 GHz computer the whole operation would take 4 weeks (2 tasks * 4 weeks / 2 cores). But here is the catch: you would have to start them at the same time, concurrently. If you started one after the other it would still take 8 weeks (and you would be wasting half of your processing power, as one of the cores would be doing nothing).
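The arithmetic above can be written down as a small Python sketch. It relies on the same simplifications used in this post: run time scales linearly with clock frequency, tasks cannot be split, and each core runs one task at a time. The function name `total_weeks` and its parameters are made up for this illustration.

```python
import math


def total_weeks(n_tasks, weeks_per_task_at_1ghz, ghz, cores, concurrent=True):
    """Estimate the wall-clock time (in weeks) to finish all tasks.

    Assumptions (simplifications, as in the text): speed is proportional
    to clock frequency, and each core runs one whole task at a time.
    """
    # A faster clock shortens every individual task.
    weeks_per_task = weeks_per_task_at_1ghz / ghz
    # Extra cores only help if the tasks are actually started concurrently.
    usable_cores = cores if concurrent else 1
    # Tasks are processed in "rounds" of at most usable_cores tasks each.
    rounds = math.ceil(n_tasks / usable_cores)
    return rounds * weeks_per_task


print(total_weeks(2, 4, ghz=1, cores=1))                    # 8.0
print(total_weeks(2, 4, ghz=2, cores=1))                    # 4.0
print(total_weeks(2, 4, ghz=1, cores=2))                    # 4.0
print(total_weeks(2, 4, ghz=1, cores=2, concurrent=False))  # 8.0
```

The last call is the catch: with two cores but tasks started one after the other, the result is the same as on the single core 1 GHz machine.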

It can be expected that future increases in computing power will look a lot like this: CPUs with more and more cores, but with the speed of each core staying more or less the same.

So, if you have a brand new machine and your computationally intensive application still takes ages to run, this might be the cause.
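As a rough illustration of what "starting tasks at the same time" can look like in practice, here is a minimal Python sketch using the standard library's `concurrent.futures.ProcessPoolExecutor`. The `busy_work` function is only a hypothetical stand-in for a real CPU-bound job, and `run_all` is a name invented for this example; this is not the framework discussed later in the series.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def busy_work(n):
    """Hypothetical stand-in for a CPU-bound task: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_all(sizes, workers=None):
    """Run one busy_work task per entry in sizes, in separate processes."""
    # With workers=None the pool picks a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(busy_work, sizes))


if __name__ == "__main__":
    print("Cores reported by the OS:", os.cpu_count())
    # Two independent tasks, started together so that each can use its own core.
    print(run_all([200_000, 200_000]))
```

Separate processes are used here rather than threads because, in the standard Python interpreter, threads running pure Python code generally do not execute on several cores at once.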

In a series of 4 blog posts I will discuss the consequences of this change for both users and developers. The parts will be:

  1. Introduction - You are reading it.
  2. Consequences to users - I will discuss the consequences of this paradigm shift from a user perspective. I will present some real situations (using bio software that exists today) and suggest strategies to get the most performance out of new hardware. Scenarios range from complete frustration (i.e., only a single core can be used, thus there will be no noticeable performance gain) to total gain (i.e., performance grows linearly with the number of new cores introduced).
  3. Consequences to developers: Design and concepts - I will discuss the changes that bio software developers will have to consider in order to make their applications multi-core aware and thus use all the performance available on new machines. The key words here will be: asynchronous calling models, concurrent programming, memory sharing, message passing.
  4. Consequences to developers: One practical example - I will present a framework, in Python, to facilitate the development of multi-core aware applications.

In each post I will also discuss grid computing issues briefly, as taking advantage of grids is sometimes similar to multi-core performance gains.

As I will be traveling next week, the next post should only surface around Monday the 15th. Please accept my apologies in advance for this delay.

(*) Using CPU frequency as a measure of speed is really an oversimplification, but I use it here to keep things simple.
