A few years ago, one expectation was implicit when buying a new computer: that performance would increase more or less linearly with CPU clock frequency. That is, if you had an old 450 MHz computer and bought a new 900 MHz one, you would expect the new one to be more or less twice as fast as the old one.
If you had a computationally intensive bioinformatics application, you would then expect that it would run at twice the speed. So if it took 4 weeks to complete a certain task, it would now complete in only 2 weeks. Nice!
For technical reasons (limitations is a better word), CPU manufacturers like Intel, AMD and IBM have, for a few years now, stopped increasing CPU frequency and started providing more cores instead. That is, when you have a dual core CPU with the same CPU frequency(*), a single task will still take the same time to complete, but you can run two tasks simultaneously and they will not steal CPU time from one another.
So:
Imagine that you have 2 tasks of four weeks each (on a single core 1 GHz computer):
On that machine the whole operation would take 8 weeks (2 tasks * 4 weeks / 1 core).
On a single core 2 GHz machine, each task would take 2 weeks, so the whole operation would take 4 weeks (2 tasks * 2 weeks / 1 core).
On a dual core 1 GHz computer the whole operation would take 4 weeks (2 tasks * 4 weeks / 2 cores). But here is the catch: you would have to start both tasks at the same time, concurrently. If you started one after the other, it would still take 8 weeks (and you would be wasting half of your processing power, as one of the cores would be doing nothing).
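The arithmetic above can be written down as a small Python sketch. The function name `total_weeks` and its parameters are made up for illustration, and the model assumes ideal conditions: run time scales perfectly with clock frequency, and tasks on different cores do not slow each other down at all.

```python
import math


def total_weeks(tasks, weeks_per_task, cores, speedup=1.0, concurrent=True):
    """Idealized wall-clock time for a batch of identical, independent tasks.

    weeks_per_task is the run time on the 1 GHz reference machine;
    speedup is the clock ratio relative to that machine (2.0 for 2 GHz).
    Assumption: perfect scaling, no overhead of any kind.
    """
    weeks_each = weeks_per_task / speedup
    # If tasks are started one after the other, only one core is ever busy.
    cores_used = min(cores, tasks) if concurrent else 1
    # Tasks run in "rounds", each round keeping cores_used cores busy.
    rounds = math.ceil(tasks / cores_used)
    return rounds * weeks_each


if __name__ == "__main__":
    print(total_weeks(2, 4, cores=1))                    # single core, 1 GHz
    print(total_weeks(2, 4, cores=1, speedup=2.0))       # single core, 2 GHz
    print(total_weeks(2, 4, cores=2))                    # dual core, started together
    print(total_weeks(2, 4, cores=2, concurrent=False))  # dual core, one after the other
```

The four calls correspond to the four scenarios in the text: 8, 4, 4 and 8 weeks.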
It can be expected that future increases in computing power will look a lot like this: more cores per machine (and possibly more than one CPU), with the speed of each core staying more or less the same.
So, if you have a brand new machine and your computationally intensive application still takes ages to run, this might be the cause.
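To make the "start them at the same time" point concrete, here is a minimal sketch using Python's standard `concurrent.futures` module. The workload `count_primes` and the helpers `run_sequentially` and `run_concurrently` are hypothetical stand-ins for a real computationally intensive analysis, not part of any existing bio software, and the actual timings will depend on your machine.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound stand-in for a real analysis: count the primes below limit."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequentially(limits):
    """Run the tasks one after the other: only one core is ever busy."""
    return [count_primes(limit) for limit in limits]


def run_concurrently(limits, workers=None):
    """Start the tasks together, each in its own process (one per core by default)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [100_000, 100_000]  # two equally sized tasks
    print("cores reported by the OS:", os.cpu_count())

    start = time.perf_counter()
    run_sequentially(limits)
    print(f"one after the other: {time.perf_counter() - start:.2f} s")

    start = time.perf_counter()
    run_concurrently(limits)
    print(f"started together:    {time.perf_counter() - start:.2f} s")
```

On a machine with at least two free cores, the second timing should come out at roughly half of the first. On a single core machine there should be little or no difference.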
In a series of 4 blog posts I will discuss the consequences of this change for both users and developers. The parts will be:
- Introduction - You are reading it.
- Consequences to users - I will discuss the consequences of this paradigm shift from a user perspective. I will present some real situations (using bio software that exists today) and suggest strategies to get the most performance out of new hardware. Scenarios range from complete frustration (i.e., only a single core can be used, thus there will be no noticeable performance gain) to total gain (i.e., performance will increase linearly with the number of new cores introduced).
- Consequences to developers: Design and concepts - I will discuss the changes that bio software developers will have to consider in order to make their applications multi-core aware and thus use all the performance available on new machines. The key words here will be: asynchronous calling models, concurrent programming, memory sharing, message passing.
- Consequences to developers: One practical example - I will present a framework, in Python, to facilitate the development of multi-core aware applications.
In each post I will also discuss grid computing issues briefly, as taking advantage of grids is sometimes similar to multi-core performance gains.
As I will be traveling next week, the next post should only surface around Monday the 15th. Please accept my apologies in advance for this delay.
(*)CPU frequency is really an erroneous simplification, but for the sake of simplicity I use it.
One Comment to "Bioinformatics, multi-core CPUs and grid computing: Introduction (1/4)"
Filed in: bioinformatics