During my “silent months” (for details see this post) I’ve been developing a simple system to study the spread of antimalarial drug resistance. It is a “typical” scientific application whose core, which simulates genetic recombination as individuals reproduce, is computationally very demanding.
As is common in these scenarios, I started by developing a prototype in a high-level, dynamic language (in my case Groovy). I was pretty sure that the first solution would be slow as hell, and that part of that slowness would be due to using a “scripting” language (although algorithmic complexity is the main cause of the slowness, I expected that changing the language would cut running times by at least one order of magnitude). The initial solution was in fact slow. So I proceeded to do the usual thing: identify the expensive part (easy in my case) and rewrite that part in Java. My intention was to end up with a typical hybrid system: the computationally intensive core in Java and the high-level functions in Groovy, for easy and productive manipulation.
Converting from Groovy to Java is easy; in fact, it is too easy. The resulting Java code was full of Groovyisms: generics-heavy code inherited from the prototype (things like Map<String,List<Integer>>) and constructs translated directly from .each calls, among other things that made the code look very strange from a Java perspective.
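To make the idea of a “Groovyism” concrete, here is a minimal sketch. It is not the actual simulation code: the class name, the method names and the allele data are all hypothetical, chosen only for illustration. The first method is what a line-by-line translation of a Groovy .each loop tends to look like in Java; the second does the same work with primitive arrays and plain loops.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroovyismDemo {

    // Line-by-line translation of a Groovy closure (hypothetical example):
    // boxed Integers, nested generic collections, and lambdas standing in for .each.
    static int sumTranslated(Map<String, List<Integer>> genotypes) {
        // One-element array, because a lambda can only capture effectively final locals.
        int[] total = {0};
        genotypes.forEach((id, alleles) ->
                alleles.forEach(allele -> total[0] += allele)); // unboxes every element
        return total[0];
    }

    // The same computation written as plain Java: primitive arrays, no boxing.
    static int sumIdiomatic(int[][] genotypes) {
        int total = 0;
        for (int[] alleles : genotypes) {
            for (int allele : alleles) {
                total += allele;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> boxed = new HashMap<>();
        boxed.put("ind1", Arrays.asList(1, 0, 1));
        boxed.put("ind2", Arrays.asList(0, 1, 1));
        int[][] primitive = {{1, 0, 1}, {0, 1, 1}};

        System.out.println(sumTranslated(boxed));    // 4
        System.out.println(sumIdiomatic(primitive)); // 4
    }
}
```

Both versions return the same result. The difference is that the translated one keeps the prototype's data model (string keys, boxed integers, nested collections), which is probably where much of the cost hides in a tight inner loop.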
Needless to say, there was not much of a speed improvement. To improve things, I first tried to make sure that the data structures behind List<> had the required complexity for my most-used operations. Not much improvement. I then decided to convert things like List<List<Integer>> entirely to the typical Java int[][]. Spaghetti and semantic chaos followed (just think of the not-so-minor differences in semantics between lists of lists and [][]).
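A few of those semantic differences can be shown in a short sketch. The class and method names below are hypothetical and the data is made up; the point is only that equality and removal behave differently once List<List<Integer>> becomes int[][], so a mechanical search-and-replace conversion can silently change behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListVsArrayDemo {

    // Copies a (possibly ragged) list of lists into a primitive 2D array.
    static int[][] toArray(List<List<Integer>> rows) {
        int[][] out = new int[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            List<Integer> row = rows.get(i);
            out[i] = new int[row.size()];
            for (int j = 0; j < row.size(); j++) {
                out[i][j] = row.get(j); // unboxing: a null element would throw here
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> a = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        List<List<Integer>> b = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        int[][] x = toArray(a);
        int[][] y = toArray(b);

        System.out.println(a.equals(b));             // true: lists compare by content
        System.out.println(x.equals(y));             // false: arrays compare by identity
        System.out.println(Arrays.deepEquals(x, y)); // true: explicit content comparison

        // Lists can shrink and grow, and remove() is overloaded on index vs. value.
        List<Integer> row = new ArrayList<>(Arrays.asList(5, 1, 7));
        row.remove(1);                   // removes the element at index 1
        System.out.println(row);         // [5, 7]
        row.remove(Integer.valueOf(5));  // removes the value 5
        System.out.println(row);         // [7]
        // An int[] has a fixed length, so "removing" means copying into a new array.
    }
}
```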
Being a member of the fundamentalist church of refactoring, I decided to do the unthinkable: throw the code away and rewrite it from scratch. I would rewrite the whole thing, starting with the core in idiomatic Java, targeting performance. Then, on top of that, I would grow a set of Groovy wrappers to manipulate that core easily. It worked perfectly! Actually, I am running that code in the background (on an Asus Eee) as I write this.
The (somewhat elusive) lesson that I took from this is that going from prototype to production code, when the fundamental difference is performance, can be cumbersome if the prototype language is too close to the production language (and Groovy and Java are close enough). The temptation to do a line-by-line code conversion is too strong for comfort (I actually did rename the computationally intensive .groovy file to .java and translated it line by line; feel free to call me silly) and can have very upsetting results.