Archive for December, 2008

Some reasons in favor of explicit typing (I am not talking about static vs. dynamic typing here, just about the possibility of annotating the expected type of something):

1. It can serve as code documentation. In some cases, with some kinds of programmers, the only documentation you have is the code itself. (Relying on this would require compulsory explicit typing, which is not always the case. Groovy, CAML and Scala are exceptions: in many cases declaring the type is optional.)

2. It helps IDEs help you. There is some debate about whether IDEs for languages without explicit types (in practice, dynamic languages) can be as helpful as IDEs for static languages. Well, if the type information is there, the IDE can surely use it to help you.

3. Bugs. Maybe your function is working when it should not be. Maybe an object that should never be passed to that function responds anyway, just because it has a method with a matching signature. I find this pattern fairly common:

a) There is a buggy call to a function whose parameter has no explicit type.
b) I add the type to the called function's parameter.
c) It immediately becomes clear that somewhere I am passing something that shouldn’t be going in in that form.

A pseudo-code example:

def myFunction(a, b) {
  String x = a + b          // meant as String concatenation
  print x.toUpperCase()
}

a should be a String (so + means concatenation), but for some reason myFunction gets called with a as an integer and, kaboom, + is interpreted as addition.
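Here is a minimal Java sketch of the same idea. It is written in Java rather than Groovy, and the class name TypedConcat is hypothetical. Once the parameters are declared as String, a call with integers is rejected at compile time instead of silently turning + into addition. The last lines show that + really does take its meaning from the operand types.

```java
public final class TypedConcat {

    // Declared parameter types document the intent and let the compiler reject misuse.
    static String myFunction(String a, String b) {
        String x = a + b;              // String + String: always concatenation
        return x.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(myFunction("ab", "cd"));   // prints ABCD

        // myFunction(1, 2);  // does not compile: int cannot be converted to String

        // What '+' means depends on the operand types:
        int a = 1;
        int b = 2;
        System.out.println("" + (a + b));   // prints 3  (integer addition first)
        System.out.println("" + a + b);     // prints 12 (concatenation, left to right)
    }
}
```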

This can be quite insidious with type inference (certainly in CAML, probably also in Scala). Suppose myFunction3 and myFunction1 both call myFunction2, and the bug is in myFunction1. When the compiler reads myFunction1, it infers the wrong type for myFunction2. Later, when the compiler reaches myFunction3, it complains, but the bug was caused elsewhere, so the compiler's message is useless. If you put the type on myFunction2, the compiler will complain in the right place (the call in myFunction1). These bugs can be a pain to track down because the chains can be long. I had the “pleasure” of spending countless nights with CAML tracking them down. That was 15 years ago, but I still remember.
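Java's type inference is far more local than CAML's. Still, a small Java analogue shows an annotation moving the error to its real cause. The class and method names here are hypothetical, and this requires Java 10+ for `var`.

The generic method pair infers T from its arguments. A mistaken argument is silently absorbed into a common supertype, and the complaint only appears at a later use site. An explicit type witness makes the compiler flag the call itself.

```java
import java.util.List;

public final class InferenceDemo {

    // T is inferred independently at every call site (List.of needs Java 9+).
    static <T> List<T> pair(T first, T second) {
        return List.of(first, second);
    }

    public static void main(String[] args) {
        // Intended: two Strings. The Integer slips through because T is inferred
        // as a common supertype of String and Integer; no error is reported here.
        var oops = pair("a", 1);

        // The mistake would only surface later, far from its cause:
        // String s = oops.get(0);               // compile error reported HERE

        // Stating the type moves the error to the real culprit:
        // List<String> bad = InferenceDemo.<String>pair("a", 1);  // error at the call
        List<String> ok = InferenceDemo.<String>pair("a", "b");

        System.out.println(oops.size() + " " + ok);   // prints 2 [a, b]
    }
}
```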

Anyway, optional explicit typing (à la Groovy, Scala, CAML) is a good compromise: use it if you like it. In fact, in some cases it is good to be lazy ;)

PS: As far as I remember, there are cases in Groovy and Scala where explicit typing is compulsory anyway (correct me if I am wrong). I suppose this comes from necessity: those languages are JVM- and Java-friendly by design, and the compiled code requires that information.

When writing Groovy scripts, I sometimes need to redirect the standard output of one part of a computation. One possible solution is to open a new Writer and change the code to write to it (i.e., replace every print with newStream.print). This, of course, requires changing all the prints, which is cumbersome and boring. There is a lazy alternative, using method references (Groovy's .& method pointer operator):

def s = new PrintStream(new FileOutputStream("/tmp/myOut"))
def print = s.&print          // print now writes to /tmp/myOut
print "a"
print = System.out.&print     // back to standard output
print "b"
s.close()

In this case, "a" is written to /tmp/myOut and "b" goes back to standard output. The big gain: none of the prints (and printlns) in a script need to be changed! Lazy me is happy.
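For comparison, here is a Java sketch of the same rebinding trick. It needs Java 8+ method references; the class name and the in-memory buffer standing in for /tmp/myOut are assumptions. print is a variable holding a method reference, so pointing it at another stream redirects every later call without touching the call sites.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;

public final class RedirectPrint {

    // Returns whatever ended up in the redirected stream.
    static String demo() {
        // In-memory stream standing in for the file /tmp/myOut of the Groovy script.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream s = new PrintStream(buffer, true);

        Consumer<String> print = s::print;   // like Groovy's s.&print
        print.accept("a");                   // goes to the buffer

        print = System.out::print;           // rebind to standard output
        print.accept("b");                   // goes to the console
        System.out.println();

        s.close();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println("captured: " + demo());   // prints captured: a
    }
}
```

A JVM-wide alternative is System.setOut(...). It also catches println calls, but at the price of redirecting everything, including library output. That makes the caveat below even more relevant.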

Caveat: I would be careful about using this strategy a lot; it is very easy to lose track of where the output is going. But it can be quite an expedient way to redirect prints in simple scripts.

During my “silent months” (for details see this post), I’ve been developing a simple system to study the spread of antimalarial drug resistance. It is a “typical” scientific application with a very computationally demanding core, which simulates genetic recombination in reproducing individuals.

As is common in these scenarios, I started by developing a prototype in a high-level, declarative language (in my case, Groovy). I was pretty sure the first solution would be slow as hell, and that part of that slowness would be due to using a “scripting” language. Algorithmic complexity is the main cause of slowness, but changing the language should at least bring running times down by an order of magnitude. The initial solution was indeed slow.

So I proceeded to do the usual thing: identify the expensive part (easy in my case) and rewrite it in Java. My intention was to end up with a typical hybrid system: the computationally intensive core in Java, and high-level functions in Groovy for easy and productive manipulation.

Converting from Groovy to Java is easy; in fact, it is too easy. The resulting Java code was full of Groovyisms. There was generics-heavy code inherited from the prototype (things like Map<String,List<Integer>>), and code originating in .each constructs that looked strange from a Java perspective, among other things that made the Java code look very odd.

Needless to say, the speed did not improve much. To improve things, I first made sure that the implementations behind List<> had the right complexity for my most frequent operations. Not much improvement. I then decided to convert things like List<List<Integer>> completely to the typical Java int[][]. Spaghetti and semantic chaos followed (just think of the not-so-minor semantic differences between lists of lists and [][]).
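A few of those not-so-minor differences can be sketched in Java (class and method names are illustrative):

- Lists grow, while arrays have a fixed length.
- equals() is structural for lists but identity-based for arrays.
- clone() on an int[][] copies only the outer array, so the rows stay shared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class ListsVsArrays {

    // Lists can grow in place; an array row must be reallocated.
    static int rowLengthAfterGrowing() {
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(new ArrayList<>(List.of(1, 2)));
        lists.get(0).add(3);                      // row becomes [1, 2, 3]

        int[][] grid = {{1, 2}};
        // grid[0][2] = 3;                        // would throw ArrayIndexOutOfBoundsException
        grid[0] = Arrays.copyOf(grid[0], 3);      // new row [1, 2, 0]
        grid[0][2] = 3;
        return lists.get(0).size() + grid[0].length;   // 3 + 3
    }

    // equals() compares contents for lists, identity for arrays.
    static boolean[] equalities() {
        int[][] a = {{1, 2}};
        int[][] b = {{1, 2}};
        return new boolean[] {
            List.of(List.of(1, 2)).equals(List.of(List.of(1, 2))),  // true
            a.equals(b),                                            // false
            Arrays.deepEquals(a, b)                                 // true
        };
    }

    // clone() on int[][] is shallow: the rows are shared.
    static int afterShallowClone() {
        int[][] original = {{1, 2}};
        int[][] copy = original.clone();
        copy[0][0] = 99;
        return original[0][0];                    // 99, not 1
    }

    public static void main(String[] args) {
        System.out.println(rowLengthAfterGrowing());             // prints 6
        System.out.println(Arrays.toString(equalities()));       // prints [true, false, true]
        System.out.println(afterShallowClone());                 // prints 99
    }
}
```

The equality difference also affects hashing: a List<Integer> works as a HashMap key by content, while an int[] key is matched only by identity.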

Being a member of the fundamentalist church of refactoring, I decided to do the unthinkable: throw the code away and rewrite it from scratch. I would start with the core, written in idiomatic Java and targeting performance. Then, on top of that, I would grow a set of Groovy wrappers to make the core easy to manipulate. It worked perfectly! In fact, I am running that code in the background (on an Asus EEE) as I write this.

The (somewhat elusive) lesson I took from this is about going from prototype to production code when the fundamental difference is performance. It can be cumbersome if the prototype language is too close to the production language, and Groovy and Java are close enough. The temptation to do a line-by-line conversion is too strong for comfort, and it can have very upsetting results. (I actually did rename the computationally intensive .groovy file to .java and translate it line by line; feel free to call me silly.)

First, a personal note: I did not write (or do any “things on the Internet”) for the better part of 2008. Part of it was due to a busy schedule, but most of it was due to illness (being obsessive-compulsive has some strange consequences). I finally decided to tackle my health issue (which is solvable, at least in my case).

Anyway, I’m still working in computational biology, still working with malaria, and still working with Groovy. So… let’s get back to the usual topics…

Before I start, a caveat: “over-engineering”, as used below, should not be read as scornful. We all know that traditional OO languages and libraries try to be as general-purpose as possible and deployable in industrial software processes. In that setting, languages and libraries that follow a typical OO style are, understandably, “over-engineered”.

The so-called scripting languages (for lack of a better term, let’s stick with it) are supposedly more productive than traditional languages, especially in small to medium-sized projects. Languages like Java are “over-engineered” beasts: general-purpose, “industrial”, heavy-duty. Our beloved scripting languages fit our brain. They are agile; with them we can be highly productive, write fewer lines of code, accomplish more, and be more declarative.

Can we really? Let’s consider a subset of those languages: those, like Groovy or Scala, that were developed for the JVM (or, for that matter, any language ported to the JVM). One of the pluses of these languages is that they can use the whole JVM ecosystem of libraries. The problem is that most of those libraries are developed with a Java mentality (i.e., they are over-engineered). An example:

The fantastic JFreeChart library produces high-quality 2D charts of all kinds. It has all the flexibility we expect from a typical Java library: you can do everything. In the Groovy landscape there is also a Builder for it, groovychart, of which I am a minor author. But whenever I want to plot a chart, my first impulse is to use the (also great) CPython-based matplotlib. Why? Because plotting a line chart in matplotlib takes 3 lines, which I remember without going to the documentation:

from pylab import *
plot([1,2,3])
show()

Really, it worked on the first attempt.

In groovychart? I am not even sure of the whole process, but it involves starting Swing, preparing the dataset, choosing the chart, and so on. And again, I am one of the authors and I make groovycharts every day, yet I still need to go to a template to do something that takes 3 lines in matplotlib. Matplotlib fits my brain; groovychart doesn’t.

Having all the Java libraries at hand is obviously a good thing, but many of those libraries need to be “scriptized”. We need interfaces that “fit the brain”. Groovy + JVM libraries only tackles half of the problem. Even JVM libraries with a Groovy idiom (like groovychart) don’t address the “over-engineering” problem. What is needed, in my view, are wrappers that are not only Groovy-idiomatic but also Groovy-philosophical: they fit the brain. These would be wrappers that let you plot a simple line chart in 3 lines.

This can actually be seen in Groovy itself for IO and many data structures: some core Java libraries are well covered and are already available through a “fit-your-brain” interface. Hopefully we will see more interfaces like these for many existing libraries (and fewer like groovychart, which are only idiomatic wrappers).
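To make the IO point concrete: in Groovy, reading a whole file is just file.text. The Java sketch below uses hypothetical class and method names. It contrasts two routes:

- The general-purpose route, with an explicit reader, a loop and resource handling.
- The one-call Files.readString, which Java itself only added much later (Java 11). It is an example of a core library growing a “fit-your-brain” entry point.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class ReadWholeFile {

    // General-purpose route: open a reader, loop over chunks, close the resource.
    static String readVerbose(Path path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            char[] chunk = new char[8192];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }

    // "Fit-your-brain" route: a single call (Java 11+), UTF-8 by default.
    static String readShort(Path path) throws IOException {
        return Files.readString(path);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.writeString(tmp, "line 1\nline 2\n");
        System.out.println(readVerbose(tmp).equals(readShort(tmp)));   // prints true
        Files.delete(tmp);
    }
}
```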

PS: Another way to tackle “over-engineering” problems is good IDEs. In fact, the average modern Java IDE goes to great lengths to reduce the “over-burden”.
