Groovy/Scala/Ruby/Python on JVM

There seems to be some competition in the field that can be vaguely defined as “The next Java”(TM).

I don’t know if there will be a “next Java” to start with. Things seem to shape up in way where the JVM is our common interoperability platform and on top of it we have a an ecology of JVM based languages.

I have used Jython quite a lot but have several doubts about it, not only on the current status of Jython (lags a bit behind CPython) but I also deslike Python (when compared with the other languages dicussed here). As such I decided to evaluate the other Scala, Ruby and Groovy.

I have done a couple of small projects in Scala (A prototype DSL for modeling malaria resistance is available here) and JRuby. I am now starting with Groovy, and I think I’ve found my new love. Here I will try to explain why, among Groovy, Scala and JRuby, I have chosen Groovy. To preempt any religious war idea, I would like to say I have full respect for Scala, Ruby, which are, with Caml and Prolog among my favorite languages (for a true crusade and flame ask me for my opinion about Perl or Visual Basic 6 ;) ).

Steven Devijver suggests that Groovy is the language with more syntatic similarities with Java. I would say that, not only that, but on the semantics and everything, Groovy is the closest language to Java. And that is a good thing. The world (both in programming languages and all the rest) is never revolutionary. Revolutions, when they rarely happen, are either a disgrace or are not that much of big change below the surface. People normally prefer (for good and bad reasons) the path of least short term pain. Groovy delivers that: almost 0 cost in starting to code coming from a Java background. Most importantly Groovy does that but still delivers most of the new goodies. This is actually the cornerstone of my argument: path of least pain while delivering the good stuff (in some cases better than the competition, as we will see).

Let me start with the fundamental reasons why I dismiss JRuby (which is, nonetheless, my second option after Groovy). First, I would like to say, very honestly, that the work of the JRuby guys is nothing short of outstanding! But I have 3 problems:

  1. One, by definition, JRuby is based on Ruby, a language from outside the JVM. That means semantic hurdles, coupling issues between the two worlds (think, e.g., libraries)
  2. Most importantly (but connected with the first point): Typing. I am a bit far away from computing issues currently (I work with Malaria currently, so excuse me if I mess strong/explicit typing and such) but clearly the typing system of Ruby make like hard for IDEs (think IDEs to neded to tame those over engineered Java APIs) and automated tools around code. Debugging without explicit typing is also a pain in a big program (I actually suffered my first debug nightmare with typing systems with Caml, arguably the mother of Scala). Some might say that Scala type inference and Groovy duck typing also are problematic in this respect; while the argument might be correct both languages have mechanisms to support typical Java explicit/strong typing and as such profit from IDEs and automated analysis tools.
  3. Ugly perlisms. Although I have read somewhere that those might be deprecated in the future.

Ah… Scala… Mats Henricson argues that Scala is the only option because of elegance regarding multicore computing. I fundamentally disagree with his point - multicore programming is fundamental but Scala is not really a good solution, but before we get there, lets talk about other Scala issues.

Type inference. I have some experience with the “mother” of Scala, Caml. Type inference in Caml is really elegant: I don’t remember a single case of it failing and requiring the programmers’ help in discovering the type of a parameter. That is not the case with Scala, several times the compiler seems to be “lost in translation”. Some might say that this is because of JVM imposed constraints, but if that is the case then it would raise the argument of bringing a language with a foreign semantics to the JVM and the ugliness attached to the process.

My biggest peeve? Metaprogramming. I won’t give you my opinion about it because it really doesn’t exist. It is on the Scala wiki in the section “future”. I am sorry, but a 21st century language where meta programming is absent can only be called in “beta stage”. As a side note, there seems to be something lost in the ML branch of functional programming from Lisp in this regard (no introspection and such), that is a shame (How is Haskell in that respect?).

Ok, multicore computing. This is an area where I have some experience in the JVM: [Shameless plug] I invite you to have a look at my Java Web Start, Jython based, multicore aware evolutionary biology workbench LOSITAN. Furthermore I have written tutorials for the multicore paradigm and bioinformatics:

Bioinformatics, multi-core CPUs and grid computing: Introduction (1/4)


Bioinformatics, multi-core CPUs and grid computing: User perspective (2/4)

Most importantly in this context: Bioinformatics, multi-core CPUs and grid computing: developer perspective (3/4)

Mats argues that Scala Actors and immutable data types provide a simple and elegant solution to the extremely complex problem (I am calling it extremely complex, because I think it really is) of concurrent programming. Immutable data types… Does anyone believe that the hordes of existing Java developers/programmers are ready and willing to do radical conceptual jump to immutable data types? The change from C++ to Java was minor in terms of semantics, even the change from C to C++ was much less radical that a change requiring to “get rid of all variables”. How do you think the majority of programmers will react when you say: “Forget variables”? More, as Scala allows for imperative type of programming, what do you think most programmers idiom wil be: Imperative or functional? To makes things worse, in Scala a immutable is called a “val” and the mutable a “var”. Am I the only only picturing hordes of developers, with tight deadlines just swapping L’s for R’s?

I speak for myself here: in spite of having probably more experience with “immutable” languages (Prolog a lot, Caml a bit) than most developers, when I wrote Scala code, my reasoning was so tainted by “real world” imperative languages that it was really hard to write in a functional dialect. I have the background, enough free time, and the motivation to write functional code, but it was hard to get back in that mindset.

Scala only apparently solves the multi core problem. Give it to a typical developer and he will write imperative code, unless you put a functional zealot behind him (and give the said zealot a strong, resistant whip).

How to address the multicore issue? Clearly we have a problem here. A few ideas:

  • In many applications there is no big need to go multicore. In some cases lets not try to solve a problem that doesn’t exist in the first place.
  • Many multicore applications can survive very well with simple concurrency management. Not all applications require a PhD in concurrent programming.
  • Scala and the like. For those who can and are willing to go functional, why not? I have nothing against that. My only argument is that it won’t be mainstream.
  • The way of PAIN. Most developers will continue to use old languages and paradigms and SUFFER with it. Only after much suffering there will be motivation to try out new things and, say, endure the pain of learning a new paradigm. That suffering still hasn’t happen, only after this becomes a big problem, there will be interest in accepting new solutions.
  • A silver bullet that can be attached to the current programming paradigm. Sometimes it happens. Don’t misunderestimate (silly Bushism intended) the power of a “Black Swan” (A reference to Taleb’s book where he discusses the impact of the unexpected important events).

To finalize, I would like to say that I am not sticking with Groovy out of being conservative. Groovy seems to beat the competition in many areas (the biggest example is metaprogramming) and strikes a very good balance between being a “small evolutionary step” and delivering the goodies.

To really finalize, a caveat: my Groovy knowledge is still limited, one of these days you might read a post where I apologize for having written this ;)

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Ruby, Scala, groovy

by: tiago

9 Comments

A holiday, Ruby and Scala

I really did not have an holiday, but I stopped posting for a while.

But I want to talk about another “holiday”: Scala.

I have spent a couple of months with Scala: A functional-OO programming language done from the scratch with the JVM in mind, with a nice, smart community.

I actually decided to stop my efforts on Scala and decided to go back to explore the Ruby way… The reasons:

  1. No metaprogramming facilities. This comes from ML, I suppose. But Ruby has it and many “old school” elegant languages have it (Lisp, Prolog). It is possible to be elegant (in fact I would contend that in many settings it is a requirement) with metaprogramming.
  2. There seems to be some difference in the semantics between compiled and interpreted. I only compiled, but the interpreter could add new variables to its local scope (as it really needs it) but the compiler couldn’t. While one might argue that that is excessive flexibility coming from the scripting languages camp, but I actually had to, on a compiled program, to create new classes which would include traits that would be dependent of need of the user, and this cannot be done. If one has many traits, it has to compile a priori all the trait mixins desired, they cannot be defined at run-time in a compiled environment (contrast this to JRuby or even JPython). This is actually metaprogramming lacking part 2.
  3. Type inference: Scala type inference might seem clever, but, compared to CAML it is not. Sometimes the compiler is not able to infer the types and the user has to explicitly declare them. CAML was always capable (at least in my cases) of complete type inference.
  4. Information sources are scarce. The mailing list is reasonable, but sometimes questions get unanswered and there is no other source (other than inspecting the source code). This will sort out if there are more people using it - and more books like the Artima ebook.

Decent metaprogramming in a runtime setting would be my main requirement, but in the current Scala status, one can only have it though the typical Java way: execute the compiler, link a jar, not elegant…

Regarding Ruby, I would like to have some form of strong and explicit (or inferred) typing. I would imagine that the requirements of metaprogramming flexibility and typing are contradictory, but, at least, some kind of optional (but standard) annotation for input/return parameters would allow avoiding some debugging nightmares of not having the compiling helping with types and would also allow for smart code editors to do all that fancy completion that is possible with explicit typing.

[This was initially posted - with modifications and additions - on Artima as a comment]

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Ruby, Scala

by: tiago

5 Comments

Automated GUIs for OO models and DSLs

One of the most delightful things in bioinformatics is the possibility of working with people with really different mindsets. Surely CS geeks are amazing, and everyday I feel that my original background is really a comparative advantage, but, from where I look, nothing beats being in an environment with scientific and cultural diversity. But, lets talk some geekiness now:

A couple of years ago, I did a population genetics simulator in Caml. It was really flexible, allowing for many demographic and genomic scenarios, mating rules, selection… really flexible. I never got to try to publish it because there are many good simulators around (I suggest simuPOP, if you are looking for one) and it would take some time to make it robust and documented for public exposure. But, the interesting part is, when I went to my MSc supervisor (an “old-type” biologist) and after a very exuberant explanation on how flexible the simulator was, he added only one comment: That is all very well and good, but you did not show me the easy to use graphical interface!

Fast forward a couple of years… With regards to a DSL to model drug resistance in the context of infectious diseases that I am developing, I went to my PhD advisor (a population geneticist, malarialogist, biostatistician who knows how to program in C), showed him my rough prototype and he said: People will be able to read this, but, to interact they will want an easy to use graphical user interface. To be honest, this time, I was expecting the comment (I am living in the middle of experimentalists long enough to have learned something). I have no expectations, for my DSL, that domain specialists will write it (well, maybe a couple of them will, if things pick up). If I end up giving my system away to domain specialists, it will have to have a easy to use interface, there is no escaping from that.

Well, DSLs (at least in Scala and in Ruby) have an underlying OO model. Which, most of the times is neither complex nor big. I am starting to suspect that it won’t be too difficult to automatically generate an easy to use interface to input in a “nice” way what could be rendered as DSL programs (or object instances and relationships, if you prefer to look at it that way). For embedded DSLs, which have the whole expressive power of the host language available, that would be unfeasible to do completely. But, at least part of it could be automated. Obviously this idea is not new at all, this is just a rehash of what Lift or Rails do for databases.

I am aware that graphical programming languages never went too far (I actually dislike them), but the scope and context here are completely different, different premises apply. This might be one way of lowering the barrier to rigorous modeling to a wider crowd.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Caml, Ruby, Scala, bioinformatics, declarative programming, science, software engineering

by: tiago

No Comments

Modeling drugs in Scala

I am currently trying to model antimalarial drug behavior in order to understand the spread of drug resistant malaria. Generally speaking, malaria strains are more or less tolerant to a drug depending on the quantity of drug that is necessary to kill an infection. In theory, a totally resistant infection will survive any treatment, a totally susceptible one will only require small levels of drug to be cleaned.

I see the word drug used in two different ways (for the readers of this blog that are specialists, in some form, on issues regarding drugs, particularly pharmacokinetics, if you see any thing particularly wrong, please do inform me): For instance, SP (Fansidar) is a drug, composed of two drugs (Sulfadoxine and Pyrimethamine). I will use drug for SP and compound for S and P (as active compound seems to be used).

Antimalarial drugs work mainly in the blood stream against asexual parasite forms.

In the blood, compounds have a certain concentration. With time, the body gets rid of compounds (thus the concentration of a compound goes down with time). The concentration of compounds is normally (but not always) modeled using an exponential decay function, being the fundamental parameter the half-life, i.e, the time that it takes for the concentration of a compound to drop to half.

Two other important concepts for drugs that are not taken intravenously (like cheap antimalarials which are oral), are

  1. Bioavailability, i.e. the fraction of the compound that actually reaches the circulation. It seems that one of the problems with counterfeit drugs is low bioavailability. Bioavailability is normally discussed in terms of AUC (Area Under the Curve. Being the curve related to the plot of drug concentration against time). I will model it in terms of maximum concentration, half-life and the time it takes to reach maximum concentration in the blood, which by the way is the next concept…
  2. The time it takes to reach maximum concentration in the blood, i.e. the time from ingestion to circulation in the blood at maximum concentration. I suppose this time frame has a technical name, but I don’t know it (if you know, drop me an email our comment, please).

Now, back to computational modeling:

A big objective is declarative programming. Preferably a program that can be read by domain specialists (biologists, MDs, biostatisticians, …), with that in mind…

Currently, a computer program in Scala to model drugs look like this.

Compound create "Sulfadoxine"
Compound abbreviation "S"
Compound half_life 116 //hours
Compound bio_availability 408 //1mg to nanoM
val Sulfadoxine = Compound prepare
 
Compound create "Pyrimethamine"
Compound abbreviation "P"
Compound half_life 83 //hours
Compound bio_availability 34 //1mg to nanoM
val Pyrimethamine = Compound prepare
 
Drug create "SP"
Drug includes Sulfadoxine quantity 500
Drug includes Pyrimethamine quantity 25
val SP = Drug prepare

Discussion:

  • I am using the “object companion” pattern a lot. The idea is that all “stateful” mess is stored “prepared” in the object (which is the DSL source). When the prepare method is invoked in the object a class (with only immutable vals, very lovely for those of you who are functional programming enthusiasts) is created.
  • Notice the dependence on operator precedence on Drug includes quantity (there is not really one, strictly speaking, but assume there is). I would really like to have, per class the ability to define operator precedence, other than not based on dictionary order (à la Prolog).
  • I don’t like the val SP = Drug prepare. It is too verbose and too geeky. I would prefer just Drug prepare. I believe that this is possible in Scala as at least at the interpreter level (as the Scala interpreter does it), but I still don’t know how. The idea would be that a val named SP would be added to the local scope in some way. For those computer inclined readers that think that I am being too pedantic and nit-picking, I just have one thing too say: I am really trying to make the system the most pleasant possible to non programmer types, and I think my proposal does not sacrifice elegance and generality (although I would recognize the non-explicit name creation is “strange” - but, hey, the Scala interpreter already does it!)

Caveat emptor, big one: Although drugs (compounds) are discussed in terms of half-lives, bioavailability, etc… these properties are actually not of the drug but of the interaction between the drug and the individual. Making them drug properties only is a “cognitive abuse”, although it has its uses. For instance, my advisor, after looking at the language, was talking about bioavailability for counterfeit drugs, for children between 2 and 5 years. A great example that they are not properties only of the drugs but also, at least, of individuals (and not only that, for instance many drugs are more bioavailable if there are taken in conjunction with, say, fatty foods).

A proper, precise, computational modeling of drugs would be a gigantic undertaking. I have a different approach: Modeling as close as possible to the average domain discourse and hook, in some way, the necessary precision, should the need arise. It is worth noting that “incorrect”, “imprecise” modeling is enough for many tasks.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Scala, bioinformatics, declarative programming, malaria, metaprogramming

by: tiago

2 Comments

Scala types again

It turns out that my previous Scala idea to model environments for arithmetic expressions doesn’t fly. Solution? Go to the Scala mailing list and ask for opinions, there is always a lot of helpful, highly skilled people willing to help. What follows is my account of things based on the help that I have got.

My initial solution was to create a type like this:

type Env = String => Double

Doesn’t work as types have to be defined inside objects or classes. This will, by the way, work on the interpreter as the interpreter is really running inside an anonymous class (so type definitions will be attached to that anonymous class).

I then tried 2 traits as alternatives:

trait Env {
 def apply(name : String) : Double
}

and

trait Env extends Fuction1[String, Double]

When I used any of those like in:

val env : Env = (x : String) => x match {case "S" => 1.0}

I had a typing error complaining that String => Double is different from Env. But is it really? Aren’t the two traits above the same as String => Double? Well, they are, but Scala uses nominative typing, not structural typing. So after you give that type a name it will be different from equal structural types that have different names.

I ended up with with a type inside a companion object. The companion object pattern seems to be widely used in Scala. I still don’t have a opinion if that is generally good or bad.

In any case, I don’t like the idea that types cannot be defined top level. I don’t see any advantage in disallowing this.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Scala

by: tiago

No Comments

Learning Scala: Types

I will document my effort in learning Scala. The main objective is to help in building up as much content as possible on the Web about Scala. Caveat though: I am learning, therefore what I document here might be rubbish.

The problem: I need an arithmetic expression parser to be able to deal with drug isobolograms like these 2 taken from Wang et al.:

Isobologram

Very roughly speaking, the concentration “curve” that you see is when the drug combination becomes active against the parasite. In this case the compounds are Sulfadoxine (SDX) and Pyrimethamine (PYR) used to combat P. falciparum malaria.

I want to be able to build expressions for those curves. These expressions run in contexts, that is, if one “curve” (here approximated by a line) is 133*SDX + 2000 - PYR then there is a need for having a variable SDX and another PYR.

[People with a functional programming background might immediately recognize the typical newbie exercise of doing an expression evaluator… In fact you can find it in various Scala documentation… and Caml]

So I need, what is called an environment, a store of mappings from symbols to values. Something like

{case "PYR" => 1800.0 case "SDX" => 5.0}.

First problem: Doubles and Ints. There seems to be no way to specify that the return is a Number. I would like, sometime, to use Numbers irrespective of the specific subclass. I would like to do something like:

type Exp : String => Number

The problem is that one cannot have type at all, unless inside a class. But, but, but the type is not visible outside the class as a type (i.e. just for the type info, without the need to instantiate to access). Maybe putting it into an object? Did not try it, but it would be clumsy.

I ended up with this solution for now:

trait Env {
  def apply(name : String) : Double
}

Can live with it. The Double instead of Number hurts more.

Again, I am beginning. Please don’t be too harsh on me ;)

By the way: It would be nice to have Scala support on GeSHi so that Wordpress blogs could render Scala code beautifully. No, I am suggesting only, I don’t have the time to actually do it myself.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Scala, bioinformatics

by: tiago

No Comments

Scala for bioinformatics

I am seriously considering doing the core of my work (at least when I have the freedom to decide) in Scala. The reasons? Well, I can give them in the form of requirements:

  1. Domain Specific Language support, that is:
    1. Making life easy on declarative programming
    2. Ability to show the code to non-programmers in a form that is readable and understandable (I will talk about this topic a lot in the future).
  2. DSLs should be embedded and not stand-alone. A DSL (say, one to model the spread of malaria drug resistance) can be made in any programming language, really. But embedded languages (i.e., where the DSL resides inside the host language) cannot really be done in most languages. This allows for “unlimited” extensibility (Turing completeness some would say). Prolog is still my favorite here.
  3. Availability of a wide range of libraries (think math libraries, chart libraries, bio libraries). All JVM based languages can use Java libraries. This more or less kills Prolog, Caml and Haskell.
  4. Easy multi platform support. Think Linux, Mac OS X and Windows. With not much pain. Kills most non-VM languages and “system” languages (C, C++, Fortran).
  5. By the way, I refuse to malloc. I was born in the 70s, not retired in the 70s.
  6. Lively, clever and helpful community.
  7. Strong-typing, better yet, strong typing with type inference. I don’t think typing in traditional “scripting” languages scale when the code base grows, it is overrated (think Ruby, Python, Perl and friends), debugging becomes a mess. Caml wins here. Scala type inference seems to sometime fail (i.e. requiring the programmer to explicitly specify the type). Java type of languages force you to always be verbose, that kills productivity.
  8. The language should be seem by the creators mainly as a production vehicle and not as a research vehicle. A big no-no to Prolog here. Haskell goes the same way. Scala seems to strike a reasonable balance. I need to produce reliable code, I require a reliable compiler/interpreter.
  9. I have a strong bias towards the JVM: Open source and open development process (Java Community Process), robust, widely supported, massive user base. .NET, being in practice vendor locked (I don’t think Mono is really a viable alternative as MS really controls whatever they want to control) is out. At the end of the day I also have a soft spot for Java. There are many things that I don’t mind doing in Java.
  10. Introspection. Caml fails here. I actually don’t know how Scala fares here, but at least JVM mechanisms are enough for me.
  11. Striking a good balance between cognitive freedom and damage control on bad code design. As an example, Java gives little freedom in regards on how to express your ideas. Perl, on the other hand allows you to do a big mess (without really giving you expressive power, actually). Functional and logic programming languages shine here.
  12. Over engineering might be good support all possible use cases, but it is a productivity disaster to code in. I am thinking Java here. All libraries are difficult to use by design. Even 3rd party libraries seem to be designed mostly with a complexity culture in mind.

Scala seems to be the option that tackles most issues. To be honest I was always frustrated with all languages because they missed a crucial point in a big way. Prolog is too “researchy”, Haskell also, C too low level, Java too verbose and too freedom-curtaining. Perl and C++ are a complete mess (although in different ways).

Python is almost there (Major: Jython lags. Minor: weak DSL and functional-paradigm support). JRuby is probably there. Scala is probably there. My gut feeling points to try out Scala.

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Scala, bioinformatics

by: tiago

7 Comments