Posts tagged ‘groovy’

Let's continue the development of an agile DSL for music notation with Groovy. If you remember, our fundamental concepts are Scores, Parts (for instruments), Phrases and Notes.

At any given time we are typically working on only one Score, Part and Phrase (indeed, we might work on a single Score during a whole session). So we would like to have a concept of a default Score, Part and Phrase, and avoid referring to them explicitly (unless, of course, we want to change the default). For instance, instead of writing:

...
myScore = score(name:"Row Your Boat")
myPart = part(title: "Flute", instrument: FLUTE, channel: 0)
myPhrase = phrase(startTime: 0.0)
myPhrase.addNoteList pitchArray, rhythmArray

(pitchArray and rhythmArray are defined beforehand)
we want to write the much simpler:

...
score(name:"Row Your Boat")
part(title: "Flute", instrument: FLUTE, channel: 0)
phrase(startTime: 0.0)
addNoteList pitchArray, rhythmArray

All Score, Part and Phrase methods will implicitly refer to myScore, myPart and myPhrase. Note that you can still refer to them explicitly. Indeed, this will be necessary, as most scores have more than one Part and Phrase.

In this first instalment (of 2) we will not deal with line 5 above (the bare addNoteList call). Part 1 is actually the bulk of the work. Breathe deeply, as this will be the tough part.

We will use Groovy AST transformations for this. The Groovy compiler allows us to attach code to it while it is working, so we can manipulate the AST (Abstract Syntax Tree) of our code during most of the compilation phases. This means that we will need a separate program to attach to the compiler. So, if we step back, we now have 3 artifacts:

  1. The code to do the AST transformation (called during compilation)
  2. The core DSL implementation (with all the other stuff except AST transforms)
  3. Your music scripts with your score

So we need a kind of sub-project to handle this, as Groovy requires a separate jar with the AST transformation code. This separate jar must contain a descriptor file in the META-INF/services directory called

org.codehaus.groovy.transform.ASTTransformation

That is the name of the file (big one, eh?). Inside, it should have only one line: the fully qualified name of the class implementing the transformation (SimpleTransformation in our case).

OK, now we need to develop SimpleTransformation. This is not a trivial bit of code, so I will splash it here and then explain it line by line (only dealing with Scores – Parts and Phrases are similar):

@GroovyASTTransformation(phase=CompilePhase.CONVERSION)
public class SimpleTransformation implements ASTTransformation {
 
  public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {
    BlockStatement sblock = sourceUnit.getAST()?.getStatementBlock()
    List stmts = sblock.getStatements()
    int numStmts = stmts.size()
    for (int i = 0; i < numStmts; i++) {
      Class cls = stmts.get(i).getClass()
      if (cls == ExpressionStatement) { 
        Expression es = stmts.get(i).expression
 
        if (es.getClass() == MethodCallExpression) {
          String method = es.method.text
          if (method.equals("score")) {
              Expression e = transformBinary("myScore", es)
              stmts[i].setExpression(e)
          }
        }
      }
    }
  }
...

So

  • Lines 1-4 – Boilerplate of our class so that the Groovy compiler uses it. There is one important part here: the phase to which the code will attach. For now I am attaching to the conversion phase, but this might change in the future (I would like to do some type analysis, but I am not even sure that is possible with Groovy; if it is, then that analysis would have to be done at a later phase).
  • 5-6 – We get the statements of the script that we are compiling.
  • 7-8 – Here we iterate through all statements. Note the for loop rather than an each/closure. I do this because I might want to change the statement list (like adding stuff at the end, e.g. prints). That is not so easy with each/closures.
  • 10-11 – We only keep expression statements. This means we ignore fors, ifs, switches, function definitions, … We are not going deep, just changing method calls at the top level of the code.
  • 13-17 – If it is a method call and the method name is score, then we apply our transformation (line 16) and replace the expression (line 17).

Our transformation is:

  BinaryExpression transformBinary(String var, Expression expression) {
    BinaryExpression newExp = new BinaryExpression(
      new VariableExpression(var), new Token(100, "=", 1, 1), expression)
    return newExp
  }

OK, here the bulk of the work is done: We create a new BinaryExpression composed of a Variable (called myScore in our case - as per the code above), a Token and then we attach the old expression, as is. So score(name:"Row Your Boat") becomes myScore=score(name:"Row Your Boat").

Now, a confession. The 100 in the Token was a reverse engineering of an expression. I do not know where the table of options for token types is (If you know, please tell).
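
For reference, the 100 appears to match the EQUAL constant in org.codehaus.groovy.syntax.Types, which seems to be where the token types are listed. A minimal sketch of the same transformation without the magic number, under that assumption (assignTo and the hand-built score call are names made up for this illustration, and the arguments of score are omitted):

```groovy
import org.codehaus.groovy.ast.expr.ArgumentListExpression
import org.codehaus.groovy.ast.expr.BinaryExpression
import org.codehaus.groovy.ast.expr.Expression
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codehaus.groovy.ast.expr.VariableExpression
import org.codehaus.groovy.syntax.Token
import org.codehaus.groovy.syntax.Types

// Hypothetical variant of transformBinary: builds "<var> = <expression>"
// using the named token type instead of the literal 100.
BinaryExpression assignTo(String var, Expression expression) {
    Token assign = Token.newSymbol(Types.EQUAL, -1, -1)   // -1, -1: no source position
    return new BinaryExpression(new VariableExpression(var), assign, expression)
}

// Hand-built stand-in for the parsed call score() (arguments left out for brevity)
Expression call = new MethodCallExpression(
    new VariableExpression('this'), 'score', new ArgumentListExpression())

BinaryExpression assignment = assignTo('myScore', call)

assert Types.EQUAL == 100                 // the assumption stated above, checked at runtime
println assignment.operation.text         // operator symbol of the new expression
println assignment.leftExpression.name    // variable being assigned to
```

If the assert fails on your Groovy version, the assumption about the constant does not hold there and the literal from the original code is the safer choice.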

You will need a few imports to do the above, by the way

import org.codehaus.groovy.ast.*
import org.codehaus.groovy.ast.expr.*
import org.codehaus.groovy.ast.stmt.*
import org.codehaus.groovy.control.*
import org.codehaus.groovy.syntax.Token
import org.codehaus.groovy.transform.*

With all this you now create a jar that will have to be on the classpath of the Groovy compiler. So, this code will be used by the Groovy compiler to manipulate the AST.

Note that this code is pretty basic: It will not recurse through for/switch statements, will not go in closures, functions, etc. It will also only look at the first token in a method call expression. I will deal with this in time (not in the second part of this article). For now it is good for illustrative purposes and good for my personal needs.

Some final notes...

You can inspect an AST from groovyConsole (helps a lot), here is an example for sc=score(name:"Row Your Boat"):

[Screenshot: AST viewing with groovyConsole]

Another point is that this kind of transformation is a bit heavy on theory and heavy in approach. For instance, it was difficult to set up a project architecture in NetBeans that would allow an easy build (an agile cycle of develop/build/test). This is of course because part of the code has to be hooked into the compiler, and IDEs are not normally set up to do that. It is a bit like compiling part of the compiler before getting to the actual code. Well, I finally switched to emacs+gradle. Any excuse to stop using Oracle software (which NetBeans nowadays is) is fair game for me.

In the second instalment we will trap method calls like addNoteList, so that addNoteList listOfNotes becomes myPhrase.addNoteList listOfNotes (like line 5 of the initial example above). In this case we will use some introspection to determine the method names of Score, Part and Phrase. The second part will be cooler, as the bulk of the boilerplate work was done here.

You can find the code on Launchpad. Note that it is still in its early stages.

Comments and improvements will be most appreciated!

I am starting to play (pun intended) with jMusic. I am just learning the basics of music composition. jMusic is quite cool, but having the usual Java overhead makes things oh so boring! Therefore I am starting to develop a DSL in Groovy to write some scores. Score was the very first class to be DSLed, with something of a trivial nature: just replacing this

score = new Score("My new Score")

with:

score = score(name: "My new Score")

and the same for Note and Phrase. Furthermore, I would like to avoid all the usual “import everything”.

Solution? Create a class with a static method that accepts an environment to which I add the necessary functions. So, in an external file I have class Music to do just this:

class Music implements JMC {
  static init(env){
 
    env.score = { Map args ->
      Score score = new Score(args["name"])
      return score
    }
 
    ProgramChanges.fields.each {env."$it.name"=ProgramChanges."$it.name"}
    Durations.fields.each {env."$it.name"=Durations."$it.name"}
    Pitches.fields.each {env."$it.name"=Pitches."$it.name"}
 
    ...

So, in lines 4-7 I am creating a new property holding a closure. The closure accepts a map of parameters and creates a new Score object, which is returned. Similar things exist for Note and Phrase.

Now look at lines 9-11. I am importing all fields of those classes into the environment namespace. “That is namespace pollution”, I hear you say. Well, maybe, but it happens to be a jMusic design philosophy (you will find it in many classes of jMusic) and, at least for now, it is pretty manageable. If it becomes problematic, this can always be changed. Writing CLARINET sounds better than writing ProgramChanges.CLARINET, or even creating an INSTRUMENT property/class in the environment to hold all instruments. This is particularly useful with Pitches and Durations (because we tend to write A LOT of these). A simple script looks like this for now:

Music.init(this)
score = score(name:"Row Your Boat")
flute = part(title: "Flute", instrument: FLUTE, channel: 0)
trumpet = part(title: "Trumpet", instrument: TRUMPET, channel: 1)
clarinet = part(title: "Clarinet", instrument: CLARINET, channel: 2)
int[] pitchArray = [C4,C4,C4,D4,E4,E4,D4,E4,F4,G4,
		    C5,C5,C5,G4,G4,G4,E4,E4,E4,
		    C4,C4,C4,G4,F4,E4,D4,C4]
double[] rhythmArray = [ C, C,CT,QT, C,CT,QT,CT,QT, M,
			QT,QT,QT,QT,QT,QT,QT,QT,QT,QT,
			QT,QT,CT,QT,CT,QT, M]
 
phrase1 = phrase(startTime: 0.0)
phrase1.addNoteList pitchArray, rhythmArray
...

Nothing particularly fantastic, but somewhat less clutter.
This is version 0.0.0.0.0.1 pre-pre-pre-alpha. ;)
Watch this space for newer versions (more useful).
For now this serves to show two very basic DSL techniques with Groovy: adding methods to the environment and inspecting classes to copy fields.
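
The two techniques can also be tried without jMusic on the classpath. The following is only a sketch under that assumption: MiniDsl and circle are made-up names, and java.lang.Math stands in for the jMusic constant holders (Pitches, Durations, ...).

```groovy
// Hypothetical mini-DSL standing in for the Music class
class MiniDsl {
    static void init(env) {
        // Technique 1: add a "method" to the environment by storing a closure in it
        env.circle = { Map args -> [kind: 'circle', radius: args['radius']] }

        // Technique 2: inspect a class and copy its public fields (PI, E, ...) across
        Math.class.fields.each { env."$it.name" = Math."$it.name" }
    }
}

MiniDsl.init(this)                      // "this" is the running script
def c = circle(radius: 2)               // resolved through the closure stored above
def area = PI * c.radius * c.radius     // PI was copied from Math
println "area = $area"
```

As in the music script, nothing is imported by the script itself: both circle and PI only exist because init put them into the script's environment.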

I am currently in the process of ensuring that two of my bioinformatics applications are multi-platform: newAge, a simulator of age-structured populations, and interPopula, a library to access the HapMap project. I am also responsible for a small part of the Biopython project. I have only been concerned with Linux and Windows (I do not have a Mac, but Linux stuff seems to work there, as the Mac has a *nix base). I would like to share my experiences here, maybe for the benefit of others.

The overall experience has been quite positive. I do have a strong Java/JVM background and it seems to me that Python is almost as much write-once, run-anywhere. At least if “anywhere” means old-fashioned computer platforms. I would split the issues as follows:

  1. Python code – I have not had a single problem to report. My code includes sub-process management and file system access. The only thing where some care was needed is the use of os.sep (so that directory + os.sep + file yields either directory/file or directory\file). I also maintain Java/JVM applications and I do remember having more problems than this when ensuring cross-platform work (see more below); unfortunately I have forgotten the precise problem, so I cannot document it.
  2. GUI (wxPython) – Here there are indeed some minor problems. The semantics of the API seem slightly different (e.g. Skip methods in events), or at least the Windows implementation might be buggy, as the same event is called twice if there is a .Skip call. There are also some minor layout issues, but those were to be expected.
  3. External expectations – matplotlib can rely on LaTeX to pretty-print text (formulas and such), and one of my scripts did exactly that. Well, in most *nixes LaTeX is around, but not so much on Windows, so there was a subtle, slightly hidden dependency on LaTeX. With most other libraries there were no big problems to be found (e.g. NumPy).
  4. The database API – This officially sucks! The problem is that the placeholder syntax for parametrized SQL is not standard. For instance, with SQLite one writes “select column1 from table where column2=?” to be able to parametrize the value for column2, but with psycopg (PostgreSQL) you have “select column1 from table where column2=%s”. At least there is still a positional version in both cases. Even if you write standardized SQL (and it is possible to write SQL that works in many different flavours of servers) you will end up writing different versions for different drivers because of the non-standardization of parameters.

I also happen to be deploying a Java Web Start based application (Groovy+JVM), namely ogaraK, a simulator of malaria population genetics. Multi-platform is not that easy if you have a Swing GUI. The semantics of window sizing operations differ slightly on the Mac. Also, some components (like HTML rendering widgets) are buggy in the Linux OpenJDK. Generating Java 6 .class files and then having recent Macs fail because they only go up to 1.5 is irritating. While some newer versions of Mac OS X do indeed support Java 6, it does not seem realistic for now to go with anything above 1.5 if Mac support is desired :( .

All in all, Python fares pretty well as long as database stuff is not involved. I would dare to say that multi-platform GUI development with wxPython is slightly easier than with Java Swing.

PS – Bias disclaimer: I am a strong supporter of the JVM platform (as long as the word Oracle is not included), much more than of Python (in fact a big part of my Python usage is on top of the JVM via Jython). So my “hidden agenda”, if there was one, would be pro-JVM.

ogaraK

I would like to announce ogaraK, a simulator of malaria population genetics.

ogaraK is a Java Web Start application developed in Groovy and Java which allows the simulation of parasite population genetics. It can be used, for instance, to compare the effects of different drug deployment policies.

A set of Python scripts is made available to analyze the results (mainly frequencies of resistance loci over time).

ogaraK is free software (GPL v3).

ogaraK can also be used to simulate some of the common theories about sex: Epistasis, Red-Queen and spatial heterogeneity (but not those based on size, as the underlying model has no concept of population numbers).

I would like to ask the reader for some basic help: If you can, could you please test the application (A single click to run, if you have Java 1.5+)? I know most people will not understand the results or the parameters, but just a simple run would help (and reporting back if something goes wrong!). The objective is to make the application available to epidemiologists, and due to their lack of IT knowledge, a robust application is needed. Any comments would really be appreciated (I make no financial profit from this free application). If you know Java Web Start, then if you could activate the console and return any error detected (even from a blind execution), that would be most appreciated.

Preamble: The problem with writing a defense of a certain thing X is that most people interpret it as an attack on potential alternatives. This is not how this post should be read. There is no One Single Solution. My defense of Groovy is based on a set of assumptions that do not hold true for many people. In fact, they do not even always hold true for myself. Different people and different development problems entail different solutions.

My main assumption in defense of Groovy is that you are a Java person. Java is your day-to-day programming language and you are comfortable with it. Though you are comfortable with it, you want to try something different: maybe you want to try a scripting language, maybe you do not like too much boilerplate code.

Why Groovy? Because it gives you a lot of goodies with essentially no learning curve. An illustrative example: I’ve spent a whole morning thinking that I was editing a Java source file, when I was in fact working on a Groovy file. You see, Groovy, while not being a superset of Java, ends up being almost that: code in Java is, in most cases, already code in Groovy. So, if you know Java, you can write Groovy. You will not write the best idiomatic Groovy, you will not gain any of Groovy’s goodies (and you will pay a performance penalty, BTW). But this is an amazing head start if you want to go in the direction of higher-level languages (I would argue that it is even smoother than going from C to C++, as the paradigm does not change from Java to Groovy: it is OO to OO). If you go the Jython way, then you have to learn a new language. If you go Scala or Clojure, then you have to learn a whole new paradigm (and deal with the impedance mismatch between the imperative semantics of typical Java libs and standard functional semantics).
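
To make the "Java is already Groovy" point concrete, here is a small sketch (the instrument names are just sample data): the first half is plain Java-style code that a Groovy script accepts as is, the second half is a more idiomatic Groovy version of the same thing.

```groovy
// Java-style code, pasted unchanged into a Groovy script
// (generics, semicolons and the classic for-each loop included)
List<String> instruments = new ArrayList<String>();
instruments.add("flute");
instruments.add("trumpet");
List<String> javaStyle = new ArrayList<String>();
for (String name : instruments) {
    javaStyle.add(name.toUpperCase());
}

// The same result in more idiomatic Groovy
def groovyStyle = ['flute', 'trumpet']*.toUpperCase()

assert javaStyle == groovyStyle
println javaStyle
```

Both halves live happily in the same file, which is what makes a gradual move from one style to the other possible.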

For little to no cost you now have many of the goodies associated with scripting languages (dynamic, low-boilerplate coding, DSLs, better meta-programming, …). In some cases Groovy even out-competes supposedly more elegant languages (are Scala's meta-programming facilities still as bad as in the past?).

Another interesting advantage of Groovy is that, if you want to go back to Java, it is much easier. Why would you want to do that? Well, for performance reasons. In a Groovy application that I have, a small part of the code is extremely intensive, so I had to rewrite it in Java. This turned out to be a trivial exercise (similar syntax, similar semantics).

Again, let me stress this: your requirements and your personal path are fundamental in any decision you take. There is no one true language (OK, Prolog… ;) ). Different people, different approaches. All I am saying is: if your background is strongly grounded in Java and you feel comfortable with that, then Groovy is probably the way to go.

Disclaimer: While I have a couple of applications made in Groovy, most of my scripting efforts in the JVM world involve Jython (another fine language implementation, appropriate in a different set of circumstances), and I also believe that, from a declarative and highly expressive language point of view, what is being done with Clojure certainly deserves mention.

If you search the web you can find some discussions on whether IDEs for dynamic languages can be as helpful as IDEs for static languages. The issue is that static languages like Java have compile-time (thus easy to get at IDE-time) information with which to provide that fundamental code-completion functionality (among many others). If the IDE knows that a certain parameter is a String, then it is simple: it will present you all the String methods when you type the dot. For dynamic languages things get more complex, as there is formally (by definition) no compile-time information. Some people would argue that there are ways around it (which you can already find in existing IDEs; I remember having some sort of code completion, years ago, in SPE, for Python). I will not add anything to that discussion here; this preamble was mainly for putting the reader in context. I am more interested in discussing good IDEs for DSLs.

With DSLs you get, most of the time, added syntax. Worse than that, you might fall into situations where you have changed (not only added to) the initial language syntax; furthermore, those syntax changes might even become valid only at runtime (imagine that a method is added to a class that is supplying DSL methods).
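
A small Groovy illustration of a call that is only valid in part of a script, here using a category (ShoutCategory and shout are names made up for this sketch):

```groovy
// Hypothetical category adding a shout() method to String
class ShoutCategory {
    static String shout(String self) { self.toUpperCase() + '!' }
}

// Inside the use block the extra method exists...
def inside = null
use(ShoutCategory) {
    inside = 'row your boat'.shout()
}

// ...but a few lines below, the very same call is no longer valid
def validOutside = true
try {
    'row your boat'.shout()
} catch (MissingMethodException e) {
    validOutside = false
}

println "inside: $inside, valid outside: $validOutside"
```

An IDE that wants to complete shout after the dot has to know which block the cursor is in, and which categories are active there.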

One example comes from Ioke and Prolog, whose operator precedence and associativity rules are changeable (see the previous post). It is not trivial to know if something like 1+2 is even syntactically valid (*). Even if it is syntactically valid, things like associativity rules might change. In languages like Groovy you can add methods (e.g., through categories) to code blocks (from classes that can be dynamically changed). Then there is dynamic dispatching and macros. What is valid in a certain piece of code can be different from what is valid a few lines below. In fact, complete information about what is valid in a certain code block might require code execution. Or, to put it another way, it might be very difficult to have a completely helpful IDE! In this scenario there are 3 considerations that I think are worth making:

1. One should not be discouraged by not having perfect solutions. Maybe it is not possible to determine all that can be expressed in a certain code block, but sometimes good approximations are enough.
2. On this issue, one good example comes from Prolog: in Prolog, syntax can be changed mainly through the use of the :- op directive (and through asserts and retracts). The :- op directive changes operators but is very easy to analyze before compilation/interpretation. So, the way DSLs are normally constructed there lends itself very easily to code analysis, which can be used by IDEs. This is unfortunately not the case in most real-world languages.
3. It would be cool to have a language where DSL specifications could be automatically used to construct IDEs. The current real-world DSL-able languages (Ruby, Groovy, …) are DSL-enabled through indirect techniques which can be used to build DSLs (dynamic reception, operator overloading, whatever); in fact, many of these techniques exist for purposes other than creating DSLs. If there was a declarative and explicit way to create DSLs, that information could be used to inform IDEs on parsing and other issues. An embedded, core way to explicitly specify DSLs.

(*) I suppose some will see this as an argument for the fact that you can do pretty stupid (or at least unintuitive) things with DSLs. Well, you can do stupid things with everything. The question is not if you can or not, but the extent of bad use cases and how bad uses can creep in easily. Another (interesting) discussion, but not for now.

Before I start, please remember the finesse of numbers in Groovy: 0.1 is a BigDecimal; if you want a Double, you have to write 0.1D.

Also, I might be seeing something completely wrong here, corrections are more than welcome!

So, what is the result of the code below?


List<Double> lst = [0.1, 0.1D]
println lst[0].class
println lst[1].class
println 0.1 == lst[0]
println 0.1 == lst[1]
println 0.1 in [lst[0]]
println 0.1 in [lst[1]]

Well, in my book the interpreter should whine on the first line and stop. I am declaring a List of Doubles and putting a BigDecimal in. But it doesn’t. I suppose this is either a bug or some type confusion coming from the (for me) not very clear way Groovy handles types: if I say the type of lst is a List of Doubles, I expect it to behave statically. Either that, or the language is misleading me by allowing me to specify the type and then ignoring it. Not good.
So, the result:


class java.math.BigDecimal
class java.lang.Double
true
true
true
false

Note that 0.1 is equal to 0.1D (i.e. BigDecimal is equal to Double. For me it makes sense as they have the same value) BUT 0.1 is not in [0.1D]. This, I suppose can only be categorized as a bug (or as something completely unintuitive).
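
A sketch that narrows the behaviour down, under the assumption that == compares the two numbers by value while in on a list ends up in contains, which relies on Java's equals:

```groovy
def big = 0.1     // BigDecimal literal
def dbl = 0.1D    // Double literal

assert big == dbl                   // Groovy == compares the numeric values
assert !big.equals(dbl)             // Java equals: a BigDecimal never equals a Double
assert !([dbl].contains(big))       // contains relies on equals, hence the false
assert !(big in [dbl])              // same outcome as in the output above
assert [dbl].any { it == big }      // value-based membership test as a workaround

println 'all checks passed'
```

If that assumption is right, the two results are consistent with each other, even if the combination remains unintuitive.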

I understand that numbers are not an easy thing to address (precision vs efficiency), but this strikes me as unintuitive on 2 fronts (type declaration and number/equality behavior).

Correct me if I am wrong (I can see myself doing a big blunder with equality operator semantics, but I have trouble accepting that groovy lets me put a BigDecimal inside a list of double)…

When writing Groovy scripts I sometimes need to redirect the output of a part of the computation that is going to standard out. A possible solution would be to open a new Writer and change the code to write to it (i.e. replacing all prints with newStream.prints); this, of course, requires changing all prints, which is cumbersome and boring. There is a lazy alternative, using method references:

s = new PrintStream(new FileOutputStream("/tmp/myOut"))
def print = s.&print          // from here on, print "..." goes to the file
print "a"
print = System.out.&print     // point print back at standard output
print "b"

In this case the a is written to /tmp/myOut and the b goes to standard output again. The big gain: all those prints in a script (and printlns) don’t need to be changed! Lazy me is happy.

Caveat: I would be careful about using this strategy a lot, as it is very easy to lose track of what is happening to the output. But it can be quite an expedient way to redirect prints in simple scripts.

During my “silent months” (for details see this post) I’ve been developing a simple system to study the spread of antimalarial drug resistance. It is a “typical” scientific application with a core (which simulates genetic recombination of individuals reproducing) that is computationally very demanding.

As is common in these scenarios, I started by developing a prototype in a high-level, declarative language (in my case Groovy). I was pretty sure that the first solution would be slow as hell, and that part of that slowness would be due to using a “scripting” language (although algorithm complexity is the main cause of slowness, changing the language should at least get running times down by 1 order of magnitude). The initial solution was in fact slow. So I proceeded to do the usual thing: identify the expensive part (easy in my case) and rewrite that part in Java. My intention was to end up with a typical hybrid system: core, computationally intensive code in Java and high-level functions in Groovy, for easy and productive manipulation.

Converting from Groovy to Java is easy; in fact, it is too easy. The final Java code was full of Groovyisms: leftover generics code (things like Map<String,List<Integer>>) and strange-looking (from a Java perspective) code originating from .each constructs, among other things.

Needless to say, there was not much of a speed improvement. In order to improve things I started by trying to make sure that the data structures underlying List<> had the required complexity for my most used operations. Not much improvement. I then decided to completely convert things like List<List<Integer>> to the typical Java int[][]. Spaghetti and semantic chaos followed (just think of the not-so-minor differences in semantics between lists of lists and [][]).

Being a member of the fundamentalist church of refactoring, I decided to do the unthinkable: throw the code away and rewrite it from scratch. I would rewrite the whole code, starting from the core in Java, in a Java-idiomatic way targeting performance. Then, on top of that, I would grow a set of Groovy wrappers in order to easily manipulate the said core. Worked perfectly! Actually, I am running that code in the background (on an Asus EEE) as I write this.

The (somewhat elusive) lesson that I took from this is that going from prototype to production code, when the fundamental difference is performance, can be cumbersome if the prototype language is too close to the production language (and Groovy and Java are close enough). The temptation to do a line-by-line code conversion is too good for comfort (I actually did rename the computationally intensive .groovy to .java and translated line by line; feel free to call me silly) and can have very upsetting results.

First a personal note: I did not write (or do any “things on the Internet”) for the best part of 2008. Although part of it was due to a busy schedule, most of it was due to illness (being obsessive-compulsive has some strange consequences). I finally decided to tackle my health issue (which is solvable, at least in my case).

Anyway, I’m still working in computational biology, still working with malaria, and I am still working with Groovy. So… Let's get back to the usual topics…

Before I start, a caveat: “over-engineering”, as used below, should not be seen as scornful. We all know that traditional OO languages and libraries try to be as general-purpose as possible and deployable in industrial software processes. In that setting, languages and libraries which present themselves in a typical OO setting are, understandably, “over-engineered”.

The so-called scripting languages (for lack of a better word, let's stick to it) are supposedly more productive than traditional languages (especially in small to medium size projects). Languages like Java are “over-engineered” beasts, seen as general-purpose, “industrial”, heavy-duty. Our beloved scripting languages fit our brain, they are agile, we can be highly productive, write fewer lines of code, accomplish more, be more declarative.

Can we really? Let's consider a subset of those languages, those like Groovy or Scala which were developed for the JVM (or all languages that were ported to the JVM). One of the pluses of these languages is that they can use the whole JVM ecology of libraries. The problem is that most of those libraries are developed with a Java mentality (i.e., they are over-engineered). An example:

The fantastic JFreeChart library produces high-quality 2D charts of all kinds. It has all the flexibility that we expect from the typical Java library: you can do everything. In the Groovy landscape there is also a Builder for it, groovychart, of which I am a minor author. But, whenever I want to plot a chart, my first impulse is to use the (also great) matplotlib (CPython based). Why? Because plotting a line chart in matplotlib takes 3 lines, which I remember without going to the documentation:

from pylab import *
plot([1,2,3])
show()

Really, it worked at the first attempt.

In groovychart? I am not even sure of the whole process, but it involves starting Swing, preparing the dataset, choosing the chart, … And again, I am one of the authors, I do groovycharts every day, but I still need to go to a template to do something that takes 3 lines in matplotlib. Matplotlib fits my brain, groovychart doesn’t.

While having all the Java libraries at hand is obviously a good thing, there needs to be a “scriptization” of many of those libraries. There is a need for interfaces that “fit the brain”. Groovy + JVM libraries only tackles half of the problem. Even JVM libraries with a Groovy idiom (like groovychart) don’t address the “over-engineering” problem. What is needed, in my view, are wrappers which are not only Groovy-idiomatic but also Groovy-philosophical: they fit the brain (wrappers which allow plotting a simple line chart in 3 lines).

This can actually be seen in Groovy itself for IO and many data structures: some core Java libraries are well covered and are already available through a “fit-your-brain” interface. Hopefully we will see more interfaces like this for many existing libraries (and fewer like groovychart, which are only idiomatic wrappers).

PS – Another way to tackle “over-engineering” problems comes from good IDEs and, in fact, the average modern Java IDE goes to great lengths in reducing “over-burden”.