

DSLs and IDEs

If you search the web you can find discussions on whether IDEs for dynamic languages can be as helpful as IDEs for static languages. The issue is that static languages like Java have compile-time (and thus easy to get at IDE-time) information that supports that fundamental code-completion functionality (among many others). If the IDE knows that a certain parameter is a String, then it is simple: it will present you all the String methods when you type the dot. For dynamic languages things get more complex, as there is formally (by definition) no compile-time information. Some people would argue that there are ways around it (which you can already find in existing IDEs: I remember having some sort of code completion, years ago, in SPE for Python). I will not add anything to that discussion here; this preamble is mainly meant to put the reader in context. I am more interested in discussing good IDEs for DSLs.

With DSLs you get, most of the time, added syntax. Worse than that, you might end up in situations where you have changed (not only added to) the initial language syntax; furthermore, those syntax changes might only become valid at runtime (imagine that a method is added to a class that is supplying DSL methods).

One example comes from Ioke and Prolog, whose operator precedence and associativity rules can be changed (see the previous post). It is not trivial to know if something like 1+2 is even syntactically valid (*). Even if it is syntactically valid, things like associativity rules might change. In languages like Groovy you can add methods (e.g., through categories) to code blocks (from classes that can be dynamically changed). Then there is dynamic dispatching, and there are macros. What is valid in a certain piece of code can be different from what is valid a few lines below. In fact, complete information about what is valid in a certain code block might require code execution. Or, to put it another way, it might be very difficult to have a completely helpful IDE! In this scenario there are 3 considerations that I think are worth making:

1. One should not be discouraged by not having perfect solutions. Maybe it is not possible to determine all that can be expressed in a certain code block, but sometimes good approximations are enough.
2. On this issue, one good example comes from Prolog. In Prolog, syntax can be changed mainly through the :- op directive (and through asserts and retracts). The :- op directive changes operators, but it is very easy to analyze before compilation/interpretation. So, the way DSLs are normally constructed lends itself very easily to code analysis that IDEs can use. This is unfortunately not the case in most real-world languages.
3. It would be cool to have a language where DSL specifications could be automatically used to construct IDEs. The current real-world DSL-capable languages (Ruby, Groovy, …) are DSL-enabled through indirect techniques that can be used to build DSLs (dynamic method reception, operator overloading, whatever); in fact, many of these techniques exist for purposes other than creating DSLs. If there were a declarative and explicit way to create DSLs, that information could be used to inform IDEs about parsing and other issues: an embedded, core way to explicitly specify DSLs.
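Point 2 can be illustrated with a small sketch. Assuming op/3 directives are written in the simple form `:- op(Priority, Type, Name).`, with a single unquoted atom rather than a list of names, a tool could collect the operator table straight from the source text with a regular expression, without running any Prolog. The class and method names (`OpScanner`, `scan`) are hypothetical, and a real IDE would need a proper Prolog tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpScanner {
    // One operator declaration as found in the source text.
    record OpDecl(int priority, String type, String name) {}

    // Matches only the simple form ":- op(Priority, Type, Name)." (assumption:
    // one atom per directive, no lists of names, no comments inside the directive).
    private static final Pattern OP_DIRECTIVE = Pattern.compile(
        ":-\\s*op\\(\\s*(\\d+)\\s*,\\s*(xfx|xfy|yfx|fy|fx|xf|yf)\\s*,\\s*([^\\s,()]+)\\s*\\)\\s*\\.");

    // Static analysis only: the program is never executed.
    static List<OpDecl> scan(String source) {
        List<OpDecl> ops = new ArrayList<>();
        Matcher m = OP_DIRECTIVE.matcher(source);
        while (m.find()) {
            ops.add(new OpDecl(Integer.parseInt(m.group(1)), m.group(2), m.group(3)));
        }
        return ops;
    }

    public static void main(String[] args) {
        String program = ":- op(700, xfx, in).\n"
                       + ":- op(650, xfx, =>).\n"
                       + "member_of(X, L) :- X in L.\n";
        for (OpDecl op : scan(program)) {
            System.out.println(op);   // e.g. OpDecl[priority=700, type=xfx, name=in]
        }
    }
}
```

An IDE could feed a table like this into its parser and completion engine, which is the kind of use that point 3 wishes were available for DSLs in general.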

(*) I suppose some will see this as an argument that you can do pretty stupid (or at least unintuitive) things with DSLs. Well, you can do stupid things with everything. The question is not whether you can, but how widespread the bad use cases are and how easily bad uses can creep in. Another (interesting) discussion, but not for now.

Posted in Languages, Programming.

Operators in Ioke and Prolog and DSLs

I was reading Ola Bini’s post about operators in Ioke (Ioke being the new language that Ola is developing).

It is a common saying among Lispers that everything being done in “modern” languages is a return to Lisp. And the argument holds some ground. The truth is, among the 4 most conceptually influential programming languages that I can think of (Lisp/Functional, Fortran/Imperative, Smalltalk/OO, Prolog/Logic), the bad option (Fortran) won, as it is the major philosophical contributor to current programming languages (much more than Smalltalk).

Take the reinvention of operators in Ioke, as per the post above. This concept has been available in Prolog for decades. It is all there: precedence (i.e., 2*3+4 means (2*3)+4 and not 2*(3+4)); associativity (left or right, i.e., 3-2-1 is (3-2)-1 = 0 and not 3-(2-1) = 2). And even more, as new operators can be defined and can be made of alphanumeric characters (want to create a new operator called, say, “in”? Go ahead). In fact, people were doing DSLs a long time ago (in the small Prolog community at least) using techniques such as these.
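The precedence and associativity rules above can be sketched as a tiny precedence-climbing parser whose operator table is filled at run time, loosely in the spirit of Prolog’s op/3. This is an illustration under simplifying assumptions, not how Prolog or Ioke actually parse: tokens must be separated by spaces, only infix operators are supported, and a higher number binds tighter (Prolog uses the opposite convention). `OpParser` and `defineOp` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OpParser {
    enum Assoc { LEFT, RIGHT }
    record Op(int precedence, Assoc assoc) {}

    private final Map<String, Op> ops = new HashMap<>();
    private List<String> tokens;
    private int pos;

    // Declare (or redeclare) an infix operator; here a HIGHER number binds tighter.
    void defineOp(String name, int precedence, Assoc assoc) {
        ops.put(name, new Op(precedence, assoc));
    }

    // Returns a fully parenthesized rendering of the parse (tokens separated by spaces).
    String parse(String input) {
        tokens = List.of(input.trim().split("\\s+"));
        pos = 0;
        return parseExpr(0);
    }

    // Precedence climbing: keep consuming operators whose precedence is >= minPrec.
    private String parseExpr(int minPrec) {
        String left = tokens.get(pos++);   // an operand: number or name
        while (pos < tokens.size()) {
            Op op = ops.get(tokens.get(pos));
            if (op == null) throw new IllegalArgumentException("unknown operator: " + tokens.get(pos));
            if (op.precedence() < minPrec) break;
            String name = tokens.get(pos++);
            // Left-associative: the right operand may only contain tighter operators.
            int nextMin = op.assoc() == Assoc.LEFT ? op.precedence() + 1 : op.precedence();
            String right = parseExpr(nextMin);
            left = "(" + left + " " + name + " " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        OpParser p = new OpParser();
        p.defineOp("+", 10, Assoc.LEFT);
        p.defineOp("-", 10, Assoc.LEFT);
        p.defineOp("*", 20, Assoc.LEFT);
        p.defineOp("in", 5, Assoc.LEFT);              // an alphanumeric operator
        System.out.println(p.parse("2 * 3 + 4"));     // ((2 * 3) + 4)
        System.out.println(p.parse("3 - 2 - 1"));     // ((3 - 2) - 1)
        System.out.println(p.parse("x in xs"));       // (x in xs)
        p.defineOp("-", 10, Assoc.RIGHT);             // change associativity at run time
        System.out.println(p.parse("3 - 2 - 1"));     // (3 - (2 - 1))
    }
}
```

Note that on a fresh parser `parse("1 + 2")` throws, because `+` has not been declared yet: whether 1+2 is even valid depends on declarations made elsewhere.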

The next thing that you will need (and we are getting there with macros and AST access) is no default interpretation. This is especially important with arithmetic. Let me give an example:

Imagine the expression 1+x. Most languages will evaluate this expression and return the sum of 1 and x. If x is defined and is, say, 4, then 1+x is 5. If x is not defined, then an error (at compile time or run time) will be raised. This is an absolute disgrace for DSLs, which are essentially declarative (i.e., detached from semantics). “1+x” might be something that you want to evaluate now (and get the result), or it might be something that you want to specify in order to evaluate later (say, I want to draw a chart of all values of x between 1 and 5, or I want to differentiate it). Look at this pseudo-code:

Var x
Exp expression = 1 + x**2

chart(expression, [[x,[1, 5]]]) //do a chart, x between 1 and 5
evaluate(expression, [[x,3]]) //Evaluate expression where x is 3 (i.e. 10)
diffe = differentiate(expression, x) //returns the expression 2*x
prettyprint(expression) //Pretty prints the expression.

Most people automatically associate 1+x**2 with the operation of evaluating it. That might be so in an imperative world (can I call it a shitty world?). But in a declarative/DSL world, 1+x**2 is just that, an expression; it has no meaning attached per se. What you do with it depends on the context: pretty print it, differentiate it, integrate it, or even evaluate it by instantiating x to 3 and getting the “precious” 10.
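To make the “expression as data” idea concrete, here is a minimal Java sketch in which 1 + x**2 is built as a small tree and only gets a meaning from the function applied to it (pretty printing, evaluation, or symbolic differentiation). The types and functions are hypothetical stand-ins for the pseudo-code above; exponents are restricted to integer constants, and simplification only removes trivial 0 and 1 terms.

```java
import java.util.Map;

public class SymbolicExpr {
    // An expression is plain data; nothing is evaluated when it is built.
    sealed interface Expr permits Num, Var, Add, Mul, Pow {}
    record Num(double value) implements Expr {}
    record Var(String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}
    record Pow(Expr base, int exponent) implements Expr {}   // integer exponents only

    private static boolean isNum(Expr e, double v) {
        return e instanceof Num n && n.value() == v;
    }

    // Smart constructors that drop trivial 0 and 1 terms.
    static Expr add(Expr a, Expr b) {
        if (isNum(a, 0)) return b;
        if (isNum(b, 0)) return a;
        return new Add(a, b);
    }
    static Expr mul(Expr a, Expr b) {
        if (isNum(a, 0) || isNum(b, 0)) return new Num(0);
        if (isNum(a, 1)) return b;
        if (isNum(b, 1)) return a;
        return new Mul(a, b);
    }
    static Expr pow(Expr base, int exponent) {
        if (exponent == 0) return new Num(1);
        if (exponent == 1) return base;
        return new Pow(base, exponent);
    }

    // Meaning 1: evaluation, given values for the variables.
    static double evaluate(Expr e, Map<String, Double> env) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Var v) {
            Double value = env.get(v.name());
            if (value == null) throw new IllegalArgumentException("unbound variable: " + v.name());
            return value;
        }
        if (e instanceof Add a) return evaluate(a.left(), env) + evaluate(a.right(), env);
        if (e instanceof Mul m) return evaluate(m.left(), env) * evaluate(m.right(), env);
        Pow p = (Pow) e;
        return Math.pow(evaluate(p.base(), env), p.exponent());
    }

    // Meaning 2: symbolic differentiation with respect to variable x.
    static Expr differentiate(Expr e, String x) {
        if (e instanceof Num) return new Num(0);
        if (e instanceof Var v) return new Num(v.name().equals(x) ? 1 : 0);
        if (e instanceof Add a) return add(differentiate(a.left(), x), differentiate(a.right(), x));
        if (e instanceof Mul m) return add(mul(differentiate(m.left(), x), m.right()),
                                           mul(m.left(), differentiate(m.right(), x)));
        Pow p = (Pow) e;   // power rule with chain rule: n * b^(n-1) * b'
        return mul(mul(new Num(p.exponent()), pow(p.base(), p.exponent() - 1)),
                   differentiate(p.base(), x));
    }

    // Meaning 3: pretty printing.
    static String pretty(Expr e) {
        if (e instanceof Num n) {
            double v = n.value();
            return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
        }
        if (e instanceof Var v) return v.name();
        if (e instanceof Add a) return pretty(a.left()) + " + " + pretty(a.right());
        if (e instanceof Mul m) return wrap(m.left()) + " * " + wrap(m.right());
        Pow p = (Pow) e;
        return wrap(p.base()) + "^" + p.exponent();
    }
    // Parenthesize compound operands of products and powers.
    private static String wrap(Expr e) {
        return (e instanceof Num || e instanceof Var) ? pretty(e) : "(" + pretty(e) + ")";
    }

    public static void main(String[] args) {
        Expr expression = add(new Num(1), pow(new Var("x"), 2));      // 1 + x^2, just data
        System.out.println(pretty(expression));                        // 1 + x^2
        System.out.println(evaluate(expression, Map.of("x", 3.0)));    // 10.0
        System.out.println(pretty(differentiate(expression, "x")));    // 2 * x
    }
}
```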

Update: I was rereading the post and noticed that it might be read as dismissing Ola’s work as less interesting. Not at all: I actually think the way forward is precisely improving the current “imperative” setting in the way Ola is doing.

Posted in Languages.

Groovy type craziness

Before I start, please remember the subtlety of numbers in Groovy: 0.1 is a BigDecimal; if you want a Double, you have to write 0.1D.

Also, I might be seeing something completely wrong here; corrections are more than welcome!

So, what is the result of the code below?


List<Double> lst = [0.1, 0.1D]   // declared as a List of Doubles
println lst[0].class
println lst[1].class
println 0.1 == lst[0]            // literal vs first element
println 0.1 == lst[1]            // literal vs second element
println 0.1 in [lst[0]]
println 0.1 in [lst[1]]

Well, in my book the interpreter should whine on the first line and stop. I am declaring a List of Doubles and putting a BigDecimal in. But it doesn’t. I suppose this is either a bug or some type confusion coming from the (to me) not very clear way Groovy handles types: if I say the type of lst is a List of Doubles, I expect it to behave statically. Either that, or the language is misleading me by allowing me to specify the type and then ignoring it; not good.
So, the result:


class java.math.BigDecimal
class java.lang.Double
true
true
true
false

Note that 0.1 is equal to 0.1D (i.e., a BigDecimal is equal to a Double; for me this makes sense, as they have the same value), BUT 0.1 is not in [0.1D]. This, I suppose, can only be categorized as a bug (or as something completely unintuitive).
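A possible explanation, which is my assumption about Groovy internals rather than something verified in its source: Groovy’s == on two numbers of different types compares them numerically after coercion, while `in` on a list ends up in Java’s `List.contains`, which uses `equals()`, and `BigDecimal.equals` returns false for anything that is not a BigDecimal. The Java sketch below only shows the underlying Java behavior (it does not run Groovy).

```java
import java.math.BigDecimal;
import java.util.List;

public class NumericEquality {
    public static void main(String[] args) {
        BigDecimal big = new BigDecimal("0.1");   // what Groovy makes of the literal 0.1
        Double dbl = 0.1d;                         // what Groovy makes of 0.1D

        // Numeric comparison after converting to a common type (double): true
        System.out.println(big.doubleValue() == dbl);
        // equals() is type-sensitive: a BigDecimal never equals a Double: false
        System.out.println(big.equals(dbl));
        // List.contains relies on equals(), so the BigDecimal is not "in" the list: false
        System.out.println(List.of(dbl).contains(big));
        // Converting the other way exposes the binary approximation stored in 0.1D: false
        System.out.println(new BigDecimal(dbl).compareTo(big) == 0);
    }
}
```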

I understand that numbers are not an easy thing to address (precision vs. efficiency), but this strikes me as unintuitive on two fronts (type declaration and number/equality behavior).

Correct me if I am wrong (I can see myself making a big blunder with equality operator semantics, but I have trouble accepting that Groovy lets me put a BigDecimal inside a List of Doubles)…
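On the type-declaration side, part of the story is that generic type arguments are erased at run time on the JVM, and (as far as I know) dynamic Groovy does not check the element type of a `List<Double>` declaration at all. The Java sketch below, with the hypothetical helper `smuggle`, shows that even in Java a BigDecimal can be slipped into a `List<Double>` through a raw reference, with the failure appearing only on read, and that `Collections.checkedList` is one way to get the insertion-time check I was expecting.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ErasureDemo {
    // Adds a BigDecimal to a List<Double> through a raw reference and returns the stored element's class.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Class<?> smuggle(List<Double> target) {
        List raw = target;                   // the compiler can only warn here
        raw.add(new BigDecimal("0.1"));      // a plain list performs no element-type check
        return raw.get(raw.size() - 1).getClass();
    }

    public static void main(String[] args) {
        // 1. A plain ArrayList accepts the BigDecimal: the type argument is gone at run time.
        List<Double> plain = new ArrayList<>(List.of(0.1d));
        System.out.println(smuggle(plain));  // class java.math.BigDecimal
        try {
            Double d = plain.get(1);         // the compiler-inserted cast fails only here
            System.out.println(d);
        } catch (ClassCastException e) {
            System.out.println("plain list: failed on read, not on insert");
        }

        // 2. A checked view enforces the element type when the element is added.
        List<Double> checked = Collections.checkedList(new ArrayList<>(), Double.class);
        try {
            smuggle(checked);
        } catch (ClassCastException e) {
            System.out.println("checked list: rejected on insert");
        }
    }
}
```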

Posted in Programming.

Why I like explicit typing

Some reasons in favor of explicit typing (I am not talking about static vs. dynamic here, just about the possibility to annotate the expected type of a certain thing):

1. It can serve as code documentation. In some cases, with some kinds of programmers, the only documentation that you have is the code itself (although for this to fully work, explicit typing would have to be compulsory, which is not always the case: Groovy, Caml and Scala are examples where, in many cases, stating the type is optional).

2. It helps IDEs help you. There is some discussion on whether IDEs for non-explicitly typed (actually dynamic) languages can be as helpful as IDEs for static languages. Well, if the information is there, then surely the IDE can use it and help you.

3. Bugs. Maybe your function is working when it should not be. Maybe an object that should never be passed to that function is responding just because there is a signature that matches. I find this pattern somewhat common: a) there is a function parameter (without an explicit type) in a buggy function call; b) I put the type on the called function; c) it immediately becomes clear that somewhere I am passing something that shouldn’t be going in in that form. A pseudo-code example:

myFunction(a, b) {
  String x = a + b
  print x.toUpperCase()
}

a should be a String (+ is a concatenation), but for some reason myFunction gets called with a as an integer and kaboom (+ is interpreted as addition).

This can be quite insidious with type inference (Caml for sure, probably Scala also), where you can get a bug in a chain of, say, myFunction3 calling myFunction2, while the bug is somewhere else (say, in a myFunction1 that also calls myFunction2): when the compiler reads myFunction1, it makes a wrong type inference about myFunction2. Afterwards, when the compiler goes over myFunction3, it complains, but the bug was caused elsewhere (so the information from the compiler is useless). If you put the type on myFunction2, the compiler will complain in the correct place (at the myFunction1 call). These bugs can be a pain to detect because sometimes the chains can be long. I had the “pleasure” of spending countless nights with Caml tracking down these bugs. That was 15 years ago, but I still remember.
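Point 3 can be made concrete in Java, with the caveat that Java has no dynamic `+`, so the hypothetical helper `dynamicPlus` imitates one: it adds two Integers and otherwise concatenates, deciding at run time. With untyped (`Object`) parameters the wrong call is accepted and blows up inside the function; with `String` parameters the same wrong call would be rejected by the compiler at the call site.

```java
public class ExplicitTyping {
    // Imitates a dynamic "+": integer addition for two Integers, string concatenation otherwise.
    static Object dynamicPlus(Object a, Object b) {
        if (a instanceof Integer x && b instanceof Integer y) {
            return x + y;
        }
        return String.valueOf(a) + b;
    }

    // Untyped version: any arguments are accepted; a wrong call fails inside, at the cast.
    static String myFunctionUntyped(Object a, Object b) {
        String x = (String) dynamicPlus(a, b);   // ClassCastException when a and b are Integers
        return x.toUpperCase();
    }

    // Typed version: myFunctionTyped(1, 2) does not compile, so the error points at the caller.
    static String myFunctionTyped(String a, String b) {
        String x = a + b;
        return x.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(myFunctionUntyped("a", "b"));   // AB
        System.out.println(myFunctionTyped("a", "b"));     // AB
        try {
            myFunctionUntyped(1, 2);   // fails inside myFunctionUntyped, not at this call
        } catch (ClassCastException e) {
            System.out.println("untyped call failed inside myFunctionUntyped");
        }
    }
}
```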

Anyway, non-compulsory explicit typing (à la Groovy, Scala, Caml) is a good compromise (use it if you like it). In fact, in some cases it is good to be lazy anyway ;)

PS - As far as I remember, in Groovy and Scala there are cases where explicit typing is compulsory anyway (correct me if I am wrong). I would suppose that comes from necessity, as those languages are JVM- and Java-friendly by design and the compiled code will require that information.

Posted in Languages.

Method references and Output redirection in Groovy

When writing Groovy scripts I sometimes need to redirect the output of part of a computation that is going to standard out. A possible solution would be to open a new Writer and change the code to write to it (i.e., replacing all prints with newStream.prints); this, of course, requires changing all prints, which is cumbersome and boring. There is a lazy alternative, using method references:

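s = new PrintStream(new FileOutputStream("/tmp/myOut"))
def print = s.&print          // from here on, print writes to /tmp/myOut
print "a"
print = System.out.&print     // point print back at standard output
print "b"
s.close()                     // release the file when done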

In this case the a is written to /tmp/myOut and the b goes back to standard output. The big gain: none of the prints in the script need to be changed (printlns can be handled the same way, with a second method reference). Lazy me is happy.
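For comparison, the usual Java-side way to redirect standard out is `System.setOut`. Unlike the method-reference trick above, which only rebinds the script’s local `print`, this is global: every piece of code (libraries included) that writes to `System.out` is affected until the original stream is restored. A minimal sketch, where `captureStdout` is a hypothetical helper:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class RedirectOut {
    // Runs task with System.out temporarily pointed at a buffer and returns what it printed.
    static String captureStdout(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream capture = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            System.setOut(capture);
            task.run();
        } finally {
            System.setOut(original);   // always restore, even if task throws
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String captured = captureStdout(() -> System.out.print("a"));
        System.out.println("b");                       // back on the real standard output
        System.out.println("captured: " + captured);   // captured: a
    }
}
```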

Caveat: I would be careful about using this strategy a lot, as it is very easy to lose track of what is happening to the output. But it can be quite an expedient way to redirect prints in simple scripts.

Posted in Tips.

Prototyping, performance and declarative languages

During my “silent months” (for details see this post) I’ve been developing a simple system to study the spread of antimalarial drug resistance. It is a “typical” scientific application with a core (which simulates genetic recombination of reproducing individuals) that is computationally very demanding.

As is common in these scenarios, I started by developing a prototype in a high-level, declarative language (in my case Groovy). I was pretty sure that the first solution would be slow as hell, and that part of that slowness would be due to using a “scripting” language (although algorithmic complexity is the main cause of slowness, changing the language should at least bring running times down by an order of magnitude). The initial solution was in fact slow. So I proceeded to do the usual thing: identify the expensive part (easy in my case) and rewrite that part in Java. My intention was to end up with a typical hybrid system: core, computationally intensive code in Java and high-level functions in Groovy, for easy and productive manipulation.

Converting from Groovy to Java is easy; in fact, it is too easy. The final Java code was full of Groovyisms: generics-laden code (things like Map<String,List<Integer>>) and strange-looking (from a Java perspective) code originating from .each constructs, among other things that made the Java code look very strange.

Needless to say, there was not much speed improvement. In order to improve things, I tried to make sure that the data structures behind List<> had the required complexity for my most used operations. Not much improvement. I then decided to completely convert things like List<List<Integer>> to the typical Java int[][]. Spaghetti and semantic chaos followed (just think of the not-so-minor differences in semantics between lists of lists and [][]).

Being a member of the fundamentalist church of refactoring, I decided to do the unthinkable: throw the code away and rewrite it from scratch. I would rewrite the whole thing, starting with the core in Java, written in an idiomatic Java way and targeting performance. Then, on top of that, I would grow a set of Groovy wrappers in order to easily manipulate that core. It worked perfectly! Actually, I am running that code in the background (on an Asus EEE) as I write this.

The (somewhat elusive) lesson that I took from this is that going from prototype to production code, when the fundamental difference is performance, can be cumbersome if the prototype language is too close to the production language (and Groovy and Java are close enough). The temptation to do a line-by-line code conversion is too strong for comfort (I actually did rename the computationally intensive .groovy file to .java and translated it line by line; feel free to call me silly) and can have very upsetting results.

Posted in Programming.

The library over-engineering syndrome

First a personal note: I did not write (or do any “things on the Internet”) for the best part of 2008. Although part of that was due to a busy schedule, most of it was due to illness (being obsessive-compulsive has some strange consequences). I finally decided to tackle my health issue (which is solvable, at least in my case).

Anyway, I’m still working in computational biology, still working with malaria, and still working with Groovy. So… let’s get back to the usual topics…

Before I start, a caveat: “over-engineering”, as used below, should not be seen as scornful. We all know that traditional OO languages and libraries try to be as general-purpose as possible and deployable in industrial software processes. In that setting, languages and libraries that present themselves in a typical OO setting are, understandably, “over-engineered”.

The so-called scripting languages (for lack of a better word, let’s stick with it) are supposedly more productive than traditional languages (especially in small to medium-size projects). Languages like Java are “over-engineered” beasts, seen as general-purpose, “industrial”, heavy-duty. Our beloved scripting languages fit our brain: they are agile, we can be highly productive, write fewer lines of code, accomplish more, be more declarative.

Can we really? Let’s consider a subset of those languages: those like Groovy or Scala that were developed for the JVM (or any language that was ported to the JVM). One of the pluses of these languages is that they can use the whole JVM ecosystem of libraries. The problem is that most of those libraries are developed with a Java mentality (i.e., they are over-engineered). An example:

The fantastic JFreeChart library produces high-quality 2D charts of all kinds. It has all the flexibility that we expect from the typical Java library: you can do everything. In the Groovy landscape there is also a Builder for it, groovychart, of which I am a minor author. But whenever I want to plot a chart, my first impulse is to use the (also great) matplotlib (CPython-based). Why? Because plotting a line chart in matplotlib takes 3 lines, which I remember without going to the documentation:

from pylab import *
plot([1,2,3])
show()

Really, it worked at the first attempt.

In groovychart? I am not even sure of the whole process, but it involves starting Swing, preparing the dataset, choosing the chart, … And again, I am one of the authors and I do groovycharts every day, but I still need to go to a template to do something that takes 3 lines in matplotlib. Matplotlib fits my brain; groovychart doesn’t.

While having all the Java libraries at hand is obviously a good thing, many of those libraries need to be “scriptized”. There is a need for interfaces that “fit the brain”. Groovy + JVM libraries only tackles half of the problem. Even JVM libraries with a Groovy idiom (like groovychart) don’t address the “over-engineering” problem. What is needed, in my view, are wrappers that are not only Groovy-idiomatic but also Groovy-philosophical: they fit the brain (wrappers that let you plot a simple line chart in 3 lines).

This can actually be seen in Groovy itself for IO and many data structures: some core Java libraries are well covered and already available through a “fit-your-brain” interface. Hopefully we will see more interfaces like this for many existing libraries (and fewer like groovychart, which are only idiomatic wrappers).

PS - Another way to tackle “over-engineering” problems comes from good IDEs; in fact, the average modern Java IDE goes to great lengths to reduce the “over-burden”.

Posted in Design.

DSLs involving existing classes in Groovy

Preamble: In order to understand this post you should know a little bit (a little is enough, that is how much I know) about ExpandoMetaClass and Categories in Groovy.

DSLs that involve existing classes might be a source of long-term sorrow. Let me give an example: imagine that you want to make a small DSL to handle equations, like

x = new Symbol("x")
(2 * x).differentiate(x) //Result is 2

The problem is that the * operator of Numbers doesn’t know how to handle Symbols, so an exception would be raised. The obvious solutions, as discussed before on mailing lists and blog posts, are:

Categories

Categories would solve the problem, but at the expense of polluting the source with things like

use (Something.Category) {
  //code here
}

Not a disaster, but not pretty either…

Talking about disasters…

Expando over Numbers

The idea here would be to change the behavior of Numbers to be able to handle Symbols. Code would be very clean, no need for uses…

As somebody said on the Groovy mailing list: this is a disaster in the making. The problem is that if I change Numbers, and then, for another valid reason, you change Numbers, and somebody else also changes Numbers… this is chaos. Or at least it would make code from different sources potentially non-interoperable, or exhibiting very strange, buggy behavior. This is clearly akin to the “global variable” problem. I believe that in the long term, and with big software projects, this approach is a dead end.

Enter Python

Python actually has a workaround (I will not call it a clean, beautiful solution) that might be somewhat useful here. Imagine that you do

1 + x

The 1 (an int, the default class for numbers) is not able to handle the symbol. For Python that is OK: int’s __add__ returns NotImplemented, and Python then tries to call a “right add” method on x (search for __radd__ in this page). So the default behavior is not to raise an exception if the left object cannot handle the operator, but to try to call the “right” version on the right object (and raise only if that also fails).
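The fallback protocol can be sketched in Java (which has no operator overloading, so `plus` stands in for `+`). This is a simplified model of Python’s rule, assuming that an empty Optional plays the role of NotImplemented; it ignores Python’s extra rule that a right operand whose type is a subclass of the left operand’s type gets its __radd__ tried first. `Addable`, `Symbol` and `plus` are hypothetical names.

```java
import java.util.Optional;

public class RaddDemo {
    // Operands that opt in to the protocol.
    interface Addable {
        Optional<Object> add(Object other);    // this + other; empty means "not implemented"
        Optional<Object> radd(Object other);   // other + this, tried when other gave up
    }

    // A symbolic variable: sums involving it stay symbolic (rendered as text here).
    record Symbol(String name) implements Addable {
        public Optional<Object> add(Object other) { return Optional.of(name + " + " + other); }
        public Optional<Object> radd(Object other) { return Optional.of(other + " + " + name); }
    }

    // Plays the role of the built-in number's add: it only knows Integer + Integer.
    static Optional<Object> builtinAdd(Object left, Object right) {
        if (left instanceof Integer a && right instanceof Integer b) {
            return Optional.of(a + b);
        }
        return Optional.empty();   // analogous to returning NotImplemented
    }

    // Try the left operand first, then fall back to the right operand's radd, then fail.
    static Object plus(Object left, Object right) {
        Optional<Object> result = (left instanceof Addable l) ? l.add(right) : builtinAdd(left, right);
        if (result.isPresent()) return result.get();
        if (right instanceof Addable r) {
            result = r.radd(left);
            if (result.isPresent()) return result.get();
        }
        throw new UnsupportedOperationException("cannot add " + left + " and " + right);
    }

    public static void main(String[] args) {
        Symbol x = new Symbol("x");
        System.out.println(plus(1, 2));   // 3
        System.out.println(plus(1, x));   // 1 + x (handled by x's radd)
        System.out.println(plus(x, 1));   // x + 1
    }
}
```

The built-in number type never has to be changed: the new type takes responsibility for mixed expressions, which is what keeps this from turning into the Expando free-for-all described above.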

Not perfect, but might be just enough to avoid Expando in anger.

I do believe that people still don’t appreciate the consequences of expanding core classes, and the interop disaster that this can entail.

Posted in Programming.

Revisiting Groovy performance issues

I tried to drill down into the Groovy performance issue that I had with what is, in practice, a text-processing exercise.

The original code was written in Groovy (and then ported to Java, not the other way around), but as I was in a hurry it was written in idiomatic Java (I am too much of a Groovy newbie to write idiomatic Groovy when I am in a hurry). Ted Naleid left some great suggestions on how to be more Groovyish.

Anyway, I took my original code and tried to understand what was going on. Here are my findings.

Replacing duck typing with explicit typing shaved a minute off (from 4m to 3m).

Converting this

iCase.each {
    if (jCase.contains(it)) {
        isDifferent = false
    }
}

to this

if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) {
    isDifferent = false
}

shaved off 1m10s (from 3m to 1m50s). This is in an inner loop.

8 seconds were gained by changing this:

for (int j=i+1; j<indivs.size(); j++) { ... }

into

int iSize = indivs.size()
for (int j=i+1; j<iSize; j++) { ... }

As inline comments, you can find how much time each line took in the following inner loop:

for (int j=i+1; j

The only surprising thing is the time lost indexing String arrays (and maps, but that I can understand).

This text was written as I was changing and trying things; I gained 20s from minor changes that I lost track of :) . I am currently at 1m30s (down from the original 4m, compared with Java’s 4s).

Posted in Programming.

Groovy performance speed

I think that language performance (from a speed point of view) is highly overrated. There are many factors that are more important. Even on the time front alone, developer time is normally more important: the time spent grokking code is normally more expensive than the running time of an application. Of course, there are many other important points, too many to enumerate (portability, declarativeness, readability, …). If performance were the fundamental variable, we would all be using assembler.

I am currently writing a bit of code to go over tens of millions of lines of text, comparing separate columns.

I wrote a little piece of Groovy code to go over all those lines. The performance results were abysmal, so I decided to write a program in Java (copying the Groovy code to a Java file and converting it in a very direct way). For 3000 lines of text, here are the results (remember, this is to process hundreds of millions):

$ time java Do

real    0m4.427s
user    0m4.384s
sys     0m0.040s

$ time groovy do.groovy

real    2m53.303s
user    2m47.650s
sys     0m0.668s

4 seconds against 2 minutes 53 seconds. This is not critical, as it is possible to write all the computationally intensive parts in Java. But, even so, the gap is too big.

The code? (Afterwards, some speculation and profiling)

Groovy:

...
while ((line = reader.readLine()) != null) {
    lineTok = line.tokenize()
    if (lineTok.size() == colIndiv.size() + 11) {
        for (int i=0; i

Java:

...
while ((line = reader.readLine()) != null) {
    String[] lineTok = line.split(" ");
    if (lineTok.length == colIndiv.size() + 11) {
        for (int i=0; i -1 ||
                    jCase.indexOf(iCase.charAt(1)) > -1 ) {
                    isDifferent = false;
                }
                if (isDifferent) counts.put(
                    indivs.get(i) + indivs.get(j),
                    counts.get(indivs.get(i) + indivs.get(j)) + 1);
            }
        }
    }
}

I briefly ran the Java profiler; though I did not spend much time on it, it seemed that (speculation alert!) Groovy was spending some time in metaclassing/proxying code. I wonder if the “defs” were making things much slower? Maybe if I had properly typed my loop variables (instead of being lazy and relying on def duck typing) things would have run more smoothly. If that is the case, then that is one more reason against duck typing (others being helping IDEs and automated code tools, and debugging).

Posted in Programming.