Archive for the ‘Programming’ Category

Here is my first Clojure application: a phylogenetic tree viewer (PhyloXML format). The obligatory screenshot:

Simple Phylogenetic viewer


Preamble:

  1. This is newbie code: Handle with care! My main objective is not to make a tree viewer but a tree comparer. So this is no more than a learning step.
  2. You can test it yourself as it is a Java WebStart application, just click here. You don’t need to have a phylogenetic tree file yourself. I supply an example inside.
  3. This makes use of JGraph and Archaeopteryx (the PhyloXML parser)

I maintain this code on GitHub. I have one project for the viewer and another for general utilities. All the code is still very crude, but you might be interested in stealing some of the Swing code, either as a crude example of how to interact with Swing or for my micro-DSL for menus. If you want to interact with JGraph, this might be a starting point. I do not want to suggest, in any way, that this code is any good.

Some lessons that I’ve learned and that I would like to share:

  1. Some of the clojure.contrib code is a bit green. I tried to use the graph library, but it is very small and specific. I ended up starting my own. Mine is even smaller and more specific, with no claims of generality.
  2. I don’t appreciate some of the core functions of Clojure (I’ve written on this before and will write more in the near future). The great thing about Clojure is that you can import only what you want from the core and extend it yourself. I intend to do just that for my personal use. This is a PLUS point for Clojure: the flexibility that is made available to change many of the decisions of the language implementor (in the great tradition of declarative and homoiconic languages).
  3. While I can change the core for my uses, I think defnk should really be core for everybody! In fact I wonder whether defn should not become defnk…
  4. I am pretty sure that when *warn-on-reflection* is activated and action is taken to correct the warnings, lots of code will increase in performance, with the more important side effect of annotating the code with type info.
  5. I have quite a lot of recursive code that doesn’t use recur. Something to learn and master…
  6. JGraph layout algorithms are not fantastic. I’ve tried with much bigger trees and the result was far from perfect (I also noticed performance problems in my own code).

The biggest hurdle that I’ve found was the construction of user interfaces and how verbose Clojure Java interop can become. Of course one can create functions (and that was done) to create buttons, frames, menus, etc. But the creation of Java container structures (think of a frame that contains a menubar, which contains a menu with menus inside, and so on) would benefit from a dialect where, when a certain (container) object is created, its (Java) namespace becomes easily available.

Imagine constructing a Structure like this:

MenuBar[
    Menu(File) [
        Menu(New)
        Menu(Close)
        Separator
        Menu
    ]
    Menu(Edit) [
        Menu(Cut)
        Menu(Paste)
    ]
]

it would be nice to be able to write something like:

((new JMenuBar)
   (add ((new JMenu "File")
       (add (new JMenu "New"))
       (add (new JMenu "Close"))
       (addSeparator)
    ))
   (add ((new JMenu "Edit")
       (add (new JMenu "Cut"))
       (add (new JMenu "Paste"))
    ))
)

“add” and “addSeparator” are Java methods. All this would be dynamic against the Java object hierarchy (not a hand-written library!). Note that there is no doto special form (or variants) and, most importantly, note that, given a list (a b c d), if a is a Java object then b, c and d are evaluated as methods of a. If b is (i (y x s)), x and s would be evaluated as methods of y; if that fails, as methods of a; and if that also fails, as normal Clojure. Something like this (rough sketch).
This would be useful, e.g., to construct Swing hierarchies by hand in an expedient way (not suggesting anything more, especially not to write big programs with outside scope).
I am going to try to write some code that does this in the next few days.
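The evaluation rule described above can be sketched in a few lines. The sketch below is in Python rather than Clojure, and it uses two stand-in classes (`Menu` and `MenuBar`, both hypothetical) instead of Swing. It covers only the first part of the rule: the head of a form is an object and the remaining elements are method calls looked up dynamically on that object. The fallback to the enclosing object and then to normal evaluation is left out.

```python
class Menu:
    """Stand-in for a Swing JMenu (hypothetical, for illustration only)."""
    def __init__(self, text):
        self.text = text
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def addSeparator(self):
        self.children.append("---")


class MenuBar(Menu):
    """Stand-in for a Swing JMenuBar."""
    def __init__(self):
        super().__init__("<bar>")


def build(form):
    """Evaluate (obj, call, call, ...) where each call is (method_name, arg, ...).

    Arguments that are themselves tuples are treated as nested forms
    and built first.
    """
    if not isinstance(form, tuple):
        return form
    target, *calls = form
    for name, *args in calls:
        method = getattr(target, name)  # dynamic lookup, no hand-written wrapper
        method(*[build(a) for a in args])
    return target


bar = build(
    (MenuBar(),
        ("add", (Menu("File"),
            ("add", Menu("New")),
            ("add", Menu("Close")),
            ("addSeparator",))),
        ("add", (Menu("Edit"),
            ("add", Menu("Cut")),
            ("add", Menu("Paste")))))
)
```

Because the method is found with `getattr`, nothing in `build` knows about menus; any object hierarchy with `add`-like methods could be assembled the same way.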

Here you can find an interesting effort to implement the 99 Prolog problems in Clojure.

It is not clear to me that the exercise is conducted in the same way as the original one [Update: the author actually says this in the preamble]. Let me explain:

The original Prolog problems are solved without the help of the (existing) Prolog libraries, just using the basic language mechanisms. They are good at illustrating the underlying declarative power of Prolog.

For instance, problem 1, finding the last element of a list, is solved in Prolog with this:

% P01 (*): Find the last element of a list
 
% my_last(X,L) :- X is the last element of the list L
%    (element,list) (?,?)
 
% Note: last(?Elem, ?List) is predefined
 
my_last(X,[X]).
my_last(X,[_|L]) :- my_last(X,L).

Notice the in-code comment that “last is predefined”. In fact, using the Prolog library this could be done with a one liner:

?- last([1,2,3],E).
E = 3.

The offered solution in Clojure is also a one liner:

user=> (last [1 1 2 3 5 8])
8

Given a sufficiently large and clever library (and Clojure has a very nice library) all problems on the list could be solved with a one-liner.

In my opinion, an apples-to-apples comparison with the original solutions would not use the core library.
It would probably be like this for the same problem:

(defn mylast [l]
  (let [mynext (next l)]
    (if (nil? mynext)
      (first l)
      (mylast mynext)
    )
  )
)

Yep, next is in the core library also, but being a call to clojure.lang.RT, I think it is fair game to use it.

OK, better yet, with recur, as it is done in core (this is essentially a copy of the core version):

(defn mylast [s]
  (if (next s)
    (recur (next s))
    (first s)
  )
)

The Prolog exercise exposes the declarativeness and expressive power of Prolog. The Clojure example exposes mostly the cleverness of the core library.

Both are interesting points of view (I am not criticizing the Clojure solution), but they cannot be used for comparison purposes.

More than 10 years ago I participated in the development of a University IT system (the front- and backend to maintain grades and that sort of stuff). The system was based on a DB/2 backend (a very nice database system), with the business code running on a Prolog interpreter (developed in-house) and the web backend being a Java servlet engine (the old JServ, the pre-Tomcat thingy from Apache). Prolog is famed to be slow, and Java (at that point in time) was very slow. Surprise, surprise… the bottleneck was on the DB/2 server. Eventually, as the system grew (and the database hardware was beefed up), the bottleneck moved to the business and web tiers, but the problem was sorted by just adding more machines: the contention was among a bunch of parallel, independent processes, which could be run on separate machines.

The example above illustrates why the concurrency problem posed by multiple-core CPUs and GPUs might not be that important:

  1. Many problems are not CPU bound anyway, and even if they are, the bottleneck might be elsewhere. Another example: I am the proud owner of 3 cheap, slow laptops (one being a netbook). For my use case I really don’t need faster applications, I wonder how many users really need more than they already have?
  2. Even if more CPU/GPU power is needed, a loosely coupled model (without much interprocess communication and contention issues) might be enough. This is typically the case of many web apps, which can scale by just adding more computers which run independent processes.

Concurrency, even with modern abstractions, is hard. It should be avoided if possible and it can be avoided in many applications. If it cannot be avoided, maybe a loosely coupled model is enough… Guido van Rossum has a nice take on this issue.

This is important as concurrency is being touted as an important criterion for evaluating languages. Modern functional languages (think Scala and Clojure) are being touted as a better option precisely because they are better at concurrency (both because of functional – “no changing state” – programming and the availability of libraries implementing nice concurrency paradigms like actors).

When assessing the importance of this issue, I would propose that people ask themselves: “Am I developing computationally intensive software?” and “If I am developing computationally intensive software, can I live with loosely coupled models of computation, preferably processes with no shared memory?”

This is not to say that there are no cases where tightly coupled computing is a good idea. It is just that this complex solution might be overkill for many problems.

I would just like to add that I am not defending my cause, in fact it is quite the opposite. There is actually some content produced here, in the past, on how to tackle concurrent programming:

  1. LOSITAN – A multicore-aware Jython-based (Python for the JVM) Web Start application to do selection detection.
  2. An introductory tutorial on concurrent computing targeting computational biologists – Part 1, 2 and 3

I am doing some development in Clojure (a Lisp-type language for the JVM). Lisp as in a clone tailored for the JVM, not Lisp as only “functional programming”. I note, by the way, that more than a functional programming language, Lisp is a homoiconic language.

I developed a simple system to specify Swing menus in Clojure; here is an example:

Simple Menu


The following “micro-language” was developed to specify this:

 (getMenuBar actionManager '(
    (menu {
      :text "Project" :key "P"
      :content (
        (item {:text "New" :key "N"})
        (item {:text "Open" :key "O"  })
        (item {:text "Close" :key "O" :id "Close" :enabled false})
        (item {:text "Recent" :key "R"})
        (separator)
        (item {:text "Exit" :key "E"})
      )
    })
    (menu {
      :text "Options" :key "O"
      :content (
        (item {:text "Rendering" :key "R"})
      )
    })
))

The code is very easy to read, I hope: two menus, with a few menu entries with text, ability to enable/disable and accelerator keys, plus a separator.

Notice the actionManager on top: it is the (very simple) event processing function, which receives only a text as parameter (to identify the selection). The text is simply the menu text or, if specified, an id. Not the most general solution, but enough for simple menu structures.

The code? Below is the _complete_ implementation.

 
(ns org.tiago.swing
  ;(:require clojure.contrib.def)
  (:use
    [clojure.contrib.seq-utils :only (flatten)]
    [clojure.contrib.def :only (defnk)]
  )
  (:import
    (java.awt.event ActionListener KeyEvent)
    (javax.swing JFrame JMenu JMenuBar JMenuItem)
  )
)
 
(defnk createFrame [title :menuBar nil]
  ; let keeps frame local; a def here would (re)define a namespace-wide var on every call
  (let [frame (new JFrame title)]
    (. frame setDefaultCloseOperation (. JFrame EXIT_ON_CLOSE))
    (if menuBar (. frame setJMenuBar menuBar))
    (. frame pack)
    (. frame setVisible true)
    frame
  )
)
 
(defmulti addMItem (fn [manager x & rst] (first x)))
(defmethod addMItem 'item [manager content menu]
  (let [params (second content)
        mItem (new JMenuItem (:text params))]
    (if (contains? params :id) (. mItem putClientProperty "id" (:id params)))
    (if (contains? params :key) (. mItem setMnemonic (. (:key params) charAt 0)))
    ; honour the :enabled flag used in the menu description
    (if (contains? params :enabled) (. mItem setEnabled (:enabled params)))
    (. menu add mItem)
    (. mItem addActionListener manager)
  )
)
(defmethod addMItem 'separator [manager sep menu]
  (. menu addSeparator)
)
 
(defmulti getMBItem first)
(defmethod getMBItem 'menu [desc]
  (let [params (second desc)
        manager (last desc)
        menu (new JMenu (:text params))]
    ;Assuming mnemonic is ASCII CODE.
    ;java7 has . KeyEvent getExtendedKeyCodeForChar
    (if (contains? params :key) (. menu setMnemonic (. (:key params) charAt 0)))
    (if (contains? params :id) (. menu putClientProperty "id" (:id params)))
    ; add every entry of :content (items and separators) to this menu
    (doseq [entry (:content params)] (addMItem manager entry menu))
    menu
  )
)
(defmethod getMBItem :default [arg] (new JMenu "UNK"))
 
(defn getMenuBar [actionManager menuItems]
  (let [manager (proxy [ActionListener] []
                  (actionPerformed [e]
                    (let [obj (.getSource e)
                          id (.getClientProperty obj "id")]
                      ; report the explicit id when present, the menu text otherwise
                      (actionManager (if (nil? id) (. obj getText) id)))))
        menuBar (new JMenuBar)]
    ; each menu description gets the listener appended before dispatch
    (doseq [item menuItems]
      (. menuBar add (getMBItem (concat item (list manager)))))
    menuBar
  )
)

OK, comments have to be added ;) .
From a declarative point of view, not bad at all.

My first Lisp program. It completely baffles me that, after 25 years of programming with all the languages imaginable (including some functional ones like Caml or highly declarative ones like Prolog), I never tried Lisp.

I have been toying with Processing. Processing is…

…an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production.

Processing is actually a full-blown IDE with a language based on Java (it is JVM based).

I am fascinated by its community and by how cleverly most things seem to be done.

Here is my first example (applet alert!). After it loads, move the mouse over the applet to see it in action (you might have to mouse click for it to start on some browsers, like Opera):



The code for the above is just:

void setup() {
    size(400, 200);
}
 
color elColor = color(255,255,0);
 
void draw() {
  background(255);
  float delta = (1.0*mouseY)/height;
  if (random(10)<1) {
      elColor = color(random(255), random(255), random(255));
  }
  fill(elColor);
  ellipse(width/2, height/2, 80.0*mouseY/height, 50.0*mouseX/width);
  fill(255.0*mouseY/width,255.0*mouseX/width, 0, 255*mouseX/width);
  quad(0, 0,
      delta*width/2, (1-delta)*height/2,
      delta*width/2, height - (1-delta)*height/2,
      0, height);
  quad(width, 0,
      width - delta*width/2, (1-delta)*height/2,
      width - delta*width/2, height - (1-delta)*height/2,
      width, height);
}

The vibrant community behind it seems to produce quite a lot of really neat examples. Go and check for yourself.

It is interesting to see how different people tackle the ongoing multicore (and GPU) software “revolution”. There are strong philosophical differences on how to develop for these new concurrent architectures. Let’s start with the extremes.

The most interesting extreme comes from Guido van Rossum (aka Python benevolent dictator for life): He suggests that if you want to use the available processing power of multiple cores you should have separated processes, let me quote:

[...] doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.

Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions.

Some similar arguments are made by the message passing crowd, which seems to be quite happy with a model based on explicit message passing between separated processes.

The fundamental idea here is that shared memory between parallel computing threads can lead to a lot of grief and sorrow, thus it is better if every data memory space is the sole property of a single thread. Communication occurs in an explicit form (e.g., message passing among executing code) between threads that do not share anything (other than messages).
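A minimal sketch of this shared-nothing style, in Python. It uses threads and `queue.Queue` from the standard library as a stand-in; with separate processes, as in the quote above, the same shape could be built on `multiprocessing.Queue`. The function names (`worker`, `sum_of_squares`) are made up for illustration. The workers never touch a common variable: everything they know arrives as a message, and everything they produce leaves as a message.

```python
import queue
import threading


def worker(inbox, outbox):
    """Square every number received; stop on the None sentinel."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put(msg * msg)


def sum_of_squares(numbers, n_workers=3):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        inbox.put(n)
    for _ in threads:
        inbox.put(None)  # one stop message per worker
    for t in threads:
        t.join()
    # All workers have finished, so draining the outbox here is safe.
    total = 0
    while not outbox.empty():
        total += outbox.get()
    return total
```

The order in which results reach the outbox can differ from run to run, but the sum does not depend on that order, so `sum_of_squares(range(10))` gives 285 every time.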

The opposite idea can be found in the typical C/C++/Fortran, lower-level crowd: one single process, many threads, a single memory space shared among threads, with concurrent access controlled through a low-level mechanism like semaphores. This also seems to be the underlying idea of the OpenMP system. These folks believe that programmers can tackle parallel complexity easily (well, at least it is not an impossible, daunting task according to this philosophy).

The point of contention comes from the fact that multiple execution flows introduce a completely new class of bugs, coming from the need to coordinate a lot of things going on in parallel. The worst problem introduced is non-determinism: You can execute the same program twice, WITH THE SAME INPUT, and get different results. Why? Because the different threads/processes will be scheduled in unpredictable ways by the operating system (or virtual machine), which can yield different results. This severely increases the difficulty of testing and debugging software. The shared memory crowd (the shared memory model is more efficient and flexible as, well, memory is directly shared) will say that we can deal with this. The message passing crowd suggests that having some restrictions and explicit communication will make life easier (or, less complicated).

The Java crowd is where you can find the greatest variety of opinions, but the core JVM and the Java language itself seem to follow the C/C++ philosophy (though with some candy thrown in, like the Fork/Join framework). But on top of that you can find everything with a vocal support community: Tuple spaces, Map/Reduce, Message passing, etc. This is not to say that the Python and C/C++ communities are monolithic (they are not! Just check the C implementations of MPI and PVM), but you really can find a lot of alternatives with vibrant communities on top of the JVM.

A sort of middle-ground approach was introduced de facto with the programming language Erlang: Erlang allows for multiple threads, but the communication is shared-nothing and based on message passing. I.e., while there is one single process with multiple threads, there is no shared memory per se and all inter-thread communication is based on message passing. This actor-model-based language has influenced some recent libraries in Scala, Groovy and Clojure, among others, where the actor model is the main concurrent programming model.

Many proponents of functional languages (like Erlang, Scala and Clojure) also suggest that mutability (i.e., the concept of variable stemming from imperative languages like C, Java, C#, Basic, C++, 99% of used languages) is not easily amenable to parallel programming, and suggest that immutable data structures make life much easier: If what is shared cannot be changed, then far fewer bugs can be introduced.

To sum it up: Some people suggest concurrent programming is difficult and it is better to minimize communication to tackle that difficulty. Others suggest that concurrent programming is workable and tightly-coupled memory-sharing systems are OK. Some also suggest (functional crowd) that immutable data structures help.

Further reading:
Concurrent computing (Wikipedia)
Scala actors – My preferred introduction to Actors (which happens to be based on Scala)
Erlang Concurrency Message passing (Wikipedia)

My opinion: Shared memory models are for real men! I am just a regular bloke, so I stick with message passing models. The complexity of bugs introduced by concurrent programming is much much worse compared to the existing sequential paradigm. In most of the cases that I have encountered, the restrictions imposed by message passing are acceptable compared to the benefits. Even with message passing and immutable data structures, concurrent programming is still very hard and bug prone (non-determinism is still quite possible with message passing). I expect (hope) that new R&D will allow us to tame this complexity. Avoid shared memory/tightly coupled systems like the plague!

If you search the web you can find some discussions on whether IDEs for dynamic languages can be as helpful as IDEs for static languages. The issue is that static languages like Java have compile-time (thus easy to get at IDE-time) information with which to provide that fundamental code-completion functionality (among many others). If the IDE knows that a certain parameter is a String, then it is simple: it will present to you all the String methods when you type in the dot. For dynamic languages things get more complex, as there is formally (by definition) no compile-time information. Some people would argue that there are ways around it (which you can already find in existing IDEs; I remember having some sort of code completion, years ago, on SPE – for Python). I will not add anything to that discussion here; this preamble was mainly for putting the reader in context. I am more interested in discussing good IDEs for DSLs.
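To make the difference concrete, here is a small Python sketch of what a completion engine can and cannot do. The helper `completions_for` and the sample function `shout` are made up for illustration; the only library call used is `typing.get_type_hints` from the standard library. When a parameter carries a type declaration, the candidate methods can be listed without running the function body; when it does not, there is nothing to go on.

```python
import typing


def completions_for(func, param):
    """Return the public attribute names available on `param`, judged only
    from the type annotation on `func` (the function body is never run)."""
    hints = typing.get_type_hints(func)
    declared = hints.get(param)
    if declared is None:
        return []  # no declared type: nothing to offer
    return sorted(name for name in dir(declared) if not name.startswith("_"))


def shout(text: str, suffix):
    return text.upper() + str(suffix)
```

Here `completions_for(shout, "text")` contains `upper`, `lower` and the other `str` methods, while `completions_for(shout, "suffix")` is an empty list, which is the position a dynamic-language IDE starts from.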

With DSLs you get, most of the time, added syntax. Worse than that, you might fall into situations where you have changed (not only added to) the initial language syntax; furthermore, those syntax changes might even become valid only at runtime (imagine that a method is added to a class that is supplying DSL methods).

One example comes from Ioke and Prolog, whose operator precedence and associativity rules are changeable (see the previous post). It is not trivial to know if something like 1+2 is even syntactically valid (*). Even if it is syntactically valid, things like associativity rules might change. In languages like Groovy you can add (e.g., through categories) methods to code blocks (from classes that can be dynamically changed). Then there is dynamic dispatching and macros. What is valid in a certain piece of code can be different from what is valid a few lines below. In fact, complete information on what is valid in a certain code block might require code execution. Or, to put it another way, it might be very difficult to have a completely helpful IDE! In this scenario there are 3 considerations that I think are worth making:

1. One should not be discouraged for not having perfect solutions. Maybe it is not possible to determine all that can be expressed in a certain code block, but sometimes good approximations are enough.
2. On this issue, one good example comes from Prolog: In Prolog, syntax can be changed mainly through the use of the :- op directive (and through asserts and retracts). The :- op directive changes operators but is very easy to analyze pre-compilation/interpretation. So, the way DSLs are normally constructed lends itself very easily to code analysis which can be used by IDEs. This is unfortunately not the case in most real-world languages.
3. It would be cool to have a language where DSL specifications could be automatically used to construct IDEs. The current real-world DSL-able languages (Ruby, Groovy, …) are DSL-enabled through indirect techniques which can be used to build DSLs (dynamic reception, operator overload, whatever); in fact many of these techniques exist with other objectives than creating DSLs. If there were a declarative and explicit way to create DSLs, that information could be used to inform IDEs on parsing and other issues. An embedded, core way to explicitly specify DSLs.
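The second point can be illustrated with a rough sketch of the kind of analysis an IDE could run before any Prolog code is executed. The Python below scans source text for operator directives and builds an operator table. It is an assumption-laden toy: the function name `declared_operators` is made up, and the regular expression only handles the simplest directive shape (a single operator name, no comments, no quoted atoms containing spaces or parentheses).

```python
import re

# Matches directives such as  :- op(700, xfx, ===).
OP_DIRECTIVE = re.compile(
    r":-\s*op\(\s*(\d+)\s*,\s*(xfx|xfy|yfx|fy|fx|xf|yf)\s*,\s*([^\s()]+?)\s*\)\s*\."
)


def declared_operators(source):
    """Collect {name: (priority, type)} from simple op directives in Prolog source."""
    table = {}
    for priority, kind, name in OP_DIRECTIVE.findall(source):
        table[name.strip("'")] = (int(priority), kind)
    return table


example = """
:- op(700, xfx, ===).
:- op(200, xfy, '**').
foo(X) :- bar(X).
"""
operators = declared_operators(example)
```

With the table in hand, an editor could decide how to parse and highlight an expression that uses `===` without ever loading the program.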

(*) I suppose some will see this as an argument for the fact that you can do pretty stupid (or at least unintuitive) things with DSLs. Well, you can do stupid things with everything. The question is not if you can or not, but the extent of bad use cases and how bad uses can creep in easily. Another (interesting) discussion, but not for now.

Before I start, please remember the finesse of numbers in Groovy: 0.1 is a BigDecimal; if you want a Double, you have to write 0.1D.

Also, I might be seeing something completely wrong here, corrections are more than welcome!

So, what is the result of the code below?


List<Double> lst = [0.1, 0.1D]
println lst[0].class
println lst[1].class
println 0.1 == lst[0]
println 0.1 == lst[1]
println 0.1 in [lst[0]]
println 0.1 in [lst[1]]

Well, in my book the interpreter should whine on the first line and stop. I am declaring a List of Doubles and putting a BigDecimal in. But it doesn’t. I suppose this is either a bug or some type mess coming from the not very clear way (for me) Groovy handles types: If I say the type of lst is a List of Doubles, I expect it to behave statically. Either that or the language is misleading me by allowing me to specify the type and then ignoring it; not good.
So, the result:


class java.math.BigDecimal
class java.lang.Double
true
true
true
false

Note that 0.1 is equal to 0.1D (i.e., a BigDecimal is equal to a Double; for me it makes sense, as they have the same value) BUT 0.1 is not in [0.1D]. This, I suppose, can only be categorized as a bug (or as something completely unintuitive).

I understand that numbers are not an easy thing to address (precision vs efficiency), but this strikes me as nonintuitive on two fronts (type declaration and number/equality behavior).

Correct me if I am wrong (I can see myself doing a big blunder with equality operator semantics, but I have trouble accepting that Groovy lets me put a BigDecimal inside a list of Double)…

During my “silent months” (for details see this post) I’ve been developing a simple system to study the spread of antimalarial drug resistance. It is a “typical” scientific application with a core (which simulates genetic recombination of individuals reproducing) which is computationally very demanding.

As is common in these scenarios, I started by developing a prototype in a high-level, declarative language (in my case Groovy). I was pretty sure that the first solution would be slow as hell, and that part of that slowness would be due to using a “scripting” language (although algorithm complexity is the cause of slowness, changing the language should at least get running times down by 1 order of magnitude). The initial solution was in fact slow. So I proceeded to do the usual thing: identify the expensive part (easy in my case) and rewrite that part in Java. My intention was to end up with a typical hybrid system: core, computationally intensive code in Java and high-level functions in Groovy, for easy and productive manipulation.

Converting from Groovy to Java is easy, in fact it is too easy: The final Java code was full of Groovyisms: legacy generics code (things like Map<String,List<Integer>>) and strange-looking (from a Java perspective) code originating from .each constructs, among other things that made the Java code look very strange.

Needless to say, there was not much of a speed improvement. In order to improve things I started to try to make sure that the data structures below List<> had the required complexity for my most used operations. Not much improvement. I then decided to completely convert things like List<List<Integer>> to the typical Java int[][]. Spaghetti and semantic chaos followed (just think of the not-so-minor differences in semantics between lists of lists and [][]).

Being a member of the fundamentalist church of refactoring I decided to do the unthinkable: throw the code away and rewrite it from scratch. I would rewrite the whole code, starting from the core in Java, in a Java-idiomatic way targeting performance. Then, on top of that, I would grow a set of Groovy wrappers in order to easily manipulate the said core. Worked perfectly! Actually I am running that code in the background (on an Asus EEE) as I write this.

The (somewhat elusive) lesson that I took from this is that going from prototype to production code, when the fundamental difference is performance, can be cumbersome if the prototype language is too close to the production language (and Groovy and Java are close enough). The temptation to do a line-by-line code conversion is too good for comfort (I actually did rename the computationally intensive .groovy to .java and translated line by line – feel free to call me silly) and can have very upsetting results.

Preamble: In order to understand this post you should know a little bit (a little is enough, that is how much I know) about ExpandoMetaClass and Categories in Groovy.

DSLs that involve existing classes might be a source of long term sorrow. Let me give an example: Imagine that you want to make a small DSL to handle equations, like

x = new Symbol("x")
(2 * x).differentiate(x) //Result is 2

The problem is that the * operator of Numbers doesn’t know how to handle Symbols, therefore an exception would be raised. The obvious solutions as discussed before on mailing lists and blog posts are:

Categories

Categories would solve the problem, but at the expense of polluting the source with things like

use (Something.Category) {
  //code here
}

Not a disaster, but not pretty either…

Talking about disasters…

Expando over Numbers

The idea here would be to change the behavior of Numbers to be able to handle Symbols. Code would be very clean, no need for uses…

As somebody said on the Groovy mailing list: This is a disaster in the making. The problem is that I change Numbers, then, for another valid reason, you change Numbers, somebody else also changes Numbers… This is chaos. Or at least it would make code from different sources potentially not interoperable, or exhibiting very strange, buggy behavior. This is clearly akin to the “global variable” problem. I believe that in the long term and with big software projects, this approach is a dead end.
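The clash is easy to reproduce in any language that allows patching a class at runtime. Below is a Python analogue, not Groovy: Python’s built-in `int` cannot be patched, so a stand-in class `Number` plays the role of the core class, and the two “libraries” (`install_symbolic_library`, `install_units_library`) are made-up names for illustration.

```python
class Number:
    """Stand-in for a core numeric class that several libraries want to extend."""
    def __init__(self, value):
        self.value = value


def install_symbolic_library():
    # Library A: multiplication builds a symbolic expression string.
    Number.__mul__ = lambda self, other: f"({self.value} * {other})"


def install_units_library():
    # Library B: multiplication attaches a unit and returns a tuple.
    Number.__mul__ = lambda self, other: (self.value, other)


install_symbolic_library()
first = Number(2) * "x"      # behaves as library A expects

install_units_library()      # silently replaces library A's patch
second = Number(2) * "x"     # library A's callers now get library B's result
```

Neither library did anything wrong on its own; the second one loaded simply wins, and nothing warns the code that depended on the first.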

Enter Python

Python actually has a workaround (I will not call it a clear, beautiful solution) that might be somewhat useful here. Imagine that you do

1 + x

The default 1 (default class for number) is not able to handle the symbol. For Python that is OK: it will try to call a “right add” method of x (search for __radd__ in this page). So, the default behavior is not to raise an exception if the left object cannot handle the operator, but to try to call the “right” version on the right object (and raise only if that fails too).
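A minimal sketch of the mechanism, with a made-up `Symbol` class and a small `Expr` helper (both hypothetical, just enough to show the dispatch). The `__radd__` and `__rmul__` hooks are standard Python; everything else is illustration.

```python
class Symbol:
    """Minimal symbolic variable (hypothetical class, for illustration)."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):       # handles  x + 1
        return Expr(self, "+", other)

    def __radd__(self, other):      # handles  1 + x, after int's own add gives up
        return Expr(other, "+", self)

    def __rmul__(self, other):      # handles  2 * x
        return Expr(other, "*", self)

    def __repr__(self):
        return self.name


class Expr:
    """An unevaluated binary expression."""
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def __repr__(self):
        return f"({self.left!r} {self.op} {self.right!r})"


x = Symbol("x")
```

With this in place `1 + x` evaluates to the expression `(1 + x)` and `2 * x` to `(2 * x)`, and the integer class itself was never modified.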

Not perfect, but might be just enough to avoid Expando in anger.

I do believe that people still don’t appreciate the consequences of Expanding core classes and the interop disaster that that can entail.
