Posts tagged ‘scala’

More than 10 years ago I participated in the development of a University IT system (the front- and backend to maintain grades and that sort of stuff). The system was based on a DB/2 backend (a very nice database system), with the business code running on a Prolog interpreter (developed in-house) and the web tier being a Java servlet engine (the old JServ, the pre-Tomcat thingy from Apache). Prolog is famed to be slow, and Java (at that point in time) was very slow. Surprise, surprise… the bottleneck was on the DB/2 server. Eventually, as the system grew (and the database hardware was beefed up), the bottleneck moved forward to the business and web tiers, but the problem was sorted by just adding more machines: the contention was among a bunch of parallel, independent processes, so they could be run on separate machines.

The example above illustrates why the concurrency problem posed by multicore CPUs and GPUs might not be that important:

  1. Many problems are not CPU bound anyway, and even if they are, the bottleneck might be elsewhere. Another example: I am the proud owner of 3 cheap, slow laptops (one being a netbook). For my use case I really don’t need faster applications. I wonder how many users really need more than they already have.
  2. Even if more CPU/GPU power is needed, a loosely coupled model (without much interprocess communication and contention issues) might be enough. This is typically the case of many web apps, which can scale by just adding more computers which run independent processes.
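As a rough illustration of the loosely coupled model in point 2, here is a minimal Python sketch. It is not taken from the system described above: the worker script, the function name `sum_squares_in_processes` and the sum-of-squares workload are all hypothetical. Each chunk of work goes to a separate operating system process, nothing is shared in memory, and the only communication is one number printed by each process.

```python
import subprocess
import sys

# Hypothetical worker: a standalone one-line script that sums the squares
# of the integers in [lo, hi) and prints the result on stdout.
WORKER = (
    "import sys; lo, hi = int(sys.argv[1]), int(sys.argv[2]); "
    "print(sum(i * i for i in range(lo, hi)))"
)


def sum_squares_in_processes(n, workers=4):
    """Split range(n) into chunks and give each chunk to its own OS process."""
    step = max(1, -(-n // workers))  # ceiling division, never zero
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    # Start every process before waiting on any of them, so they can run
    # in parallel on separate cores (or, in principle, separate machines).
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(lo), str(hi)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for lo, hi in chunks
    ]
    # No shared memory and no locks: just collect one number per process.
    return sum(int(p.communicate()[0]) for p in procs)
```

Because the processes never talk to each other, there is nothing to synchronize, which is the whole appeal of this model.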

Concurrency, even with modern abstractions, is hard. It should be avoided if possible and it can be avoided in many applications. If it cannot be avoided, maybe a loosely coupled model is enough… Guido van Rossum has a nice take on this issue.

This is important as concurrency is being touted as an important criterion to evaluate languages. Modern functional languages (think Scala and Clojure) are being promoted as a better option precisely because they are better at concurrency (both because of functional – “no changing state” – programming and the availability of libraries implementing nice concurrency paradigms like actors).

When assessing the importance of this issue, I would propose that people ask themselves this: “Am I developing computationally intensive software?” and “If I am developing computationally intensive software, can I live with loosely coupled models of computation, preferably processes with no shared memory?”

This is not to say that there are no cases where tightly coupled computing is a good idea. It is just that this complex solution might be overkill for many problems.

I would just like to add that I am not defending my cause, in fact it is quite the opposite. There is actually some content produced here, in the past, on how to tackle concurrent programming:

  1. LOSITAN – A multicore-aware Jython-based (Python for the JVM) Web Start application to do selection detection.
  2. An introductory tutorial on concurrent computing targeting computational biologists – Part 1, 2 and 3

When you read programming language comparisons, the main narrative is normally about the paradigm(s) supported. Lisp, Haskell, Scala, Clojure fall mainly in the functional realm. Prolog is logic. Smalltalk OO. C and Fortran, imperative. Most of them are not “pure” in their paradigm (e.g. you can write nicely OO-designed programs in C – just check GTK’s GLib library if you disagree – or do imperative coding in Prolog, and so on…), but that is beside the point.

The point is that, when comparing programming languages, the main issue of discussion is the bloody paradigm thing.

Paradigm is not really that important! In fact, as said above, you can normally tweak a language to write in your favorite paradigm. Sure, the ability to do that varies from case to case, but in most cases that I can think of, it is really not difficult to cross paradigm boundaries. In fact, I would go as far as to argue that it is easier to do proper OO design with C using GLib than with the highly complex and convoluted C++.

Before going into the fundamental point that I want to make, I would also note that ecology matters: Are there good libraries? Good documentation? Does it run on a virtual machine? Portability? Nice community? User base? That is, when comparing programming languages, all that is around the language is more important than the language itself. Just ask all of us poor Prolog/Lisp/Haskell fans why we are doing Java/C++ during most of our day. It puts bread on the table, and, for most of us, that is the most important criterion (I prefer not to starve!).

But, going to the main point here, I would like to propose that one of the fundamental points in comparing programming languages from a technical standpoint is homoiconicity.

As a reminder, a homoiconic language is a language where the program is represented in a core data type of the language itself. Code is a data type.
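To make “code is a data type” concrete, here is a toy sketch in Python. Python is not homoiconic, so this only imitates the idea: the nested-list program format, the `OPS` table and the function names are assumptions made up for this illustration. The program is an ordinary list, so evaluating it and rewriting it both use the same plain list operations.

```python
import operator

# Hypothetical mini-language: a program is a number or a list
# [operator_symbol, left_operand, right_operand].
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a program that is itself just nested Python lists."""
    if isinstance(expr, list):
        op, left, right = expr
        return OPS[op](evaluate(left), evaluate(right))
    return expr  # a bare number evaluates to itself


def swap_operator(expr, old, new):
    """Return a new program in which every `old` operator becomes `new`."""
    if isinstance(expr, list):
        op, left, right = expr
        return [
            new if op == old else op,
            swap_operator(left, old, new),
            swap_operator(right, old, new),
        ]
    return expr


program = ["+", 1, ["*", 2, 3]]           # the program 1 + (2 * 3)
rewritten = swap_operator(program, "*", "+")  # the program 1 + (2 + 3)
```

In a truly homoiconic language there is no separate mini-language to invent: the list you manipulate is the language’s own source representation.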

If you classify languages according to homoiconicity, then they split in completely different ways:

  1. The homoiconic bunch: Lisp, Prolog, Ioke, Clojure, …
  2. The non-homoiconic bunch: Cobol, Fortran, C, Java, Groovy, Scala, Haskell, OCaml, [A very long list follows]…

From this point of view, the comparison of say, Clojure to Scala as sister-languages makes little sense, as they fall in different groups.

Homoiconic languages lend themselves, by construction, to metaprogramming and extensibility (think very easy embedded DSLs). Some of these features are difficult (with varying levels of difficulty) to implement in non-homoiconic languages. At best (and by “best” I am thinking of some scripting languages like Python), they are awkward to do in a non-homoiconic language.
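A small sketch of what “awkward” looks like in practice, using Python’s standard `ast` module. The transformation itself (turning every addition into a multiplication) and the names `AddToMul` and `rewrite_and_eval` are invented for the example. The point is the ceremony: source text must be parsed into a separate tree of special node objects, rewritten through a visitor class, repaired, compiled and only then evaluated.

```python
import ast


class AddToMul(ast.NodeTransformer):
    """Hypothetical rewrite rule: replace every `+` with `*`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def rewrite_and_eval(source):
    """Parse an expression, rewrite its tree, then compile and evaluate it."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(AddToMul().visit(tree))
    return eval(compile(tree, "<rewritten>", "eval"))
```

For instance, `rewrite_and_eval("2 + 3")` evaluates the rewritten expression `2 * 3`. It works, but the code being manipulated lives in a parallel universe of `ast` nodes rather than in the everyday data types of the language.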

As a side jab, last time I checked, Scala was very, very poor on metaprogramming (has that changed?), making it the only “modern” language which seems to be scorning metaprogramming. Scala can still be DSL-extensible (I offer my own example in both Scala and Groovy: Ronald: A Domain-Specific Language to study the interactions between malaria infections and drug treatments).

One could argue about the value of writing programs that reason about themselves (and that idea has very bad karma coming from assembler – an idea so old and so disconnected from current reality that I am not even going to discuss it). I am surely on the side that proper metaprogramming is one of the core features of any elegant, productive and declarative solution.

Also, a very nice side effect of having code as data is that the syntax of homoiconic languages is normally very, very simple (as in trivial to learn). This is just a side effect, but compare this with the learning curve of, say, C++ syntax. There is also a philosophical issue here: you get a simple, highly flexible environment, where complexity is tackled not by having a complex mammoth that tries to address all possible cases, but by a set of plastic, bendable building blocks.

Homoiconicity is not a black-and-white feature. For instance, Lisp macros are not first-class objects (I am a Clojure newbie, so feel free to correct me), so you cannot metaprogram with them. Prolog seems to come close. In fact, to a Prolog programmer, Lisp macros seem especially inelegant as they are “out of the system”.

There seems to be some competition in the field that can be vaguely defined as “The next Java”(TM).

I don’t know if there will be a “next Java” to start with. Things seem to be shaping up in a way where the JVM is our common interoperability platform and on top of it we have an ecology of JVM-based languages.

I have used Jython quite a lot but have several doubts about it: not only about the current status of Jython (it lags a bit behind CPython), but I also dislike Python (when compared with the other languages discussed here). As such I decided to evaluate the others: Scala, Ruby and Groovy.

I have done a couple of small projects in Scala (a prototype DSL for modeling malaria resistance is available here) and JRuby. I am now starting with Groovy, and I think I’ve found my new love. Here I will try to explain why, among Groovy, Scala and JRuby, I have chosen Groovy. To preempt any idea of a religious war, I would like to say I have full respect for Scala and Ruby, which are, with Caml and Prolog, among my favorite languages (for a true crusade and flame, ask me for my opinion about Perl or Visual Basic 6 ;) ).

Steven Devijver suggests that Groovy is the language with the most syntactic similarities to Java. I would say that, not only in syntax but in semantics and everything else, Groovy is the closest language to Java. And that is a good thing. The world (both in programming languages and all the rest) is never revolutionary. Revolutions, on the rare occasions they happen, are either a disgrace or not that much of a big change below the surface. People normally prefer (for good and bad reasons) the path of least short-term pain. Groovy delivers that: almost zero cost in starting to code coming from a Java background. Most importantly, Groovy does that but still delivers most of the new goodies. This is actually the cornerstone of my argument: path of least pain while delivering the good stuff (in some cases better than the competition, as we will see).

Let me start with the fundamental reasons why I dismiss JRuby (which is, nonetheless, my second option after Groovy). First, I would like to say, very honestly, that the work of the JRuby guys is nothing short of outstanding! But I have 3 problems:

  1. One, by definition, JRuby is based on Ruby, a language from outside the JVM. That means semantic hurdles and coupling issues between the two worlds (think, e.g., libraries).
  2. Most importantly (but connected with the first point): typing. I am a bit far from computing issues these days (I currently work on malaria, so excuse me if I mix up strong/explicit typing and such), but clearly the type system of Ruby makes life hard for IDEs (think of the IDEs needed to tame those over-engineered Java APIs) and for automated tools around code. Debugging without explicit typing is also a pain in a big program (I actually suffered my first typing-related debugging nightmare with Caml, arguably the mother of Scala). Some might say that Scala type inference and Groovy duck typing are also problematic in this respect; while the argument might be correct, both languages have mechanisms to support typical Java explicit/strong typing and as such profit from IDEs and automated analysis tools.
  3. Ugly perlisms. Although I have read somewhere that those might be deprecated in the future.

Ah… Scala… Mats Henricson argues that Scala is the only option because of its elegance regarding multicore computing. I fundamentally disagree with his point: multicore programming is fundamental, but Scala is not really a good solution. Before we get there, though, let’s talk about other Scala issues.

Type inference. I have some experience with the “mother” of Scala, Caml. Type inference in Caml is really elegant: I don’t remember a single case of it failing and requiring the programmer’s help in discovering the type of a parameter. That is not the case with Scala: several times the compiler seems to be “lost in translation”. Some might say that this is because of JVM-imposed constraints, but if that is the case, it raises the question of whether it is wise to bring a language with foreign semantics to the JVM, with all the ugliness attached to the process.

My biggest peeve? Metaprogramming. I won’t give you my opinion about it because it really doesn’t exist. It is on the Scala wiki in the section “future”. I am sorry, but a 21st century language where metaprogramming is absent can only be called “beta stage”. As a side note, something seems to have been lost from Lisp in the ML branch of functional programming in this regard (no introspection and such), which is a shame (how is Haskell in that respect?).

Ok, multicore computing. This is an area where I have some experience in the JVM: [Shameless plug] I invite you to have a look at my Java Web Start, Jython based, multicore aware evolutionary biology workbench LOSITAN. Furthermore I have written tutorials for the multicore paradigm and bioinformatics:

  • Bioinformatics, multi-core CPUs and grid computing: Introduction (1/4)
  • Bioinformatics, multi-core CPUs and grid computing: User perspective (2/4)
  • Most importantly in this context: Bioinformatics, multi-core CPUs and grid computing: developer perspective (3/4)

Mats argues that Scala Actors and immutable data types provide a simple and elegant solution to the extremely complex problem (I am calling it extremely complex, because I think it really is) of concurrent programming. Immutable data types… Does anyone believe that the hordes of existing Java developers/programmers are ready and willing to make the radical conceptual jump to immutable data types? The change from C++ to Java was minor in terms of semantics; even the change from C to C++ was much less radical than a change requiring you to “get rid of all variables”. How do you think the majority of programmers will react when you say: “Forget variables”? Moreover, as Scala allows for an imperative style of programming, what do you think most programmers’ idiom will be: imperative or functional? To make things worse, in Scala an immutable is called a “val” and a mutable a “var”. Am I the only one picturing hordes of developers, with tight deadlines, just swapping L’s for R’s?
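To show the size of the conceptual jump, here is a minimal sketch in Python rather than Scala, so it does not use “val” or “var” at all; a frozen dataclass stands in for an immutable value, and the `Account` type and `deposit` function are hypothetical. With immutable data you never update a value in place: every “change” builds a new value and leaves the old one untouched.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable record: fields cannot be reassigned."""

    owner: str
    balance: int


def deposit(account, amount):
    """Functional style: return a new Account instead of mutating the old one."""
    return replace(account, balance=account.balance + amount)


def try_imperative_update(account, amount):
    """Imperative habit: update in place. On a frozen dataclass this fails."""
    try:
        account.balance += amount
        return True
    except FrozenInstanceError:
        return False
```

The functional version is not longer, but it asks the programmer to stop thinking in terms of “the balance changes” and start thinking in terms of “there is a new account value”, which is exactly the habit that is hard to break.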

I speak for myself here: in spite of having probably more experience with “immutable” languages (Prolog a lot, Caml a bit) than most developers, when I wrote Scala code, my reasoning was so tainted by “real world” imperative languages that it was really hard to write in a functional dialect. I have the background, enough free time, and the motivation to write functional code, but it was hard to get back in that mindset.

Scala only apparently solves the multicore problem. Give it to a typical developer and he will write imperative code, unless you put a functional zealot behind him (and give the said zealot a strong, resistant whip).

How to address the multicore issue? Clearly we have a problem here. A few ideas:

  • In many applications there is no big need to go multicore. In some cases, let’s not try to solve a problem that doesn’t exist in the first place.
  • Many multicore applications can survive very well with simple concurrency management. Not all applications require a PhD in concurrent programming.
  • Scala and the like. For those who can and are willing to go functional, why not? I have nothing against that. My only argument is that it won’t be mainstream.
  • The way of PAIN. Most developers will continue to use old languages and paradigms and SUFFER with it. Only after much suffering will there be motivation to try out new things and, say, endure the pain of learning a new paradigm. That suffering still hasn’t happened; only after this becomes a big problem will there be interest in accepting new solutions.
  • A silver bullet that can be attached to the current programming paradigm. Sometimes it happens. Don’t misunderestimate (silly Bushism intended) the power of a “Black Swan” (A reference to Taleb’s book where he discusses the impact of the unexpected important events).
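As an illustration of the “simple concurrency management” idea in the list above, here is a minimal Python sketch; the `word_count` task and the function names are hypothetical. Independent tasks are handed to a pool, results come back in order, and no locks or shared mutable state appear in user code.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Hypothetical independent task: count the words in one document."""
    return len(text.split())


def count_all(documents, max_workers=4):
    """Run word_count over every document using a small worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with documents.
        return list(pool.map(word_count, documents))
```

In standard CPython builds, threads mostly help with I/O-bound work; for CPU-bound tasks the same pattern applies with a process pool instead. Either way, the developer only has to see “apply this function to these items”, which requires no PhD.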

To finalize, I would like to say that I am not sticking with Groovy out of being conservative. Groovy seems to beat the competition in many areas (the biggest example is metaprogramming) and strikes a very good balance between being a “small evolutionary step” and delivering the goodies.

To really finalize, a caveat: my Groovy knowledge is still limited, so one of these days you might read a post where I apologize for having written this ;)

Originally posted on Perfect Storm