Oct 26
I am seriously considering doing the core of my work (at least when I have the freedom to decide) in Scala. The reasons? Well, I can give them in the form of requirements:
- Domain Specific Language support, that is:
- Making declarative programming easy
- Ability to show the code to non-programmers in a form that is readable and understandable (I will talk about this topic a lot in the future).
- DSLs should be embedded and not stand-alone. A stand-alone DSL (say, one to model the spread of malaria drug resistance) can be implemented in any programming language, really. But embedded DSLs (i.e., where the DSL resides inside the host language) cannot really be done in most languages. Embedding gives the DSL the full power of the host language, which allows for “unlimited” extensibility (Turing completeness, some would say). Prolog is still my favorite here.
- Availability of a wide range of libraries (think math libraries, chart libraries, bio libraries). All JVM-based languages can use Java libraries. This more or less kills Prolog, Caml and Haskell.
- Easy multi-platform support: think Linux, Mac OS X and Windows, without much pain. This kills most non-VM languages and “system” languages (C, C++, Fortran).
- By the way, I refuse to malloc. I was born in the 70s, not retired in the 70s.
- Lively, clever and helpful community.
- Strong typing; better yet, strong typing with type inference. I don’t think the typing in traditional “scripting” languages (think Ruby, Python, Perl and friends) scales when the code base grows: it is overrated, and debugging becomes a mess. Caml wins here. Scala’s type inference seems to fail sometimes (i.e., it requires the programmer to specify the type explicitly). Java-like languages force you to always be verbose, which kills productivity.
- The language should be seen by its creators mainly as a production vehicle and not as a research vehicle. A big no-no to Prolog here. Haskell goes the same way. Scala seems to strike a reasonable balance. I need to produce reliable code, so I require a reliable compiler/interpreter.
- I have a strong bias towards the JVM: open source and an open development process (Java Community Process), robust, widely supported, massive user base. .NET, being vendor-locked in practice (I don’t think Mono is really a viable alternative, as MS really controls whatever they want to control), is out. At the end of the day I also have a soft spot for Java. There are many things that I don’t mind doing in Java.
- Introspection. Caml fails here. I actually don’t know how Scala fares here, but at least JVM mechanisms are enough for me.
- Striking a good balance between cognitive freedom and damage control on bad code design. As an example, Java gives little freedom with regard to how you express your ideas. Perl, on the other hand, allows you to make a big mess (without really giving you expressive power, actually). Functional and logic programming languages shine here.
- Over-engineering might be good for supporting all possible use cases, but it is a productivity disaster to code against. I am thinking Java here. All libraries are difficult to use by design. Even 3rd party libraries seem to be designed mostly with a complexity culture in mind.
Scala seems to be the option that tackles most issues. To be honest, I was always frustrated with all languages because each of them missed some crucial point in a big way. Prolog is too “researchy”, Haskell also, C too low level, Java too verbose and too freedom-curtailing. Perl and C++ are a complete mess (although in different ways).
Python is almost there (Major: Jython lags. Minor: weak DSL and functional-paradigm support). JRuby is probably there. Scala is probably there. My gut feeling says to try out Scala.
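To make the embedded DSL requirement a bit more concrete, here is a minimal sketch of what a rule in a drug-resistance model could look like when the DSL lives inside Scala. Everything in it (`ResistanceDsl`, `Rate`, `Mutation`, `mutatesTo`, `at`, `perGeneration`, `resistantFraction`) is a hypothetical name invented for this illustration, and the formula is a toy one, not a real epidemiological model.

```scala
object ResistanceDsl {
  // Hypothetical domain types, invented for this illustration.
  final case class Rate(perGeneration: Double)
  final case class Mutation(from: String, to: String, rate: Rate)

  // Lets a plain Double read as a rate: 0.01.perGeneration
  implicit class RateOps(val value: Double) {
    def perGeneration: Rate = Rate(value)
  }

  // Lets a plain String start a rule: "sensitive" mutatesTo "resistant" ...
  implicit class AlleleOps(val from: String) {
    def mutatesTo(to: String): PartialMutation = new PartialMutation(from, to)
  }

  // Intermediate step of the sentence, completed by `at`.
  final class PartialMutation(from: String, to: String) {
    def at(rate: Rate): Mutation = Mutation(from, to, rate)
  }

  // Because the DSL is embedded, ordinary Scala is available next to it.
  // Toy assumption: a fixed mutation rate per generation and no selection.
  def resistantFraction(rule: Mutation, generations: Int): Double =
    1.0 - math.pow(1.0 - rule.rate.perGeneration, generations)

  // The line a non-programmer would be shown.
  val rule: Mutation = "sensitive" mutatesTo "resistant" at 0.01.perGeneration
}
```

The `rule` line is plain Scala (two method calls written infix), so the compiler type-checks it and any Java or Scala library can be used alongside it.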
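On the type inference requirement from the list, a small sketch of where Scala infers types and where it asks for an explicit annotation. The names (`InferenceSketch`, `readLengths`, `mean`, `countG`) are made up for the example; the two cases shown (method parameters and recursive methods) are ones where the compiler does require the type to be written out.

```scala
object InferenceSketch {
  // Inferred: the compiler works out List[Int] on its own.
  val readLengths = List(36, 50, 75)

  // Parameter types must always be written; the Double result type is inferred.
  def mean(xs: List[Int]) = xs.sum.toDouble / xs.size

  // Not inferred: a recursive method must declare its result type (the ": Int").
  def countG(seq: List[Char]): Int = seq match {
    case Nil         => 0
    case 'G' :: rest => 1 + countG(rest)
    case _ :: rest   => countG(rest)
  }
}
```

Dropping the `: Int` from `countG` makes the compiler reject the method, which is the kind of explicit typing mentioned above.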
7 Comments to "Scala for bioinformatics"
Filed in: Scala, bioinformatics

And you can add Actor-based concurrent programming and the ability to match objects with patterns.
I am starting Scala too. I’m waiting for your next posts …
Your blog is really interesting
Would be interesting if you could discuss the tradeoffs between Scala and JRuby.
[…] From Neil Saunders we have an excellent tutorial (part I and part II) about setting up and using SVN and Trac for tracking bioinformatics projects. In theory, scientists should be able to trace anything they release (not only source code) back to its origins, and Neil has a ready-to-implement solution. As Paulo Nuin from Blind.Scientist found Trac a little bit clumsy, he recommended svn-time-lapse instead, since it makes it easier to compare two versions of a file (see part I and part II). You can test both approaches with your new project inspired by Tiago from Perfect Storm - he started an interesting journey with Scala for bioinformatics. […]
I actually think Haskell is about as practically-oriented as a lazy pure functional language can be (Clean is no less practical, but it fails some of your other criteria). You might think referential transparency is a research experiment rather than a truly fantastic language feature, but many would differ. Of course the JVM can’t implement it, but the same limitations prevent Scala being half the language it ought to be.
I have been using Haskell for bioinformatics the last few years, with, I would claim, some degree of success. There’s a fair-sized and growing library available from my web site, should you be interested.
-k
…uh, available from:
http://malde.org/~ketil/biohaskell
-k