A holiday, Ruby and Scala

I did not really have a holiday, but I stopped posting for a while.

But I want to talk about another “holiday”: Scala.

I have spent a couple of months with Scala: a functional-OO programming language designed from scratch with the JVM in mind, with a nice, smart community.

I have actually decided to stop my efforts on Scala and go back to exploring the Ruby way… The reasons:

  1. No metaprogramming facilities. This comes from ML, I suppose. But Ruby has them, and many “old school” elegant languages have them (Lisp, Prolog). It is possible to be elegant with metaprogramming (in fact, I would contend that in many settings it is a requirement).
  2. There seems to be some difference in semantics between compiled and interpreted code. I only compiled, but the interpreter can add new variables to its local scope (as it really needs to) while the compiler cannot. One might argue that this is excessive flexibility coming from the scripting languages camp, but I actually had to, in a compiled program, create new classes that would include traits depending on the needs of the user, and this cannot be done. If one has many traits, all the desired trait mixins have to be compiled a priori; they cannot be defined at run-time in a compiled environment (contrast this with JRuby or even Jython). This is actually lack of metaprogramming, part 2.
  3. Type inference: Scala's type inference might seem clever but, compared to Caml's, it is not. Sometimes the compiler is not able to infer the types and the user has to declare them explicitly. Caml was always capable (at least in my cases) of complete type inference.
  4. Information sources are scarce. The mailing list is reasonable, but sometimes questions go unanswered and there is no other source (other than inspecting the source code). This will sort itself out if more people start using it – and if there are more books like the Artima ebook.

Decent metaprogramming in a runtime setting would be my main requirement, but in Scala's current state one can only have it through the typical Java way: execute the compiler, link a jar… not elegant.
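
To make the contrast concrete, here is a minimal Ruby sketch of what I mean by choosing mixins at run-time. The module names, the TRAITS table and build_class are all hypothetical, made up for this illustration; the point is only that the set of mixed-in modules is decided while the program is running.

```ruby
# Hypothetical "traits"; stand-ins for whatever behaviours a user might pick.
module Printable
  def print_me
    "printing #{self.class.name || 'anonymous object'}"
  end
end

module Auditable
  def audit
    "audited"
  end
end

# Hypothetical lookup table from user-supplied names to modules.
TRAITS = { "printable" => Printable, "auditable" => Auditable }.freeze

# Build a brand-new class at run-time that mixes in only the requested traits.
def build_class(trait_names)
  Class.new do
    trait_names.each { |name| include TRAITS.fetch(name) }
  end
end

klass = build_class(["printable"])   # the list could come from user input
obj = klass.new
puts obj.respond_to?(:print_me)
puts obj.respond_to?(:audit)
```

Nothing is compiled a priori here: a different list of names simply yields a different class.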

Regarding Ruby, I would like to have some form of strong and explicit (or inferred) typing. I would imagine that the requirements of metaprogramming flexibility and typing are contradictory but, at the very least, some kind of optional (but standard) annotation for input/return parameters would avoid some of the debugging nightmares that come from not having a compiler helping with types. It would also allow smart code editors to do all the fancy completion that is possible with explicit typing.

[This was initially posted - with modifications and additions - on Artima as a comment]

Filed in: Ruby, Scala

by: tiago

5 Comments

Automated GUIs for OO models and DSLs

One of the most delightful things in bioinformatics is the possibility of working with people with really different mindsets. Surely CS geeks are amazing, and every day I feel that my original background is really a comparative advantage, but, from where I stand, nothing beats being in an environment with scientific and cultural diversity. But let's talk some geekiness now:

A couple of years ago, I wrote a population genetics simulator in Caml. It was really flexible, allowing for many demographic and genomic scenarios, mating rules, selection… I never tried to publish it because there are many good simulators around (I suggest simuPOP, if you are looking for one) and it would take some time to make it robust and documented enough for public exposure. But the interesting part is this: when I went to my MSc supervisor (an “old-type” biologist) and gave a very exuberant explanation of how flexible the simulator was, he added only one comment: That is all very well and good, but you did not show me the easy-to-use graphical interface!

Fast forward a couple of years… I am developing a DSL to model drug resistance in the context of infectious diseases. I went to my PhD advisor (a population geneticist, malariologist and biostatistician who knows how to program in C), showed him my rough prototype and he said: People will be able to read this, but to interact with it they will want an easy-to-use graphical user interface. To be honest, this time I was expecting the comment (I have been living among experimentalists long enough to have learned something). I have no expectations that domain specialists will write in my DSL (well, maybe a couple of them will, if things pick up). If I end up giving my system away to domain specialists, it will have to have an easy-to-use interface; there is no escaping that.

Well, DSLs (at least in Scala and in Ruby) have an underlying OO model which, most of the time, is neither complex nor big. I am starting to suspect that it won’t be too difficult to automatically generate an easy-to-use interface to input, in a “nice” way, what could be rendered as DSL programs (or object instances and relationships, if you prefer to look at it that way). For embedded DSLs, which have the whole expressive power of the host language available, doing that completely would be infeasible. But at least part of it could be automated. Obviously this idea is not new at all; it is just a rehash of what Lift or Rails do for databases.
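
As a rough illustration of why I suspect this is feasible, here is a minimal Ruby sketch that uses reflection to turn a model class into a list of form fields. The Drug class, its attributes and the form_fields/render_form helpers are hypothetical, invented for this example; a real version would create GUI widgets instead of text.

```ruby
# Hypothetical model class; in a real DSL this would be one of the
# objects of the underlying OO model.
class Drug
  attr_accessor :name, :half_life_hours, :resistance_cost
end

# Discover the writable attributes of a model by looking for setter
# methods ("name=" and so on) defined directly on the class.
def form_fields(model_class)
  model_class.public_instance_methods(false)
             .map(&:to_s)
             .select { |m| m.end_with?("=") }
             .map { |m| m.chomp("=") }
             .sort
end

# Render a plain-text "form"; a GUI toolkit would build input widgets here.
def render_form(model_class)
  form_fields(model_class).map { |field| "#{field}: [__________]" }.join("\n")
end

puts render_form(Drug)
```

This only covers flat attributes; relationships between objects and anything that uses the full host language would need more thought, which is the part I do not expect to automate.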

I am aware that graphical programming languages never went too far (I actually dislike them), but the scope and context here are completely different, so different premises apply. This might be one way of lowering the barrier to rigorous modeling for a wider crowd.

Filed in: Ruby, Scala, bioinformatics, declarative programming, science, software engineering

by: tiago

No Comments

Python, Ruby, Java and Threads

Greg Tyrelle made a very important comment regarding exploiting multiple cores and Python (which will surely be included in my next part on bioinformatics and multi-core computing).


First, my understanding of python threads is that they are not separate system level processes, but some kind of fake process that are python specific ? Trouble is that I see two separate process when I launch two Blast runs via threading ?

The other aspect of threading that I’m still not entirely clear about is how the global interpreter lock (GIL) fits into the picture. I get resource locking to prevent race conditions, but is the GIL also invoked each time an action that manipulates memory takes place in a thread ? I’ve heard this property of python makes it unsuitable for multi-core programming ?

I will trade formal correctness for clarity of explanation (namely, I won’t go into much detail on the difference between a thread and a process, as it would make this too techy and confusing).

Python uses real (i.e. native) threads. Ruby uses so-called green threads, which are “fake” (simulated). Ruby 2.0 will use native threads.

So, in theory, Python is OK on multi-core architectures. In practice there is a problem, a serious one, identified by Greg: the Global Interpreter Lock (GIL). The GIL makes it impossible for more than one thread to be executing Python code at a time: even if you have many threads and many cores, only one thread can be executing Python code at any given moment. This is not as serious as it looks; there are four ways to live with it:

  • If you use a thread to start an external process, that process is not under the control of the GIL (it is a separate process), so it can run concurrently (think BLASTing something): it is running outside Python, so it can be using a different core. I think this covers one fundamental use case in bioinformatics: using external, computationally intensive programs. In fact you can start as many instances of external programs as the number of cores you have (or even more, in case you think it will be advantageous). Note that the thread that calls the external application will block (well… it depends, but for simplicity let's assume it does), but your other Python threads can continue running concurrently with the application.
  • This is subtle, but important: If you use CPython (the standard implementation), and you do your computationally intensive stuff in C (which makes sense – and is a common strategy – as Python is quite slow) then the C code, as long as it is not interacting with Python objects, can release the GIL and therefore make use of multiple cores. The Python code uses only one core, the C part might be using all the remaining available ones. This approach is not valid for Ruby because of the green threads issue (I am a simple Ruby newbie, so take my words with a grain of salt).
  • Now… this GIL problem (or the green threads issue in Ruby) disappears if you use Jython or JRuby, as they use the JVM native concurrency mechanisms which have no notion of acquiring an exclusive lock for execution. By the way you can also use JVM based interpreters to call native (non-JVM) applications (think BLAST again, from inside Java). To put this point in another way: the GIL/green threads problem is not a language limitation, it is a limitation of the standard (C based) implementations that other implementations might not share (and the Java implementations, in fact, DO NOT).
  • If you think about grids (and not multiple cores) then the problem disappears as we are then talking of different processes (even more, running on different hardware).
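
The first point (threads that each start an external program) can be sketched as follows. I am writing it in Ruby only to keep the code on this blog in one language; the same pattern applies to Python threads. The child Ruby interpreter is a stand-in for a real tool such as BLAST, and run_external/run_jobs are hypothetical helper names.

```ruby
require "open3"
require "rbconfig"

# Stand-in for a long-running external tool such as BLAST: a child Ruby
# interpreter that sleeps briefly and then prints a result.
def run_external(job_id)
  script = "sleep 0.3; puts #{Integer(job_id)} * 2"
  output, status = Open3.capture2(RbConfig.ruby, "-e", script)
  raise "job #{job_id} failed" unless status.success?
  output.strip.to_i
end

# One thread per job. Each thread blocks while waiting for its own child
# process, but the children are separate OS processes, so the operating
# system is free to schedule them on different cores.
def run_jobs(job_ids)
  job_ids.map { |id| Thread.new { run_external(id) } }.map(&:value)
end

started = Time.now
results = run_jobs([1, 2, 3, 4])
puts "results: #{results.inspect}"
puts "elapsed: #{(Time.now - started).round(2)}s"
```

If the jobs really overlap, the elapsed time should be much closer to the duration of one job than to the sum of all four.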

I am afraid of being too techy with this post (I am probably labeled as 100% computer nerd by now ;) ), but I think Greg’s point is fundamental and required some discussion.

In my defense ;) I would like to say that I am only writing so much about programming because I am in some sort of unclear phase professionally. As soon as things get back on track I want to focus more on the biological part of things… Until then I will be writing about the subject that I know best, and that is, for better or worse, informatics…

Comments, especially constructive criticism, are, as always, welcome…

Filed in: Java, Jython, Python, Ruby, bioinformatics

by: tiago

3 Comments

Ruby: Hello World

I happen to be a strong believer in DSLs. In fact, except for the most computationally intensive stuff, I would do everything as a DSL.

In my current setup (research in bioinformatics), I can essentially choose the tools I want. My only constraint is having decent bioinformatics and graphics libraries (ah… and I try to do everything inside a JVM). That constraint excludes Prolog and OCaml, so, based on previous experience, I was working with the pair Python/Jython. Guido explicitly stated that Python 3000 is not going the DSL way. It's his choice, and I am all for having different approaches to programming, but it is not my choice.

Enter Ruby: enough bioinformatics libraries (well, from JRuby one can use JFreeChart and BioJava and, of course, BioRuby) and DSL support.

As my Hello World application I decided to immediately use Ruby’s DSL features, so my first application was a Web Template language (yes, yet another), which I describe here:

Fundamental concepts

  • template – A template for a certain type of page: A title page template, a multicolumn page template, …
  • snippet – A part of a page: A navigation bar, an embedded RSS feed, …
  • page – a certain page: The entry page of my website, the page about bioinformatics. Pages are template based and can use snippets.

The fundamental idea is that for each template there is a language tailored to it. For instance, my entry page looks something like this:

title "Tiago's virtual house"
abstract "Bioinformatics, software development, sports (doing, not watching), cinema, music, ..."
topic ("Bioinformatics") {
  summary "Here you will find software for life sciences"
  subtopic "Soft4Life" {|f| in_link f, "soft4life"}
  subtopic "Molecular adaptation"
  subtopic "Tropical diseases"
}
topic ("another") ...

Different kinds of pages (i.e., with different templates) will have different languages.

Interesting Ruby features

  1. instance_eval – instance_eval seems to be the workhorse behind Ruby’s DSLish style. Basically, instance_eval takes a string (or a block) and executes it with the object's scope visible, so there is no need to refer to the object explicitly. That is, imagine that you have an object myCar of class Vehicle, which has a method called start: in that script you can write just start instead of myCar.start. That (coupled with fewer parentheses cluttering the code) makes the thing work.
  2. attr_reader and friends – attr_reader (together with attr_writer and attr_accessor) is an expedient way of getting the getter/setter pattern, nice for sparing keystrokes. The annotation/decoration that seems to go with this sure deserves research…
  3. Method catching – I am using method_missing to (naively) convert any non-existent method name of a template class to an HTML element, so if one calls object.a it will render <a>…</a> (if one does object.shaite, yes, it will do <shaite>…</shaite> ;) )
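
To show how instance_eval and method_missing fit together, here is a minimal sketch. It is not my actual template code: HtmlTemplate, TAG and render are names made up for this example, it uses the block form of instance_eval, and it only accepts simple lowercase names as tags.

```ruby
# Hypothetical template class: any unknown lowercase method name is
# rendered as an HTML element of the same name.
class HtmlTemplate
  TAG = /\A[a-z][a-z0-9]*\z/

  def initialize
    @out = []
  end

  def method_missing(name, content = nil, &block)
    return super unless name.to_s.match?(TAG)
    @out << "<#{name}>"
    @out << content.to_s if content
    instance_eval(&block) if block   # nested tags see this object as self
    @out << "</#{name}>"
    self
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.match?(TAG) || super
  end

  # Run the DSL block against this object and return the generated HTML.
  def render(&block)
    instance_eval(&block)
    @out.join
  end
end

html = HtmlTemplate.new.render do
  div do
    h1 "Tiago's virtual house"
    a "Soft4Life"
  end
end
puts html
# prints <div><h1>Tiago's virtual house</h1><a>Soft4Life</a></div>
```

One caveat of this naive approach: names that already exist as methods (p, for example, which is Kernel#p) never reach method_missing, so they would need explicit definitions.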

“Problems” with Ruby (i.e., showing that I am a complete newbie)

I did not like the following things:

  1. yield inside instance_eval (show stopper?) – When yielding inside instance_eval, the “inside object” scope seems to be lost. I.e.:
      def sillyMethod()
      end
     
      def goneYielding()
        yield
      end

    The code called on yield will not get sillyMethod (or any other method of the yielding object) in its scope.
    For me this is the biggest hurdle and it could be a show stopper. I will research this further before continuing with Ruby…

  2. Parentheses – Like this:
    topic ("Bioinformatics") {

    Are those parentheses really needed (before the code block)? Ruby is quite nice in not needing parentheses, but in this case I could not get rid of them and I don’t see why… (Actually, I do see: it's probably just my ignorance for now.)

  3. Lots of Rubyisms still lying around – Like this:
    subtopic "Soft4Life" {|f| in_link f, "soft4life"}

    I don’t like having to write |f|…, as it seems to force things to be too Rubyish (pun really unintended). Intuitively I would say that there is something about variable visibility inside code blocks that does not lend itself too easily to this.
    Also, I would like to do some code rewriting, like just writing in_link "soft4life" and then having it automatically rewritten to something like |f| in_link "soft4life", f. I would bet that this is possible (again, newbie ignorance). This is not the best example of code rewriting, but I hope the point is clear…

  4. Yielding to multiple code blocks – I would like to yield to multiple code blocks. Seems ridiculous? I could do that in Prolog, and I can think of an example use case: when writing an HTML element, yield to a first code block to write the attributes, then yield to a second one to write the content.
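
For what it is worth, here is a sketch of possible ways around points 1 to 4, as far as I understand them. It is an assumption-laden illustration, not my site's code: Topic, Page, topic_with_yield and element are hypothetical names. The idea is to capture the block with &block and run it through instance_eval instead of yield, to use do…end where parentheses get in the way, and to pass extra "blocks" as lambdas.

```ruby
class Topic
  attr_reader :name, :summaries

  def initialize(name)
    @name = name
    @summaries = []
  end

  def summary(text)
    @summaries << text
  end
end

class Page
  attr_reader :topics

  def initialize
    @topics = []
  end

  # Plain yield: the block keeps the self of the place where it was written
  # (here the Page), so the new Topic has to be handed over as |t|.
  def topic_with_yield(name)
    t = Topic.new(name)
    yield t
    @topics << t
  end

  # Capturing the block and running it with instance_eval makes the Topic
  # the implicit receiver, so summary can be called without |t|.
  def topic(name, &block)
    t = Topic.new(name)
    t.instance_eval(&block) if block
    @topics << t
  end
end

# A method takes only one block, but extra ones can be passed as lambdas.
def element(name, attributes:, content:)
  "<#{name} #{attributes.call}>#{content.call}</#{name}>"
end

page = Page.new
page.instance_eval do
  topic "Bioinformatics" do            # do...end: no parentheses needed
    summary "Here you will find software for life sciences"
  end
  topic_with_yield("Another") { |t| t.summary "explicit receiver needed" }
end

puts page.topics.map(&:name).inspect
puts element("a", attributes: -> { 'href="soft4life"' }, content: -> { "Soft4Life" })
```

As far as I can tell, the parentheses are needed with braces because { } binds to the nearest expression (the string literal), whereas do…end binds to the whole method call.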

Preliminary conclusions

I am still too green in Ruby to make a decision (I have only written this piece of code), but it looks good. I suppose most issues are due to my total inexperience with Ruby.

There are a lot of things that still need to be checked (like operators: can one change the semantics? And the precedence (à la Prolog)? And the associativity (Prolog again)?)…

Resources that I used (and recommend):

Programming Ruby

Ruby Standard Library Documentation

Jay Fields blog, especially this post.

Ola Bini's blog: I am reading this metaprogramming post from time to time, and my next explorations will be around what Ola talks about in that post…

I am redesigning my site around this code. The source code will be available if somebody declares some interest…
This is my first Ruby program ever, three days' work, so please be tolerant of the newbie kind of comments that you have surely read…

PS – Just recently discovered, to be read in the future: Creating DSLs in Ruby. I will return to this topic sometime in the future.

Filed in: Ruby, declarative programming

by: tiago

2 Comments

Programming languages and platforms: Existential doubts

Throughout my whole career I have been torn between what I like (Prolog and Caml) and what makes me marketable (Java, VB, Perl, C, Python, C#). Of course, the world is not black and white so, in the list of marketable languages, there are some that are more likeable to me than others.

Java is acceptable, but it is too verbose and the libraries are grossly over-engineered (with no apparent advantage). Of course, a DSL framework is nowhere to be seen. Python is also acceptable, but the lack of DSLs (and Guido explicitly stating that he is not going in that direction in Python 3000) makes it lose a lot of its sex appeal. Also, less importantly, I have some bias towards strong typing.

Enter Ruby: DSLish, very pragmatic, a vibrant community, a fantastic JVM implementation, and one can make $$$ with it… I am giving it a try.

Regarding platforms, a less important issue for me, I am quite happy working on top of the JVM: multi platform, stable, industry support, really open…

Filed in: Java, Python, Ruby, declarative programming

by: tiago

1 Comment