Holy Grail: The quest for THE programming language

Being a computer scientist with a strong interest in languages (languages in the broadest sense possible: programming, natural and cognition-related issues), I am on a holy grail quest for a programming language that:

First and foremost, allows me to express my computations in a way that is close to the problem domain (as opposed to close to the machine). As I am working in a biology setting, that means being able to talk about concepts around genes, epidemics and pharmacology in my programs. I don’t want to think about CPUs, memory and things like that when I am coding. Prolog and Lisp are good examples here. I also need programs that can evolve over time as knowledge changes, so I need strong metaprogramming and Domain Specific Language facilities.

Unfortunately, I have a couple more requirements that come from day-to-day reality…

Real world: I want a language that interacts with existing libraries and that I can easily make available to other people to use, inspect and change. I need the Bio* libraries and graphics plotting libraries. In my personal case, I decided that I want to work inside the JVM, so I need a language that works in the Java world (Jython, JRuby, Scala, Groovy, … Java).

Software engineering: Programs have to be easy to maintain and debug. I guess there is no way around explicit typing when it comes to debugging and building tools.

A ridiculous, religious, fanatical quest? Yes, it might be, but I am pursuing it.

The truth is that we are not far away from this grail.

Scala is almost there. It lacks metaprogramming, and things like type inference are a bit amateurish (compare it with CAML).

JRuby is maybe there; I could live with it, I guess. The lack of explicit typing will make things difficult in the long run on the software engineering front.

I decided to give a final try to yet another language, Groovy, and so far it is going quite well. It seems to nail all the fundamental points. I especially love the effort put into good metaprogramming facilities.

I decided, for pragmatic reasons, that after this one I will stop my pursuit of the grail. If Groovy proves a blunder of some sort, I will revert to JRuby and carry on.

Filed in: bioinformatics, declarative programming, groovy, metaprogramming, science, software engineering

by: tiago

6 Comments

Rock star politics or genuine and honest ideas?

The rock star:

Or is it “Genuine and honest ideas”? Obama from 1995 (Digg here).

Maybe it is rock star politics AND genuine and honest ideas.

I tend to be cynical, but I want to believe.

If I were American I would vote for him. With the current remaining candidates, there is no doubt about that (I must say I liked Kucinich and Edwards). Obama > McCain > Clinton.

Female US president? Michelle Obama in 8 years time.

Filed in: uncategorized

by: tiago

No Comments

Bio.PopGen

I am currently developing the Biopython module for population genetics and genomics (by the way, you are invited both to help with the development and to make suggestions for new features, maybe based on your own needs).

In the current version of Biopython (1.44), a GenePop parser and code to deal with FDist (an Fst-outlier method for detecting selection) are available.

It is my pleasure to announce that coalescent simulation (in the form of support for the SimCoal2 simulator) is currently available in CVS and will probably be out in the next public version. This includes code, test code and DOCUMENTATION. This means you can now do coalescent simulations from inside Biopython (many demographies and markers are supported).

Future plans for Bio.PopGen include statistics (the meat of the module, actually) and HapMap support, among others.

Need any feature? Just ask. I cannot promise it, but I will try to address user requests as much as possible.

Filed in: Python, bioinformatics, biopython, population genetics

by: tiago

No Comments

A holiday, Ruby and Scala

I did not really have a holiday, but I did stop posting for a while.

But I want to talk about another “holiday”: Scala.

I have spent a couple of months with Scala: a functional-OO programming language built from scratch with the JVM in mind, with a nice, smart community.

I have decided to stop my efforts on Scala and go back to exploring the Ruby way… The reasons:

  1. No metaprogramming facilities. This comes from ML, I suppose. But Ruby has them, and many “old school” elegant languages have them (Lisp, Prolog). It is possible to be elegant with metaprogramming (in fact, I would contend that in many settings it is a requirement).
  2. There seems to be some difference in semantics between compiled and interpreted code. I only compiled, but the interpreter can add new variables to its local scope (as it really needs to), whereas the compiler can’t. One might argue that this is excessive flexibility coming from the scripting-language camp, but in a compiled program I actually needed to create new classes that would mix in traits chosen according to the user’s needs, and this cannot be done. If one has many traits, all the desired trait mixins have to be compiled a priori; they cannot be defined at run time in a compiled environment (contrast this with JRuby or even Jython). This is part 2 of the missing metaprogramming.
  3. Type inference: Scala’s type inference might seem clever, but compared to CAML’s it is not. Sometimes the compiler is not able to infer the types and the user has to declare them explicitly. CAML was always capable (at least in my cases) of complete type inference.
  4. Information sources are scarce. The mailing list is reasonable, but sometimes questions go unanswered and there is no other source (other than inspecting the source code). This will sort itself out as more people start using it and more books like the Artima ebook appear.
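To make point 2 concrete, here is a minimal sketch of what a compiled Scala program has to do when the traits to mix in depend on choices the user makes at run time. All names (`Infection`, `Resistant`, `Virulent`, `BasicInfection`, `Composer`) are hypothetical, chosen only for illustration: every combination has to be spelled out, and therefore compiled, in advance.

```scala
// Hypothetical stackable traits: each one appends to the description below it.
trait Infection { def describe: String = "infection" }
trait Resistant extends Infection { override def describe: String = super.describe + ", resistant" }
trait Virulent extends Infection { override def describe: String = super.describe + ", virulent" }
class BasicInfection extends Infection

object Composer {
  // The user's choices only arrive at run time, but each mixin combination
  // must already exist in the compiled code: n optional traits means 2^n cases.
  def build(resistant: Boolean, virulent: Boolean): Infection = (resistant, virulent) match {
    case (true, true)   => new BasicInfection with Resistant with Virulent
    case (true, false)  => new BasicInfection with Resistant
    case (false, true)  => new BasicInfection with Virulent
    case (false, false) => new BasicInfection
  }
}
```

`Composer.build(resistant = true, virulent = true).describe` yields `"infection, resistant, virulent"`. Short of invoking the compiler at run time, plain Scala can only approximate this with delegation to ordinary objects; JRuby, by contrast, can add a module to a single object at run time with `extend`.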

Decent metaprogramming in a runtime setting would be my main requirement, but in Scala’s current state one can only have it through the typical Java way: run the compiler, link a jar… not elegant.

Regarding Ruby, I would like to have some form of strong and explicit (or inferred) typing. I imagine that the requirements of metaprogramming flexibility and typing are contradictory, but at least some kind of optional (but standard) annotation for input/return parameters would help avoid the debugging nightmares of not having the compiler help with types, and would also allow smart code editors to do all the fancy completion that is possible with explicit typing.

[This was initially posted - with modifications and additions - on Artima as a comment]

Filed in: Ruby, Scala

by: tiago

5 Comments

Death Penalty Repealed in New Jersey

The title says it all. Let’s enjoy the good news. Not all is dark and bleak for human rights post-9/11.

Filed in: life

by: tiago

No Comments

Slowing down

I very much doubt that nature has prepared the human species for the very fast pace of today’s society. Moreover, we are supposed to have our attention span distributed around a multitude of issues. Not only that, we are supposed to answer fast, interact in real time.

Not only do I doubt that this is a path to General Human Happiness(TM), I also doubt that all this speed brings a real increase in productivity. Blogging (especially reading blogs) just increases the pace of things even more.

During the last couple of weeks I had to put off most of my multiple tasks in order to get work done in time for a presentation next week. I was afraid that I would not have enough time to do half of it. Guess what? Everything is done by now. Of course, no blog reading (OK, a little), no blog writing and very few distractions were allowed.

I decided to very consciously reduce the speed of things, especially interaction and multitasking.

  • Blog reading? Of course, but 1 day a week. I cut down the number of blogs, but not the areas covered: I still read blogs in my areas of interest (bioinformatics, poverty diseases, population genetics, cinema, economy, human rights, fitness and practicing sport).
  • Answering interesting blog entries? Of course. But I can do it within the hour (a.k.a. in blog time) or take a couple of months.
  • Journals? More or less the same rule applies. The noise level is very high, and just digging for gems takes time. I still look at the RSS feeds every day, though.
  • Mail? 3 times a day max (unless something urgent is going on, and urgent means urgent; so far there have been 0 cases). I am also back to using a text-based mail client: after a learning curve, it is more efficient.
  • Lunch? Not the Anglo-Saxon variety, that is for sure. I am doing the typical Portuguese hour-long lunch, away from work. And I am going Spanish: siesta! Sleep is important, and I live 2 blocks away from where I work. I really do see more freshness in my afternoon work when I do this.
  • Whenever possible, I respect my biological clock. Want to stay in bed a bit longer in the morning? Of course (although, to be honest, being an early bird, I am normally at work before 8am).
  • Too noisy in the office? I move my work to a quieter place.
  • I don’t work at home. Home is for resting.
This allows me, imagine that, to spend most of my day actually working on my core tasks, and to be less tired.

Filed in: uncategorized

by: tiago

No Comments

Lawrence Lessig

An impressive presentation about creativity, law and the producer/consumer paradigm applied to culture.

Hopefully I will restart blogging soon. Sometimes inertia creeps in.

Filed in: life

by: tiago

2 Comments

Automated GUIs for OO models and DSLs

One of the most delightful things in bioinformatics is the possibility of working with people with really different mindsets. Surely CS geeks are amazing, and every day I feel that my original background is really a comparative advantage, but from where I stand, nothing beats being in an environment with scientific and cultural diversity. But let’s talk some geekiness now:

A couple of years ago, I wrote a population genetics simulator in Caml. It was really flexible, allowing for many demographic and genomic scenarios, mating rules, selection… really flexible. I never tried to publish it because there are many good simulators around (I suggest simuPOP, if you are looking for one), and it would take some time to make it robust and documented for public exposure. But the interesting part is that when I went to my MSc supervisor (an “old-type” biologist) and gave a very exuberant explanation of how flexible the simulator was, he made only one comment: That is all very well and good, but you did not show me the easy-to-use graphical interface!

Fast forward a couple of years… Regarding a DSL that I am developing to model drug resistance in the context of infectious diseases, I went to my PhD advisor (a population geneticist, malariologist and biostatistician who knows how to program in C), showed him my rough prototype and he said: People will be able to read this, but to interact with it they will want an easy-to-use graphical user interface. To be honest, this time I was expecting the comment (I have been living among experimentalists long enough to have learned something). I do not expect domain specialists to write programs in my DSL (well, maybe a couple of them will, if things pick up). If I end up giving my system away to domain specialists, it will have to have an easy-to-use interface; there is no escaping that.

Well, DSLs (at least in Scala and in Ruby) have an underlying OO model, which most of the time is neither complex nor big. I am starting to suspect that it won’t be too difficult to automatically generate an easy-to-use interface to input, in a “nice” way, what could be rendered as DSL programs (or object instances and relationships, if you prefer to look at it that way). For embedded DSLs, which have the whole expressive power of the host language available, that would be infeasible to do completely. But at least part of it could be automated. Obviously this idea is not new at all; it is just a rehash of what Lift or Rails do for databases.
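As a rough illustration of the “at least part of it could be automated” idea, the sketch below derives input fields from a plain Scala case class through the `Product` interface that every case class implements (`productElementNames` requires Scala 2.13 or later). `CompoundRecord` and `FormGenerator` are hypothetical names, and the text output merely stands in for a real GUI toolkit.

```scala
// Hypothetical model class; a DSL's OO model is often little more than a few of these.
final case class CompoundRecord(name: String, abbreviation: String, halfLifeHours: Double, bioAvailability: Double)

object FormGenerator {
  // One (label, current value) pair per constructor field, read generically
  // through the Product interface, so no per-class form code is needed.
  def fields(model: Product): List[(String, String)] =
    model.productElementNames.zip(model.productIterator.map(_.toString)).toList

  // A plain-text stand-in for a GUI: the class name, then one labelled input box per field.
  def render(model: Product): String =
    (model.productPrefix :: fields(model).map { case (label, value) => s"  $label: [$value]" })
      .mkString("\n")
}
```

`FormGenerator.render(CompoundRecord("Sulfadoxine", "S", 116, 408))` lists one labelled box per field. Anything beyond flat fields (nested objects, collections, arbitrary host-language code in an embedded DSL) would still need hand-written widgets.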

I am aware that graphical programming languages never got very far (I actually dislike them), but the scope and context here are completely different, and different premises apply. This might be one way of lowering the barrier to rigorous modeling for a wider crowd.

Filed in: Caml, Ruby, Scala, bioinformatics, declarative programming, science, software engineering

by: tiago

No Comments

Biopython’s population genetics module

I would like to make a preemptive defensive comment on the new population genetics module. ;)

I am, for now, the sole author of the code that is there (although in future versions there will be code from at least one other person; by the way, if YOU want to participate, you’re 100% welcome). Although the code is mine, there was a lot of help from Peter Cock, one of Biopython’s core developers. Without him, this initial groundwork would not have been possible.

Now for the preemptive defense:

If you look at the module, it has very little functionality included. This is a very deliberate strategy to start small and grow slowly. I am hoping for some feedback (which will be very little, I am sure). I want to grow in small steps, incorporating as much feedback as possible. Test code and documentation have to exist before anything is released to the public.

In the pipeline there is code for coalescent simulation, statistics (including code supplied by Ralph Haygood, which I am merging with my own) and HapMap. If you are interested in early access to any of this code, please give me a shout, as most of it already exists. Alpha testers are more than welcome ;)

Filed in: Python, bioinformatics, biopython

by: tiago

No Comments

Modeling drugs in Scala

I am currently trying to model antimalarial drug behavior in order to understand the spread of drug-resistant malaria. Generally speaking, malaria strains are more or less tolerant to a drug depending on the quantity of drug that is necessary to kill an infection. In theory, a totally resistant infection will survive any treatment, while a totally susceptible one will require only small levels of drug to be cleared.
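A deliberately crude reading of this paragraph, offered as a sketch rather than as the model actually used here: summarise each strain by the compound concentration needed to clear it, with a totally resistant strain needing an infinite one. `Strain`, `clearingConcentration`, `Strains` and the threshold values are hypothetical and arbitrary, and the sketch ignores how long the concentration stays above the threshold.

```scala
// Hypothetical: a strain is summarised by the concentration needed to clear it.
final case class Strain(label: String, clearingConcentration: Double) {
  def clearedBy(concentration: Double): Boolean = concentration >= clearingConcentration
}

object Strains {
  // Arbitrary illustrative thresholds, in whatever concentration unit the model uses.
  val totallySusceptible = Strain("susceptible", 1.0)
  val partiallyTolerant = Strain("tolerant", 500.0)
  // No finite concentration reaches an infinite threshold: the strain survives any treatment.
  val totallyResistant = Strain("resistant", Double.PositiveInfinity)
}
```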

I see the word drug used in two different ways (to readers of this blog who are specialists in some form on drug issues, particularly pharmacokinetics: if you see anything particularly wrong, please do let me know). For instance, SP (Fansidar) is a drug composed of two drugs (Sulfadoxine and Pyrimethamine). I will use drug for SP and compound for S and P (as the term active compound seems to be used).

Antimalarial drugs work mainly in the blood stream against asexual parasite forms.

In the blood, compounds have a certain concentration. Over time, the body gets rid of compounds (thus the concentration of a compound goes down with time). The concentration of a compound is normally (but not always) modeled using an exponential decay function, with the half-life as the fundamental parameter, i.e., the time it takes for the concentration of a compound to drop to half.
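In symbols, the exponential decay model is C(t) = C0 * (1/2)^(t / T), where T is the half-life; equivalently C(t) = C0 * exp(-k t), with elimination rate k = ln 2 / T. A minimal Scala sketch of the same formula (the `Decay` object and its method names are hypothetical):

```scala
object Decay {
  // C(t) = C0 * (1/2)^(t / halfLife): the concentration halves every halfLife hours.
  def concentration(initial: Double, halfLifeHours: Double, hours: Double): Double =
    initial * math.pow(0.5, hours / halfLifeHours)

  // The same curve written as C0 * exp(-k * t), with elimination rate k = ln 2 / halfLife.
  def eliminationRate(halfLifeHours: Double): Double =
    math.log(2.0) / halfLifeHours
}
```

With the 116-hour half-life given for Sulfadoxine in the DSL program later in this post, `Decay.concentration(100.0, 116.0, 232.0)` returns 25.0: two half-lives, two halvings.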

Two other important concepts for drugs that are not taken intravenously (like cheap antimalarials, which are oral) are:

  1. Bioavailability, i.e. the fraction of the compound that actually reaches the circulation. It seems that one of the problems with counterfeit drugs is low bioavailability. Bioavailability is normally discussed in terms of AUC (Area Under the Curve, the curve being the plot of drug concentration against time). I will model it in terms of maximum concentration, half-life and the time it takes to reach maximum concentration in the blood, which, by the way, is the next concept…
  2. The time it takes to reach maximum concentration in the blood, i.e. the time from ingestion to circulation in the blood at maximum concentration. I suppose this time frame has a technical name, but I don’t know it (if you know, drop me an email or a comment, please).
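In pharmacokinetics the maximum concentration and the time to reach it are usually written Cmax and Tmax. The sketch below describes a concentration curve with exactly the three parameters mentioned above (maximum concentration, time to reach it, half-life) and derives its AUC. The linear rise to the peak is a simplification chosen only for illustration; the usual textbook choice is a one-compartment model with first-order absorption (the Bateman function). `Profile` and its members are hypothetical names.

```scala
// Hypothetical concentration curve described by peak, time to peak and half-life.
final case class Profile(cMax: Double, tMaxHours: Double, halfLifeHours: Double) {
  // Linear rise from 0 to cMax over tMaxHours (a simplification), then exponential decay.
  def concentrationAt(hours: Double): Double =
    if (hours <= 0.0) 0.0
    else if (hours < tMaxHours) cMax * hours / tMaxHours
    else cMax * math.pow(0.5, (hours - tMaxHours) / halfLifeHours)

  // Area under the curve from 0 to infinity: the triangle under the rising part
  // plus the integral of the exponential tail, cMax * halfLife / ln 2.
  def auc: Double =
    cMax * tMaxHours / 2.0 + cMax * halfLifeHours / math.log(2.0)
}
```

In this toy model, with Tmax and half-life fixed, the AUC is proportional to Cmax, so a lower peak (for example from a poorly bioavailable counterfeit) lowers the AUC by the same factor.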

Now, back to computational modeling:

A big objective is declarative programming. Preferably a program that can be read by domain specialists (biologists, MDs, biostatisticians, …), with that in mind…

Currently, a computer program in Scala to model drugs looks like this:

Compound create "Sulfadoxine"
Compound abbreviation "S"
Compound half_life 116 //hours
Compound bio_availability 408 //1mg to nanoM
val Sulfadoxine = Compound prepare
 
Compound create "Pyrimethamine"
Compound abbreviation "P"
Compound half_life 83 //hours
Compound bio_availability 34 //1mg to nanoM
val Pyrimethamine = Compound prepare
 
Drug create "SP"
Drug includes Sulfadoxine quantity 500
Drug includes Pyrimethamine quantity 25
val SP = Drug prepare

Discussion:

  • I am using the “object companion” pattern a lot. The idea is that all the “stateful” mess is kept in the object (which is what the DSL statements talk to) until it is “prepared”. When the prepare method is invoked on the object, an instance of a class (with only immutable vals, very lovely for those of you who are functional programming enthusiasts) is created.
  • Notice the dependence on operator precedence in Drug includes … quantity (strictly speaking there is no real dependence here, but assume there is). I would really like the ability to define operator precedence per class (à la Prolog), rather than having it fixed by the operator’s first character, as Scala does.
  • I don’t like the val SP = Drug prepare. It is too verbose and too geeky. I would prefer just Drug prepare. I believe this is possible in Scala, at least at the interpreter level (as the Scala interpreter does it), but I still don’t know how. The idea would be that a val named SP would be added to the local scope in some way. To those computer-inclined readers who think that I am being too pedantic and nit-picky, I have just one thing to say: I am really trying to make the system as pleasant as possible for non-programmer types, and I think my proposal does not sacrifice elegance and generality (although I recognize that the non-explicit name creation is “strange”; but, hey, the Scala interpreter already does it!)
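The bullets above describe the mechanics in words. Here is a minimal, runnable sketch of one way singleton objects could back the program shown earlier as written: mutable staging objects whose `prepare` returns an immutable case class. Apart from the DSL methods the program itself uses (`create`, `abbreviation`, `half_life`, `bio_availability`, `includes`, `quantity`, `prepare`), every name here (`CompoundSpec`, `DrugSpec`, `Inclusion`, `peakNanoMolar`, `SPExample`) is hypothetical, and reading `bio_availability` as a mg-to-nM factor is a guess based on the `//1mg to nanoM` comments, not something the post states.

```scala
import scala.collection.mutable.ListBuffer
import scala.language.postfixOps

// Immutable results produced by `prepare` (hypothetical names).
final case class CompoundSpec(name: String, abbreviation: String, halfLifeHours: Double, bioAvailability: Double)

final case class DrugSpec(name: String, components: List[(CompoundSpec, Double)]) {
  // Assumption: bioAvailability converts 1 mg of dose into nM at peak, as the
  // "1mg to nanoM" comments suggest, so peak = milligrams * bioAvailability.
  def peakNanoMolar: List[(String, Double)] =
    components.map { case (c, mg) => c.abbreviation -> mg * c.bioAvailability }
}

// Mutable staging area that the DSL statements talk to (not thread-safe).
object Compound {
  private var name = ""
  private var abbrev = ""
  private var halfLife = 0.0
  private var bioAv = 0.0

  def create(n: String): Unit = { name = n; abbrev = ""; halfLife = 0.0; bioAv = 0.0 }
  def abbreviation(a: String): Unit = abbrev = a
  def half_life(hours: Double): Unit = halfLife = hours
  def bio_availability(factor: Double): Unit = bioAv = factor
  def prepare: CompoundSpec = CompoundSpec(name, abbrev, halfLife, bioAv)
}

object Drug {
  private var name = ""
  private val parts = ListBuffer.empty[(CompoundSpec, Double)]

  // `Drug includes X quantity 500` is read left to right as
  // `Drug.includes(X).quantity(500)`: alphanumeric operators share one precedence.
  final class Inclusion(compound: CompoundSpec) {
    def quantity(mg: Double): Unit = parts += ((compound, mg))
  }

  def create(n: String): Unit = { name = n; parts.clear() }
  def includes(c: CompoundSpec): Inclusion = new Inclusion(c)
  def prepare: DrugSpec = DrugSpec(name, parts.toList)
}

// The program from the post, essentially unchanged, wrapped in an object.
// Postfix `prepare` needs the postfixOps import, and the blank line after
// each use stops the parser from reading the next line as its argument.
object SPExample {
  Compound create "Sulfadoxine"
  Compound abbreviation "S"
  Compound half_life 116 // hours
  Compound bio_availability 408 // 1mg to nanoM
  val Sulfadoxine = Compound prepare

  Compound create "Pyrimethamine"
  Compound abbreviation "P"
  Compound half_life 83 // hours
  Compound bio_availability 34 // 1mg to nanoM
  val Pyrimethamine = Compound prepare

  Drug create "SP"
  Drug includes Sulfadoxine quantity 500
  Drug includes Pyrimethamine quantity 25
  val SP = Drug prepare
}
```

With this sketch, `SPExample.SP.peakNanoMolar` pairs each abbreviation with its dose multiplied by its factor. It does not address the third bullet: the `val` names are still written by hand.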

Caveat emptor, big one: Although drugs (compounds) are discussed in terms of half-lives, bioavailability, etc., these properties actually belong not to the drug but to the interaction between the drug and the individual. Making them drug properties only is a “cognitive abuse”, although it has its uses. For instance, my advisor, after looking at the language, was talking about the bioavailability of counterfeit drugs for children between 2 and 5 years old: a great example that they are not properties only of the drugs but also, at least, of individuals (and not only that; for instance, many drugs are more bioavailable if they are taken in conjunction with, say, fatty foods).

A proper, precise computational model of drugs would be a gigantic undertaking. I have a different approach: model as close as possible to average domain discourse, and hook in, in some way, the necessary precision, should the need arise. It is worth noting that “incorrect”, “imprecise” modeling is enough for many tasks.
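One way to read “hook in the necessary precision, should the need arise” in code, sketched under my own assumptions: keep drug-level defaults as the everyday vocabulary, and let more specific values for a given host context (say, an age group) override them once they are known. `CompoundParams`, `HostContext` and `ParameterTable` are hypothetical names, not part of the DSL shown above.

```scala
// Drug-level defaults: the "cognitive abuse" that is good enough most of the time.
final case class CompoundParams(name: String, halfLifeHours: Double, bioAvailability: Double)

// Hypothetical description of who takes the drug and how (age group, food, ...).
final case class HostContext(description: String)

final class ParameterTable(refinements: Map[(String, HostContext), CompoundParams]) {
  // Return host-specific parameters when they have been supplied,
  // otherwise fall back to the compound's drug-level defaults.
  def resolve(compound: CompoundParams, host: HostContext): CompoundParams =
    refinements.getOrElse((compound.name, host), compound)

  // Refinements are added incrementally, returning a new table.
  def refine(host: HostContext, params: CompoundParams): ParameterTable =
    new ParameterTable(refinements.updated((params.name, host), params))
}

object ParameterTable {
  val empty: ParameterTable = new ParameterTable(Map.empty)
}
```

Code that only knows about drugs keeps working with the defaults, while a model that cares about, for example, children between 2 and 5 years old can supply a refinement for that context without changing anything else.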

Filed in: Scala, bioinformatics, declarative programming, malaria, metaprogramming

by: tiago

2 Comments