DSLs: specification and behavior

One of the interesting things about a DSL is how naturally it separates an abstract (domain-level) specification from the possible uses of that specification. Let's make this a bit more concrete with an example (taken from my malaria domain).

As is becoming a pattern in my recent posts, I start with a smallish explanation of the biological and pharmacological background and then go deep into the technical DSL/Groovy design and implementation part.

Antimalarial drugs have effects on parasites (the desired effect being the killing of lots of parasites). Roughly speaking, a malaria infection can be seen as a progression in time of parasite loads: parasites multiply (grow), and this growth is balanced by both the natural response of the human immune system and the effect of the drugs taken (which goes by the name of pharmacokinetics, PK). Malaria parasite loads in humans can go up to 10^12 (10 to the power of 12, no typo).

PK is modeled by a function (I won’t go into details here) which is parametrized by drug concentration and parasite response (resistant parasites tolerate drugs better). As an example for Chloroquine in Groovy:

formula: {3.8 / (1 + (1/K) / CQ)}

This (for now) magic formula, represented as a closure, has 2 parameters: 1/K, which is 68 micrograms/liter for non-resistant parasites, and CQ, the concentration of drug in the blood.

This is the specification of the problem. Now, what do we do with this formula? The obvious response is to use it to do calculations (i.e. given a certain drug concentration, what is the value of the PK function?). But in reality we might want to do many other things with it, like generating documentation (say, by creating a Word or LaTeX document) or converting the formula into a faster language (e.g. Fortran) for simulation purposes. I actually do both things.

So, one thing is the formula as a specification. Another thing is what you do with it. And we can do truckloads of different things with this specification.

Let's see how we could do some of the different tasks described above:

Calculating the value of the function

Let's imagine that we want to print the values of the function for concentrations from 1 to 1800 (1800 ng/mL being a reported maximum concentration of Chloroquine in the blood). The solution could be:

//formula is a closure with the formula
formula.K = 1/68.0 //We set the fixed 1/K parameter
(1..1800).each { concentration ->
    formula.CQ = concentration  //Varying CQ concentration
    println formula() //Execute closure
}
//In the example above

So, in this approach we take the closure, set the parameters (setting closure properties in Groovy is very simple as the example above shows), and execute the closure repeatedly.

I actually think that this example is of the worst kind possible, because it blends specification with execution. That is, we specify our effects formula without any behavior and then we take the specification and execute it directly. So we are tying specification and behavior together. Pedagogical and philosophical considerations aside, this works OK, is easy to code and is efficient.

Generating Fortran code

The formula above is also used to generate Fortran code with the formula representation, which is plugged into a malaria epidemiology simulator. In that case executing the closure with arithmetic semantics is useless, so another strategy has to be used.

The current solution gets the code AST representation through the meta class. Before I present the solution, I will show the full representation of the (slightly altered) formula and effect:

cqEffect = effect(
    name:       "General Chloroquine effect",
    formula:    {3.8 / (1 + km1/cq) },
    parameters: [km1: 68.0] //Hoshen98 microg/l
)
//effect creates an Effect object

(So km1 is a fixed parameter for the effect and cq – drug concentration – is variable).
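To make the split between fixed parameters and the variable concentration concrete, here is a minimal sketch of how such a formula could be evaluated without touching the specification itself. The evaluate helper is hypothetical (it is not the Effect class from this post); it assumes that cloning the closure and resolving km1 and cq against a map through the closure delegate is acceptable.

```groovy
// Hypothetical helper, not the Effect class from the post: evaluates a formula
// closure by resolving its free variables (km1, cq) against a map of values.
def evaluate(Closure formula, Map values) {
    def bound = formula.clone()                     // leave the specification untouched
    bound.delegate = values                         // free variables are looked up here
    bound.resolveStrategy = Closure.DELEGATE_ONLY
    bound()
}

def formula    = { 3.8 / (1 + km1 / cq) }
def parameters = [km1: 68.0]                        // fixed parameter of the effect

// cq varies; 68 and 1800 are concentrations mentioned in the text
[68.0, 1800.0].each { concentration ->
    def value = evaluate(formula, parameters + [cq: concentration])
    println "cq=${concentration} -> ${value}"
}
```

The formula closure is never modified, so the same specification can still be handed to a code generator afterwards.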

The Effect object has a property called code, which holds the Abstract Syntax Tree (AST) for the formula. The AST is accessed in the Effect constructor in this way:

this.code = formula.getMetaClass().getClassNode().getMethods("doCall")[0].code

Short story: it gets the meta class for the closure, gets the closure class AST, and then gets the AST for the code of the method doCall, which holds the formula code for the closure. Whew, big, long train.

Caveat: Because Groovy is compiled, and for memory and performance reasons, getClassNode sometimes returns null :( . If that happens to you, google for “getClassNode groovy”, as that issue is out of the scope of this post (I could get around it in my cases, up to now).

So, now we have to traverse the AST. In the most general case, this would mean creating a full interpreter for the Groovy AST, a breathtaking task (but a good way to learn all about Groovy ;) ). In our malaria case we will only process arithmetic expressions (and if constructs, but I will not discuss those here for brevity), so we expect the users of our DSL to be careful and pass just an arithmetic expression. As such, the formula is a block of statements which happens to have only a single statement composed of an arithmetic formula:

def expression = it.code.getStatements()[0].getExpression()
println expression

The first line traverses the AST to get the formula. It only works because the closure code is of the form defined above (a single arithmetic formula). The println results in:

org.codehaus.groovy.ast.expr.BinaryExpression@186d484[
  ConstantExpression[3.8]
  ("/" at 22:22:  "/")
  org.codehaus.groovy.ast.expr.BinaryExpression@ea48be[
    ConstantExpression[1]
    ("+" at 22:27:  "+" )
    org.codehaus.groovy.ast.expr.BinaryExpression@14dd758[
      org.codehaus.groovy.ast.expr.VariableExpression@174d93a[variable: km1]
      ("/" at 22:32:  "/" )
      org.codehaus.groovy.ast.expr.VariableExpression@61a907[variable: cq]]]]

Although it looks dreadful at first, a second inspection reveals that we have what we need.

A vanilla expression processor for the AST above could be:

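import org.codehaus.groovy.ast.expr.BinaryExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.VariableExpression

// Recursively turns the arithmetic AST into a fully parenthesized string
def drillExpression
drillExpression = { expr ->
    switch (expr) {
        case BinaryExpression:
            return "(" + drillExpression(expr.leftExpression) + ")" +
                     expr.operation.text +
                     "(" + drillExpression(expr.rightExpression) + ")"
        case ConstantExpression:
        case VariableExpression:
            return expr.text
        default:
            return ""  // any other node type is silently ignored
    }
}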

This would return the string: “(3.8)/((1)+((km1)/(cq)))”

From here I think it is quite easy to see how one could take an expression and convert it to LaTeX or Fortran code (the remaining work is really just LaTeX/Fortran syntax).
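As an illustration of that remaining work, here is a minimal sketch of a LaTeX emitter for the same kind of expression. It makes two assumptions that are not part of the approach described above: the formula is available as source text, so the AST is obtained with Groovy's AstBuilder rather than through the meta class, and the function names toLatex and operand are made up for this example.

```groovy
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.ast.expr.BinaryExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.VariableExpression
import org.codehaus.groovy.control.CompilePhase

// Hypothetical emitter: divisions become \frac{..}{..}, other operators stay infix
String toLatex(expr) {
    if (expr instanceof BinaryExpression) {
        def op = expr.operation.text
        if (op == '/') {
            return "\\frac{${toLatex(expr.leftExpression)}}{${toLatex(expr.rightExpression)}}"
        }
        return "${operand(expr.leftExpression)} ${op} ${operand(expr.rightExpression)}"
    }
    if (expr instanceof ConstantExpression || expr instanceof VariableExpression) {
        return expr.text
    }
    throw new IllegalArgumentException("Unsupported AST node: ${expr.class.simpleName}")
}

// Wraps nested infix operands in parentheses so that precedence survives
String operand(expr) {
    def text = toLatex(expr)
    (expr instanceof BinaryExpression && expr.operation.text != '/') ?
        "\\left(${text}\\right)" : text
}

// Assumption: the formula is given as a string and parsed with AstBuilder
def nodes = new AstBuilder().buildFromString(
        CompilePhase.CONVERSION, true, '3.8 / (1 + km1/cq)')
def expression = nodes[0].statements[0].expression
def latex = toLatex(expression)
println latex
```

A Fortran emitter would follow the same shape, with only the strings produced for each node type changing.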

There are 2 drawbacks to this approach: traversing the AST requires work, and supporting all AST node types would be a daunting task. At least in my malaria case the amount of work required is very manageable.

A completely different strategy would be to Monkey Patch numbers (i.e. massively alter the definition of the classes) and variables in a radical way: not to produce arithmetic results but to, say, generate LaTeX sources. That is probably possible, but it would be one of the worst examples of monkey patching that I could think of. Monkey business indeed!

There is also the Groovy Code Visitor pattern, which I did not explore… It would probably be a variation of the AST traversal strategy presented here.
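For the curious, this is roughly what the visitor variation could look like. It is only a sketch under assumptions: it extends CodeVisitorSupport from the Groovy AST package, it parses the formula from a string with AstBuilder instead of going through the meta class, and the VariableCollector class is a made-up example that simply lists the variables a formula depends on.

```groovy
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.ast.expr.VariableExpression
import org.codehaus.groovy.control.CompilePhase

// Hypothetical visitor: collects the variable names used in a formula.
// CodeVisitorSupport already walks into binary expressions, so only the
// node type of interest needs to be overridden.
class VariableCollector extends CodeVisitorSupport {
    final Set<String> names = new LinkedHashSet<>()

    @Override
    void visitVariableExpression(VariableExpression expression) {
        names << expression.name
    }
}

// Assumption: the formula is given as a string and parsed with AstBuilder
def nodes = new AstBuilder().buildFromString(
        CompilePhase.CONVERSION, true, '3.8 / (1 + km1/cq)')
def collector = new VariableCollector()
nodes[0].visit(collector)
println collector.names
```

The traversal itself comes for free with the visitor; the hand-written switch is replaced by one overridden method per node type of interest.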


Filed in: bioinformatics, declarative programming, groovy, malaria

by: tiago


Chloroquine malaria treatment and Groovy (DSL tactics in Groovy 2)

Chloroquine was, for many years, the workhorse against P. falciparum malaria. Around the fifties (give or take a decade) resistance appeared in Cambodia and spread around the globe (if my memory serves me right there are at most 4 independent sources of malaria Chloroquine (CQ) resistance, the Cambodian one being the first to appear). Currently CQ clinical efficacy is deemed too low and CQ use is frowned upon. CQ is extremely cheap, and therefore economically sustainable in Africa. The more recent Artemisinin (ART) based drugs (ART is a short-lived drug commonly used in combination with other, longer-lived drugs) are too expensive for most countries where malaria is a public health threat (thus requiring subsidies from external sources).

CQ is still used as a first-line drug at least in Guinea-Bissau (search Google Scholar for “kofoed bissau chloroquine”), even in the presence of resistance. A change of drug regimen (i.e. how the drug is used) seems to make its clinical efficacy go up without increasing the spread of resistance. This is interesting from both a theoretical and a practical point of view (being able to reuse CQ would be great given its price and wide availability). This is roughly the scope of my current theoretical study.

I am developing a Groovy model to specify CQ resistance. The fundamental concepts are:

On the drug side there are Compounds (e.g., Chloroquine) and Drugs. A drug is composed of one or more compounds: for instance, the widely used SP is composed of Sulfadoxine and Pyrimethamine, while Chloroquine (as a drug) is composed of… Chloroquine, a single-compound drug.

On the parasite side there are enzyme (protein) mutations. A mutation might help the parasite in tolerating a certain drug.

So here is my current piece of Groovy code to model CQ resistance:

cq = compound(name: "Chloroquine", abbreviation: "cq", halfLife: 45.d)
 
CQ = drug(name: "Chloroquine", abbreviation: "CQ")
CQ.includes cpd: cq, qty: 300.mg, bioavail: 1.2
 
regimen = regimen()
regimen.take drug: CQ, qty: 2, at: 0.h
regimen.take drug: CQ, qty: 1, at: 6.h
regimen.take drug: CQ, qty: 1, at: 1.d
regimen.take drug: CQ, qty: 1, at: 2.d
 
CRT = protein("CRT")
CRT.mutatingAmino 76, Lys, Thr
 
cqEffect = effect(
    name:       "General",
    formula:    {3.8 / (1 + km1/cq) },
    parameters: [km1: 68.0]
)
 
cqResistance = resistance(
    effect:     cqEffect,
    mutations:  [CRT.mutation(76)],
    parameters: [km1: 204.0]
)

Chloroquine has a terminal half life (roughly the time that the body takes to eliminate half of the drug concentration) of 45 days (line 1). Actually, it is quite difficult to estimate half lives (and they vary from case to case). The half life of CQ is estimated to be between 1 and 2 months (extremely long).

A typical CQ pill has 300 mg of the substance (line 4).

A possible CQ regimen for an adult is 2 pills at the start, 1 pill 6 hours later, and 1 pill on each of the following 2 days (lines 6-10).

Resistance is related, among many other things, to codon 76 of CRT, the Chloroquine resistance transporter (lines 12-13).

Looking at the code up to line 13, I would say that it is a pretty readable and elegant representation of the problem. From line 13 onwards I think the same holds, but for now I will not discuss pharmacokinetics (I also refrained from explaining the simplistic bioavailability parameter on line 4).
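For readers wondering how syntax such as 6.h or regimen.take drug: …, at: … can be supported at all, here is a minimal sketch. Everything in it is an assumption made for illustration: the Span and Regimen classes are hypothetical and not the real model, and the unit suffixes are added by putting getters on the Integer meta class, which is only one of several ways to do this in Groovy.

```groovy
// Hypothetical value type: a point in time, stored in hours
class Span {
    final int hours
    Span(int hours) { this.hours = hours }
}

// Hypothetical regimen: just records the doses it is told about
class Regimen {
    final List<Map> doses = []

    // Named arguments arrive as a single Map, so the call site reads like a sentence
    def take(Map dose) {
        doses << dose
        this
    }
}

// Add read-only "unit" properties to integers: 6.h, 1.d
Integer.metaClass.getH = { -> new Span(delegate) }
Integer.metaClass.getD = { -> new Span(delegate * 24) }

def regimen = new Regimen()
regimen.take drug: 'CQ', qty: 2, at: 0.h
regimen.take drug: 'CQ', qty: 1, at: 6.h
regimen.take drug: 'CQ', qty: 1, at: 1.d

regimen.doses.each { println "${it.qty} x ${it.drug} at hour ${it.at.hours}" }
```

Changing the Integer meta class affects the whole running program, so in a larger system a category or an explicit wrapper type might be a safer choice.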

In the next posts I will concentrate on line 17, a formula for the pharmacokinetics (PK is mainly the killing effect of the drug on the parasite) of CQ. Sometimes I will be more of a computer geek and concentrate on the Groovy side of things, sometimes I will discuss more the underlying biology and pharmacology.

By the way, and going in the geek direction, why do optional parentheses become mandatory inside a list? I.e., I can do

DHFR.mutation 108

But I need parentheses here:

[DHFR.mutation(108)]

The same seems to happen when calling functions scoped inside a script (in the DSL example above, line 1 requires parentheses).
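A small sketch of the cases involved, using a made-up Protein class rather than the real model. As far as I understand the grammar, the parenthesis-free form (a "command expression") is recognised when the call stands as a statement of its own, while inside a list literal an ordinary expression is expected. A call without arguments always needs its parentheses, since the bare name would otherwise be read as a property or variable.

```groovy
// Hypothetical stand-in for the real protein model
class Protein {
    String name
    List<Integer> requested = []

    String mutation(int codon) {
        requested << codon
        "${name}:${codon}"
    }

    String wildType() { "${name}:wt" }
}

def DHFR = new Protein(name: 'DHFR')

DHFR.mutation 108                     // statement level: parentheses are optional
def listed = [DHFR.mutation(108)]     // inside a list literal: parentheses needed
def plain  = DHFR.wildType()          // no arguments: parentheses always needed

println listed
println plain
```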

By the way, that DHFR thingy above? DHFR is an enzyme involved in malarial resistance to SP, the other widely deployed cheap drug. SP acts in a less obvious way, and that will require changes to the DSL (to have relationships among effects), but that is further down the road.

Appendix:

One interesting Scala syntactic goodie that Groovy could plagiarize is this:

import org.jfree.chart.plot.{PlotOrientation, XYPlot}

From the snippet above you might infer that charts will be appearing in future posts ;)


Filed in: bioinformatics, declarative programming, groovy, malaria

by: tiago


Holy Grail: The quest for THE programming language

Being a computer scientist with a strong interest in languages (languages in the broadest sense possible: programming, natural and cognition-related issues), I am on a holy grail quest for a programming language that:

First and foremost allows me to express my computations in a way that is close to the problem domain (as opposed to close to the machine). As I am working in a biology setting, that means being able to talk about concepts around genes, epidemics and pharmacology in my programs. I don’t want to think about CPUs, memories and things like that when I am coding. Prolog and Lisp are good examples here. I also need programs that can evolve over time as knowledge changes, so I need strong metaprogramming and Domain Specific Language facilities.

Unfortunately I have a couple more requirements coming from the day to day reality…

Real world: I want a language that interacts with existing libraries and that I can easily make available to other people to use, inspect and change. I need Bio* libraries and graphics plotting libraries. In my personal case I decided that I want to work inside the JVM, so I need a language that works in the Java world (Jython, JRuby, Scala, Groovy, … Java).

Software engineering: Programs have to be easy to maintain and debug. I guess there is no way around explicit typing on the debug and tool construction front.

Ridiculous religious fanatic quest? Yes, it might be, but I am pursuing it.

The truth is that we are not far away from this grail.

Scala is almost there. It lacks metaprogramming, and things like type inference are a bit amateurish (compare it with CAML).

JRuby is maybe there; I could live with it, I guess. The lack of explicit typing will make things difficult in the long run on the software engineering front.

I decided to give a final try to yet another language: Groovy, and up to now it is going quite well. It seems to nail all the fundamental points. I especially love the effort put into good metaprogramming facilities.

I decided, for pragmatic reasons, that after this one I will stop my pursuit of the grail. If Groovy proves a blunder of some sort I will revert to JRuby and carry on.


Filed in: bioinformatics, declarative programming, groovy, metaprogramming, science, software engineering

by: tiago
