One of the interesting applications of a DSL lies in its inherent ability to separate an abstract (domain-level) specification from its possible applications. Let's make this a bit more concrete with an example (taken from my malaria domain).
As is becoming a pattern in my recent posts, I start with a smallish explanation of the biological and pharmacological background and then go deep into the technical DSL/Groovy design and implementation part.
Antimalarial drugs have effects on parasites (the desired effect being the killing of lots of parasites). Roughly speaking, a malaria infection can be seen as a progression in time of parasite loads: parasites multiply (grow), and this growth is balanced by both the natural response of the human immune system and the effect of the drugs taken (which I label PK here, after pharmacokinetics, although the effect of a drug on the parasite is, strictly speaking, pharmacodynamics). Malaria parasite loads in humans can go up to 10^12 (10 to the power of 12, no typo).
PK is modeled by a function (I won’t go into details here) which is parametrized by drug concentration and parasite response (resistant parasites tolerate drugs better). As an example for Chloroquine in Groovy:
formula: {3.8 / (1 + 1/K + CQ)}
This (for now) magic formula, represented as a closure, has two parameters: 1/K, which is 68 micrograms/liter for non-resistant parasites, and CQ, the concentration of drug in the blood.
This is the specification of the problem. Now, what do we do with this formula? The obvious response is to use it to do calculations (i.e. given a certain drug concentration, what is the value of the PK function?). But in reality we might want to do many other things with it, like generating documentation (say, by creating a Word or LaTeX document) or converting the formula into a faster language (e.g. Fortran) for simulation purposes. I actually do both things.
So, one thing is the formula as a specification. Another thing is what you do with it. And we can do truckloads of different things with this specification.
Let's see how we could do some of the different tasks described above:
Calculating the value of the function
Let's imagine that we want to print the values of the function for concentrations from 1 to 1800 (1800 ng/mL being a reported maximum concentration of Chloroquine in the blood). The solution could be:
//formula is a closure with the formula
formula.K = 1/68.0 //We set the fixed 1/K parameter
(1..1800).each { concentration ->
    formula.CQ = concentration //Varying CQ concentration
    println formula() //Execute closure
}
So, in this approach we take the closure, set the parameters (setting closure properties in Groovy is very simple as the example above shows), and execute the closure repeatedly.
I actually think that this example is of the worst kind possible, because it blends specification with execution. That is, we specify our effects formula without any behavior, and then we take that very specification and execute it. So we are tying specification and behavior together. Pedagogical and philosophical considerations aside, this works OK, is easy to code and is efficient.
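One possible way to soften that coupling, sketched here under assumptions: evaluate a clone of the closure against a map of values, so the specification itself never carries any state. The `evaluate` helper is hypothetical, and the formula used is the slightly altered `km1`/`cq` form that appears later in this post.

```groovy
// Specification only: the (slightly altered) Chloroquine formula used later in this post.
def formula = { 3.8 / (1 + km1 / cq) }

// Hypothetical helper: evaluates a formula against a map of values.
// It works on a clone, so the original closure is left untouched.
def evaluate = { Closure spec, Map values ->
    def runnable = spec.clone()
    runnable.delegate = values                          // names in the formula resolve here
    runnable.resolveStrategy = Closure.DELEGATE_FIRST   // look in the map before the script
    runnable()
}

// Print the effect for a few illustrative concentrations.
[1, 68, 1800].each { concentration ->
    println evaluate(formula, [km1: 68.0, cq: concentration])
}
```

The same `formula` object can then still be handed, unchanged, to whatever generates documentation or Fortran code.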
Generating Fortran code
The formula above is also used to generate Fortran code with the formula representation, which is plugged into a malaria epidemiology simulator. In that case executing the closure with arithmetic semantics is useless, so another strategy has to be used.
The current solution gets the code AST representation through the meta class. Before I present the solution, I will show the full representation of the (slightly altered) formula and effect:
cqEffect = effect(
    name: "General Chloroquine effect",
    formula: { 3.8 / (1 + km1/cq) },
    parameters: [km1: 68.0] //Hoshen98 microg/l
)
//effect creates an Effect object
(So km1 is a fixed parameter for the effect and cq - drug concentration - is variable).
The Effect object has a property called code, which holds the Abstract Syntax Tree (AST) for the formula. The AST is accessed in the Effect constructor in this way:
this.code = formula.getMetaClass().getClassNode().getMethods("doCall")[0].code
Short story: it gets the meta class for the closure, gets the closure class AST, and then gets the AST for the code of the method doCall, which holds the formula code for the closure. Whew, a big, long train.
Caveat: Because Groovy is compiled, and for memory and performance reasons, getClassNode might sometimes return null. If that happens to you, google for “getClassNode groovy”, as that issue is out of the scope of this post (so far I could get around it in my cases).
So, now we have to traverse the AST. In the most general case, this would mean creating a full interpreter for the Groovy AST, a breathtaking task (but a good way to learn all about Groovy). In our malaria case we will only process arithmetic expressions (and if constructs, but I will not discuss those here for brevity), so we expect the users of our DSL to be careful to pass only an arithmetic expression. As such, the formula is a block of statements which happens to contain a single statement composed of an arithmetic formula:
def expression = it.code.getStatements()[0].getExpression()
println expression
The first line traverses the AST to get the formula. It only works because the closure code is of the form defined above (a single arithmetic formula). The println results in:
org.codehaus.groovy.ast.expr.BinaryExpression@186d484[
ConstantExpression[3.8]
("/" at 22:22: "/")
org.codehaus.groovy.ast.expr.BinaryExpression@ea48be[
ConstantExpression[1]
("+" at 22:27: "+" )
org.codehaus.groovy.ast.expr.BinaryExpression@14dd758[
org.codehaus.groovy.ast.expr.VariableExpression@174d93a[variable: km1]
("/" at 22:32: "/" )
org.codehaus.groovy.ast.expr.VariableExpression@61a907[variable: cq]]]]
Although it looks dreadful at first, a second look reveals that we have what we need.
A vanilla expression processor for the AST above could be:
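import org.codehaus.groovy.ast.expr.BinaryExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.VariableExpression

def drillExpression
drillExpression = { expr ->
    switch (expr.class) {
        case BinaryExpression:
            // Recurse into both operands and parenthesize each side
            return "(" + drillExpression(expr.leftExpression) + ")" +
                   expr.operation.text +
                   "(" + drillExpression(expr.rightExpression) + ")"
        case ConstantExpression:
        case VariableExpression:
            // Leaves: the literal value or the variable name
            return expr.text
        default:
            // Any other node type is silently ignored
            return ""
    }
}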
This would return the string: “(3.8)/((1)+((km1)/(cq)))”
From here I think it is quite easy to see how one could take an expression and convert it to LaTeX or Fortran code (the remaining work is really just LaTeX/Fortran syntax).
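As a rough illustration of the LaTeX case, the sketch below renders divisions as `\frac{..}{..}` and everything else as a parenthesized infix expression. To stay runnable on its own it builds the AST for 3.8 / (1 + km1/cq) by hand instead of going through getClassNode; the `binary` and `toLatex` names are made up for this example, and only the node types seen above are handled.

```groovy
import org.codehaus.groovy.ast.expr.BinaryExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.VariableExpression
import org.codehaus.groovy.syntax.Token

// Hand-built stand-in for the AST of 3.8 / (1 + km1/cq), so the sketch
// does not depend on getClassNode() being available.
def binary = { left, String op, right ->
    new BinaryExpression(left, Token.newSymbol(op, -1, -1), right)
}
def ast = binary(new ConstantExpression(3.8), '/',
        binary(new ConstantExpression(1), '+',
                binary(new VariableExpression('km1'), '/', new VariableExpression('cq'))))

// Hypothetical converter: same recursion as drillExpression, LaTeX output.
def toLatex
toLatex = { expr ->
    switch (expr) {
        case BinaryExpression:
            def left = toLatex(expr.leftExpression)
            def right = toLatex(expr.rightExpression)
            if (expr.operation.text == '/') {
                // Divisions become fractions
                return '\\frac{' + left + '}{' + right + '}'
            }
            return '(' + left + ' ' + expr.operation.text + ' ' + right + ')'
        case ConstantExpression:
        case VariableExpression:
            return expr.text
        default:
            throw new IllegalArgumentException("Unsupported node: ${expr.class.simpleName}")
    }
}

println toLatex(ast)
```

Failing loudly on unknown node types, rather than returning an empty string, is a design choice of this sketch: a formula that silently loses a term would be hard to spot in generated documentation.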
There are 2 drawbacks to this approach: traversing the AST requires work, and supporting all AST node types would be daunting. At least in my malaria case the amount of work required is very manageable.
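To give an idea of how little is left for the Fortran side: once the expression string exists, wrapping it in a function is mostly string assembly. The sketch below is only an illustration. The helper name `fortranFunction`, the function name `cq_effect` and the choice of `real` for every value are assumptions, not what the simulator actually uses.

```groovy
// Hypothetical helper: wraps an already-converted expression in a Fortran function.
def fortranFunction = { String name, String variable, Map parameters, String body ->
    def lines = []
    lines << "real function ${name}(${variable})"
    lines << "  implicit none"
    lines << "  real, intent(in) :: ${variable}"
    // Fixed parameters of the effect become Fortran named constants
    parameters.each { key, value ->
        lines << "  real, parameter :: ${key} = ${value}"
    }
    lines << "  ${name} = ${body}"
    lines << "end function ${name}"
    lines.join('\n')
}

// The body is the string produced by the expression processor shown earlier.
def source = fortranFunction('cq_effect', 'cq', [km1: 68.0], '(3.8)/((1)+((km1)/(cq)))')
println source
```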
A completely different strategy would be to monkey patch numbers and variables (i.e. massively alter the definition of their classes) in a radical way: not to produce arithmetic results but to, say, generate LaTeX sources. That is probably possible, but it would be one of the worst examples of monkey patching that I could think of. Monkey business indeed!
There is also the Groovy code visitor pattern (GroovyCodeVisitor), which I did not explore… It would probably be a variation of the AST traversal strategy presented here.
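For the curious, a visitor-based variant could look roughly like the sketch below. It extends Groovy's `CodeVisitorSupport` and overrides only the three node types that appear in the formula. The class name `InfixPrinter` is invented for this example, and the AST is again built by hand so the snippet runs on its own.

```groovy
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.BinaryExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.VariableExpression
import org.codehaus.groovy.syntax.Token

// Hypothetical visitor: accumulates the same parenthesized string as drillExpression.
class InfixPrinter extends CodeVisitorSupport {
    final StringBuilder out = new StringBuilder()

    @Override
    void visitBinaryExpression(BinaryExpression expr) {
        out << '('
        expr.leftExpression.visit(this)
        out << ')' << expr.operation.text << '('
        expr.rightExpression.visit(this)
        out << ')'
    }

    @Override
    void visitConstantExpression(ConstantExpression expr) {
        out << expr.text
    }

    @Override
    void visitVariableExpression(VariableExpression expr) {
        out << expr.text
    }
}

// Hand-built AST for 3.8 / (1 + km1/cq)
def binary = { left, String op, right ->
    new BinaryExpression(left, Token.newSymbol(op, -1, -1), right)
}
def ast = binary(new ConstantExpression(3.8), '/',
        binary(new ConstantExpression(1), '+',
                binary(new VariableExpression('km1'), '/', new VariableExpression('cq'))))

def printer = new InfixPrinter()
ast.visit(printer)
println printer.out
```

The dispatch on node type is done by each node's `visit` method instead of a `switch`, which is the main difference from the closure-based traversal.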











