Reductionism and simplification

This post starts with what might seem like a discussion about computing, but it is actually a poor man’s discussion of the philosophy of science. It has nothing to do with computing and is much more applicable to biology, economics and sociology.

Let’s be honest: computer scientists are trained to work for banks and insurance companies, to build web sites, software for cars and things like that. Those domains are actually very simple. A bank might be a gigantic institution, but it is possible to capture all its processes inside a computer program, granted with a lot of effort. This creates a mental setting: everything that we need to know can be known; we just decide when to stop.

Now think about those simplistic (this is an understatement) mathematical and computational models for scientific problems (differential equations, Monte Carlo processes, Markov chains, …). They model the “important parts” of the issue under study. These models are much simpler than the models running in computers to sustain day-to-day banking chores. Somehow it strikes me as strange that something as mechanical as a bank needs a more complex model than “nature”.

In the context of nature and making mathematical and computational models about it, I have a few things in mind:

First of all, in many problems in the natural world we don’t know what the important parts are to start with. This is very different from the “bank mentality”, where you can know everything if you try hard. In my personal case, when I model malarial artesunate resistance, I am modeling something whose mechanism is still a matter of speculation, and even if the speculation is correct, most of the fundamental parameters are unknown. I have yet to read a paper modeling something related to malarial drug use that doesn’t have a phrase like: “the relation between this value and reality is assumed to be this (no citation – or citing something unpublished – or rationale provided)”.

But the cornerstone of my reasoning is that, in complex processes, the devil is in the details and in the interactions between participating factors (most of which we are unaware of). Soft sciences are holistic by nature. The properties of the whole system come from everything and everywhere. The “banking” and “hard science” mentalities are no good here: we cannot know everything, what we know is probably not enough, and most simplifications will lose something fundamental.

Does this mean that I am suggesting that we should stop modeling and all theoretical work? By no means, but we should refocus:

  • This is not hard science, so don’t try to mask it as such. Hard rules and sensitivity analyses are mostly artifacts to make things look more “serious” and more “demonstrated”. This is biology (or even “worse”, economics or sociology); you don’t get to write Q.E.D. here.
  • Think you can forecast the future? If you think you can, then bring me an always-correct forecast of the weather two months from now and I will listen to you. Most models that exist to forecast the future are there because they are very hard to disprove TODAY: climate (as opposed to weather) models, epidemiology, … . The vast majority of models that can be tested fail (think mathematical finance and the current subprime crisis in the USA, think weather predictions…).
  • Theoretical work, although unable to predict the future (or explain the past), might help create a cognitive and linguistic framework for discussion: present the fundamental concepts and narratives underlying the research process, make the discourse clearer and less cloudy, point out dangerous imprecisions. This is actually the inverse of what happens now: theoreticians speak in a language that most people struggle to understand.
  • Theoretical work can create interesting questions for field scientists to try to answer. This is the precise inversion of what happens now: we don’t want models that are rigged to look realistic. We want reasonable models that fail miserably, so that we can ask field scientists: This is failing, why do you think this happens? Have you considered this other hypothesis? What about testing it?

<sarcasm>
The existing modeling culture works quite well in the current scientific setting: it makes theoreticians look intelligent with all those complicated mathematics and computer programs (and associated publications), and it excuses “practical” scientists from even trying to use their brains: they just apply the existing theory to their research questions in a process that is more industrial than creative. The biggest example that I know of is phylogenetic analysis: get data from the field, compute a mutation model from the premise that a small genetic distance is better, burn CPU cycles, publish. You don’t even need a human for this; a trained monkey is probably enough.
</sarcasm>

In economics things are a bit worse: elaborate game theories and such are presented as a “hard, undisputed” justification for an economic theory serving some nice agenda. Nothing more than an argument from authority.

PS – If you work in a hard science like physics or chemistry you might be thinking that I am smoking something very strong. I don’t think that this post applies to hard sciences; that is a different game altogether.

Filed in: biology, science

by: tiago

1 Comment

Malarial drugs and the economics of (human) languages

There is an interesting lack of precision, to the point of “error”, in the way human language deals with some concepts.

Take, for instance, the concept of drug half-life, i.e. the time it takes for the concentration of a drug to drop to half (drug concentrations in the blood are normally modeled through exponential decay). It is conceived as a property of the drug (people say “drug D has a half-life of H hours”), but it is really a property of both drugs and individuals (actually it is much more complicated than that, but the same argument could be repeated).

And no, this is not just a matter of statistical deviations around a value that is acceptably approximated from the drug alone.

As an example, there is a study about the pharmacokinetic properties of Sulfadoxine-Pyrimethamine (a widely used cheap antimalarial). In this study, there is a big deviation in half-life (and other parameters) for children between 2 and 5 years. The study concludes that “dose recommendations need revision” for that group. To put it another way, half-life (and other parameters) is not (only) a function of the drug.
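
The point that half-life depends on the individual as well as on the drug can be sketched in a few lines of Python. This is only a minimal illustration, assuming the textbook one-compartment relation t½ = ln(2) · V / CL (volume of distribution V, clearance CL). The function names and all numeric values are made up for the example; they are not taken from the study mentioned above.

```python
import math


def concentration(c0, half_life_h, t_h):
    """Exponential decay: concentration left after t_h hours."""
    return c0 * 0.5 ** (t_h / half_life_h)


def half_life_h(volume_l, clearance_l_per_h):
    """One-compartment model: half-life from volume of distribution and clearance."""
    return math.log(2) * volume_l / clearance_l_per_h


# Hypothetical patients taking the SAME drug (illustrative numbers only).
patients = {
    "patient A": {"volume_l": 20.0, "clearance_l_per_h": 0.1},
    "patient B": {"volume_l": 10.0, "clearance_l_per_h": 0.1},
}

for name, params in patients.items():
    t_half = half_life_h(**params)
    left = concentration(100.0, t_half, 72.0)
    print(f"{name}: half-life {t_half:.1f} h, {left:.1f}% of the dose level left after 72 h")
```

With the drug held fixed, the two hypothetical patients still end up with different half-lives, which is exactly the imprecision hidden by “drug D has a half-life of H hours”.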

Now, I am not suggesting that the concept of half-life tied just to the drug should be thrown away. I am just speculating about why it is framed as a function of the drug only, when clearly that is not the case.

First, there is probably historical inertia: the concept was first framed that way at a time when it seemed that half-life depended only on the drug, and it stuck by “memetic” inertia.

But, much more importantly, it is still there because it is both less expensive (it is easier to express half-life as a function of just the drug than as a function of other parameters, which might still be crucial in some situations) and still meaningful enough in many contexts (for instance, expressed as a function of the drug, it is still useful for comparing the half-life of Artemether – short – against Sulfadoxine – long – in many kinds of reasoning). Even when the most economical concept entails some errors, it might still be practical. The problem only arises when its simplicity has bad consequences (in this case, wrong drug doses), and in certain contexts that can be a serious problem (see my previous text about the notions of resistance, tolerance and sensitiveness for an example).

It all depends on the discourse context, but one should be careful.

As an anecdotal example, if you are seriously ill and a doctor prescribes you a pill, would you prefer to hear “this will cure you” or “this will drop the parasite load at a rate of 1 order of magnitude per hour starting 3 (90% CI of 2.5 – 3.5) hours after intake. Parasite load is expected to drop to 0 in 10 hours”?

The problem arises when the cognitive bias of the simplicity of “this will cure you” gets into more rigorous contexts.

This has implications for the computational modeling of concepts. The tradition in computer science is to “dig down” to the “real meaning” of concepts. In that sense simpler explanations are deemed “wrong” (and should be rewritten in terms of “correct” conceptualizations). Maybe a different strategy is needed, one that brings some linguistic and cognitive economy to computational systems (while still maintaining rigorous and precise reasoning and conceptualization when needed, as human languages can do).

I am going to stop here, but I think that one of the problems that impairs mathematical modeling is the application of the “certainty of numbers and formulas” to non-rigorous concepts. Then you have the worst of both worlds: an argument from authority (mathematics is a foundation for authority: “the numbers prove it”) based on modeling vague, imprecise and wrong concepts. But that is a topic for another post.

Filed in: bioinformatics, cognition, malaria, science

by: tiago

No Comments

Carlos Paredes and Tocá rufar

Portuguese guitar

For lovers of world music this is the best you can get from Portugal.

A more recent project is Tocá rufar, a youth project based mainly in the southern part of Lisbon, targeting mostly kids in less well-off areas. It is based on the local tradition of mobile bass drums. In this video you only see a few of them, but they are an “army” of hundreds and sometimes they all play together!

Filed in: timeout

by: tiago

No Comments

Holy Grail: The quest for THE programming language

Being a computer scientist with a strong interest in languages (languages in the broadest sense possible: programming, natural and cognition-related issues), I am on a holy grail quest for a programming language that:

First and foremost, allows me to express my computations in a way that is close to the problem domain (as opposed to close to the machine). As I am working in a biology setting, that means being able to talk about concepts around genes, epidemics and pharmacology in my programs. I don’t want to think about CPUs, memories and things like that when I am coding. Prolog and Lisp are good examples here. I also need programs that can evolve over time as knowledge changes, so I need strong metaprogramming and Domain Specific Language facilities.

Unfortunately I have a couple more requirements coming from day-to-day reality…

Real world: I want a language that interacts with existing libraries and that I can easily make available to other people to use, inspect and change. I need Bio* libraries and graphics plotting libraries. In my personal case I decided that I want to work inside the JVM, so I need a language that works in the Java world (Jython, JRuby, Scala, Groovy, … Java).

Software engineering: Programs have to be easy to maintain and debug. I guess there is no way around explicit typing on the debugging and tool construction front.

Ridiculous religious fanatic quest? Yes, it might be, but I am pursuing it.

The truth is that we are not far away from this grail.

Scala is almost there. It lacks metaprogramming, and things like type inference are a bit amateurish (compare it with CAML).

JRuby is maybe there, I could live with it, I guess. The lack of explicit typing will make things difficult in the long run on the software engineering front.

I decided to give a final try to yet another language: Groovy, and up to now it is going very well. It seems to nail all the fundamental points. I especially love the effort put into good metaprogramming facilities.

I decided, for pragmatic reasons, that after this one I will stop my pursuit of the grail. If Groovy proves a blunder of some sort I will revert to JRuby and carry on.

Filed in: bioinformatics, declarative programming, groovy, metaprogramming, science, software engineering

by: tiago

6 Comments

Rock star politics or genuine and honest ideas?

The rock star:

Or is it “Genuine and honest ideas”? Obama from 1995 (Digg here).

Maybe it is rock star politics AND genuine and honest ideas.

I tend to be cynical, but I want to believe.

If I were American I would vote for him. With the current remaining candidates, there is no doubt about that (I must say I liked Kucinich and Edwards). Obama > McCain > Clinton.

Female US president? Michelle Obama in 8 years time.

Filed in: uncategorized

by: tiago

No Comments

Bio.PopGen

I am currently developing the Biopython module for population genetics and genomics (by the way, you are invited both to help with the development and to make suggestions – maybe based on your needs – for new features).

In the current (1.44) version of Biopython, a GenePop parser and code to deal with FDist (an Fst outlier method for selection detection) are available.

It is my pleasure to announce that coalescent simulation (in the form of support for the SimCoal2 simulator) is currently available in CVS and will probably be out in the next public version. This includes code, test code and DOCUMENTATION. This means you can now do coalescent simulations from inside Biopython (many demographies and markers supported).
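
For readers who have never seen one, here is a toy sketch of what a coalescent simulation computes. It is plain Python and deliberately does not use SimCoal2 or the Bio.PopGen API: it assumes the simplest textbook case (standard Kingman coalescent, constant population size, no recombination), and the function names are made up for the illustration.

```python
import random


def coalescent_waiting_times(n_samples, rng):
    """Waiting times between successive coalescent events.

    While k lineages remain, the time to the next coalescence is
    exponentially distributed with rate k * (k - 1) / 2, measured in
    coalescent units (2N generations for a diploid population of size N).
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def tmrca(n_samples, rng):
    """Time to the most recent common ancestor of the whole sample."""
    return sum(coalescent_waiting_times(n_samples, rng))


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10
replicates = [tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(replicates) / len(replicates)

# Theory for this simple model: E[TMRCA] = 2 * (1 - 1/n)
expected = 2 * (1 - 1 / n)
print(f"simulated mean TMRCA: {mean_tmrca:.3f}, theoretical: {expected:.3f}")
```

A real simulator such as SimCoal2 adds the demographies and genetic markers on top of this basic genealogy process.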

Future plans for Bio.PopGen include statistics (the meat of the module, actually) and HapMap support, among others.

Need any feature? Just ask. I cannot promise it, but I will try to address user requests as much as possible.

Filed in: Python, bioinformatics, biopython, population genetics

by: tiago

No Comments

A holiday, Ruby and Scala

I did not really have a holiday, but I stopped posting for a while.

But I want to talk about another “holiday”: Scala.

I have spent a couple of months with Scala: a functional-OO programming language built from scratch with the JVM in mind, with a nice, smart community.

I actually decided to stop my efforts on Scala and go back to exploring the Ruby way… The reasons:

  1. No metaprogramming facilities. This comes from ML, I suppose. But Ruby has them, and so do many “old school” elegant languages (Lisp, Prolog). It is possible to be elegant with metaprogramming (in fact I would contend that in many settings it is a requirement).
  2. There seems to be some difference in semantics between compiled and interpreted code. I only compiled, but the interpreter could add new variables to its local scope (as it really needs to) while the compiler couldn’t. One might argue that this is excessive flexibility coming from the scripting languages camp, but I actually had to, in a compiled program, create new classes that would include traits depending on the needs of the user, and this cannot be done. If one has many traits, one has to compile a priori all the desired trait mixins; they cannot be defined at run-time in a compiled environment (contrast this with JRuby or even Jython). This is actually the lack of metaprogramming, part 2.
  3. Type inference: Scala type inference might seem clever, but compared to CAML it is not. Sometimes the compiler is not able to infer the types and the user has to explicitly declare them. CAML was always capable (at least in my cases) of complete type inference.
  4. Information sources are scarce. The mailing list is reasonable, but sometimes questions go unanswered and there is no other source (other than inspecting the source code). This will sort itself out if more people start using it, and with more books like the Artima ebook.

Decent metaprogramming in a runtime setting would be my main requirement, but in Scala’s current state one can only have it through the typical Java way: execute the compiler, link a jar, not elegant…
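
To make the run-time mixin requirement from point 2 concrete, here is a minimal sketch of what a dynamic language allows. It is written in Python (so roughly what one would do in Jython), using the built-in three-argument `type()` to create a class on the fly. The class and function names are hypothetical and exist only for this illustration.

```python
class Individual:
    """Base class; everything else is mixed in on demand."""

    def __init__(self, name):
        self.name = name


class Infectable:
    def infect(self):
        self.infected = True
        return f"{self.name} is infected"


class Treatable:
    def treat(self, drug):
        return f"{self.name} treated with {drug}"


# Mixins the user may ask for by name (hypothetical registry).
AVAILABLE_MIXINS = {"infectable": Infectable, "treatable": Treatable}


def build_class(base, mixin_names):
    """Create, at run-time, a new class combining `base` with the chosen mixins."""
    mixins = tuple(AVAILABLE_MIXINS[name] for name in mixin_names)
    class_name = "".join(m.__name__ for m in mixins) + base.__name__
    return type(class_name, mixins + (base,), {})


# The list of mixins could come from user input or a config file.
requested = ["infectable", "treatable"]
Patient = build_class(Individual, requested)

p = Patient("host-1")
print(Patient.__name__)
print(p.infect())
print(p.treat("artesunate"))
```

No combination of mixins has to be known, let alone compiled, in advance; the class is assembled only when the user’s choice is known.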

Regarding Ruby, I would like to have some form of strong and explicit (or inferred) typing. I would imagine that the requirements of metaprogramming flexibility and typing are contradictory but, at least, some kind of optional (but standard) annotation for input/return parameters would avoid some of the debugging nightmares of not having the compiler help with types, and would also allow smart code editors to do all that fancy completion that is possible with explicit typing.

[This was initially posted - with modifications and additions - on Artima as a comment]

Filed in: Ruby, Scala

by: tiago

5 Comments

Death Penalty Repealed in New Jersey

The title says it all. Let’s enjoy the good news. Not all is dark and bleak for human rights post-9/11.

Filed in: life

by: tiago

No Comments

Slowing down

I very much doubt that nature has prepared the human species for the very fast pace of today’s society. Moreover, we are supposed to spread our attention across a multitude of issues. Not only that, we are supposed to answer fast and interact in real time.

Not only do I doubt that this is a path to General Human Happiness(TM), I also doubt that all this speed brings a real increase in productivity. Blogging (especially reading) just increases the pace of things even more.

During the last couple of weeks I had to put off most of my multiple tasks in order to get work done in time for a presentation next week. I was afraid that I would not have enough time to do half of it. Guess what? Everything is done by now. Of course, no blog reading (ok, a little), no blog writing and very few distractions were allowed.

I have decided to very consciously reduce the speed of things, especially of interaction and multitasking.

Blog reading? Of course, 1 day a week. I cut down the number of blogs, but not the areas covered: I still read blogs in my areas of interest (bioinformatics, poverty diseases, population genetics, cinema, economics, human rights, fitness and practicing sport).
Replying to interesting blog entries? Of course. But I can do it within the hour (a.k.a. in blog time) or take a couple of months.
Journals? More or less the same rule applies. The noise ratio is very high. Just digging for gems takes time. I still look at the RSS feeds every day though.
Mail? 3 times a day max (unless there is an emergency going on. Emergency means emergency. So far there have been 0). I am also back to using a text-based mail client; it is more efficient, after a learning curve.
Lunch? Not the Anglo-Saxon variety, that is for sure. I am doing the typical Portuguese hour-long lunch, away from work. And I am going Spanish: Siesta! Sleep is important. I live 2 blocks away from where I work. I really do feel fresher in my afternoon work when I do this.
Whenever possible I respect my biological clock. Want to stay extra time in bed in the morning? Of course (although, being an early bird, I am normally at work before 8am, to be honest).
Too noisy in the office? I move my work to a quieter place.
I don’t work at home. Home is for resting.
This allows for, imagine, spending most of my day actually working on my core tasks. And less tiredness.

Filed in: uncategorized

by: tiago

No Comments

Lawrence Lessig

An impressive presentation about creativity, law and the producer/consumer paradigm applied to culture.

Hopefully I will restart blogging soon. Sometimes inertia creeps in.

Filed in: life

by: tiago

2 Comments