Archives for September 2007

Bioinformatics web services and moving

Just recently (last week, in fact) I changed research institutions in order to pursue a PhD, moving from conservation and animal genetics (the focus of my previous research center) to tropical diseases (my current field, which, I confess, motivates me much more).

The first thing I published during my MSc was a small web service to download, organize and visualize complete mitochondrial genomes from multiple species. The service obviously requires a server. Its purpose is completely outside the scope of what my new institution does, and nobody at my previous institution is able to keep the application running. For now it is working, but I don’t know for how long. I suspect that after the first power failure there, the machine will simply not be rebooted.

This raises an obvious and immediate question: how do you maintain services that have only one person (or a very small team) behind them and are hosted by a lab with no professional infrastructure? I would guess that my application was not the first, and won’t be the last, to disappear because of fragile infrastructure (human, technical or other).

After that service went public, the fragility of the whole situation became clear to me, so I took measures to keep it from happening again. All my subsequent developments were client-side applications (using Java Web Start to make things easy for users), and I bought my own Internet domain to host them (I already had server space, so it was not that expensive).

My fundamental point here is not to propose my solution (which is surely not feasible in some scenarios; many server-side apps have to be server-side) but to draw attention to a problem that you might face in the future, one that will affect the longevity and usefulness of your work.

PS: If you happen to be able to host a BioPerl application that is not computationally intensive (this one), I would be very thankful.

Filed in: bioinformatics

by: tiago

1 Comment

Patents: apples and potatoes

First of all, I would like to apologize for the long silence. I am in the middle of a turmoil here: I stopped working in a conservation genetics research group in Portugal, helped co-organize a conference on conservation genetics in Montana, USA (a 20-hour flight in each direction :( ), and tomorrow I am flying to Liverpool to start fresh work on malaria. In between, I was sick with some sort of food poisoning, so…

Hopefully things will get much quieter and more stable. Being able to work on diseases of poverty is something of a dream of mine, and I am very happy with the prospects…

In the meantime, I would like to respond to this post by Deepak Singh. I think he gets it more or less all wrong ;)

There is one a priori issue with patents (which I call “apples and potatoes”). It stems from the fact that patents are discussed in the context of both new drugs and software (in bioinformatics that happens a lot). The main problem is that these are completely different kinds of problems. As such, one cannot “transport” (consciously or, more commonly, unconsciously) the reasoning from one domain to the other. Here I will discuss software only, mainly because it is the domain I understand best.

First, a minor point: the notion of “trade secret”. There is a simple way to make a certain algorithm a “trade secret”: closed source. Yes, reverse engineering is possible, but it is very uncommon these days. And why is it so uncommon? Because there is no such thing as “sheer genius” in creating new algorithms (and that will be my main point of disagreement).

This might sound shocking, but creating new algorithms in computer science is intellectually cheap. When somebody releases closed-source software with a cool new feature, there is no need to look at the code. In the vast majority of cases, just observing its behaviour is enough to devise an algorithm that does the same thing (which may or may not be the same algorithm).

Another example: a couple of years ago I remember James Gosling (Java’s father) talking about storing the source code of a program in some sort of abstract syntax tree notation. This allows very sophisticated things to be done in programming environments. When I read that, I remembered that a very clever colleague of mine had had the same idea a few years earlier (not to mention that there is prior art, for instance Computer Associates’ Gen product). Ideas (algorithms) are cheap.
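For readers who have not met the idea, here is a small illustration of why working on a syntax tree rather than raw text enables “sophisticated things”. It is a minimal sketch in Python using the standard `ast` module (Python 3.9+ for `ast.unparse`). It is not Gosling’s proposal nor how Gen works. The `RenameName` class and the `area` function are hypothetical examples made up for this note.

```python
import ast

# A tiny program, held as plain source text.
SOURCE = """
def area(radius):
    return 3.14159 * radius * radius
"""


class RenameName(ast.NodeTransformer):
    """Rename every variable reference and parameter called `old` to `new`.

    Naive sketch: it ignores scoping, so a real refactoring tool would
    need proper scope analysis on top of the tree.
    """

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Variable uses (loads and stores) are ast.Name nodes.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Function parameters are ast.arg nodes; visit any annotation too.
        self.generic_visit(node)
        if node.arg == self.old:
            node.arg = self.new
        return node


tree = ast.parse(SOURCE)
tree = RenameName("radius", "r").visit(tree)
print(ast.unparse(tree))
```

Because the rename operates on identifiers in the tree rather than on substrings, a name such as `radius_max` or the word “radius” inside a string literal would be left alone. That is the kind of editor feature the paragraph alludes to.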

Take Google, for instance: fundamental parts of its search engine technology are simply ideas that were not feasible before (like keeping the whole database in memory), used in unconventional ways. Gmail? Ajax had been around for a long time. I am not saying that the final product is not fantastic; what I am saying is that what makes it fantastic is NOT some new algorithm.

Sometimes ideas that were once considered terrible reappear as fantastic (I am thinking here, for instance, of Python’s delimiting of blocks by indentation, which existed before and was seen as dreadful).

What is really expensive/important in an application? Development time and effort are the fundamental pieces. Having ideas is very cheap; developing products is very hard. The solution? Copyright. Copyright is the best way to protect the expensive investment in development. [On a personal note, I give away all my code, like all open source developers out there, but I am thinking of those who don’t want to do so.] There are other ways besides copyright to profit from code, like a service-oriented approach, but I will not discuss that here.

There is another pragmatic issue: the massive number of algorithms that even a small application uses. I would be paying royalties to hundreds of companies even for my small projects. From a pragmatic standpoint, that would put almost everyone out of business. This is, by the way, further evidence that algorithms are cheap to invent, considering the many millions that exist to do all kinds of things in all kinds of ways.

Note that I didn’t even try to frame this discussion on moral and political grounds, only on pragmatic and economic ones.

Filed in: bioinformatics, software engineering

by: tiago

3 Comments

Manu Chao

Manu Chao has a new album. Here is one song, “Rainin’ in Paradize”, with a new video directed by Emir Kusturica.

Filed in: timeout

by: tiago

1 Comment

Conservation Genetics Data Analysis Course

There won’t be many posts here during the next couple of weeks as I am one of the organizers of the Conservation Genetics Data Analysis Course. Feel free to have a look at the website. Comments are most welcome.

Filed in: bioinformatics, science

by: tiago

No Comments

Biopython and Population Genetics

I am currently submitting code to the Biopython project to support research in Population Genetics.

As far as I know (and I might be wrong or outdated), the only support for population genetics in the Bio* projects is within BioPerl (see this), and it doesn’t cover a lot of ground.

The current status of the Biopython PopGen module is described in an email that I sent to the biopython-dev mailing list, which I include below.

The reason I am posting this here is that I would like suggestions of things to implement in Bio.PopGen from beyond the biopython-dev community (which includes only a couple of population geneticists). Are you doing research in population genetics? What would you like to see in a PopGen library? I am not promising to implement all requests. With your feedback, though, I will have an idea of what people need. I can then direct my efforts towards needed features instead of doing work that might, at the end of the day, be worthless…

Anyway, I have decided to set aside some of my time to help Biopython with population genetics.

The email that I have sent with the status of Biopython PopGen development:


Hi!

This is a short mail to inform everyone about the effort to create a Bio.PopGen module.

What is currently available doesn’t yet deserve to be called a
population genetics module per se, but I think we are getting there…
So what is available?

There is code, test code and documentation for working with GenePop
files, a format which I suppose is reasonably widely used in
population genetics (at least when not considering sequence-based
data). I am thinking of closing the related bug.

There is code, test code and documentation (in this case, under
review) to work with Fdist. Fdist is a moderately used selection
detection application. The main purpose of this code is to serve as a
“commit exercise” of moderate size before I start committing more
important stuff (that way I learn, and make my mistakes, on a less
important component).

Three important parts follow: statistics, coalescent simulations and
HapMap. For these parts there is already some code written…

Statistics: Ralph Haygood sent me code to deal with sequence-based
data. I myself have code to deal with non-sequence-based data. I will
work on merging both code bases. Documentation and test code will
follow. At that point I think we could say that we have a bare-bones
Bio.PopGen module.

Coalescent simulations: there is already code (published in a
journal) to work with simcoal2. Most of the documentation is also
written. At that point I would guess Bio.PopGen would compare rather
favorably with BioPerl.

HapMap: Part of the code is written, but more will have to be done.

This is the current status of things as I see it from here…
Comments, corrections, discussion would be most welcome…
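The email describes Bio.PopGen only in outline. The following self-contained Python sketch makes the kinds of tasks it mentions concrete. It does not use or reproduce Biopython’s actual Bio.PopGen API. It contains:

* a minimal reader for a simplified GenePop-style file;
* expected heterozygosity and a Nei-style Fst as examples of non-sequence statistics (Fdist-style selection detection looks at Fst against heterozygosity, but the estimator used by Fdist may differ);
* a toy Kingman coalescent of the kind simcoal2 simulates far more generally.

The toy data, the function names and the format assumptions are all hypothetical. The assumptions are one locus name per line, diploid genotypes with three digits per allele, and missing alleles coded 0.

```python
import random
from collections import Counter

# Hypothetical toy data in a simplified GenePop-like layout: a title line,
# one locus name per line, "Pop" separators, then "name, genotype ..." lines
# with three digits per allele (two alleles per diploid genotype).
EXAMPLE = """Toy data set (invented for illustration)
Locus1
Locus2
Pop
ind1, 001001 002003
ind2, 001002 003003
ind3, 002002 002002
Pop
ind4, 002002 001001
ind5, 002002 001003
"""


def parse_genepop(text, digits=3):
    """Return (loci, pops); each pop is a list of (name, [(a, b), ...])."""
    lines = [line.strip() for line in text.strip().splitlines()]
    loci, pops = [], []
    for line in lines[1:]:  # the first line is the free-text title
        if line.lower() == "pop":
            pops.append([])
        elif not pops:
            loci.append(line)  # assumes one locus name per line
        else:
            name, genotypes = line.split(",", 1)
            pairs = [(int(g[:digits]), int(g[digits:])) for g in genotypes.split()]
            pops[-1].append((name.strip(), pairs))
    return loci, pops


def allele_freqs(pop, locus):
    """Allele frequencies at one locus, skipping missing alleles (coded 0)."""
    counts = Counter()
    for _, pairs in pop:
        for allele in pairs[locus]:
            if allele != 0:
                counts[allele] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def expected_het(freqs):
    """Expected heterozygosity (gene diversity): He = 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pops, locus):
    """Nei-style Fst = (Ht - Hs) / Ht, weighting populations equally."""
    per_pop = [allele_freqs(p, locus) for p in pops]
    hs = sum(expected_het(f) for f in per_pop) / len(per_pop)
    alleles = set().union(*per_pop)
    pooled = {a: sum(f.get(a, 0.0) for f in per_pop) / len(per_pop) for a in alleles}
    ht = expected_het(pooled)
    return (ht - hs) / ht if ht > 0 else 0.0


def coalescent_tmrca(n, rng):
    """Time to the MRCA of n lineages under the standard Kingman coalescent,
    in coalescent units (a pair of lineages merges at rate 1)."""
    t = 0.0
    for k in range(n, 1, -1):
        # With k lineages, the wait until the next merger is exponential
        # with rate k * (k - 1) / 2.
        t += rng.expovariate(k * (k - 1) / 2)
    return t


loci, pops = parse_genepop(EXAMPLE)
for i, locus in enumerate(loci):
    hets = [round(expected_het(allele_freqs(p, i)), 3) for p in pops]
    print(locus, hets, round(fst(pops, i), 3))

rng = random.Random(42)
sims = [coalescent_tmrca(10, rng) for _ in range(20000)]
print(sum(sims) / len(sims))
```

In the toy data, Locus1 is fixed in the second population, which is what drives its non-zero Fst. For the simulation, the mean should approach 2(1 - 1/n), the expected time to the most recent common ancestor under the standard coalescent (1.8 for n = 10). A real library would also need to handle the full GenePop format (for example, comma-separated locus names and two-digit alleles), haploid data and missing genotypes.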

Filed in: Python, bioinformatics

by: tiago

1 Comment