I am currently submitting code to the Biopython project to support research in Population Genetics.

As far as I know (and I might be wrong or out of date), the only support for Population Genetics in the Bio* projects is within BioPerl (see this), which doesn’t cover a lot of ground.

The current status of the Biopython PopGen module is described in an email that I sent to the biopython-dev mailing list and that I include below.

The reason I am posting this here is that I would like suggestions of things to implement in Bio.PopGen from beyond the biopython-dev community (which includes only a couple of population geneticists). Are you doing research in Population Genetics? What would you like to see in a PopGen library? I am not promising that I will implement every request, but with your feedback I will have an idea of what people need, and I can direct my efforts to implementing needed features instead of doing work that might be, at the end of the day, worthless…

Anyway, I have decided to set aside some of my time to help Biopython with regard to population genetics.

The email that I have sent with the status of Biopython PopGen development:


Hi!

This is a short mail to inform everyone of the effort to create Bio.PopGen.

What is currently available doesn’t yet deserve to be called a
Population Genetics module per se. But I think we are getting there…
So what is available?

There is code, test code and documentation for working with GenePop
files, a format which I suppose is reasonably widely used in
population genetics (at least when not considering sequence-based
data). I am thinking of closing the related bug.

There is code, test code and documentation (in this case, under
review) to work with FDist. FDist is a moderately used selection
detection application. The main purpose of this code is to serve as a
“commit exercise” of moderate size before starting to commit more
important stuff (so that I learn, and make my mistakes, on a less
important component).

Three important parts follow: Statistics, Coalescent Simulations and
HapMap. For these parts there is already code written…

Statistics: Ralph Haygood sent me code to deal with sequence-based
data. I have code of my own to deal with non-sequence-based data. I will
work on merging both code bases. Documentation and test code will
follow. At that point I think we could say that we have a bare-bones
Bio.PopGen module.

Coalescent Simulations: There is code, already written (and published
in a journal), to work with simcoal2. Most of the documentation is also
written. Once this is in, I would guess Bio.PopGen would compare rather
favorably with BioPerl.

HapMap: Part of the code is written, but more will have to be done.

This is the current status of things as I see it from here…
Comments, corrections, discussion would be most welcome…
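
For readers who have not met the GenePop format mentioned in the email, here is a minimal sketch of what reading such a file involves. It is not the Bio.PopGen code and makes no claim about its API: the function name `parse_genepop` and the sample data are made up for illustration. The sketch assumes the common layout of a title line, then locus names (one per line or comma-separated), then one or more `Pop` blocks in which each individual is written as `name , genotype genotype ...`, each genotype being two allele codes of equal width, with 0 standing for missing data.

```python
from io import StringIO


def parse_genepop(handle):
    """Parse GenePop-style text into (title, loci, populations).

    Hypothetical helper for illustration only; not part of Biopython.
    """
    lines = [line.strip() for line in handle if line.strip()]
    title = lines[0]
    loci = []
    populations = []
    in_pops = False
    for line in lines[1:]:
        if line.lower() == "pop":
            # A "Pop" line opens a new population block.
            populations.append([])
            in_pops = True
        elif not in_pops:
            # Locus names: one per line, or several separated by commas.
            loci.extend(n.strip() for n in line.split(",") if n.strip())
        else:
            # Individual line: "name , 0101 0203 ..."
            name, _, genotypes = line.partition(",")
            alleles = []
            for genotype in genotypes.split():
                half = len(genotype) // 2
                # Split each genotype into its two allele codes (0 = missing).
                alleles.append((int(genotype[:half]), int(genotype[half:])))
            populations[-1].append((name.strip(), alleles))
    return title, loci, populations


# Made-up sample: two loci, two populations.
SAMPLE = """Example data (made up for illustration)
Locus1
Locus2
Pop
ind1 , 0101 0203
ind2 , 0102 0000
Pop
ind3 , 0202 0303
"""

title, loci, populations = parse_genepop(StringIO(SAMPLE))
for index, population in enumerate(populations, start=1):
    print("Population", index, "has", len(population), "individual(s)")
```

A real parser would also need to cope with things this sketch ignores, such as haploid data and malformed lines.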

