Bioinformatics web services and moving

Just recently (i.e., last week) I moved to a new research institution in order to pursue a PhD, going from conservation and animal genetics (the focus of my previous research center) to tropical diseases (where I am now and which, I confess, motivates me much more).

The first thing I published during my MSc was a small web service to download, organize and visualize complete mitochondrial genomes from multiple species. The web service requires a server (obviously). Its purpose is completely outside the scope of what my new place does. Also, there is nobody at my previous place capable of keeping the application running. For now it is working, but I don’t know for how long. I suppose that after the first power failure there the machine will simply not be rebooted.

There is an obvious, immediate question here: how to maintain services that have one person (or a very small team) behind them, in a lab that has no professional infrastructure available. I would guess that my application was not the first, and won’t be the last, to disappear because of fragile infrastructure (human, technical or other).

After that service went public, the fragility of the whole situation occurred to me, so I took measures to avoid it happening again: my subsequent developments were all client-side applications (Java Web Start, to make it easy on users) and I bought my own Internet domain (I already had server space, so it was not that expensive) to host the applications.

My fundamental point here is not to propose my solution (which is surely not feasible in some scenarios - many server side apps have to be server side) but to draw attention to a problem which you might have in the future, a problem that will affect the longevity and usefulness of your work.

PS - BTW, if you happen to be able to host a BioPerl application that is not computationally intensive (this one), I would be really thankful.

Filed in: bioinformatics

by: tiago

1 Comment

Patents: apples and potatoes

First of all, I would like to apologize for the long silence. I am in the middle of some turmoil here:
I stopped working in a conservation genetics research group in Portugal, helped co-organize a conference on conservation genetics in Montana, USA (a 20-hour flight in each direction :( ), and tomorrow I am flying to Liverpool to start fresh new work on malaria. In between I was sick with some sort of food poisoning, so…

Hopefully things will get much quieter and more stable. Being able to work on diseases of poverty is something of a dream of mine and I am very happy with the prospects…

In the meantime, I would like to reply to this post by Deepak Singh. I think he more or less gets it all wrong ;) .

There is one a priori issue with patents (which I call “apples and potatoes”), which stems from the fact that patents are discussed in the context of both new drugs and software (in bioinformatics that happens a lot). The main problem is that these are completely different kinds of problems and, as such, one cannot “transport” (consciously or, more commonly, unconsciously) the reasoning from one domain to the other. Here I will discuss software only, mainly because it is the domain that I understand best.

First, a minor point: the notion of “trade secret”. There is a simple way to make a certain algorithm a “trade secret”: closed source. Yes, reverse engineering is possible, but it is very uncommon these days. And why is it so uncommon? Because there is no such thing as “sheer genius” in creating new algorithms (and that will be my main point of disagreement).

This might sound shocking, but creating new algorithms in CS is intellectually cheap. When somebody releases (closed source) software with a cool new feature, there is no need to look at the code: just seeing the behaviour is, in the vast majority of cases, enough to devise an algorithm (which might or might not be the same) that does the same thing.

Another example: a couple of years ago I remember James Gosling (Java’s father) talking about storing the source code of a program using some sort of abstract syntax tree notation, which allows for very sophisticated things to be done in programming environments. When I read that, I remembered a very clever colleague of mine having the same idea a few years earlier (not to mention that there is prior art, for instance in Computer Associates’ Gen product). Ideas (algorithms) are cheap.

Take Google, for instance: fundamental parts of the search engine technology are just ideas that were not feasible before (like keeping the whole database in memory), used in unconventional ways. Gmail? Ajax had been around for a long time. I am not saying that the final product is not fantastic; what I am saying is that what makes it fantastic is NOT some new algorithm.

Sometimes ideas that were considered terrible reappear as fantastic (I am thinking here, for instance, of Python’s blocks by indentation, an idea which existed before and was seen as dreadful).

What is really expensive/important in an application? Development time and effort are the fundamental piece. Having ideas is very cheap; developing products is very hard. The solution? Copyright. Copyright is the best way to protect the expensive development investment. [On a personal note, I give away all my code like all open source developers out there, but I am thinking of those who don't want to do it]. There are other ways to profit from the code (other than copyright), like a service-oriented approach, but I will not discuss that here.

There is another pragmatic issue: the massive number of algorithms that even a small application uses. I would be paying royalties to hundreds of companies even for my small projects. From a pragmatic standpoint that would put almost everyone out of business. This is, by the way, further evidence that algorithms are cheap to invent, considering the many millions that are around to do all kinds of things in all kinds of ways.

Note that I didn’t even try to frame this discussion on moral and political grounds, only on pragmatic and economic ones.

Filed in: bioinformatics, software engineering

by: tiago

3 Comments

Conservation Genetics Data Analysis Course

There won’t be many posts here during the next couple of weeks as I am one of the organizers of the Conservation Genetics Data Analysis Course. Feel free to have a look at the website. Comments are most welcome.

Filed in: bioinformatics, science

by: tiago

No Comments

Biopython and Population Genetics

I am currently submitting code to the Biopython project to support research in Population Genetics.

As far as I know (and I might be wrong/outdated) the only support for Population Genetics in the Bio* projects is within BioPerl (see this), which doesn’t cover a lot of ground.

The current status of the Biopython PopGen module is described in an email that I sent to the biopython-dev mailing list and that I include below.

The reason I am posting this here is that I would like to have suggestions of things to implement in Bio.PopGen from more than the biopython-dev community (which includes only a couple of population geneticists). Are you doing research in Population Genetics? What would you like to see in a PopGen library? I am not promising that I will implement all requests but, with your feedback, I will have an idea of what people need and will direct my efforts to implementing needed features instead of doing work that might be, at the end of the day, worthless…

Anyway I have decided that I will put aside some of my time to help Biopython with regards to population genetics.

The email that I sent with the status of Biopython PopGen development:


Hi!

This is a small mail to inform all of the effort to create a Bio.PopGen.

What is currently available doesn’t still deserve to be called a
Population Genetics module per se. But I think we are getting there…
So what is available?

There is code, test code and documentation for working with GenePop
files, a format which I suppose is reasonably widely used in
population genetics (at least when not considering sequence based
data). I am thinking in closing the related bug.

There is code, test code and documentation (in this case, under
review) to work with Fdist. FDist is a moderately used selection
detection application. The main purpose of this code is to serve as a
“commit exercise” of moderate dimension before starting to commit more
important stuff (therefore learning and making mistakes with a less
important component).

3 important parts follow: Statistics, Coalescent Simulations and
HapMap. For these parts there is already code written…

Statistics: Ralph Haygood sent me code to deal with sequence based
data. I have myself code to deal with no-sequence based data. I will
work on merging both code bases. Documentation and test code will
follow. At this point I think we could say that we have a bare bones
Bio.PopGen module.

Coalescent Simulations: There exists written (and published on a
journal) code to work with simcoal2. Most documentation is also
written. At this point I would guess Bio.PopGen would compare rather
favorably with BioPerl.

HapMap: Part of the code is written, but more will have to be done.

This is the current status of things as I see it from here…
Comments, corrections, discussion would be most welcome…

Filed in: Python, bioinformatics

by: tiago

1 Comment

GUI metaprogramming example

Preamble

This is an example of metaprogramming in Jython. I would really like to have a simpler example (in just Python or Java), but this is taken directly from what I am doing. The idiom that I am using is, Pythonwise, a bit strange and old (I am using eval instead of __getattribute__) because of Jython’s limitations. This can be seen as a more advanced programming technique (if you are starting to learn programming, you might want to skip this for now, just to avoid excessive entropy in your learning process). Although this example is in Jython, it applies to many programming languages (Python, Java, Ruby, Prolog, …) but not C or C++ (or Caml, unfortunately).

The problem at hand

I am building a selection detection workbench (to detect loci under selection). At certain points in time, I need to prevent the user from entering data in a lot of entry fields, like these:

Disabled fields

As you can see, they are all disabled.

How to do this? Option 1: go to all entry fields, one by one (more than 10, and changing), and call the method setEnabled(False). That means lots of repeated code, and whenever there are changes I would have to add/remove a setEnabled call.

Option 2: write a piece of code that inspects my panel object (a panel is what contains all the fields), checks all object attributes that are entry fields and disables them. The point here is writing code that operates on the code itself. In this case, if one adds a new entry field to a panel, the code will automatically detect the field and disable it. How to code this?

from java.awt import Component
 
def disablePanel(panel):
    attrs = dir(panel)
    for attr in attrs:
        try:
            if eval('isinstance(panel.' + attr + ', Component)'):
                eval('panel.' + attr + '.setEnabled(False)')
        except TypeError: #Some attributes are write only
            pass

A small piece…
Line 4 (function dir) gets all attributes for the panel object.
Lines 7 and 8 do all the interesting work (eval, isinstance).
First, eval takes a string and executes it, so if you have

i = 1
i = eval('i+5')
print(i)

it will print 6. eval is very powerful (think about the possibilities of changing code at runtime). It is also quite dangerous, but I will not discuss that here…

isinstance checks to see if a certain object is an instance of a certain class, so

i = 1
print(isinstance(i, int)) # Will print True
print(isinstance(i, str)) # Will print False

So, back to our code:
if eval('isinstance(panel.' + attr + ', Component)'):
is evaluating whether panel.<attribute name> is an instance of Component. For instance, my panel has an attribute called core (storing the number of cores), which is a drop-down list, so when the code checks isinstance(panel.core, Component) it will evaluate to True and execute the next line, which is:

eval('panel.' + attr + '.setEnabled(False)')
This evaluates panel.<attribute name>.setEnabled(False), i.e., it disables the field; in our example it will do panel.core.setEnabled(False).

I will not explain the exception code as it is not important here.

So, a few lines now make it automatic to disable new entry fields, this without changing the code every time a field is added or removed (other than adding the field itself). Less code to maintain and less possibility of bugs.

I just wanted to illustrate the principle (the language used is not really important), but I need to stress a fundamental point about this particular example in Python: because of some Jython particularities I am using an old dialect to do this (Python gurus might be horrified). If you are using Python, I recommend that you check __getattribute__ (or the getattr built-in) to replace eval.
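For readers on plain Python, here is a minimal sketch of the same principle without eval, using the getattr built-in. It is not the workbench code: Field and Panel are hypothetical stand-ins for the Swing components and the real panel, and set_enabled plays the role of setEnabled.

```python
class Field:
    """Hypothetical stand-in for a GUI entry field (a Swing Component in the post)."""

    def __init__(self):
        self.enabled = True

    def set_enabled(self, enabled):
        self.enabled = enabled


class Panel:
    """Hypothetical panel holding a mix of entry fields and plain attributes."""

    def __init__(self):
        self.core = Field()                 # e.g. the drop-down with the number of cores
        self.sample_size = Field()          # another (invented) entry field
        self.title = "Selection workbench"  # not a field, must be left alone


def disable_panel(panel):
    """Disable every attribute of `panel` that is a Field, whatever its name."""
    for name in dir(panel):
        try:
            value = getattr(panel, name)
        except AttributeError:  # some attributes cannot be read
            continue
        if isinstance(value, Field):
            value.set_enabled(False)


panel = Panel()
disable_panel(panel)
print(panel.core.enabled, panel.sample_size.enabled, panel.title)
```

Adding a new Field attribute to Panel needs no change to disable_panel, which is the whole point of the technique.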

Filed in: Java, Jython, Python, bioinformatics, metaprogramming

by: tiago

No Comments

Python, Ruby, Java and Threads

Greg Tyrelle made a very important comment regarding exploiting multiple cores and Python (which will surely be included in my next part on bioinformatics and multi-core computing).


First, my understanding of python threads is that they are not separate system level processes, but some kind of fake process that are python specific ? Trouble is that I see two separate process when I launch two Blast runs via threading ?

The other aspect of threading that I’m still not entirely clear about is how the global interpreter lock (GIL) fits into the picture. I get resource locking to prevent race conditions, but is the GIL also invoked each time an action that manipulates memory takes place in a thread ? I’ve heard this property of python makes it unsuitable for multi-core programming ?

I will trade formal correctness for clarity of explanation (namely, I won’t go much into the difference between threads and processes, as it would make this too techy and confusing).

Python uses real (i.e. native) threads. Ruby uses so-called green threads, which are “fake” (simulated). Ruby 2.0 will use native threads.

So, in theory, Python is OK on multi-core architectures. In practice there is a problem, a serious one, identified by Greg: the Global Interpreter Lock (GIL). The GIL makes it impossible for more than one thread to be executing Python code at a time. When you are dealing with Python code, even if you have many threads and many cores, only one thread can be executing Python code. This is not as serious as it looks; there are 4 ways to live with it:

  • If you use a thread to start an external process, that process is not under the control of the GIL (it is a separate process), so it can run concurrently on a different core, outside Python (think BLASTing something). I think this covers one fundamental use case in bioinformatics: running external, computationally intensive programs. In fact you can start as many instances of external programs as you have cores (or even more, if you think it will be advantageous). Note that the thread that calls the external application will block (well… it depends, but for simplicity let’s assume so), but your other Python threads can continue concurrently with the application.
  • This is subtle, but important: if you use CPython (the standard implementation) and do your computationally intensive work in C (which makes sense, and is a common strategy, as Python is quite slow), then the C code, as long as it is not interacting with Python objects, can release the GIL and therefore make use of multiple cores. The Python code uses only one core, while the C part might be using all the remaining ones. This approach is not valid for Ruby because of the green threads issue (I am a mere Ruby newbie, so take my words with a grain of salt).
  • Now… this GIL problem (or the green threads issue in Ruby) disappears if you use Jython or JRuby, as they use the JVM native concurrency mechanisms which have no notion of acquiring an exclusive lock for execution. By the way you can also use JVM based interpreters to call native (non-JVM) applications (think BLAST again, from inside Java). To put this point in another way: the GIL/green threads problem is not a language limitation, it is a limitation of the standard (C based) implementations that other implementations might not share (and the Java implementations, in fact, DO NOT).
  • If you think about grids (and not multiple cores) then the problem disappears as we are then talking of different processes (even more, running on different hardware).
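The first point above can be sketched roughly as follows. This is only an illustration under assumptions: the external program is a tiny Python one-liner standing in for something like BLAST, and run_external and run_all are names invented for this example. Each worker thread blocks while its child process runs, but the child processes themselves are outside the GIL and can use separate cores.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_external(command):
    """Run one external command; return its exit code and its output."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout.strip()


def run_all(commands, workers=4):
    """Start the commands from a pool of threads, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_external, commands))


# Stand-in for a real, computationally intensive tool (hypothetical workload).
commands = [[sys.executable, "-c", "print(%d * %d)" % (n, n)] for n in range(4)]
results = run_all(commands)
print(results)
```

Setting workers to the number of cores (or a bit more) matches the suggestion of starting as many external instances as you have cores.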

I am afraid of being too techy with this post (I am probably labeled as 100% computer nerd by now ;) ), but I think Greg’s point is fundamental and required some discussion.

In my defense ;) I would like to say that I am only writing so much about programming because I am in a professionally unclear phase; as soon as things get back on track I want to focus more on the biological side of things… Until then I will be writing about the subject that I know best, and that is, for better or worse, informatics…

Comments, especially constructive criticism, are, as always, welcome…

Filed in: Java, Jython, Python, Ruby, bioinformatics

by: tiago

3 Comments

Reusing results from (intensive) computations

Still inspired by Depth-First’s post The Best API May Be No API At All: PubChem and PDB, I would like to suggest a trick for reusing the results of intensive computations.

The concepts presented here are object serialization and persistence. The practical case will be presented in Python, but it also works in Java and most probably in other modern languages like Ruby.

Example scenario: imagine that you go through all PDB files available (I think there are around 40,000) and record the minimum distance between each iron atom found near a protein and the protein itself. You might end up with a dictionary whose key is the PDB ID (like 1FZY) and whose value is a list with the minimum distance, in Angstroms, of each iron to the protein. From this dictionary you could do all sorts of things: see which protein has the smallest distance to an iron, compute the average distance of all irons to proteins, or analyze by enzyme type (by the way, you might also construct a table recording the type of each enzyme).
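To make the shape of that dictionary concrete, here is a small sketch of two of the statistics mentioned above. The distance values are invented purely for illustration (they are not real measurements), "XXXX" and "YYYY" are placeholder IDs, and the function names are hypothetical.

```python
# Toy values, invented for illustration only (not real measurements).
iron_distances = {
    "1FZY": [2.1, 3.4],
    "XXXX": [1.9],   # placeholder ID
    "YYYY": [],      # placeholder ID, entry with no iron nearby
}


def closest_entry(distances):
    """Return (pdb_id, distance) for the smallest iron-to-protein distance."""
    candidates = [(min(values), pdb_id)
                  for pdb_id, values in distances.items() if values]
    best_distance, best_id = min(candidates)
    return best_id, best_distance


def average_distance(distances):
    """Average over every iron in every entry."""
    all_values = [d for values in distances.values() for d in values]
    return sum(all_values) / len(all_values)


print(closest_entry(iron_distances))
print(average_distance(iron_distances))
```

Both functions run in a fraction of a second once the dictionary exists; the expensive part is building it.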

The first strategy might be to:

parse_and_process_all_PDB_files_gathering_required_information #Takes hours
#Now you have:
#enzyme_types   a dictionary that records, for each PDB entry,
#                    the enzyme type (if it is an enzyme)
#iron_distances a dictionary that holds, for each PDB entry,
#                    the minimum distance for each iron to the protein
compute_all_sorts_of_interesting_statistics # Rather fast

This has a big drawback: every time that you want to compute new statistics you have to repeat the whole process, taking hours. Furthermore, if you are not using a local copy of the PDB database, you will be dependent on the network and stressing the servers on the other side.

I would like to propose an alternative, with two programs:

parse_and_process_all_PDB_files_gathering_required_information #Takes hours
save_to_disk(enzyme_types, iron_distances)

and

enzyme_types, iron_distances = load_from_disk()
compute_all_sorts_of_interesting_statistics # Rather fast

After parsing and processing the raw data (and maybe fetching it from servers), you would save the results to disk. Whenever you wanted to compute new statistics, you would load the processed data from disk and do the computations without repeating the parsing and processing.

That is, the time-consuming parsing and processing phase would be run only once (or whenever you need new raw information).

How to do this? Complicated you think? Not at all… You just have to like pickles (in Python)…

Pickles as a healthy diet component in your Python programming

We will make use of the pickle module.

From the module documentation:

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

OK, now continuing our example, how would we do it?

Saving enzyme_types and iron_distances is as simple as:

import pickle
 
#You do your stuff and create enzyme_types and iron_distances
 
#The with block closes the file automatically
with open('data.pkl', 'wb') as pickle_file:
    pickle.dump(enzyme_types, pickle_file)
    pickle.dump(iron_distances, pickle_file)

That is it; it is as easy as this. (If you use pickle’s text-based protocol 0, the default in Python 2, you can even open data.pkl with a text editor, if you are so inclined.)

To load the data? It is just as simple:

import pickle
 
#Objects come back in the same order in which they were dumped
with open('data.pkl', 'rb') as pickle_file:
    enzyme_types = pickle.load(pickle_file)
    iron_distances = pickle.load(pickle_file)
 
#Compute statistics

Just one point: if you, for some reason, just want to load iron_distances (which was saved after enzyme_types), you still have to load enzyme_types first (pickle has to consume it, as it was saved before). On the other hand, if you just want to load enzyme_types, then you can ignore iron_distances, as it was saved after.

Caveat: you need to design your intermediate structures well, so that you don’t have to rerun the parsing and processing phase every time just to create new intermediate structures (from my experience this is not that hard, even for people with little programming experience).
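One possible way to combine both programs, sketched under assumptions: the two structures are bundled in a single dictionary (so the load order stops mattering), and the expensive step only runs when the pickle file does not exist yet. expensive_parse is a hypothetical placeholder that returns toy data instead of parsing PDB files, and load_or_compute is a name made up for this example.

```python
import os
import pickle
import tempfile


def expensive_parse():
    """Placeholder for the hours-long parsing step; returns toy data here."""
    return {
        "enzyme_types": {"1FZY": "hypothetical type"},
        "iron_distances": {"1FZY": [2.1, 3.4]},  # invented values
    }


def load_or_compute(path, compute):
    """Load pickled results from `path`, computing and saving them if absent."""
    if os.path.exists(path):
        with open(path, "rb") as pickle_file:
            return pickle.load(pickle_file)
    results = compute()
    with open(path, "wb") as pickle_file:
        pickle.dump(results, pickle_file)
    return results


# A temporary directory keeps the example from touching your real files.
cache_path = os.path.join(tempfile.mkdtemp(), "data.pkl")
first = load_or_compute(cache_path, expensive_parse)   # computes and saves
second = load_or_compute(cache_path, expensive_parse)  # loads from disk
iron_distances = second["iron_distances"]              # pick only what you need
```

Deleting the pickle file is then the way to force the parsing phase to run again when new raw information is needed.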

Filed in: Python, bioinformatics

by: tiago

3 Comments

Comments to Alexei Drummond’s interview on Blind.Scientist

After a somewhat “rantish” response to Alexei Drummond’s interview on Blind.Scientist, I have posted a few more even-tempered comments on the interview. The reason I reacted so fast (and so unwisely) is that most of the content directly affects what I am currently doing.

So here are my comments on the interview, addressing each point that interests me:

When biologists start asking about where they can learn to program a computer, just so they can do their job you know something is wrong!

This line of thinking seems to be highly pervasive among researchers in Biology/Computational Biology/Bioinformatics, and it is my main point of disagreement. I do think that in this brave new world everybody will have to know how to do basic scripting. I am not talking about building an industrial-strength application, just about basic data moving and processing. Just as maths became a fundamental tool in the 20th century, basic programming will become one too, especially as more data becomes available and lab work becomes more automated and faster/easier to do.

Firstly, software development isn’t science.

With this I completely agree, although I suppose the author is not referring to some of the algorithms that underlie the application (like an alignment algorithm). But just building an application is not science; it is enabling science, which is quite different.

Secondly, most academic programmers are not interested in (or good at) designing user interfaces, and certainly developing software is not a scientific outcome that gets recognized like publishing papers does.

When I say that basic scripting is becoming a requirement, I certainly don’t mean that scientists should be required to design good user interfaces. It is interesting to note that one can publish papers on applications (program notes, application notes, …), so one can build a “scientific” CV with applications. Whether it makes sense to use the same reward system for applications and research papers is a completely different issue, but for now one can get publication entries on the CV with applications.

Thirdly, academics are quite bad at supporting software and documenting it.

Most are bad at developing software in the first place. Using publication as a reward for an application might make sense (if at all) after an application is well established, but I doubt it makes much sense at the beginning of the life cycle of the application. Call me cynical, but in this publish or perish culture, putting the reward at the beginning of the life of a product is a strong invitation not to support it at all (as the main reward has already been obtained).

So it seemed to me, that for a lot of reasons, a professional software company was the best avenue to realize a software system that would dramatically improve the productivity of molecular biologists by putting bioinformatics at their fingertips.

This makes full sense, but the idea that all of the programming effort can be taken out of the hands of scientists seems exaggerated to me. My main line of reasoning is that most science is a creative process, not a factory process, and some of that creativity cannot be foreseen by application developers. So some “tweaking” will be needed by the final user (even in less creative professions, word processors and spreadsheets sometimes have to be programmed), and that tweaking is really something like “script programming”.

Java is a general-purpose programming language — so you can do in Java pretty much anything you can do in software. The main reason for choosing Java is that it is very easy to write sophisticated user interfaces that run on Windows, Linux and Mac OS X.

I currently use the same line of reasoning when developing software. I am currently working on a selection detection application that runs on the JVM. Note that I say JVM and not Java. One can use the good things of the JVM (portability of libraries, especially Swing and AWT) from other languages that run on the JVM; Jython and JRuby come to mind.

While I do subscribe to the JVM almost completely, I have some doubts about Java. For small applications it is clearly an over-engineered language. Even for big applications, although I think some of Java’s features are good (like explicit typing), there is room for extensibility to be provided by scripting languages like Jython/JRuby (MODELER4SIMCOAL2 works just like that).

For biologists beginning programming (another issue), I would surely not start by teaching Java because of the excessive verbosity and difficulty in getting “simple things done” that puts off a lot of people. Furthermore the learning curve is steep. Python would be my clear suggestion on this front.

Our goal is a happy marriage where academic programmers can get on with developing great new algorithms, and Geneious can provide the interoperability, the user interface and the support.

One of the best ideas I have read in a long time. There is a big difference between devising an algorithm and developing an industrial-strength, easy to use (and, I would like to add, easy to script and extend) application, plus maintaining/supporting it. The reward for the algorithm that makes most sense to me is the publication; the reward for the application should be money, to put it in simple terms. The bridging between the two sides of the equation can (should) be done in the way Alexei proposes.

Filed in: bioinformatics, science

by: tiago

5 Comments

PDB, accessing data, APIs

I was reading Depth-First’s article on The Best API May Be No API At All: PubChem and PDB and decided to relay here my experience in helping a colleague process PDB data.

To begin with, that person wanted to bulk analyze many (thousands of) PDB files; furthermore, she only knew Java, and very little of it.

My suggestion was:

  1. Download all PDB files from PDB using ftp. All being the keyword here
  2. Use Python
  3. Parse the files yourself (i.e., don’t use Biopython’s Bio.PDB)

This goes in line with the idea of the “best API being no API at all” (I am not suggesting this generally, but in this case it made sense).

I suppose some justification for such counterintuitive suggestions might be in order…

For point 1: The person really wanted to analyze a lot of files in bulk, so it made sense just to download them all. As far as I remember we are talking of less than 10GB. I wonder whether this might make sense even when we only want to use a few hundred or a few thousand PDBs: a 10GB download is not that much nowadays; it doesn’t take that much space on disk or that much bandwidth. Regarding being friendly to RCSB, I ask what is worse for them: one big download, or many queries using CPU, databases, etc.? As for users, they can now query locally, and if you look at the PDB format, a few pipes of greps can go a long way and give a lot of flexibility.

For point 2: I would like to stress that the person knew very little Java. I contend that learning Python (with a smoother learning curve than Java) takes less time and is less frustrating (at least for users who are concerned only with results and not with the “joy of programming”) than learning the rest of Java plus the required system and Bio libraries (remember, Java libraries are much tougher and more over-engineered than Python’s).

For point 3: The PDB file format is reasonably easy. Between learning a new API (which is not free and requires understanding the API developer’s mind) and processing the files manually, I suggested processing the files manually. This had the added benefit of making the person learn simple and very useful file processing. Please note that I am not suggesting reinventing the wheel (in fact I tend to be strongly opposed to that), but with file processing this easy it seemed to make sense. I would like to say, in my defense ;) , that I suggested using the wonderful matplotlib for chart drawing, and it never crossed my mind to suggest implementing a chart library from scratch.
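As a rough idea of what “parsing the files yourself” can look like, here is a minimal sketch that reads ATOM/HETATM records by their fixed column positions. It is not my colleague’s code: the function names are made up for this example, only a few fields are extracted, and the input lines are toy records built from a template rather than taken from a real structure.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed columns of the PDB format."""
    return {
        "record": line[0:6].strip(),
        "atom_name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }


def read_atoms(lines):
    """Keep only coordinate records, ignoring every other record type."""
    return [parse_atom_line(line) for line in lines
            if line.startswith(("ATOM  ", "HETATM"))]


# Template used only to build toy input in the fixed-column layout.
ATOM_TEMPLATE = ("%-6s%5d %-4s %3s %1s%4d" + " " * 4 +
                 "%8.3f%8.3f%8.3f%6.2f%6.2f" + " " * 10 + "%2s")

toy_lines = [
    "HEADER    TOY EXAMPLE, NOT A REAL STRUCTURE",
    ATOM_TEMPLATE % ("ATOM", 1, " N  ", "MET", "A", 1, 0.0, 0.0, 0.0, 1.0, 20.0, "N"),
    ATOM_TEMPLATE % ("ATOM", 2, " CA ", "MET", "A", 1, 3.0, 0.0, 0.0, 1.0, 20.0, "C"),
    ATOM_TEMPLATE % ("HETATM", 3, "FE  ", "HEM", "A", 200, 3.0, 4.0, 0.0, 1.0, 20.0, "FE"),
]

atoms = read_atoms(toy_lines)
hetero = [atom for atom in atoms if atom["record"] == "HETATM"]
print(len(atoms), hetero[0]["element"])
```

With a real file, toy_lines would simply be replaced by the open file object, since read_atoms only needs an iterable of lines.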

So, sometimes, not using an existing API might be an approach worth considering.

PS - I still stand by my suggestions. Currently the person seems to have lots of questions about the chemistry of the problem. The programming problems are very rare. And I think that is the main point, computing and programming should not be the fundamental issue.

Filed in: Java, Python, bioinformatics, chemistry

by: tiago

No Comments

Easy to use bioinformatics interfaces (2/2): MODELER4SIMCOAL2

In yet another shameless promotion exercise I would like to present an easy to use interface in the area of coalescent simulation:

MODELER4SIMCOAL2

modeler4simcoal2 (m4s2) is a modeler for coalescent processes. It allows the modeling of both demographies and chromosomes (i.e., markers with linkage relationships in multiple chromosome blocks).

m4s2 is a Java Web Start application (requiring Java 1.4, available for Windows, Mac and Linux among others). It requires no installation and can be run directly from the web.

The purpose of m4s2 is to allow biologists to concentrate more on biology and the underlying models used in the analysis (and less on having to learn new computer simulation tools). We expect that m4s2 will lower the barrier to coalescent simulator use.

m4s2 was published in Bioinformatics.

Filed in: Java, Jython, Python, bioinformatics

by: tiago

No Comments