Archive for the ‘Tips’ Category

Nice reads, mostly older:

  • [jvm] YouDebug – A debugger with a scriptable interface (not a GUI), allowing for automated debugging. The main justification is, for example, customer sites where you cannot fire up a graphical debugger. But an automated debugging framework can also be the groundwork for some cool debugging tools. Furthermore, it is scriptable in Groovy!
  • [general] Are we there yet? – “Rich Hickey advocated for the reexamination of basic principles like state, identity, value, time, types, genericity, complexity, as they are used by OOP today, to be able to create the new constructs and languages to deal with the massive parallelism and concurrency of the future”. This is a one-hour video (with synchronized slides). While I disagree with the basic premise that concurrency is fundamental in the future, this presentation is one of the most valuable videos about the design of programming languages.
  • [lisp] A Lisp User-group meetings calendar (Google based)
  • [clojure] Be mindful of Clojure’s binding – Pitfalls of lazy sequences and thread bindings. The lazy sequence stuff seems more an example of why global variables are a bad thing, as the example depends on the evaluation of an external symbol, but clearly one has to be mindful of lazy versus non-lazy evaluation.

BioJava has a parser for the Nexus file format. While Nexus supports quite a lot of information (like DNA sequences), the most complex part to process is the phylogenetic tree descriptions, which are based on the Newick format. Below you can find a small tutorial on how to process these.

BioJava’s parser relies on JGraphT to create a representation of the “tree”. The tree is really more of an acyclic graph than a tree, though some trees are rooted (and are therefore trees in the proper sense). Manipulating the JGraphT weighted graph is the complicated part, not the BioJava interface. Note that JGraphT objects can easily be rendered using the JGraph library (yes, it is confusing: there is one library with graph algorithms called JGraphT and another, for visualisation, called JGraph).

In this small tutorial, we will only try to write a textual representation of a tree.

Imagine this simple nexus file:

#NEXUS

Begin TREES;
	tree test1 = (1,2);
	tree test4 = (1:0.1,(2:0.2,3:0.3):0.4);
End;

We just want to draw this:

coder@move-on:~/development/biobug/test$ java Test test1.nex test1
Will process file test1.nex tree test1
p0
  1: 1.0
  2: 1.0
coder@move-on:~/development/biobug/test$ java Test test1.nex test4
Will process file test1.nex tree test4
p0
  1: 0.1
  p1: 0.4
    2: 0.2
    3: 0.3

So, tree test1 is composed of the nodes (leaves) 1 and 2 and an inner node, which was named p0. It gives no distances, so every edge is printed with a weight of 1.0. Tree test4 has distances.
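To make the naming in this output concrete, here is a minimal sketch that turns a Newick string into the same kind of indented listing using only the JDK. It is not how BioJava does it: the class NewickSketch, the rule of naming inner nodes p0, p1, … in the order their opening parentheses appear, and the default branch length of 1.0 are all assumptions chosen to reproduce the output above. It only handles the tiny subset of Newick used in the example file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava: parses a tiny subset of Newick. */
class NewickSketch {
    /** A node with a label, the length of the branch to its parent, and children. */
    static final class Node {
        String label;
        double length = 1.0;              // assumed default when no ":length" is given
        final List<Node> children = new ArrayList<>();
    }

    private final String text;
    private int pos = 0;
    private int inner = 0;                // counter for generated inner-node names

    private NewickSketch(String text) {
        this.text = text;
    }

    /** Parses e.g. "(1:0.1,(2:0.2,3:0.3):0.4);" and returns the top node. */
    static Node parse(String newick) {
        return new NewickSketch(newick.trim()).subtree();
    }

    private Node subtree() {
        Node node = new Node();
        if (text.charAt(pos) == '(') {
            node.label = "p" + inner++;   // assumption: name inner nodes in order of '('
            pos++;                        // skip '('
            node.children.add(subtree());
            while (text.charAt(pos) == ',') {
                pos++;                    // skip ','
                node.children.add(subtree());
            }
            pos++;                        // skip ')'
        } else {
            node.label = token();         // a leaf name
        }
        if (pos < text.length() && text.charAt(pos) == ':') {
            pos++;                        // skip ':'
            node.length = Double.parseDouble(token());
        }
        return node;
    }

    /** Reads characters up to the next structural character. */
    private String token() {
        int start = pos;
        while (pos < text.length() && "(),:;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    /** Renders the tree as an indented listing, one node per line. */
    static String render(Node node, boolean isTop, String indent) {
        StringBuilder out = new StringBuilder(indent).append(node.label);
        if (!isTop) {
            out.append(": ").append(node.length);
        }
        out.append('\n');
        for (Node child : node.children) {
            out.append(render(child, false, indent + "  "));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(parse("(1:0.1,(2:0.2,3:0.3):0.4);"), true, ""));
    }
}
```

The sketch does no error handling, so malformed input will simply fail with an exception; the real parser is of course far more complete.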

By the way, we will also want to know which trees are in the file.

Let’s start!

So, we start by loading and parsing the file:

import java.io.File;
import org.biojavax.bio.phylo.io.nexus.*;
 
[...]
        //file is a String with the name of the file to be processed
        NexusFileBuilder builder = new NexusFileBuilder();
        NexusFileFormat.parseFile(builder, new File(file));
        NexusFile nexus = builder.getNexusFile();

Nexus files have several blocks (Taxa, Data, Tree, Set). We are interested in getting the Tree block, so let’s write a function for that:

    TreesBlock getTreeNode(NexusFile nexus) {
        Iterator it = nexus.blockIterator();
        NexusBlock block;
        while (it.hasNext()) {
            block = (NexusBlock) it.next();
            if (block.getBlockName().equals("TREES")) {
                return (TreesBlock) block;
            }
        }
        return null; // no TREES block in this file
    }

We get the nexus block iterator and go through it until we find a block whose name is TREES, and return that block.

Now that we have the TREES block, let’s print the names of all trees:

    void printTrees(NexusFile nexus) {
        TreesBlock node = getTreeNode(nexus);
        Map trees = node.getTrees();
        Set keys = trees.keySet();
        System.out.println("Trees:");
        for (Object obj : keys) {
            System.out.println(obj);
        }
    }

We get a map where the key is the name of the tree and the value is the tree in what is essentially a String-based representation (with some annotated information), not a graph. Now, given a certain name, let’s get the graph:

import org.biojava.bio.seq.io.ParseException;
import org.jgrapht.*;
import org.jgrapht.graph.*;
 
[...]
 
    WeightedGraph<String, DefaultWeightedEdge> getTree(NexusFile nexus, String name)
    throws ParseException {
        String topNode;
        TreesBlock node = getTreeNode(nexus);
        WeightedGraph<String, DefaultWeightedEdge> graph = node.getTreeAsWeightedJGraphT(name);
        topNode = node.getTopNode();
        System.out.println("The top node is: " + topNode);
        return graph;
    }

Note that getTreeAsWeightedJGraphT will do some parsing, so a ParseException can be raised. Note also that the name of the top node can be retrieved (in the case of tree test1, it will be p0). Some considerations: you can change the rules used to create internal nodes, and if there are name clashes, inner nodes will be renamed (not leaves!).

Regarding the top node: we call it the top node and not the root node. While from a data structure perspective the tree has a root, from a phylogenetic perspective the tree might be rooted or not (if not, being the root has no meaning, and the tree is really just a simple weighted acyclic graph). How do you know if the tree is rooted? Remember the function to get all trees (getTrees)? The values of that map have a method called getRootType, so that is what you need to call to find out whether a tree is rooted. Not the best design… but at least it works.

Ok, now we just need to print a tree…

    static void dump(WeightedGraph<String, DefaultWeightedEdge> graph,
            String parent, String node, String depth) {
        Set<String> verts = graph.vertexSet();
        String vertex = "";
        // Find the graph's own vertex object whose content equals the requested name
        for (String candidate : verts) {
            if (candidate.equals(node)) {
                vertex = candidate;
                break;
            }
        }
        System.out.print(depth + vertex);
        if (parent != null) {
            // Every node except the top one is printed with the weight of the edge from its parent
            System.out.print(": " + graph.getEdgeWeight(graph.getEdge(parent, vertex)));
        }
        System.out.println();
        // Recurse only into the edges that start at the current vertex
        for (DefaultWeightedEdge e : graph.edgesOf(vertex)) {
            if (graph.getEdgeSource(e).equals(node)) {
                dump(graph, vertex, graph.getEdgeTarget(e), "  " + depth);
            }
        }
    }

Ok, this is the complicated part. Note the following:

  • The complexity has to do with processing graphs
  • dump is a recursive function
  • Node is a synonym for vertex
  • Notice the important bit: if you know that there is a node called “bla”, it is not enough to do graph.containsVertex(“bla”). The answer will probably be false. Remember that what we have here is reference equality (i.e. ==), not content equality (.equals). See the reminder below
  • Finally, we go through all edges referencing the current vertex and choose the ones that start at the current one. Again, if the tree is unrooted, the notion of direction does not apply, but it is still good to do a “tree” traversal
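The edge-filtering step in the last bullet can be sketched without JGraphT at all. The version below is only an illustration under stated assumptions: EdgeListDump and its Edge class are hypothetical stand-ins for the weighted graph, and the edge list is typed in by hand to match tree test4. The point is the recursion: print the node, then follow only the edges whose source is the current node.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the weighted graph: a plain list of directed edges. */
class EdgeListDump {
    static final class Edge {
        final String source;
        final String target;
        final double weight;

        Edge(String source, String target, double weight) {
            this.source = source;
            this.target = target;
            this.weight = weight;
        }
    }

    /**
     * Appends one line for node, then recurses into every edge that starts at it.
     * incoming is the edge that led here, or null for the top node.
     */
    static void dump(List<Edge> edges, Edge incoming, String node,
            String indent, StringBuilder out) {
        out.append(indent).append(node);
        if (incoming != null) {
            out.append(": ").append(incoming.weight);
        }
        out.append('\n');
        for (Edge e : edges) {
            if (e.source.equals(node)) {      // only edges leaving this node
                dump(edges, e, e.target, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-written edges corresponding to tree test4
        List<Edge> test4 = Arrays.asList(
                new Edge("p0", "1", 0.1),
                new Edge("p0", "p1", 0.4),
                new Edge("p1", "2", 0.2),
                new Edge("p1", "3", 0.3));
        StringBuilder out = new StringBuilder();
        dump(test4, null, "p0", "", out);
        System.out.print(out);
    }
}
```

Scanning the whole edge list at every node is wasteful, but it keeps the sketch close to the loop over edgesOf in the real code.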

We end here.

Regarding the “equal” issue, remember that:

        System.out.println("a" == new String("a"));
        System.out.println("a".equals(new String("a")));

This prints false and then true, in that order. This is important when traversing the graph. If you know that the reference is the same (and it is when we call getEdgeTarget), then you can use it directly. If you don’t know (for example, you pass a String that you have constructed yourself or got from some other place), then you need to go through the vertex/node list and do a .equals to get the correct vertex.

A small example with all of the above is here, ready to use.
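The “go through the list and do a .equals” step can be isolated into a small helper. This is a sketch on a plain java.util.Set rather than on a JGraphT graph; the class VertexLookup and the method canonical are made-up names for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: maps a name to the instance actually stored in a vertex set. */
class VertexLookup {
    /** Returns the stored instance whose content equals name, or null if there is none. */
    static String canonical(Set<String> vertices, String name) {
        for (String candidate : vertices) {
            if (candidate.equals(name)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> vertices = new LinkedHashSet<>(Arrays.asList("p0", "1", "2"));
        String mine = new String("p0");             // equal content, different object
        String stored = canonical(vertices, mine);
        System.out.println(stored == mine);         // false: not the same reference
        System.out.println(stored.equals(mine));    // true: same content
    }
}
```

Once you hold the stored instance, any later comparison works, whether it is done by reference or by content.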


Having read Tim Bray’s tips for Clojure newbies from a newbie, I decided to write a version of my own.

First, I generally agree with Tim’s observations, so this post is more like a minor extension. In fact, I recommend reading Tim’s version first.

My background, so that you know my point of view: in addition to Java, C and Python experience, I have lots of exposure to Prolog (which is, like Clojure, homoiconic) and some experience with OCaml (a functional language). I have had some practical contact with other “modern languages on the JVM”: Scala and Groovy. No Lisp experience (other than basic elisp).

Reading suggestions: in addition to Tim’s comments, I would add that the Clojure site is really not very useful as a starting point. The reference part assumes that you already know quite a bit and is tough on newbies; the only thing that I tend to use is the API section. Like Tim, I also use Mark Volkmann’s introduction. It is my main documentation, and I would be a bit more positive than Tim about it: for an intro article it is great, and I strongly recommend it as a starting point and as the reading anchor during the first weeks. Next week I plan to order Stuart Halloway’s book, so I cannot comment on that yet.

On a more general note, while learning Clojure I’ve found Paul Graham’s On Lisp (available for free) to be a gem, and I would strongly recommend it. It is not an easy read, and it probably takes months to digest the content, but it is really a great book.

Clojure is a fast-moving target. The documentation of many modules and functions might not be up to date with the current code. I have noticed that sometimes the best “documentation” ends up being the code itself (Clojure is hosted on GitHub and so are many satellite projects; in fact, getting familiar with GitHub is probably another recommendation).

Some of the contrib stuff is a bit too green, and I would recommend inspecting a module before using it. Just because it is accepted into clojure.contrib does not mean that it is production quality, that it has even bare functionality or that the exposed API is stable. As an example, the graph API is minimal and I question whether the directed-graph structure is enough to represent a general directed graph (future changes to it will probably break existing code built on top of it). This is not a criticism: a new product is bound to change fast, and agile development methods entail lots of instability at the beginning. But a caveat should be added about the stability and completeness of parts of clojure-contrib.

Regarding editors and IDEs, I would recommend that you stick with whatever you feel most comfortable with (vi, emacs, NetBeans, Eclipse, …). Note that Clojure, being a Lisp derivative, has a big share of its user base on emacs. I mainly use NetBeans with the wonderful enclojure plug-in. I can only say that enclojure is stable enough for usage, and I recommend it if you are a NetBeans type of person.

I also second Tim’s comments on namespace hell. Uses/imports/requires can be quite confusing. Hell is too harsh a word, but purgatory seems an accurate description ;) . Also, as Tim says, the Clojure mailing list and IRC channels are very, very helpful.

One of the most annoying kinds of bugs comes from typing problems, things like this:

(defn hiddenBug [a b]
  (println a) ; let's do a println for debug purposes
  (println b) ; let's do a println for debug purposes
  (if-not (= a b) (println "they are different!")))

Now let’s call this:

(hiddenBug 'x "x")
x
x
they are different!

Notice that, when you are debugging, a and b will look equal in a println, but they are not (one is a symbol, the other a string)!

The biggest gotcha that I have been hitting is the expectation (Java-based) that stupid things raise exceptions, when sometimes they don’t. Here is an example:

user=> (contains? '(a b) 'a)
false
user=> (contains? 'blab 'a)
false
user=> (contains? (list 'a 'b) 'a)
false
user=> (contains? ['a 'b] 'a)
false
user=> (contains? '(a b) :a)
false
user=> (contains? '(:a b) :a)
false

In all these cases, the Java-expecting gnome inside me was hoping for some kind of throw (as the first argument is not of the type required by contains?). Keep in mind that contains? checks for keys (indices, in the case of a vector), not for values. It is not clear to me if this is a design issue with contains? only, or if it is standard across the whole API. But I notice this in the API reference:

(even? n)
Returns true if n is even, throws an exception if n is not an integer

So it seems that the design is not homogeneous throughout the API, as some functions do throw exceptions. I would probably prefer a throw when the type of the arguments is wrong, but people with lots of Lisp experience might have a different view on the issue. Anyway, I would like to understand if the lack of homogeneity is a feature or a bug.


When writing Groovy scripts I sometimes need to redirect the output of a part of the computation that is going to standard out. A possible solution would be to open a new Writer and change the code to write to it (i.e. replacing all prints with newStream.prints). This, of course, requires changing all prints, which is cumbersome and boring. There is a lazy alternative, using method references:

s = new PrintStream(new FileOutputStream("/tmp/myOut"))
def print = s.&print
print "a"
print = System.out.&print
print "b"

In this case the a is written to /tmp/myOut and the b goes to standard output again. The big gain: all those prints in a script (and printlns) don’t need to be changed! Lazy me is happy.

Caveat: I would be careful about using this strategy a lot, as it is very easy to lose track of what is happening to the output. But it can be quite an expedient way to redirect prints in simple scripts.
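For comparison, plain Java code on the same JVM can get a similar effect with System.setOut. This is a different technique from the method reference trick: it swaps the stream for the whole JVM (all threads), not just for one script’s print. The sketch below is only an illustration; the class RedirectSketch and the method capture are made-up names, and the output goes to an in-memory buffer rather than to a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical helper: temporarily points System.out at an in-memory buffer. */
class RedirectSketch {
    /** Runs task with System.out redirected and returns whatever it printed. */
    static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream redirected = new PrintStream(buffer, true);
        System.setOut(redirected);
        try {
            task.run();
        } finally {
            redirected.flush();
            System.setOut(original);      // always restore, even after an exception
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String captured = capture(() -> System.out.print("a"));
        System.out.print("b");            // back on the real standard output
        System.out.println(" captured=" + captured);
    }
}
```

The same warning applies here, only more so: because the change is global, forgetting to restore the original stream silently swallows all later output.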
