GUI metaprogramming example

Preamble

This is an example of metaprogramming in Jython. I would really like to have a simpler example (in plain Python or Java), but this one is taken directly from what I am doing. The idiom that I am using is, Pythonwise, a bit strange and old (I am using eval instead of __getattribute__); that is because of Jython’s limitations. This can be seen as a more advanced programming technique (if you are just starting to learn programming, you might want to skip it for now, to avoid excessive entropy in your learning process). Although this example is in Jython, the idea applies to many programming languages (Python, Java, Ruby, Prolog, …) but not to C or C++ (or Caml, unfortunately).

The problem at hand

I am building a selection detection workbench (to detect loci under selection). At certain points in time, I need to prevent the user from entering data into a lot of entry fields, like these:

Disabled fields

As you can see, they are all disabled.

How to do this? Option 1: go to every entry field, one by one (there are more than 10, and the set keeps changing), and call its setEnabled(False) method. That means lots of repeated code, and whenever the fields change I would have to add or remove a setEnabled call.

Option 2: write a piece of code that inspects my panel object (the panel is what contains all the fields), checks every attribute that is an entry field and disables it. The point here is writing code that operates on the code itself. With this approach, if a new entry field is added to the panel, the code automatically detects the field and disables it. How to code this?

from java.awt import Component

def disablePanel(panel):
    attrs = dir(panel)  # names of all attributes of the panel object
    for attr in attrs:
        try:
            # Is panel.<attr> a GUI component (entry field, drop-down list, ...)?
            if eval('isinstance(panel.' + attr + ', Component)'):
                # If so, disable it
                eval('panel.' + attr + '.setEnabled(False)')
        except TypeError: #Some attributes are write only
            pass

A small piece…
The call to dir gets all attributes of the panel object.
The two eval lines inside the loop do all the interesting work (eval, isinstance).
First, eval takes a string containing an expression, evaluates it and returns the result, so if you have

i = 1
i = eval('i+5')
print i

This will print 6. eval is very powerful (think about the possibilities of changing code at runtime). It is also quite dangerous, but I will not discuss that here…

isinstance checks to see if a certain object is an instance of a certain class, so

i = 1
print isinstance(i, int) # Will print True
print isinstance(i, str) # Will print False

So, back to our code:
if eval('isinstance(panel.' + attr + ', Component)'):
evaluates whether panel.<attribute name> is an instance of Component. For instance, my panel has an attribute called core (storing the number of cores), which is a drop-down list, so when the code checks isinstance(panel.core, Component), it evaluates to True and executes the next line, which is:

eval('panel.' + attr + '.setEnabled(False)')
This evaluates panel.<attribute name>.setEnabled(False), i.e., it disables the field; in our previous example, it does panel.core.setEnabled(False).

I will not explain the exception code as it is not important here.

So, a few lines now disable new entry fields automatically, without changing this code every time a field is added or removed (other than adding the field itself). Less code to maintain and fewer opportunities for bugs.

I just wanted to illustrate the principle (the language used is not really important), but I need to stress a fundamental point about this particular Python example: because of some Jython particularities I am using an old idiom (Python gurus might be horrified). If you are using Python, I recommend you check getattr (or __getattribute__) to replace eval.
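A minimal sketch of that recommendation: the same disablePanel idea written with the built-in getattr instead of eval. So that it runs on plain CPython, java.awt.Component is replaced by hypothetical stand-in classes (Component, TextField, ComboBox, Panel); which exceptions a write-only attribute raises is an assumption, so both AttributeError and TypeError are caught.

```python
class Component(object):
    """Hypothetical stand-in for java.awt.Component."""

    def __init__(self):
        self.enabled = True

    def setEnabled(self, flag):
        self.enabled = flag


class TextField(Component):
    """Hypothetical entry field."""


class ComboBox(Component):
    """Hypothetical drop-down list."""


class Panel(object):
    """Hypothetical panel with two entry fields and one plain attribute."""

    def __init__(self):
        self.core = ComboBox()               # e.g. the drop-down with the number of cores
        self.sampleSize = TextField()
        self.title = "Selection workbench"   # not a Component, so it is left alone


def disablePanel(panel):
    """Disable every attribute of panel that is a Component, using getattr instead of eval."""
    for attr in dir(panel):
        try:
            value = getattr(panel, attr)
        except (AttributeError, TypeError):  # assumption: what unreadable attributes raise
            continue
        if isinstance(value, Component):
            value.setEnabled(False)


panel = Panel()
disablePanel(panel)
```

After the call, panel.core.enabled and panel.sampleSize.enabled are both False, while panel.title is untouched. In Jython you would drop the stand-in classes and use from java.awt import Component instead.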

Share and Enjoy: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • connotea
  • DZone
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati

Filed in: Java, Jython, Python, bioinformatics, metaprogramming

by: tiago

No Comments

Python, Ruby, Java and Threads

Greg Tyrelle made a very important comment regarding exploiting multiple cores and Python (which will surely be included in my next part on bioinformatics and multi-core computing).


First, my understanding of python threads is that they are not separate system level processes, but some kind of fake process that are python specific ? Trouble is that I see two separate process when I launch two Blast runs via threading ?

The other aspect of threading that I’m still not entirely clear about is how the global interpreter lock (GIL) fits into the picture. I get resource locking to prevent race conditions, but is the GIL also invoked each time an action that manipulates memory takes place in a thread ? I’ve heard this property of python makes it unsuitable for multi-core programming ?

I will trade formal correctness for clarity of explanation (namely, I won’t dwell much on the difference between threads and processes, as it would make this too techy and confusing).

Python uses real (i.e. native) threads. Ruby uses so-called green threads, which are “fake” (simulated). Ruby 2.0 will use native threads.

So, in theory, Python is OK on multi-core architectures. In practice there is a problem, a serious one, identified by Greg: the Global Interpreter Lock (GIL). The GIL makes it impossible for more than one thread to execute Python code at a time: even if you have many threads and many cores, only one thread can be running Python code at any given moment. This is not as serious as it looks; there are 4 ways to live with it:

  • If you use a thread to start an external process, that process is not under the control of the GIL (it is a separate process), so it can run concurrently (think BLASTing something): it runs outside Python and can therefore use a different core. So I think this covers one fundamental use case in bioinformatics: using external, computationally intensive programs. In fact, you can start as many instances of external programs as the number of cores you have (or even more, if you think it will be advantageous). Note that the thread that calls the external application will block (well… it depends, but for simplicity let’s assume it does), but your other Python threads can continue concurrently with the application.
  • This is subtle, but important: If you use CPython (the standard implementation), and you do your computationally intensive stuff in C (which makes sense - and is a common strategy - as Python is quite slow) then the C code, as long as it is not interacting with Python objects, can release the GIL and therefore make use of multiple cores. The Python code uses only one core, the C part might be using all the remaining available ones. This approach is not valid for Ruby because of the green threads issue (I am a simple Ruby newbie, so take my words with a grain of salt).
  • Now… this GIL problem (or the green threads issue in Ruby) disappears if you use Jython or JRuby, as they use the JVM native concurrency mechanisms which have no notion of acquiring an exclusive lock for execution. By the way you can also use JVM based interpreters to call native (non-JVM) applications (think BLAST again, from inside Java). To put this point in another way: the GIL/green threads problem is not a language limitation, it is a limitation of the standard (C based) implementations that other implementations might not share (and the Java implementations, in fact, DO NOT).
  • If you think about grids (and not multiple cores) then the problem disappears as we are then talking of different processes (even more, running on different hardware).
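A minimal sketch of the first point, assuming Python 3.7+ (for the capture_output and text arguments of subprocess.run). A short Python child process is a hypothetical stand-in for a BLAST run: each thread blocks on its own child, while the children are separate OS processes, outside this interpreter’s GIL, that the operating system can schedule on different cores.

```python
import subprocess
import sys
import threading


def run_external(cmd, results, index):
    """Run an external program and store its output; only this thread blocks."""
    # subprocess.run waits for the child to finish. The child is a separate
    # OS process, so its CPU work is not serialized by our GIL.
    completed = subprocess.run(cmd, capture_output=True, text=True)
    results[index] = completed.stdout.strip()


# Hypothetical stand-in for a BLAST run: a small CPU-bound child process
job = [sys.executable, "-c", "print(sum(i * i for i in range(10**6)))"]

results = [None] * 4
threads = [threading.Thread(target=run_external, args=(job, results, i))
           for i in range(4)]
for t in threads:
    t.start()   # all four children are now running side by side
for t in threads:
    t.join()    # wait for every external run to finish
```

With a real tool you would replace job with the actual command line; the threading part stays the same.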

I am afraid of being too techy with this post (I am probably labeled as 100% computer nerd by now ;) ), but I think Greg’s point is fundamental and requires some discussion.

In my defense ;) I would like to say that I am only writing so much about programming because I am in a sort of unclear professional phase; as soon as things get back on track I want to focus more on the biological side of things… Until then I will be writing about the subject I know best, which is, for better or worse, informatics…

Comments, especially constructive criticism, are, as always, welcome…

Filed in: Java, Jython, Python, Ruby, bioinformatics

by: tiago

3 Comments

PDB, accessing data, APIs

I was reading Depth-First’s article The Best API May Be No API At All: PubChem and PDB and decided to share my experience helping a colleague process PDB data.

To begin with, she wanted to analyze many (thousands of) PDB files in bulk; furthermore, she only knew Java, and very little of it.

My suggestion was:

  1. Download all PDB files from the PDB using FTP (all being the keyword here)
  2. Use Python
  3. Parse the files yourself (i.e., don’t use Biopython’s Bio.PDB)

This goes in line with the idea of the “best API being no API at all” (I am not suggesting this generally, but in this case it made sense).

I suppose some justification for these counterintuitive suggestions is in order…

For point 1: The person really wanted to analyze a lot of files in bulk, so it made sense just to download them all. As far as I remember we are talking about less than 10GB. I would argue that even when we only want to use hundreds or a few thousand PDB entries, this might make sense: a 10GB download is not that much nowadays, and it doesn’t take that much disk space or bandwidth. As for being friendly to RCSB, I ask what is worse for them: one big download or many queries using CPU, databases, etc.? Users, in turn, can now query locally, and if you look at the PDB format, a few piped greps can go a long way and give a lot of flexibility.
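A rough sketch of such a local “grep” in Python, assuming a local mirror whose files are named *.pdb, *.ent or *.ent.gz (gzip-compressed). The function name grep_records is hypothetical. It yields every line that starts with a given record type (for example EXPDTA, which holds the experimental method) and contains a given substring.

```python
import gzip
import os


def grep_records(directory, record, needle=""):
    """Yield (filename, line) for every line that starts with `record` and contains `needle`."""
    for name in sorted(os.listdir(directory)):
        if not name.endswith((".pdb", ".ent", ".ent.gz")):
            continue  # skip files that do not look like PDB entries
        path = os.path.join(directory, name)
        # Compressed files are opened transparently, in text mode
        opener = gzip.open if name.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for line in handle:
                if line.startswith(record) and needle in line:
                    yield name, line.rstrip("\n")
```

For example, grep_records("pdb_mirror", "EXPDTA", "NMR") would list the NMR entries, assuming the mirror lives in a (hypothetical) directory called pdb_mirror.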

For point 2: I would like to stress that the person knew very little Java. I contend that learning Python (which has a smoother learning curve than Java) takes less time and is less frustrating (at least for users who are concerned only with results and not with the “joy of programming”) than learning the rest of Java plus the required system and Bio libraries (remember, Java libraries are much tougher and more over-engineered than Python’s).

For point 3: The PDB file format is reasonably easy. Between learning a new API (which does not come for free and requires understanding the API developers’ minds) and processing the files manually, I suggested processing the files manually. This had the added benefit of teaching her simple and very useful file processing. Please note that I am not suggesting reinventing the wheel (in fact I tend to be strongly opposed to that). But with a format this easy to process, it seemed to make sense. I would like to say, in my defense ;) , that I suggested using the wonderful matplotlib for chart drawing, and it never crossed my mind to suggest implementing a chart library from scratch.
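To give an idea of how little is needed, here is a minimal sketch of parsing ATOM and HETATM records by hand, based on the fixed-column layout of the PDB format (the comments give 1-based columns, the slices are 0-based). The function names parse_atom and atoms are hypothetical, and only a subset of the fields is extracted.

```python
def parse_atom(line):
    """Parse one ATOM/HETATM record using fixed PDB columns."""
    return {
        "record": line[0:6].strip(),     # columns 1-6
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "resName": line[17:20].strip(),  # columns 18-20
        "chainID": line[21],             # column 22
        "resSeq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
    }


def atoms(lines):
    """Yield parsed ATOM/HETATM records, skipping every other record type."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield parse_atom(line)
```

Since atoms accepts any iterable of lines, it can be fed an open file directly, e.g. list(atoms(open("1abc.pdb"))) for a hypothetical local file.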

So, sometimes, not using an existing API might be an approach worth considering.

PS - I still stand by my suggestions. Currently the person seems to have lots of questions about the chemistry of the problem. The programming problems are very rare. And I think that is the main point, computing and programming should not be the fundamental issue.

Filed in: Java, Python, bioinformatics, chemistry

by: tiago

No Comments

Easy to use bioinformatics interfaces (2/2): MODELER4SIMCOAL2

In yet another shameless promotion exercise I would like to present an easy-to-use interface in the area of coalescent simulation:

MODELER4SIMCOAL2

modeler4simcoal2 (m4s2) is a modeler for coalescent processes. It allows the modeling of both demographies and chromosomes (i.e., markers with linkage relationships in multiple chromosome blocks).

m4s2 is a Java Web Start application (requiring Java 1.4, available for Windows, Mac and Linux among others). It requires no installation and can be run directly from the web.

The purpose of m4s2 is to allow biologists to concentrate more on biology and the underlying models used in analysis (and less on having to learn a new computer simulation tool). We expect that m4s2 will lower the barrier to coalescent simulator use.

m4s2 was published in Bioinformatics.

Filed in: Java, Jython, Python, bioinformatics

by: tiago

No Comments

Jython tip: instanceof

Imagine that you need this Java idiom in Jython:

  if (anObject instanceof aClass) {

I.e., to check whether a certain object is an instance of a certain class (note: this also holds if it is an instance of a subclass)

This is quite easy to do in Jython:

 if isinstance(anObject, aClass):
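A minimal sketch in plain Python, with hypothetical classes Sequence and DNASequence, showing that isinstance, like instanceof, also accepts instances of subclasses. Unlike instanceof, it also accepts a tuple of classes.

```python
class Sequence(object):
    """Hypothetical base class."""


class DNASequence(Sequence):
    """Hypothetical subclass of Sequence."""


seq = DNASequence()

print(isinstance(seq, Sequence))         # True: subclass instances count, as with instanceof
print(isinstance(seq, DNASequence))      # True
print(isinstance("ACGT", Sequence))      # False: a str is not a Sequence
print(isinstance(seq, (str, Sequence)))  # True: a tuple means "an instance of any of these"
```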

Filed in: Java, Jython, Python

by: tiago

No Comments

Programming languages and platforms: Existential doubts

Throughout my career I have been torn between what I like (Prolog and Caml) and what makes me marketable (Java, VB, Perl, C, Python, C#). Of course, the world is not black and white, so among the marketable languages some are more likeable to me than others.

Java is acceptable, but it is too verbose and the libraries are grossly over-engineered (with no apparent advantage); of course, a DSL framework is nowhere to be seen. Python is also acceptable, but the lack of DSLs (and Guido explicitly stating that he is not going in that direction in Python 3000) makes it lose a lot of its sex appeal; also, less importantly, I have some bias towards strong typing.

Enter Ruby: DSLish, very pragmatic, a vibrant community, a fantastic JVM implementation, and one can make $$$ with it… I am giving it a try.

Regarding platforms, a less important issue for me, I am quite happy working on top of the JVM: multi platform, stable, industry support, really open…

Filed in: Caml, Java, Python, Ruby, declarative programming

by: tiago

1 Comment

From Java to Python and Prolog

A few years ago I was working with Prolog, which I liked. Prolog will not do much good for one’s employability, so I decided to go the Java way. I quit my Prolog job and found some Java work.

In the years that followed I became more or less a J2EE specialist. While I consider Java an acceptable language, I noticed that much of my J2EE knowledge was not a productivity advantage but merely what is needed to “tame the beast”. J2EE is overcomplex and overengineered. I have spent most of my time understanding gazillions of TLAs instead of solving real problems. Frustration is a good word to describe what I feel.

So I have decided to go back to Prolog. I believe in its elegance. I believe strongly in domain languages, in which Prolog excels.

I have also decided to go the way of Python: it’s very elegant for an imperative/OO language, has a big community of smart people who are as interested in technology as in having a job, and it “fits my head” (like Prolog and unlike J2EE, where the complexity of the system is overwhelming).

I don’t know about the future, but for now I will be much more productive and happier, surrounded mainly by people who like what they do and excel at what they do.

Employability in the future? Maybe Python will help (no chance with Prolog); if not, at least I will have a good time.

Time to get back a little of that adolescent mentality of not giving in to the system, and to be a bit more spontaneous.

Filed in: Java, Python

by: tiago

No Comments

WebSphere: a rant

I have, in the last few years, been very supportive of IBM, in several ways. I think their approach to Linux and open source in general is quite good from a pragmatic point of view. Their new generation of software is much, much better than the old stuff (I had the displeasure of using their old RS/6000 with AIX). I have suggested and helped to implement migrations from Oracle to DB/2. I also see their service-oriented approach as an intelligent move.

But, from a developer’s perspective, it’s very difficult to like WebSphere. It’s heavy and slow; sometimes I even question its reliability.

I also tend to like very tight develop/test/develop/test/… cycles. I like to make a small change and call ant (or something similar, depending on the environment I am supposed to work with), which runs a round of automated unit tests for me. As I am doing this all the time, I need fast deployment times, which is the opposite of what I get from WebSphere.

Does anyone have suggestions on improving the deployment time? As I tend to use lots of application servers, I end up not specializing in any, so I might be missing some obvious optimization step…

WebLogic seems, overall, more development friendly. But JBoss is the clear winner on development friendliness (failing on the documentation part).

I am aware that, in production, the evaluation criteria are very different, but my experience of continuously administering an application server is very limited and I don’t have the level of maturity to comment on that.

But, going back to IBM, at least Eclipse saves the day.

Filed in: Java

by: tiago

No Comments

App Servers performance for development

I am testing quite a few Application Servers and Tools in regards to certain parameters (I will talk about some of these in further posts).
From startup time to deployment time to memory consumption, most of the application servers can sometimes be called a developer’s nightmare. In some cases it takes minutes just to deploy a very simple EJB application (the machine I am doing this on has reasonable specs, e.g. 768 MB of memory). I know that in a production environment most of these problems are of minimal importance, but during development they clearly cause a productivity problem.

Regarding deployment times and lightness, of the application servers tested, JBoss is the clear winner, by far, making it the only application server suited for development (when you have tight development/unit test cycles, which is my case).

Some application servers, when used with the IDE from the same vendor, tend to do a bit (but not much) better (mainly because of programmatic integration of deployment). But is this a good thing? I suppose it’s better to have a market where different components (like app servers and IDEs) can be plugged in at will with the same level of integration, thus increasing competition and choice at the component level: using the best app server with the best IDE, even if that means different vendors.

Filed in: Java

by: tiago

No Comments

Java is more Efficient than C

Before discussion I am assuming the following:

  1. Server-side development
  2. Efficiency meaning less time to develop
  3. Efficiency meaning less maintenance hurdles
  4. Efficiency meaning faster execution!

Points 2 and 3 are pretty obvious: thanks to the language and API advantages, Java code is smaller and less dirty than C code. What is less obvious is that in 90% of cases these are the most important types of efficiency. That is, from an economic point of view it’s much better to be conservative with developer and management time (human resources) than with, say, CPU time or memory.

But I would also like to argue that in most cases Java code (server-side) is faster. A short story:
A macho programmer that I know decided to implement a highly threaded proxy server in C. Threaded programming in C/Unix (pthreads in this case) is quite awful. Also, other things needed for the project at hand are available out of the box in Java, but not in C. It so happens that the proxy has a performance bug (still to be discovered) that puts the CPU at 100% utilization.
Yes, I know that with more careful programming (note: this guy is not bad at all) the C version could be much better. My point is that the human ergonomics of a language should also be considered when judging its performance: as C is a dirty language, there is a greater chance that your code will be less efficient, because it’s easier to make (performance) mistakes.

There is also another reason why Java (especially on the server side) can be faster: clever engineers designed APIs and application servers for you to use and architected them with performance in mind.
You might have a deep understanding of your application domain, you might even understand server architectures to a certain degree, but do you really believe that you can do that better than dozens of engineers that concentrate solely on making clever application servers and APIs? I understand a little bit of infrastructure, but I prefer to outsource it, and concentrate on my core business.

By the way, a link on Macho programming

Filed in: Java

by: tiago

No Comments