Archives for July 2007

Bioinformatics, multi-core CPUs and grid computing: developer perspective (3/4)

First, my apologies for this long overdue part on developing life sciences applications considering multi-core CPUs and grid environments.
This text is somewhat long, and most probably will be revised in the future.

Developing for multi-core CPUs and grids borrows a lot (everything, really) from concurrent, parallel and distributed programming. I do not intend, at all, to cover all those issues. That would be enough to fill several books and articles. My objective is much smaller and simpler: to discuss some basic intuitions that can be interesting for people who develop “old style” software.

IMHO, the fundamental intuition is: Stop thinking synchronously, start thinking asynchronously… “What is this guy saying?”, you ask…

From synchronous to asynchronous: An example

Let’s imagine that you have a computationally intensive application that simply does not get any performance improvement on your shiny new quad-core processor.

That application has a function, let’s call it suck_cpu, that takes 5 minutes to run, and you need to call it 10 times, each run being independent from previous runs, like this (example pseudo code in Python):

# We have a list input_parameters to be fed to suck_cpu
results = []
for i in range(10):
  results.append(suck_cpu(input_parameters[i])) # Takes 5 minutes to run
# The for loop took 50 minutes to run
for i in range(10):
  print("results from execution", i, "were", results[i])

So, you run suck_cpu and you wait (block) until you have the result. You do this 10 times. For 50 minutes.

An alternative would be to ask the system to start a “task” running suck_cpu, with your main program not waiting (not blocking) until the function ends. You would start 10 instances of suck_cpu, not wait any time (well, wait very little) and get back to your program. What could you do after starting the tasks (and not having the results back for now)? Well, it depends, but you could cater for your user (i.e., not block the user interface), do other tasks that are independent of the results of suck_cpu or just wait (while informing the user of progress) for the tasks to end.

Even in the case of just waiting for results you would profit a lot: your task scheduler could run n tasks in parallel (n being the number of your CPU cores), effectively getting all the computing power of your CPU (the main point of this series of articles).

So, what would your code look like?

# We have a list input_parameters to be fed to suck_cpu
results = []
tasks = []
for i in range(10):
  # Pass the function and its argument; the scheduler makes the call later
  tasks.append(schedule_tasks(suck_cpu, input_parameters[i])) # returns "immediately"
# The for loop ran in "no time"

So you would ask the task scheduler to run your suck_cpu tasks. At the end of the for loop your code was in control and the task scheduler was busy running your tasks. You would have the task IDs back (not the results). How was the task scheduler operating? Well, that would depend on its internal workings, but your tasks would probably be in one of 3 possible states:

  1. QUEUED - The task is ready to run, but has not started yet
  2. RUNNING - The task is running
  3. DONE - The task is done and you can collect the results

You could query the task scheduler for the state of your tasks, something like this:

for task in tasks:
  print("Task", task, "is in state", get_task_state(task))

The output, at the very beginning, would be something like this:

Task T1 is in state RUNNING
Task T2 is in state RUNNING
Task T3 is in state RUNNING
Task T4 is in state RUNNING
Task T5 is in state QUEUED
Task T6 is in state QUEUED
Task T7 is in state QUEUED
Task T8 is in state QUEUED
Task T9 is in state QUEUED
Task T10 is in state QUEUED

Somewhere in the middle of the execution you would probably find something like this:

Task T1 is in state RUNNING
Task T2 is in state DONE
Task T3 is in state DONE
Task T4 is in state RUNNING
Task T5 is in state RUNNING
Task T6 is in state QUEUED
Task T7 is in state RUNNING
Task T8 is in state QUEUED
Task T9 is in state QUEUED
Task T10 is in state QUEUED

Notice one very important point: task T1 was not the first to finish, although it was the first to be scheduled. Also, T7 started before T6. The fundamental issue here is that execution of concurrent stuff is non-deterministic. That is, things don’t normally happen in a clear, predetermined order; there is some degree of randomness even if your code is not stochastic at all. This will play a fundamental role in how things will have to be thought out and designed… We will discuss this later.

Notice how a (not very dumb) task scheduler will be using all your CPU cores (i.e., 4 tasks are running simultaneously).

Of course, at the end, you expect all tasks to be in state DONE. Now you can collect all the results (by the way, you can collect and process results from individual tasks as soon as each one is ready).
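
The walk-through above can be tried with Python's standard concurrent.futures module. This is only a sketch under some assumptions: suck_cpu here is a stand-in that sleeps for a moment, get_task_state is my own helper that maps a Future onto the three states used in the text, and a thread pool is used to keep the example short. For truly CPU-bound work in CPython you would probably swap in ProcessPoolExecutor, which offers the same submit interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def suck_cpu(parameter):
    """Stand-in for the expensive function: it sleeps briefly and doubles its input."""
    time.sleep(0.05)
    return parameter * 2


def get_task_state(future):
    """Map a Future onto the QUEUED / RUNNING / DONE states used in the text."""
    if future.done():
        return "DONE"
    if future.running():
        return "RUNNING"
    return "QUEUED"


def run_all(input_parameters, workers=4):
    """Schedule every call, report task states once, then gather the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as scheduler:
        # submit() returns immediately with a Future (the "task ID")
        tasks = {}
        for i, parameter in enumerate(input_parameters):
            tasks[scheduler.submit(suck_cpu, parameter)] = i
        for future, i in tasks.items():
            print("Task T%d is in state %s" % (i + 1, get_task_state(future)))
        # Handle each result as soon as its task finishes (completion order)
        for future in as_completed(tasks):
            results[tasks[future]] = future.result()
    # Return the results in the original input order
    return [results[i] for i in range(len(input_parameters))]


all_results = run_all(list(range(10)))
print(all_results)
```

The states printed will vary from run to run, which is exactly the non-determinism discussed above.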

You might be thinking: “woo, that is a lot of overhead… now I have to keep track of tasks, check their state and so on…”. Yes, there is some overhead with all this, and furthermore you have to change the way you are used to thinking. But I contend that this way is more natural and closer to the way you think: tasks that are independent were only run in sequence because the existing programming paradigm forced you to conceive them in sequence. In your head you know that they are independent, and you were just forcing your way of thinking to match the computing paradigm.

Well, maybe I am just saying this because of what lies ahead… :( . I am now going to discuss new types of bugs you have to be aware of and non-determinism.

The dark side: resource sharing and debugging

Unfortunately this type of programming paradigm introduces some new kinds of problems that you have to be aware of. I would like to separate them into 2 types of problems:

  1. Resource sharing: Imagine that your tasks dump their output to a printer. You don’t want them to output to the same printer at the same time (like having parts of the same sheet with results for different tasks, interleaved). Most probably your tasks, while not sharing a printer, will share data structures, and access to them will have to be cleverly controlled.
  2. Non-determinism: Tasks run independent of each other, task schedulers schedule tasks in unexpected ways. This means that your code will run differently every time you run it, even if it is completely deterministic by itself.

The dark side: An example

I will use here the standard, boring example that is always used in concurrent programming:

Imagine that you have a bank account, your bank account has 50€. You make a deposit of 30€, at the same time somebody cashes in a check from your account (i.e., makes a withdrawal) of 20€. So, the end balance should be 60€.

Now, imagine that the code at your bank that does this is:

def deposit(account_no, amount):
    before_deposit_balance = get_balance(account_no)
    set_balance(account_no, before_deposit_balance + amount)
 
 
def withdrawal(account_no, amount):
    before_withdrawal_balance = get_balance(account_no)
    if before_withdrawal_balance > amount: #Money has to be there...
        set_balance(account_no, before_withdrawal_balance - amount)

You would expect these operations to run atomically, i.e., each one separate from the other. That is, on your account, this would happen:

#Current balance 50 Euros
before_deposit_balance = get_balance(account_no)
set_balance(account_no, before_deposit_balance + amount) #adding 30
#Current balance 80 Euros
before_withdrawal_balance = get_balance(account_no)
if before_withdrawal_balance > amount: #Money has to be there...
    set_balance(account_no, before_withdrawal_balance - amount) #Removing 20
#Final is 60

Another possibility would be for the withdrawal to occur before the deposit (start with 50, down to 30, final 60 again).

But now imagine that the code runs like this (remember, the tasks might be running concurrently, and they don’t know what other tasks are doing):

#Current balance 50 Euros
before_deposit_balance = get_balance(account_no)
before_withdrawal_balance = get_balance(account_no)
set_balance(account_no, before_deposit_balance + amount) #adding 30
if before_withdrawal_balance > amount: #Money has to be there...
    set_balance(account_no, before_withdrawal_balance - amount) #Removing 20
#Final is 30

That is, both operations read the balance and they both get 50. Then deposit writes 50+30, making it 80. After that, withdrawal writes 50-20, making it 30. You just lost 30€ to the bank! By the way, it could happen the other way around (you would end up with 80€); as an exercise you might try to imagine what the execution trace would look like for the final result to be 80€.
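
If you want to play with interleavings without waiting for a real race to happen, the four steps can be replayed in any order you choose. This is a hypothetical, single-threaded simulation (replay and the step names are my own inventions, not part of any library); it mimics the read and write steps of the two operations above.

```python
def replay(schedule, balance=50, deposit_amount=30, withdrawal_amount=20):
    """Run the read/write steps in the given order and return the final balance."""
    seen = {}  # the balance each operation read, i.e. its local variable
    for step in schedule:
        if step == "deposit_read":
            seen["deposit"] = balance
        elif step == "deposit_write":
            balance = seen["deposit"] + deposit_amount
        elif step == "withdrawal_read":
            seen["withdrawal"] = balance
        elif step == "withdrawal_write":
            if seen["withdrawal"] > withdrawal_amount:  # Money has to be there...
                balance = seen["withdrawal"] - withdrawal_amount
    return balance


# Deposit runs completely before the withdrawal starts
serial = ["deposit_read", "deposit_write", "withdrawal_read", "withdrawal_write"]
# Both operations read the balance before either one writes
interleaved = ["deposit_read", "withdrawal_read", "deposit_write", "withdrawal_write"]

print(replay(serial))       # 60
print(replay(interleaved))  # 30
```

Reordering the list is an easy way to check your answer to the exercise.
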
This is a problem of both resource sharing (i.e., both operations are sharing the same account) and non-determinism (i.e., execution of both operations can happen in a lot of different, interleaved ways). There are actually many ways to solve this; the most obvious is to lock the account in some way, like this:

def deposit(account_no, amount):
    lock_account(account_no) #Get exclusive access to the account
    before_deposit_balance = get_balance(account_no)
    set_balance(account_no, before_deposit_balance + amount)
    release_account(account_no) #Release exclusive access to the account
 
 
def withdrawal(account_no, amount):
    lock_account(account_no) #Get exclusive access to the account
    before_withdrawal_balance = get_balance(account_no)
    if before_withdrawal_balance > amount: #Money has to be there...
        set_balance(account_no, before_withdrawal_balance - amount)
    release_account(account_no) #Release exclusive access to the account

In this case, when an operation starts, it invokes some magic which guarantees that account access is only granted to it while the operation lasts. If another task tries to lock the same account, that other task blocks until access is possible (this means that you should be careful with locking, as you might lose parallelism when locking).

By the way, this introduces problems by itself (e.g., forgetting to release a locked account).
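
In Python, one way to make a forgotten release less likely is to use a threading.Lock in a with statement, which releases the lock when the block exits, even if an exception is raised. The sketch below assumes an in-memory Account class of my own (it is not the bank code above) and runs the deposit and the withdrawal in two threads.

```python
import threading


class Account:
    """Hypothetical in-memory account protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        with self._lock:  # exclusive access; released automatically on exit
            before_deposit_balance = self.balance
            self.balance = before_deposit_balance + amount

    def withdrawal(self, amount):
        with self._lock:
            before_withdrawal_balance = self.balance
            if before_withdrawal_balance > amount:  # Money has to be there...
                self.balance = before_withdrawal_balance - amount


account = Account(50)
deposit_task = threading.Thread(target=account.deposit, args=(30,))
withdrawal_task = threading.Thread(target=account.withdrawal, args=(20,))
deposit_task.start()
withdrawal_task.start()
deposit_task.join()
withdrawal_task.join()
print(account.balance)  # 60, whichever task gets the lock first
```

Whichever thread wins the lock, the other one waits, so only the two serial orderings described earlier can happen.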

But what drives developers mad is the fact that the execution is non-deterministic, so you might run your concurrent code 999 times and it works perfectly, but on the 1000th run some strange thing happens (which is very difficult to replicate, as it only happens, in this case, on average once every 1000 runs).

There would be much more to say about this topic, but I think (hope) this conveys the fundamental intuitions to help you start developing programs that make use of multi-core CPUs (more so, help you easily adapt existing software).

Grids

Programming for grids is actually quite similar to multi-core programming. The fundamental difference is memory sharing: in a multi-core computer the memory is shared (so it is easier and faster to share information between concurrent tasks), while in a grid memory is not shared, so sharing information is more difficult and takes more time. This means that if your tasks need to communicate a lot, in a grid you can expect time and network bandwidth to be lost in communication between tasks. This is probably the fundamental reason why, in grids, many algorithms don’t scale linearly with the number of grid nodes (i.e., if you double the number of grid nodes, your performance doesn’t double, as you lose time to communication overhead). Anyway, lots of communication means lots of horrible bugs to correct, so try to keep communication low ;) .

The final part

In the final (fourth) part of this series I will discuss a real, existing case of Python code to deal with multi-core architectures. This code will be based on examples from population genetics simulators (selection detection and coalescent). Included will be a real example of a bug that I suffered when running a selection detection program in parts, concurrently :( . This part will be ready sometime in August.

Other approaches

While there are many issues that could be discussed, and many technicalities, I will stop here. In the future I might discuss such topics as state and the problems caused by it in concurrent programming (i.e., advantages of functional and logic programming) or language support for concurrent programming (functional programming paradigms, Erlang support, Fortress, …). If I get back to the topic it will be mainly to discuss either linguistic approaches to the issue or anything “modern” (all the rest you can easily read in very good, old, well-established literature).

Part 5 of 4

This 4 part article will have a part 5 ;) . I decided that I want to talk a bit about volunteer computing projects like BOINC.

Article too big or complex? Don’t hesitate to comment, I am pretty sure that this text will need revision and your input is more than welcome.

Filed in: bioinformatics, software engineering

by: tiago

4 Comments

Average quality of science information sources

Nowadays most of my science readings come from Google Reader. I have mainly 2 folders: “bio” for scientific journals and “bioblogs” for Blogs.

I can “read” “bio” in half a day per week or so (I have more than 30 feeds). I need almost a day for “bioblogs” (less than 15 feeds). The number of articles on “bio” is probably one or two orders of magnitude higher than the one on “bioblogs”.

To put it another way: I consider the blog content to be of much higher quality than the scientific journal counterpart…

…And I think I know the reason…

…Currently, scientists are mostly evaluated by their ability to publish in scientific journals (that postdoc grant, financing a project, getting tenure…), so there is strong pressure to publish. People publish everything that, in their self-evaluation, is… good publishable.

On the other hand, blogging is still mostly something people do because they want to and feel it is valuable.

Of course, we can already see, informally, that people profit from blogging (make contacts, publicity, etc). I would bet that in the near future, blogging will be in formal evaluation processes also (I can already smell measures of quantity of blog posts, inbound links and stuff…).

Care to make a prediction on the average quality of blog content in the future?

Filed in: science

by: tiago

4 Comments

Easy to use bioinformatics interfaces (1/2): BACA

Starting a shameless self promotion exercise, I would like to present two easy to use interfaces (with completely different target users and objectives) in the field of Bioinformatics.

The first one is BACA, a multiple mitochondrial genome retriever, organizer and visualizer. BACA allows the retrieval of multiple complete mitochondrial genomes. It will split all of them into features (cDNA, origins of replication, tRNA, …), creating FASTA files per genome (i.e., a FASTA file with all components from a single genome) and per type of annotation (i.e., a FASTA file with a certain gene from all genomes downloaded). BACA also provides an SVG visualizer, which lets you visually compare multiple genomes. Its main purpose is to help biologists download and organize mitochondrial data from GenBank, i.e., it is targeted at users who know nothing about scripting at all.

BACA was published in Molecular Ecology Notes. It is a (ugh!!!) Perl application.

It was developed as a toy in a Phylogenetics course in my MSc and ended up as a publication and public web service… But it is not much more than a useful toy really.

Filed in: bioinformatics

by: tiago

1 Comment

String concatenation performance in Python

Strings in Python (as in Java) are immutable. That means that when you concatenate 2 strings what you are really doing is creating a new one from the 2 old ones. This can be very inefficient. How much? To the point of, in Jython, taking days just to prepare a string of around 400 KB, with several concats per line.

Solution? Can be found here.

So, instead of doing str += 'bla', do something like str = "".join([str, 'bla']).

I would also add an important tidbit: If you need to do lots of concatenations, append to a list and join at the end, don’t do join over join over join, ie, don’t do:

my_str = ''
for i in range(1000):
  my_str = ''.join([my_str, str(i)])

Do instead:

my_str_list = []
for i in range(1000):
  my_str_list.append(str(i))
my_str = "".join(my_str_list)

If you use the first dialect the result will be as bad as +=. If you use the second, things that took days will take less than a second.
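
If you want to measure the difference on your own interpreter, the two forms can be timed with the standard timeit module. This is just a sketch (the function names are mine); the actual numbers depend heavily on the interpreter. CPython, for example, can optimise some in-place concatenations, so the gap there may be much smaller than the one I saw in Jython.

```python
import timeit


def join_over_join(n):
    """Rebuild the whole string on every iteration (the form to avoid)."""
    my_str = ""
    for i in range(n):
        my_str = "".join([my_str, str(i)])
    return my_str


def append_then_join(n):
    """Collect the pieces in a list and join only once at the end."""
    my_str_list = []
    for i in range(n):
        my_str_list.append(str(i))
    return "".join(my_str_list)


for function in (join_over_join, append_then_join):
    seconds = timeit.timeit(lambda: function(1000), number=20)
    print("%s: %.4f seconds for 20 runs" % (function.__name__, seconds))
```

Both functions build exactly the same string, so swapping one for the other is safe.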

Filed in: Jython, Python

by: tiago

No Comments

Bioinformatics, multi-core CPUs and grid computing: User perspective (2/4)

Although we can expect (wish?) that most bioinformatics applications in the future will support multi-core (and perhaps, grid) computing, currently most applications were not made with multi-core CPUs in mind (although some are already capable of using grids). Here we discuss three different kinds of typical scenarios that users might find. Each scenario is illustrated with at least an example application.

Single-core application to be run multiple independent times

Some applications are run multiple independent times in order to determine the intervals within which certain parameters are expected to fall.

One example is population genetics simulators (both coalescent and forward-time) like CoaSim or simuPOP. These simulators are run thousands of times under some demographic scenario in order to determine, e.g., where certain intervals for certain statistics (e.g. Fst) fall for neutral markers (i.e., markers that are not candidates for selection).

The strategy here is quite simple: To divide the workload in a number of tasks that is equivalent to the number of cores available.

As an example, imagine that you want to run 10,000 population genetics simulations using simuPOP and you have a machine with 8 cores. You simply instruct 8 simuPOP instances to run 1,250 simulations each.

There are a few issues that require some care, though:

  1. You should make sure that output directories are different (and possibly input files also)
  2. All the running instances really have to be independent: you have to make sure that random seeds are independent. If a random seed is specified in one of the input files, then you really have to have different input files.
  3. In the end you will have to combine all the results in some way.
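
The bookkeeping behind this strategy can be sketched in a few lines of Python. Everything here is hypothetical (the split_workload name, the run_NN directory names and the base seed are mine), and it only plans the work; it does not call simuPOP. Note also that giving each instance a different seed is the minimum; whether different seeds are independent enough depends on the random number generator of the simulator.

```python
def split_workload(total_runs, cores, base_seed=12345):
    """Plan one task per core: its own output directory, seed and number of runs."""
    runs_per_core, remainder = divmod(total_runs, cores)
    tasks = []
    for core in range(cores):
        # Spread any remainder over the first few tasks
        runs = runs_per_core + (1 if core < remainder else 0)
        tasks.append({
            "output_dir": "run_%02d" % core,  # different directory per instance
            "seed": base_seed + core,         # different seed per instance
            "runs": runs,
        })
    return tasks


plan = split_workload(10000, 8)
for task in plan:
    print(task)
```

Each entry can then be used to prepare the input file and output directory of one instance.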

Programs that are grid-ready

Some programs, like Migrate, were designed to be run in a parallel environment, normally MPI (Message Passing Interface). It is very easy to make these programs use multiple cores: install MPI on a single machine and configure it by saying that the maximum number of processes that can be run locally is equal to the number of cores. Then mpirun your application, calling it with a number of processes equal to the number of cores (normally the parameter is -np).

PS - Regarding forward-time simulators like simuPOP, some of these are MPI based, but really, they only allow you to parallelize a single simulation by, e.g., simulating each population on a different node. If the objective is to run a very long single simulation, then simuPOP falls under the category of grid-ready, but if the objective is to run many simulations then it becomes an example of a serial application that runs multiple independent times (i.e., the previous scenario).

Other cases

In other cases, where you only want to run a single instance and the program does not support MPI, there is really no way around it; programs like PAUP* come to mind. It is especially bad in cases where the code is closed source, as some programs internally are really running multiple independent runs of something and could be changed to take advantage of multiple cores. Some programs implementing Maximum Likelihood approaches will probably be somewhat easy to parallelize.

Filed in: bioinformatics

by: tiago

2 Comments

Malaria

I am back from almost a week of traveling visiting the Swiss Tropical Institute and Liverpool School of Tropical Medicine.

Thus no posting here during the last few days…

I expect that the number of posts in this blog regarding neglected diseases will increase exponentially in the next few months.

Now, back to the “scheduled program”… I expect to post before Tuesday the second part of a series of posts on the consequences of multi-core CPUs and grids in Bioinformatics.

Filed in: malaria, neglected diseases

by: tiago

No Comments

Bioinformatics, multi-core CPUs and grid computing: Introduction (1/4)

A few years ago, one expectation was implicit when buying a new computer: that performance would increase more or less linearly with CPU clock frequency. That is, if you had an old 450 MHz computer and bought a new one of 900 MHz you would expect that the new one would have more or less twice the speed of the old one.

If you had a computationally intensive bioinformatics application, you would then expect that it would run at twice the speed. So if it took 4 weeks to complete a certain task, it would now complete in only 2 weeks. Nice!

For technical reasons (limitations is a better word), for a few years now CPU manufacturers like Intel, AMD and IBM have resorted to providing more cores instead of increasing the CPU frequency. That is, when you have a dual core CPU with the same CPU frequency(*), a single task will still take the same time to complete, but you could do two tasks simultaneously and they would not steal CPU time from one another.

So:

Imagine that you have 2 tasks of four weeks each (on a single core 1 GHz computer):

On that machine the whole operation would take 8 weeks (2 tasks * 4 weeks / 1 core).

On a single core 2 GHz machine, the whole operation would take 4 weeks (2 tasks * 2 weeks / 1 core).

On a dual core 1 GHz computer the whole operation would take 4 weeks (2 tasks * 4 weeks / 2 cores). But here is the catch: you would have to start them at the same time, concurrently. Because if you started one after the other it would take 8 weeks (and you would be wasting half of your processing power, as one of the cores would be doing nothing).
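
The arithmetic above can be written as a tiny Python function. It is only an illustration, under the same simplification used in this post (speed proportional to CPU frequency); the wall_time_weeks name and its parameters are mine.

```python
import math


def wall_time_weeks(tasks, weeks_per_task_at_1ghz, cores, ghz=1.0, concurrent=True):
    """Weeks needed to finish all tasks, assuming speed scales with frequency."""
    weeks_per_task = weeks_per_task_at_1ghz / ghz
    if concurrent:
        # Tasks run in rounds of up to `cores` tasks at a time
        rounds = math.ceil(tasks / cores)
    else:
        # One task after the other: the extra cores sit idle
        rounds = tasks
    return rounds * weeks_per_task


print(wall_time_weeks(2, 4, cores=1))                    # single core, 1 GHz
print(wall_time_weeks(2, 4, cores=1, ghz=2.0))           # single core, 2 GHz
print(wall_time_weeks(2, 4, cores=2))                    # dual core, concurrent
print(wall_time_weeks(2, 4, cores=2, concurrent=False))  # dual core, one by one
```

The last two calls are the catch in numbers: same machine, but half the time if the tasks are started together.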

It could be expected that the increase in computing power in the future will be a lot like this: CPUs with more and more cores, but with the speed of each core staying more or less the same.

So, if you have a brand new machine and your computationally intensive application still takes ages to run, this might be the cause.

In a series of 4 blog posts I will discuss the consequences of this change for both users and developers. The parts will be:

  1. Introduction - You are reading it.
  2. Consequences to users - I will discuss the consequences of this paradigm shift from a user perspective. I will present some real situations (using existing bio software today) and suggest strategies to get the most performance out of new hardware. Scenarios range from complete frustration (i.e., only a single core can be used, thus there will be no noticeable performance gain) to total gain (i.e., there will be linear gains with the number of new cores introduced).
  3. Consequences to developers: Design and concepts - I will discuss the changes that bio software developers will have to consider in order to make their applications multi-core aware and thus, use all the performance available on new machines. The key words here will be: asynchronous calling models, concurrent programming, memory sharing, message passing.
  4. Consequences to developers: One practical example - I will present a framework, in Python, to facilitate the development of multi-core aware applications.

In each post I will also discuss grid computing issues briefly, as taking advantage of grids is sometimes similar to multi-core performance gains.

As during next week I will be traveling, the next post should only surface around Monday, 15th. Please accept my apologies in advance for this delay.

(*)CPU frequency is really an erroneous simplification, but for the sake of simplicity I use it.

Filed in: bioinformatics

by: tiago

1 Comment

Ruby: Hello World

I happen to be a strong believer in DSLs; in fact, except for the most computationally intensive stuff, I would do everything as a DSL.

In my current setup (research in bioinformatics), I can essentially choose the tools I want. My only constraint is having decent bioinformatics and graphics libraries (ah… and I try to do everything inside a JVM). The constraint excludes Prolog and OCaml, so, based on previous experience, I was working with the pair Python/Jython. Guido explicitly stated that Python 3000 is not going the DSL way. It’s his option, I am all for having different approaches to programming, but that is not my option.

Enter Ruby: enough bioinformatics libraries (well, from JRuby one can use JFreeChart and BioJava and, of course, BioRuby) and DSL support.

As my Hello World application I decided to immediately use Ruby’s DSL features, so my first application was a Web Template language (yes, yet another), which I describe here:

Fundamental concepts

  • template - A template for a certain type of page: A title page template, a multicolumn page template, …
  • snippet - A part of a page: A navigation bar, an embedded RSS feed, …
  • page - a certain page: The entry page of my website, the page about bioinformatics. Pages are template based and can use snippets.

The fundamental idea is that, for each template there is a template language tailored for that template, for instance, my entry page looks something like this:

title "Tiago's virtual house"
abstract "Bioinformatics, software development, sports (doing, not watching), cinema, music, ..."
topic ("Bioinformatics") {
  summary "Here you will find software for life sciences"
  subtopic "Soft4Life" {|f| in_link f, "soft4life"}
  subtopic "Molecular adaptation"
  subtopic "Tropical diseases"
}
topic ("another") ...

Different kinds of pages (i.e., with different templates) will have different languages

Interesting Ruby features

  1. instance_eval - instance_eval seems to be the workhorse behind Ruby’s DSLish style. Basically, instance_eval takes a string (or a block) and executes it in the context of an object, making the object’s methods visible without having to refer to the object explicitly. That is, imagine that you have an object myCar of class Vehicle, which has a method called start. In that script you can do just start and not myCar.start. That (coupled with less parenthesis clutter) makes the thing work.
  2. attr_reader and friends - attr_reader is an expedient way of having a getter/setter pattern, nice to spare keystrokes. The annotation/decoration that seems to go with this sure deserves research…
  3. Method catching - I am using method_missing to (naively) convert any non existing method name of a template class to an HTML element, so if one calls object.a it will render <a>…</a> (if one does object.shaite, yes, it will do <shaite>…</shaite> ;) )

“Problems” with Ruby (ie, showing that I am a complete newbie)

I did not like the following things:

  1. yield inside instance_eval (show stopper?) - When yielding inside instance_eval, the “inside object” scope seems to be lost. I.e:
      def sillyMethod()
      end
     
      def goneYielding()
        yield
      end

    The code called on yield will not get sillyMethod (or any of the other methods of the yielding object) in its scope.
    For me this is the biggest hurdle and could be a show stopper; I will research this more before continuing with Ruby…

  2. Parentheses - Like this:
    topic ("Bioinformatics") {

    Are those parentheses really needed (before the code block)? Ruby is quite nice in not needing parentheses, but in this case I could not get rid of them and I don’t see why… (Actually, I see: it’s probably just my ignorance for now).

  3. Lots of Rubyisms still lying around - Like this:
    subtopic "Soft4Life" {|f| in_link f, "soft4life"}

    I don’t like having to write |f|…, as it seems to force things to be too Rubyish (pun really unintended). Intuitively I would say that there is something about variable visibility inside code blocks that does not lend itself too easily to this.
    Also, I would like to do some code rewriting, like just writing in_link "soft4life" and then having it automatically rewritten to something like |f| in_link f, "soft4life". I would bet that this is possible (again, newbie ignorance). This is not the best example of code rewriting, but I hope the point is clear…

  4. Yielding to multiple code blocks - I would like to yield to multiple code blocks. Seems ridiculous? I could do that in Prolog, and I can think of an example use case: When writing an HTML element, yield to a (first) code block to write the attributes, then yield to a second one to write the content.

Preliminary conclusions

I am still too green in Ruby to make a decision (I have only written this piece of code), but it looks good. I suppose most issues are due to my total inexperience with Ruby.

There are a lot of things that still need to be checked (like operators - can one change the semantics? And the precedence (a la Prolog)? And the association (Prolog again)? )…

Resources that I used (and recommend):

Programming Ruby

Ruby Standard Library Documentation

Jay Fields blog, especially this post.

Ola Bini blog, I am reading this metaprogramming post, from time to time, and my next explorations will be around what Ola talks on that post…

I am redesigning my site around this code. The source code will be available if somebody declares some interest…
This is my first Ruby program ever, three days work, please be tolerant with the newbie kind of comments that you surely have read…

PS - Just recently discovered, to be read in the future Creating DSLs in Ruby. I will return to this topic somewhere in the future.

Filed in: Ruby, declarative programming

by: tiago

1 Comment

Jython tip: instanceof

Imagine that you need this kind of Java dialect in Jython:

  if (anObject instanceof aClass) {

I.e., to check if a certain object is an instance of a certain class (note, this will also work if it is an instance of a subclass).

This is quite easy to do in Jython:

 if isinstance(anObject, aClass):
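
A small self-contained illustration of the subclass behaviour, using two made-up classes (Vehicle and Car are just for this example):

```python
class Vehicle(object):
    """Hypothetical base class."""


class Car(Vehicle):
    """Hypothetical subclass of Vehicle."""


my_car = Car()

print(isinstance(my_car, Car))      # True: my_car is a Car
print(isinstance(my_car, Vehicle))  # True: a Car is also a Vehicle
print(isinstance(Vehicle(), Car))   # False: a plain Vehicle is not a Car
```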

Filed in: Java, Jython, Python

by: tiago

No Comments

European cinema

In Portugal I would recommend the following places to see European cinema:

  • Lisbon: King, Quarteto, Monumental, Fonte Nova, Nimas
  • Porto: Teatro do Campo Alegre, Cidade do Porto and, sometimes, UCI Arrabida 20

In Portugal, most films (European or otherwise) are subtitled, so if you understand the original language you can still see them. Movies for children might be dubbed; check at the ticket office.

Filed in: cinema

by: tiago

No Comments