Revisiting Groovy performance issues
I tried to drill down the Groovy performance issue that I had with what is in practice a text processing exercise.
The original code was written in Groovy (and then ported to Java, not the other way around), but as I was in a hurry it was written in idiomatic Java (I am too much of a Groovy newbie to be able to write in idiomatic Groovy if I am in a hurry). Ted Naleid left some great suggestions on how to be more groovyish.
Anyway, I took my original code and tried to understand what was going on, here are my findings.
Replacing duck typing with explicit typing took a minute out (from 4m to 3m).
Converting this
iCase.each { if (jCase.contains(it)) { isDifferent = false } }
to this
if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) { isDifferent = false }
took 1m10s (from 3m to 1m50s) - This is in a inner loop part.
8 seconds were gained by changing this:
for (int j=i+1; j<indivs .size(); j++) {
into
int iSize = indivs.size() for (int j=i+1; j<isize ; j++) {
As inline comments you can find how much time each line took in the
following inner loop:
for (int j=i+1; j<isize ; j++) { int jPos = indivPos[indivs[j]] //~20s String jCase = lineTok[jPos] //~10s if (jCase.equals('NN')) continue //~8s boolean isDifferent = true //2s if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) { isDifferent = false //7s } //whole if is ~ 30s - 23 condition, 7 assignment if (isDifferent) counts[indivs[i] + indivs[j]] += 1 //5 secs }
The only stunning thing is the time lost at indexing String arrays (and maps, but that I can understand).
This text is being written as I was changing and trying things, I gained 20s from
minor changes of which I lost track
. I am currently at 1m30s (down from the
original 4m and comparing with Java’s 4s).










March 24th, 2008 at 8:14 am
Looks like you can eliminate the “isDifferent” variable by moving the incrementing of the counts value into the if statement.
You can probably also store the iCase[0] and iCase[1] values in variables before the for loop to improve performance comparing to those values.
March 24th, 2008 at 11:19 am
Hi!
It was not my intention to criticize code quality or anything like that. I dont write groovy idiomatic code all the time myself. I have looked at some groovy code I wrote a few weeks ago. After having written more and more groovy code I could refactor a lot of the old code and eliminate at least half of the LOC count. It was more readable, too.
I would like to provide suggestions on improvement regarding performance, but I dont understand what you are actually trying to do. Could you post the entire code and not just snippets of it? That would be great.
I am using Groovy at work to load a few hundred database entries with lots of hierarchical relationships, traverse those entries to build a specific collections structure, so it can be converted to a specific JSON format for the view layer.
All of this happens in a few ms not more, so I think we can definetly improve your problem.
March 24th, 2008 at 11:34 am
Hi,
I have put the code here (Along with the Java version).
I would like to thank everybody that offered to help. I have sorted this out in Java and for me it is a done deal.
But we could use this exercise in order to understand and document possible groovy performance pitfalls. So, if you have any suggestions to enhance the code feel free to comment, so that we can help others in not making the mistakes that I have done.
March 24th, 2008 at 11:42 am
It would be useful if you could post a sample of the data you were using, even if only a subset of it. Otherwise others can’t try out the code
March 24th, 2008 at 11:53 am
Graeme,
Data? I can get you all the data you need
I am working with data from the public HapMap project.
Any autosome from here, should do fine. I recommend chromosome 22 as it is the smallest autosome (so less data).
Yoruban 21 - Still one of the smallest, should work directly (change the directory only on the BufferedReader line) with the groovy code (which only processes one chromosome from one population. The Java code does all autosomes of Yorubans and Utahans).
March 25th, 2008 at 8:06 pm
Hi there,
The code specified in your initial posting incorporates a bubble-sort algorithm which has a performance factor of Q(n^2) — not too good, especially as n (the number of elements to be sorted) increases. If n = 10 million, your code will loop 10 million ^ 2 times, or approximately, 10,000,000,000,000 times.
Your code will perform a lot better if you incorporate a quick-sort algorithm instead, since its performance factor is Q(n log n). If n = 10 million, your code will loop (10 million) (log 10 million), or approximately, 70,000,000 times — a vast improvement!
Bubble sorts have the following structure:
procedure bubbleSort( A : list of sortable items ) defined as:
do
swapped := false
for each i in 0 to length( A ) - 2 do:
if A[ i ] > A[ i + 1 ] then
swap( A[ i ], A[ i + 1 ] )
swapped := true
end if
end for
while swapped
end procedure
Quicksorts, on the other hand, have the following structure:
function quicksort(array)
var list less, equal, greater
if length(array) ? 1
return array
select a pivot value pivot from array
for each x in array
if x pivot then append x to greater
return concatenate(quicksort(less), equal, quicksort(greater))
The other recommendations to this post are all valid, but its the algorithm itself that is your bottleneck. Hope this helps.
Darryl
March 25th, 2008 at 11:45 pm
It is actually not a sort, but filling up half of a matrix. So I don’t think the bubble/quick sort argument applies.
Also, if it were a complexity problem, the Java version should suffer equally (albeit with a - large - constant difference to the O() case compared to Groovy)
March 27th, 2008 at 3:13 am
[...] Groovy 60 Times Slower than Java? In a pair of very intriguing postings (here and here), Tiago details what he has been observing about the comparative performance of Groovy and [...]
March 27th, 2008 at 9:40 pm
Good work !. I have some insights regarding JVMs. A round of test (Windows XP, Core2Duo 2GHz, 2GB RAM) showed the following:
Java (Sun 1.6.0_03-b05): 384 sec
Java (JRockit 1.5.0_10): 327 sec
Groovy (Sun 1.6.0_03-b05): 10549 sec
Groovy (JRockit 1.5.0_10): 6375 sec
The ratios:
With Sun 1.6.0_03-b05: 1:27
With JRockit 1.5.0_10: 1:19
Ergo, I prefer JRockit over Sun JVM for groovy development.
May 16th, 2008 at 5:36 pm
pmates…
web design in jax fl
webcamnow
webkins
…
May 17th, 2008 at 1:16 am
virgin mary pretzel…
the dump furniture store
the erotic review
the hun yellow pages
…
May 17th, 2008 at 8:43 am
dell…
sportsbook scam types
sportsbook scams
sportsbooks
…