I tried to drill down into the Groovy performance issue that I had with what is, in practice, a text-processing exercise.
The original code was written in Groovy (and then ported to Java, not the other way around), but as I was in a hurry it was written in a Java idiom (I am too much of a Groovy newbie to write idiomatic Groovy when I am in a hurry). Ted Naleid left some great suggestions on how to be more groovyish.
Anyway, I took my original code and tried to understand what was going on. Here are my findings.
Replacing duck typing with explicit typing took a minute off (from 4m to 3m).
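For readers who have not seen the two styles side by side, here is a minimal sketch of what such a change looks like. The variable names and sample values are hypothetical, since the post does not show the original declarations, and the sketch only illustrates the two declaration styles. It says nothing about why the typed form ran faster.

```groovy
// Hypothetical sample data; the original script's input is not shown in the post.
def lineTokDynamic = 'AA AG NN GG'.split(' ')      // duck typed: declared with def
String[] lineTokTyped = 'AA AG NN GG'.split(' ')   // explicitly typed

def countDynamic = 0        // duck-typed counter
int countTyped = 0          // explicitly typed counter

for (tok in lineTokDynamic) {          // loop variable has no declared type
    if (tok != 'NN') countDynamic += 1
}
for (String tok : lineTokTyped) {      // loop variable declared as String
    if (!tok.equals('NN')) countTyped += 1
}

// Both styles compute the same thing; only the declarations differ.
assert countDynamic == countTyped
println "tokens other than NN: ${countTyped}"
```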
Converting this
iCase.each { if (jCase.contains(it)) { isDifferent = false } }
to this
if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) { isDifferent = false }
saved 1m10s (from 3m to 1m50s). This code is part of an inner loop.
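A self-contained sketch of the two forms follows. The sample values are assumptions: the post does not show how iCase and jCase are built, so here both are taken to be two-character Strings, as iCase[0] and iCase[1] suggest. The sketch only checks that both forms set the same flag; it does not measure speed.

```groovy
// Assumed sample values: two-character Strings.
String iCase = 'AG'
String jCase = 'GT'

// Closure version: each() invokes the closure once per character of iCase.
boolean isDifferentClosure = true
iCase.each { if (jCase.contains(it)) { isDifferentClosure = false } }

// Unrolled version: two direct contains() calls and no closure invocation.
// The || also short-circuits, so the second call is skipped when the first matches.
boolean isDifferentUnrolled = true
if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) { isDifferentUnrolled = false }

assert isDifferentClosure == isDifferentUnrolled
assert !isDifferentUnrolled   // 'G' occurs in both strings
```

Note that the unrolled form only matches the closure form when iCase has exactly two characters.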
8 seconds were gained by changing this:
for (int j=i+1; j<indivs.size(); j++) {
into
int iSize = indivs.size()
for (int j=i+1; j<iSize; j++) {
The inline comments show how much time each line took in the following inner loop:
for (int j=i+1; j<iSize; j++) {
    int jPos = indivPos[indivs[j]]   // ~20s
    String jCase = lineTok[jPos]     // ~10s
    if (jCase.equals('NN')) continue // ~8s
    boolean isDifferent = true       // 2s
    if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) {
        isDifferent = false          // 7s
    } // the whole if is ~30s: 23s for the condition, 7s for the assignment
    if (isDifferent) counts[indivs[i] + indivs[j]] += 1 // 5s
}
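To try the loop in isolation, something like the following sketch could be used. Everything outside the inner loop body is an assumption: the sample data, the outer loop over i, the way iCase is obtained, and the use of withDefault so that += works on a missing key. None of these appear in the post.

```groovy
// All data below is hypothetical; only the inner loop body follows the post.
List<String> indivs = ['a', 'b', 'c', 'd']
Map<String, Integer> indivPos = [a: 0, b: 1, c: 2, d: 3]
String[] lineTok = ['AG', 'CT', 'NN', 'GG'] as String[]
Map<String, Integer> counts = [:].withDefault { 0 }   // assumed: missing keys start at 0

int iSize = indivs.size()
for (int i = 0; i < iSize; i++) {                     // assumed outer loop
    String iCase = lineTok[indivPos[indivs[i]]]
    if (iCase.equals('NN')) continue                  // assumed: skip 'NN' for i as well
    for (int j = i + 1; j < iSize; j++) {
        int jPos = indivPos[indivs[j]]
        String jCase = lineTok[jPos]
        if (jCase.equals('NN')) continue
        boolean isDifferent = true
        if (jCase.contains(iCase[0]) || jCase.contains(iCase[1])) {
            isDifferent = false
        }
        // Count the pair only when the two entries share no character.
        if (isDifferent) counts[indivs[i] + indivs[j]] += 1
    }
}

println counts
```

With this sample data, only the pairs that share no character are counted, and pairs involving an 'NN' entry are skipped.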
The only surprising thing is the time lost indexing String arrays (and maps, but that I can understand).
This text was written as I was changing and trying things. I gained another 20s from minor changes that I lost track of. I am currently at 1m30s (down from the original 4m, and compared with Java’s 4s).