I think that language performance (from a speed point of view) is highly overrated. There are many factors that are more important. Even on the time front alone, developer time normally matters more: the time spent grokking code is normally more expensive than the running time of an application. Of course, there are many other important points, too many to enumerate (portability, declarativeness, readability, …). If performance were the fundamental variable, we would all be using assembler.
I am currently doing a bit of code to go over tens of millions of lines of text, while comparing separate columns.
I did a little piece of Groovy code to go over all those lines. The performance results were abysmal, so I decided to do a program in Java (copying the Groovy code to a Java file and converting in a very direct way). For 3000 lines of text here are the results (remember, this is to process hundreds of millions):
$ time java Do
real 0m4.427s
user 0m4.384s
sys 0m0.040s

$ time groovy do.groovy
real 2m53.303s
user 2m47.650s
sys 0m0.668s
4 seconds against 2 minutes 53 seconds. This is not a showstopper, as it is possible to write all the performance-intensive parts in Java. But, even so, the gap is too large.
The code? (Afterwards, some speculation and profiling)
Groovy:
...
while ((line = reader.readLine()) != null) {
    lineTok = line.tokenize()
    if (lineTok.size() == colIndiv.size() + 11) {
        for (int i = 0; i < indivs.size() - 1; i++) {
            int iPos = indivPos[indivs[i]]
            def iCase = lineTok[iPos]
            if (iCase.equals('NN')) continue
            for (int j = i + 1; j < indivs.size(); j++) {
                int jPos = indivPos[indivs[j]]
                def jCase = lineTok[jPos]
                if (jCase.equals('NN')) continue
                def isDifferent = true
                iCase.each {
                    if (jCase.contains(it)) {
                        isDifferent = false
                    }
                }
                if (isDifferent) counts[indivs[i] + indivs[j]] += 1
            }
        }
    }
}
Java:
...
while ((line = reader.readLine()) != null) {
    String[] lineTok = line.split(" ");
    if (lineTok.length == colIndiv.size() + 11) {
        for (int i = 0; i < indivs.size() - 1; i++) {
            int iPos = indivPos.get(indivs.get(i));
            String iCase = lineTok[iPos];
            if (iCase.equals("NN")) continue;
            for (int j = i + 1; j < indivs.size(); j++) {
                int jPos = indivPos.get(indivs.get(j));
                String jCase = lineTok[jPos];
                if (jCase.equals("NN")) continue;
                boolean isDifferent = true;
                if (jCase.indexOf(iCase.charAt(0)) > -1 ||
                    jCase.indexOf(iCase.charAt(1)) > -1) {
                    isDifferent = false;
                }
                if (isDifferent) {
                    // Increment the counter for this pair, mirroring counts[...] += 1 in the Groovy version
                    String key = indivs.get(i) + indivs.get(j);
                    counts.put(key, counts.get(key) + 1);
                }
            }
        }
    }
}
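The loop above is shown without its setup, so here is a minimal self-contained sketch of the same pairwise comparison. The class name `PairDifferenceSketch`, the helper names, and the sample lines are made up for illustration. The sketch leaves out the column-count check because `colIndiv` is not shown in the post, and it assumes every token has at least two characters, as the original loop does. It needs Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone version of the pairwise comparison; not the author's full program.
class PairDifferenceSketch {

    // True when neither of the first two characters of a occurs anywhere in b.
    static boolean sharesNoCharacter(String a, String b) {
        return b.indexOf(a.charAt(0)) < 0 && b.indexOf(a.charAt(1)) < 0;
    }

    // Processes one space-separated line and bumps the counter of every
    // pair of individuals whose columns share no character.
    static void countLine(String line, List<String> indivs,
                          Map<String, Integer> indivPos,
                          Map<String, Integer> counts) {
        String[] tok = line.split(" ");
        for (int i = 0; i < indivs.size() - 1; i++) {
            String iCase = tok[indivPos.get(indivs.get(i))];
            if (iCase.equals("NN")) continue;   // skip this individual, as in the post
            for (int j = i + 1; j < indivs.size(); j++) {
                String jCase = tok[indivPos.get(indivs.get(j))];
                if (jCase.equals("NN")) continue;
                if (sharesNoCharacter(iCase, jCase)) {
                    // merge starts the counter at 1 or adds 1 to the existing value
                    counts.merge(indivs.get(i) + indivs.get(j), 1, Integer::sum);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: three individuals in columns 0, 1 and 2.
        List<String> indivs = List.of("a", "b", "c");
        Map<String, Integer> indivPos = Map.of("a", 0, "b", 1, "c", 2);
        Map<String, Integer> counts = new LinkedHashMap<>();
        countLine("AA CC AC", indivs, indivPos, counts);
        countLine("AG NN TT", indivs, indivPos, counts);
        System.out.println(counts);   // prints {ab=1, ac=1}
    }
}
```

Using `merge` avoids having to pre-fill the map with zeros, which the loops in the post would otherwise need before the first increment.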
I cursorily ran the Java profiler. Though I did not spend much time on it, it seemed that (speculation alert!) Groovy was spending some time in the metaclassing/proxying parts. I wonder if the “defs” were making things much slower? Maybe if I had properly typed my loop variables (instead of being lazy and def duck typing) things would have run smoother. If that is the case, then it is one more reason against duck typing (the others being that static types help IDEs and automated code tools, and help with debugging).
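As a loose, Java-only analogy for the speculation above: a call whose target is looked up at run time does more work per invocation than a statically bound call. This is not a model of Groovy's actual call path, which the post did not investigate; `DispatchSketch` and its methods are hypothetical names, and the timing loop is a crude illustration rather than a proper benchmark.

```java
import java.lang.reflect.Method;

// Hypothetical illustration: the same equals("NN") check, bound two different ways.
class DispatchSketch {

    // Statically bound call: the compiler knows the receiver is a String.
    static boolean direct(String s) {
        return s.equals("NN");
    }

    // Run-time lookup: the method is found by name on every call, then invoked reflectively.
    static boolean reflective(Object o) {
        try {
            Method m = o.getClass().getMethod("equals", Object.class);
            return (Boolean) m.invoke(o, "NN");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int hits = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (direct("NN")) hits++;
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (reflective("NN")) hits++;
        }
        long t2 = System.nanoTime();

        // Both variants return the same answers; only the cost per call differs.
        System.out.println("hits=" + hits);
        System.out.println("direct ns:     " + (t1 - t0));
        System.out.println("reflective ns: " + (t2 - t1));
    }
}
```

The absolute numbers depend heavily on the JVM and on warm-up, so they should be read only as a direction, not as a measurement of Groovy itself.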
Ted Naleid says:
I would have expected a performance hit in groovy over java, but that does seem like a large one. Are you using the latest version of groovy (1.5.4)?
I’m also wondering if the groovy code would be any faster if it leveraged some useful groovy constructs rather than being a mostly straight copy of the java code.
Something like:
def colIndivSize = colIndiv.size() + 11
reader.eachLine { line ->
    lineTok = line.tokenize()
    if (lineTok.size() == colIndivSize) {
        indivs.eachWithIndex { iIndiv, i ->
            def iCase = lineTok[indivPos[iIndiv]]
            // continue is not allowed inside a closure; return moves on to the next element
            if (iCase.equals('NN')) return
            indivs[i..-1].each { jIndiv ->
                def jCase = lineTok[indivPos[jIndiv]]
                if (jCase.equals('NN')) return
                if (jCase - iCase != jCase) counts[iIndiv + jIndiv] += 1
            }
        }
    }
}
(I think this is a direct port of what you have, but I don’t have your full source or a test file so I might have missed something)
March 23, 2008, 00:02
Dooby says:
It is quite bad that you say something exhibits poor performance, yet the code shows you have little to zero knowledge of the thing you are writing the code in…
Seriously, just changing a type to the word “def” does not make something Groovy… You’ve just given yourself all the speed losses of being interpreted, with none of the gains of the Groovy language.
March 23, 2008, 00:25
Sakuraba says:
The “def” is more like “Object”. I don’t think that explicitly typing it makes a difference.
I think you found some kind of hotspot there. The speed difference is definitely not normal.
I just don’t understand why you did not try to make it more idiomatic Groovy. Your code is very hard to read.
March 23, 2008, 10:23
tiago says:
The code could be better, that is for sure. And I will revisit this shortly. I have lots of stuff going on and sometimes things get rushed out. I actually hate duck typing (which I used out of laziness and being in a hurry).
There is one point though: Considering that Groovy is a language that puts itself as the “next easier step” after Java, it should allow for better results than this when using Groovy in a Java idiomatic way. This is not an excuse for the poor quality of the code, just a realistic comment for a language that is positioned the way Groovy is.
March 23, 2008, 12:08
movk says:
Groovy is a great language. I thought it would be great to use it as an application-specific scripting language (a Java application that users can extend through Groovy scripts). Unfortunately, if such an application is math oriented and processes larger amounts of data or uses complicated math (scientific, business), then Groovy cannot be used because of performance (Scala may be the solution). I find Groovy great for building the application skeleton, with the performance-critical parts written in Java.
March 23, 2008, 13:45
Cognitive Consonance » Blog Archive » Revisiting Groovy performance issues says:
[...] tried to drill down the Groovy performance issue that I had with what is in practice a text processing [...]
March 23, 2008, 19:04
tiago says:
An update to this, where I dissect part of the problem and suggest how to improve things can be found here.
March 23, 2008, 19:08