Gawk benchmark
I downloaded "gawk" from GNU and built it twice using just configure: once with the standard Apple allocator and once with the Mullocator inserted. Then I ran this little benchmark I found somewhere:
    # Contributed by Eiso AB <eiso@chem.rug.nl>
    BEGIN {
        Switch["123"] = " abc "
        Switch["82"] = " def "
        Switch["985"] = " ghi "
        Switch["20"] = " jkl "
        Switch["1098"] = " mno "
        Switch["3874"] = " pqr "
        Switch["272"] = " stu "

        Switch_R["123"] = " 123 "
        Switch_R["82"] = " 82 "
        Switch_R["985"] = " 985 "
        Switch_R["20"] = " 20 "
        Switch_R["1098"] = " 1098 "
        Switch_R["3874"] = " 3874 "
        Switch_R["272"] = " 272 "

        for (i = 0; i < 30000; i++) {
            s1 = s2 = s3 = " 123 82 985 20 1098 3874 272 "
            for (j in Switch) {
                # Manually doing a gsub
                while (match(s1, j))
                    s1 = substr(s1, 1, RSTART-1) Switch[j] substr(s1, RSTART+RLENGTH)
                # Use gsub
                gsub(j, Switch[j], s2)
                # gsub, and prevent RE recompile
                gsub(Switch_R[j], Switch[j], s3)
            }
        }
    }
Results (before wedging)
    bash-2.05a$ time ./mullegawk -f bench1.awk
    real    0m16.922s
    user    0m14.140s
    sys     0m2.480s

    bash-2.05a$ time ./applegawk -f bench1.awk
    real    0m22.372s
    user    0m19.540s
    sys     0m2.310s
OK, now as I already had the Mullocator as the malloc library in the MulleMallocTracerLib, I just removed all the tracing code (with clever use of #ifdef (take that, Java)) to get a fairer comparison between the Mullocator and the Apple malloc. I then ran the test again (slightly differently, because I needed to use a shell script wrapper):
    bash-2.05a$ time ./gawk.sh
    real    0m24.526s
    user    0m21.810s
    sys     0m2.240s
Catastrophe! My code was actually slower than Apple's! Now how could that happen :) ? Fortunately I had compiled it with -DDEBUG, which does lots of checking. With -DDEBUG removed I got:
    bash-2.05a$ time ./gawk.sh
    real    0m18.709s
    user    0m15.290s
    sys     0m2.030s
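To illustrate why the -DDEBUG build was so much slower, here is a minimal sketch of checks that exist only when DEBUG is defined. The names demo_alloc, DEMO_CHECK and demo_check_count are made up for this example and are not taken from the Mullocator source, and the standard malloc stands in for the real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: nothing here is from the Mullocator. */
unsigned long demo_check_count = 0;   /* how many checks actually ran */

#ifdef DEBUG
/* Debug build: every check is counted and evaluated on each call. */
# define DEMO_CHECK(cond)  do { ++demo_check_count; assert(cond); } while (0)
#else
/* Non-debug build: the check expands to nothing and costs nothing. */
# define DEMO_CHECK(cond)  ((void) 0)
#endif

void *demo_alloc(size_t size)
{
    void *ptr;

    DEMO_CHECK(size != 0);
    ptr = malloc(size);      /* stand-in for the real allocator call */
    DEMO_CHECK(ptr != NULL);
    return ptr;
}
```

Compiled with -DDEBUG, each demo_alloc call runs two checks; compiled without it, the same source runs none, which is the kind of difference that only shows up when the function is called many times in a tight loop.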
This indicates to me that I am losing a lot of time because of the need to bridge into shared library land, which I could avoid before. The extra wrapping call adds to the punishment, since e.g. malloc is now coded like this:
    void *malloc( size_t size)
    {
       void *ptr;

       ptr = mulle_malloc( size);
       return( ptr);
    }
As I believe that my allocator performs even better when lots of memory is allocated and active, I am looking for more benchmarks that stress the memory system even harder.
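One possible shape for such a benchmark, as a rough sketch only: keep a few thousand blocks of varying size alive and keep replacing random ones, so the allocator always has lots of active memory to manage. The function demo_stress, the slot count and the size range are all assumptions chosen for illustration. It calls plain malloc and free, so it exercises whichever allocator the program is linked against.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SLOTS 4096   /* blocks kept live at once (arbitrary choice) */

/* Small deterministic pseudo-random generator so runs are repeatable. */
static unsigned long demo_next(unsigned long *state)
{
    *state = *state * 1103515245UL + 12345UL;
    return (*state >> 16) & 0x7fffUL;
}

/* Hypothetical helper: performs `rounds` replace-a-random-block steps
   and returns the number of allocations that succeeded. */
unsigned long demo_stress(unsigned long rounds)
{
    static void *slots[DEMO_SLOTS];   /* starts out as all null pointers */
    unsigned long state = 1, done = 0, i;

    for (i = 0; i < rounds; i++) {
        size_t slot = demo_next(&state) % DEMO_SLOTS;
        size_t size = 16 + demo_next(&state) % 2048;

        free(slots[slot]);                    /* free(NULL) is a no-op */
        slots[slot] = malloc(size);
        if (slots[slot] != NULL) {
            memset(slots[slot], 0xA5, size);  /* touch the memory */
            done++;
        }
    }

    /* Release everything so the function can be called again. */
    for (i = 0; i < DEMO_SLOTS; i++) {
        free(slots[i]);
        slots[i] = NULL;
    }
    return done;
}
```

Calling demo_stress from a small main and running the binary under time, once per allocator, would give numbers comparable to the gawk runs above. Raising DEMO_SLOTS or the size range increases the amount of live memory.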