NOTE: This patch has been committed. The version below is
informational only (whitespace differences have been removed).
ChangeLog addition:
2006-01-09 Didier Verna <didier(a)lrde.epita.fr>
* 2.0/src/dd/cl/tsi.cl: Fix size and nsteps parameters.
* 2.0/txt/bench.didier: New.
GSC source patch:
Diff command: svn diff --diff-cmd /usr/bin/diff -x "-u -t -b -B -w"
Files affected: 2.0/src/dd/cl/tsi.cl 2.0/txt/bench.didier
Index: 2.0/txt/bench.didier
===================================================================
--- 2.0/txt/bench.didier (revision 0)
+++ 2.0/txt/bench.didier (revision 0)
@@ -0,0 +1,331 @@
+ BENCHMARK RESULTS
+
+
+* Architecture
+
+Debian unstable.
+
+** uname -a
+
+Linux uzeb 2.4.27-2-686-smp #1 SMP Wed Nov 30 21:47:06 JST 2005 i686 GNU/Linux
+
+Note the SMP flag: the CPU has hyperthreading turned on; the OS sees two
+virtual processors.
+
+** cat /proc/cpuinfo
+
+processor : 0
+vendor_id : GenuineIntel
+cpu family : 15
+model : 3
+model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
+stepping : 4
+cpu MHz : 2992.789
+cache size : 1024 KB
+fdiv_bug : no
+hlt_bug : no
+f00f_bug : no
+coma_bug : no
+fpu : yes
+fpu_exception : yes
+cpuid level : 5
+wp : yes
+flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
+bogomips : 5976.88
+
+processor : 1
+vendor_id : GenuineIntel
+cpu family : 15
+model : 3
+model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
+stepping : 4
+cpu MHz : 2992.789
+cache size : 1024 KB
+fdiv_bug : no
+hlt_bug : no
+f00f_bug : no
+coma_bug : no
+fpu : yes
+fpu_exception : yes
+cpuid level : 5
+wp : yes
+flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
+bogomips : 5976.88
+
+
+
+* Benchmarks
+
+** Environment
+
+*** C
+
+gcc (GCC) 4.0.3 20051201 (prerelease) (Debian 4.0.2-5)
+Copyright (C) 2005 Free Software Foundation, Inc.
+This is free software; see the source for copying conditions. There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+
+gcc -O3 -DNDEBUG
+
+*** Java
+
+java version "1.5.0_06"
+Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
+Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode, sharing)
+
+javac: same version.
+
+*** Eiffel
+
+SmartEiffel The GNU Eiffel Compiler, Eiffel tools and libraries
+Release 1.1 Release (Monday June 16th 2003) [Charlemagne]
+Copyright (C), 1994-2003 - INRIA - LORIA - UHP - Nancy 2 - FRANCE
+D.COLNET, S.COLLIN, O.ZENDRA, P.RIBET, C.ADRIAN - SmartEiffel(a)loria.fr
+http://SmartEiffel.loria.fr
+
+compile_to_c -boost, then C compilation
+
+*** Common Lisp
+
+XEmacs / Slime / CMU-CL
+
+CMU Common Lisp CVS 19c 19c-release + minimal debian patches (19C), running on uzeb
+With core: /usr/lib/cmucl/lisp.core
+Dumped on: Mon, 2005-12-12 10:05:58+01:00 on uzeb
+
+Loaded subsystems:
+ Python 1.1, target Intel x86
+ CLOS based on Gerd's PCL 2004/04/14 03:32:47
+
+
+** Dedicated versions
+
+SB: single buffer; MB: multi buffer. Times are in seconds.
+
+*** C
+
+Linear / 800x800 / 200 steps:
+SB: 0.31
+MB: 0.33
+
+Randomized / 800x800 / 200 steps:
+SB: 10.82
+MB: 8.86
+
+
+*** Java
+
+Linear / 800x800 / 200 steps:
+SB: 0.35
+MB: 0.57
+
+Randomized / 800x800 / 200 steps:
+SB: 11.00
+MB: 9.10
+
+
+*** Eiffel
+
+Linear / 800x800 / 200 steps:
+SB: 0.40
+MB: 1.06
+
+Randomized / 800x800 / 200 steps:
+SB: 11.04
+MB: 9.00
+
+*** Common Lisp
+
+;;; Optimized benches for 200 step(s):
+;; Linear / Untyped / Multibuffer:
+; Evaluation took:
+; 2.48 seconds of real time
+; 2.49 seconds of user run time
+; 0.0 seconds of system run time
+; 7,429,347,345 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Linear / Untyped / Singlebuffer / AREF:
+; Evaluation took:
+; 1.32 seconds of real time
+; 1.29 seconds of user run time
+; 0.02 seconds of system run time
+; 3,949,483,710 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Linear / Untyped / Singlebuffer / SVREF:
+; Evaluation took:
+; 1.12 seconds of real time
+; 1.11 seconds of user run time
+; 0.0 seconds of system run time
+; 3,345,343,913 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+
+;;; Optimized benches for 200 step(s):
+;; Randomized / Untyped / Multibuffer:
+; Evaluation took:
+; 42.61 seconds of real time
+; 42.55 seconds of user run time
+; 0.0 seconds of system run time
+; 127,510,882,845 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Randomized / Untyped / Singlebuffer / AREF:
+; Evaluation took:
+; 38.69 seconds of real time
+; 38.66 seconds of user run time
+; 0.01 seconds of system run time
+; 115,775,643,570 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Randomized / Untyped / Singlebuffer / SVREF:
+; Evaluation took:
+; 38.76 seconds of real time
+; 38.63 seconds of user run time
+; 0.11 seconds of system run time
+; 116,005,607,213 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+
+;;; Optimized benches for 200 step(s):
+;; Linear / Typed / Multibuffer:
+; Evaluation took:
+; 2.82 seconds of real time
+; 2.83 seconds of user run time
+; 0.0 seconds of system run time
+; 8,467,002,300 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Linear / Typed / Singlebuffer / AREF:
+; Evaluation took:
+; 0.55 seconds of real time
+; 0.55 seconds of user run time
+; 0.0 seconds of system run time
+; 1,636,237,845 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Linear / Typed / Singlebuffer / SVREF:
+; Evaluation took:
+; 0.55 seconds of real time
+; 0.52 seconds of user run time
+; 0.03 seconds of system run time
+; 1,651,320,795 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+
+;;; Optimized benches for 200 step(s):
+;; Randomized / Typed / Multibuffer:
+; Evaluation took:
+; 28.21 seconds of real time
+; 28.18 seconds of user run time
+; 0.03 seconds of system run time
+; 84,420,916,523 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Randomized / Typed / Singlebuffer / AREF:
+; Evaluation took:
+; 18.4 seconds of real time
+; 18.37 seconds of user run time
+; 0.01 seconds of system run time
+; 55,038,771,285 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Randomized / Typed / Singlebuffer / SVREF:
+; Evaluation took:
+; 19.32 seconds of real time
+; 19.16 seconds of user run time
+; 0.04 seconds of system run time
+; 57,815,967,854 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+
+;;; Optimized benches for 200 step(s):
+;; Linear / Typed / Sized / Multibuffer:
+; Evaluation took:
+; 1.09 seconds of real time
+; 1.09 seconds of user run time
+; 0.01 seconds of system run time
+; 3,289,851,878 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Linear / Typed / Sized / Singlebuffer / AREF:
+; Evaluation took:
+; 0.54 seconds of real time
+; 0.53 seconds of user run time
+; 0.01 seconds of system run time
+; 1,613,471,783 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Linear / Typed / Sized / Singlebuffer / SVREF:
+; Evaluation took:
+; 0.55 seconds of real time
+; 0.55 seconds of user run time
+; 0.0 seconds of system run time
+; 1,659,704,692 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+
+;;; Optimized benches for 200 step(s):
+;; Randomized / Typed / Sized / Multibuffer:
+; Evaluation took:
+; 22.03 seconds of real time
+; 22.02 seconds of user run time
+; 0.01 seconds of system run time
+; 65,924,021,535 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Randomized / Typed / Sized / Singlebuffer / AREF:
+; Evaluation took:
+; 19.19 seconds of real time
+; 19.16 seconds of user run time
+; 0.02 seconds of system run time
+; 57,427,619,017 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+;; Randomized / Typed / Sized / Singlebuffer / SVREF:
+; Evaluation took:
+; 19.22 seconds of real time
+; 19.17 seconds of user run time
+; 0.08 seconds of system run time
+; 57,512,349,082 CPU cycles
+; 0 page faults and
+; 0 bytes consed.
+;
+
+
+*** Summary
+
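+              Linear (SB - MB)   Randomized (SB - MB)
+C             0.31 - 0.33        10.82 - 8.86
+Java          0.35 - 0.57        11.00 - 9.10
+Eiffel        0.40 - 1.06        11.04 - 9.00
+Common Lisp   0.55 - 1.10        18.38 - 22.03
+
+Note for Common Lisp: the best linear MB version is the typed and sized one
+(1.09 s, versus 2.48 s untyped and 2.82 s typed only), which makes a big
+difference. The best randomized MB version is also the sized one (22.03 s,
+versus 42.61 s untyped and 28.21 s typed only).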
+
+Single buffer versions: randomized time ~ 30 * linear time
+Multi buffer versions: randomized time ~ 20 * linear time
+
+
+
+
+Local Variables:
+mode: outline
+End:
Index: 2.0/src/dd/cl/tsi.cl
===================================================================
--- 2.0/src/dd/cl/tsi.cl (revision 46)
+++ 2.0/src/dd/cl/tsi.cl (working copy)
@@ -17,10 +17,10 @@
"Prime number to randomize memory access.")
(eval-when (:compile-toplevel :load-toplevel :execute)
- (defvar *size* 1024
+ (defvar *size* 800
"Dimension for (square) images."))
-(defvar *nsteps* 100
+(defvar *nsteps* 200
"Number of times to repeat the algorithm.")
--
Didier Verna, didier(a)lrde.epita.fr, http://www.lrde.epita.fr/~didier
EPITA / LRDE, 14-16 rue Voltaire Tel.+33 (1) 44 08 01 85
94276 Le Kremlin-Bicêtre, France Fax.+33 (1) 53 14 59 22 didier(a)xemacs.org
NOTE: This patch has been committed. The version below is
informational only (whitespace differences have been removed).
ChangeLog addition:
2006-01-09 Didier Verna <didier(a)lrde.epita.fr>
* 2.0/README: Update comments for dedicated versions.
GSC source patch:
Diff command: svn diff --diff-cmd /usr/bin/diff -x "-u -t -b -B -w"
Files affected: 2.0/README
Index: 2.0/README
===================================================================
--- 2.0/README (revision 46)
+++ 2.0/README (working copy)
@@ -161,23 +161,32 @@
** src/dd
-dedicated code, no abstraction
+*** Languages currently used:
+C, Java, Eiffel, Common Lisp.
-*** comments
-
-the C code gives us a time reference.
-
-*** code
-
-c/
- dd_c.c
-
-java/
- Main.java
-
-eiffel/
- image1d_int.e
- main.e
+*** Comments:
+Programs are fully dedicated (no abstraction). Apart from language-specific
+peculiarities, there are 4 variations of the algorithm for each language:
+- images represented as one- or two-dimensional arrays
+- linear or "randomized" image traversal
+
+The linear versions simply traverse the images in line / column order. The
+"randomized" versions (files ending with an 'i') are meant to minimize the
+hardware / OS impact on memory access (typically cache size, paging, etc.).
+To this end, image traversal is done column first (in 2D array versions),
+and individual cells are accessed in steps of a prime number.
+
+Along the same lines, the default image size (800) is chosen *not* to be
+a power of two.
+
+
+Some recommendations:
+- do not use language-specific idioms (e.g. *p++ = ... in C), to avoid
+ potential language-specific optimizations. We want to test the algorithm
+ only.
+- do not benchmark a single run of the algorithm, to avoid initialization
+ artefacts (like initial page faults) and the influence of timer precision.
+ That's why the default number of runs is set to 200.
** src/st