Attached is a patch that changes memset to use 'long' instead of 'int32'.
I will also send a test program.
As far as I know, setting memory in 64-bit units is faster than in 32-bit
units on 64-bit CPUs. Nothing changes on 32-bit CPUs, because 'long' and
'int32' are the same length there.
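
To illustrate the idea, here is a minimal sketch of a word-at-a-time zeroing loop. It is not the code from the patch: the function name zero_by_long and the fallback to plain memset() for unaligned or odd-sized buffers are assumptions made for the example. It also assumes a platform where 'long' is 8 bytes on 64-bit CPUs (as in the sizeof output below); on platforms where 'long' stays 4 bytes, the loop would behave like the int32 version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the patch): zero len bytes at start,
 * storing one long-sized word per iteration when the buffer allows it.
 */
static void
zero_by_long(void *start, size_t len)
{
    long *p;
    long *stop;

    assert(start != NULL || len == 0);

    if (len == 0)
        return;

    /* Fall back to memset() unless the buffer is long-aligned and a whole number of longs. */
    if ((uintptr_t) start % sizeof(long) != 0 || len % sizeof(long) != 0)
    {
        memset(start, 0, len);
        return;
    }

    p = (long *) start;
    stop = (long *) ((char *) start + len);

    /* With an 8-byte long, each store clears twice as many bytes as an int32 store. */
    while (p < stop)
        *p++ = 0;
}
```

Calling zero_by_long(buf, sizeof(buf)) on a long-aligned buffer takes the word loop. On a build where sizeof(long) is 4, the same loop stores 4 bytes per iteration, so it should do the same work as an int32 loop.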
The following are test results on my machines.
* Opteron 248 2.2GHz (the 'long' version is 25-55% faster than the 'int' version.)
RHEL4 2.6.9-22.0.2.ELsmp #1 SMP
gcc 3.4.3 20050227 (Red Hat 3.4.3-22.1)
gcc -O2
sizeof(int) = 4
sizeof(long) = 8
Loop by int (size=64) : 2.883838 / 1.481662 / 1.481677
Loop by long (size=64) : 0.613221 / 0.610086 / 0.832484
Loop by int (size=256) : 4.272891 / 4.270954 / 4.270668
Loop by long (size=256) : 2.383517 / 2.382581 / 2.382300
Loop by int (size=1024) : 16.898493 / 15.475216 / 15.455585
Loop by long (size=1024) : 11.679070 / 11.682782 / 11.702978
* Xeon 2.80GHz HT (the 'long' version is 35-55% faster than the 'int' version.)
RHEL4 2.6.9-22.0.2.ELsmp #1 SMP
gcc 3.4.4 20050721 (Red Hat 3.4.4-2)
gcc -O3
sizeof(int) = 4
sizeof(long) = 8
Loop by int (size=64) : 1.581404 / 1.200408 / 1.202054
Loop by long (size=64) : 0.509141 / 0.535169 / 0.575655
Loop by int (size=256) : 4.138742 / 3.967415 / 3.985982
Loop by long (size=256) : 2.616929 / 2.502939 / 2.447899
Loop by int (size=1024) : 13.035987 / 12.921727 / 13.253955
Loop by long (size=1024) : 8.480915 / 8.916534 / 8.565102
* Pentium 4 1.4GHz (32-bit CPU, so the performance has not changed.)
SuSE 10.0 2.6.11.4-21.9-default #1
gcc 3.3.5 20050117 (prerelease) (SUSE Linux)
gcc -O2
sizeof(int) = 4
sizeof(long) = 4
Loop by int (size=64) : 1.531222 / 1.531955 / 1.515964
Loop by long (size=64) : 1.576709 / 1.526092 / 1.553555
Loop by int (size=256) : 6.036154 / 5.997446 / 5.995208
Loop by long (size=256) : 6.047094 / 6.020463 / 6.111434
Loop by int (size=1024) : 20.973230 / 20.905700 / 20.905927
Loop by long (size=1024) : 20.943455 / 20.911707 / 20.931522
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories