Re: Performance of hand-optimised assembly
"BartC" <bc@freeuk.com> wrote in message
news:QxgNq.158682$4a.110454@newsfe04.ams2...
> "Ben Bacarisse" <ben.usenet@bsb.me.uk> wrote in message
>> void ShiftVector(unsigned long long vector[static 8], int AmountToShift)
>> {
>> int rest = 64 - AmountToShift;
>> vector[0] = (vector[0] << AmountToShift) | (vector[1] >> rest);
....
> I've tested this in 32-bit mode.
> However, gcc -O3 took 1.4 to 1.6 seconds (and 0.7 seconds for exactly a
> 32-bit shift).
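The quoted function is elided after its first statement. Below is a minimal loop-based sketch of the same technique. It assumes, reading the first line, that vector[0] holds the most significant 64 bits and that zeros shift in at vector[7]. It is a reconstruction for illustration, not Ben's unrolled original.

```c
#include <assert.h>

/* Shift a 512-bit value, stored as eight 64-bit words with vector[0] the most
   significant, left by AmountToShift bits. Loop form of the quoted code; the
   handling of vector[7] is inferred, since the original is elided. */
void ShiftVector(unsigned long long vector[static 8], int AmountToShift)
{
    /* Shifting a 64-bit value by 64 is undefined, so the amount must be 1..63. */
    assert(AmountToShift >= 1 && AmountToShift <= 63);
    int rest = 64 - AmountToShift;

    /* Each word keeps its own bits shifted up, plus the top bits of the next,
       less significant word. */
    for (int i = 0; i < 7; i++)
        vector[i] = (vector[i] << AmountToShift) | (vector[i + 1] >> rest);

    /* The least significant word has no neighbour, so zeros shift in. */
    vector[7] <<= AmountToShift;
}
```

Because `rest` must stay below 64, a shift of 0 (or a whole-word shift of 64) would need a separate path.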
gcc -O3 was evidently exploiting the repetitive nature of my simple
benchmark.
Varying the shift amount on each iteration soon put paid to that!
Timings for 100 million iterations, with the shift amount varying from 1 to
63 bits, are now:
  gcc -O3          4.7 seconds
  lcc-win32 -O     6.6 seconds
  PellesC -Ot      6.8 seconds
  DMC -o          12.4 seconds
  My ASM           3.2 seconds
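A harness along these lines would reproduce the varying-shift measurement. It is a sketch, since the original benchmark wasn't posted: the names bench_varying_shift and time_varying_shift, the starting vector and the XOR checksum are all assumptions. Cycling the shift amount through 1..63 is meant to stop the compiler specialising the loop for one constant shift, and returning a checksum stops it discarding the work.

```c
#include <assert.h>
#include <time.h>

/* Loop form of the quoted ShiftVector; vector[0] is the most significant word. */
static void ShiftVector(unsigned long long vector[static 8], int AmountToShift)
{
    int rest = 64 - AmountToShift;  /* caller guarantees 1..63 */
    for (int i = 0; i < 7; i++)
        vector[i] = (vector[i] << AmountToShift) | (vector[i + 1] >> rest);
    vector[7] <<= AmountToShift;
}

/* Hypothetical benchmark body: run `iterations` shifts, cycling the amount
   through 1..63, and return an XOR of the final words so the work is used. */
unsigned long long bench_varying_shift(long iterations)
{
    assert(iterations >= 0);
    unsigned long long v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    for (long i = 0; i < iterations; i++)
        ShiftVector(v, (int)(i % 63) + 1);

    unsigned long long checksum = 0;
    for (int k = 0; k < 8; k++)
        checksum ^= v[k];
    return checksum;
}

/* CPU seconds taken by bench_varying_shift(iterations). */
double time_varying_shift(long iterations)
{
    clock_t start = clock();
    volatile unsigned long long sink = bench_varying_shift(iterations);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_varying_shift(100000000) from main gives a figure of the same kind as those in the table, though the actual numbers depend on the machine and compiler.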
(The Asm could do with some further work; this is just a first draft, but
I'm not going to bother. Having to logically swap alternate 32-bit words to
match the behaviour of the 64-bit code has already done my head in.)
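In 32-bit mode each unsigned long long is held as two 32-bit words. Assuming little-endian storage (as on x86), each element's low half comes first in memory, while the elements themselves run from most to least significant, so the 512-bit value's 32-bit words sit in the order 1, 0, 3, 2, ... That is the alternate-word swap mentioned above: logical word k lives at physical index k ^ 1. The sketch below, using the hypothetical names phys and ShiftVector32, shifts the vector with 32-bit operations only and this mapping; it illustrates the indexing, not the actual assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Physical index of logical 32-bit word k (k = 0 is the most significant word
   of the 512-bit value). Assumes eight little-endian 64-bit elements, most
   significant element first, so the two halves of each element are swapped. */
static int phys(int k) { return k ^ 1; }

/* Shift the value left by n bits (1..63) using only 32-bit word operations.
   Words are processed from most to least significant; each step reads only
   words at or below its own significance, which have not been overwritten. */
void ShiftVector32(uint32_t w[16], int n)
{
    assert(n >= 1 && n <= 63);
    int words = n / 32;   /* whole-word part of the shift (0 or 1) */
    int bits = n % 32;    /* remaining sub-word part (0..31) */

    for (int k = 0; k < 16; k++) {
        uint32_t hi = (k + words < 16) ? w[phys(k + words)] : 0;
        uint32_t lo = (k + words + 1 < 16) ? w[phys(k + words + 1)] : 0;
        /* A 32-bit value cannot be shifted by 32, so bits == 0 is a plain move. */
        w[phys(k)] = bits ? (uint32_t)((hi << bits) | (lo >> (32 - bits))) : hi;
    }
}
```

On a little-endian machine, copying an unsigned long long[8] into a uint32_t[16] with memcpy gives exactly this layout, so the result can be checked against a 64-bit version of the shift.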
So as it stands, the advantage of Asm over gcc -O3 (3.2 against 4.7 seconds,
roughly 1.5 times as fast) is pretty much what you've already found.
--
Bartc