* Tom Lane <tgl@sss.pgh.pa.us> [001210 12:00] wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> >> A good theory, but unfortunately not a correct theory. PA-RISC can do a
> >> circular shift in one cycle using the "shift right double" instruction,
>
> > Interesting. I was under the impression that virtually no RISC CPU had
> > a rotate instruction. Do any others?
>
> Darn if I know. A RISC purist would probably say that PA-RISC isn't all
> that reduced ... for example, the reason it needs six cycles not seven
> for the CRC inner loop is that the LOAD instruction has an option to
> postincrement the pointer register (like a C "*ptr++").
>
> > Same with the x86 core:
> > movb %dl,%al
> > xorb (%ecx),%al
> > andl $255,%eax
> > shrl $8,%edx
> > incl %ecx
> > xorl (%esi,%eax,4),%edx
>
> > On my Celeron, the timing for those six opcodes is almost a whopping 13
> > cycles per byte. Obviously there's some major performance hit to do the
> > memory instructions, because there's no more than 4 cycles' worth of
> > dependent instructions in that snippet.
>
> Yes. It looks like we're looking at pipeline stalls for the memory
> reads. I expect PA-RISC would have the same problem if it were not that
> the CRC table and data buffer are almost certainly loaded into level-2
> cache memory. Curious that you don't get the same result --- what is
> the memory cache architecture on your box?
>
> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...
I would try unrolling the loop some (if possible) and retesting.
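
Roughly what I have in mind, as a sketch only. I'm assuming the usual
table-driven, byte-at-a-time CRC-32 that the x86 snippet above appears to
implement (xor the low byte with the data, shift right 8, xor with a table
entry). The names crc_table, crc_init, CRC_STEP, crc_bytewise and
crc_unrolled are made up for illustration, and the polynomial 0xEDB88320 is
an assumption, not necessarily what the code under test uses. Whether the
unrolled form is actually faster would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table; filled by crc_init() below. */
static uint32_t crc_table[256];

/* Build the table for the reflected CRC-32 polynomial 0xEDB88320 (assumed). */
static void crc_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
}

/* One byte of the inner loop: table lookup, shift, xor, postincrement. */
#define CRC_STEP(crc, p) \
    ((crc) = crc_table[((crc) ^ *(p)++) & 0xff] ^ ((crc) >> 8))

/* Baseline: one byte per loop iteration. */
static uint32_t crc_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        CRC_STEP(crc, p);
    return crc;
}

/* Unrolled by four: same arithmetic, fewer loop-control instructions. */
static uint32_t crc_unrolled(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 4) {
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        len -= 4;
    }
    while (len--)           /* handle the 0..3 leftover bytes */
        CRC_STEP(crc, p);
    return crc;
}
```

Note that each step still depends on the previous crc value, so unrolling
only trims the loop overhead; it does not shorten the dependency chain
through the table lookup. If the stalls really are on the memory reads, the
gain may be small.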
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."