Re: Fixed length data types issue - Mailing list pgsql-hackers
From | Jeremy Drake |
---|---|
Subject | Re: Fixed length data types issue |
Date | |
Msg-id | Pine.BSO.4.63.0609102205080.7593@resin2.csoft.net |
In response to | Re: Fixed length data types issue (Kevin Brown <kevin@sysexperts.com>) |
List | pgsql-hackers |
On Sun, 10 Sep 2006, Kevin Brown wrote:

> Tom Lane wrote:
> > (does anyone know the cost of ntohl() on modern
> > Intel CPUs?)
>
> I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit
> mode, another one with the same processor running in 32-bit mode, a
> third running a Pentium 4 1.5 GHz processor, and a fourth running a
> pair of 2.8 GHz Xeons in hyperthreading mode.
>
> I compiled the test program on the 32-bit systems with the -std=c9x
> option so that the constant would be treated as unsigned. Other than
> that, the compilation method I used was identical: no optimization,
> since it would skip the loop entirely in the version without the
> ntohl() call. I compiled it both with and without defining
> CALL_NTOHL, and measured the difference in billed CPU seconds.
>
> Based on the above, on both Athlon 64 systems, each ntohl() invocation
> and assignment takes 1.04 nanoseconds to complete (I presume the
> assignment is to a register, but I'd have to examine the assembly to
> know for sure). On the 1.5 GHz P4 system, each iteration takes 8.49
> nanoseconds. And on the 2.8 GHz Xeon system, each iteration takes
> 5.01 nanoseconds.

Of course, that depends on the particular OS and variant as well. IIRC,
at some point an instruction was added to the x86 instruction set to do
byte swapping.

This is from /usr/include/netinet/in.h on a Gentoo Linux box with
glibc 2.3:

#ifdef __OPTIMIZE__
/* We can optimize calls to the conversion functions. Either nothing has
   to be done or we are using directly the byte-swapping functions which
   often can be inlined. */
# if __BYTE_ORDER == __BIG_ENDIAN
/* The host byte order is the same as network byte order,
   so these functions are all just identity. */
# define ntohl(x) (x)
# define ntohs(x) (x)
# define htonl(x) (x)
# define htons(x) (x)
# else
#  if __BYTE_ORDER == __LITTLE_ENDIAN
#   define ntohl(x) __bswap_32 (x)
#   define ntohs(x) __bswap_16 (x)
#   define htonl(x) __bswap_32 (x)
#   define htons(x) __bswap_16 (x)
#  endif
# endif
#endif

And from bits/byteswap.h:

/* To swap the bytes in a word the i486 processors and up provide the
   `bswap' opcode. On i386 we have to use three instructions. */
# if !defined __i486__ && !defined __pentium__ && !defined __pentiumpro__ \
     && !defined __pentium4__
#  define __bswap_32(x) \
     (__extension__ \
      ({ register unsigned int __v, __x = (x); \
         if (__builtin_constant_p (__x)) \
           __v = __bswap_constant_32 (__x); \
         else \
           __asm__("rorw $8, %w0;" \
                   "rorl $16, %0;" \
                   "rorw $8, %w0" \
                   : "=r" (__v) \
                   : "0" (__x) \
                   : "cc"); \
         __v; }))
# else
#  define __bswap_32(x) \
     (__extension__ \
      ({ register unsigned int __v, __x = (x); \
         if (__builtin_constant_p (__x)) \
           __v = __bswap_constant_32 (__x); \
         else \
           __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); \
         __v; }))
# endif

/me searches around his hard drive for the ia32 developers reference

BSWAP

Opcode      Instruction   Description
0F C8+rd    BSWAP r32     Reverse the byte order of a 32-bit register

...
The BSWAP instruction is not supported on IA-32 processors earlier than
the Intel486 processor family.
...

I have read some odd stuff about instructions like these. Apparently the
fact that this is a "prefixed instruction" (the 0F byte at the
beginning) costs an extra clock cycle, so though this instruction should
take 1 cycle, it ends up taking 2. I am unclear whether or not this is
rectified in later Pentium chips.

So to answer the question about how much ntohl costs on recent Intel
boxes, a properly optimized build with a friendly libc like the one I
quoted should be able to do it in 2 cycles.
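
For anyone who wants to convince themselves that the three-rotate i386 fallback and the single bswap compute the same thing, here is a rough portable-C sketch. It is not taken from glibc: the function names (swap32_shifts, ror16_low, ror32_by16, swap32_i386_style, swap32_selfcheck) are made up for illustration, and it models the instructions with plain shifts rather than inline assembly. On a little-endian host, ntohl() should give the same result as either function.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte reversal with shifts and masks. This is the
   result a single `bswap` produces on i486 and later. */
static uint32_t swap32_shifts(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Model of `rorw $8, %w0`: rotate only the low 16 bits right by 8,
   leaving the high 16 bits untouched. */
static uint32_t ror16_low(uint32_t x)
{
    uint32_t lo = x & 0xffffu;
    lo = ((lo >> 8) | (lo << 8)) & 0xffffu;
    return (x & 0xffff0000u) | lo;
}

/* Model of `rorl $16, %0`: rotate all 32 bits right by 16. */
static uint32_t ror32_by16(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* The three-instruction i386 sequence, one step per line. */
static uint32_t swap32_i386_style(uint32_t x)
{
    x = ror16_low(x);   /* AABBCCDD -> AABBDDCC */
    x = ror32_by16(x);  /* AABBDDCC -> DDCCAABB */
    x = ror16_low(x);   /* DDCCAABB -> DDCCBBAA */
    return x;
}

/* Hypothetical self-check: both approaches must agree, and swapping
   twice must give back the original value. */
static void swap32_selfcheck(void)
{
    assert(swap32_shifts(0x11223344u) == 0x44332211u);
    assert(swap32_i386_style(0x11223344u) == 0x44332211u);
    assert(swap32_shifts(swap32_shifts(0xdeadbeefu)) == 0xdeadbeefu);
}
```

Calling swap32_selfcheck() from a main() is enough to exercise it. This only demonstrates equivalence of the results; it says nothing about cycle counts, which depend on what the compiler actually emits.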
-- In Ohio, if you ignore an orator on Decoration day to such an extent as to publicly play croquet or pitch horseshoes within one mile of the speaker's stand, you can be fined $25.00.