Re: Fixed length data types issue - Mailing list pgsql-hackers

From Jeremy Drake
Subject Re: Fixed length data types issue
Msg-id Pine.BSO.4.63.0609102205080.7593@resin2.csoft.net
In response to Re: Fixed length data types issue  (Kevin Brown <kevin@sysexperts.com>)
List pgsql-hackers
On Sun, 10 Sep 2006, Kevin Brown wrote:

> Tom Lane wrote:
> > (does anyone know the cost of ntohl() on modern
> > Intel CPUs?)
>
> I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit
> mode, another one with the same processor running in 32-bit mode, a
> third running a Pentium 4 1.5 GHz processor, and a fourth running a
> pair of 2.8 GHz Xeons in hyperthreading mode.
>
> I compiled the test program on the 32-bit systems with the -std=c9x
> option so that the constant would be treated as unsigned.  Other than
> that, the compilation method I used was identical: no optimization,
> since it would skip the loop entirely in the version without the
> ntohl() call.  I compiled it both with and without defining
> CALL_NTOHL, and measured the difference in billed CPU seconds.
>
> Based on the above, on both Athlon 64 systems, each ntohl() invocation
> and assignment takes 1.04 nanoseconds to complete (I presume the
> assignment is to a register, but I'd have to examine the assembly to
> know for sure).  On the 1.5 GHz P4 system, each iteration takes 8.49
> nanoseconds.  And on the 2.8 GHz Xeon system, each iteration takes
> 5.01 nanoseconds.
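
The test program itself is not shown in the thread, so the following is only a hypothetical reconstruction of the kind of loop described above. The function name time_loop and the volatile sink variable are assumptions; CALL_NTOHL is the macro named in the quoted text. The idea is to build it twice, with and without -DCALL_NTOHL, and compare CPU time.

```c
#include <arpa/inet.h>  /* ntohl() */
#include <stdint.h>
#include <time.h>       /* clock(), CLOCKS_PER_SEC */

/* Hypothetical sketch; not the program that produced the quoted numbers. */
static double time_loop(uint32_t iterations)
{
    /* volatile forces the store on every iteration, so the loop
       is not removed even when optimization is enabled */
    volatile uint32_t sink = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < iterations; i++) {
#ifdef CALL_NTOHL
        sink = ntohl(i);   /* measured variant: byte swap plus assignment */
#else
        sink = i;          /* baseline variant: assignment only */
#endif
    }
    (void)sink;

    /* CPU seconds consumed by the loop */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Dividing the difference between the two builds by the iteration count gives a per-call estimate comparable to the figures quoted above.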

Of course, that depends on the particular OS and variant as well.  IIRC,
at some point an instruction was added to the x86 instruction set to do
byte swapping.

This is from /usr/include/netinet/in.h on a Gentoo Linux box with glibc
2.3:

#ifdef __OPTIMIZE__
/* We can optimize calls to the conversion functions.  Either nothing has
   to be done or we are using directly the byte-swapping functions which
   often can be inlined.  */
 
# if __BYTE_ORDER == __BIG_ENDIAN
/* The host byte order is the same as network byte order,
   so these functions are all just identity.  */
# define ntohl(x)       (x)
# define ntohs(x)       (x)
# define htonl(x)       (x)
# define htons(x)       (x)
# else
#  if __BYTE_ORDER == __LITTLE_ENDIAN
#   define ntohl(x)     __bswap_32 (x)
#   define ntohs(x)     __bswap_16 (x)
#   define htonl(x)     __bswap_32 (x)
#   define htons(x)     __bswap_16 (x)
#  endif
# endif
#endif


And from bits/byteswap.h:

/* To swap the bytes in a word the i486 processors and up provide the
   `bswap' opcode.  On i386 we have to use three instructions.  */

#  if !defined __i486__ && !defined __pentium__ && !defined __pentiumpro__ \
      && !defined __pentium4__
#   define __bswap_32(x)                                                      \
     (__extension__                                                           \
      ({ register unsigned int __v, __x = (x);                                \
         if (__builtin_constant_p (__x))                                      \
           __v = __bswap_constant_32 (__x);                                   \
         else                                                                 \
           __asm__ ("rorw $8, %w0;"                                           \
                    "rorl $16, %0;"                                           \
                    "rorw $8, %w0"                                            \
                    : "=r" (__v)                                              \
                    : "0" (__x)                                               \
                    : "cc");                                                  \
         __v; }))

#  else
#   define __bswap_32(x)                                                      \
     (__extension__                                                           \
      ({ register unsigned int __v, __x = (x);                                \
         if (__builtin_constant_p (__x))                                      \
           __v = __bswap_constant_32 (__x);                                   \
         else                                                                 \
           __asm__ ("bswap %0" : "=r" (__v) : "0" (__x));                     \
         __v; }))

#  endif
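
For readers who do not want to trace the inline assembly, here is a rough portable C sketch of what the two variants compute. The function names (swap32_portable, swap32_three_rotates, ror16, ror32) are made up for illustration and are not glibc names; the second function only mimics the rorw/rorl/rorw sequence step by step and is not how glibc implements it.

```c
#include <stdint.h>

/* Hypothetical helper: same result as a single bswap on a 32-bit value. */
static uint32_t swap32_portable(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Rotate right; n must be 1..15 for ror16 and 1..31 for ror32. */
static uint16_t ror16(uint16_t v, unsigned n)
{
    return (uint16_t)((v >> n) | (v << (16 - n)));
}

static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Mimics the i386 fallback: swap the low two bytes, exchange the
   16-bit halves, then swap the (new) low two bytes. */
static uint32_t swap32_three_rotates(uint32_t x)
{
    x = (x & 0xffff0000u) | ror16((uint16_t)(x & 0xffffu), 8); /* rorw $8, %w0 */
    x = ror32(x, 16);                                           /* rorl $16, %0 */
    x = (x & 0xffff0000u) | ror16((uint16_t)(x & 0xffffu), 8); /* rorw $8, %w0 */
    return x;
}
```

Both functions turn 0x11223344 into 0x44332211, which is the conversion ntohl() performs on a little-endian host.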


/me searches around his hard drive for the ia32 developers reference

BSWAP
Opcode      Instruction    Description
0F C8+rd    BSWAP r32      Reverse the byte order of a 32-bit register

...

The BSWAP instruction is not supported on IA-32 processors earlier than
the Intel486 processor family. ...


I have read some odd stuff about instructions like these.  Apparently the
fact that this is a "prefixed instruction" (the 0F byte at the beginning)
costs an extra clock cycle, so although this instruction should take 1
cycle, it ends up taking 2.  I am unclear whether or not this is rectified
in later Pentium chips.

So to answer the question about how much ntohl costs on recent Intel
boxes, a properly optimized build with a friendly libc like I quoted
should be able to do it in 2 cycles.
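
As a rough cross-check against the timings quoted at the top, nanoseconds can be converted to clock cycles by multiplying by the clock rate in GHz. The helper below is a hypothetical sketch (ns_to_cycles is not from any library), and it assumes the nominal clock speeds quoted above are the speeds the CPUs actually ran at.

```c
/* Hypothetical helper: cycles = nanoseconds * clock rate in GHz,
   since 1 GHz is one cycle per nanosecond. */
static double ns_to_cycles(double nanoseconds, double clock_ghz)
{
    return nanoseconds * clock_ghz;
}
```

With the quoted figures, ns_to_cycles(1.04, 2.0) gives about 2.08 cycles for the Athlon 64, ns_to_cycles(8.49, 1.5) about 12.7 for the P4, and ns_to_cycles(5.01, 2.8) about 14.0 for the Xeon.  Those measurements came from unoptimized builds, so they are not necessarily what an inlined bswap would cost.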


-- 
In Ohio, if you ignore an orator on Decoration day to such an extent as
to publicly play croquet or pitch horseshoes within one mile of the
speaker's stand, you can be fined $25.00.

