Re: inline newNode() - Mailing list pgsql-patches
From | Neil Conway |
---|---|
Subject | Re: inline newNode() |
Date | |
Msg-id | 87ptuin5wb.fsf@mailbox.samurai.com |
In response to | Re: inline newNode() (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: inline newNode(); Re: inline newNode(); Re: inline newNode() |
List | pgsql-patches |
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Remember, MemSet was invented only to prevent function call overhead,
> and on my BSD/OS system, len >= 256 is faster with the libc
> memset().

Yes, I remember finding that when I tested MemSet() against memset() for various values of MEMSET_LOOP_LIMIT earlier.

> What really surprised me is that MemSet won on Sparc, where they have an
> assembler language version that looks very similar to the MemSet
> loop.

Well, I'd assume any C library / compiler of half-decent quality on any platform would provide assembly-optimized versions of common stdlib functions like memset().

While playing around with memset() on my machine (P4 running Linux, glibc 2.2.5, GCC 3.2.1pre3), I found the following interesting result. I used this simple benchmark (the same one I posted for the earlier MemSet() thread on -hackers):

```c
#include <string.h>
#include "postgres.h"

/* Let MemSet() use its inline loop for the whole buffer. */
#undef MEMSET_LOOP_LIMIT
#define MEMSET_LOOP_LIMIT BUFFER_SIZE

int
main(void)
{
    char buffer[BUFFER_SIZE];   /* BUFFER_SIZE is supplied with -D */
    long long i;

    for (i = 0; i < 99000000; i++)
    {
        /* this call was varied between MemSet(), memset() and
         * __builtin_memset() for the runs below */
        memset(buffer, 0, sizeof(buffer));
    }
    return 0;
}
```

Compiled with '-DBUFFER_SIZE=256 -O2', I get the following results in seconds:

- MemSet(): ~9.6
- memset(): ~19.5
- __builtin_memset(): ~10.00

So it seems there is a reasonably optimized version of memset() provided by glibc/GCC (not sure which :-) ); it's just a matter of persuading the compiler to let us use it. It's still depressing that it doesn't beat MemSet(), but perhaps __builtin_memset() has better average-case performance over a wider range of memory sizes?[1]

BTW, regarding the newNode() stuff: is it agreed that Bruce's patch is a performance win without too high a code bloat / uglification penalty? If so, is it 7.3 or 7.4 material?

Cheers,

Neil

[1] Not that I really buy that -- for one thing, if the length is constant, as it is in this case, the compiler can substitute an optimized version of the function for the appropriate memory size.
I'm having a little difficulty explaining GCC/glibc's poor performance...

-- 
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
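Bruce's point is that MemSet() exists to avoid the call overhead of memset() for small regions, while libc memset() wins once the length gets large. A minimal sketch of that idea, loosely in the spirit of PostgreSQL's MemSet() macro but not its actual definition: the name sketch_memset_zero and the constant SKETCH_LOOP_LIMIT are hypothetical, and the value 256 is chosen only to echo the crossover Bruce reports on BSD/OS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold: 256 only mirrors the BSD/OS crossover
 * mentioned in the thread; it is not PostgreSQL's configured value. */
#define SKETCH_LOOP_LIMIT 256

/*
 * Zero 'len' bytes starting at 'start' (hypothetical helper).
 *
 * If the region is long-aligned, its length is a multiple of
 * sizeof(long), and it is no longer than SKETCH_LOOP_LIMIT, clear it
 * one long at a time in an inline loop, so no function call is made.
 * Anything else is handed to libc memset().
 *
 * Note: like MemSet(), the loop stores through a long pointer, so it
 * assumes the memory may be accessed as long (e.g. malloc'd storage or
 * a long[] array), or that the code is built with -fno-strict-aliasing.
 */
static inline void
sketch_memset_zero(void *start, size_t len)
{
    if (((uintptr_t) start % sizeof(long)) == 0 &&
        (len % sizeof(long)) == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, 0, len);
}
```

The alignment and length checks are there because the loop writes a whole long per iteration; any region that does not fit that pattern, or is larger than the limit, simply falls through to memset(). Making the helper static inline (or a macro, as PostgreSQL does with MemSet()) is what removes the call overhead in the small case.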
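The benchmark in the post needs postgres.h for MemSet(). A stripped-down, self-contained variant that times only memset() against __builtin_memset() might look like the sketch below. The names bench_fill, keep_buffer and BENCH_BUFFER_SIZE are hypothetical; the empty asm statement is a GCC/Clang-specific trick to stop the compiler from discarding stores to a buffer that is never read; and clock() measures processor time, not wall-clock time.

```c
#include <string.h>
#include <time.h>

/* Matches -DBUFFER_SIZE=256 from the post. */
#define BENCH_BUFFER_SIZE 256

/* Hypothetical helper: an empty asm that takes the buffer address and
 * clobbers memory, so the compiler must assume the buffer is read and
 * cannot drop the preceding stores (GCC/Clang extension). */
static inline void
keep_buffer(void *p)
{
    __asm__ __volatile__("" : : "r"(p) : "memory");
}

/*
 * Time 'iters' zero-fills of a BENCH_BUFFER_SIZE-byte stack buffer.
 * use_builtin == 0 uses memset(); any other value uses
 * __builtin_memset().  Returns elapsed processor time in seconds.
 */
static double
bench_fill(int use_builtin, long iters)
{
    char buffer[BENCH_BUFFER_SIZE];
    clock_t start = clock();
    long i;

    if (use_builtin)
    {
        for (i = 0; i < iters; i++)
        {
            __builtin_memset(buffer, 0, sizeof(buffer));
            keep_buffer(buffer);
        }
    }
    else
    {
        for (i = 0; i < iters; i++)
        {
            memset(buffer, 0, sizeof(buffer));
            keep_buffer(buffer);
        }
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling bench_fill(0, 99000000L) and bench_fill(1, 99000000L) mirrors the loop count used in the post. Newer GCC releases usually treat a plain memset() call as a builtin as well, so both variants may compile to the same code; building with -fno-builtin-memset forces the plain call through the library. No timings are claimed for this sketch; they will depend on the CPU, libc and compiler.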