Re: Faster StrNCpy - Mailing list pgsql-hackers
From | mark@mark.mielke.cc |
---|---|
Subject | Re: Faster StrNCpy |
Date | |
Msg-id | 20060929212331.GB30048@mark.mielke.cc |
In response to | Re: Faster StrNCpy (mark@mark.mielke.cc) |
Responses | Re: Faster StrNCpy (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
If anybody is curious, here are my numbers for an AMD X2 3800+:

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be slow."' -o x x.c y.c strlcpy.c ; ./x
NONE: 620268 us
MEMCPY: 683135 us
STRNCPY: 7952930 us
STRLCPY: 10042364 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -o x x.c y.c strlcpy.c ; ./x
NONE: 554694 us
MEMCPY: 691390 us
STRNCPY: 7759933 us
STRLCPY: 3710627 us

$ gcc -O3 -std=c99 -DSTRING='""' -o x x.c y.c strlcpy.c ; ./x
NONE: 631266 us
MEMCPY: 775340 us
STRNCPY: 7789267 us
STRLCPY: 550430 us

Each invocation represents 100 million calls to each of the functions. Each function accepts a 'dst' and 'src' argument, and assumes that it is copying 64 bytes from 'src' to 'dst'. The none function does nothing. The memcpy function calls memcpy(), the strncpy function calls strncpy(), and the strlcpy function calls the strlcpy() that was posted from the BSD sources. (GLIBC doesn't have strlcpy() on my machine.)

This makes it clear how much overhead the additional logic adds. memcpy() costs approximately the same as doing nothing at all. strncpy() is always expensive. strlcpy() is often more expensive than memcpy(), except in the empty string case.

These tests do not properly model the effects of real memory; however, they do model the effects of cache memory. I would suggest that the results are exaggerated, but not invalid.

For anybody doubting the none vs memcpy result, I've included the generated assembly code. I chalk it entirely up to fully utilizing the parallelization capability of the CPU. Although 16 movq instructions are executed, they can be executed fully in parallel.

It seems fairly clear to me that all of these operations are pretty fast. Are we sure this is a real bottleneck? Even the slowest operation above, strlcpy() on a very long string, appears to execute about 10 calls per microsecond. Perhaps my tests are too easy for my CPU and I need to make it access many different 64-byte blocks?
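For anyone who wants to try something similar, here is a minimal sketch of the kind of harness described above. It is not the x.c / y.c / strlcpy.c from this test, which are not reproduced here. The names BUF_SIZE, fill_source, sketch_strlcpy, copy_* and time_calls are all hypothetical, sketch_strlcpy is a simple stand-in with strlcpy()-like semantics rather than the BSD source, and the volatile function pointer is an assumed substitute for putting the copy functions in a separate file to stop the compiler from inlining them.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE 64   /* each call copies into a 64-byte destination */

/* Zero-pad the source so that a fixed 64-byte memcpy() never reads
 * past the end of the test string. */
void fill_source(char src[BUF_SIZE], const char *text)
{
    memset(src, 0, BUF_SIZE);
    strncpy(src, text, BUF_SIZE - 1);
}

/* Stand-in with strlcpy()-like semantics (NOT the BSD implementation):
 * copies at most size-1 bytes, always NUL-terminates when size > 0,
 * and returns strlen(src). */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size != 0) {
        size_t n = (len >= size) ? size - 1 : len;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

typedef void (*copy_fn)(char *dst, const char *src);

/* The four variants under test; each takes only dst and src. */
void copy_none(char *dst, const char *src)    { (void) dst; (void) src; }
void copy_memcpy(char *dst, const char *src)  { memcpy(dst, src, BUF_SIZE); }
void copy_strncpy(char *dst, const char *src) { strncpy(dst, src, BUF_SIZE); }
void copy_strlcpy(char *dst, const char *src) { sketch_strlcpy(dst, src, BUF_SIZE); }

/* Call fn `iterations` times and return the elapsed CPU time in
 * microseconds.  The pointer is volatile so that it is reloaded and
 * called indirectly on every iteration instead of being inlined. */
long time_calls(copy_fn fn, const char *src, long iterations)
{
    char dst[BUF_SIZE];
    volatile copy_fn call = fn;
    clock_t start, end;

    start = clock();
    for (long i = 0; i < iterations; i++)
        call(dst, src);
    end = clock();

    return (long) ((double) (end - start) * 1000000.0 / CLOCKS_PER_SEC);
}
```

To use it, fill a 64-byte source buffer with fill_source() and print time_calls(copy_strncpy, src, 100000000L) and friends from main(). Because the same source and destination are reused on every call, this sketch stays in cache just as the original test did, so it says nothing about real memory behaviour either.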
:-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...

http://mark.mielke.cc/