Re: Faster StrNCpy - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Faster StrNCpy
Date
Msg-id 13179.1159813811@sss.pgh.pa.us
In response to Re: Faster StrNCpy  (mark@mark.mielke.cc)
Responses Re: Faster StrNCpy  (mark@mark.mielke.cc)
Re: Faster StrNCpy  ("Sergey E. Koposov" <math@sai.msu.ru>)
List pgsql-hackers
mark@mark.mielke.cc writes:
> Here is the cache hit case including your strlen+memcpy as 'LENCPY':

> $ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 -o x x.c y.c strlcpy.c; ./x

> NONE:        696157 us
> MEMCPY:      825118 us
> STRNCPY:    7983159 us
> STRLCPY:   10787462 us
> LENCPY:     6048339 us

It appears that these results are a bit platform-dependent; on my x86_64
(Xeon) Fedora 5 box, I get

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE:        358679 us
MEMCPY:      619255 us
STRNCPY:    8932551 us
STRLCPY:    9212371 us
LENCPY:    13910413 us

I'm not sure why the lencpy method sucks so badly on this machine :-(.
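
For readers without x.c at hand, here is a minimal sketch of what a strlen+memcpy copy could look like. The actual LENCPY code from the benchmark is not shown in this message, so the function name lencpy and its exact shape are assumptions; it is only meant to illustrate the approach, which scans the source once for its length and then copies it in a second pass.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (the name lencpy is an assumption, not taken
 * from x.c): copy src into a buffer of siz bytes using strlen + memcpy.
 * The result is always NUL-terminated when siz > 0, and the return
 * value is strlen(src), the same convention strlcpy uses.
 */
size_t
lencpy(char *dst, const char *src, size_t siz)
{
    size_t len = strlen(src);       /* first pass: measure the source */

    if (siz != 0) {
        /* Copy the whole string if it fits, else truncate to siz - 1 */
        size_t n = (len < siz) ? len : siz - 1;

        memcpy(dst, src, n);        /* second pass: copy the bytes */
        dst[n] = '\0';
    }
    return len;
}
```

A caller can detect truncation the same way as with strlcpy, by checking whether the return value is >= the buffer size.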

Anyway, I looked at glibc's strncpy and determined that on this machine
the only real optimization that's been done to it is to unroll the data
copying loop four times.  I did the same to strlcpy (attached) and got
numbers like these:

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE:        359317 us
MEMCPY:      619636 us
STRNCPY:    8933507 us
STRLCPY:    7644576 us
LENCPY:    13917927 us
$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN="(1024*1024)" x.c y.c strlcpy.c
$ ./a.out
NONE:        502960 us
MEMCPY:     5382528 us
STRNCPY:    9733890 us
STRLCPY:    8740892 us
LENCPY:    15358616 us
$ gcc -O3 -std=c99 -DSTRING='"short"' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE:        358426 us
MEMCPY:      618533 us
STRNCPY:    6704926 us
STRLCPY:     867336 us
LENCPY:    10115883 us
$ gcc -O3 -std=c99 -DSTRING='"short"' -DN="(1024*1024)" x.c y.c strlcpy.c
$ ./a.out
NONE:        502746 us
MEMCPY:     5365171 us
STRNCPY:    7983610 us
STRLCPY:    5557277 us
LENCPY:    11533066 us

So the unroll seems to get us to the point of not losing compared to the
original strncpy code for any string length, and so I propose doing
that, if it holds up on other architectures.
        regards, tom lane


size_t
strlcpy(char *dst, const char *src, size_t siz)
{
    char *d = dst;
    const char *s = src;
    size_t n = siz;

    /* Copy as many bytes as will fit */
    if (n != 0) {
        while (n > 4) {
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            n -= 4;
        }
        while (--n != 0) {
            if ((*d++ = *s++) == '\0')
                goto done;
        }
    }

    /* Not enough room in dst, add NUL and traverse rest of src */
    if (siz != 0)
        *d = '\0';                /* NUL-terminate dst */
    while (*s++)
        ;

done:
    return(s - src - 1);        /* count does not include NUL */
}
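
When trying the unrolled version on other architectures, it may help to compare its results against a plain byte-at-a-time copy with the same semantics. The following is a sketch of such a reference; the name strlcpy_ref is hypothetical (chosen to avoid clashing with a system strlcpy) and it is not part of the proposed patch.

```c
#include <stddef.h>

/*
 * Byte-at-a-time reference with strlcpy semantics: copy at most
 * siz - 1 bytes, always NUL-terminate when siz > 0, and return
 * strlen(src).  The name strlcpy_ref is an assumption made for
 * this illustration.
 */
size_t
strlcpy_ref(char *dst, const char *src, size_t siz)
{
    size_t i = 0;

    /* Copy until the buffer is one byte short of full or src ends */
    if (siz != 0) {
        while (i < siz - 1 && src[i] != '\0') {
            dst[i] = src[i];
            i++;
        }
        dst[i] = '\0';
    }

    /* Walk the rest of src so the return value is its full length */
    while (src[i] != '\0')
        i++;

    return i;
}
```

Buffer sizes just below, at, and just above a multiple of four are the interesting cases to compare, since that is where the unrolled loop hands over to the trailing byte loop.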

