On Tue, 9 Jun 2020 at 22:08, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
> >>>>> "David" == David Rowley <dgrowleyml@gmail.com> writes:
>
> David> This allows us to speed up a few cases. int2vectorout() should
> David> be faster and int8out() becomes a bit faster if we get rid of
> David> the strdup() call and replace it with a palloc()/memcpy() call.
>
> What about removing the memcpy entirely? I don't think we save anything
> much useful here by pallocing the exact length, rather than doing what
> int4out does and palloc a fixed size and convert the int directly into
> it.

The attached 0001 patch does this.

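To make the difference concrete, here is a minimal standalone sketch of the two shapes being compared. It is not the patch itself: malloc() stands in for palloc(), snprintf() stands in for the integer-to-string conversion, and SKETCH_MAXINT8LEN and both function names are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant: "-9223372036854775808" is 20 characters long. */
#define SKETCH_MAXINT8LEN 20

/* Old shape: convert into a stack buffer, then allocate exactly and copy. */
static char *
int8_to_str_copy(int64_t val)
{
    char        buf[SKETCH_MAXINT8LEN + 1];
    int         len = snprintf(buf, sizeof(buf), "%" PRId64, val);
    char       *result;

    assert(len > 0 && len <= SKETCH_MAXINT8LEN);
    result = malloc((size_t) len + 1);  /* malloc stands in for palloc */
    if (result == NULL)
        return NULL;
    memcpy(result, buf, (size_t) len + 1);  /* the copy being questioned */
    return result;
}

/* New shape: allocate the worst-case size and convert directly into it. */
static char *
int8_to_str_direct(int64_t val)
{
    char       *result = malloc(SKETCH_MAXINT8LEN + 1);

    if (result == NULL)
        return NULL;
    snprintf(result, SKETCH_MAXINT8LEN + 1, "%" PRId64, val);
    return result;
}
```

The second version wastes at most 19 bytes per result but skips the intermediate buffer and the memcpy.
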
To benchmark it, I used:

create table bi (a bigint);
insert into bi select generate_series(1,10000000);
vacuum freeze analyze bi;

query = copy bi to '/dev/null';

This was a 120 second pgbench run. The results are:

GCC   master:       latency average = 1757.556 ms
GCC   master+0001:  latency average = 1588.793 ms (90.4% of master)
clang master:       latency average = 1818.952 ms
clang master+0001:  latency average = 1649.100 ms (90.6% of master)

> For pg_ltoa, etc., I don't like adding the extra call to pg_ultoa_n - at
> least on my clang, that results in two copies of pg_ultoa_n inlined.
> How about doing it like,
>
> int
> pg_lltoa(int64 value, char *a)
> {
>     int         len = 0;
>     uint64      uvalue = value;
>
>     if (value < 0)
>     {
>         uvalue = (uint64) 0 - uvalue;
>         a[len++] = '-';
>     }
>     len += pg_ulltoa_n(uvalue, a + len);
>     a[len] = '\0';
>     return len;
> }

The 0002 patch does it this way.

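For anyone who wants to try the sign handling outside the tree, here is a rough standalone sketch of the approach quoted above. The sketch_ names are made up, and sketch_ulltoa_n() is only a simple digit loop standing in for pg_ulltoa_n(), not the real implementation. The point is that the negation happens in unsigned arithmetic, so the minimum int64 value is handled without overflow and there is a single call to the unsigned conversion.

```c
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for pg_ulltoa_n(): writes the decimal digits of
 * 'value' into 'a' without a NUL terminator and returns the digit count.
 */
static int
sketch_ulltoa_n(uint64_t value, char *a)
{
    char        tmp[20];        /* UINT64_MAX has 20 decimal digits */
    int         n = 0;
    int         i;

    /* Produce digits least-significant first. */
    do
    {
        tmp[n++] = (char) ('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse them into the caller's buffer. */
    for (i = 0; i < n; i++)
        a[i] = tmp[n - 1 - i];

    return n;
}

/*
 * Same shape as the quoted pg_lltoa(): emit '-' if needed, negate in
 * unsigned arithmetic, then make one call to the unsigned conversion.
 * 'a' must have room for at least 21 bytes.
 */
static int
sketch_lltoa(int64_t value, char *a)
{
    int         len = 0;
    uint64_t    uvalue = (uint64_t) value;

    if (value < 0)
    {
        uvalue = (uint64_t) 0 - uvalue; /* well defined for INT64_MIN too */
        a[len++] = '-';
    }
    len += sketch_ulltoa_n(uvalue, a + len);
    a[len] = '\0';
    return len;
}
```
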
David