Thread: pgsql: Specialize tuplesort routines for different kinds of abbreviated

pgsql: Specialize tuplesort routines for different kinds of abbreviated

From
John Naylor
Date:
Specialize tuplesort routines for different kinds of abbreviated keys

Previously, the specialized tuplesort routine inlined handling for
reverse-sort and NULLs-ordering but called the datum comparator via a
pointer in the SortSupport struct parameter. Testing has showed that we
can get a useful performance gain by specializing datum comparison for
the different representations of abbreviated keys -- signed and unsigned
64-bit integers and signed 32-bit integers. Almost all abbreviatable data
types will benefit -- the only exception for now is numeric, since the
datum comparison is more complex. The performance gain depends on data
type and input distribution, but often falls in the range of 10-20% faster.

Thomas Munro

Reviewed by Peter Geoghegan, review and performance testing by me

Discussion:
https://www.postgresql.org/message-id/CA%2BhUKGKKYttZZk-JMRQSVak%3DCXSJ5fiwtirFf%3Dn%3DPAbumvn1Ww%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6974924347c908335607a4a2f252213d58e21b7c

Modified Files
--------------
src/backend/access/gist/gistproc.c     |  18 +---
src/backend/access/nbtree/nbtcompare.c |  22 ++---
src/backend/utils/adt/date.c           |  15 +---
src/backend/utils/adt/mac.c            |  23 +----
src/backend/utils/adt/network.c        |  17 +---
src/backend/utils/adt/timestamp.c      |  11 +++
src/backend/utils/adt/uuid.c           |  25 ++----
src/backend/utils/adt/varlena.c        |  34 ++-----
src/backend/utils/sort/tuplesort.c     | 160 ++++++++++++++++++++++++++++++++-
src/include/utils/sortsupport.h        | 115 ++++++++++++++++++++++++
10 files changed, 308 insertions(+), 132 deletions(-)


Re: pgsql: Specialize tuplesort routines for different kinds of abbreviated

From
John Naylor
Date:
So far, kestrel and tamandua don't like this, and they both use
"-fsanitize=undefined,alignment". I'll try to reproduce locally in a
bit.

-- 
John Naylor
EDB: http://www.enterprisedb.com



Re: pgsql: Specialize tuplesort routines for different kinds of abbreviated

From
Andres Freund
Date:
Hi,

On 2022-04-02 16:06:40 +0700, John Naylor wrote:
> So far, kestrel and tamandua don't like this, and they both use
> "-fsanitize=undefined,alignment". I'll try to reproduce locally in a
> bit.

Just hit this in my development environment.

FWIW, I found it hard to work with ubsan without applying 0002 from
https://postgr.es/m/20220323173537.ll7klrglnp4gn2um%40alap3.anarazel.de
I plan to polish that to be able to commit it, but won't have cycles before
the CF ends.

with that applied
CFLAGS=-fsanitize=alignment,undefined,address -fno-sanitize-recover=all
and
export ASAN_OPTIONS="detect_leaks=0:disable_coredump=0:print_stacktrace=1:abort_on_error=1"
UBSAN_OPTIONS="disable_coredump=0:print_stacktrace=1:abort_on_error=1"

I get a backtrace to investigate. Unfortunately abort_on_error=1 also removes
the nicer error message :(. So I ran both.

It triggers reliably on
CLUSTER clstr_expression USING clstr_expression_minus_a;
for me.

/home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:723:23: runtime error: load of value 150, which is not a
validvalue for type '_Bool'
 

#0  __ubsan::__ubsan_handle_load_invalid_value_abort (Data=0x7ff65e8fada0, Val=70) at
../../../../src/libsanitizer/ubsan/ubsan_handlers.cpp:548
#1  0x00007ff65a649093 in qsort_tuple_int32_compare (a=0x62b000fc0240, b=0x62b000fc0258, state=0x625000066a20)
    at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:723
#2  0x00007ff65a64bc7b in qsort_tuple_int32 (data=0x62b000fc0240, n=133, arg=0x625000066a20)
    at /home/andres/src/postgresql/src/include/lib/sort_template.h:313
#3  0x00007ff65a672d59 in tuplesort_sort_memtuples (state=0x625000066a20) at
/home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:3613
#4  0x00007ff65a6617ec in tuplesort_performsort (state=0x625000066a20) at
/home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:2154
#5  0x00007ff6586965dd in heapam_relation_copy_for_cluster (OldHeap=0x7ff64d893308, NewHeap=0x7ff64d896fc0,
OldIndex=0x7ff64d89c860,use_sort=true,
 
    OldestXmin=23706, xid_cutoff=0x7ffd4650a480, multi_cutoff=0x7ffd4650a490, num_tuples=0x7ffd4650a4a0,
tups_vacuumed=0x7ffd4650a4c0,
    tups_recently_dead=0x7ffd4650a4e0) at /home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:955
#6  0x00007ff658d6c4b7 in table_relation_copy_for_cluster (OldTable=0x7ff64d893308, NewTable=0x7ff64d896fc0,
OldIndex=0x7ff64d89c860,use_sort=true,
 
    OldestXmin=23706, xid_cutoff=0x7ffd4650a480, multi_cutoff=0x7ffd4650a490, num_tuples=0x7ffd4650a4a0,
tups_vacuumed=0x7ffd4650a4c0,
    tups_recently_dead=0x7ffd4650a4e0) at /home/andres/src/postgresql/src/include/access/tableam.h:1658
#7  0x00007ff658d728aa in copy_table_data (OIDNewHeap=73728, OIDOldHeap=47052, OIDOldIndex=47078, verbose=false,
pSwapToastByContent=0x7ffd4650a670,
    pFreezeXid=0x7ffd4650a680, pCutoffMulti=0x7ffd4650a690) at
/home/andres/src/postgresql/src/backend/commands/cluster.c:913
#8  0x00007ff658d6ff9e in rebuild_relation (OldHeap=0x7ff64d893308, indexOid=47078, verbose=false)
    at /home/andres/src/postgresql/src/backend/commands/cluster.c:606
#9  0x00007ff658d6e622 in cluster_rel (tableOid=47052, indexOid=47078, params=0x7ffd4650a7b0)
    at /home/andres/src/postgresql/src/backend/commands/cluster.c:427
#10 0x00007ff658d6d774 in cluster (pstate=0x619000001aa0, stmt=0x625000005d00, isTopLevel=true)
    at /home/andres/src/postgresql/src/backend/commands/cluster.c:195
#11 0x00007ff659d425b1 in standard_ProcessUtility (pstmt=0x625000006010,
    queryString=0x625000005220 "CLUSTER clstr_expression USING clstr_expression_minus_a;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL,params=0x0,
 
    queryEnv=0x0, dest=0x6250000060e0, qc=0x7ffd4650ae40) at
/home/andres/src/postgresql/src/backend/tcop/utility.c:862
#12 0x00007ff659d40901 in ProcessUtility (pstmt=0x625000006010, queryString=0x625000005220 "CLUSTER clstr_expression
USINGclstr_expression_minus_a;",
 
    readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x6250000060e0,
qc=0x7ffd4650ae40)

(rr) p/x *b
$4 = {tuple = 0x62500006ba78, datum1 = 0x7ff659cc7e9e, isnull1 = 0x46, srctape = 0x7ff6}

There's definitely something borked - looks like this is ending up with bogus
pointers? Using rr to set a watchpoint on isnull1, and continuing backward I
see the memory written to with the following stack:

(rr) watch -l b->isnull1
Hardware watchpoint 3: -location b->isnull1

(rr) reverse-continue
Continuing.

Hardware watchpoint 3: -location b->isnull1

Old value = 70
New value = 190
0x00007ff65a65fbe4 in puttuple_common (state=0x625000066a20, tuple=0x7ffd4650a150) at
/home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:2000
2000                state->memtuples[state->memtupcount++] = *tuple;
(rr) bt
#0  0x00007ff65a65fbe4 in puttuple_common (state=0x625000066a20, tuple=0x7ffd4650a150) at
/home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:2000
#1  0x00007ff65a65a478 in tuplesort_putheaptuple (state=0x625000066a20, tup=0x625000071d28)
    at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:1810
#2  0x00007ff658696300 in heapam_relation_copy_for_cluster (OldHeap=0x7ff64d893308, NewHeap=0x7ff64d896fc0,
OldIndex=0x7ff64d89c860,use_sort=true,
 
    OldestXmin=23706, xid_cutoff=0x7ffd4650a480, multi_cutoff=0x7ffd4650a490, num_tuples=0x7ffd4650a4a0,
tups_vacuumed=0x7ffd4650a4c0,

(note new/old are kind of switched around due to continuing in reverse)

and then

Hardware watchpoint 3: -location b->isnull1

Old value = 190
New value = false
__memset_evex_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:143
143    ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) bt
#0  __memset_evex_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:143
#1  0x00007ff654d2c3c4 in __asan::Allocator::Allocate (this=this@entry=0x7ff654e3ce00 <__asan::instance>,
size=<optimizedout>, size@entry=24640,
 
    alignment=<optimized out>, alignment@entry=8, stack=stack@entry=0x7ffd465092c0,
alloc_type=alloc_type@entry=__asan::FROM_MALLOC,
    can_fill=can_fill@entry=true) at ../../../../src/libsanitizer/asan/asan_allocator.cpp:598
#2  0x00007ff654d28a9b in __asan::asan_malloc (size=size@entry=24640, stack=stack@entry=0x7ffd465092c0)
    at ../../../../src/libsanitizer/asan/asan_allocator.cpp:964
#3  0x00007ff654db998f in __interceptor_malloc (size=24640) at
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:70
#4  0x00007ff65a5c4f37 in AllocSetAlloc (context=0x625000066900, size=24576) at
/home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:740
#5  0x00007ff65a60e385 in palloc (size=24576) at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:1082
#6  0x00007ff65a6513dc in tuplesort_begin_batch (state=0x625000066a20) at
/home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:963
#7  0x00007ff65a6501d9 in tuplesort_begin_common (workMem=65536, coordinate=0x0, randomAccess=false)
    at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:878


Greetings,

Andres Freund



Re: pgsql: Specialize tuplesort routines for different kinds of abbreviated

From
Andres Freund
Date:
Hi,

On 2022-04-02 13:15:57 -0700, Andres Freund wrote:
> I get a backtrace to investigate. Unfortunately abort_on_error=1 also removes
> the nicer error message :(. So I ran both.

This bit was just me being stupid and missing the error message...


> with that applied
> CFLAGS=-fsanitize=alignment,undefined,address -fno-sanitize-recover=all
> and
> export ASAN_OPTIONS="detect_leaks=0:disable_coredump=0:print_stacktrace=1:abort_on_error=1"
UBSAN_OPTIONS="disable_coredump=0:print_stacktrace=1:abort_on_error=1"

Forgot to add the warning that disable_coredump=0 on non-linux (or old linux)
systems can end up generating humongous core files...


To be able to run rr with ubsan/asan one needs verify_asan_link_order=0 in
ASAN_OPTIONS btw.

Greetings,

Andres Freund



Re: pgsql: Specialize tuplesort routines for different kinds of abbreviated

From
Andres Freund
Date:
Hi,

On 2022-04-02 13:15:57 -0700, Andres Freund wrote:
> There's definitely something borked - looks like this is ending up with bogus
> pointers? Using rr to set a watchpoint on isnull1, and continuing backward I
> see the memory written to with the following stack:

Looks like it's just plain old uninitialized data. tuplesort_putheaptuple()
doesn't compute SortTuple.{isnull1,datum1}. And comparetup_heap() doesn't rely
on them being set.  But tuplesort_sort_memtuples() will now bypass
comparetup_heap(), so comparetup_heap computing those doesn't help anymore.

Greetings,

Andres Freund



On Sat, 2 Apr 2022 at 21:30, John Naylor <john.naylor@postgresql.org> wrote:
> src/backend/utils/sort/tuplesort.c     | 160 ++++++++++++++++++++++++++++++++-

I was just looking over this change and wondered a few things:

1. Shouldn't ssup_datum_signed_cmp and ssup_datum_int32_cmp be using
DatumGetInt32 and DatumGetInt64?

2. I also feel that it's relevant to comment about 32-bit safety for
usages of these functions.

I'm trying to work out what the point of ssup_datum_signed_cmp is when
SIZEOF_DATUM == 4. As far as I see, we just never use it. However,
looking at btint8sortsupport(), we seem to set the comparator based on
USE_FLOAT8_BYVAL rather than SIZEOF_DATUM. It's true in master that
USE_FLOAT8_BYVAL will be 1 if SIZEOF_VOIDP >= 8, per:

#if SIZEOF_VOID_P >= 8
#define USE_FLOAT8_BYVAL 1
#endif


#define SIZEOF_DATUM SIZEOF_VOID_P

This is hypothetical, but if for some reason SIZEOF_VOIDP was larger
than 8, say 16, then the above would define USE_FLOAT8_BYVAL resulting
timestamp and bigint using the new comparators. However, the code
you've added to ssup_datum_signed_cmp checks for SIZEOF_DATUM == 8. It
would assume 32-bit accidentally. That would cause issues.

From what I can see, ssup_datum_signed_cmp just shouldn't exist in
32-bit builds and we should be consistent about how we determine when
comparators to use.

I've attached a patch which is along the lines of how I imagined this
should look.

What do you think?

David

Attachment
On Tue, May 10, 2022 at 3:46 PM David Rowley <dgrowleyml@gmail.com> wrote:

> I was just looking over this change and wondered a few things:
>
> 1. Shouldn't ssup_datum_signed_cmp and ssup_datum_int32_cmp be using
> DatumGetInt32 and DatumGetInt64?

Right.

> This is hypothetical, but if for some reason SIZEOF_VOIDP was larger
> than 8, say 16, then the above would define USE_FLOAT8_BYVAL resulting
> timestamp and bigint using the new comparators. However, the code
> you've added to ssup_datum_signed_cmp checks for SIZEOF_DATUM == 8. It
> would assume 32-bit accidentally. That would cause issues.

Testing for Datum size seems more in line with the intent. (Even aside
from that hypothetical, making all Datums 8 bytes has been suggested
before, and this might make that change easier.)

> From what I can see, ssup_datum_signed_cmp just shouldn't exist in
> 32-bit builds and we should be consistent about how we determine when
> comparators to use.
>
> I've attached a patch which is along the lines of how I imagined this
> should look.
>
> What do you think?

+1

-- 
John Naylor
EDB: http://www.enterprisedb.com



On Tue, 10 May 2022 at 21:45, John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Tue, May 10, 2022 at 3:46 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > What do you think?
>
> +1

Thanks for having a look.

I pushed this after I made a small adjustment to #ifdef out the
qsort_tuple_signed sort specialization in 32-bit builds.  On testing a
32-bit build with the patch I proposed, I was getting a warning about
that function being unused.

I found the 32-bit postgres binary is now an entire 4376 bytes smaller!

David



On Wed, May 11, 2022 at 6:43 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> I pushed this after I made a small adjustment to #ifdef out the
> qsort_tuple_signed sort specialization in 32-bit builds.  On testing a
> 32-bit build with the patch I proposed, I was getting a warning about
> that function being unused.

Earlier I looked at your patch, but didn't think to check the rest of
the code affected by this commit. Do we also need something like the
attached, for the ApplyXYZSortComparator functions? (I don't have a
32-bit platform to test on)

-- 
John Naylor
EDB: http://www.enterprisedb.com

Attachment
On Wed, 11 May 2022 at 20:42, John Naylor <john.naylor@enterprisedb.com> wrote:
> Earlier I looked at your patch, but didn't think to check the rest of
> the code affected by this commit. Do we also need something like the
> attached, for the ApplyXYZSortComparator functions? (I don't have a
> 32-bit platform to test on)

I didn't notice those.

I'm pretty sure those macros need to be used any time we want to
convert a Datum into a C type.

David



Re: pgsql: Specialize tuplesort routines for different kinds of abbreviated

From
Michael Paquier
Date:
On Wed, May 11, 2022 at 03:42:05PM +0700, John Naylor wrote:
> Earlier I looked at your patch, but didn't think to check the rest of
> the code affected by this commit. Do we also need something like the
> attached, for the ApplyXYZSortComparator functions? (I don't have a
> 32-bit platform to test on)

You could use for example gcc's -m32 to do that even on a 64b
platform, no?
--
Michael

Attachment
On Wed, 11 May 2022 at 20:42, John Naylor <john.naylor@enterprisedb.com> wrote:
> (I don't have a
> 32-bit platform to test on)

I had to do as Michael mentioned yesterday to test this.

FWIW, I compiled and ran the tests on my machine with -m32 and no
compile or test failures.  Also, looking at the patch, it seems fine
to me. Do you want to push it?

David



On Thu, May 12, 2022 at 4:50 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 11 May 2022 at 20:42, John Naylor <john.naylor@enterprisedb.com> wrote:
> > (I don't have a
> > 32-bit platform to test on)
>
> I had to do as Michael mentioned yesterday to test this.
>
> FWIW, I compiled and ran the tests on my machine with -m32 and no
> compile or test failures.  Also, looking at the patch, it seems fine
> to me. Do you want to push it?

Thanks for looking -- I'll push soon after experimenting a bit with
-m32 for education's sake.

--
John Naylor
EDB: http://www.enterprisedb.com