Considering additional sort specialisation functions for PG16 - Mailing list pgsql-hackers

From David Rowley
Subject Considering additional sort specialisation functions for PG16
Date
Msg-id CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw@mail.gmail.com
Whole thread Raw
Responses Re: Considering additional sort specialisation functions for PG16
List pgsql-hackers
Hi,

(Background: 697492434 added 3 new sort functions to remove the
indirect function calls for the comparator function.  This sped up
sorting for various of our built-in data types.)

There was a bit of unfinished discussion around exactly how far to
take these specialisations for PG15.  We could certainly add more.

There are various other things we could do to further speed up sorting
for these datatypes.  One example is, we could add 3 more variations
of these functions that can be called when there are no NULL datums to
sort.  That effectively multiplies the number of specialisations by 2,
or adds another dimension.

I have the following dimensions in mind for consideration:

1. Specialisations to handle sorting of non-null datums (eliminates
checking for nulls in the comparison function)
2. Specialisations to handle single column sorts (eliminates
tiebreaker function call or any checks for existence of tiebreaker)
3. ASC sort (No need for if (ssup->ssup_reverse) INVERT_COMPARE_RESULT(compare))

If we did all of the above then we'd end up with 3 * 2 * 2 * 2 = 24
specialization functions.  That seems a bit excessive. So here I'd
like to discuss which ones we should add, if any.

I've attached a very basic implementation of #1 which adds 3 new
functions for sorting non-null datums.  This could be made a bit more
advanced.  For now, I just added a bool flag to track if we have any
NULL datum1s in memtuples[].  For bounded sorts, we may remove NULLs
from that array, and may end up with no nulls after having seen null.
So maybe a count would be better than a flag.

A quick performance test with 1 million random INTs shows ~6%
performance improvement when there are no nulls.

Master
$ pgbench -n -f bench.sql -T 60 postgres
latency average = 159.837 ms
latency average = 161.193 ms
latency average = 159.512 ms

master + not_null_sort_specializations.patch
$ pgbench -n -f bench.sql -T 60 postgres
latency average = 150.791 ms
latency average = 149.843 ms
latency average = 150.319 ms

I didn't test for any regression when there are NULLs and we're unable
to use the new specializations. I'm hoping the null tracking will be
almost free, but I will need to check.

It's all quite subjective to know which specializations should be
added. I think #1 is likely to have the biggest wins when it can be
used as it removes the most branching in the comparator function,
however the biggest gains are not the only thing to consider. We also
need to consider how commonly these functions will be used. I don't
have any information about that.

David

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: shadow variables - pg15 edition
Next
From: Peter Smith
Date:
Subject: Re: Column Filtering in Logical Replication