Re: Considering additional sort specialisation functions for PG16 - Mailing list pgsql-hackers

From David Rowley
Subject Re: Considering additional sort specialisation functions for PG16
Date
Msg-id CAApHDvqXcmzAZDsj8iQs84+drzo0PgYub_QYzT6Zdj6p4A3A5A@mail.gmail.com
In response to Re: Considering additional sort specialisation functions for PG16  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: Considering additional sort specialisation functions for PG16
List pgsql-hackers
On Thu, 26 Jan 2023 at 23:29, John Naylor <john.naylor@enterprisedb.com> wrote:
> Coming back to this, I wanted to sketch out this idea in a bit more detail.
>
> Have two memtuple arrays, one for first sortkey null and one for first sortkey non-null:
> - Qsort the non-null array, including whatever specialization is available. Existing specialized comparators could ignore nulls (and their ordering), taking less space in the binary.
> - Only if there is more than one sort key, qsort the null array. Ideally at some point we would have a method of ignoring the first sortkey (this is an existing opportunity that applies elsewhere as well).
> - To handle two arrays, grow_memtuples() would need some adjustment, as would any callers that read the final result of an in-memory sort -- they would need to retrieve the tuples starting with the appropriate array depending on NULLS FIRST/LAST behavior.
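
The two-array scheme quoted above could be sketched roughly as follows. This is only an illustrative Python sketch of the algorithm, not tuplesort.c's actual API; the function name and tuple representation are invented for the example, with None standing in for a NULL first sort key.

```python
# Illustrative sketch of the two-memtuple-array idea; names and structure
# are hypothetical, not PostgreSQL's actual tuplesort implementation.

def sort_two_arrays(tuples, nulls_first=False):
    """Partition tuples on whether the first sort key is NULL (None here),
    then sort each partition independently."""
    nonnull = [t for t in tuples if t[0] is not None]
    null = [t for t in tuples if t[0] is None]

    # Specialized path: the first key is known non-NULL in this array,
    # so the comparator never needs a NULL check on it.
    nonnull.sort()

    # The first key is NULL for every tuple here, so only the remaining
    # sort keys matter; sort on the trailing columns.
    null.sort(key=lambda t: t[1:])

    # Callers read the arrays in NULLS FIRST/LAST order.
    return null + nonnull if nulls_first else nonnull + null
```

A reader of the final result then simply consumes one array, then the other, which is the adjustment to in-memory-sort callers the quoted text mentions.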

Thanks for coming back to this. I've been thinking about this again
recently due to what was discovered in [1].  Basically, the patch on
that thread is trying to eliminate the need for the ORDER BY sort in a
query such as: SELECT a,b,row_number() over (order by a) from ab order
by a,b;  the idea is to just perform the full sort on a,b for the
WindowClause, saving us from having to do an Incremental Sort on b for
the ORDER BY after evaluating the window funcs.  Surprisingly (for
me), we found a bunch of cases where it performs better to sort on
some of the keys, then do an Incremental Sort on the remainder, rather
than just doing a single sort on everything.

I don't fully understand why this is yet, but one theory I have is
that it might be down to work_mem being larger than the CPU's L3 cache
and causing more cache line misses.  With the simpler sort, there's
less swapping of items in the array because the comparison function
sees tuples as equal more often.  I did find that the gap between the
two is not as large with fewer tuples to sort.

I think the slower sorts I found in [2] could also be partially caused
by the current sort specialisation comparators re-comparing the
leading column during a tie-break. I've not gotten around to disabling
the sort specialisations to see if and how much this is a factor for
that test.
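
The tie-break concern can be illustrated with a small sketch. This is a hypothetical comparator in Python, not the actual sort specialisation code: the point is that once the fast path has established the leading keys are equal, the tie-break ought to start from the second key rather than re-comparing the first.

```python
# Hypothetical comparator sketch; PostgreSQL's specialised comparators
# are C functions in tuplesort.c, this only illustrates the shape.

def cmp_specialized(a, b):
    """Compare two tuples: fast path on the leading key, tie-break on
    the rest. Returns -1, 0, or 1 like a C qsort comparator."""
    # Fast path: specialised comparison of the leading column.
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    # Tie-break: column 0 is already known equal, so skip it instead of
    # re-comparing it (the re-comparison is the suspected overhead above).
    for x, y in zip(a[1:], b[1:]):
        if x != y:
            return -1 if x < y else 1
    return 0
```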

The reason I'm bringing this up here is that I wondered, in addition
to what you're mentioning above: if we're making changes to allow
multiple in-memory arrays anyway, would it be worth going a little
further and allowing N arrays, which we'd try to keep under the L3
cache size?  Maybe some new GUC could be used to set a good value.
With fast enough disks, it's often faster to use smaller values of
work_mem which don't exceed L3 so as to keep these batches smaller.
Keeping them in memory would remove the need to write out tapes.
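
The N-array idea might look something like the following sketch. Everything here is illustrative: "l3_batch_size" stands in for the hypothetical GUC-derived limit, and the merge is done with Python's heapq rather than anything resembling PostgreSQL's tape merge.

```python
import heapq

# Illustrative sketch only: l3_batch_size stands in for a hypothetical
# GUC-derived batch size; this is not how tuplesort.c works today.

def batched_sort(tuples, l3_batch_size=4):
    """Sort fixed-size in-memory batches independently, then merge them.

    Each batch is kept small (notionally under L3 cache size) so sorting
    each run stays cache-resident; the final pass is a k-way merge of the
    sorted in-memory runs, with no tapes written out."""
    runs = []
    for start in range(0, len(tuples), l3_batch_size):
        runs.append(sorted(tuples[start:start + l3_batch_size]))
    # k-way merge of the sorted runs, analogous to a tape merge but
    # performed entirely in memory.
    return list(heapq.merge(*runs))
```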

I don't really know if such a feature could easily be tacked on to
what you propose above, but I feel like what you're talking about is
part of the way towards that.  At the least, the additional work could
be kept in mind when this is written so that it's easier to extend in
the future.

David

[1] https://postgr.es/m/CAApHDvpAO5H_L84kn9gCJ_hihOavtmDjimKYyftjWtF69BJ=8Q@mail.gmail.com
[2] https://postgr.es/m/CAApHDvqh%2BqOHk4sbvvy%3DQr2NjPqAAVYf82oXY0g%3DZ2hRpC2Vmg%40mail.gmail.com


