Re: Considering additional sort specialisation functions for PG16 - Mailing list pgsql-hackers
From: David Rowley
Subject: Re: Considering additional sort specialisation functions for PG16
Msg-id: CAApHDvqXcmzAZDsj8iQs84+drzo0PgYub_QYzT6Zdj6p4A3A5A@mail.gmail.com
In response to: Re: Considering additional sort specialisation functions for PG16 (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers
On Thu, 26 Jan 2023 at 23:29, John Naylor <john.naylor@enterprisedb.com> wrote:
> Coming back to this, I wanted to sketch out this idea in a bit more detail.
>
> Have two memtuple arrays, one for first sortkey null and one for first sortkey non-null:
> - Qsort the non-null array, including whatever specialization is available. Existing specialized comparators could ignore nulls (and their ordering), taking less space in the binary.
> - Only if there is more than one sort key, qsort the null array. Ideally at some point we would have a method of ignoring the first sortkey (this is an existing opportunity that applies elsewhere as well).
> - To handle two arrays, grow_memtuples() would need some adjustment, as would any callers that read the final result of an in-memory sort -- they would need to retrieve the tuples starting with the appropriate array depending on NULLS FIRST/LAST behavior.

Thanks for coming back to this. I've been thinking about this again recently due to what was discovered in [1]. Basically, the patch on that thread is trying to eliminate the need for the ORDER BY sort in a query such as:

SELECT a,b,row_number() over (order by a) from ab order by a,b;

The idea is to just perform the full sort on a,b for the WindowClause, to save having to do an Incremental Sort on b for the ORDER BY after evaluating the window funcs.

Surprisingly (for me), we found a bunch of cases where it performs better to sort on just some of the keys and then do an Incremental Sort on the remainder, rather than doing a single sort on everything. I don't yet fully understand why this is, but one theory I have is that it might be down to work_mem being larger than the CPU's L3 cache, causing more cache line misses. With the simpler sort, there's less swapping of items in the array because the comparison function sees tuples as equal more often. I did find that the gap between the two is not as large with fewer tuples to sort.

I think the slower sorts I found in [2] could also be partially caused by the current sort specialisation comparators re-comparing the leading column during a tie-break. I've not yet gotten around to disabling the sort specialisations to see if, and by how much, this is a factor in that test.

The reason I'm bringing this up here is that I wondered, in addition to what you're mentioning above, if we're making changes to allow multiple in-memory arrays, whether it would be worth going a little further and allowing N arrays which we'd try to keep under the L3 cache size. Maybe some new GUC could be used to determine a good value for that. With fast enough disks, it's often faster to use smaller values of work_mem which don't exceed L3, to keep these batches smaller; keeping them in memory would remove the need to write out tapes. I don't know exactly whether such a feature could easily be tacked on to what you propose above, but I feel like what you're talking about is part of the way towards that. At the very least, the additional work could be kept in mind when this is written so that it's easier to extend in the future.

David

[1] https://postgr.es/m/CAApHDvpAO5H_L84kn9gCJ_hihOavtmDjimKYyftjWtF69BJ=8Q@mail.gmail.com
[2] https://postgr.es/m/CAApHDvqh%2BqOHk4sbvvy%3DQr2NjPqAAVYf82oXY0g%3DZ2hRpC2Vmg%40mail.gmail.com
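P.S. To make the two-array idea a little more concrete, here's a rough stand-alone sketch of the partition-then-sort flow. The types and comparators are simplified stand-ins I've made up for illustration, nothing like the real tuplesort.c structures:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct DemoTuple
{
    bool    isnull1;        /* is the first sort key NULL? */
    int     key1;           /* first sort key (valid when !isnull1) */
    int     key2;           /* second sort key, never NULL here */
} DemoTuple;

/* Comparator for the non-null array: no null checks needed at all. */
static int
cmp_notnull(const void *a, const void *b)
{
    const DemoTuple *ta = a;
    const DemoTuple *tb = b;

    if (ta->key1 != tb->key1)
        return (ta->key1 > tb->key1) - (ta->key1 < tb->key1);
    return (ta->key2 > tb->key2) - (ta->key2 < tb->key2);
}

/* Comparator for the null array: the first key is known equal (all
 * NULL), so only the remaining key(s) need comparing. */
static int
cmp_null_firstkey(const void *a, const void *b)
{
    const DemoTuple *ta = a;
    const DemoTuple *tb = b;

    return (ta->key2 > tb->key2) - (ta->key2 < tb->key2);
}

static void
print_tuple(const DemoTuple *t)
{
    if (t->isnull1)
        printf("(NULL, %d)\n", t->key2);
    else
        printf("(%d, %d)\n", t->key1, t->key2);
}

int
main(void)
{
    DemoTuple   input[] = {
        {false, 2, 9}, {true, 0, 4}, {false, 1, 7},
        {true, 0, 1}, {false, 2, 3},
    };
    int         ntuples = (int) (sizeof(input) / sizeof(input[0]));
    DemoTuple   notnulls[sizeof(input) / sizeof(input[0])];
    DemoTuple   nulls[sizeof(input) / sizeof(input[0])];
    int         nnotnull = 0;
    int         nnull = 0;
    bool        nulls_first = false;    /* NULLS FIRST/LAST behavior */

    /* Partition on first-key nullness up front, rather than checking
     * for nulls inside every comparison. */
    for (int i = 0; i < ntuples; i++)
    {
        if (input[i].isnull1)
            nulls[nnull++] = input[i];
        else
            notnulls[nnotnull++] = input[i];
    }

    /* Non-null array gets the comparator with no null handling. */
    qsort(notnulls, nnotnull, sizeof(DemoTuple), cmp_notnull);

    /* The null array only needs sorting because there's more than one
     * sort key; with a single key this qsort could be skipped. */
    qsort(nulls, nnull, sizeof(DemoTuple), cmp_null_firstkey);

    /* Readers start with whichever array NULLS FIRST/LAST dictates. */
    if (nulls_first)
    {
        for (int i = 0; i < nnull; i++)
            print_tuple(&nulls[i]);
        for (int i = 0; i < nnotnull; i++)
            print_tuple(&notnulls[i]);
    }
    else
    {
        for (int i = 0; i < nnotnull; i++)
            print_tuple(&notnulls[i]);
        for (int i = 0; i < nnull; i++)
            print_tuple(&nulls[i]);
    }
    return 0;
}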
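P.P.S. On the tie-break point, here's a second stand-alone toy (again, made up for illustration, not the real comparators) that counts how often the leading column gets compared when the specialised comparator's tie-break fallback restarts at column 1:

#include <stdio.h>
#include <stdlib.h>

typedef struct Row
{
    int     col1;
    int     col2;
} Row;

static long leading_cmps = 0;

static int
cmp_col1(int a, int b)
{
    leading_cmps++;
    return (a > b) - (a < b);
}

/* The "full" comparator, used as the tie-break fallback.  Like the
 * current fallback, it starts over at column 1. */
static int
cmp_full(const void *pa, const void *pb)
{
    const Row  *a = pa;
    const Row  *b = pb;
    int         r = cmp_col1(a->col1, b->col1);

    if (r != 0)
        return r;
    return (a->col2 > b->col2) - (a->col2 < b->col2);
}

/* The "specialised" comparator: a fast leading-key comparison, then
 * the full fallback on a tie, which re-compares the leading key. */
static int
cmp_specialized(const void *pa, const void *pb)
{
    const Row  *a = pa;
    const Row  *b = pb;
    int         r = cmp_col1(a->col1, b->col1);

    if (r != 0)
        return r;
    return cmp_full(pa, pb);
}

int
main(void)
{
    enum { NROWS = 100000 };
    Row        *rows = malloc(NROWS * sizeof(Row));

    if (rows == NULL)
        return 1;

    /* Few distinct leading values => lots of tie-breaks. */
    for (int i = 0; i < NROWS; i++)
    {
        rows[i].col1 = i % 4;
        rows[i].col2 = rand();
    }

    qsort(rows, NROWS, sizeof(Row), cmp_specialized);
    printf("leading-column comparisons: %ld\n", leading_cmps);

    free(rows);
    return 0;
}

With data this tie-heavy, a fallback that started at column 2 instead would cut the cmp_col1() count roughly in half.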