On Tue, Aug 23, 2022 at 1:13 PM John Naylor <
john.naylor@enterprisedb.com> wrote:
>
> On Tue, Aug 23, 2022 at 11:24 AM David Rowley <
dgrowleyml@gmail.com> wrote:
> >
> > On Tue, 23 Aug 2022 at 15:22, John Naylor <
john.naylor@enterprisedb.com> wrote:
> > > Did you happen to see
> > >
> > >
https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com> >
> > I missed that. It looks like a much more promising idea than what I
> > came up with. I've not looked at your code yet, but I'm interested and
> > will aim to look soon.
>
> Note that I haven't actually implemented this idea yet, just tried to
> model the effects by lobotomizing the current comparators. I think
> it's worth pursuing and will try to come back to it this cycle, but if
> you or anyone else wants to try, that's fine of course.
Coming back to this, I wanted to sketch out this idea in a bit more detail.
Have two memtuple arrays, one for first sortkey null and one for first sortkey non-null:
- Qsort the non-null array, including whatever specialization is available. Existing specialized comparators could ignore nulls (and their ordering) taking less space in the binary.
- Only if there is more than one sort key, qsort the null array. Ideally at some point we would have a method of ignoring the first sortkey (this is an existing opportunity that applies elsewhere as well).
- To handle two arrays, grow_memtuples() would need some adjustment, as would any callers that read the final result of an in-memory sort -- they would need to retrieve the tuples starting with the appropriate array depending on NULLS FIRST/LAST behavior.
I believe external merges wouldn't have to do anything different, since when writing out the tapes, we read from the arrays in the right order.
(One could extend this idea further and have two pools of tapes for null and non-null first sortkey, that are merged separately, in the right order. That sounds like quite a bit more complexity than is worth, however.)