Re: Considering additional sort specialisation functions for PG16 - Mailing list pgsql-hackers

From John Naylor
Subject Re: Considering additional sort specialisation functions for PG16
Date
Msg-id CAFBsxsH=3UA+8Sp2sgUs=piYOmR2CQ-Rq5sUhCX-M3Gqv4TRog@mail.gmail.com
Whole thread Raw
In response to Re: Considering additional sort specialisation functions for PG16  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: Considering additional sort specialisation functions for PG16
Re: Considering additional sort specialisation functions for PG16
Re: Considering additional sort specialisation functions for PG16
List pgsql-hackers

On Tue, Aug 23, 2022 at 1:13 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Tue, Aug 23, 2022 at 11:24 AM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > On Tue, 23 Aug 2022 at 15:22, John Naylor <john.naylor@enterprisedb.com> wrote:
> > > Did you happen to see
> > >
> > > https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com
> >
> > I missed that.  It looks like a much more promising idea than what I
> > came up with. I've not looked at your code yet, but I'm interested and
> > will aim to look soon.
>
> Note that I haven't actually implemented this idea yet, just tried to
> model the effects by lobotomizing the current comparators. I think
> it's worth pursuing and will try to come back to it this cycle, but if
> you or anyone else wants to try, that's fine of course.

Coming back to this, I wanted to sketch out this idea in a bit more detail.

Have two memtuple arrays, one for first sortkey null and one for first sortkey non-null:
- Qsort the non-null array, including whatever specialization is available. Existing specialized comparators could ignore nulls (and their ordering) taking less space in the binary.
- Only if there is more than one sort key, qsort the null array. Ideally at some point we would have a method of ignoring the first sortkey (this is an existing opportunity that applies elsewhere as well).
- To handle two arrays, grow_memtuples() would need some adjustment, as would any callers that read the final result of an in-memory sort -- they would need to retrieve the tuples starting with the appropriate array depending on NULLS FIRST/LAST behavior.

I believe external merges wouldn't have to do anything different, since when writing out the tapes, we read from the arrays in the right order.

(One could extend this idea further and have two pools of tapes for null and non-null first sortkey, that are merged separately, in the right order. That sounds like quite a bit more complexity than is worth, however.)

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: to_hex() for negative inputs
Next
From: Andrei Zubkov
Date:
Subject: Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements