Re: Sort functions with specialized comparators - Mailing list pgsql-hackers

From John Naylor
Subject Re: Sort functions with specialized comparators
Date
Msg-id CANWCAZYAAo8WmqV4wQVB4k8FnCBJ983kAgNMBH2tTnveG_+3EQ@mail.gmail.com
Whole thread Raw
In response to Sort functions with specialized comparators  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Fri, Dec 6, 2024 at 1:32 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 5 Dec 2024, at 15:16, John Naylor <johncnaylorls@gmail.com> wrote:

> > I believe src/port/qsort.c was meant to be just for the standard sort
> > interface as found in a C library. We do have one globally visible
> > special sort here:
> > src/backend/utils/sort/qsort_interruptible.c
> > ...so that directory seems a better fit.
> OK. BTW do we need ST_CHECK_FOR_INTERRUPTS?

That's a good thing to raise right now -- intarray currently doesn't
have one, and we haven't gotten complaints from people trying to sort
large arrays and cancel the query. This extension is not commonly
used, so that's not surprising. It could be that large arrays are even
less common, or no one bothered to report it. What's the largest size
that your customers use?

If we do need a check for interrupts, then this whole thing must
remain private to intarray. From reading e64cdab0030 , it's not safe
to interrupt in general.

> > And one more bikeshedding bit that might get noticed: tuplesorts
> > express their boolean as "reversed". We don't necessarily need to
> > follow that, but consistency is good for readability.
>
> I do not know if "reversed sorting order" is more idiomatic than "ascending sorting order". If you think it is -
let'sswitch argument's name to "reversed". 

After sleeping on it, I actually think it's mildly ridiculous for this
module to force the comparator to know about the sort direction.
Tuplesorts must do that because each sort key could have a different
sort order. There is only one place in intarray that wants reversed
order -- maybe that place should reverse elements itself? It's fine to
keep thing as they are if the sort function stays private to intarray,
but this patch creates a global function, where the "ascending"
parameter is just noise. And if we don't have large int32 sorts
outside of intarray, then the path of least resistance may be to keep
it private.

I had a look at the files touched by this patch and noticed that there
is another sort used for making arrays unique. Were you going to look
at that as well? That reminded me of a patchset from Thomas Munro that
added bsearch and unique macros to the sort template -- see 0001-0003
in the link below. (That also includes a proposal to have a
specialization for uint32 -- I'm still not sure if that would have a
performance benefit for real workloads, but I believe the motivation
was mostly cosmetic):

https://www.postgresql.org/message-id/CA%2BhUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg%40mail.gmail.com

--
John Naylor
Amazon Web Services



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: MinGW compiler warnings in ecpg tests
Next
From: vignesh C
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation