Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers

From John Naylor
Subject Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date
Msg-id CAFBsxsEW-eRv5GephFO-Y=vQs_MjaEKn-1wpf2=6V7EH+SpUjg@mail.gmail.com
Whole thread Raw
In response to Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Responses Re: Todo: Teach planner to evaluate multiple windows in the optimal order
List pgsql-hackers

On Sun, Jan 29, 2023 at 7:05 PM Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
>
>
> > On 26/01/23 07:40, David Rowley wrote:

> > Sorting in smaller batches that better fit into CPU cache.
>
> More reading yielded that we are looking for cache-oblivious
> sorting algorithm.

Since David referred to L3 size as the starting point of a possible configuration parameter, that's actually cache-conscious.

> One the papers[1] mentions that in quick sort, whenever we reach size which can fit in cache,
> instead of partitioning it further, we can do insertion sort there itself.

> >  Memory-tuned quicksort uses insertion sort to sort small subsets while they are in
> > the cache, instead of partitioning further. This increases the instruction count,
> > but reduces the cache misses
>
> If this looks step in right direction, I can give it a try and do more reading and experimentation.

I'm not close enough to this thread to guess at the right direction (although I hope related work will help), but I have a couple general remarks:

1. In my experience, academic papers like to test sorting with register-sized numbers or strings. Our sorttuples are bigger, have complex comparators, and can fall back to fields in the full tuple.
2. That paper is over 20 years old. If it demonstrated something genuinely useful, some of those concepts would likely be implemented in the real-world somewhere. Looking for evidence of that might be a good exercise.
3. 20 year-old results may not carry over to modern hardware.
4. Open source software is more widespread in the academic world now than 20 years ago, so papers with code (maybe even the author's github) are much more useful to us in my view.
5. It's actually not terribly hard to make sorting faster for some specific cases -- the hard part is keeping other inputs of interest from regressing.
6. The bigger the change, the bigger the risk of regressing somewhere.

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Logical replication timeout problem
Next
From: Andres Freund
Date:
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)