Re: Ordered Partitioned Table Scans - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Ordered Partitioned Table Scans
Date
Msg-id CANP8+j+mOKCsY2-A-pq7Dq8sFMNhwwYA9eKo8s8EadFi=s-MPA@mail.gmail.com
Whole thread Raw
In response to Re: Ordered Partitioned Table Scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Ordered Partitioned Table Scans
List pgsql-hackers
On Fri, 22 Mar 2019 at 11:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Sat, 9 Mar 2019 at 10:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This can be a huge win for queries of the form "ORDER BY partkey LIMIT
>>> x".  Even if the first subpath(s) aren't natively ordered, not all of
>>> the sorts should actually be performed.

>> [ shrug... ] We've got no realistic chance of estimating such situations
>> properly, so I'd have no confidence in a plan choice based on such a
>> thing.  Nor do I believe that this case is all that important.

> Wondering if you can provide more details on why you think there's no
> realistic chance of the planner costing these cases correctly?

The reason why I'm skeptical about LIMIT with a plan of the form
append-some-sorts-together is that there are going to be large
discontinuities in the cost-vs-number-of-rows-returned graph,
ie, every time you finish one child plan and start the next one,
there'll be a hiccup while the sort happens.  This is something
that our plan cost approximation (startup cost followed by linear
output up to total cost) can't even represent.  Sticking a
LIMIT on top of that is certainly not going to lead to any useful
estimate of the actual cost, meaning that if you get a correct
plan choice it'll just be by luck, and if you don't there'll be
nothing to be done about it.

If we don't incorporate that, then the situation that the planner
will have to model is a MergeAppend with possibly some sorts in
front of it, and it will correctly cost that as if all the sorts
happen before any output occurs, and so you can hope that reasonable
plan choices will ensue.

I believe that the cases where a LIMIT query actually ought to go
for a fast-start plan are where *all* the child rels have fast-start
(non-sort) paths --- which is exactly the cases we'd get if we only
allow "sorted" Appends when none of the inputs require a sort.
Imagining that we'll get good results in cases where some of them
need to be sorted is just wishful thinking.

> It would be unfortunate to reject this patch based on a sentence that
> starts with "[ shrug... ]", especially so when three people have stood
> up and disagreed with you.

I don't want to reject the patch altogether, I just want it to not
include a special hack to allow Append rather than MergeAppend in such
cases.  I am aware that I'm probably going to be shouted down on this
point, but that will not change my opinion that the shouters are wrong.

I agree that the issue of mixing sorts at various points will make nonsense of the startup cost/total cost results.

What you say about LIMIT is exactly right. But ISTM that LIMIT itself is the issue there and it need more smarts to correctly calculate costs.

I don't see LIMIT costing being broken as a reason to restrict this optimization. I would ask that we allow improvements to the important use case of ORDER BY/LIMIT, then spend time on making LIMIT work correctly.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [proposal] Add an option for returning SQLSTATE in psql errormessage
Next
From: Christoph Berg
Date:
Subject: Re: Offline enabling/disabling of data checksums