Re: Unfortunate pushing down of expressions below sort - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Unfortunate pushing down of expressions below sort
Date
Msg-id 2351008.1775593014@sss.pgh.pa.us
In response to Re: Unfortunate pushing down of expressions below sort  (Chengpeng Yan <chengpeng_yan@Outlook.com>)
Responses Re: Unfortunate pushing down of expressions below sort
List pgsql-hackers
Chengpeng Yan <chengpeng_yan@Outlook.com> writes:
> Following up on the discussion below, I now have a patch.

> The patch extends make_sort_input_target() with a conservative rule:
> defer additional non-sort targetlist expressions past Sort only when
> doing so does not require carrying any additional Vars/PlaceHolderVars
> through Sort. This way, Sort input width never increases.
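
To make the quoted rule concrete, it can be read as a subset test: an expression may be deferred only if every Var/PlaceHolderVar it references is already being carried through the Sort for some other reason (a sort key, or another output column). The following is a minimal sketch under that reading, not the patch's code; the `VarSet` bitmask type and the function name are hypothetical stand-ins for the planner's real data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical representation: each bit stands for one Var or
 * PlaceHolderVar.  The real planner uses richer structures.
 */
typedef uint64_t VarSet;

/*
 * Return true if deferring an expression that references expr_vars
 * would not force any Var through the Sort that is not already in
 * carried_vars, i.e. expr_vars is a subset of carried_vars.
 */
static bool
can_defer_without_widening(VarSet expr_vars, VarSet carried_vars)
{
    return (expr_vars & ~carried_vars) == 0;
}
```

Under this test the Sort input loses the expression's column and gains nothing, which is why its width cannot increase.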

I spent some time thinking about this.

One thing I think we need to keep in mind is that if we don't postpone
an expression past Sort, and the user doesn't like that, she can
easily rewrite the query to force it; as indeed Andres demonstrated
at the start of this thread.  But overriding an unwanted planner
decision to postpone is harder.  I think you can do it with

SELECT * FROM (SELECT x,y,f(z) FROM ... OFFSET 0) ORDER BY whatever;

but if you forget the OFFSET-0 optimization fence you may find
f(z) getting evaluated after the sort anyway.  And the fence might
foreclose some other optimization you did want.

Also, make_sort_input_target() has gone basically unchanged since
2016, without that many complaints.  So I think we need to be pretty
conservative about adding postponement choices that aren't forced by
semantic requirements.

The rule stated above seems pretty conservative, but either it's not
conservative enough or you didn't implement it right, because the
regression test changes show the v2 patch is very willing to create
Result nodes where there were none before, even when there's no LIMIT
and thus no reason to think we can save any expression evaluations.
That extra plan node has nonzero cost that I don't think you're
accounting for.  It'll still be a win if enough data volume is removed
from the Sort step, but I don't see any consideration of how much
we're actually saving before deciding to add the projection step.

So I think we need some sort of gating rule, whereby we only postpone
these expressions if (a) there was already a reason to add a
projection or (b) we can make some cost-based or at least heuristic
estimate that says we'll cut the sort data volume significantly.
Maybe (b) needs to interact with the existing heuristic about
postponing expensive expressions, not sure.
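
A gating rule of the shape (a)/(b) above might look roughly like the sketch below. This is only an illustration of the idea, not proposed planner code: the struct, the function name, and the 25% threshold are all assumptions, and a real version would need to use the planner's own cost and width estimates.

```c
#include <stdbool.h>

/* All names and the threshold below are hypothetical. */
typedef struct SortPostponeInfo
{
    bool projection_already_needed; /* (a): a projection is required anyway */
    int  width_with_exprs;          /* est. row width if computed below Sort */
    int  width_without_exprs;       /* est. row width if postponed past Sort */
} SortPostponeInfo;

/* Assumed cutoff for "significantly" smaller sort data volume. */
#define MIN_VOLUME_SAVING_FRACTION 0.25

static bool
should_postpone_past_sort(const SortPostponeInfo *info)
{
    double saved_fraction;

    /* (a) the projection step exists regardless, so it adds no new node */
    if (info->projection_already_needed)
        return true;

    /* No usable width estimate: stay conservative and do not postpone. */
    if (info->width_with_exprs <= 0)
        return false;

    /* (b) postpone only if the per-row width shrinks by enough */
    saved_fraction = (double) (info->width_with_exprs -
                               info->width_without_exprs) /
                     (double) info->width_with_exprs;

    return saved_fraction >= MIN_VOLUME_SAVING_FRACTION;
}
```

The sketch compares per-row widths only, on the assumption that both the sort volume and the extra projection cost scale with the same row count; it does not model the interaction with the existing expensive-expression heuristic.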

Independently of that, I don't especially like the changes in
make_sort_input_target().  They seem rather inelegant and expensive
(and underdocumented), as well as duplicative of other work already
being done in the function.  It may be time to tackle the unfinished
work mentioned in the existing comments about avoiding redundant
cost/width calculations ...

            regards, tom lane


