Thread: Early Sort/Group resjunk column elimination.

Early Sort/Group resjunk column elimination.

From

Ronan Dunklau

Date:

13 July 2021, 14:19:27

Hello,

I would like to know if there is any interest in working to reduce the usage 
and propagation of resjunk columns in the planner.

I think this topic is worth investigating, because most of the time when we 
request a sorted path without ever needing the sort key afterwards, we still 
carry the sort key to the final tuple, where the JunkFilter will finally get rid 
of it.

Rationale
========

This would allow several optimizations.

1) Index not needing to output the column 

This one was mentioned as far back as 2009 [1]  and is still relevant today. 
If we query SELECT a FROM t1 ORDER BY b; and we have an index, on b, we 
shouldn't output b at all since we don't need it in the upper nodes. This 
might not look like a huge problem by itself, but as noted in [1] it becomes 
very expensive in the case of a functional index. This is alleviated for 
IndexOnlyScan because it is capable of fetching the value from the index 
itself, but it is still a problem.

Take this query as an example:

regression=# explain (verbose) select two from tenk2 order by hundred;
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Index Scan using tenk2_hundred on public.tenk2  (cost=0.29..1574.20 
rows=10000 width=8)
   Output: two, hundred
(2 rows)


We should be able to transform it into:

regression=# explain (verbose) select two from tenk2 order by hundred;
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Index Scan using tenk2_hundred on public.tenk2  (cost=0.29..1574.20 
rows=10000 width=4)
   Output: two
(2 rows)


2) Other nodes

Other nodes might benefit from it, for exemple in FDW. Right now the sort key 
is always returned from the underlying FDW, but if the data can be sorted that 
could be a net win.

3) Incremental Sort

While working on the patch to allow Sort nodes to use the datumOnly 
optimization, a suggestion came up to also use it in the IncrementalSort. This 
is not possible today because even if we don't need the previously-sorted 
columns anymore, we still need to output them as resjunk columns.

4) Narrower tuples in dynamic shared memory.

DSM bandwidth is quite expensive, so if we can avoid exchanging some 
attributes here it could be a net win.


Proposal
=======

I've been trying to test this idea using a very simple approach. If that is of 
interest, I can clean up my branch and post a simple patch to discuss 
specifics, but I'd like to keep a high level discussion first. 

The idea would be to:
  - "tag" resjunk TargetEntries according to why they were added. So a column 
added as sort group clause would be tagged as such, and be recognisable
 - in the planner, instead of using the whole processed target list to build 
the finaltarget, we would remove resjunk entries we don't actually need (those 
added only as sortgroup clauses as of now, but there may be other kind of 
resjunk entries we can safely omit).
 - inject those columns only when generating the input targets needed for 
sorting, grouping, window functions and the likes. 

Using only this already allows optimization number 1), because if no Sort node 
needs to be added the pathtarget just cascade to the bottom of the path.

There is one big downside to this: it introduces a mismatch between the 
finaltarget and the output of the previous node (for example sort). This adds a 
costly Result node everywhere, performing an expensive projection instead of 
the much simpler JunkFilter we currently have:

regression=# explain (verbose) select two from tenk2 order by four;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Result  (cost=1109.39..1234.39 rows=10000 width=4)
   Output: two
   ->  Sort  (cost=1109.39..1134.39 rows=10000 width=8)
         Output: two, four
         Sort Key: tenk2.four
         ->  Seq Scan on public.tenk2  (cost=0.00..445.00 rows=10000 width=8)
               Output: two, four
(7 rows)


I think this is something that could easily be solved, either by teaching some 
nodes to do simple projections, consisting only of removing / reordering some 
attributes. This would match what  ExecJunkFilter does, generalized to any 
kind of "subset of attributes" projection.

Alternatively, we could also perform that at the Result level, leaving 
individual nodes alone, by implementing a simpler result node using the 
JunkFilter mechanism when it's possible (either with a full-blown 
"SimpleResult" specific node, or a different execprocnode in the Result).

If the idea seems worthy, I'll keep working on it and send you a patch 
demonstrating the idea.

[1] https://www.postgresql.org/message-id/flat/9957.1250956747%40sss.pgh.pa.us

Best regards,

-- 
Ronan Dunklau

Re: Early Sort/Group resjunk column elimination.

From

James Coleman

Date:

16 July 2021, 15:37:15

On Tue, Jul 13, 2021 at 10:19 AM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
>
> Hello,
>
> I would like to know if there is any interest in working to reduce the usage
> and propagation of resjunk columns in the planner.
>
> I think this topic is worth investigating, because most of the time when we
> request a sorted path without ever needing the sort key afterwards, we still
> carry the sort key to the final tuple, where the JunkFilter will finally get rid
> of it.
>
> Rationale
> ========
>
> This would allow several optimizations.
>
> 1) Index not needing to output the column
>
> This one was mentioned as far back as 2009 [1]  and is still relevant today.
> If we query SELECT a FROM t1 ORDER BY b; and we have an index, on b, we
> shouldn't output b at all since we don't need it in the upper nodes. This
> might not look like a huge problem by itself, but as noted in [1] it becomes
> very expensive in the case of a functional index. This is alleviated for
> IndexOnlyScan because it is capable of fetching the value from the index
> itself, but it is still a problem.
>
> Take this query as an example:
>
> regression=# explain (verbose) select two from tenk2 order by hundred;
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Index Scan using tenk2_hundred on public.tenk2  (cost=0.29..1574.20
> rows=10000 width=8)
>    Output: two, hundred
> (2 rows)
>
>
> We should be able to transform it into:
>
> regression=# explain (verbose) select two from tenk2 order by hundred;
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Index Scan using tenk2_hundred on public.tenk2  (cost=0.29..1574.20
> rows=10000 width=4)
>    Output: two
> (2 rows)
>
>
> 2) Other nodes
>
> Other nodes might benefit from it, for exemple in FDW. Right now the sort key
> is always returned from the underlying FDW, but if the data can be sorted that
> could be a net win.
>
> 3) Incremental Sort
>
> While working on the patch to allow Sort nodes to use the datumOnly
> optimization, a suggestion came up to also use it in the IncrementalSort. This
> is not possible today because even if we don't need the previously-sorted
> columns anymore, we still need to output them as resjunk columns.
>
> 4) Narrower tuples in dynamic shared memory.
>
> DSM bandwidth is quite expensive, so if we can avoid exchanging some
> attributes here it could be a net win.
>
>
> Proposal
> =======
>
> I've been trying to test this idea using a very simple approach. If that is of
> interest, I can clean up my branch and post a simple patch to discuss
> specifics, but I'd like to keep a high level discussion first.
>
> The idea would be to:
>   - "tag" resjunk TargetEntries according to why they were added. So a column
> added as sort group clause would be tagged as such, and be recognisable
>  - in the planner, instead of using the whole processed target list to build
> the finaltarget, we would remove resjunk entries we don't actually need (those
> added only as sortgroup clauses as of now, but there may be other kind of
> resjunk entries we can safely omit).
>  - inject those columns only when generating the input targets needed for
> sorting, grouping, window functions and the likes.
>
> Using only this already allows optimization number 1), because if no Sort node
> needs to be added the pathtarget just cascade to the bottom of the path.
>
> There is one big downside to this: it introduces a mismatch between the
> finaltarget and the output of the previous node (for example sort). This adds a
> costly Result node everywhere, performing an expensive projection instead of
> the much simpler JunkFilter we currently have:
>
> regression=# explain (verbose) select two from tenk2 order by four;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Result  (cost=1109.39..1234.39 rows=10000 width=4)
>    Output: two
>    ->  Sort  (cost=1109.39..1134.39 rows=10000 width=8)
>          Output: two, four
>          Sort Key: tenk2.four
>          ->  Seq Scan on public.tenk2  (cost=0.00..445.00 rows=10000 width=8)
>                Output: two, four
> (7 rows)
>
>
> I think this is something that could easily be solved, either by teaching some
> nodes to do simple projections, consisting only of removing / reordering some
> attributes. This would match what  ExecJunkFilter does, generalized to any
> kind of "subset of attributes" projection.
>
> Alternatively, we could also perform that at the Result level, leaving
> individual nodes alone, by implementing a simpler result node using the
> JunkFilter mechanism when it's possible (either with a full-blown
> "SimpleResult" specific node, or a different execprocnode in the Result).
>
> If the idea seems worthy, I'll keep working on it and send you a patch
> demonstrating the idea.
>
> [1] https://www.postgresql.org/message-id/flat/9957.1250956747%40sss.pgh.pa.us

Thanks for hacking on this; as you're not surprised given I made the
original suggestion, I'm particularly interested in this for
incremental sort benefits, but I find the other examples you gave
compelling also.

Of course I haven't seen code yet, but my first intuition is to try to
avoid adding extra nodes and teach the (hopefully few) relevant nodes
to remove the resjunk entries themselves. Presumably in this case that
would mostly be the sort nodes (including gather merge).

One thing to pay attention to here is that we can't necessarily remove
resjunk entries every time in a sort node since, for example, in
parallel mode the gather merge node above it will need those entries
to complete the sort.

I'm interested to see what you're working on with a patch.

Thanks,
James Coleman

Re: Early Sort/Group resjunk column elimination.

From

Ronan Dunklau

Date:

20 July 2021, 15:47:57

Le vendredi 16 juillet 2021, 17:37:15 CEST James Coleman a écrit :
> Thanks for hacking on this; as you're not surprised given I made the
> original suggestion, I'm particularly interested in this for
> incremental sort benefits, but I find the other examples you gave
> compelling also.
>
> Of course I haven't seen code yet, but my first intuition is to try to
> avoid adding extra nodes and teach the (hopefully few) relevant nodes
> to remove the resjunk entries themselves. Presumably in this case that
> would mostly be the sort nodes (including gather merge).
>
> One thing to pay attention to here is that we can't necessarily remove
> resjunk entries every time in a sort node since, for example, in
> parallel mode the gather merge node above it will need those entries
> to complete the sort.

Yes that is actually a concern, especially as the merge node is already
handled specially when applying a projection.

>
> I'm interested to see what you're working on with a patch.

I am posting this proof-of-concept, for the record, but I don't think the
numerous problems can be solved easily. I tried to teach Sort to use a limited
sort of projection, but it brings its own slate of problems...

Quick list of problems with the current implementation, leaving aside the fact
that it's quite hacky in a few places:

* result nodes are added for numerous types of non-projection-capable paths,
since the above (final) target includes resjunk columns which should be
eliminated.
* handling of appendrel seems difficult, as both ordered and unordered appends
are generated at the same time against the same target
* I'm having trouble understanding the usefulness of a building physical
tlists for SubqueryScans

The second patch is a very hacky way to try to eliminate some generated result
nodes. The idea is to bypass the whole interpreter when using a "simple"
projection which is just a reduction of the number of columns, and teach Sort
and Result to perform it. To do this, I added a parameter to
is_projection_capable_path to make the test depend on the actual asked target:
for a sort node, only a "simple" projection.

The implementation uses a junkfilter which assumes nothing else than Const and
outer var will be present.

I don't feel like this is going anywhere, but at least it's here for
discussion and posterity, if someone is interested.


--
Ronan Dunklau

Thread: Early Sort/Group resjunk column elimination.

Early Sort/Group resjunk column elimination.

Re: Early Sort/Group resjunk column elimination.

Re: Early Sort/Group resjunk column elimination.

Attachment