Re: Avoid computing ORDER BY junk columns unnecessarily - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Avoid computing ORDER BY junk columns unnecessarily
Date
Msg-id 356d5811-d8f0-4c29-a601-b61334337b11@iki.fi
Whole thread Raw
In response to Re: Avoid computing ORDER BY junk columns unnecessarily  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Avoid computing ORDER BY junk columns unnecessarily
List pgsql-hackers
On 29/12/2023 01:42, Tom Lane wrote:
> I wrote:
>> Yeah, fair point.  I'll try to take a look at your patchset after
>> the holidays.
> 
> I spent some time looking at this patch, and I'm not very pleased
> with it.  My basic complaint is that this is a band-aid that only
> touches things at a surface level, whereas I think we need a much
> deeper set of changes in order to have a plausible solution.
> Specifically, you're only munging the final top-level targetlist
> not anything for lower plan levels.  That looks like it works in
> simple cases, but it leaves everything on the table as soon as
> there's more than one level of plan involved.

Yeah, that's fair.

> I didn't stop to trace the details but I'm pretty sure this is why
> you're getting the bogus extra projections shown in the 0005 patch.
They're not bogus. With the patches, projecting away the junk columns is 
visible in the plan as an extra Result node, while currently it's done 
as an implicit step in the executor. That seems fine and arguably an 
even more honest representation of what's happening, although I don't 
like the extra verbosity in EXPLAIN output.

> Moreover, this isn't doing anything for cost estimates, which means
> the planner doesn't really understand that more-desirable plans are
> more desirable, and it may not pick an available plan that would
> exploit what we want to have happen here.
> 
> As an example, say we have an index on f(x) and the query requires
> sorting by f(x) but not final output of f(x).  If we devise a plan
> that uses the index to sort and doesn't emit f(x), we need to not
> charge the evaluation cost of f() for that plan --- this might
> make the difference between picking that plan and not.  Right now
> I believe we'll charge all plans for f(), so that some other plan
> might look cheaper even when f() is very expensive.
> > Another example: we might be using an indexscan but not relying on
> its sort order, for example because its output is going into a hash
> join and then we'll sort afterwards.  For expensive f() it would
> still be useful if the index could emit f(x) so that we don't have
> to calculate f() at the sort step.  Right now I don't think we can
> even emit that plan shape, never mind recognize why it's cheaper.

Related to this, we are not currently costing the target list evaluation 
correctly for index-only scans. Here's an example:

create or replace function expensive(i int) returns int
language plpgsql as
$$
   begin return i; end;
$$
immutable cost 1000000;

create table atab (i int);
insert into atab select g from generate_series(1, 10000) g;
create index on atab (i, expensive(i));

postgres=# explain verbose select expensive(i) from atab order by 
expensive(i);
                                  QUERY PLAN 

----------------------------------------------------------------------------
  Sort  (cost=25000809.39..25000834.39 rows=10000 width=4)
    Output: (expensive(i))
    Sort Key: (expensive(atab.i))
    ->  Seq Scan on public.atab  (cost=0.00..25000145.00 rows=10000 width=4)
          Output: expensive(i)
(5 rows)

postgres=# set enable_seqscan=off; set enable_bitmapscan=off;
SET
SET
postgres=# explain verbose select expensive(i) from atab order by 
expensive(i);
                                                   QUERY PLAN 

--------------------------------------------------------------------------------------------------------------
  Sort  (cost=25001114.67..25001139.67 rows=10000 width=4)
    Output: (expensive(i))
    Sort Key: (expensive(atab.i))
    ->  Index Only Scan using atab_i_expensive_idx on public.atab 
(cost=0.29..25000450.29 rows=10000 width=4)
          Output: (expensive(i))
(5 rows)

The cost of the index only scan ought to be much lower than the seqscan, 
as it can return the pre-computed expensive(i) from the index.

That could probably be fixed without any of the other changes we've been 
discussing here, though.

> I have only vague ideas about how to do this better.  It might work
> to set up multiple PathTargets for a relation that has such an
> index, one for the base case where the scan only emits x and f() is
> computed above, one for the case where we don't need either x or
> f(x) because we're relying on the index order, and one that emits
> f(x) with the expectation that a sort will happen above.  Then we'd
> potentially generate multiple Paths representing the very same
> indexscan but with different PathTargets, and differing targets
> would have to become something that add_path treats as a reason to
> keep multiple Paths for the same relation.  I'm a little frightened
> about the possible growth in number of paths considered, but how
> else would we keep track of the differing costs of these cases?

Hmm, if there are multiple functions like that in the target list, would 
you need to create different paths for each combination of expressions? 
That could really explode the number of paths.

Perhaps each Path could include "optional" target entries that it can 
calculate more cheaply, with a separate cost for each such expression. 
add_path() would treat the presence of optional target entries as a 
reason to retain the path, but you wouldn't need to keep a separate path 
for each PathTarget.

Another idea is to include f(x) in the PathTarget only if it's "cheap to 
emit". For example, if it's an expression index on f(x). If it turns out 
that f(x) is not needed higher up in the plan, it's not a big error in 
the estimate because it was cheap to emit. That wouldn't work well if 
the index AM could calculate f(x) more cheaply than the executor, but 
the cost was still not trivial. I'm not sure if that situation can arise 
currently.

> * needed for final grouping/sorting steps (I think all
>    resjunk items produced by the parser are this case;
>    but do we need to distinguish GROUP BY from ORDER BY?)

It would seem useful to distinguish GROUP BY and ORDER BY. For example:

SELECT COUNT(*) FROM table GROUP BY a, b ORDER BY a;

If this is implemented as HashAgg + Sort for example, only 'a' would be 
needed by the sort. Including less data in a Sort is good.

I wanted to implement that in my patch already, by removing the junk 
columns needed for grouping but not sorting from sort_input_target. But 
the grouping path generation has some assumptions that the grouping 
output target list includes all the grouping columns. I don't remember 
the exact problem that made me give up on that, but it probably could be 
fixed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Changing references of password encryption to hashing