
From: Heikki Linnakangas
Subject: Avoid computing ORDER BY junk columns unnecessarily
Msg-id: 2ca5865b-4693-40e5-8f78-f3b45d5378fb@iki.fi
List: pgsql-hackers

Problem
-------

We are using junk columns for (at least) two slightly different purposes:

1. For passing row IDs and other such data from lower plan nodes to 
LockRows / ModifyTable.

2. To represent ORDER BY and GROUP BY columns that don't appear in the 
SELECT list. For example, in a query like:

     SELECT foo FROM mytable ORDER BY bar;

The parser adds 'bar' to the target list as a junk column. You can see 
that with EXPLAIN VERBOSE:

explain (verbose, costs off)
   select foo from mytable order by bar;

             QUERY PLAN
----------------------------------
  Sort
    Output: foo, bar
    Sort Key: mytable.bar
    ->  Seq Scan on public.mytable
          Output: foo, bar
(5 rows)
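
Mechanically, the parser does roughly the following when an ORDER BY 
expression isn't already in the SELECT list. This is a condensed, 
illustrative sketch of the logic in findTargetlistEntrySQL99() in 
parse_clause.c; the search for an existing matching entry and all error 
handling are omitted, and the helper name is mine:

static TargetEntry *
add_sort_column_as_junk(ParseState *pstate, Node *node, List **tlist)
{
    TargetEntry *tle;

    /* not already in the SELECT list, so append it as a junk column */
    tle = transformTargetEntry(pstate, node, NULL, EXPR_KIND_ORDER_BY,
                               NULL, true /* resjunk */);
    *tlist = lappend(*tlist, tle);
    return tle;
}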

The 'bar' column gets filtered away in the executor, by the so-called 
junk filter. That's fine for simple cases like the above, but in some 
cases, that causes the ORDER BY value to be computed unnecessarily. For 
example:

create table mytable (foo text, bar text);
insert into mytable select g, g from generate_series(1, 10000) g;
create index on mytable (sha256(bar::bytea));
explain verbose
  select foo from mytable order by sha256(bar::bytea);

                                            QUERY PLAN
------------------------------------------------------------------------------------------------
  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=10000 width=64)
    Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY 
expression is still computed for every row, just to be thrown away by 
the junk filter.
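
For reference, the junk filter itself is cheap; the waste is that the 
value has already been computed by the plan below it by the time the 
filter throws it away. Simplified, what ExecFilterJunk() in execJunk.c 
does is just this (illustrative sketch, corner cases elided):

TupleTableSlot *
filter_junk_simplified(JunkFilter *junkfilter, TupleTableSlot *slot)
{
    TupleTableSlot *resultSlot = junkfilter->jf_resultSlot;
    AttrNumber *cleanMap = junkfilter->jf_cleanMap;
    int         natts = junkfilter->jf_cleanTupDesc->natts;

    /* extract all attributes of the input tuple */
    slot_getallattrs(slot);
    ExecClearTuple(resultSlot);

    /* copy only the non-junk attributes into the output slot */
    for (int i = 0; i < natts; i++)
    {
        AttrNumber j = cleanMap[i];     /* 1-based source attno */

        resultSlot->tts_values[i] = slot->tts_values[j - 1];
        resultSlot->tts_isnull[i] = slot->tts_isnull[j - 1];
    }
    return ExecStoreVirtualTuple(resultSlot);
}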

This came up with pgvector, as the vector distance functions are pretty 
expensive. All vector operations are expensive, so one extra distance 
function call per row doesn't necessarily make that much difference, but 
it sure looks silly. See 
https://github.com/pgvector/pgvector/issues/359#issuecomment-1840786021 
(thanks Matthias for the investigation!).

Solution
--------

The obvious solution is that the planner should not include those junk 
columns in the plan. But how exactly to implement that is a different 
matter.

I came up with the attached patch set, which adds a projection to all 
the paths at the end of planning in grouping_planner(). The projection 
filters out the unnecessary junk columns. With that, the plan for the 
above example:

postgres=# explain verbose select foo from mytable order by sha256(bar::bytea);
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..662.24 rows=10000 width=4)
    Output: foo
(2 rows)
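
In rough pseudo-C, what the patch does at the end of grouping_planner() 
is something like the following ('clean_target' is an illustrative name 
for final_target with the droppable junk columns removed, not 
necessarily what the patch calls it):

foreach(lc, final_rel->pathlist)
{
    Path *path = (Path *) lfirst(lc);

    /* project each final path to the target list without junk columns */
    path = (Path *) create_projection_path(root, final_rel, path,
                                           clean_target);
    lfirst(lc) = path;
}

At plan creation time, such a projection is absorbed into the child 
node's target list when the child can project, and becomes a separate 
Result node when it cannot, which leads directly to the first problem 
below.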


Problems with the solution
--------------------------

So this seems to work, but I have a few doubts:

1. Because Sort cannot project, this adds an extra Result node on top of 
Sort nodes when the ORDER BY is implemented by sorting:

postgres=# explain verbose select foo from mytable order by bar;
                                    QUERY PLAN
--------------------------------------------------------------------------------
  Result  (cost=818.39..843.39 rows=10000 width=4)
    Output: foo
    ->  Sort  (cost=818.39..843.39 rows=10000 width=8)
          Output: foo, bar
          Sort Key: mytable.bar
          ->  Seq Scan on public.mytable  (cost=0.00..154.00 rows=10000 width=8)
                Output: foo, bar
(7 rows)

From a performance point of view, I think that's not as bad as it 
sounds. Remember that without this patch, the executor needs to execute 
the junk filter to filter out the extra column instead. It's not clear 
that an extra Result is worse than that, although I haven't tried 
benchmarking it.

This does make plans for queries like the above more verbose, though. 
Perhaps we should teach Sort (and MergeAppend) to do projection, just 
to avoid that?
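
For context, that Result comes from plan creation: a projection above a 
node is only materialized as a separate Result node when the node 
cannot evaluate a new target list itself. Roughly (simplified from 
create_projection_plan() in createplan.c, where make_result() is a 
local helper):

static Plan *
project_or_wrap(Plan *subplan, List *tlist)
{
    if (is_projection_capable_plan(subplan))
    {
        /* e.g. a Seq Scan can evaluate the new target list itself */
        subplan->targetlist = tlist;
        return subplan;
    }

    /* e.g. Sort or MergeAppend cannot, so interpose a Result node */
    return (Plan *) make_result(tlist, NULL, subplan);
}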

Another solution would be to continue relying on the junk filter, if 
adding the projection in the planner leads to an extra Result node. 
That's a bit ugly, because then the final target list of a (sub)query 
depends on the Path that's chosen.

2. Instead of tacking on the projection to the paths at the end, I first 
tried modifying the code earlier in grouping_planner() that computes the 
target lists for the different plan stages. That still feels like a 
cleaner approach to me, although I don't think there's any difference in 
the generated plans in practice. However, I ran into some problems with 
that approach and gave up.

I basically tried to remove the junk columns from 'final_target', and 
have create_ordered_paths() create paths with the filtered target list 
directly. And if there is no ORDER BY, leave out the junk columns from 
'grouping_target' too, and have create_grouping_paths() generate the 
final target list directly. However, create_grouping_paths() assumes 
that the grouping columns are included in 'grouping_target'. And 
similarly in create_ordered_paths(), some partial planning stages assume 
that the ordering columns are included in 'final_target'. Those 
assumptions could probably be fixed, but I ran out of steam trying to do 
that.
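
To make that concrete, the rough idea was a helper along these lines 
(entirely hypothetical code; the junk-kind check assumes the enum 
introduced in patch v1-0003, sketched under "Patches" below):

static PathTarget *
strip_sortgroup_junk(PlannerInfo *root, List *tlist)
{
    List       *clean_tlist = NIL;
    ListCell   *lc;

    foreach(lc, tlist)
    {
        TargetEntry *tle = (TargetEntry *) lfirst(lc);

        /* keep real output columns and row-identity junk columns */
        if (tle->resjunk != RESJUNK_SORT_GROUP_COL)
            clean_tlist = lappend(clean_tlist, tle);
    }
    return create_pathtarget(root, clean_tlist);
}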

3. I also considered if we should stop using junk columns to represent 
ORDER BY / GROUP BY columns like this in the first place. Perhaps we 
should have a separate list for those and not stash them in the target 
list. But that seems like a much bigger change.

4. It would be nice to get rid of the junk filter in the executor 
altogether. With this patch, the junk columns used for RowLocks and 
ModifyTable are still included in the final target list, and are still 
filtered out by the junk filter. But we could add similar projections 
between the RowLocks and ModifyTable stages, to eliminate all the junk 
columns at the top of the plan. ExecFilterJunk() isn't a lot of code, 
but it would feel cleaner to me. I didn't try to implement that.

5. I'm not sure the categorization of junk columns that I implemented 
here is the best one. It might make sense to have more categories, for 
example distinguishing row-id columns from other junk, and ORDER BY 
columns from GROUP BY columns.

Patches
-------

So the attached patches implement that idea, with the above-mentioned 
problems. I think this approach is fine as it is, despite those 
problems, but my planner-fu is a bit rusty so I really need review and 
a second opinion on this.
second opinion on this.

v1-0001-Add-test-for-Min-Max-optimization-with-kNN-index-.patch
v1-0002-Show-how-ORDER-BY-expression-is-computed-unnecess.patch

These patches just add to existing tests to demonstrate the problem.

v1-0003-Turn-resjunk-into-an-enum.patch

Replace 'resjunk' boolean with an enum, so that we can distinguish 
between different junk columns. The next patch uses that information to 
identify junk columns that can be filtered out. It's a separate patch 
for easier review.
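
For illustration, the shape of the change is something like this (the 
value names here are illustrative, not necessarily the ones in the 
patch):

/* in TargetEntry, 'bool resjunk' becomes an enum recording *why*
 * the column is junk: */
typedef enum ResJunkKind
{
    RESJUNK_NO = 0,             /* a normal output column */
    RESJUNK_ROWID,              /* row identity for LockRows / ModifyTable */
    RESJUNK_SORT_GROUP_COL      /* ORDER BY / GROUP BY column not in SELECT */
} ResJunkKind;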

v1-0004-Omit-columns-from-final-tlist-that-were-only-need.patch

The main patch in this series.

v1-0005-Fix-regression-tests-caused-by-additional-Result-.patch

Regression test output changes, for all the plans with Sort that now 
have Sort + Result. See "Problems with the solution", #1 above.

-- 
Heikki Linnakangas
Neon (https://neon.tech)