Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov
Hi,

While designing an improvement for the sort cost model, I discovered
that the query plan can change if we slightly alter the query text
without changing its semantics. See the example below:

CREATE TABLE test(x integer, y integer,z text);
INSERT INTO test (x,y) SELECT x, 1 FROM generate_series(1,1000000) AS x;
CREATE INDEX ON test(x);
CREATE INDEX ON test(y);
VACUUM ANALYZE;
SET max_parallel_workers_per_gather = 0;

First query:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t1.x,t1.y;

And the second one - the same query, just with the left and right sides
of the expressions in the WHERE condition reversed:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t2.y=t1.x AND t2.x=t1.y GROUP BY t1.x,t1.y;

You can see two different plans here:

GroupAggregate  (cost=37824.89..37824.96 rows=1 width=16)
   Group Key: t1.y, t1.x
   ->  Incremental Sort  (cost=37824.89..37824.94 rows=2 width=8)
         Sort Key: t1.y, t1.x
         Presorted Key: t1.y
         ->  Merge Join  (cost=0.85..37824.88 rows=1 width=8)
               Merge Cond: (t1.y = t2.x)
               Join Filter: (t2.y = t1.x)
               ->  Index Scan using test_y_idx on test t1
               ->  Index Scan using test_x_idx on test t2

GroupAggregate  (cost=37824.89..37824.92 rows=1 width=16)
   Group Key: t1.x, t1.y
   ->  Sort  (cost=37824.89..37824.90 rows=1 width=8)
         Sort Key: t1.x, t1.y
         Sort Method: quicksort  Memory: 25kB
         ->  Merge Join  (cost=0.85..37824.88 rows=1 width=8)
               Merge Cond: (t1.y = t2.x)
               Join Filter: (t2.y = t1.x)
               ->  Index Scan using test_y_idx on test t1
               ->  Index Scan using test_x_idx on test t2

Don't mind for now that the best plan here would be an Incremental Sort
with presorted key t1.x; just note that the plan has changed without any
meaningful reason.
The cost_incremental_sort() routine causes this behaviour: it picks the
expression used to estimate the number of groups from the first
EquivalenceClass member, which depends on the query syntax.
I tried to invent a simple solution to address this minor case. The
clearest and most straightforward way here is to save a reference to the
expression that triggered the PathKey creation inside the PathKey itself.
See the sketch of the patch in the attachment.
I'm not sure this instability is worth fixing this way, but the
dependence of the optimisation outcome on the query text looks buggy.
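
For reference, the code in cost_incremental_sort() that picks the
expressions for the group estimate boils down to roughly this
(simplified; the varno-0 handling and surrounding details are trimmed):

    foreach(l, pathkeys)
    {
        PathKey    *key = (PathKey *) lfirst(l);

        /* take the FIRST member of the pathkey's equivalence class ... */
        EquivalenceMember *member = (EquivalenceMember *)
            linitial(key->pk_eclass->ec_members);

        /* ... and use its expression for the group-count estimate; which
         * member comes first depends on how the query text was written */
        presortedExprs = lappend(presortedExprs, member->em_expr);

        if (foreach_current_index(l) + 1 >= presorted_keys)
            break;
    }

    input_groups = estimate_num_groups(root, presortedExprs, input_tuples,
                                       NULL, NULL);

So whichever member happens to be first in ec_members - which follows
from the order in which the clauses were written - drives the estimate.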

-- 
regards, Andrei Lepikhov

Re: Incremental Sort Cost Estimation Instability

From: David Rowley
On Thu, 27 Jun 2024 at 03:00, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I tried to invent a simple solution to fight this minor case. But the
> most clear and straightforward way here is to save a reference to the
> expression that triggered the PathKey creation inside the PathKey itself.
> See the sketch of the patch in the attachment.
> I'm not sure this instability is worth fixing this way, but the
> dependence of the optimisation outcome on the query text looks buggy.

I don't think that's going to work as that'll mean it'll just choose
whichever expression was used when the PathKey was first created.  For
your example query, both PathKeys are first created for the GROUP BY
clause in standard_qp_callback(). I only have to change the GROUP BY
in your query to use the equivalent column in the other table to get
it to revert back to the plan you complained about.

postgres=# EXPLAIN (costs off) SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t2.y,t2.x;
                        QUERY PLAN
----------------------------------------------------------
 GroupAggregate
   Group Key: t2.y, t2.x
   ->  Sort
         Sort Key: t2.y, t2.x
         ->  Merge Join
               Merge Cond: (t1.y = t2.x)
               Join Filter: (t2.y = t1.x)
               ->  Index Scan using test_y_idx on test t1
               ->  Index Scan using test_x_idx on test t2
(9 rows)

Maybe doing something with estimate_num_groups() to find the
EquivalenceClass member with the least distinct values would be
better. I just can't think how that could be done in a performant way.

David



Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov
On 12/9/2024 03:05, David Rowley wrote:
> On Thu, 27 Jun 2024 at 03:00, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> I tried to invent a simple solution to fight this minor case. But the
>> most clear and straightforward way here is to save a reference to the
>> expression that triggered the PathKey creation inside the PathKey itself.
>> See the sketch of the patch in the attachment.
>> I'm not sure this instability is worth fixing this way, but the
>> dependence of the optimisation outcome on the query text looks buggy.
> 
> I don't think that's going to work as that'll mean it'll just choose
> whichever expression was used when the PathKey was first created.  For
> your example query, both PathKey's are first created for the GROUP BY
> clause in standard_qp_callback(). I only have to change the GROUP BY
> in your query to use the equivalent column in the other table to get
> it to revert back to the plan you complained about.
Yes, that's true. It is not an ideal solution so far - I'm looking for better ideas.

> Maybe doing something with estimate_num_groups() to find the
> EquivalenceClass member with the least distinct values would be
> better. I just can't think how that could be done in a performant way.
The initial problem causes a wrong cost_sort() estimation. Right now I'm
thinking about providing cost_sort() with the sort clauses instead of (or
in addition to) the pathkeys.
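
Just to make the idea concrete - something in the spirit of the following
(a sketch of the interface only, nothing settled):

    /* today: the cost model sees only the pathkeys */
    void
    cost_sort(Path *path, PlannerInfo *root, List *pathkeys,
              Cost input_cost, double tuples, int width,
              Cost comparison_cost, int sort_mem, double limit_tuples);

    /* sketch: also hand over the sort/group clauses, so the estimate can
     * be driven by the expressions the query actually sorts on, not by
     * whichever EquivalenceClass member happens to be listed first */
    void
    cost_sort(Path *path, PlannerInfo *root, List *pathkeys,
              List *sortclauses,
              Cost input_cost, double tuples, int width,
              Cost comparison_cost, int sort_mem, double limit_tuples);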

-- 
regards, Andrei Lepikhov




Re: Incremental Sort Cost Estimation Instability

From: David Rowley
On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
> Initial problem causes wrong cost_sort estimation. Right now I think
> about providing cost_sort() the sort clauses instead of (or in addition
> to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass.  If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate?  (That assumes the less distinct member has every
value the more distinct member has, which might not be true)

David



Re: Incremental Sort Cost Estimation Instability

From: Tomas Vondra
On 9/12/24 12:12, David Rowley wrote:
> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Initial problem causes wrong cost_sort estimation. Right now I think
>> about providing cost_sort() the sort clauses instead of (or in addition
>> to) the pathkeys.
> 
> I'm not quite sure why the sort clauses matter any more than the
> EquivalenceClass.  If the EquivalenceClass defines that all members
> will have the same value for any given row, then, if we had to choose
> any single member to drive the n_distinct estimate from, isn't the
> most accurate distinct estimate from the member with the smallest
> n_distinct estimate?  (That assumes the less distinct member has every
> value the more distinct member has, which might not be true)
> 
> David
> 

How large can the cost difference get? My assumption was it's correlated
with how different the ndistincts are for the two sides, so I tried

  CREATE TABLE t1(x integer, y integer,z text);
  CREATE TABLE t2(x integer, y integer,z text);

  INSERT INTO t1 (x,y) SELECT x, 1
    FROM generate_series(1,1000000) AS x;
  INSERT INTO t2 (x,y) SELECT mod(x,1000), 1
    FROM generate_series(1,1000000) AS x;

  CREATE INDEX ON t1(x);
  CREATE INDEX ON t2(x);
  CREATE INDEX ON t1(y);
  CREATE INDEX ON t2(y);

  VACUUM ANALYZE;

This changes the ndistinct for t2.x from 1M to 1k. I expected the cost
difference to get much larger, but it does not really change much:

GroupAggregate  (cost=38.99..37886.88 rows=992 width=16) (actual rows=1 loops=1)

GroupAggregate  (cost=37874.26..37904.04 rows=992 width=16) (actual rows=1 loops=1)

That is, the total cost difference is tiny - I'd even say it does not
matter in practice (it seems well within the possible impact of
collecting a different random sample).

But the startup cost changes in a rather absurd way, while the rest of
the plan is exactly the same. We even get this:

   ->  Incremental Sort  (cost=38.99..37869.52 rows=992 width=8)
         Sort Key: t1.x, t1.y
         Presorted Key: t1.x

in both cases. There's literally no other difference between these plans
visible in the EXPLAIN output ...


I'm not sure how to fix this, but it seems estimate_num_groups() needs
to do things differently. And I agree looking for the minimum ndistinct
seems like the right approach. But isn't estimate_num_groups() supposed
to already do that? The comment says:

 * 3.  If the list contains Vars of different relations that are known equal
 * due to equivalence classes, then drop all but one of the Vars from each
 * known-equal set, keeping the one with smallest estimated # of values
 * (since the extra values of the others can't appear in joined rows).
 * Note the reason we only consider Vars of different relations is that
 * if we considered ones of the same rel, we'd be double-counting the
 * restriction selectivity of the equality in the next step.

I haven't debugged this, but how come this doesn't do the trick?


regards

-- 
Tomas Vondra



Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov
On 12/9/2024 12:12, David Rowley wrote:
> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Initial problem causes wrong cost_sort estimation. Right now I think
>> about providing cost_sort() the sort clauses instead of (or in addition
>> to) the pathkeys.
> 
> I'm not quite sure why the sort clauses matter any more than the
> EquivalenceClass.  If the EquivalenceClass defines that all members
> will have the same value for any given row, then, if we had to choose
> any single member to drive the n_distinct estimate from, isn't the
> most accurate distinct estimate from the member with the smallest
> n_distinct estimate?  (That assumes the less distinct member has every
> value the more distinct member has, which might not be true)
Thanks for your efforts! Your idea looks more stable and applicable than
my patch.
BTW, it could still produce wrong ndistinct estimates if we choose a
sorting operator under clauses mentioned in the EquivalenceClass.
However, this thread's primary intention is to stabilize query plans, so
I'll try to implement your idea.
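
A rough, untested sketch of that direction (the function name and exact
placement are illustrative only, not a worked-out patch):

    #include "postgres.h"

    #include "nodes/pathnodes.h"
    #include "utils/selfuncs.h"

    /*
     * Pick the equivalence-class member with the lowest ndistinct estimate
     * to drive the group-count estimation.  No attempt is made here to
     * cache the result or to bound the extra examine_variable() calls.
     */
    static Expr *
    choose_min_ndistinct_member(PlannerInfo *root, EquivalenceClass *ec)
    {
        Expr       *best_expr = NULL;
        double      best_nd = 0.0;
        ListCell   *lc;

        foreach(lc, ec->ec_members)
        {
            EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
            VariableStatData vardata;
            bool        isdefault;
            double      nd;

            /* skip child and constant members */
            if (em->em_is_child || em->em_is_const)
                continue;

            examine_variable(root, (Node *) em->em_expr, 0, &vardata);
            nd = get_variable_numdistinct(&vardata, &isdefault);
            ReleaseVariableStats(vardata);

            /* ignore members without usable statistics */
            if (isdefault)
                continue;

            if (best_expr == NULL || nd < best_nd)
            {
                best_nd = nd;
                best_expr = em->em_expr;
            }
        }

        /* fall back to the historical choice - the first member */
        if (best_expr == NULL)
            best_expr = ((EquivalenceMember *) linitial(ec->ec_members))->em_expr;

        return best_expr;
    }

Of course, this is more expensive than just grabbing the first member,
which is probably where the performance concern comes from.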

The second reason was to distinguish sort orderings by cost (see proposal
[1]) because sometimes it can help to save CPU cycles on comparisons.
Having a lot of sort/grouping queries with only sporadic joins, I see how
profitable it can sometimes be - text or numeric grouping over a mostly
Cartesian join may be painful without fine-tuned sorting.

[1] 
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com

-- 
regards, Andrei Lepikhov




Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov
On 12/9/2024 16:57, Tomas Vondra wrote:
> On 9/12/24 12:12, David Rowley wrote:
>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I'm not sure how to fix this, but it seems estimate_num_groups() needs
> to do things differently. And I agree looking for the minimum ndistinct
> seems like the right approach. but doesn't estimate_num_groups()
> supposed to already do that? The comment says:
I've rewritten the code in the previous email. It looks like we could try
to rewrite estimate_num_groups() to do this more effectively, but I
haven't done it yet.
Regarding the tiny change in the cost, my initial goal was to teach
cost_sort() to differentiate sort orderings: begin by considering the
number of columns in the cost estimation, and then consider the distinct
estimate of the first column.
BTW, this was triggered by user reports where a slight change in the
balance between MergeAppend/GatherMerge/Sort/IncrementalSort (or the
column order) could give a significant profit - especially when grouping
millions of rows by 2-4 bytea columns.
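
To illustrate the kind of dependence I mean - a crude model, not the
current cost_sort() logic, assuming uniformly distributed and independent
columns:

    #include <math.h>

    /*
     * Expected number of columns examined per tuple comparison.  Column
     * i+1 is compared only when the two tuples tie on columns 1..i, which
     * under the uniformity assumption happens with probability
     * ~ 1 / (nd_1 * ... * nd_i).
     */
    static double
    expected_columns_compared(const double *ndistinct, int ncols)
    {
        double      expected = 0.0;
        double      tie_prob = 1.0; /* P(tie on all previous columns) */

        for (int i = 0; i < ncols; i++)
        {
            expected += tie_prob;                  /* column i is examined */
            tie_prob /= fmax(ndistinct[i], 1.0);   /* chance the tie continues */
        }
        return expected;
    }

With a nearly unique leading column this stays close to 1.0; with a
two-valued leading column it approaches 1.5, so the same N*log(N)
comparisons burn noticeably more CPU - exactly the effect a cost model
that ignores column order cannot see.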

-- 
regards, Andrei Lepikhov




Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov
On 12/9/2024 16:57, Tomas Vondra wrote:
> On 9/12/24 12:12, David Rowley wrote:
>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>  but doesn't estimate_num_groups()
> supposed to already do that? The comment says:
> 
>   * 3.  If the list contains Vars of different relations that are known equal
>   * due to equivalence classes, then drop all but one of the Vars from each
>   * known-equal set, keeping the one with smallest estimated # of values
>   * (since the extra values of the others can't appear in joined rows).
>   * Note the reason we only consider Vars of different relations is that
>   * if we considered ones of the same rel, we'd be double-counting the
>   * restriction selectivity of the equality in the next step.
> 
> I haven't debugged this, but how come this doesn't do the trick?
I've got your point now.
Unfortunately, that comment only says that estimate_num_groups() removes
duplicates from the list of grouping expressions (see exprs_known_equal).
It doesn't go through the EquivalenceClass members to find the
best-fitting expression for each grouping position. In the example above,
the grouping list {t1.x, t1.y} contains no pair of known-equal Vars from
different relations, so nothing is dropped and the statistics of the t2
side are never consulted.

-- 
regards, Andrei Lepikhov




Re: Incremental Sort Cost Estimation Instability

From: Alena Rybakina

Hi!

On 07.11.2024 08:57, Andrei Lepikhov wrote:
> On 10/8/24 11:33, Andrei Lepikhov wrote:
> On 9/23/24 20:02, Andrei Lepikhov wrote:
> On 12/9/2024 12:12, David Rowley wrote:
> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com>
> Minor change to make compiler and cfbot happy
> Now, this thread looks connected to [1]. However, it still has an
> independent benefit, which can be discussed separately.
> After the introduction of the em->em_ndistinct cache, I played around
> with the idea of letting estimate_num_groups use this cache and
> incidentally found one more instability case, like the following:
>
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (x int, y int, z int);
> INSERT INTO test (x,y,z) (SELECT random()*1E5, random()*2, random() FROM generate_series(1,1e4));
> VACUUM ANALYZE test;
>
> EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY x,y;
> EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY y,x;
>
> Here, you can see that depending on the initial order of the grouping
> columns, Postgres chooses different columns for grouping. That causes
> different estimations - one of them is definitely wrong:
>
> GroupAggregate  (cost=181.41..182.29 rows=50 width=16)
> GroupAggregate  (cost=181.41..181.82 rows=3 width=16)
>
> That happens because when estimating the number of groups, Postgres
> doesn't consider EquivalenceClass, which can let him correct group
> estimation at a low price.
> It may be done inside the make_pathkeys_for_sortclauses_extended by
> choosing a column with a lower number of distinct, but IMO, it is
> better to do it at the moment of the number of groups estimation.
>
> Thoughts? Is it a real issue or just a non-practical corner case?
>
> The new version of the patch is attached.
>
> [1] https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

But you haven't considered the case where you need to use non-cached
values, for example, if ndistinct has already changed. Look: here x has
the minimum ndistinct first, and then column z does:

alena@postgres=# delete from test;
DELETE 10000
alena@postgres=# INSERT INTO test (x,y,z) (SELECT id%3, id*2, id FROM generate_series(1,1e4) as id);
INSERT 0 10000
alena@postgres=# EXPLAIN SELECT * FROM test where x=y ORDER BY x,y,z;
                          QUERY PLAN                          
--------------------------------------------------------------
 Sort  (cost=196.88..197.02 rows=56 width=12)
   Sort Key: x, z
   ->  Seq Scan on test  (cost=0.00..195.25 rows=56 width=12)
         Filter: (x = y)
(4 rows)

alena@postgres=# delete from test;
DELETE 10000
alena@postgres=# INSERT INTO test (x,y,z) (SELECT id, id*2, id%3 FROM generate_series(1,1e4) as id);
INSERT 0 10000
alena@postgres=# vacuum analyze;
VACUUM
alena@postgres=# EXPLAIN SELECT * FROM test where x=y ORDER BY x,y,z;
                          QUERY PLAN                          
--------------------------------------------------------------
 Sort  (cost=235.41..235.54 rows=50 width=12)
   Sort Key: x, z
   ->  Seq Scan on test  (cost=0.00..234.00 rows=50 width=12)
         Filter: (x = y)
(4 rows)

But the order of the columns does not change, as you can see.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov
On 11/7/24 18:06, Alena Rybakina wrote:
> On 07.11.2024 08:57, Andrei Lepikhov wrote:
>> That happens because when estimating the number of groups, Postgres 
>> doesn't consider EquivalenceClass, which can let him correct group 
>> estimation at a low price.
>> It may be done inside the make_pathkeys_for_sortclauses_extended by 
>> choosing a column with a lower number of distinct, but IMO, it is 
>> better to do it at the moment of the number of groups estimation.
>>
>> Thoughts? Is it a real issue or just a non-practical corner case?
>>
>> The new version of the patch is attached.
>>
>> [1] https://www.postgresql.org/message-id/ 
>> flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
> 
> But you haven’t considered the case when you need to use non-cached 
> values, for example, if ndistinct has already changed. Look, here x has 
> a minimum ndistinct, and then column z:

> but the order of the columns does not change, as you can see.
I'm not sure at all what you mean by 'cached value' or 'changed
ndistinct'.
Also, I don't understand the issue you are trying to show with your
examples.
My point was that an equality expression can be used to correct
statistics-based decisions on the number of groups. Look:

A.x, ndistinct = 1000
A.y, ndistinct = 10

After the filter 'A.x=A.y', it is impossible to get more than 10 groups
on the A.x column, as well as on the A.y column. So, we have a tool to
correct the estimation by taking equivalence classes into account.
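
A toy illustration of this arithmetic (not PostgreSQL code, just the
numbers above):

    #include <math.h>

    /*
     * With ndistinct(A.x) = 1000 and ndistinct(A.y) = 10, the filter
     * A.x = A.y cannot leave more than 10 distinct values on either
     * column, so a group estimate over x (or y) can be clamped by the
     * least distinct member of their equivalence class.
     */
    static double
    groups_after_equality(double nd_x, double nd_y)
    {
        return fmin(nd_x, nd_y);    /* at most the least-distinct side */
    }

    /* groups_after_equality(1000, 10) == 10 */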

-- 
regards, Andrei Lepikhov