Thread: Incremental Sort Cost Estimation Instability
Hi,

While designing an improvement for the sort cost model, I discovered that the query plan can vary if we slightly change the query text without changing its semantics. See the example below:

CREATE TABLE test(x integer, y integer, z text);
INSERT INTO test (x,y) SELECT x, 1 FROM generate_series(1,1000000) AS x;
CREATE INDEX ON test(x);
CREATE INDEX ON test(y);
VACUUM ANALYZE;
SET max_parallel_workers_per_gather = 0;

First query:

EXPLAIN (ANALYZE, TIMING OFF)
SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t1.x,t1.y;

And the second one - just swap the left and right sides of the expressions in the WHERE condition:

EXPLAIN (ANALYZE, TIMING OFF)
SELECT count(*) FROM test t1, test t2
WHERE t2.y=t1.x AND t2.x=t1.y GROUP BY t1.x,t1.y;

You can see two different plans here:

GroupAggregate (cost=37824.89..37824.96 rows=1 width=16)
  Group Key: t1.y, t1.x
  ->  Incremental Sort (cost=37824.89..37824.94 rows=2 width=8)
        Sort Key: t1.y, t1.x
        Presorted Key: t1.y
        ->  Merge Join (cost=0.85..37824.88 rows=1 width=8)
              Merge Cond: (t1.y = t2.x)
              Join Filter: (t2.y = t1.x)
              ->  Index Scan using test_y_idx on test t1
              ->  Index Scan using test_x_idx on test t2

GroupAggregate (cost=37824.89..37824.92 rows=1 width=16)
  Group Key: t1.x, t1.y
  ->  Sort (cost=37824.89..37824.90 rows=1 width=8)
        Sort Key: t1.x, t1.y
        Sort Method: quicksort  Memory: 25kB
        ->  Merge Join (cost=0.85..37824.88 rows=1 width=8)
              Merge Cond: (t1.y = t2.x)
              Join Filter: (t2.y = t1.x)
              ->  Index Scan using test_y_idx on test t1
              ->  Index Scan using test_x_idx on test t2

Don't mind for now that the best plan would be an Incremental Sort with presorted key t1.x. Just pay attention to the fact that the plan has changed without any good reason. The cost_incremental_sort() routine causes this behaviour: it chooses the expression for estimating the number of groups from the first EquivalenceClass member, which depends on the syntax.

I tried to invent a simple solution to fight this minor case.
But the clearest and most straightforward way here is to save a reference to the expression that triggered the PathKey creation inside the PathKey itself. See the sketch of the patch in the attachment.

I'm not sure this instability is worth fixing this way, but the dependence of the optimisation outcome on the query text looks buggy.

--
regards, Andrei Lepikhov
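To make the failure mode concrete, here is a toy sketch (plain Python with made-up ndistinct numbers and helper names, not the actual planner code) of how picking the EquivalenceClass member by syntactic position destabilises the group estimate:

```python
# A toy model of the instability: the group-count estimate for a sort key
# is driven by a single EquivalenceClass member, and if that member is
# chosen by its syntactic position, semantically identical queries get
# different estimates. Names and numbers below are illustrative only.

# Per-expression ndistinct estimates for the class {t1.x, t2.y}.
ndistinct = {"t1.x": 1_000_000, "t2.y": 1}

def estimate_groups_first_member(members):
    """Unstable choice: take whichever member appears first in the clause."""
    return ndistinct[members[0]]

# "WHERE t1.x = t2.y" vs. "WHERE t2.y = t1.x": same semantics,
# different member order, wildly different estimates.
est_ab = estimate_groups_first_member(["t1.x", "t2.y"])
est_ba = estimate_groups_first_member(["t2.y", "t1.x"])

print(est_ab, est_ba)  # 1000000 1
```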
On Thu, 27 Jun 2024 at 03:00, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I tried to invent a simple solution to fight this minor case. But the
> most clear and straightforward way here is to save a reference to the
> expression that triggered the PathKey creation inside the PathKey itself.
> See the sketch of the patch in the attachment.
> I'm not sure this instability is worth fixing this way, but the
> dependence of the optimisation outcome on the query text looks buggy.

I don't think that's going to work, as that'll mean it'll just choose whichever expression was used when the PathKey was first created. For your example query, both PathKeys are first created for the GROUP BY clause in standard_qp_callback(). I only have to change the GROUP BY in your query to use the equivalent columns in the other table to get it to revert back to the plan you complained about.

postgres=# EXPLAIN (costs off) SELECT count(*) FROM test t1, test t2
           WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t2.y,t2.x;
                        QUERY PLAN
----------------------------------------------------------
 GroupAggregate
   Group Key: t2.y, t2.x
   ->  Sort
         Sort Key: t2.y, t2.x
         ->  Merge Join
               Merge Cond: (t1.y = t2.x)
               Join Filter: (t2.y = t1.x)
               ->  Index Scan using test_y_idx on test t1
               ->  Index Scan using test_x_idx on test t2
(9 rows)

Maybe doing something with estimate_num_groups() to find the EquivalenceClass member with the least distinct values would be better. I just can't think how that could be done in a performant way.

David
On 12/9/2024 03:05, David Rowley wrote:
> On Thu, 27 Jun 2024 at 03:00, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> I tried to invent a simple solution to fight this minor case. But the
>> most clear and straightforward way here is to save a reference to the
>> expression that triggered the PathKey creation inside the PathKey itself.
>> See the sketch of the patch in the attachment.
>> I'm not sure this instability is worth fixing this way, but the
>> dependence of the optimisation outcome on the query text looks buggy.
>
> I don't think that's going to work as that'll mean it'll just choose
> whichever expression was used when the PathKey was first created. For
> your example query, both PathKey's are first created for the GROUP BY
> clause in standard_qp_callback(). I only have to change the GROUP BY
> in your query to use the equivalent column in the other table to get
> it to revert back to the plan you complained about.

Yes, that's true. It is not an ideal solution so far - I'm looking for better ideas.

> Maybe doing something with estimate_num_groups() to find the
> EquivalenceClass member with the least distinct values would be
> better. I just can't think how that could be done in a performant way.

The initial problem causes a wrong cost_sort estimation. Right now I'm thinking about providing cost_sort() the sort clauses instead of (or in addition to) the pathkeys.

--
regards, Andrei Lepikhov
On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
> Initial problem causes wrong cost_sort estimation. Right now I think
> about providing cost_sort() the sort clauses instead of (or in addition
> to) the pathkeys.

I'm not quite sure why the sort clauses matter any more than the EquivalenceClass. If the EquivalenceClass defines that all members will have the same value for any given row, then, if we had to choose any single member to drive the n_distinct estimate from, isn't the most accurate distinct estimate the one from the member with the smallest n_distinct estimate? (That assumes the less distinct member has every value the more distinct member has, which might not be true.)

David
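The rule sketched here is easy to model (plain Python, illustrative numbers, not planner code): since all class members are equal row by row, the smallest per-member ndistinct is an upper bound on the group count, so it is the safest single-member pick.

```python
# Sketch of the minimum-ndistinct rule: among EquivalenceClass members
# that are equal in every row, the number of distinct values (and hence
# groups) cannot exceed the smallest member-wise ndistinct estimate.
# Numbers are illustrative only.

ec_ndistinct = {"t1.y": 1_000_000, "t2.x": 1}

def estimate_groups_min_member(members):
    """Stable choice: the member with the smallest ndistinct estimate."""
    return min(members.values())

groups = estimate_groups_min_member(ec_ndistinct)
print(groups)  # 1

# Caveat from the mail: this assumes the less distinct member's values
# all occur among the more distinct member's values. If the domains are
# disjoint, the true group count is even lower (possibly zero), so the
# minimum still serves as an upper bound, just not a tight one.
```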
On 9/12/24 12:12, David Rowley wrote:
> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Initial problem causes wrong cost_sort estimation. Right now I think
>> about providing cost_sort() the sort clauses instead of (or in addition
>> to) the pathkeys.
>
> I'm not quite sure why the sort clauses matter any more than the
> EquivalenceClass. If the EquivalenceClass defines that all members
> will have the same value for any given row, then, if we had to choose
> any single member to drive the n_distinct estimate from, isn't the
> most accurate distinct estimate from the member with the smallest
> n_distinct estimate? (That assumes the less distinct member has every
> value the more distinct member has, which might not be true)
>
> David

How large can the cost difference get? My assumption was that it's correlated with how different the ndistincts are for the two sides, so I tried:

CREATE TABLE t1(x integer, y integer, z text);
CREATE TABLE t2(x integer, y integer, z text);
INSERT INTO t1 (x,y) SELECT x, 1 FROM generate_series(1,1000000) AS x;
INSERT INTO t2 (x,y) SELECT mod(x,1000), 1 FROM generate_series(1,1000000) AS x;
CREATE INDEX ON t1(x);
CREATE INDEX ON t2(x);
CREATE INDEX ON t1(y);
CREATE INDEX ON t2(y);
VACUUM ANALYZE;

which changes the ndistinct for t2.x from 1M to 1k. I expected the cost difference to get much larger, but it does not really change:

GroupAggregate (cost=38.99..37886.88 rows=992 width=16) (actual rows=1 loops=1)

GroupAggregate (cost=37874.26..37904.04 rows=992 width=16) (actual rows=1 loops=1)

The total cost difference is tiny - I'd even say it does not matter in practice (it seems well within the possible impact of collecting a different random sample). But the startup cost changes in a rather absurd way, while the rest of the plan is exactly the same. We even know this:

  ->  Incremental Sort (cost=38.99..37869.52 rows=992 width=8)
        Sort Key: t1.x, t1.y
        Presorted Key: t1.x

in both cases.
There's literally no other difference between these plans visible in the explain output.

I'm not sure how to fix this, but it seems estimate_num_groups() needs to do things differently. And I agree that looking for the minimum ndistinct seems like the right approach. But isn't estimate_num_groups() supposed to already do that? The comment says:

 * 3. If the list contains Vars of different relations that are known equal
 *    due to equivalence classes, then drop all but one of the Vars from each
 *    known-equal set, keeping the one with smallest estimated # of values
 *    (since the extra values of the others can't appear in joined rows).
 *    Note the reason we only consider Vars of different relations is that
 *    if we considered ones of the same rel, we'd be double-counting the
 *    restriction selectivity of the equality in the next step.

I haven't debugged this, but how come this doesn't do the trick?

regards

--
Tomas Vondra
On 12/9/2024 12:12, David Rowley wrote:
> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Initial problem causes wrong cost_sort estimation. Right now I think
>> about providing cost_sort() the sort clauses instead of (or in addition
>> to) the pathkeys.
>
> I'm not quite sure why the sort clauses matter any more than the
> EquivalenceClass. If the EquivalenceClass defines that all members
> will have the same value for any given row, then, if we had to choose
> any single member to drive the n_distinct estimate from, isn't the
> most accurate distinct estimate from the member with the smallest
> n_distinct estimate? (That assumes the less distinct member has every
> value the more distinct member has, which might not be true)

Thanks for your efforts! Your idea looks more stable and applicable than my patch. BTW, it could still produce wrong ndistinct estimations if we choose a sort operator outside the clauses mentioned in the EquivalenceClass. However, this thread's primary intention is to stabilise query plans, so I'll try to implement your idea.

The second reason was to distinguish sortings by cost (see proposal [1]), because sometimes it could help save CPU cycles on comparisons. Having a lot of sort/grouping queries with only sporadic joins, I can see how profitable it could sometimes be - text or numeric grouping over a mostly Cartesian join may be painful without fine-tuned sorting.

[1] https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com

--
regards, Andrei Lepikhov
On 12/9/2024 16:57, Tomas Vondra wrote:
> On 9/12/24 12:12, David Rowley wrote:
>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> I'm not sure how to fix this, but it seems estimate_num_groups() needs
> to do things differently. And I agree looking for the minimum ndistinct
> seems like the right approach. But isn't estimate_num_groups()
> supposed to already do that? The comment says:

I've rewritten the code in the previous email. It looks like we could try to rewrite estimate_num_groups to do this more effectively, but I haven't done it yet.

Regarding the tiny change in the cost: my initial reason was to teach cost_sort to differentiate sort orderings - begin by considering the number of columns in the cost estimation and then consider the distinct estimation of the first column. BTW, it was triggered by user reports where a slight change in the balance between MergeAppend/GatherMerge/Sort/IncrementalSort (or the column order) could give significant profit, especially when grouping millions of rows in 2-4 bytea columns.

--
regards, Andrei Lepikhov
On 12/9/2024 16:57, Tomas Vondra wrote:
> On 9/12/24 12:12, David Rowley wrote:
>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> But isn't estimate_num_groups() supposed to already do that?
> The comment says:
>
>  * 3. If the list contains Vars of different relations that are known equal
>  *    due to equivalence classes, then drop all but one of the Vars from each
>  *    known-equal set, keeping the one with smallest estimated # of values
>  *    (since the extra values of the others can't appear in joined rows).
>  *    Note the reason we only consider Vars of different relations is that
>  *    if we considered ones of the same rel, we'd be double-counting the
>  *    restriction selectivity of the equality in the next step.
>
> I haven't debugged this, but how come this doesn't do the trick?

I've got your point now. Unfortunately, that comment says that estimate_num_groups removes duplicates from the list of grouping expressions (see exprs_known_equal). But it doesn't look through the em_members to find the best-fitting expression for each grouping position.

--
regards, Andrei Lepikhov
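The gap can be sketched like this (plain Python; dedup_only and ec_aware are hypothetical names standing in for the existing exprs_known_equal() pruning and the proposed em_members search, respectively; the ndistinct numbers are made up):

```python
# What estimate_num_groups() does today (per its comment): drop duplicate
# grouping expressions that are known equal across relations, keeping the
# one with the smallest ndistinct. What it does NOT do: for a grouping
# expression, search its whole EquivalenceClass for a better-estimated
# sibling that is not itself in the GROUP BY list.

ndistinct = {"t1.x": 100_000, "t2.y": 3}                 # illustrative
eclass = {"t1.x": {"t1.x", "t2.y"}, "t2.y": {"t1.x", "t2.y"}}

def dedup_only(group_exprs):
    """Current behaviour: drop known-equal duplicates, keep the cheapest."""
    kept = []
    for expr in group_exprs:
        dup = next((k for k in kept if expr in eclass[k]), None)
        if dup is None:
            kept.append(expr)
        elif ndistinct[expr] < ndistinct[dup]:
            kept[kept.index(dup)] = expr
    return kept

def ec_aware(group_exprs):
    """Proposed: replace each kept expression by its cheapest EC member."""
    return [min(eclass[e], key=lambda m: ndistinct[m])
            for e in dedup_only(group_exprs)]

# GROUP BY t1.x, t2.y with t1.x = t2.y: deduplication already helps...
print(dedup_only(["t1.x", "t2.y"]))  # ['t2.y']

# ...but GROUP BY t1.x alone gets no correction from deduplication:
print(dedup_only(["t1.x"]))          # ['t1.x']  (ndistinct 100000)
print(ec_aware(["t1.x"]))            # ['t2.y']  (ndistinct 3)
```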
Hi!
On 10/8/24 11:33, Andrei Lepikhov wrote:
> On 9/23/24 20:02, Andrei Lepikhov wrote:
>> Now, this thread looks connected to the [1]. However, it still has
>> independent profit, which can be discussed separately.
>> On 12/9/2024 12:12, David Rowley wrote:
> Minor change to make compiler and cfbot happy
>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com>
After the introduction of the em->em_ndistinct cache, I played around with the idea of letting estimate_num_groups use this cache. Incidentally, I found out that we have one more instability case, like the following:
DROP TABLE IF EXISTS test;
CREATE TABLE test (x int, y int, z int);
INSERT INTO test (x,y,z) (SELECT random()*1E5, random()*2, random() FROM generate_series(1,1e4));
VACUUM ANALYZE test;
EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY x,y;
EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY y,x;
Here, you can see that depending on the initial order of the grouping columns, Postgres chooses different columns for grouping. That causes different estimations - one of them is definitely wrong:
GroupAggregate (cost=181.41..182.29 rows=50 width=16)
GroupAggregate (cost=181.41..181.82 rows=3 width=16)
That happens because, when estimating the number of groups, Postgres doesn't consider the EquivalenceClass, which could let it correct the group estimation at a low price.
It may be done inside make_pathkeys_for_sortclauses_extended by choosing the column with the lower number of distinct values, but IMO it is better to do it at the moment of the number-of-groups estimation.
Thoughts? Is this a real issue or just a non-practical corner case?
The new version of the patch is attached.
[1] https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
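The corner case above is easy to check empirically outside the planner (plain Python standing in for the SQL example; the bound, not the exact planner estimate, is what matters):

```python
# Simulate the example above: x ~ random()*1E5, y ~ random()*2, then
# filter x = y and group by (x, y). Because y takes at most 3 values
# (0, 1, 2), the equivalence x = y caps the true group count at 3,
# whichever column the planner uses for its estimate.
import random

random.seed(0)  # arbitrary seed, for reproducibility
rows = [(random.randint(0, 100_000), random.randint(0, 2))
        for _ in range(10_000)]

groups = {(x, y) for x, y in rows if x == y}

print(len(groups))  # never more than 3
```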
But you haven't considered the case when you need to use non-cached values - for example, if ndistinct has already changed. Look: first x has the minimum ndistinct, and then column z does:
alena@postgres=# delete from test;
DELETE 10000
alena@postgres=# INSERT INTO test (x,y,z) (SELECT id%3, id*2, id FROM generate_series(1,1e4) as id);
INSERT 0 10000
alena@postgres=# EXPLAIN SELECT * FROM test where x=y ORDER BY x,y,z;
QUERY PLAN
--------------------------------------------------------------
Sort (cost=196.88..197.02 rows=56 width=12)
Sort Key: x, z
-> Seq Scan on test (cost=0.00..195.25 rows=56 width=12)
Filter: (x = y)
(4 rows)
alena@postgres=# delete from test;
DELETE 10000
alena@postgres=# INSERT INTO test (x,y,z) (SELECT id, id*2, id%3 FROM generate_series(1,1e4) as id);
INSERT 0 10000
alena@postgres=# vacuum analyze;
VACUUM
alena@postgres=# EXPLAIN SELECT * FROM test where x=y ORDER BY x,y,z;
QUERY PLAN
--------------------------------------------------------------
Sort (cost=235.41..235.54 rows=50 width=12)
Sort Key: x, z
-> Seq Scan on test (cost=0.00..234.00 rows=50 width=12)
Filter: (x = y)
(4 rows)
but the order of the columns does not change, as you can see.
--
Regards,
Alena Rybakina
Postgres Professional
On 11/7/24 18:06, Alena Rybakina wrote:
> On 07.11.2024 08:57, Andrei Lepikhov wrote:
>> That happens because when estimating the number of groups, Postgres
>> doesn't consider EquivalenceClass, which can let him correct group
>> estimation at a low price.
>> It may be done inside the make_pathkeys_for_sortclauses_extended by
>> choosing a column with a lower number of distinct, but IMO, it is
>> better to do it at the moment of the number of groups estimation.
>>
>> Thoughts? Is it a real issue or just a non-practical corner case?
>>
>> The new version of the patch is attached.
>>
>> [1] https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
>
> But you haven't considered the case when you need to use non-cached
> values, for example, if ndistinct has already changed. Look, here x has
> a minimum ndistinct, and then column z:
> but the order of the columns does not change, as you can see.

I'm unsure what you mean by 'cached value' or 'changed ndistinct'. Also, I don't understand the issue you tried to show with your examples.

My point was that an equality expression can be used to improve statistics-based decisions on the number of groups. Look:

A.x, distincts = 1000
A.y, distincts = 10

After the filter 'A.x=A.y', it is impossible to get more than 10 groups on the A.x column, as well as on the A.y column. So, we have a tool to correct the estimation considering equivalence classes.

--
regards, Andrei Lepikhov