Incremental Sort Cost Estimation Instability - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Incremental Sort Cost Estimation Instability
Date
Msg-id ba0edc53-4b1f-4c67-92d1-29aeddb36a18@gmail.com
Whole thread Raw
Responses Re: Incremental Sort Cost Estimation Instability
List pgsql-hackers
Hi,

While designing an improvement for the cost sort model, I discovered 
that the query plan can vary if we slightly change the query text 
without pushing semantic differences. See the example below:

CREATE TABLE test(x integer, y integer,z text);
INSERT INTO test (x,y) SELECT x, 1 FROM generate_series(1,1000000) AS x;
CREATE INDEX ON test(x);
CREATE INDEX ON test(y);
VACUUM ANALYZE;
SET max_parallel_workers_per_gather = 0;

First query:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t1.x,t1.y;

And the second one - just reverse the left and right sides of 
expressions in the WHERE condition:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t2.y=t1.x AND t2.x=t1.y GROUP BY t1.x,t1.y;

You can see two different plans here:

GroupAggregate  (cost=37824.89..37824.96 rows=1 width=16)
   Group Key: t1.y, t1.x
   ->  Incremental Sort  (cost=37824.89..37824.94 rows=2 width=8)
         Sort Key: t1.y, t1.x
         Presorted Key: t1.y
         ->  Merge Join  (cost=0.85..37824.88 rows=1 width=8)
               Merge Cond: (t1.y = t2.x)
               Join Filter: (t2.y = t1.x)
               ->  Index Scan using test_y_idx on test t1
               ->  Index Scan using test_x_idx on test t2

GroupAggregate  (cost=37824.89..37824.92 rows=1 width=16)
   Group Key: t1.x, t1.y
   ->  Sort  (cost=37824.89..37824.90 rows=1 width=8)
         Sort Key: t1.x, t1.y
         Sort Method: quicksort  Memory: 25kB
         ->  Merge Join  (cost=0.85..37824.88 rows=1 width=8)
               Merge Cond: (t1.y = t2.x)
               Join Filter: (t2.y = t1.x)
               ->  Index Scan using test_y_idx on test t1
               ->  Index Scan using test_x_idx on test t2

Don't mind for now that the best plan is to do IncrementalSort with 
presorted key t1.x. Just pay attention to the fact that the plan has 
altered without any valuable reason.
The cost_incremental_sort() routine causes such behaviour: it chooses 
the expression to estimate the number of groups from the first 
EquivalenceClass member that relies on the syntax.
I tried to invent a simple solution to fight this minor case. But the 
most clear and straightforward way here is to save a reference to the 
expression that triggered the PathKey creation inside the PathKey itself.
See the sketch of the patch in the attachment.
I'm not sure this instability is worth fixing this way, but the 
dependence of the optimisation outcome on the query text looks buggy.

-- 
regards, Andrei Lepikhov
Attachment

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: [PATCH] Add ACL (Access Control List) acronym
Next
From: Tom Lane
Date:
Subject: Re: pgindent exit status if a file encounters an error