Thread: Access plan selection logic PG9.2 -> PG14

Access plan selection logic PG9.2 -> PG14

From
"Ryo Yamaji (Fujitsu)"
Date:
Hi all

I am planning to migrate from PostgreSQL9.2 to PostgreSQL14.7.
Running the same SQL with the same statistics as previous versions resulted in different access plans.
I have no knowledge of the PostgreSQL planner and would like to be informed. Please assist.

Behavior:
I am using an environment in which the statistics have been manually updated using pg_dbms_stats. (*)
In PostgreSQL9.2, the primary key was selected when a SELECT was executed with all columns of the primary key specified
asequals in the where clause. 
However, in PostgreSQL14.7, a SELECT with the same condition would select an index that does not contain any columns
specifiedby the where clause. 

(*) The statistics have been updated as follows.
 * pg_class
   relpages=200000, reltuples=9.62e+06, relallvisible=0
  * pg_statistic
      stanullfrac, stadistinct, stakindN, staopN=0
      stanumbersN, stavaluesN=NULL
      stawidth=61
  * dbms_stats.relation_stats_locked
      curpages=200000

The OSS versions for each of the above events are as follows:
PostgreSQL14.7 - pg_dbms_stats 1.3.9
PostgreSQL9.2 - pg_dbms_stats 1.5.0

Question:
I am assuming that the version upgrade has changed the behavior of the planner. Is this correct?
I don't know why they choose a plan that seems more expensive than IndexOnlyScan.

Reproduction Method:
Run the attached SQL script against the database you created.
Enable pg_dbms_stats beforehand.

Ex) psql -d test -f test.sql

The contents of test.sql are:
1. Define a table (13 columns) + primary key + user index
2. Insert Test Data
3. Analyze
4. Update pg_class and pg_statistic, and lock statistics
5. SELECT specifies all columns of primary key in WHERE
6. Delete the data used in the test

Result:

PostgreSQL9.2
explain analyze select a from public.tbl where a='1000000000' and b='1000000000' and c='1000000000' and d='1000000000'
ande='1000000000' and h=1; 
                                                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using tbl_pkey on tbl  (cost=0.00..0.54 rows=1 width=61) (actual time=0.010..0.011 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d =
'1000000000'::text)AND (e = '1000000000'::text) AND (h = 1)) 
   Heap Fetches: 1
 Total runtime: 0.023 ms

PostgreSQL14.7
explain analyze select a from public.tbl where a='1000000000' and b='1000000000' and c='1000000000' and d='1000000000'
ande='1000000000' and h=1; 
                                                                                 QUERY PLAN
                                                     

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tbl_ix1 on tbl  (cost=0.18..0.42 rows=1 width=61) (actual time=0.007..0.008 rows=1 loops=1)
   Index Cond: (((a)::text = '1000000000'::text) AND ((b)::text = '1000000000'::text) AND ((c)::text =
'1000000000'::text)AND ((d)::text = '1000000000'::text) AND (h = 1)) 
   Filter: ((e)::text = '1000000000'::text)
 Planning Time: 0.337 ms
 Execution Time: 0.023 ms


Regards, Ryo

Attachment

Re: Access plan selection logic PG9.2 -> PG14

From
David Rowley
Date:
On Fri, 15 Sept 2023 at 01:36, Ryo Yamaji (Fujitsu)
<yamaji.ryo@fujitsu.com> wrote:
> Question:
> I am assuming that the version upgrade has changed the behavior of the planner. Is this correct?

It's possible.  9.2 was a long time ago. It would be quite a bit of
work to determine if this is the case.  You could perhaps test on 9.2
without pg_dbms_stats and see what happens.

It's likely add_path() has changed quite a bit since 9.2. That could
be having an effect.

> I don't know why they choose a plan that seems more expensive than IndexOnlyScan.

This likely is due to the query planner not giving any preference to
the index that allows more quals to go into the index condition.  Once
the selectivity estimate gets as low as 1 row then the costs between
each index don't vary very much. It's possible the PK index didn't
come out any cheaper, or that add_path() saw the costs as (fuzzily)
the same.

I do think the planner should take the number of matched index quals
into account. I'm just not exactly sure how best to cost that in.

Does v14 run faster if you force the tbl_pkey to be used? (perhaps you
could do that just by dropping the other index if you're using a test
instance that's not needed by anyone else).

David



Re: Access plan selection logic PG9.2 -> PG14

From
Adrian Klaver
Date:
On 9/14/23 00:15, Ryo Yamaji (Fujitsu) wrote:
> 
> Hi all
> 
> I am planning to migrate from PostgreSQL9.2 to PostgreSQL14.7.
> Running the same SQL with the same statistics as previous versions resulted in different access plans.
> I have no knowledge of the PostgreSQL planner and would like to be informed. Please assist.
> 
> Behavior:
> I am using an environment in which the statistics have been manually updated using pg_dbms_stats. (*)

Why not just run ANALYZE on the new 14.7 instance?

> In PostgreSQL9.2, the primary key was selected when a SELECT was executed with all columns of the primary key
specifiedas equals in the where clause.
 


-- 
Adrian Klaver
adrian.klaver@aklaver.com




RE: Access plan selection logic PG9.2 -> PG14

From
"Ryo Yamaji (Fujitsu)"
Date:
on Thu, 14 Sept 2023 at 23:43, David Rowley <dgrowleyml@gmail.com> wrote:
> This likely is due to the query planner not giving any preference to the index that allows more quals to go into the
indexcondition.  Once the selectivity estimate gets as low as 1 row then the costs between each index don't vary very
much.It's possible the PK index didn't come out any cheaper, or that add_path() saw the costs as (fuzzily) the same.
 

I understand that this is just a cost estimate, not a check to see if the SQL condition matches the index. 
Thank you for telling me.

> Does v14 run faster if you force the tbl_pkey to be used? (perhaps you could do that just by dropping the other index
ifyou're using a test instance that's not needed by anyone else).
 

The following example shows a table with 1 million tuples:
* The cost of using PK was higher than the cost of using user index.
* It was faster to use PK.

 Index Scan using tbl_ix1 on tbl  (cost=0.43..0.67 rows=1 width=61) (actual time=0.016..185.013 rows=1 loops=1)
   Index Cond: (((a)::text = '1000000000'::text) AND ((b)::text = '1000000000'::text) AND ((c)::text =
'1000000000'::text)AND ((d)::text = '1000000000'::text) AND (h = 1))
 
   Filter: ((e)::text = '1000000000'::text)
   Rows Removed by Filter: 1000000
 Planning Time: 0.407 ms
 Execution Time: 185.031 ms

 Index Only Scan using tbl_pkey on tbl  (cost=0.56..0.79 rows=1 width=61) (actual time=0.026..0.028 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d =
'1000000000'::text)AND (e = '1000000000'::text) AND (h = 1))
 
   Heap Fetches: 1
 Planning Time: 0.355 ms
 Execution Time: 0.043 ms

I should probably configure the statistics to account for changes in planner behavior.
Therefore, I will consider appropriate measures.


Regards, Ryo

RE: Access plan selection logic PG9.2 -> PG14

From
"Ryo Yamaji (Fujitsu)"
Date:
on Fri, 15 Sept 2023 at 0:00, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Why not just run ANALYZE on the new 14.7 instance?

The statistics were manually updated to fix the access plan.
I was trying to see if I could fix the access plan in a similar way in 9.2 -> 14.7.

I probably will run ANALYZE soon because I haven't been able to achieve the access plan I want.


Regards, Ryo

Re: Access plan selection logic PG9.2 -> PG14

From
David Rowley
Date:
On Fri, 15 Sept 2023 at 21:13, Ryo Yamaji (Fujitsu)
<yamaji.ryo@fujitsu.com> wrote:
> The following example shows a table with 1 million tuples:
> * The cost of using PK was higher than the cost of using user index.
> * It was faster to use PK.
>
>  Index Scan using tbl_ix1 on tbl  (cost=0.43..0.67 rows=1 width=61) (actual time=0.016..185.013 rows=1 loops=1)
>    Index Cond: (((a)::text = '1000000000'::text) AND ((b)::text = '1000000000'::text) AND ((c)::text =
'1000000000'::text)AND ((d)::text = '1000000000'::text) AND (h = 1))
 
>    Filter: ((e)::text = '1000000000'::text)
>    Rows Removed by Filter: 1000000
>  Planning Time: 0.407 ms
>  Execution Time: 185.031 ms
>
>  Index Only Scan using tbl_pkey on tbl  (cost=0.56..0.79 rows=1 width=61) (actual time=0.026..0.028 rows=1 loops=1)
>    Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d =
'1000000000'::text)AND (e = '1000000000'::text) AND (h = 1))
 
>    Heap Fetches: 1
>  Planning Time: 0.355 ms
>  Execution Time: 0.043 ms
>
> I should probably configure the statistics to account for changes in planner behavior.
> Therefore, I will consider appropriate measures.

If I were you, I'd try what Adrian mentioned and run ANALYZE on this table.

I tried the following and I can only get it to use the tbl_ix1 index
if I don't run analyze. After running analyze, PostgreSQL 14.7 seems
it would rather Seq scan than use the tbl_ix1 index after dropping the
primary key constriant.

drop table if exists tbl;
create table tbl (a text, b text, c text, d text, e text, h int);
insert into tbl select
'1000000000','1000000000','1000000000','1000000000',x::text,1 from
generate_Series(999000001,1000000000)x;
create index tbl_ix1  on tbl(a,b,c,d,h);
alter table tbl add constraint tbl_pkey primary key (a,b,c,d,e,h);

explain analyze select a,b,c,d,h from tbl where a = '1000000000' and b
= '1000000000' and c = '1000000000' and d = '1000000000' and e =
'1000000000' and h = 1;

Index Scan using tbl_ix1 on tbl  (cost=0.42..8.46 rows=1 width=132)
(actual time=121.062..121.062 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text)
AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (h = 1))
   Filter: (e = '1000000000'::text)
   Rows Removed by Filter: 999999
 Planning Time: 0.266 ms
 Execution Time: 121.077 ms
(6 rows)

analyze tbl;

explain analyze select a,b,c,d,h from tbl where a = '1000000000' and b
= '1000000000' and c = '1000000000' and d = '1000000000' and e =
'1000000000' and h = 1;

 Index Only Scan using tbl_pkey on tbl  (cost=0.55..4.58 rows=1
width=48) (actual time=0.071..0.072 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text)
AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (e =
'1000000000'::text) AND (h
 = 1))
   Heap Fetches: 0
 Planning Time: 0.146 ms
 Execution Time: 0.087 ms
(5 rows)

David