Thread: Access plan selection logic PG9.2 -> PG14
Hi all,

I am planning to migrate from PostgreSQL 9.2 to PostgreSQL 14.7. Running the same SQL with the same statistics as on the previous version resulted in a different access plan. I have no knowledge of the PostgreSQL planner and would like to be informed. Please assist.

Behavior:
I am using an environment in which the statistics have been manually updated using pg_dbms_stats. (*)
In PostgreSQL 9.2, the primary key index was selected when a SELECT was executed with all columns of the primary key specified as equality conditions in the WHERE clause. However, in PostgreSQL 14.7, the same SELECT chooses an index that does not contain all of the columns specified in the WHERE clause.

(*) The statistics have been updated as follows:
* pg_class: relpages=200000, reltuples=9.62e+06, relallvisible=0
* pg_statistic: stanullfrac, stadistinct, stakindN, staopN=0; stanumbersN, stavaluesN=NULL; stawidth=61
* dbms_stats.relation_stats_locked: curpages=200000

The OSS versions for each of the above environments are as follows:
PostgreSQL 14.7 - pg_dbms_stats 1.3.9
PostgreSQL 9.2 - pg_dbms_stats 1.5.0

Question:
I am assuming that the version upgrade has changed the behavior of the planner. Is this correct? I do not understand why the planner chooses a plan that seems more expensive than the Index Only Scan.

Reproduction method:
Run the attached SQL script against the database you created. Enable pg_dbms_stats beforehand.
Ex) psql -d test -f test.sql
The contents of test.sql are:
1. Define a table (13 columns) + primary key + user index
2. Insert test data
3. ANALYZE
4. Update pg_class and pg_statistic, and lock the statistics
5. SELECT specifying all columns of the primary key in the WHERE clause
6. Delete the data used in the test

Result:

PostgreSQL 9.2

explain analyze select a from public.tbl where a='1000000000' and b='1000000000' and c='1000000000' and d='1000000000' and e='1000000000' and h=1;

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Only Scan using tbl_pkey on tbl  (cost=0.00..0.54 rows=1 width=61) (actual time=0.010..0.011 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (e = '1000000000'::text) AND (h = 1))
   Heap Fetches: 1
 Total runtime: 0.023 ms

PostgreSQL 14.7

explain analyze select a from public.tbl where a='1000000000' and b='1000000000' and c='1000000000' and d='1000000000' and e='1000000000' and h=1;

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Scan using tbl_ix1 on tbl  (cost=0.18..0.42 rows=1 width=61) (actual time=0.007..0.008 rows=1 loops=1)
   Index Cond: (((a)::text = '1000000000'::text) AND ((b)::text = '1000000000'::text) AND ((c)::text = '1000000000'::text) AND ((d)::text = '1000000000'::text) AND (h = 1))
   Filter: ((e)::text = '1000000000'::text)
 Planning Time: 0.337 ms
 Execution Time: 0.023 ms

Regards,
Ryo
Attachment
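The attachment itself is not reproduced in this thread. Based on the step list and the EXPLAIN output above, a rough sketch of what such a script could look like follows. The column types, the filler column names, and the dbms_stats function and view names are assumptions (check the pg_dbms_stats documentation for your version); this is not the actual attachment:

```sql
-- Hypothetical reconstruction of test.sql; not the actual attachment.
drop table if exists public.tbl;
create table public.tbl (
    a varchar(10), b varchar(10), c varchar(10), d varchar(10), e varchar(10),
    h int,
    -- filler columns to reach the 13 columns mentioned above (names assumed)
    f1 text, f2 text, f3 text, f4 text, f5 text, f6 text, f7 text
);
create index tbl_ix1 on public.tbl (a, b, c, d, h);
alter table public.tbl add constraint tbl_pkey primary key (a, b, c, d, e, h);

-- 2. insert test data, 3. analyze
insert into public.tbl (a, b, c, d, e, h)
select '1000000000', '1000000000', '1000000000', '1000000000', x::text, 1
from generate_series(999999901, 1000000000) x;
analyze public.tbl;

-- 4. overwrite and lock the statistics (function/view names per
--    pg_dbms_stats; the values mirror the ones listed in the report)
select dbms_stats.lock_table_stats('public.tbl');
update dbms_stats.relation_stats_locked
   set relpages = 200000, reltuples = 9.62e+06, relallvisible = 0,
       curpages = 200000
 where relname = 'public.tbl';

-- 5. the query under test
explain analyze
select a from public.tbl
 where a = '1000000000' and b = '1000000000' and c = '1000000000'
   and d = '1000000000' and e = '1000000000' and h = 1;

-- 6. clean up
select dbms_stats.unlock_table_stats('public.tbl');
drop table public.tbl;
```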
On Fri, 15 Sept 2023 at 01:36, Ryo Yamaji (Fujitsu) <yamaji.ryo@fujitsu.com> wrote:
> Question:
> I am assuming that the version upgrade has changed the behavior of the planner. Is this correct?

It's possible. 9.2 was a long time ago. It would be quite a bit of work to determine if this is the case. You could perhaps test on 9.2 without pg_dbms_stats and see what happens. It's likely add_path() has changed quite a bit since 9.2. That could be having an effect.

> I don't know why they choose a plan that seems more expensive than IndexOnlyScan.

This is likely due to the query planner not giving any preference to the index that allows more quals to go into the index condition. Once the selectivity estimate gets as low as 1 row, the costs between each index don't vary very much. It's possible the PK index didn't come out any cheaper, or that add_path() saw the costs as (fuzzily) the same.

I do think the planner should take the number of matched index quals into account. I'm just not exactly sure how best to cost that in.

Does v14 run faster if you force tbl_pkey to be used? (Perhaps you could do that just by dropping the other index, if you're using a test instance that's not needed by anyone else.)

David
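Since DDL is transactional in PostgreSQL, the index-dropping test suggested above can be tried without permanently losing the index by wrapping it in a transaction. A sketch (note that DROP INDEX takes an ACCESS EXCLUSIVE lock on the table until rollback, so this still belongs on a test instance):

```sql
begin;
drop index tbl_ix1;   -- leaves tbl_pkey as the only usable index
explain analyze
select a from public.tbl
 where a = '1000000000' and b = '1000000000' and c = '1000000000'
   and d = '1000000000' and e = '1000000000' and h = 1;
rollback;             -- tbl_ix1 is back; nothing was changed
```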
On 9/14/23 00:15, Ryo Yamaji (Fujitsu) wrote:
> Hi all
>
> I am planning to migrate from PostgreSQL 9.2 to PostgreSQL 14.7.
> Running the same SQL with the same statistics as previous versions resulted in different access plans.
> I have no knowledge of the PostgreSQL planner and would like to be informed. Please assist.
>
> Behavior:
> I am using an environment in which the statistics have been manually updated using pg_dbms_stats. (*)

Why not just run ANALYZE on the new 14.7 instance?

> In PostgreSQL 9.2, the primary key was selected when a SELECT was executed with all columns of the primary key specified as equals in the where clause.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, 14 Sept 2023 at 23:43, David Rowley <dgrowleyml@gmail.com> wrote:
> This is likely due to the query planner not giving any preference to the index that allows more quals to go into the index condition. Once the selectivity estimate gets as low as 1 row, the costs between each index don't vary very much. It's possible the PK index didn't come out any cheaper, or that add_path() saw the costs as (fuzzily) the same.

I understand now that this is purely a cost comparison, not a check of how well the SQL conditions match each index. Thank you for telling me.

> Does v14 run faster if you force the tbl_pkey to be used? (perhaps you could do that just by dropping the other index if you're using a test instance that's not needed by anyone else).

The following example uses a table with 1 million tuples:
* The cost of using the PK was higher than the cost of using the user index.
* It was faster to use the PK.

 Index Scan using tbl_ix1 on tbl  (cost=0.43..0.67 rows=1 width=61) (actual time=0.016..185.013 rows=1 loops=1)
   Index Cond: (((a)::text = '1000000000'::text) AND ((b)::text = '1000000000'::text) AND ((c)::text = '1000000000'::text) AND ((d)::text = '1000000000'::text) AND (h = 1))
   Filter: ((e)::text = '1000000000'::text)
   Rows Removed by Filter: 1000000
 Planning Time: 0.407 ms
 Execution Time: 185.031 ms

 Index Only Scan using tbl_pkey on tbl  (cost=0.56..0.79 rows=1 width=61) (actual time=0.026..0.028 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (e = '1000000000'::text) AND (h = 1))
   Heap Fetches: 1
 Planning Time: 0.355 ms
 Execution Time: 0.043 ms

I should probably configure the statistics to account for the change in planner behavior, so I will consider appropriate measures.

Regards,
Ryo
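One detail worth noting when tuning the locked statistics: the report above locks relallvisible at 0, and the planner costs an Index Only Scan by scaling the expected heap fetches by the fraction of all-visible pages, so with relallvisible=0 the Index Only Scan is costed as if every row needed a heap visit and loses much of its advantage over a plain Index Scan. A sketch of raising it in the locked statistics (the column name in pg_dbms_stats' relation_stats_locked view is an assumption; verify against your version):

```sql
-- Sketch: mark most pages all-visible in the locked stats so that an
-- Index Only Scan on tbl_pkey is costed more favourably.
update dbms_stats.relation_stats_locked
   set relallvisible = 190000          -- close to the locked relpages=200000
 where relname = 'public.tbl';
```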
On Fri, 15 Sept 2023 at 0:00, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Why not just run ANALYZE on the new 14.7 instance?

The statistics were manually updated to fix the access plan in place. I was trying to see whether I could fix the access plan the same way across the 9.2 -> 14.7 migration. I will probably run ANALYZE soon, because I have not been able to achieve the access plan I want.

Regards,
Ryo
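For reference, going back to planner-maintained statistics for the table would look roughly like this (the unlock function name is assumed from the pg_dbms_stats documentation):

```sql
-- Stop using the frozen statistics for this table, then let
-- PostgreSQL gather fresh statistics itself.
select dbms_stats.unlock_table_stats('public.tbl');
analyze public.tbl;
```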
On Fri, 15 Sept 2023 at 21:13, Ryo Yamaji (Fujitsu) <yamaji.ryo@fujitsu.com> wrote:
> The following example uses a table with 1 million tuples:
> * The cost of using the PK was higher than the cost of using the user index.
> * It was faster to use the PK.
>
>  Index Scan using tbl_ix1 on tbl  (cost=0.43..0.67 rows=1 width=61) (actual time=0.016..185.013 rows=1 loops=1)
>    Index Cond: (((a)::text = '1000000000'::text) AND ((b)::text = '1000000000'::text) AND ((c)::text = '1000000000'::text) AND ((d)::text = '1000000000'::text) AND (h = 1))
>    Filter: ((e)::text = '1000000000'::text)
>    Rows Removed by Filter: 1000000
>  Planning Time: 0.407 ms
>  Execution Time: 185.031 ms
>
>  Index Only Scan using tbl_pkey on tbl  (cost=0.56..0.79 rows=1 width=61) (actual time=0.026..0.028 rows=1 loops=1)
>    Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (e = '1000000000'::text) AND (h = 1))
>    Heap Fetches: 1
>  Planning Time: 0.355 ms
>  Execution Time: 0.043 ms
>
> I should probably configure the statistics to account for changes in planner behavior.
> Therefore, I will consider appropriate measures.

If I were you, I'd try what Adrian mentioned and run ANALYZE on this table. I tried the following and can only get it to use the tbl_ix1 index if I don't run ANALYZE. After running ANALYZE, PostgreSQL 14.7 seems to prefer a Seq Scan over the tbl_ix1 index once the primary key constraint is dropped.
drop table if exists tbl;
create table tbl (a text, b text, c text, d text, e text, h int);
insert into tbl
select '1000000000','1000000000','1000000000','1000000000',x::text,1
from generate_series(999000001,1000000000) x;
create index tbl_ix1 on tbl(a,b,c,d,h);
alter table tbl add constraint tbl_pkey primary key (a,b,c,d,e,h);

explain analyze
select a,b,c,d,h from tbl
where a = '1000000000' and b = '1000000000' and c = '1000000000'
  and d = '1000000000' and e = '1000000000' and h = 1;

 Index Scan using tbl_ix1 on tbl  (cost=0.42..8.46 rows=1 width=132) (actual time=121.062..121.062 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (h = 1))
   Filter: (e = '1000000000'::text)
   Rows Removed by Filter: 999999
 Planning Time: 0.266 ms
 Execution Time: 121.077 ms
(6 rows)

analyze tbl;

explain analyze
select a,b,c,d,h from tbl
where a = '1000000000' and b = '1000000000' and c = '1000000000'
  and d = '1000000000' and e = '1000000000' and h = 1;

 Index Only Scan using tbl_pkey on tbl  (cost=0.55..4.58 rows=1 width=48) (actual time=0.071..0.072 rows=1 loops=1)
   Index Cond: ((a = '1000000000'::text) AND (b = '1000000000'::text) AND (c = '1000000000'::text) AND (d = '1000000000'::text) AND (e = '1000000000'::text) AND (h = 1))
   Heap Fetches: 0
 Planning Time: 0.146 ms
 Execution Time: 0.087 ms
(5 rows)

David