Thread: BUG #15984: order of where in() query affects query planer

BUG #15984: order of where in() query affects query planer

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15984
Logged by:          Richard
Email address:      easteregg@verfriemelt.org
PostgreSQL version: 11.5
Operating system:   Debian Sid
Description:

i have a partial index like in the following example and when reorder the
elements of the in() statement,
i get sometimes a bitmap indexscan instead of the expected index only scan.
if i remove an element, i still get the index only,
but with the wrong order, i get a bitmap heap scan. is this expected?



drop table temp;
create table temp ( i int );
insert into temp
select (random()*20)::int from generate_series(1,1000000,1);
create index "full" on temp( i );
create index "partial" on temp( i ) where i in ( 1,2,3,4,5,6,7,8,9 );
vacuum full temp;  

explain analyse select count(*) from temp where i in ( 1,2,3,4,5,6,7,8,9);


┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY PLAN
                                                           │

╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Aggregate  (cost=5336.92..5336.93 rows=1 width=8) (actual
time=164.105..164.105 rows=1 loops=1)
│
│   ->  Index Only Scan using partial on temp  (cost=0.42..5224.42
rows=45000 width=0) (actual time=0.035..138.494 rows=450415 loops=1) │
│         Heap Fetches: 450415
                                                           │
│ Planning Time: 0.953 ms
                                                           │
│ Execution Time: 164.121 ms
                                                           │

└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


explain analyse select count(*) from temp where i in ( 1,2,3,4,5,6,7,8);


┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY PLAN
                                                           │

╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Aggregate  (cost=5336.38..5336.39 rows=1 width=8) (actual
time=170.707..170.707 rows=1 loops=1)
│
│   ->  Index Only Scan using partial on temp  (cost=0.42..5236.38
rows=40000 width=0) (actual time=0.017..144.923 rows=400509 loops=1) │
│         Index Cond: (i = ANY ('{1,2,3,4,5,6,7,8}'::integer[]))
                                                           │
│         Heap Fetches: 400509
                                                           │
│ Planning Time: 0.153 ms
                                                           │
│ Execution Time: 170.722 ms
                                                           │

└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


explain analyse select count(*) from temp where i in ( 1,2,3,4,5,6,7,9,8);


┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                QUERY PLAN
                                                              │

╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Aggregate  (cost=2641.03..2641.04 rows=1 width=8) (actual
time=313.834..313.835 rows=1 loops=1)
  │
│   ->  Bitmap Heap Scan on temp  (cost=837.50..2528.53 rows=45000 width=0)
(actual time=150.929..262.355 rows=450415 loops=1)             │
│         Recheck Cond: ((i = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[])) AND (i
= ANY ('{1,2,3,4,5,6,7,9,8}'::integer[])))                    │
│         Heap Blocks: exact=4425
                                                              │
│         ->  BitmapAnd  (cost=837.50..837.50 rows=2025 width=0) (actual
time=150.465..150.465 rows=0 loops=1)                             │
│               ->  Bitmap Index Scan on partial  (cost=0.00..349.42
rows=45000 width=0) (actual time=46.848..46.848 rows=450415 loops=1)  │
│               ->  Bitmap Index Scan on "full"  (cost=0.00..465.32
rows=45000 width=0) (actual time=103.481..103.482 rows=450415 loops=1) │
│                     Index Cond: (i = ANY
('{1,2,3,4,5,6,7,9,8}'::integer[]))
                   │
│ Planning Time: 0.121 ms
                                                              │
│ Execution Time: 313.859 ms
                                                              │

└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


Re: BUG #15984: order of where in() query affects query planer

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> i have a partial index like in the following example and when reorder the
> elements of the in() statement,
> i get sometimes a bitmap indexscan instead of the expected index only scan.
> if i remove an element, i still get the index only,
> but with the wrong order, i get a bitmap heap scan. is this expected?

FWIW, I get the same plan shape with either order of the IN elements.

However, your example is probably going to be subject to plan instability
because

(1) you used vacuum full not plain vacuum.  That doesn't leave the table
in the all-visible condition that would favor an index-only scan.

(2) you didn't analyze the table.  At some point, autovacuum will come
along and rectify that oversight, likely causing the plan choice to
change underneath you.

            regards, tom lane



Re: BUG #15984: order of where in() query affects query planer

From
easteregg@verfriemelt.org
Date:
Thank you for your reply,

i used the given example only because in my original case, i have a 12gb table totaling to 43gb with indices and an quite similar query along with some other clauses in the wherecase and still see this behavior. that table has constantly updates ( around 5-100 updates per second ) written to it.

i confirm, with the example table after the analyse, the result is consistent.
but my real table still have different plans with different order of the array, i attach an explain. the partial index is defined as followed ( i have to blank out some fieldnames and tablenames )

create index sn_<table>_detail_all on <table> (districtid,datepublished) where districtid in (3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19) and statusid in (2,3,4,5);

and the other index from the bitmap query is defined like this:

(districtid, <othertableid>, dateplaced)


 🐘 08:42:38 » user@host.org:5432/user > analyse verbose <table>;
INFO:  analyzing "public.<table>"
INFO:  "<table>": scanned 300000 of 1608676 pages, containing 1369182 live rows and 120789 dead rows; 300000 rows in sample, 7341901 estimated total rows
Time: 93397.647 ms (01:33.398)


 🐘 08:47:06 » user@host.org:5432/user > explain (verbose, analyse)  select count(*) from <table> where districtid in (3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19) and statusid in (2,3,4,5);
                                                                          QUERY PLAN
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Aggregate  (cost=15644.10..15644.11 rows=1 width=8) (actual time=68.883..68.883 rows=1 loops=1)
  Output: count(*)
  ->  Index Only Scan using sn_<table>r_detail_all on public.<table>  (cost=0.41..15487.10 rows=62803 width=0) (actual time=0.014..61.728 rows=60915 loops=1)
        Output: districtid, datepublished
        Heap Fetches: 11373
Planning Time: 1.636 ms
Execution Time: 68.910 ms
(7 rows)

Time: 94.375 ms


 🐘 08:47:28 » user@host.org:5432/user > explain (verbose, analyse)  select count(*) from <table> where districtid in (4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,3) and statusid in (2,3,4,5);
                                                                                                                             QUERY PLAN
════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Aggregate  (cost=3885.65..3885.66 rows=1 width=8) (actual time=1023.805..1023.805 rows=1 loops=1)
  Output: count(*)
  ->  Bitmap Heap Scan on public.<table>  (cost=3164.92..3728.64 rows=62803 width=0) (actual time=258.616..999.205 rows=60915 loops=1)
        Recheck Cond: ((<table>.districtid = ANY ('{3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}'::integer[])) AND (<table>.statusid = ANY ('{2,3,4,5}'::integer[])) AND (<table>.districtid = ANY ('{4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,3}'::integer[])))
        Heap Blocks: exact=52047
        ->  BitmapAnd  (cost=3164.92..3164.92 rows=544 width=0) (actual time=224.005..224.005 rows=0 loops=1)
              ->  Bitmap Index Scan on sn_<table>r_detail_all  (cost=0.00..564.43 rows=62803 width=0) (actual time=75.655..75.655 rows=60941 loops=1)
              ->  Bitmap Index Scan on <table>_districtid_<table>rid_dateplaced_idx  (cost=0.00..2568.84 rows=63650 width=0) (actual time=143.565..143.565 rows=61161 loops=1)
                    Index Cond: (<table>.districtid = ANY ('{4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,3}'::integer[]))
Planning Time: 1.538 ms
Execution Time: 1023.846 ms
(11 rows)

Time: 1054.731 ms (00:01.055)


From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
To: easteregg@verfriemelt.org
Cc: pgsql-bugs@lists.postgresql.org
Sent: Fri, 30 Aug 2019 00:16:45 +0200
Subject: Re: BUG #15984: order of where in() query affects query planer

PG Bug reporting form <noreply@postgresql.org> writes:
> i have a partial index like in the following example and when reorder the
> elements of the in() statement,
> i get sometimes a bitmap indexscan instead of the expected index only scan.
> if i remove an element, i still get the index only,
> but with the wrong order, i get a bitmap heap scan. is this expected?

FWIW, I get the same plan shape with either order of the IN elements.

However, your example is probably going to be subject to plan instability
because

(1) you used vacuum full not plain vacuum. That doesn't leave the table
in the all-visible condition that would favor an index-only scan.

(2) you didn't analyze the table. At some point, autovacuum will come
along and rectify that oversight, likely causing the plan choice to
change underneath you.

regards, tom lane

Re: BUG #15984: order of where in() query affects query planer

From
easteregg@verfriemelt.org
Date:
i have created another example for you, were i get consistent results after the analyse with the different plans:

create table test( a int, b int, c int );
insert into test
select random()*100, random()*10, random()*5 FROM generate_series(1,1000000);
create index "full" on test( a, c );
create index "partial" on test( a, c ) where ( a in ( 3,4,5,6,7,8,9,10 ) and b in ( 2,3 ) );
analyse test;

explain analyse select * from test where a in ( 3,4,5,6,7,8,9,10 ) and b in ( 2,3 );

explain analyse select * from test where a in ( 3,4,5,6,7,8,10,9 ) and b in ( 2,3 );

do i have wrong expections from theryplanner, that those are in fact two different paths to take? currently i work around this issue by sorting the elements for the in statements, but i think that must be the wrong path.

with kind regards, richard