Thread: Partial Index [WHERE col IN ('A','B')] Planner Anomalies
Hi,
I have the following query:
EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE');
With two indices:
ti_active… partial index over (ti_status) WHERE ti_status IN (‘ACTIVE’,’DISPATCHED’,’FAILURE’)
ti_sortedstatus… full index over (ti_status, ti_creationtimestamp)
If I explain the above query multiple times (without any data changes) I get either of the two query plans / the Bitmap Heap one more often. If I analyze the table I immediately get the “Index Scan” plan first but subsequent explains revert to switching between the two However, if I run any other query but the three-criteria query then the flipping behavior observed stops and the “Bitmap Heap” plan becomes the dominant plan. The
QUERY PLAN
Bitmap Heap Scan on taskinstance (cost=115.87..135.83 rows=2336 width=723)
Recheck Cond: (((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])))
-> BitmapAnd (cost=115.87..115.87 rows=5 width=0)
-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx (cost=0.00..47.94 rows=2336 width=0)
-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..66.52 rows=2336 width=0)
Index Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[]))
QUERY PLAN
Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance (cost=0.00..402.03 rows=2348 width=718)
In a total table size of 1.2M records the partial index covers around 2300
Now, if I keep the IN construct but only list one of the options (any one) I get the following:
QUERY PLAN
Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance (cost=0.00..466.44 rows=2868 width=725)
Index Cond: ((ti_status)::text = 'FAILURE'::text) --works for ‘ACTIVE’ and ‘DISPATCHED’ as well
If I include two of them I get the following results:
Active, Dispatched ------------
QUERY PLAN
Bitmap Heap Scan on taskinstance (cost=8.52..12.53 rows=1 width=718)
Recheck Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED}'::text[]))
-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx (cost=0.00..8.52 rows=1 width=0) <<<<<<<<<<<<<<<<
Index Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED}'::text[]))
Active, Failure ---------------
QUERY PLAN
Bitmap Heap Scan on taskinstance (cost=111.51..123.49 rows=2020 width=718)
Recheck Cond: (((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY ('{ACTIVE,FAILURE}'::text[])))
-> BitmapAnd (cost=111.51..111.51 rows=3 width=0)
-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx (cost=0.00..46.36 rows=2020 width=0) <<<<<<<<<<<<<<<<<
-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..63.89 rows=2020 width=0) <<<<<<<<<<<<<<<<<<<<
Index Cond: ((ti_status)::text = ANY ('{ACTIVE,FAILURE}'::text[]))
Dispatched, Failure --------------
QUERY PLAN
Bitmap Heap Scan on taskinstance (cost=115.77..135.72 rows=2348 width=718)
Recheck Cond: (((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY ('{DISPATCHED,FAILURE}'::text[])))
-> BitmapAnd (cost=115.77..115.77 rows=5 width=0)
-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx (cost=0.00..48.00 rows=2348 width=0) <<<<<<<<<<<<<<<<<<<
-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..66.35 rows=2348 width=0) <<<<<<<<<<<<<<<<<<<
Index Cond: ((ti_status)::text = ANY ('{DISPATCHED,FAILURE}'::text[]))
In the double and single criteria cases I get the same plan each time – regardless of whether I run ANALYZE.
With the double-conditions the inclusion of “FAILURE” seems to confuse the planner but the single condition plan for FAILURE works as expected.
The documentation is unclear on how the use of “IN ()” affects the query planner. At a basic level IN becomes (= OR =) which are both simple conditionals/logic elements so it should work properly. Also, if the planner recognizes that the partial index is sufficient to cover the WHERE clause why would it want to bring in a full index that is, by definition, redundant?
I guess if you are not evaluating the WHERE clause the two-constraint versions likely would work oddly (and they do for ACTIVE/DISPATCHED if I reverse the order of the two items) but the three-constraint version occasionally works and the one-constraints versions always work even though they too do not match the original index where clause.
Nothing of a critical nature here but figured I’d report what I am seeing. I can probably drop some constraints( and some data ) if you want to see the “taskinstance” table; and I will happily provide whatever configuration info you ask for.
Running 9.0.3 Windows 7 64-bit
Thanks,
David Johnston
"David Johnston" <polobo@yahoo.com> writes: > If I explain the above query multiple times (without any data changes) I get > either of the two query plans / the Bitmap Heap one more often. If I > analyze the table I immediately get the "Index Scan" plan first but > subsequent explains revert to switching between the two However, if I run > any other query but the three-criteria query then the flipping behavior > observed stops and the "Bitmap Heap" plan becomes the dominant plan. The Hmmm ... I'm seeing some very odd behavior here too. I suspect there is some use-of-uninitialized-memory in there. After several false starts I have a script that reliably reproduces the change of behavior in (debug builds of) both HEAD and 9.0: drop table if exists taskinstance; create table taskinstance (ti_status varchar, ti_creationtimestamp timestamptz); create index ti_sortedstatus on taskinstance (ti_status, ti_creationtimestamp); create index ti_active on taskinstance (ti_status) WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE'); vacuum taskinstance; \c - EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE'); EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE'); EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE'); The first two EXPLAINs give a sane plan, the third one not so much. Poking at it now. regards, tom lane
I wrote: > "David Johnston" <polobo@yahoo.com> writes: >> If I explain the above query multiple times (without any data changes) I get >> either of the two query plans / the Bitmap Heap one more often. If I >> analyze the table I immediately get the "Index Scan" plan first but >> subsequent explains revert to switching between the two However, if I run >> any other query but the three-criteria query then the flipping behavior >> observed stops and the "Bitmap Heap" plan becomes the dominant plan. The > Hmmm ... I'm seeing some very odd behavior here too. The instability of the plans is a bug, and at least in the example I was looking at, the inefficient BitmapAnd plans are a bug too. I've applied a patch that fixes it for me. There might be related cases that are still broken though. regards, tom lane