Partial Index [WHERE col IN ('A','B')] Planner Anomalies - Mailing list pgsql-general

From David Johnston
Subject Partial Index [WHERE col IN ('A','B')] Planner Anomalies
Date
Msg-id 020401cc0443$9c276850$d47638f0$@yahoo.com
Whole thread Raw
Responses Re: Partial Index [WHERE col IN ('A','B')] Planner Anomalies
List pgsql-general

Hi,

 

I have the following query:

 

EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE');

 

With two indices:

ti_active… partial index over (ti_status) WHERE ti_status IN (‘ACTIVE’,’DISPATCHED’,’FAILURE’)

ti_sortedstatus… full index over (ti_status, ti_creationtimestamp)

 

If I explain the above query multiple times (without any data changes) I get either of the two query plans / the Bitmap Heap one more often.  If I analyze the table I immediately get the “Index Scan” plan first but subsequent explains revert to switching between the two However, if I run any other query but the three-criteria query then the flipping behavior observed stops and the “Bitmap Heap” plan becomes the dominant plan.  The

 

QUERY PLAN

Bitmap Heap Scan on taskinstance  (cost=115.87..135.83 rows=2336 width=723)

  Recheck Cond: (((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])))

  ->  BitmapAnd  (cost=115.87..115.87 rows=5 width=0)

        ->  Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx  (cost=0.00..47.94 rows=2336 width=0)

        ->  Bitmap Index Scan on ti_sortedstatus_idx  (cost=0.00..66.52 rows=2336 width=0)

              Index Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[]))

 

QUERY PLAN

Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance  (cost=0.00..402.03 rows=2348 width=718)

 

In a total table size of 1.2M records the partial index covers around 2300

 

Now, if I keep the IN construct but only list one of the options (any one) I get the following:

 

QUERY PLAN

Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance  (cost=0.00..466.44 rows=2868 width=725)

  Index Cond: ((ti_status)::text = 'FAILURE'::text) --works for ‘ACTIVE’ and ‘DISPATCHED’ as well

 

If I include two of them I get the following results:

 

Active, Dispatched ------------

QUERY PLAN

Bitmap Heap Scan on taskinstance  (cost=8.52..12.53 rows=1 width=718)

  Recheck Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED}'::text[]))

  ->  Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx  (cost=0.00..8.52 rows=1 width=0)    <<<<<<<<<<<<<<<<

        Index Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED}'::text[]))

 

Active, Failure ---------------

QUERY PLAN

Bitmap Heap Scan on taskinstance  (cost=111.51..123.49 rows=2020 width=718)

  Recheck Cond: (((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY ('{ACTIVE,FAILURE}'::text[])))

  ->  BitmapAnd  (cost=111.51..111.51 rows=3 width=0)

        ->  Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx  (cost=0.00..46.36 rows=2020 width=0) <<<<<<<<<<<<<<<<<

       ->  Bitmap Index Scan on ti_sortedstatus_idx  (cost=0.00..63.89 rows=2020 width=0) <<<<<<<<<<<<<<<<<<<<

              Index Cond: ((ti_status)::text = ANY ('{ACTIVE,FAILURE}'::text[]))

 

Dispatched, Failure --------------

QUERY PLAN

Bitmap Heap Scan on taskinstance  (cost=115.77..135.72 rows=2348 width=718)

  Recheck Cond: (((ti_status)::text = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY ('{DISPATCHED,FAILURE}'::text[])))

  ->  BitmapAnd  (cost=115.77..115.77 rows=5 width=0)

        ->  Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx  (cost=0.00..48.00 rows=2348 width=0) <<<<<<<<<<<<<<<<<<<

        ->  Bitmap Index Scan on ti_sortedstatus_idx  (cost=0.00..66.35 rows=2348 width=0) <<<<<<<<<<<<<<<<<<<

              Index Cond: ((ti_status)::text = ANY ('{DISPATCHED,FAILURE}'::text[]))

 

In the double and single criteria cases I get the same plan each time – regardless of whether I run ANALYZE.

 

With the double-conditions the inclusion of “FAILURE” seems to confuse the planner but the single condition plan for FAILURE works as expected.

 

The documentation is unclear on how the use of “IN ()” affects the query planner.  At a basic level IN becomes (= OR =) which are both simple conditionals/logic elements so it should work properly.  Also, if the planner recognizes that the partial index is sufficient to cover the WHERE clause why would it want to bring in a full index that is, by definition, redundant?

 

I guess if you are not evaluating the WHERE clause the two-constraint versions likely would work oddly (and they do for ACTIVE/DISPATCHED if I reverse the order of the two items) but the three-constraint version occasionally works and the one-constraints versions always work even though they too do not match the original index where clause.

 

Nothing of a critical nature here but figured I’d report what I am seeing.  I can probably drop some constraints( and some data ) if you want to see the “taskinstance” table; and I will happily provide whatever configuration info you ask for.

 

Running 9.0.3 Windows  7 64-bit

 

Thanks,

 

David Johnston

 

 

pgsql-general by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Help - corruption issue?
Next
From: "Greg Sabino Mullane"
Date:
Subject: Re: Switching Database Engines