Thread: partial index regarded more expensive

From:
Tobias Brox
Date:

So, I have a table game with a timestamp attribute 'game_end', ranging from
jan-2005 to present.  The game table also have an attribute state, with live
games beeing in state 2, and ended games beeing in state 4 (so,
game_end+delta>now() usually means state=4).  There are also an insignificant
number of games in states 1,3.

This query puzzles me:

  select * from game where game_end>'2005-07-30' and state in (3,4);

Now, one (at least me) should believe that the best index would be a partial
index,

    "resolved_game_by_date" btree (game_end) WHERE ((state = 3) OR (state = 4))

NBET=> explain analyze select * from game where game_end>'2005-07-30' and state in (3,4);
                                                               QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using resolved_game_by_date on game  (cost=0.00..7002.87 rows=7147 width=555) (actual time=0.220..86.234
rows=3852loops=1) 
   Index Cond: (game_end > '2005-07-30 00:00:00'::timestamp without time zone)
   Filter: ((state = 3) OR (state = 4))
 Total runtime: 90.568 ms
(4 rows)

Since state has only two significant states, I wouldn't believe this index
to be any good:

    "game_by_state" btree (state)


...and it seems like I'm right:

NBET=> explain analyze select * from game where game_end>'2005-07-30' and
state in (3,4);
                                                                   QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using game_by_state, game_by_state on game  (cost=0.00..4413.78 rows=7147 width=555) (actual
time=0.074..451.771rows=3851 loops=1) 
   Index Cond: ((state = 3) OR (state = 4))
      Filter: (game_end > '2005-07-30 00:00:00'::timestamp without time zone)
 Total runtime: 457.132 ms
(4 rows)

Now, how can the planner believe the game_by_state-index to be better?

('vacuum analyze game' did not significantly impact the numbers, and I've
tried running the queries some times with and without the
game_by_state-index to rule out cacheing effects)

--
Tobias Brox
This signature has been virus scanned, and is probably safe to read.
This mail may contain confidential information, please keep your eyes closed.

From:
PFC
Date:

    why not simply create an index on (game_end, state) ?

From:
Tobias Brox
Date:

[PFC - Wed at 08:15:13PM +0200]
>     why not simply create an index on (game_end, state) ?

No, the planner prefers to use the partial index (I dropped the index on
game(state)).

--
Tobias Brox, Nordicbet IT dept
This signature has been virus scanned, and is probably safe to read.
This mail may contain confidential information, please keep your eyes closed.

From:
Tom Lane
Date:

Tobias Brox <> writes:
> This query puzzles me:
>   select * from game where game_end>'2005-07-30' and state in (3,4);
> ...
> Now, how can the planner believe the game_by_state-index to be better?

I suspect the problem has to do with lack of cross-column statistics.
The planner does not know that state=4 is correlated with game_end,
and it's probably coming up with some bogus guesses about the numbers
of index rows visited in each case.  You haven't given enough info to
quantify this, though.

            regards, tom lane