Re: bitmask index - Mailing list pgsql-performance

From Greg Smith
Subject Re: bitmask index
Date
Msg-id 4E134D3E.8030809@2ndQuadrant.com
Whole thread Raw
In response to Re: bitmask index  (Marcus Engene <mengpg2@engene.se>)
List pgsql-performance
On 07/05/2011 06:15 AM, Marcus Engene wrote:
> Though partial index solved another problem. Usually I'm a little bit
> annoyed with the optimizer and the developers religious "fix the
> planner instead of index hints". I must say that I'm willing to
> reconsider my usual stance to that.
>
> We have a large table of products where status=20 is a rare
> intermediate status. I added a...
>
> CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
> WHERE status = 20;
>
> ...and a slow 5s select with users who had existing status=20 items
> became very fast. Planner, I guess, saw the 10000 status 20 clips (out
> of millions of items) instead of like 5 different values of status and
> thus ignoring the index. Super!
>
> To my great amazement, the planner also managed to use the index when
> counting how many status=20 items there are in total:

I'm glad we got you to make a jump toward common ground with the
database's intended use.  There are many neat advanced ways to solve the
sorts of problems people try to hammer with hints available in
PostgreSQL, some of which don't even exist in other databases.  It's
kind of interesting to me how similarly one transition tends to happen
to people who learn a lot about those options, enough that they can talk
fully informed about things like how hints would have to work in
PostgreSQL--for example:  they'd have to consider all all these partial
index possibilities.  Once you go through all that, suddenly a lot of
the people who do it realize that maybe hints aren't as important as
good design and indexing--when you take advantages of all the features
available to you--after all.

To help explain what happened to you here a little better, the planner
tracks Most Common Values in the database, and it uses those statistics
to make good decisions about the ones it finds.  But when a value is
really rare, it's never going to make it to that list, and therefore the
planner is going to make a guess about how likely it is--likely a wrong
one.  By creating a partial index on that item, it's essentially adding
that information--just how many rows are going to match a query looking
for that value--so that it can be utilized the same way MCVs are.
Adding partial indexes on sparse columns that are critical to a common
report allow what I'm going to coin a new acronym for:  those are part
of the Most Important Values in that column.  The MIV set is the MCV
information plus information about the rare but critical columns.  And
the easiest way to expose that data to the planner is with a partial index.

I smell a blog post coming on this topic.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/


pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Postgres bulkload without transaction logs
Next
From: Jonathan
Date:
Subject: Slow query when using ORDER BY *and* LIMIT