Re: query against large table not using sensible index to find very small amount of data - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: query against large table not using sensible index to find very small amount of data
Date
Msg-id 0683F5F5A5C7FE419A752A034B4A0B97978B5AD2@sswchi5pmbx2.peak6.net
Whole thread Raw
In response to query against large table not using sensible index to find very small amount of data  ("Andrew W. Gibbs" <awgibbs@awgibbs.com>)
Responses Re: query against large table not using sensible index to find very small amount of data
Re: query against large table not using sensible index to find very small amount of data
List pgsql-performance
> Other possibly relevant pieces of information...  The entity type
> column has a cardinality in the neighborhood of a couple dozen.
> Meanwhile, for some of the entity types there is a large and ongoing
> number of events, and for other entity types there is a smaller and
> more sporadic number of events.  Every now and again a new entity
> type shows up.

With that as the case, I have two questions for you:

1. Why do you have a low cardinality column as the first column in an index?
2. Do you have any queries at all that only use the entity type as the only where clause?

I agree that the planner is probably wrong here, but these choices aren't helping. The low cardinality of the first
columncauses very large buckets that don't limit results very well at all. Combined with the order-by clause, the
plannerreally wants to walk the date index backwards to find results instead. I would do a couple of things. 

First, remove the type/date index. Next, do a count of each type in the table with something like this:

SELECT type_id, count(1)
  FROM my_table
 GROUP BY 2

Any type that is more than 20% of the table will probably never be useful in an index. At this point, you have a
choice.You can create a new index with date and type *in that order* or create a new partial index on date and type
thatalso ignores the top matches. For instance, if you had a type that was 90% of the values, this would be my
suggestion:

CREATE INDEX idx_foo_table_date_event_type_part ON foo_table (event_date, event_type)
 WHERE event_type != 14;

Or whatever. If the IDs are basically evenly distributed, it won't really matter.

In any case, index order matters. The planner wants to restrict data as quickly as possible. If you provide an order
clause,it wants to read the index in that order. Your specified type as the first column disrupts that, so it has to
fetchthe values first, which is usually more expensive. Even if that's wrong in your particular case, planner stats are
notprecise enough to know that. 

Either way, try moving the indexes around. I can't think of many indexes in our database where I have the low
cardinalityvalue as the first column. Databases have an easier time managing many shallow buckets of values, than a few
deepones. 

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


pgsql-performance by date:

Previous
From: "Andrew W. Gibbs"
Date:
Subject: query against large table not using sensible index to find very small amount of data
Next
From: Tom Lane
Date:
Subject: Re: performance drop when function argument is evaluated in WHERE clause