Re: Indexed views? - Mailing list pgsql-hackers

From Mischa Sandberg
Subject Re: Indexed views?
Date
Msg-id 1A31d.194860$X12.99998@edtnps84
In response to Re: Indexed views?  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark wrote:

> Mischa Sandberg <ischamay.andbergsay@activestateway.com> writes:
>>I take it that it is a very reasonable assumption that only a small proportion
>>of index records are actually invalid (else Yurk why use the index?). 

> That's faulty logic, the percentage of tuples that are valid is entirely
> independent from the percentage of tuples that match your range criterion. Ie,
> I could be selecting 100 tuples out of a million -- even if 99 are invalid
> it's still worthwhile to use the index.

Ummm ... perhaps I glossed over a bit of inference. If only a small 
proportion of the index contains invalid row references, then (in the 
absence of specific biases otherwise) arbitrary queries using that index 
will average the same proportion of invalid row references: if 1% of 
index entries point at dead rows, a scan that pulls 100 entries wastes 
about one heap visit. And agreed, it would still be worthwhile to use 
the index in that case. Your ANALYZE stats would be a bit skewed, though.

>>Since you're using an index at all, the planner must be expecting a restricted
>>set of rows to make it up through to the root. If there are any filter criteria
>>against the values from the index rows, you won't even have to check tuple
>>visibility for rows that don't pass that filter.
> 
> It's an interesting idea though. But I can't think of many queries where it
> would be interesting. The query would still have to visit every page
> containing a record used in the final result. So the only time this would be a
> significant win is if you're applying very selective restrictions to columns
> that were in the index but that you weren't able to put in the index condition. 
> 
> This seems like a pretty rare situation; usually the reason you put columns in
> an index definition is because it is going to be useful for index conditions--
> especially if it's a particularly selective column.

Ummm ... two situations where filters on index columns do not fit the 
standard index probe are:

- filtering by restrictive join. Whether the index is the source or the 
target of the restriction, you get a better choice of join operators and 
orders. For example, if the index is the restrictor, you may be able to 
build a hash table from the (filtered) index rows, where building a hash 
table from a heap scan would be a bad choice.

- filtering on non-leading index columns, or filtering with 
inequalities. Normally the planner will not bother with the index for 
these, and may do a sequential scan of the table. It could instead be 
done with a serial scan of the index, with possible optimizations like 
Oracle's "skip scan" (see the sketch after this list).
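
To make the second case concrete, a minimal sketch (schema and names 
invented for illustration). The index leads on one column but the query 
filters only on the trailing column, so a standard probe has no starting 
point, while a serial walk of the index entries can still answer it:

     -- Hypothetical two-column index: 'region' leads, 'price' trails.
     CREATE TABLE items (region int, price numeric, descr text);
     CREATE INDEX items_region_price ON items (region, price);

     -- No condition on the leading column, so there is nothing to probe
     -- on. A serial scan of the index, or Oracle's "skip scan" (one
     -- probe per distinct leading value), can evaluate the predicate
     -- from index entries alone:
     SELECT region, price FROM items WHERE price < 10;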

Furthermore, what 'covering' indexes buy you is that they carry all the 
data you need for the query results, whether you apply predicates to all 
of their columns or not.
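
For instance (names invented, and remembering that PG must still visit 
the heap to check visibility), an index that carries every column a 
query touches can satisfy it without reading any other part of the row:

     CREATE TABLE orders (cust_id int, order_date date,
                          total numeric, notes text);
     CREATE INDEX orders_covering ON orders (cust_id, order_date, total);

     -- Every output column and every predicate column is in the index,
     -- whether or not each one is actually filtered on:
     SELECT order_date, total
       FROM orders
      WHERE cust_id = 42 AND order_date >= '2004-01-01';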

At another level, people are talking about decomposition storage models 
(DSM) for data in disk pages, versus n-ary storage models (NSM). That's 
more or less a fancy way of saying: organize data by columns instead of 
by rows. This storage model pays you back in CPU cycles on most 
machines, because a scan that touches only a few columns makes much 
better use of the L1/L2 caches. If PG were ever to consider moving to 
that, the row-validity fields would be grouped together within a page, 
and verifying index rows would be significantly faster: only a small 
portion of a large page need be read and pushed through the CPU.
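
A minimal sketch of the contrast in SQL terms (relation names made up): 
NSM keeps each row's attributes together, while DSM decomposes the 
relation into one narrow, surrogate-keyed relation per attribute:

     -- NSM (n-ary storage, i.e. a conventional row store): one wide
     -- relation, all of a row's fields on the same page.
     CREATE TABLE emp (id int, name text, salary numeric, hired date);

     -- DSM (decomposition storage): one narrow relation per attribute,
     -- reassembled through the surrogate key 'id'. A scan that needs
     -- only 'salary' then reads far fewer bytes per page.
     CREATE TABLE emp_name   (id int, name text);
     CREATE TABLE emp_salary (id int, salary numeric);
     CREATE TABLE emp_hired  (id int, hired date);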

[For more on DSM vs NSM, google: NSM n-ary DSM ]

