Re: Indexed views? - Mailing list pgsql-hackers
From: Mischa Sandberg
Subject: Re: Indexed views?
Date:
Msg-id: 1A31d.194860$X12.99998@edtnps84
In response to: Re: Indexed views? (Greg Stark <gsstark@mit.edu>)
List: pgsql-hackers
Greg Stark wrote:
> Mischa Sandberg <ischamay.andbergsay@activestateway.com> writes:
>
>> I take it that it is a very reasonable assumption that only a small
>> proportion of index records are actually invalid (else, yurk, why use
>> the index?).
>
> That's faulty logic; the percentage of tuples that are valid is entirely
> independent of the percentage of tuples that match your range criterion.
> I.e., I could be selecting 100 tuples out of a million -- even if 99 of
> them are invalid, it's still worthwhile to use the index.

Ummm ... perhaps I glossed over a bit of inference. If only a small
proportion of the index contains invalid row references, then (in the
absence of specific biases otherwise) arbitrary queries using that index
will average the same proportion of invalid row references. And agreed,
it would still be worthwhile to use the index in that case. Your ANALYZE
stats would be a bit skewed, though.

>> Since you're using an index at all, the planner must be expecting a
>> restricted set of rows to make it up through to the root. If there are
>> any filter criteria against the values from the index rows, you won't
>> even have to check tuple visibility for rows that don't pass that
>> filter.
>
> It's an interesting idea though. But I can't think of many queries where
> it would be interesting. The query would still have to visit every page
> containing a record used in the final result. So the only time this
> would be a significant win is if you're applying very selective
> restrictions to columns that were in the index but couldn't be put in
> the index condition.
>
> This seems like a pretty rare situation; usually the reason you put
> columns in an index definition is because they are going to be useful
> for index conditions -- especially if it's a particularly selective
> column.

Ummm ... two situations where filters on index columns do not fit the
standard index probe are:

- Filtering by a restrictive join. Whether the index is the source or the
  target of the restriction, you get a better choice of join operators and
  orders. For example, if the index is the restrictor, you may be able to
  build a hash table of (filtered) index rows where building a hash table
  from a full heap scan would be a bad choice.

- Filtering on non-leading index columns, or filtering with inequalities.
  Normally the planner will not bother with the index for these, and may
  do a sequential scan of the table instead; the same filter could be
  applied in a sequential scan of the index, with possible optimizations
  like Oracle's "skip scan". (A concrete sketch of this case follows at
  the end of this message.)

Furthermore, what 'covering' indexes buy you is that they hold all the
data you need for the query result, whether or not you apply predicates
to every indexed column (also sketched below).

At another level, people are talking about decomposition storage models
(DSM) for data in disk pages, versus n-ary storage models (NSM). That's
more or less a fancy way of saying: organize data by columns instead of
by rows. This storage model pays you back in CPU cycles on most machines
with split L1/L2 caches. If PG were ever to move to that, the row
validity columns would be grouped together within a page, and verifying
index rows would be significantly faster: only a small portion of a large
page would need to be read and pushed through the CPU.

[For more on DSM vs NSM, google: NSM n-ary DSM]
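To make the non-leading-column case concrete, here is a minimal SQL
sketch (table and index names are hypothetical, invented purely for
illustration): the predicate is on a non-leading column of a multicolumn
index, so it cannot drive a normal index probe, but a scan over the index
itself could apply it to every entry and visit the heap for visibility
checks only on the entries that pass.

    -- Hypothetical schema, purely for illustration.
    CREATE TABLE orders (
        customer_id integer,
        order_date  date,
        amount      numeric
    );
    CREATE INDEX orders_cust_date ON orders (customer_id, order_date);

    -- order_date is the non-leading index column, so this predicate
    -- cannot be used as an index condition for a normal probe.  A
    -- sequential scan of the index (or an Oracle-style "skip scan")
    -- could still apply the filter to each index entry, and touch the
    -- heap only for the few entries that survive.
    SELECT customer_id
    FROM   orders
    WHERE  order_date = '2004-10-01';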
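And a sketch of the 'covering' point, on the same hypothetical table:
when every column the query touches is in the index, the heap pages would
be visited only to check tuple visibility -- which is exactly the cost
the grouped-validity page layout described above would shrink.

    -- With all referenced columns in the index, the index alone holds
    -- everything the query result needs; heap access would be purely
    -- for tuple-visibility checks.  Note the inequality on amount, a
    -- non-leading column: it can still be filtered inside the index.
    CREATE INDEX orders_covering ON orders (customer_id, order_date, amount);

    SELECT order_date, amount
    FROM   orders
    WHERE  customer_id = 42
      AND  amount > 100.00;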