Re: Indexed views? - Mailing list pgsql-hackers
From | Tiago Wright |
---|---|
Subject | Re: Indexed views? |
Date | |
Msg-id | 7ece122a040911145959258368@mail.gmail.com |
In response to | Re: Indexed views? (Greg Stark <gsstark@mit.edu>) |
Responses | Re: Indexed views? |
List | pgsql-hackers |
IMHO, it is worth duplicating the MVCC data to all index entries. To summarize what I understand from this discussion, with the current method:

a1 - Index seeks must return invisible tuples, because MVCC data is not stored in the index. These tuples are eliminated once the data is read from the actual data pages.
a2 - Covered queries are not possible, since the data page must be visited to determine visibility.

If MVCC data is replicated to the index entries:

b1 - Index seeks will never return invisible tuples, possibly eliminating some page reads.
b2 - Covered queries are possible (a sketch follows the quoted message below).
b3 - Inserts are not affected performance-wise. Deletes must now visit every index entry, which is a larger cost. Updates must visit every index entry too. It may be possible to reduce the cost of an update when the indexed data is not affected, since the new index entry will likely end up in the same page as the index entry that must be deleted, so no extra page reads would be necessary in this scenario.

Since the great majority of performance issues are related to select queries, the benefit of eliminating invisible-tuple page fetches and supporting covered queries probably outweighs the extra cost of updating index entries. And once covered queries are supported, it would be possible to build indexed views or multi-table indexes that can address some of the most performance-demanding queries out there (also sketched below).

I am wondering whether it would be possible to measure the costs of a1 and a2 above and compare them with the probable cost of b3. It seems to me that the applications for which b3 is most expensive are also those for which a1 would be most expensive, and since selects are much more common than updates, could one offset the other in the long run?

Can anyone shed some light on these?

-Tiago

On 11 Sep 2004 01:58:01 -0400, Greg Stark <gsstark@mit.edu> wrote:
>
> Mischa Sandberg <ischamay.andbergsay@activestateway.com> writes:
>
> > I take it that it is a very reasonable assumption that only a small proportion
> > of index records are actually invalid (else Yurk why use the index?).
>
> That's faulty logic, the percentage of tuples that are valid is entirely
> independent from the percentage of tuples that match your range criterion. Ie,
> I could be selecting 100 tuples out of a million -- even if 99 are invalid
> it's still worthwhile to use the index.
>
> > Since you're using an index at all, the planner must be expecting a restricted
> > set of rows to make it up through to the root. If there is any filter criteria
> > against the values from the index rows, you won't even have to check rows for
> > tuple visibility, that don't pass that filter.
>
> It's an interesting idea though. But I can't think of many queries where it
> would be interesting. The query would still have to visit every page
> containing a record used in the final result. So the only time this would be a
> significant win is if you're applying very selective restrictions to columns
> that were in the index but weren't able to put in the index condition.
>
> This seems like a pretty rare situation; usually the reason you put columns in
> an index definition is because it is going to be useful for index conditions--
> especially if it's a particularly selective column.
>
> --
> greg
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
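To make the covered-query point (b2) concrete, here is a minimal sketch. The table, columns, and index names are made up for illustration, and it assumes an executor that can return results from index entries alone once each entry carries its own MVCC data:

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE orders (
    order_id    integer PRIMARY KEY,
    customer_id integer NOT NULL,
    placed_on   date    NOT NULL,
    total       numeric NOT NULL
);

-- The index stores every column the query below needs.
CREATE INDEX orders_cust_date_total
    ON orders (customer_id, placed_on, total);

-- If each index entry carried its own visibility (MVCC) data, this
-- query could be answered from the index alone (a "covered" query),
-- with no heap page fetches:
SELECT placed_on, total
FROM   orders
WHERE  customer_id = 42;
```

With the current method (a1/a2), the same plan must still fetch each matching heap page just to check visibility, even though the index already contains placed_on and total.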
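And for the indexed-view idea, a sketch of what I have in mind. The final statement is purely hypothetical syntax (PostgreSQL has no such feature), meant only to show the shape of a multi-table index over a join:

```sql
-- Companion table to the orders table sketched above.
CREATE TABLE customers (
    customer_id integer PRIMARY KEY,
    name        text NOT NULL
);

-- An ordinary view over a two-table join.
CREATE VIEW customer_totals AS
    SELECT c.customer_id, c.name, o.placed_on, o.total
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id;

-- Hypothetical extension (not valid SQL, hence commented out):
-- physically index the view's output, so that a query such as
--     SELECT name, total FROM customer_totals WHERE name = 'Smith';
-- could be answered from one multi-table index, never running the join.
-- CREATE INDEX customer_totals_name_idx ON customer_totals (name, total);
```

The point is that once index entries are self-sufficient for visibility, an index over precomputed join output like this becomes answerable without touching any base table.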