Re: Indexed views? - Mailing list pgsql-hackers

From Tiago Wright
Subject Re: Indexed views?
Date
Msg-id 7ece122a040911145959258368@mail.gmail.com
In response to Re: Indexed views?  (Greg Stark <gsstark@mit.edu>)
Responses Re: Indexed views?  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
IMHO, it is worth duplicating the MVCC data into all index entries. To
summarize what I understand from this discussion, with the current
method:

a1 - Index seeks must return invisible tuples, because the MVCC data
is not found in the index. These tuples are eliminated only once the
data is read from the actual data pages.

a2 - Covered queries are not possible, since the data page must be
visited to determine visibility (a sketch below illustrates this).
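
To make a2 concrete, here is a sketch (the table and index names are
made up for illustration):

-- Hypothetical schema, for illustration only.
CREATE TABLE accounts (
    id         integer PRIMARY KEY,
    owner      text,
    balance    numeric,
    last_login timestamptz
);
CREATE INDEX accounts_owner_balance_idx ON accounts (owner, balance);

-- Every column this query needs is present in the index, so in
-- principle it could be answered from the index alone ("covered").
-- With the current method, each matching index entry still forces a
-- read of the underlying heap page just to check tuple visibility.
SELECT owner, balance
FROM accounts
WHERE owner = 'alice';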

If mvcc data is replicated to the index entries:

b1 - Index seeks will never return invisible tuples, possibly
eliminating some page reads.

b2 - Covered queries become possible.

b3 - Inserts are not affected performance-wise. Deletes must now visit
every index entry, which is a larger cost, and updates must visit
every index entry too. It may be possible to reduce the cost of an
update that does not change the indexed data, since the new index
entry will likely end up on the same page as the index entry being
replaced, so no extra page reads would be necessary in that scenario
(the example after this list illustrates both cases).
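
To illustrate the two update cases in b3, continuing the made-up
accounts example from above:

-- Does not touch any indexed column: if MVCC data lived in the index,
-- the replacement entry would carry the same key and would likely
-- land on the same index page, so little extra I/O per index.
UPDATE accounts SET last_login = now() WHERE id = 42;

-- Changes an indexed column: the old and new entries can land on
-- different index pages, so the update must visit (and write) every
-- index on the table, which is the larger cost described above.
UPDATE accounts SET balance = balance - 100 WHERE id = 42;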

Since the great majority of performance issues are related to select
queries, the benefit of eliminating page fetches for invisible tuples
and of supporting covered queries probably outweighs the extra cost of
updating index entries. And once covered queries are supported, it
would be possible to build indexed views or multi-table indexes that
could address some of the most performance-demanding queries out there.
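
For example (purely hypothetical, since no such feature exists; the
statements below are ordinary SQL, with a comment marking the missing
piece):

-- A join that an indexed view / multi-table index could serve directly.
CREATE VIEW order_totals AS
SELECT c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

-- Hypothetical: if an index could be built over the view's output, and
-- its entries carried their own visibility data, this query could be
-- answered from that index alone, with no join work at run time.
SELECT name, total FROM order_totals WHERE name = 'alice';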

I am wondering whether it would be possible to measure the costs of a1
and a2 above and compare them with the probable costs of b3. It seems
to me that the applications for which b3 is most expensive are also
those for which a1 would be most expensive, and since selects are much
more common than updates, could one offset the other in the long run?
Can anyone shed some light on this?
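
As a rough sketch of how the a1 cost might be measured (reusing the
made-up accounts table, and assuming EXPLAIN ANALYZE timings are good
enough for a ballpark):

EXPLAIN ANALYZE
SELECT owner, balance FROM accounts WHERE owner = 'alice';

-- Create many dead heap tuples whose index entries still exist.
UPDATE accounts SET balance = balance WHERE owner = 'alice';
-- (deliberately no VACUUM here)

EXPLAIN ANALYZE
SELECT owner, balance FROM accounts WHERE owner = 'alice';
-- The second run pays extra heap visits only to discard invisible
-- tuples; the difference between the two runs approximates the a1
-- overhead for this workload.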

-Tiago


On 11 Sep 2004 01:58:01 -0400, Greg Stark <gsstark@mit.edu> wrote:
> 
> Mischa Sandberg <ischamay.andbergsay@activestateway.com> writes:
> 
> > I take it that it is a very reasonable assumption that only a small proportion
> > of index records are actually invalid (else, yuck, why use the index?).
> 
> That's faulty logic: the percentage of tuples that are valid is entirely
> independent of the percentage of tuples that match your range criterion. I.e.,
> I could be selecting 100 tuples out of a million -- even if 99 are invalid
> it's still worthwhile to use the index.
> 
> > Since you're using an index at all, the planner must be expecting a restricted
> > set of rows to make it up through to the root. If there is any filter criterion
> > against the values from the index rows, you won't even have to check tuple
> > visibility for rows that don't pass that filter.
> 
> It's an interesting idea, though, but I can't think of many queries where it
> would matter. The query would still have to visit every page containing a
> record used in the final result. So the only time this would be a significant
> win is if you're applying very selective restrictions to columns that are in
> the index but couldn't be put in the index condition.
> 
> This seems like a pretty rare situation; usually the reason you put columns in
> an index definition is that they will be useful in index conditions --
> especially if a column is particularly selective.
> 
> --
> greg
> 
> 
> 
> 

