Re: Idea: GSoC - Query Rewrite with Materialized Views - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Idea: GSoC - Query Rewrite with Materialized Views
Date
Msg-id 625520725.3243691.1424474454647.JavaMail.yahoo@mail.yahoo.com
Whole thread Raw
In response to Re: Idea: GSoC - Query Rewrite with Materialized Views  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Idea: GSoC - Query Rewrite with Materialized Views  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> I share the view that this would be very valuable, but the scope
> far exceeds what can be done within a single GSoC project. But
> maybe we could split that into multiple pieces, and Eric would
> implement only the first piece?
>
> For example the 'is_stale' flag for a MV would be really useful,
> making it possible to refresh only the MVs that actually need a
> refresh.

You may be on to something there.  Frankly, though, I'm not sure
that we could even reach consensus within the community on a
detailed design for how we intend to track staleness (that will
hold up both now and once we have incremental maintenance of
materialized views working) within the time frame of a GSoC
project.  This would need to be done with an eye toward how it
might be used in direct references (will we allow a "staleness
limit" on a reference from a query?), for use in a rewrite, and how
it will interact with changes to base tables and with both REFRESH
statements and incremental maintenance at various levels of
"eagerness".  I'm not sure that staleness management wouldn't be
better left until we have some of those other parts for it to work
with.

Questions to consider:

Some other products allow materialized views to be partitioned and
staleness to be tracked by partition, and will check which
partitions will be accessed in determining staleness.  Is that
something we want to allow for?

Once we have incremental maintenance, an MV maintained in an
"eager" fashion (changes are visible in the MV as soon as the
transaction modifying the underlying table commit) could be
accessed with a MVCC snapshots, with different snapshots seeing
different versions.  It seems pretty clear that such an MV would
always be considered "fresh", so there would be no need to
constantly flipping to stale and back again as the underlying table
were changed and the changes were reflected in the MV.  How do we
handle that?

If changes to an MV are less eager (they are queued for application
after COMMIT, as time permits) would we want to track the xid of
how far along they are, so that we can tell whether a particular
snapshot is safe to use?  Do we want to allow a non-MVCC snapshot
that shows the latest version of each row? Only if staleness is
minimal?

What about MVs which don't have incremental maintenance?  We can
still determine what xid they are current "as of", from the
creation or the latest refresh.  Do we want to track that instead
of a simple boolean flag?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))
Next
From: Andres Freund
Date:
Subject: Bootstrap DATA is a pita