Re: Idea: GSoC - Query Rewrite with Materialized Views - Mailing list pgsql-hackers

From Eric Grinstein
Subject Re: Idea: GSoC - Query Rewrite with Materialized Views
Date
Msg-id CAK7uWEz1Z4OdsrD05+5i_u6TAxDixmRUx-AT9ZmoeauaLiJ5+g@mail.gmail.com
Whole thread Raw
In response to Re: Idea: GSoC - Query Rewrite with Materialized Views  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Idea: GSoC - Query Rewrite with Materialized Views  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
Thank you for your answers.
I would be very interested in tracking the staleness of the MV.
You see, I work in a research group in database tuning, and we have
implemented some solutions to take advantage of MV's and speed up queries.
The query rewrite feature would be extremely desirable for us. 
Do you think that implementing the staleness check as suggested by Thomas 
could get us started in the query rewrite business? Do you think I should make a proposal 
or there are more interesting subjects to GSoC? I'd be happy to hear project suggestions, especially
related to the optimizer, tuning, etc.

Eric

2015-02-20 22:35 GMT-02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
On 21.2.2015 00:20, Kevin Grittner wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> I share the view that this would be very valuable, but the scope
>> far exceeds what can be done within a single GSoC project. But
>> maybe we could split that into multiple pieces, and Eric would
>> implement only the first piece?
>>
>> For example the 'is_stale' flag for a MV would be really useful,
>> making it possible to refresh only the MVs that actually need a
>> refresh.
>
> You may be on to something there.  Frankly, though, I'm not sure
> that we could even reach consensus within the community on a
> detailed design for how we intend to track staleness (that will
> hold up both now and once we have incremental maintenance of
> materialized views working) within the time frame of a GSoC
> project.  This would need to be done with an eye toward how it
> might be used in direct references (will we allow a "staleness
> limit" on a reference from a query?), for use in a rewrite, and how
> it will interact with changes to base tables and with both REFRESH
> statements and incremental maintenance at various levels of
> "eagerness".  I'm not sure that staleness management wouldn't be
> better left until we have some of those other parts for it to work
> with.

Doing that properly is going to be nontrivial, no doubt about that. I
was thinking about keeping a simple list of updated tables (oids) and
then at commit time, deciding which MVs to depend on that and setting
some sort of flag (or XID) for all those MVs. But maybe there's a better
way.

> Questions to consider:
>
> Some other products allow materialized views to be partitioned and
> staleness to be tracked by partition, and will check which partitions
> will be accessed in determining staleness. Is that something we want
> to allow for?

I think we need to get this working for simple MVs, especially because
we don't have partitioned MVs (or the type of declarative partitioning
the other products do have).

> Once we have incremental maintenance, an MV maintained in an "eager"
> fashion (changes are visible in the MV as soon as the transaction
> modifying the underlying table commit) could be accessed with a MVCC
> snapshots, with different snapshots seeing different versions. It
> seems pretty clear that such an MV would always be considered
> "fresh", so there would be no need to constantly flipping to stale
> and back again as the underlying table were changed and the changes
> were reflected in the MV. How do we handle that?

Yes, incrementally updated MVs might be used more easily, without
tracking staleness. But we don't have that now, and it's going to take a
significant amount of time to get there.

Also, not all MVs can be updated incrementally, so either we allow only
simple MVs to be used for rewrites, or we'll have to implement the
'stale' flag anyway.

> If changes to an MV are less eager (they are queued for application
> after COMMIT, as time permits) would we want to track the xid of how
> far along they are, so that we can tell whether a particular snapshot
> is safe to use? Do we want to allow a non-MVCC snapshot that shows
> the latest version of each row? Only if staleness is minimal?

Maybe. When I talk about 'flag' I actually mean a simple way to
determine whether the MV is up-to-date or not. Snapshots and XIDs are
probably the right way to do that in MVCC-based system.

> What about MVs which don't have incremental maintenance? We can still
> determine what xid they are current "as of", from the creation or the
> latest refresh. Do we want to track that instead of a simple boolean
> flag?

How would we use the 'as of' XID? IMHO it's unacceptable to quietly use
stale data unless the user explicitly references the MV, so we'd have to
assume we can't use that MV.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Grzegorz Parka
Date:
Subject: GSoC idea - Simulated annealing to search for query plans
Next
From: Alvaro Herrera
Date:
Subject: Re: logical column ordering