Thread: counting algorithm for incremental matview maintenance
<div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt">In surveyingthe literature on $subject, I find that most of the<br />theoretical work related to how to incrementally update<br/>materialized views based on the matview declaration was published<br />between 1988 and 1993. The best paperI have been able to find on<br />the topic was published in ACM SIGMOD in 1993[1], and covers two<br />algorithms: countingand DRed. The former should be very fast for<br />non-recursive views, but not very efficient for recursive views.<br/>The latter algorithm is the other way around -- it looks like it<br />will do well with recursive views but generallybe slower for<br />non-recursive views.<br /><br />It does not seem feasible to me to implement both techniquesin a<br />single one-year PostgreSQL release. In fact, if we have trouble<br />getting everyone onto the samepage early, we might have to settle<br />for trying to just get some infrastructure into place, without<br />anythingto actually make use of it. That would be unfortunate,<br />since Oracle added incremental maintenance of matviewsto their<br />existing feature in 1999, and have been improving it regularly<br />since then. Many other productsalso have mature implementations<br />of this, and there seems to be a lot of demand for it in<br />PostgreSQL. In the best of circumstances it will take years for us<br />to catch up on this front.<br /><br />What I proposefor 9.4 is this:<br /><br />During the first CF I would like to have a discussion to settle on<br />how best to dosome things that the counting algorithm requires, so<br />that following the first CF I can try to implement that<br />infrastructureand get at least simple matviews working for<br />incremental maintenance, and have something ready for thesecond<br />CF. 
I can hack up a rough patch to suggest a possible approach for<br />purposes of discussion, but I'm not sure whether that's either<br />necessary or desirable.<br /><br />----------<br />Needed infrastructure changes follow.<br />----------<br /><br />First, the algorithm requires a new system column called<br />count_t (or similar). It would be optional (like oid is) and would<br />only be used by tuples in matviews maintained by the counting<br />algorithm, and by delta relations used in the matview incremental<br />maintenance process. Tables and matviews not created or altered to<br />specify incremental maintenance would not carry this new system<br />column. Existing matviews from 9.3 would continue to work as<br />before until and unless they were ALTERed to specify incremental<br />maintenance; this form of ALTER would require a rewrite of the<br />matview to add the count_t column.<br /><br />This algorithm also requires that we have new execution node types<br />(or conditional behavior in existing nodes) to support processing<br />counts. It is worth noting that relations without a count_t column<br />will sometimes need to be processed using the new logic (in which<br />case each tuple is treated as having an implied count of one).<br /><br />count_t in a matview must always be greater than zero. 
In delta<br />relations it can be negative (to reflect tuples being deleted or<br />the "before" image of an update) or positive (to reflect tuples<br />being inserted or the "after" image of an update).<br /><br />Examples of the new behavior in nodes used by incremental<br />maintenance are:<br /><br /> * DISTINCT or GROUP BY: count_t in the result should represent the<br />number of tuples being combined.<br /><br /> * JOIN: count_t in the result is the product of multiplying the<br />counts from the pair of joined tuples.<br /><br /> * UNION (without ALL): count_t in the result should be the value<br />from an unmatched tuple or the sum of the counts of matched tuples,<br />with any result row with a zero count_t being omitted from the<br />result.<br /><br />Note that for nodes other than UNION, the logic is identical except<br />for calculation of the count for the output.<br /><br />The resulting matviews can be used in the traditional set logic of<br />PostgreSQL, without knowing about or caring about the count_t<br />column, exactly as before; the new system column is only used for<br />maintenance.<br /><br />We could drive the triggering of incremental maintenance off of the<br />dependency information which is already stored, but for performance<br />we probably want to add a new pg_class flag to indicate that the<br />relation is referenced by a matview definition which specifies<br />incremental update. That would allow a fast path for skipping<br />other tests for DML on non-referenced relations, at the expense of<br />some additional catalog updates on some DDL.<br /><br />----------<br />The above is everything that is a significant infrastructure change<br />needed for a first cut at incremental maintenance of materialized<br />views, driven by the view definition. 
Below is additional detail<br />about what will happen when incremental maintenance for a view is<br />specified, for those who wonder why these infrastructure changes<br />are needed.<br />----------<br /><br />Essentially, the technique is based on relational algebra. This<br />allows previously existing theory to prove the correctness of the<br />technique and its optimizations. The addition of the count_t<br />column allows accurate identification of exactly which rows need to<br />be touched during incremental maintenance, with minimal overhead.<br /><br />As the paper notes, these incremental maintenance techniques are a<br />heuristic; there will always be cases where it is faster to<br />regenerate the data from scratch than to use these incremental<br />techniques, so part of the goal of having the transactional REFRESH<br />(as described in a separate thread) is to allow some front-end<br />tests to attempt to recognize when an incremental approach will be<br />slower than a REFRESH and fall back onto the transactional REFRESH<br />technique seamlessly and fairly quietly. (My gut feeling on the<br />right level for an ereport of this is DEBUG1, although I can see<br />the argument for a LOG level entry.)<br /><br />The original and modified versions of the relations (tables or<br />other matviews) which define a matview must be available to<br />calculate the matview deltas, so the snapshot information for this<br />must be available and registered until the matview delta has been<br />calculated. They can be released once the delta has been<br />established and before it has been applied to the matview. 
Initial<br />milestones in this development will focus on "eager" maintenance,<br />so this will not initially be a big issue, but will become more<br />important as we work on making more of the work asynchronous, so<br />that the foreground query is not held up maintaining data which is<br />allowed to be a little stale. This seems not entirely unrelated to<br />background worker processes and snapshot sharing.<br /><br />I don't know whether we can get to this during 9.4 development, but<br />it might be worth at least considering as the other work is done:<br />some aggregate columns with a small amount of data kept while the<br />aggregate is being computed can be further optimized under this<br />technique by storing the working data in the tuple as some sort of<br />hidden or system column related to the aggregate. A few of the<br />obvious candidates for such optimization include count(), sum() and<br />avg(). Note that with this capability, especially if we ever<br />implement the query rewrite capability for matviews, we would have<br />a nice answer for the count(*) speed complaints.<br /><br />Obviously, there will need to be some new syntax around CMV and AMV<br />to support specification of how the matview should be maintained.<br /><br />Also, we are moving into a phase where it makes sense to start the<br />bikeshedding on what timings we should support, how information<br />about the desired timings should be stored, how "freshness" should<br />be tracked, and what syntax we put around all that. 
Frankly, I<br />think I will have my hands pretty full with the incremental<br />maintenance work, and I see the timing work as largely orthogonal,<br />so it would be great if anyone who was interested in that aspect of<br />things could take primary responsibility for moving that along.<br /><br />--<br />Kevin Grittner<br />EnterpriseDB: http://www.enterprisedb.com<br />The Enterprise PostgreSQL Company<br /><br />[1] http://dl.acm.org/citation.cfm?id=170066<br /><br /></div>
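The count-propagation rules Kevin lists for DISTINCT/GROUP BY, JOIN, and UNION can be sketched as a small multiset simulation. This is illustrative Python with invented helper names, not PostgreSQL executor code: relations are modeled as (row, count_t) pairs, and tuples from relations without a count_t column would simply carry an explicit count of one.

```python
def distinct(rel):
    # DISTINCT / GROUP BY: the result's count_t is the number of
    # input tuples being combined -- the sum of their counts --
    # with any net-zero rows dropped.
    out = {}
    for row, n in rel:
        out[row] = out.get(row, 0) + n
    return {row: n for row, n in out.items() if n != 0}

def join(left, right, cond):
    # JOIN: the result's count_t is the product of the counts of
    # the pair of joined tuples.
    out = []
    for lrow, ln in left:
        for rrow, rn in right:
            if cond(lrow, rrow):
                out.append((lrow + rrow, ln * rn))
    return out

def union(a, b):
    # UNION (without ALL): an unmatched tuple keeps its count,
    # matched tuples sum theirs, and rows whose count_t nets to
    # zero are omitted from the result.
    return distinct(list(a) + list(b))
```

Note how UNION over a delta with negative counts is what lets deletions cancel existing matview rows, which is the heart of the maintenance step described later in the thread.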
On Tue, May 14, 2013 at 2:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > In surveying the literature on $subject, I find that most of the > theoretical work related to how to incrementally update > materialized views based on the matview declaration was published > between 1988 and 1993. The best paper I have been able to find on > the topic was published in ACM SIGMOD in 1993[1], and covers two > algorithms: counting and DRed. The former should be very fast for > non-recursive views, but not very efficient for recursive views. > The latter algorithm is the other way around -- it looks like it > will do well with recursive views but generally be slower for > non-recursive views. > > It does not seem feasible to me to implement both techniques in a > single one-year PostgreSQL release. In fact, if we have trouble > getting everyone onto the same page early, we might have to settle > for trying to just get some infrastructure into place, without > anything to actually make use of it. That would be unfortunate, > since Oracle added incremental maintenance of matviews to their > existing feature in 1999, and have been improving it regularly > since then. Many other products also have mature implementations > of this, and there seems to be a lot of demand for it in > PostgreSQL. In the best of circumstances it will take years for us > to catch up on this front. #1 issue I have with current matview functionality is locking. currently refresh takes out an access exclusive lock. so, question is, do you think your proposal will be such that it will no longer require taking out full lock for refresh purposes (either incremental or otherwise)? merlin
Merlin Moncure <mmoncure@gmail.com> wrote: > #1 issue I have with current matview functionality is locking. > currently refresh takes out an access exclusive lock. so, > question is, do you think your proposal will be such that it will > no longer require taking out full lock for refresh purposes > (either incremental or otherwise)? The right thread for *that* question is "Differential (transactional) REFRESH"; however, I might as well say here that I don't think we want to get rid of the (faster) version that just replaces the current heap when we add the (slower) option to REFRESH it transactionally. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, May 15, 2013 at 11:33 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > Merlin Moncure <mmoncure@gmail.com> wrote: > >> #1 issue I have with current matview functionality is locking. >> currently refresh takes out an access exclusive lock. so, >> question is, do you think your proposal will be such that it will >> no longer require taking out full lock for refresh purposes >> (either incremental or otherwise)? > > The right thread for *that* question is "Differential > (transactional) REFRESH"; however, I might as well say here that I > don't think we want to get rid of the (faster) version that just > replaces the current heap when we add the (slower) option to > REFRESH it transactionally. sorry, didn't notice that thread. agreed, that seems good candidate for user input to refresh command to manage the tradeoff. well, do you expect the application of differential refresh to be automatic? lockless + differential refresh would be game changer in terms of how I build up data for analytics. merlin
Merlin Moncure <mmoncure@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> #1 issue I have with current matview functionality is locking. >>> currently refresh takes out an access exclusive lock. so, >>> question is, do you think your proposal will be such that it >>> will no longer require taking out full lock for refresh >>> purposes (either incremental or otherwise)? >> >> The right thread for *that* question is "Differential >> (transactional) REFRESH"; however, I might as well say here that >> I don't think we want to get rid of the (faster) version that >> just replaces the current heap when we add the (slower) option >> to REFRESH it transactionally. > > sorry, didn't notice that thread. agreed, that seems good > candidate for user input to refresh command to manage the > tradeoff. > > well, do you expect the application of differential refresh to be > automatic? I expect considerable bikeshedding on this, but my preference would be to allow syntax for specifying which technique is desired on the REFRESH command, and in the absence of specification I would prefer that it default to the current technique for a matview which is not yet populated and the transactional technique for a populated matview. We could associate some property with the matview for default REFRESH technique, but I don't know whether that's worth the trouble. > lockless + differential refresh would be game changer in terms of > how I build up data for analytics. Yeah, it's definitely something I would have liked to have in the initial release, but I tried to keep the scope very limited given how little time there was left in the release cycle when I got a chance to start on it. Adding this seemed to be just a little too much. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin, It's fairly common for matviews to be constructed such that updates to them are strictly appends. For example, a matview which has a daily summary would just get appended to each day, and existing rows would not change barring a major historical database cleanup. It seems like we could ... and ought to ... optimize for this pattern somehow for incremental updates. That is, if the user knows that we're going to be only appending new rows and not modifying any old ones, s/he ought to be able to tell the database that somehow and avoid the overhead of checking. While the overhead of checking a count wouldn't be that high for a few hundred rows, I've dealt with matviews which were thousands to millions of rows. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt">Josh Berkus <josh@agliodbs.com> wrote:<br /><br />> It's fairly common for matviews to be constructed such that<br />> updates to them are strictly appends. For example, a matview<br />> which has a daily summary would just get appended to each day,<br />> and existing rows would not change barring a major historical<br />> database cleanup.<br />><br />> It seems like we could ... and ought to ... optimize for this<br />> pattern somehow for incremental updates. That is, if the user<br />> knows that we're going to be only appending new rows and not<br />> modifying any old ones, s/he ought to be able to tell the<br />> database that somehow and avoid the overhead of checking. While<br />> the overhead of checking a count wouldn't be that high for a few<br />> hundred rows, I've dealt with matviews which were thousands to<br />> millions of rows.<br /><br />Thanks for the suggestion; I will keep an eye out for ways this<br />insight might allow an optimization. I think there might be some<br />misunderstanding of the counting algorithm, though -- there is no<br />need to do a sequential pass through the matview examining the<br />counts.<br /><br />I don't want to replicate the content of a fairly dense (in the<br />sense of having a lot of content per page) 10 page computer science<br />paper in this email, but for purposes of illustration I will take a<br />very simple case and show how it works. 
(This is not geared toward<br />your particular case, because that could get kinda long to explain<br />here and now, but hopefully this will give an insight into the<br />technique overall.)<br /><br />Let's say there is a table and matview like this:<br /><br />create table foo (fooid int primary key, val int not null);<br />create materialized view bar as select distinct val from foo;<br /><br />Let's say there are millions of rows in both, and that we have<br />flagged the view for incremental maintenance. (Syntax omitted to<br />avoid distracting bikeshedding on that when the point is the<br />algorithm.)<br /><br />Now, someone runs this:<br /><br />update foo set val = val + 1 where fooid between 1 and 10;<br /><br />What will happen is this:<br /><br />Since foo will be flagged as a relation which is referenced by an<br />incrementally maintained matview, a delta relation will be<br />materialized for this update, which will contain the net change to<br />the underlying table in the count_t system column. "Before" tuples<br />will have a count of -1; "after" tuples will have a count of 1.<br />Then the query defining the view will be run *against the delta*,<br />resulting in a relation with a count_t column reflecting the net<br />change for each val. Anything with a zero for the net change will<br />be dropped. We will run a logical UNION of this relation and the<br />bar matview. In this case, we obviously want this to be done in a<br />way that for each row in this "net change" relation, we do an index<br />scan against the bar matview; if not found, we insert the new row<br />into the matview with its count from the net change relation.<br />(That had better be positive or we have a bug -- so elog ERROR if<br />negative.) If bar does contain a matching row, update count_t in<br />that row with the sum of its old value and the value from the "net<br />change" relation. 
Of course, that new value also had better be<br />positive or zero -- if zero we delete the old row rather than<br />updating count_t.<br /><br />The count_t column saves us from having to scan foo for all the old<br />val values. It does not require any scan of the entire bar<br />matview. It allows us to zero in on exactly the right rows, and<br />lets us know what needs doing.<br /><br />Clearly we want the planner to find the best plans for the interim<br />steps rather than hard-coding particular rule-based plans; I just<br />used an example where it's pretty clear what a reasonable plan<br />would be.<br /><br />Hopefully this makes it fairly clear that the only thing an<br />optimization around the "append-only" assertion for a matview would<br />buy is the ability to skip the probe for an existing record *related<br />to rows which are in the delta*. As long as there is reasonable<br />indexing on the matview, maintenance for the append-only case would<br />not involve scanning the entire matview.<br /><br />-- <br />Kevin Grittner<br />EnterpriseDB: http://www.enterprisedb.com<br />The Enterprise PostgreSQL Company<br /><br /></div>
I wrote: > Let's say there is a table and matview like this: > > create table foo (fooid int primary key, val int not null); > create materialized view bar as select distinct val from foo; Some of the subsequent text doesn't make sense unless that materialized view has an index, like this: create unique index bar_val on bar (val); Without such an index, it would need to use a plan which scanned the entire bar relation for any maintenance. Apologies for the omission and any confusion it caused. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
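The foo/bar walk-through above can be put in executable form. This is a rough Python sketch with hypothetical names, not PostgreSQL code; it assumes, purely for concreteness, that fooid i initially has val = i, so the example UPDATE's net effect is to remove the last supporting row for val 1 and introduce val 11. The per-row probe stands in for the index scan against the unique index on bar(val).

```python
def view_delta(changed_rows):
    # changed_rows: (val, count) pairs from the delta relation on
    # foo -- "before" images carry -1, "after" images carry +1.
    # Running the view's query (select distinct val) against the
    # delta nets the counts per val and drops zero net changes.
    delta = {}
    for val, n in changed_rows:
        delta[val] = delta.get(val, 0) + n
    return {v: n for v, n in delta.items() if n != 0}

def apply_delta(bar, delta):
    # Logical UNION of the net-change relation with the matview,
    # probing bar (modeled as a dict val -> count_t) per delta row.
    for val, net in delta.items():
        if val not in bar:
            if net < 0:
                raise ValueError("negative count for new row -- bug")
            bar[val] = net            # unmatched: insert with its count
        else:
            new = bar[val] + net      # matched: sum the counts
            if new < 0:
                raise ValueError("count_t went negative -- bug")
            if new == 0:
                del bar[val]          # last supporting row gone
            else:
                bar[val] = new
    return bar
```

For the example UPDATE, vals 2 through 10 each gain one "after" image and lose one "before" image, so they cancel out of the delta entirely; only val 1 (net -1) and val 11 (net +1) ever touch the matview.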
On Tue, May 14, 2013 at 3:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > We could drive the triggering of incremental maintenance off of the > dependency information which is already stored, but for performance > we probably want to add a new pg_class flag to indicate that the > relation is referenced by a matview definition which specifies > incremental update. That would allow a fast path for skipping > other tests for DML on non-referenced relations, at the expense of > some additional catalog updates on some DDL. I'm afraid this might require creating a matview or updating the definition of a matview to refer to different relations to take AccessExclusiveLock on those relations, in order to avoid SnapshotNow problems while updating this flag for those relations, and I think that's probably unacceptable. Some thought may be needed here to come up with a good solution. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> We could drive the triggering of incremental maintenance off of the >> dependency information which is already stored, but for performance >> we probably want to add a new pg_class flag to indicate that the >> relation is referenced by a matview definition which specifies >> incremental update. That would allow a fast path for skipping >> other tests for DML on non-referenced relations, at the expense of >> some additional catalog updates on some DDL. > > I'm afraid this might require creating a matview or updating the > definition of a matview to refer to different relations to take > AccessExclusiveLock on those relations, in order to avoid SnapshotNow > problems while updating this flag for those relations, and I think > that's probably unacceptable. Some thought may be needed here to come > up with a good solution. Thanks for the feedback. I had been thinking that such a flag would be the moral equivalent of such existing flags as relhaspkey, relhasrules, relhastriggers, and relhassubclass. Those all require owner rights to change, and perhaps we don't want to require that a user be the owner of a table to define a materialized view which references that table and specifies incremental update. On the other hand, we might not want to let just any old user who has SELECT permission on a table to specify that it feeds an incrementally updated matview, since there is no escaping the fact that extra work will be needed for DML against that table if it is doing that. I seem to recall that at least one other product requires the owner of a table to ALTER it to set a flag specifying that the table is allowed to be used to back incrementally updated matviews; perhaps that's the way to go? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, May 16, 2013 at 8:33 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Kevin Grittner <kgrittn@ymail.com> wrote: >>> We could drive the triggering of incremental maintenance off of the >>> dependency information which is already stored, but for performance >>> we probably want to add a new pg_class flag to indicate that the >>> relation is referenced by a matview definition which specifies >>> incremental update. That would allow a fast path for skipping >>> other tests for DML on non-referenced relations, at the expense of >>> some additional catalog updates on some DDL. >> >> I'm afraid this might require creating a matview or updating the >> definition of a matview to refer to different relations to take >> AccessExclusiveLock on those relations, in order to avoid SnapshotNow >> problems while updating this flag for those relations, and I think >> that's probably unacceptable. Some thought may be needed here to come >> up with a good solution. > > Thanks for the feedback. > > I had been thinking that such a flag would be the moral equivalent > of such existing flags as relhaspkey, relhasrules, relhastriggers, > and relhassubclass. Those all require owner rights to change, and > perhaps we don't want to require that a user be the owner of a > table to define a materialized view which references that table and > specifies incremental update. On the other hand, we might not want > to let just any old user who has SELECT permission on a table to > specify that it feeds an incrementally updated matview, since there > is no escaping the fact that extra work will be needed for DML > against that table if it is doing that. I seem to recall that at > least one other product requires the owner of a table to ALTER it > to set a flag specifying that the table is allowed to be used to > back incrementally updated matviews; perhaps that's the way to go? Possibly. 
That at least has the advantage of transparency: if you do ALTER TABLE wunk ENABLE DELTA QUEUE or somesuch syntax, it's very clear that you're buying an AccessExclusiveLock. And while AccessExclusiveLocks are not a lot of fun, one that you know is coming is a lot better than one that comes as a surprise. I feel like it would be nicer, though, to come up with some trick that avoids the need to update the referenced table's pg_class entry altogether. I don't immediately have a good idea, but I'll mull it over and see if I come up with anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote: Good explanation for understanding the initial concept of incremental update of matviews. > The original and modified versions of the relations (tables or > other matviews) which define a matview must be available to > calculate the matview deltas, so the snapshot information for this > must be available and registered until the matview delta has been > calculated. They can be released once the delta has been > established and before it has been applied to the matview. Here, by "modified versions of the relations," do you mean the delta relations used for recording changes? Could you elaborate a bit on the snapshot information: is this snapshot for the delta relation, and when will the snapshot information be acquired in order to update matviews? > Initial > milestones in this development will focus on "eager" maintenance, > so this will not initially be a big issue, but will become more > important as we work on making more of the work asynchronous, so > that the foreground query is not held up maintaining data which is > allowed to be a little stale. This seems not entirely unrelated to > background worker processes and snapshot sharing. With Regards, Amit Kapila.
On Thursday, May 16, 2013 7:02 PM Kevin Grittner wrote: Josh Berkus <josh@agliodbs.com> wrote: > Let's say there is a table and matview like this: > create table foo (fooid int primary key, val int not null); > create materialized view bar as select distinct val from foo; > Let's say there are millions of rows in both, and that we have > flagged the view for incremental maintenance. (Syntax omitted to > avoid distracting bikeshedding on that when the point is the > algorithm.) > Now, someone runs this: > update foo set val = val + 1 where fooid between 1 and 10; > What will happen is this: > Since foo will be flagged as a relation which is referenced by an > incrementally maintained matview, a delta relation will be > materialized for this update, which will contain the net change to > the underlying table in the count_t system column. How and when will it clear old rows of the delta relation, especially when more than one matview is defined on the table? And how will it track where to start from in this delta relation the second time it refreshes the matview? With Regards, Amit Kapila.
Amit Kapila <amit.kapila@huawei.com> wrote: > On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote: > > Good explanation for understanding the initial concept of > incremental update of matviews. Thanks. This is one of those topics where it takes a lot of time going over highly technical papers to really get your head around it. I'm hoping some simple examples will give a decent intuitive grasp for those who want that but don't want to invest the time to cover all the details. At some point a README file will be needed. I will probably start with a Wiki that can evolve during development and be used to help create the README file near the end of the release cycle. Dan and I did that for SSI and it seemed to work reasonably well. >> The original and modified versions of the relations (tables or >> other matviews) which define a matview must be available to >> calculate the matview deltas, so the snapshot information for >> this must be available and registered until the matview delta >> has been calculated. They can be released once the delta has >> been established and before it has been applied to the matview. > > Here by modified versions of the relations, do you mean to say > delta relations for recording changes. During calculation of the deltas to apply to the matviews, it must be possible to query the referenced tables from the perspective of both the "before" and "after" versions of the data. > Could you elaborate a bit about snapshot information, is this > snapshot is for delta relation I don't think we need to use snapshots to read the deltas -- those contain the information about the transition from one state to another, rather than any persistent state which would be visible to the user. I see them more like tuple stores created as interim steps in query execution. > when will it acquire snapshot information to Update matviews? 
In early milestones the work will be done in the context of completing a statement or a transaction, and the MVCC data used to update matviews will be from that context -- much like for changes made within a trigger. Once we move on to asynchronous matview maintenance it seems pretty clear that we need to have queues and background worker processes. If *deltas for the matview changes* are derived in the originating transaction we might want one queue of deltas for each target matview. I don't think the process applying the deltas needs to do anything unusual or special about acquiring snapshots as it processes the delta from each transaction, unless we see some way to optimize things once we're hip-deep in the details. We could off-load the delta calculations for the matviews from the originating transaction, but that would require a separate set of queues and background workers, and it would require that snapshots from the transaction remain registered so that they are valid for use by the delta calculations. It's not clear whether the benefit of that would justify the extra complexity. That would be, essentially, another form of parallelizing the work of a connection; so I'm leaving that as *possible* work later on, which is only likely to be worthwhile if there is enough infrastructure from other work on parallelization to reduce the scope here. Let me know if that doesn't address what you wanted to know. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
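The "one queue of deltas for each target matview" idea can be sketched as follows. This is purely illustrative Python under assumed semantics (names like MatviewQueue are invented, not proposed PostgreSQL APIs): originating transactions enqueue their computed view deltas at commit, and a background worker later drains the queue in commit order, so the matview lags by a bounded amount of staleness.

```python
from collections import deque

class MatviewQueue:
    """Hypothetical per-matview queue of committed view deltas."""

    def __init__(self):
        self.deltas = deque()

    def enqueue(self, delta):
        # Called at commit time by the originating transaction;
        # delta maps a row to its net count_t change.
        self.deltas.append(delta)

    def apply_all(self, matview):
        # Called by a background worker: apply each transaction's
        # delta in commit order, deleting rows whose count_t nets
        # to zero, exactly as in the synchronous ("eager") case.
        while self.deltas:
            delta = self.deltas.popleft()
            for row, net in delta.items():
                new = matview.get(row, 0) + net
                if new == 0:
                    matview.pop(row, None)
                else:
                    matview[row] = new
        return matview
```

The design point this illustrates is that the apply step is the same whether maintenance is eager or deferred; only the timing (and hence the staleness the foreground query tolerates) changes.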
Amit Kapila <amit.kapila@huawei.com> wrote: > On Thursday, May 16, 2013 7:02 PM Kevin Grittner wrote: >> Let's say there is a table and matview like this: > >> create table foo (fooid int primary key, val int not null); >> create materialized view bar as select distinct val from foo; > >> Let's say there are millions of rows in both, and that we have >> flagged the view for incremental maintenance. (Syntax omitted >> to avoid distracting bikeshedding on that when the point is the >> algorithm.) > >> Now, someone runs this: > >> update foo set val = val + 1 where fooid between 1 and 10; > >> What will happen is this: > >> Since foo will be flagged as a relation which is referenced by >> an incrementally maintained matview, a delta relation will be >> materialized for this update, which will contain the net change >> to the underlying table in the count_t system column. > > How and when will it clear old rows of delta relation especially > when there are more than one matview is defined on table There will not be a permanent delta relation -- that is something that will exist in a tuple store or other transient structure for the duration it is needed. Once the incremental maintenance code iterates through all affected matviews, the delta can be destroyed. > and how will it maintain from where to start second time refresh > from this delta relation to matview. I'm not clear on what you're asking here. If it's about how incremental maintenance will be coordinated with any explicit REFRESH, it seems to me that at the point a REFRESH starts, any changes which are from commits too late to be included in the snapshot used for the REFRESH would need to block or be queued until completion of the REFRESH (depending on whether the view is using synchronous or asynchronous maintenance), and any unapplied matview deltas from before that could be discarded. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:

> During calculation of the deltas to apply to the matviews, it must
> be possible to query the referenced tables from the perspective of
> both the "before" and "after" versions of the data.

[..]

> I don't think the process applying the deltas needs to do
> anything unusual or special about acquiring snapshots as it
> processes the delta from each transaction, unless we see some way
> to optimize things once we're hip-deep in the details.

Note that the basic count algorithm assumes real-serializable transactions for correctness. Example:

* The example matview’s defining query is a simple join: SELECT * FROM A JOIN B ON A.a = B.b.
* For simplicity, assume that the matview is initially empty.
* Assume a non real-serializable isolation mode, e.g., REPEATABLE READ.
* Assume that there are two transactions T1 and T2 running in parallel, that don’t see each other’s changes.
* Both T1 and T2 insert a row in A and B, respectively, in such a way that the combination of the rows is matched by the join condition (i.e., the row inserted by T1 has A.a equal to the B.b of the row inserted by T2).
* The count algorithm will not add anything to the matview as part of the execution of T1 (because the new row in A doesn’t join with any rows in B that T1 observes).
* Idem for T2 (nothing will be added).
* After T1 and T2 completed, a new transaction will see an inconsistency: A and B contain rows that are supposed to result in a row in the matview, but it doesn’t contain any.

Each of the following would fix this problem:

(1) Real-serializability must be required for all transactions that update any base table of a matview’s that is maintained using the count algorithm.

(2) The matview’s defining query and the corresponding base tables must adhere to certain (very commonly satisfied) conditions (for example, if there is an FK between A.a and B.b, the example cannot be made to fail; The FK must not be deferred in the case of right-after-each-statement matview maintenance).

(3) The count algorithm must be implemented in a way that understands MVCC internals: Reading the base tables must be done using a technique that reads all rows (i.e., also the ones not visible to the current transaction), and the xmin/xmax/etc of the updated rows in the matview must be set in a smart way, so as to yield visibility results that correspond to the visibility of the “originating” rows in the base tables. Calculation of the matview deltas (especially the part where the base tables are inspected for joining rows) must in this case probably be done in a serialized manner.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
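To make the race concrete, here is a toy Python model of it (purely illustrative, not PostgreSQL code): each transaction computes its counting-algorithm delta against its own snapshot, as a REPEATABLE READ transaction would, and neither sees the other's insert.

```python
from collections import Counter

# Matview: SELECT * FROM A JOIN B ON A.a = B.b, modeled as a multiset
# of matching values; A and B are modeled as sets of values.

def delta_for_insert_into_A(value, snapshot_B):
    """Counting-algorithm delta for inserting `value` into A, computed
    against the inserting transaction's own snapshot of B."""
    return Counter({value: 1}) if value in snapshot_B else Counter()

def delta_for_insert_into_B(value, snapshot_A):
    return Counter({value: 1}) if value in snapshot_A else Counter()

A, B = set(), set()
matview = Counter()

# T1 and T2 start concurrently; each snapshots the still-empty tables.
snap_A, snap_B = set(A), set(B)

A.add(42)                                        # T1 inserts into A
matview += delta_for_insert_into_A(42, snap_B)   # T1 sees no match in B
B.add(42)                                        # T2 inserts into B
matview += delta_for_insert_into_B(42, snap_A)   # T2 sees no match in A

truth = Counter(a for a in A for b in B if a == b)
print(dict(matview), dict(truth))  # {} {42: 1} -- the matview is stale
```

Under real serializability one of the two transactions would be forced to abort or to run after the other, so its delta computation would see the other's row and the matview would stay consistent.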
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:

> 2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
>
>> During calculation of the deltas to apply to the matviews, it must
>> be possible to query the referenced tables from the perspective of
>> both the "before" and "after" versions of the data.
>
> [..]
>
>> I don't think the process applying the deltas needs to do
>> anything unusual or special about acquiring snapshots as it
>> processes the delta from each transaction, unless we see some way
>> to optimize things once we're hip-deep in the details.
>
> Note that the basic count algorithm assumes real-serializable
> transactions for correctness. Example:
>
> * The example matview’s defining query is a simple join: SELECT *
>   FROM A JOIN B ON A.a = B.b.
> * For simplicity, assume that the matview is initially empty.
> * Assume a non real-serializable isolation mode, e.g., REPEATABLE READ.
> * Assume that there are two transactions T1 and T2 running in
>   parallel, that don’t see each other’s changes.
> * Both T1 and T2 insert a row in A and B, respectively, in such a way
>   that the combination of the rows is matched by the join condition
>   (i.e., the row inserted by T1 has A.a equal to the B.b of the row
>   inserted by T2).
> * The count algorithm will not add anything to the matview as part of
>   the execution of T1 (because the new row in A doesn’t join with any
>   rows in B that T1 observes).
> * Idem for T2 (nothing will be added).
> * After T1 and T2 completed, a new transaction will see an
>   inconsistency: A and B contain rows that are supposed to result in a
>   row in the matview, but it doesn’t contain any.

Good point.

It might be hard to detect when this type of race condition exists, since it could be hidden inside a function. Any thoughts on that?

> Each of the following would fix this problem:
>
> (1) Real-serializability must be required for all transactions that
> update any base table of a matview’s that is maintained using the
> count algorithm.

Agreed.

I wonder whether we should have an option to constrain DML on a table to serializable transactions, and an option for a matview to flag that its incremental update depends on that. This strategy would automatically detect problems hidden in functions, whether or not they were explicitly recognized issues with the matview definition. It is clear, however, that this cannot be the only supported approach.

> (2) The matview’s defining query and the corresponding base tables
> must adhere to certain (very commonly satisfied) conditions (for
> example, if there is an FK between A.a and B.b, the example cannot be
> made to fail; The FK must not be deferred in the case of
> right-after-each-statement matview maintenance).

Agreed. That would cover a high percentage of cases regardless of transaction isolation level. Knowing when you needed such a constraint to make incremental maintenance safe would be hard, though.

> (3) The count algorithm must be implemented in a way that understands
> MVCC internals: Reading the base tables must be done using a technique
> that reads all rows (i.e., also the ones not visible to the current
> transaction), and the xmin/xmax/etc of the updated rows in the matview
> must be set in a smart way, so as to yield visibility results that
> correspond to the visibility of the “originating” rows in the base
> tables. Calculation of the matview deltas (especially the part where
> the base tables are inspected for joining rows) must in this case
> probably be done in a serialized manner.

I will look at this option some more, but on the face of it, I'm not quite following how it would work; and it sounds invasive and fragile. If you know of any paper on this approach you could point me to, or if you could sketch it out in a little more detail, I would appreciate that.

While (1) and (2) are straightforward approaches, either of which would clearly address the race conditions, I fear that there are those who would not be happy with requiring one or the other of them in order to use incremental maintenance -- so it would be nice to have a fall-back strategy.

There will, at least for the next several releases, be matviews defined with queries for which we cannot support incremental maintenance -- REFRESH will continue to be the only way to deal with these until we enhance incremental maintenance to support them. For example, in 9.4 I don't intend to support incremental maintenance of matviews defined with recursive queries. To me it makes sense to prohibit incremental maintenance of a matview for which there is no strategy to deal with concurrency problems.

Thoughts?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 17, 2013 at 4:25 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> (3) The count algorithm must be implemented in a way that understands
>> MVCC internals: Reading the base tables must be done using a technique
>> that reads all rows (i.e., also the ones not visible to the current
>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>> must be set in a smart way, so as to yield visibility results that
>> correspond to the visibility of the “originating” rows in the base
>> tables. Calculation of the matview deltas (especially the part where
>> the base tables are inspected for joining rows) must in this case
>> probably be done in a serialized manner.
>
> I will look at this option some more, but on the face of it, I'm
> not quite following how it would work; and it sounds invasive and
> fragile.

I don't believe it would be that problematic if deltas:

1 - contain all changes, matching join conditions or not, that could potentially alter the matview. This includes the example's alterations, since the columns being altered were part of the matview's definition, but it would not cover updates to columns that were not part of the definition (i.e., not involved in the join or the selected rows).

2 - have each update marked with its xid.

Then, "serialization" can be achieved by only applying committed xids, discarding rolled-back xids, and evaluating join satisfaction only during the updates, and not during delta logging.
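A toy Python model of this delta-log idea (illustrative only; all names are invented here): logging records every potentially relevant change with its xid and performs no join evaluation, while join satisfaction is checked only when the committed entries are replayed in order.

```python
from collections import Counter

# (xid, table, value) entries for *all* potentially relevant changes,
# in commit order for the committed xids.
log = []

def log_insert(xid, table, value):
    # Delta logging: no join check here, just record the change.
    log.append((xid, table, value))

def apply_committed(committed_xids):
    """Replay committed entries in order, rebuilding the committed base
    tables and maintaining the matview (a multiset of join values for
    SELECT * FROM A JOIN B ON A.a = B.b)."""
    matview = Counter()
    A, B = set(), set()
    for xid, table, value in log:
        if xid not in committed_xids:
            continue            # rolled back: discard
        if table == "A":
            A.add(value)
            if value in B:      # join evaluated at apply time
                matview[value] += 1
        else:
            B.add(value)
            if value in A:
                matview[value] += 1
    return matview

log_insert(1, "A", 42)   # T1's insert, logged without a join check
log_insert(2, "B", 42)   # T2's concurrent insert
log_insert(3, "A", 7)    # a transaction that later rolls back
print(dict(apply_committed({1, 2})))  # {42: 1} -- the race is resolved
```

Because the join is evaluated against the replayed committed state rather than against each transaction's snapshot, the T1/T2 race from the earlier example no longer loses the matching row.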
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:

> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>
>> Note that the basic count algorithm assumes real-serializable
>> transactions for correctness. Example:

[..]

> Good point.
>
> It might be hard to detect when this type of race condition exists,
> since it could be hidden inside a function. Any thoughts on that?

I think that matviews with defining queries that use functions for which the system cannot prove whether they are safe for the given incremental maintenance technique (e.g., count algorithm) and situation (e.g., mandatory isolation level) should not be allowed to use incremental maintenance of that type.

Although I have only limited practical matview experience (with SQL Server (2005?) only, where it is called “indexed views”): I assume that in most other DBMSs, the queries that can be used for defining incrementally maintained matviews are very limited (they are in SQL Server).

>> Each of the following would fix this problem:
>>
>> (2) The matview’s defining query and the corresponding base tables
>> must adhere to certain (very commonly satisfied) conditions (for
>> example, if there is an FK between A.a and B.b, the example cannot be
>> made to fail; The FK must not be deferred in the case of
>> right-after-each-statement matview maintenance).
>
> Agreed. That would cover a high percentage of cases regardless of
> transaction isolation level. Knowing when you needed such a
> constraint to make incremental maintenance safe would be hard,
> though.

Reasoning about it becomes easier when you have a clean definition of which queries you accept, and which ones you don’t (see below). Then you can look at your definition and say “I allow X because it is always correct; I don’t allow Y because I didn’t prove it to be always correct (yet).”

>> (3) The count algorithm must be implemented in a way that understands
>> MVCC internals: Reading the base tables must be done using a technique
>> that reads all rows (i.e., also the ones not visible to the current
>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>> must be set in a smart way, so as to yield visibility results that
>> correspond to the visibility of the “originating” rows in the base
>> tables. Calculation of the matview deltas (especially the part where
>> the base tables are inspected for joining rows) must in this case
>> probably be done in a serialized manner.
>
> I will look at this option some more, but on the face of it, I'm
> not quite following how it would work; and it sounds invasive and
> fragile. If you know of any paper on this approach you could point
> me to, or if you could sketch it out in a little more detail, I
> would appreciate that.

Sorry, these are my own musings: I have never found any paper about it.

After a bit more thought, I think it wouldn’t be so easy for the count algorithm: It would be necessary to express (using physical MVCC rows) something like “you can only see this (logical) row if both transaction T1 and T2 are visible to you.” Disjunction would be easy: create two physical rows with the same data, one that is visible if T1 is visible, and one that is visible if T2 is visible. Unfortunately, we need conjunction here :-(.

(Upgrading the expressiveness of the MVCC-related information in the rows would help, but as that is more invasive than anyone wants to think about, I restrain those thoughts to the confines of my own head ;-).)

OTOH, I think that this technique could be used in the case of simple “aggregation” matviews (without joins, or *maybe* when all joins are constrained by FKs). (I assume an implementation such as the one that I sketched in a previous post[1] a few months back.)

[1] <URL:http://www.postgresql.org/message-id/CAP-rdTY+VZ8T_8JBvsksDRdz-_wr5ef80M-_UKgu5bHw+rkjpg@mail.gmail.com>

> There will, at least for the next several releases, be matviews
> defined with queries for which we cannot support incremental
> maintenance -- REFRESH will continue to be the only way to deal
> with these until we enhance incremental maintenance to support them.
> For example, in 9.4 I don't intend to support incremental
> maintenance of matviews defined with recursive queries. To me it
> makes sense to prohibit incremental maintenance of a matview for
> which there is no strategy to deal with concurrency problems.

+1

I think you need a strict definition, specific to your incremental maintenance algorithm, of which queries you accept and which ones you don’t. Such a definition could be based on SQL syntax, or on any of the intermediate forms that are produced by the parser/rewriter/analyzer/planner/etc. I think that basing your definition on a later processing stage (e.g., after subqueries are flattened, after views are expanded, or after “redundant” predicates are removed) would increase the number of queries that are allowed, but might make it somewhat more difficult for a user to predict whether a query allows incremental maintenance or not. (It might also make your definition depend on “internal stuff” that might change from one release to the next.)

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:

> 2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>
>>> Note that the basic count algorithm assumes real-serializable
>>> transactions for correctness. Example:
>
> [..]
>
>> Good point.
>>
>> It might be hard to detect when this type of race condition exists,
>> since it could be hidden inside a function. Any thoughts on that?
>
> I think that matviews with defining queries that use functions for
> which the system cannot prove whether they are safe for the given
> incremental maintenance technique (e.g., count algorithm) and
> situation (e.g., mandatory isolation level) should not be allowed to
> use incremental maintenance of that type.

It's clear enough that any IMMUTABLE function should be safe, but for anything less I don't see how we could trust it short of having a "white list" of which functions are known to be safe, or creating a new property for functions. Not something I want to require up front; maybe later....

> Although I have only limited practical matview experience (with SQL
> Server (2005?) only, where it is called “indexed views”): I assume
> that in most other DBMSs, the queries that can be used for defining
> incrementally maintained matviews are very limited (they are in SQL
> Server).

In researching this feature I have found a few comments by Oracle users that when incremental maintenance was first added (in 1999) the queries which it could support were severely limited, but that they have expanded coverage with each major release. People don't seem to have too many complaints about what is covered now; although how much of that is adapting to the limits and how much is due to expansion of the limits is not clear. You would think that somewhere they would define what the limits for a given version are, but I have not stumbled across anything like that.

>>> Each of the following would fix this problem:
>>>
>>> (2) The matview’s defining query and the corresponding base tables
>>> must adhere to certain (very commonly satisfied) conditions (for
>>> example, if there is an FK between A.a and B.b, the example cannot be
>>> made to fail; The FK must not be deferred in the case of
>>> right-after-each-statement matview maintenance).
>>
>> Agreed. That would cover a high percentage of cases regardless of
>> transaction isolation level. Knowing when you needed such a
>> constraint to make incremental maintenance safe would be hard,
>> though.
>
> Reasoning about it becomes easier when you have a clean definition of
> which queries you accept, and which ones you don’t (see below). Then
> you can look at your definition and say “I allow X because it is
> always correct; I don’t allow Y because I didn’t prove it to be
> always correct (yet).”

Agreed. I just haven't figured out precisely how concurrency issues adjust those boundaries yet.

>>> (3) The count algorithm must be implemented in a way that understands
>>> MVCC internals: Reading the base tables must be done using a technique
>>> that reads all rows (i.e., also the ones not visible to the current
>>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>>> must be set in a smart way, so as to yield visibility results that
>>> correspond to the visibility of the “originating” rows in the base
>>> tables. Calculation of the matview deltas (especially the part where
>>> the base tables are inspected for joining rows) must in this case
>>> probably be done in a serialized manner.
>>
>> I will look at this option some more, but on the face of it, I'm
>> not quite following how it would work; and it sounds invasive and
>> fragile. If you know of any paper on this approach you could point
>> me to, or if you could sketch it out in a little more detail, I
>> would appreciate that.
>
> Sorry, these are my own musings: I have never found any paper
> about it.
>
> [musings]

Will review and ponder these and what Claudio offered, and see where that takes me.

Thanks to both of you for the input!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin,

> The count_t column saves us from having to scan foo for all the old
> val values. It does not require any scan of the entire bar
> matview. It allows us to zero in on exactly the right rows, and
> lets us know what needs doing.

This sounds like a fairly good approach. It would require a couple of things though:

1) admins would need to be able to enable and disable incremental updating of matviews, so that if the creation of delta tables is bogging down writes, they can disable them temporarily as a performance workaround.

2) if an admin enables incremental updating of an existing matview, it would need to be REFRESHed immediately before we could start incrementally updating it. So an ENABLE should call a REFRESH.

> Hopefully this makes it fairly clear that the only thing that an
> optimization around the "append-only" assertion for a matview would
> be the ability to skip the probe for an existing record *related to
> rows which are in the delta*. As long as there is reasonable
> indexing on the matview, maintenance for the append-only case would
> not involve scanning the entire matview.

Yeah, given what you've explained, my first inclination would be just to let the counting do its thing and see how slow/expensive it is before we try further optimizations.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:

> This sounds like a fairly good approach. It would require a
> couple of things though:
>
> 1) admins would need to be able to enable and disable incremental
> updating of matviews, so that if the creation of delta tables is
> bogging down writes, they can disable them temporarily as a
> performance workaround.

Yes. This is the sort of thing I plan to put on ALTER MATERIALIZED VIEW, and perhaps an ALTER TABLE option which allows the table to generate deltas. Turning off the table option should probably have a CASCADE option with the sort of user feedback DROP options have.

Some other products have AMV options that do things like allow an attempt to scan a matview automatically cause a REFRESH if referenced tables have changed since the last REFRESH. That seems like something to defer until we have a more sophisticated notion of freshness (or staleness -- if we prefer to view the glass as half-empty).

> 2) if an admin enables incremental updating of an existing
> matview, it would need to be REFRESHed immediately before we
> could start incrementally updating it. So an ENABLE should call
> a REFRESH.

Right. We would need to coordinate the start of the incremental maintenance with the snapshot used for the REFRESH. As far as I can tell, it could be either type of REFRESH (truncate and create a new heap, or create the new data contents in a temporary relation and use a differential DELETE and INSERT technique), as long as the snapshot used for determining the new contents is used to determine which transactions can provide deltas.

> Yeah, given what you've explained, my first inclination would be
> just to let the counting do its thing and see how slow/expensive
> it is before we try further optimizations.

Agreed. The counting algorithm itself has some optional optimizations that I'm not sure we'll get to in 9.4. Anything this specialized should be evaluated only after we have all the more "generic" optimizations in place.

Thanks for the feedback!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
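The discard-or-keep behavior when coordinating queued deltas with a REFRESH can be modeled in a few lines of Python (a toy sketch, not actual code; a commit sequence number stands in for the REFRESH snapshot's visibility cutoff): deltas from commits at or before the cutoff are already reflected in the new contents and can be dropped, while later ones must still be applied.

```python
from collections import deque

# Queued deltas as (commit_seq, delta) pairs, in commit order.
pending = deque()

def enqueue(commit_seq, delta):
    pending.append((commit_seq, delta))

def refresh(snapshot_commit_seq):
    """Discard deltas already covered by the REFRESH snapshot; what
    remains must still be applied after the REFRESH completes."""
    while pending and pending[0][0] <= snapshot_commit_seq:
        pending.popleft()
    return list(pending)

enqueue(10, "delta-a")
enqueue(11, "delta-b")
enqueue(15, "delta-c")
print(refresh(11))  # [(15, 'delta-c')]
```

Commits too late for the snapshot (here, sequence 15) stay queued, matching the "block or be queued until completion of the REFRESH" behavior described earlier in the thread.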
2013/5/17 Claudio Freire <klaussfreire@gmail.com>:

> On Fri, May 17, 2013 at 4:25 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
>>> (3) The count algorithm must be implemented in a way that understands
>>> MVCC internals: Reading the base tables must be done using a technique
>>> that reads all rows (i.e., also the ones not visible to the current
>>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>>> must be set in a smart way, so as to yield visibility results that
>>> correspond to the visibility of the “originating” rows in the base
>>> tables. Calculation of the matview deltas (especially the part where
>>> the base tables are inspected for joining rows) must in this case
>>> probably be done in a serialized manner.
>>
>> I will look at this option some more, but on the face of it, I'm
>> not quite following how it would work; and it sounds invasive and
>> fragile.
>
> I don't believe it would be that problematic if deltas:
>
> 1 - contain all changes, matching join conditions or not, that could
> potentially alter the matview. This includes the example's alterations,
> since the columns being altered were part of the matview's definition,
> but it would not cover updates to columns that were not part of the
> definition (i.e., not involved in the join or the selected rows).
>
> 2 - have each update marked with its xid.
>
> Then, "serialization" can be achieved by only applying committed xids,
> discarding rolled-back xids, and evaluating join satisfaction only
> during the updates, and not during delta logging.

I think that your idea of postponing some of the processing to later commits might indeed make it possible to implement this without needing the “MVCC-expressiveness enhancement” that I mentioned in my previous post (by postponing the application to the matview until at least *someone* must be able to see that particular change).

Right now I have some difficulties working out the MVCC details of what to do when joining a base table delta row with the contents of the other base tables (to create the matview delta), but I’m probably just tired :-).

In any case, your proposal seems to require the following additional “infrastructural changes” relative to Kevin’s proposal:

(1) The base table deltas must be shared by all “in flight” transactions that are in the process of putting stuff in them and using them to apply changes to the matviews.

(2) Putting stuff in base table deltas must be transactionally safe (unless all matviews that care about that specific base table delta are unlogged).

(3) “Right-after-each-statement matview maintenance” is not possible (without additional effort).

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?