Thread: counting algorithm for incremental matview maintenance
<div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt">In surveyingthe literature on $subject, I find that most of the<br />theoretical work related to how to incrementally update<br/>materialized views based on the matview declaration was published<br />between 1988 and 1993. The best paperI have been able to find on<br />the topic was published in ACM SIGMOD in 1993[1], and covers two<br />algorithms: countingand DRed. The former should be very fast for<br />non-recursive views, but not very efficient for recursive views.<br/>The latter algorithm is the other way around -- it looks like it<br />will do well with recursive views but generallybe slower for<br />non-recursive views.<br /><br />It does not seem feasible to me to implement both techniquesin a<br />single one-year PostgreSQL release. In fact, if we have trouble<br />getting everyone onto the samepage early, we might have to settle<br />for trying to just get some infrastructure into place, without<br />anythingto actually make use of it. That would be unfortunate,<br />since Oracle added incremental maintenance of matviewsto their<br />existing feature in 1999, and have been improving it regularly<br />since then. Many other productsalso have mature implementations<br />of this, and there seems to be a lot of demand for it in<br />PostgreSQL. In the best of circumstances it will take years for us<br />to catch up on this front.<br /><br />What I proposefor 9.4 is this:<br /><br />During the first CF I would like to have a discussion to settle on<br />how best to dosome things that the counting algorithm requires, so<br />that following the first CF I can try to implement that<br />infrastructureand get at least simple matviews working for<br />incremental maintenance, and have something ready for thesecond<br />CF. 
I can hack up a rough patch to suggest a possible approach for<br />purposes of discussion, but I'm not sure whether that's either<br />necessary or desirable.<br /><br />----------<br />Needed infrastructure changes follow.<br />----------<br /><br />First, the algorithm requires a new system column called<br />count_t (or similar). It would be optional (like oid is) and would<br />only be used by tuples in matviews maintained by the counting<br />algorithm, and by delta relations used in the matview incremental<br />maintenance process. Tables and matviews not created or altered to<br />specify incremental maintenance would not carry this new system<br />column. Existing matviews from 9.3 would continue to work as<br />before until and unless they were ALTERed to specify incremental<br />maintenance; this form of ALTER would require a rewrite of the<br />matview to add the count_t column.<br /><br />This algorithm also requires that we have new execution node types<br />(or conditional behavior in existing nodes) to support processing<br />counts. It is worth noting that relations without a count_t column<br />will sometimes need to be processed using the new logic (in which<br />case each tuple is treated as having an implied count of one).<br /><br />count_t in a matview must always be greater than zero. 
In delta<br />relations it can be negative (to reflect tuples being deleted or<br />the "before" image of an update) or positive (to reflect tuples<br />being inserted or the "after" image of an update).<br /><br />Examples of the new behavior in nodes used by incremental<br />maintenance are:<br /><br /> * DISTINCT or GROUP BY: count_t in the result should represent the<br />number of tuples being combined.<br /><br /> * JOIN: count_t in the result is the product of multiplying the<br />counts from the pair of joined tuples.<br /><br /> * UNION (without ALL): count_t in the result should be the value<br />from an unmatched tuple or the sum of the counts of matched tuples,<br />with any result row with a zero count_t being omitted from the<br />result.<br /><br />Note that for nodes other than UNION, the logic is identical except<br />for calculation of the count for the output.<br /><br />The resulting matviews can be used in the traditional set logic of<br />PostgreSQL, without knowing about or caring about the count_t<br />column, exactly as before; the new system column is only used for<br />maintenance.<br /><br />We could drive the triggering of incremental maintenance off of the<br />dependency information which is already stored, but for performance<br />we probably want to add a new pg_class flag to indicate that the<br />relation is referenced by a matview definition which specifies<br />incremental update. That would allow a fast path for skipping<br />other tests for DML on non-referenced relations, at the expense of<br />some additional catalog updates on some DDL.<br /><br />----------<br />The above is everything that is a significant infrastructure change<br />needed for a first cut at incremental maintenance of materialized<br />views, driven by the view definition. 
Below is additional detail<br />about what will happen when incremental maintenance for a view is<br />specified, for those who wonder why these infrastructure changes<br />are needed.<br />----------<br /><br />Essentially, the technique is based on relational algebra. This<br />allows previously existing theory to prove the correctness of the<br />technique and its optimizations. The addition of the count_t<br />column allows accurate identification of exactly which rows need to<br />be touched during incremental maintenance, with minimal overhead.<br /><br />As the paper notes, these incremental maintenance techniques are a<br />heuristic; there will always be cases where it is faster to<br />regenerate the data from scratch than to use these incremental<br />techniques, so part of the goal of having the transactional REFRESH<br />(as described in a separate thread) is to allow some front-end<br />tests to attempt to recognize when an incremental approach will be<br />slower than a REFRESH and fall back onto the transactional REFRESH<br />technique seamlessly and fairly quietly. (My gut feeling on the<br />right level for an ereport of this is DEBUG1, although I can see<br />the argument for a LOG level entry.)<br /><br />The original and modified versions of the relations (tables or<br />other matviews) which define a matview must be available to<br />calculate the matview deltas, so the snapshot information for this<br />must be available and registered until the matview delta has been<br />calculated. They can be released once the delta has been<br />established and before it has been applied to the matview. 
Initial<br />milestones in this development will focus on "eager" maintenance,<br />so this will not initially be a big issue, but will become more<br />important as we work on making more of the work asynchronous, so<br />that the foreground query is not held up maintaining data which is<br />allowed to be a little stale. This seems not entirely unrelated to<br />background worker processes and snapshot sharing.<br /><br />I don't know whether we can get to this during 9.4 development, but<br />it might be worth at least considering as the other work is done:<br />some aggregate columns with a small amount of data kept while the<br />aggregate is being computed can be further optimized under this<br />technique by storing the working data in the tuple as some sort of<br />hidden or system column related to the aggregate. A few of the<br />obvious candidates for such optimization include count(), sum() and<br />avg(). Note that with this capability, especially if we ever<br />implement the query rewrite capability for matviews, we would have<br />a nice answer for the count(*) speed complaints.<br /><br />Obviously, there will need to be some new syntax around CMV and AMV<br />to support specification of how the matview should be maintained.<br /><br />Also, we are moving into a phase where it makes sense to start the<br />bikeshedding on what timings we should support, how information<br />about the desired timings should be stored, how "freshness" should<br />be tracked, and what syntax we put around all that. 
Frankly, I<br />think I will have my hands pretty full with the incremental<br />maintenance work, and I see the timing work as largely orthogonal,<br />so it would be great if anyone who was interested in that aspect of<br />things could take primary responsibility for moving that along.<br /><br />--<br />Kevin Grittner<br />EnterpriseDB: http://www.enterprisedb.com<br />The Enterprise PostgreSQL Company<br /><br />[1] http://dl.acm.org/citation.cfm?id=170066<br /><br /></div>
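The count-propagation rules Kevin lists for DISTINCT/GROUP BY, JOIN, and UNION can be sketched as a small multiset simulation. This is illustrative Python with invented helper names, not PostgreSQL executor code: relations are modeled as (row, count_t) pairs, and tuples from relations without a count_t column would simply carry an explicit count of one.

```python
def distinct(rel):
    # DISTINCT / GROUP BY: the result's count_t is the number of
    # input tuples being combined -- the sum of their counts --
    # with any net-zero rows dropped.
    out = {}
    for row, n in rel:
        out[row] = out.get(row, 0) + n
    return {row: n for row, n in out.items() if n != 0}

def join(left, right, cond):
    # JOIN: the result's count_t is the product of the counts of
    # the pair of joined tuples.
    out = []
    for lrow, ln in left:
        for rrow, rn in right:
            if cond(lrow, rrow):
                out.append((lrow + rrow, ln * rn))
    return out

def union(a, b):
    # UNION (without ALL): an unmatched tuple keeps its count,
    # matched tuples sum theirs, and rows whose count_t nets to
    # zero are omitted from the result.
    return distinct(list(a) + list(b))
```

Note how UNION over a delta with negative counts is what lets deletions cancel existing matview rows, which is the heart of the maintenance step described later in the thread.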
On Tue, May 14, 2013 at 2:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > In surveying the literature on $subject, I find that most of the > theoretical work related to how to incrementally update > materialized views based on the matview declaration was published > between 1988 and 1993. The best paper I have been able to find on > the topic was published in ACM SIGMOD in 1993[1], and covers two > algorithms: counting and DRed. The former should be very fast for > non-recursive views, but not very efficient for recursive views. > The latter algorithm is the other way around -- it looks like it > will do well with recursive views but generally be slower for > non-recursive views. > > It does not seem feasible to me to implement both techniques in a > single one-year PostgreSQL release. In fact, if we have trouble > getting everyone onto the same page early, we might have to settle > for trying to just get some infrastructure into place, without > anything to actually make use of it. That would be unfortunate, > since Oracle added incremental maintenance of matviews to their > existing feature in 1999, and have been improving it regularly > since then. Many other products also have mature implementations > of this, and there seems to be a lot of demand for it in > PostgreSQL. In the best of circumstances it will take years for us > to catch up on this front. #1 issue I have with current matview functionality is locking. currently refresh takes out an access exclusive lock. so, question is, do you think your proposal will be such that it will no longer require taking out full lock for refresh purposes (either incremental or otherwise)? merlin
Merlin Moncure <mmoncure@gmail.com> wrote: > #1 issue I have with current matview functionality is locking. > currently refresh takes out an access exclusive lock. so, > question is, do you think your proposal will be such that it will > no longer require taking out full lock for refresh purposes > (either incremental or otherwise)? The right thread for *that* question is "Differential (transactional) REFRESH"; however, I might as well say here that I don't think we want to get rid of the (faster) version that just replaces the current heap when we add the (slower) option to REFRESH it transactionally. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, May 15, 2013 at 11:33 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > Merlin Moncure <mmoncure@gmail.com> wrote: > >> #1 issue I have with current matview functionality is locking. >> currently refresh takes out an access exclusive lock. so, >> question is, do you think your proposal will be such that it will >> no longer require taking out full lock for refresh purposes >> (either incremental or otherwise)? > > The right thread for *that* question is "Differential > (transactional) REFRESH"; however, I might as well say here that I > don't think we want to get rid of the (faster) version that just > replaces the current heap when we add the (slower) option to > REFRESH it transactionally. sorry, didn't notice that thread. agreed, that seems good candidate for user input to refresh command to manage the tradeoff. well, do you expect the application of differential refresh to be automatic? lockless + differential refresh would be game changer in terms of how I build up data for analytics. merlin
Merlin Moncure <mmoncure@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> #1 issue I have with current matview functionality is locking. >>> currently refresh takes out an access exclusive lock. so, >>> question is, do you think your proposal will be such that it >>> will no longer require taking out full lock for refresh >>> purposes (either incremental or otherwise)? >> >> The right thread for *that* question is "Differential >> (transactional) REFRESH"; however, I might as well say here that >> I don't think we want to get rid of the (faster) version that >> just replaces the current heap when we add the (slower) option >> to REFRESH it transactionally. > > sorry, didn't notice that thread. agreed, that seems good > candidate for user input to refresh command to manage the > tradeoff. > > well, do you expect the application of differential refresh to be > automatic? I expect considerable bikeshedding on this, but my preference would be to allow syntax for specifying which technique is desired on the REFRESH command, and in the absence of specification I would prefer that it default to the current technique for a matview which is not yet populated and the transactional technique for a populated matview. We could associate some property with the matview for default REFRESH technique, but I don't know whether that's worth the trouble. > lockless + differential refresh would be game changer in terms of > how I build up data for analytics. Yeah, it's definitely something I would have liked to have in the initial release, but I tried to keep the scope very limited given how little time there was left in the release cycle when I got a chance to start on it. Adding this seemed to be just a little too much. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin, It's fairly common for matviews to be constructed such that updates to them are strictly appends. For example, a matview which has a daily summary would just get appended to each day, and existing rows would not change barring a major historical database cleanup. It seems like we could ... and ought to ... optimize for this pattern somehow for incremental updates. That is, if the user knows that we're going to be only appending new rows and not modifying any old ones, s/he ought to be able to tell the database that somehow and avoid the overhead of checking. While the overhead of checking a count wouldn't be that high for a few hundred rows, I've dealt with matviews which were thousands to millions of rows. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt">Josh Berkus <josh@agliodbs.com> wrote:<br /><br />> It's fairly common for matviews to be constructed such that<br />> updates to them are strictly appends. For example, a matview<br />> which has a daily summary would just get appended to each day,<br />> and existing rows would not change barring a major historical<br />> database cleanup.<br />><br />> It seems like we could ... and ought to ... optimize for this<br />> pattern somehow for incremental updates. That is, if the user<br />> knows that we're going to be only appending new rows and not<br />> modifying any old ones, s/he ought to be able to tell the<br />> database that somehow and avoid the overhead of checking. While<br />> the overhead of checking a count wouldn't be that high for a few<br />> hundred rows, I've dealt with matviews which were thousands to<br />> millions of rows.<br /><br />Thanks for the suggestion; I will keep an eye out for ways this<br />insight might allow an optimization. I think there might be some<br />misunderstanding of the counting algorithm, though -- there is no<br />need to do a sequential pass through the matview examining the<br />counts.<br /><br />I don't want to replicate the content of a fairly dense (in the<br />sense of having a lot of content per page) 10 page computer science<br />paper in this email, but for purposes of illustration I will take a<br />very simple case and show how it works. 
(This is not geared toward<br />your particular case, because that could get kinda long to explain<br />here and now, but hopefully this will give an insight into the<br />technique overall.)<br /><br />Let's say there is a table and matview like this:<br /><br />create table foo (fooid int primary key, val int not null);<br />create materialized view bar as select distinct val from foo;<br /><br />Let's say there are millions of rows in both, and that we have<br />flagged the view for incremental maintenance. (Syntax omitted to<br />avoid distracting bikeshedding on that when the point is the<br />algorithm.)<br /><br />Now, someone runs this:<br /><br />update foo set val = val + 1 where fooid between 1 and 10;<br /><br />What will happen is this:<br /><br />Since foo will be flagged as a relation which is referenced by an<br />incrementally maintained matview, a delta relation will be<br />materialized for this update, which will contain the net change to<br />the underlying table in the count_t system column. "Before" tuples<br />will have a count of -1; "after" tuples will have a count of 1.<br />Then the query defining the view will be run *against the delta*,<br />resulting in a relation with a count_t column reflecting the net<br />change for each val. Anything with a zero for the net change will<br />be dropped. We will run a logical UNION of this relation and the<br />bar matview. In this case, we obviously want this to be done in a<br />way that for each row in this "net change" relation, we do an index<br />scan against the bar matview; if not found, we insert the new row<br />into the matview with its count from the net change relation.<br />(That had better be positive or we have a bug -- so elog ERROR if<br />negative.) If bar does contain a matching row, update count_t in<br />that row with the sum of its old value and the value from the "net<br />change" relation. 
Of course, that new value also had better be<br />positive or zero -- if zero we delete the old row rather than<br />updating count_t.<br /><br />The count_t column saves us from having to scan foo for all the old<br />val values. It does not require any scan of the entire bar<br />matview. It allows us to zero in on exactly the right rows, and<br />lets us know what needs doing.<br /><br />Clearly we want the planner to find the best plans for the interim<br />steps rather than hard-coding particular rule-based plans; I just<br />used an example where it's pretty clear what a reasonable plan<br />would be.<br /><br />Hopefully this makes it fairly clear that the only thing an<br />optimization around the "append-only" assertion for a matview would<br />buy is the ability to skip the probe for an existing record *related<br />to rows which are in the delta*. As long as there is reasonable<br />indexing on the matview, maintenance for the append-only case would<br />not involve scanning the entire matview.<br /><br />-- <br />Kevin Grittner<br />EnterpriseDB: http://www.enterprisedb.com<br />The Enterprise PostgreSQL Company<br /><br /></div>
I wrote: > Let's say there is a table and matview like this: > > create table foo (fooid int primary key, val int not null); > create materialized view bar as select distinct val from foo; Some of the subsequent text doesn't make sense unless that materialized view has an index, like this: create unique index bar_val on bar (val); Without such an index, it would need to use a plan which scanned the entire bar relation for any maintenance. Apologies for the omission and any confusion it caused. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
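The foo/bar walk-through above can be put in executable form. This is a rough Python sketch with hypothetical names, not PostgreSQL code; it assumes, purely for concreteness, that fooid i initially has val = i, so the example UPDATE's net effect is to remove the last supporting row for val 1 and introduce val 11. The per-row probe stands in for the index scan against the unique index on bar(val).

```python
def view_delta(changed_rows):
    # changed_rows: (val, count) pairs from the delta relation on
    # foo -- "before" images carry -1, "after" images carry +1.
    # Running the view's query (select distinct val) against the
    # delta nets the counts per val and drops zero net changes.
    delta = {}
    for val, n in changed_rows:
        delta[val] = delta.get(val, 0) + n
    return {v: n for v, n in delta.items() if n != 0}

def apply_delta(bar, delta):
    # Logical UNION of the net-change relation with the matview,
    # probing bar (modeled as a dict val -> count_t) per delta row.
    for val, net in delta.items():
        if val not in bar:
            if net < 0:
                raise ValueError("negative count for new row -- bug")
            bar[val] = net            # unmatched: insert with its count
        else:
            new = bar[val] + net      # matched: sum the counts
            if new < 0:
                raise ValueError("count_t went negative -- bug")
            if new == 0:
                del bar[val]          # last supporting row gone
            else:
                bar[val] = new
    return bar
```

For the example UPDATE, vals 2 through 10 each gain one "after" image and lose one "before" image, so they cancel out of the delta entirely; only val 1 (net -1) and val 11 (net +1) ever touch the matview.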
On Tue, May 14, 2013 at 3:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > We could drive the triggering of incremental maintenance off of the > dependency information which is already stored, but for performance > we probably want to add a new pg_class flag to indicate that the > relation is referenced by a matview definition which specifies > incremental update. That would allow a fast path for skipping > other tests for DML on non-referenced relations, at the expense of > some additional catalog updates on some DDL. I'm afraid this might require creating a matview or updating the definition of a matview to refer to different relations to take AccessExclusiveLock on those relations, in order to avoid SnapshotNow problems while updating this flag for those relations, and I think that's probably unacceptable. Some thought may be needed here to come up with a good solution. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> We could drive the triggering of incremental maintenance off of the >> dependency information which is already stored, but for performance >> we probably want to add a new pg_class flag to indicate that the >> relation is referenced by a matview definition which specifies >> incremental update. That would allow a fast path for skipping >> other tests for DML on non-referenced relations, at the expense of >> some additional catalog updates on some DDL. > > I'm afraid this might require creating a matview or updating the > definition of a matview to refer to different relations to take > AccessExclusiveLock on those relations, in order to avoid SnapshotNow > problems while updating this flag for those relations, and I think > that's probably unacceptable. Some thought may be needed here to come > up with a good solution. Thanks for the feedback. I had been thinking that such a flag would be the moral equivalent of such existing flags as relhaspkey, relhasrules, relhastriggers, and relhassubclass. Those all require owner rights to change, and perhaps we don't want to require that a user be the owner of a table to define a materialized view which references that table and specifies incremental update. On the other hand, we might not want to let just any old user who has SELECT permission on a table to specify that it feeds an incrementally updated matview, since there is no escaping the fact that extra work will be needed for DML against that table if it is doing that. I seem to recall that at least one other product requires the owner of a table to ALTER it to set a flag specifying that the table is allowed to be used to back incrementally updated matviews; perhaps that's the way to go? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, May 16, 2013 at 8:33 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Kevin Grittner <kgrittn@ymail.com> wrote: >>> We could drive the triggering of incremental maintenance off of the >>> dependency information which is already stored, but for performance >>> we probably want to add a new pg_class flag to indicate that the >>> relation is referenced by a matview definition which specifies >>> incremental update. That would allow a fast path for skipping >>> other tests for DML on non-referenced relations, at the expense of >>> some additional catalog updates on some DDL. >> >> I'm afraid this might require creating a matview or updating the >> definition of a matview to refer to different relations to take >> AccessExclusiveLock on those relations, in order to avoid SnapshotNow >> problems while updating this flag for those relations, and I think >> that's probably unacceptable. Some thought may be needed here to come >> up with a good solution. > > Thanks for the feedback. > > I had been thinking that such a flag would be the moral equivalent > of such existing flags as relhaspkey, relhasrules, relhastriggers, > and relhassubclass. Those all require owner rights to change, and > perhaps we don't want to require that a user be the owner of a > table to define a materialized view which references that table and > specifies incremental update. On the other hand, we might not want > to let just any old user who has SELECT permission on a table to > specify that it feeds an incrementally updated matview, since there > is no escaping the fact that extra work will be needed for DML > against that table if it is doing that. I seem to recall that at > least one other product requires the owner of a table to ALTER it > to set a flag specifying that the table is allowed to be used to > back incrementally updated matviews; perhaps that's the way to go? Possibly. 
That at least has the advantage of transparency: if you do ALTER TABLE wunk ENABLE DELTA QUEUE or somesuch syntax, it's very clear that you're buying an AccessExclusiveLock. And while AccessExclusiveLocks are not a lot of fun, one that you know is coming is a lot better than one that comes as a surprise. I feel like it would be nicer, though, to come up with some trick that avoids the need to update the referenced table's pg_class entry altogether. I don't immediately have a good idea, but I'll mull it over and see if I come up with anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote: Good explanation for understanding the initial concept of incremental update of matviews. > The original and modified versions of the relations (tables or > other matviews) which define a matview must be available to > calculate the matview deltas, so the snapshot information for this > must be available and registered until the matview delta has been > calculated. They can be released once the delta has been > established and before it has been applied to the matview. Here, by "modified versions of the relations," do you mean the delta relations used for recording changes? Could you elaborate a bit on the snapshot information: is this snapshot for the delta relation, and when will the snapshot information be acquired in order to update matviews? > Initial > milestones in this development will focus on "eager" maintenance, > so this will not initially be a big issue, but will become more > important as we work on making more of the work asynchronous, so > that the foreground query is not held up maintaining data which is > allowed to be a little stale. This seems not entirely unrelated to > background worker processes and snapshot sharing. With Regards, Amit Kapila.
On Thursday, May 16, 2013 7:02 PM Kevin Grittner wrote: Josh Berkus <josh@agliodbs.com> wrote: > Let's say there is a table and matview like this: > create table foo (fooid int primary key, val int not null); > create materialized view bar as select distinct val from foo; > Let's say there are millions of rows in both, and that we have > flagged the view for incremental maintenance. (Syntax omitted to > avoid distracting bikeshedding on that when the point is the > algorithm.) > Now, someone runs this: > update foo set val = val + 1 where fooid between 1 and 10; > What will happen is this: > Since foo will be flagged as a relation which is referenced by an > incrementally maintained matview, a delta relation will be > materialized for this update, which will contain the net change to > the underlying table in the count_t system column. How and when will it clear old rows of the delta relation, especially when more than one matview is defined on the table? And how will it track where to start from in this delta relation the second time it refreshes the matview? With Regards, Amit Kapila.
Amit Kapila <amit.kapila@huawei.com> wrote: > On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote: > > Good explanation for understanding the initial concept of > incremental update of matviews. Thanks. This is one of those topics where it takes a lot of time going over highly technical papers to really get your head around it. I'm hoping some simple examples will give a decent intuitive grasp for those who want that but don't want to invest the time to cover all the details. At some point a README file will be needed. I will probably start with a Wiki that can evolve during development and be used to help create the README file near the end of the release cycle. Dan and I did that for SSI and it seemed to work reasonably well. >> The original and modified versions of the relations (tables or >> other matviews) which define a matview must be available to >> calculate the matview deltas, so the snapshot information for >> this must be available and registered until the matview delta >> has been calculated. They can be released once the delta has >> been established and before it has been applied to the matview. > > Here by modified versions of the relations, do you mean to say > delta relations for recording changes. During calculation of the deltas to apply to the matviews, it must be possible to query the referenced tables from the perspective of both the "before" and "after" versions of the data. > Could you elaborate a bit about snapshot information, is this > snapshot is for delta relation I don't think we need to use snapshots to read the deltas -- those contain the information about the transition from one state to another, rather than any persistent state which would be visible to the user. I see them more like tuple stores created as interim steps in query execution. > when will it acquire snapshot information to Update matviews? 
In early milestones the work will be done in the context of completing a statement or a transaction, and the MVCC data used to update matviews will be from that context -- much like for changes made within a trigger. Once we move on to asynchronous matview maintenance it seems pretty clear that we need to have queues and background worker processes. If *deltas for the matview changes* are derived in the originating transaction we might want one queue of deltas for each target matview. I don't think the process applying the deltas needs to do anything unusual or special about acquiring snapshots as it processes the delta from each transaction, unless we see some way to optimize things once we're hip-deep in the details. We could off-load the delta calculations for the matviews from the originating transaction, but that would require a separate set of queues and background workers, and it would require that snapshots from the transaction remain registered so that they are valid for use by the delta calculations. It's not clear whether the benefit of that would justify the extra complexity. That would be, essentially, another form of parallelizing the work of a connection; so I'm leaving that as *possible* work later on, which is only likely to be worthwhile if there is enough infrastructure from other work on parallelization to reduce the scope here. Let me know if that doesn't address what you wanted to know. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
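The "one queue of deltas for each target matview" idea can be sketched as follows. This is purely illustrative Python under assumed semantics (names like MatviewQueue are invented, not proposed PostgreSQL APIs): originating transactions enqueue their computed view deltas at commit, and a background worker later drains the queue in commit order, so the matview lags by a bounded amount of staleness.

```python
from collections import deque

class MatviewQueue:
    """Hypothetical per-matview queue of committed view deltas."""

    def __init__(self):
        self.deltas = deque()

    def enqueue(self, delta):
        # Called at commit time by the originating transaction;
        # delta maps a row to its net count_t change.
        self.deltas.append(delta)

    def apply_all(self, matview):
        # Called by a background worker: apply each transaction's
        # delta in commit order, deleting rows whose count_t nets
        # to zero, exactly as in the synchronous ("eager") case.
        while self.deltas:
            delta = self.deltas.popleft()
            for row, net in delta.items():
                new = matview.get(row, 0) + net
                if new == 0:
                    matview.pop(row, None)
                else:
                    matview[row] = new
        return matview
```

The design point this illustrates is that the apply step is the same whether maintenance is eager or deferred; only the timing (and hence the staleness the foreground query tolerates) changes.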
Amit Kapila <amit.kapila@huawei.com> wrote: > On Thursday, May 16, 2013 7:02 PM Kevin Grittner wrote: >> Let's say there is a table and matview like this: > >> create table foo (fooid int primary key, val int not null); >> create materialized view bar as select distinct val from foo; > >> Let's say there are millions of rows in both, and that we have >> flagged the view for incremental maintenance. (Syntax omitted >> to avoid distracting bikeshedding on that when the point is the >> algorithm.) > >> Now, someone runs this: > >> update foo set val = val + 1 where fooid between 1 and 10; > >> What will happen is this: > >> Since foo will be flagged as a relation which is referenced by >> an incrementally maintained matview, a delta relation will be >> materialized for this update, which will contain the net change >> to the underlying table in the count_t system column. > > How and when will it clear old rows of delta relation especially > when there are more than one matview is defined on table There will not be a permanent delta relation -- that is something that will exist in a tuple store or other transient structure for the duration it is needed. Once the incremental maintenance code iterates through all affected matviews, the delta can be destroyed. > and how will it maintain from where to start second time refresh > from this delta relation to matview. I'm not clear on what you're asking here. If it's about how incremental maintenance will be coordinated with any explicit REFRESH, it seems to me that at the point a REFRESH starts, any changes which are from commits too late to be included in the snapshot used for the REFRESH would need to block or be queued until completion of the REFRESH (depending on whether the view is using synchronous or asynchronous maintenance), and any unapplied matview deltas from before that could be discarded. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:

> During calculation of the deltas to apply to the matviews, it must
> be possible to query the referenced tables from the perspective of
> both the "before" and "after" versions of the data.

[..]

> I don't think the process applying the deltas needs to do
> anything unusual or special about acquiring snapshots as it
> processes the delta from each transaction, unless we see some way
> to optimize things once we're hip-deep in the details.

Note that the basic count algorithm assumes real-serializable transactions for correctness. Example:

* The example matview’s defining query is a simple join: SELECT * FROM A JOIN B ON A.a = B.b.
* For simplicity, assume that the matview is initially empty.
* Assume a non real-serializable isolation mode, e.g., REPEATABLE READ.
* Assume that there are two transactions T1 and T2 running in parallel, that don’t see each other’s changes.
* Both T1 and T2 insert a row in A and B, respectively, in such a way that the combination of the rows is matched by the join condition (i.e., the row inserted by T1 has A.a equal to the B.b of the row inserted by T2).
* The count algorithm will not add anything to the matview as part of the execution of T1 (because the new row in A doesn’t join with any rows in B that T1 observes).
* Idem for T2 (nothing will be added).
* After T1 and T2 completed, a new transaction will see an inconsistency: A and B contain rows that are supposed to result in a row in the matview, but it doesn’t contain any.

Each of the following would fix this problem:

(1) Real-serializability must be required for all transactions that update any base table of a matview’s that is maintained using the count algorithm.

(2) The matview’s defining query and the corresponding base tables must adhere to certain (very commonly satisfied) conditions (for example, if there is an FK between A.a and B.b, the example cannot be made to fail; The FK must not be deferred in the case of right-after-each-statement matview maintenance).

(3) The count algorithm must be implemented in a way that understands MVCC internals: Reading the base tables must be done using a technique that reads all rows (i.e., also the ones not visible to the current transaction), and the xmin/xmax/etc of the updated rows in the matview must be set in a smart way, so as to yield visibility results that correspond to the visibility of the “originating” rows in the base tables. Calculation of the matview deltas (especially the part where the base tables are inspected for joining rows) must in this case probably be done in a serialized manner.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
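To make the race concrete, here is a toy Python model of it (purely illustrative, not PostgreSQL code): each transaction computes its counting-algorithm delta against its own snapshot, as a REPEATABLE READ transaction would, and neither sees the other's insert.

```python
from collections import Counter

# Matview: SELECT * FROM A JOIN B ON A.a = B.b, modeled as a multiset
# of matching values; A and B are modeled as sets of values.

def delta_for_insert_into_A(value, snapshot_B):
    """Counting-algorithm delta for inserting `value` into A, computed
    against the inserting transaction's own snapshot of B."""
    return Counter({value: 1}) if value in snapshot_B else Counter()

def delta_for_insert_into_B(value, snapshot_A):
    return Counter({value: 1}) if value in snapshot_A else Counter()

A, B = set(), set()
matview = Counter()

# T1 and T2 start concurrently; each snapshots the still-empty tables.
snap_A, snap_B = set(A), set(B)

A.add(42)                                        # T1 inserts into A
matview += delta_for_insert_into_A(42, snap_B)   # T1 sees no match in B
B.add(42)                                        # T2 inserts into B
matview += delta_for_insert_into_B(42, snap_A)   # T2 sees no match in A

truth = Counter(a for a in A for b in B if a == b)
print(dict(matview), dict(truth))  # {} {42: 1} -- the matview is stale
```

Under real serializability one of the two transactions would be forced to abort or to run after the other, so its delta computation would see the other's row and the matview would stay consistent.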
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:

> 2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
>
>> During calculation of the deltas to apply to the matviews, it must
>> be possible to query the referenced tables from the perspective of
>> both the "before" and "after" versions of the data.
>
> [..]
>
>> I don't think the process applying the deltas needs to do
>> anything unusual or special about acquiring snapshots as it
>> processes the delta from each transaction, unless we see some way
>> to optimize things once we're hip-deep in the details.
>
> Note that the basic count algorithm assumes real-serializable
> transactions for correctness. Example:
>
> * The example matview’s defining query is a simple join: SELECT *
>   FROM A JOIN B ON A.a = B.b.
> * For simplicity, assume that the matview is initially empty.
> * Assume a non real-serializable isolation mode, e.g., REPEATABLE READ.
> * Assume that there are two transactions T1 and T2 running in
>   parallel, that don’t see each other’s changes.
> * Both T1 and T2 insert a row in A and B, respectively, in such a way
>   that the combination of the rows is matched by the join condition
>   (i.e., the row inserted by T1 has A.a equal to the B.b of the row
>   inserted by T2).
> * The count algorithm will not add anything to the matview as part of
>   the execution of T1 (because the new row in A doesn’t join with any
>   rows in B that T1 observes).
> * Idem for T2 (nothing will be added).
> * After T1 and T2 completed, a new transaction will see an
>   inconsistency: A and B contain rows that are supposed to result in a
>   row in the matview, but it doesn’t contain any.

Good point.

It might be hard to detect when this type of race condition exists, since it could be hidden inside a function. Any thoughts on that?

> Each of the following would fix this problem:
>
> (1) Real-serializability must be required for all transactions that
> update any base table of a matview’s that is maintained using the
> count algorithm.

Agreed.

I wonder whether we should have an option to constrain DML on a table to serializable transactions, and an option for a matview to flag that its incremental update depends on that. This strategy would automatically detect problems hidden in functions, whether or not they were explicitly recognized issues with the matview definition. It is clear, however, that this cannot be the only supported approach.

> (2) The matview’s defining query and the corresponding base tables
> must adhere to certain (very commonly satisfied) conditions (for
> example, if there is an FK between A.a and B.b, the example cannot be
> made to fail; The FK must not be deferred in the case of
> right-after-each-statement matview maintenance).

Agreed. That would cover a high percentage of cases regardless of transaction isolation level. Knowing when you needed such a constraint to make incremental maintenance safe would be hard, though.

> (3) The count algorithm must be implemented in a way that understands
> MVCC internals: Reading the base tables must be done using a technique
> that reads all rows (i.e., also the ones not visible to the current
> transaction), and the xmin/xmax/etc of the updated rows in the matview
> must be set in a smart way, so as to yield visibility results that
> correspond to the visibility of the “originating” rows in the base
> tables. Calculation of the matview deltas (especially the part where
> the base tables are inspected for joining rows) must in this case
> probably be done in a serialized manner.

I will look at this option some more, but on the face of it, I'm not quite following how it would work; and it sounds invasive and fragile. If you know of any paper on this approach you could point me to, or if you could sketch it out in a little more detail, I would appreciate that.

While (1) and (2) are straightforward approaches, either of which would clearly address the race conditions, I fear that there are those who would not be happy with requiring one or the other of them in order to use incremental maintenance -- so it would be nice to have a fall-back strategy.

There will, at least for the next several releases, be matviews defined with queries for which we cannot support incremental maintenance -- REFRESH will continue to be the only way to deal with these until we enhance incremental maintenance to support them. For example, in 9.4 I don't intend to support incremental maintenance of matviews defined with recursive queries. To me it makes sense to prohibit incremental maintenance of a matview for which there is no strategy to deal with concurrency problems.

Thoughts?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 17, 2013 at 4:25 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> (3) The count algorithm must be implemented in a way that understands
>> MVCC internals: Reading the base tables must be done using a technique
>> that reads all rows (i.e., also the ones not visible to the current
>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>> must be set in a smart way, so as to yield visibility results that
>> correspond to the visibility of the “originating” rows in the base
>> tables. Calculation of the matview deltas (especially the part where
>> the base tables are inspected for joining rows) must in this case
>> probably be done in a serialized manner.
>
> I will look at this option some more, but on the face of it, I'm
> not quite following how it would work; and it sounds invasive and
> fragile.

I don't believe it would be that problematic if deltas:

1 - contain all changes, matching join conditions or not, that could potentially alter the matview. This includes the example's alterations, since the columns being altered were part of the matview's definition, but it would not cover updates to columns that were not part of the definition (i.e., not involved in the join or the selected rows).

2 - have each update marked with its xid.

Then, "serialization" can be achieved by only applying committed xids, discarding rolled-back xids, and evaluating join satisfaction only during the updates, and not during delta logging.
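A toy Python model of this delta-log idea (illustrative only; all names are invented here): logging records every potentially relevant change with its xid and performs no join evaluation, while join satisfaction is checked only when the committed entries are replayed in order.

```python
from collections import Counter

# (xid, table, value) entries for *all* potentially relevant changes,
# in commit order for the committed xids.
log = []

def log_insert(xid, table, value):
    # Delta logging: no join check here, just record the change.
    log.append((xid, table, value))

def apply_committed(committed_xids):
    """Replay committed entries in order, rebuilding the committed base
    tables and maintaining the matview (a multiset of join values for
    SELECT * FROM A JOIN B ON A.a = B.b)."""
    matview = Counter()
    A, B = set(), set()
    for xid, table, value in log:
        if xid not in committed_xids:
            continue            # rolled back: discard
        if table == "A":
            A.add(value)
            if value in B:      # join evaluated at apply time
                matview[value] += 1
        else:
            B.add(value)
            if value in A:
                matview[value] += 1
    return matview

log_insert(1, "A", 42)   # T1's insert, logged without a join check
log_insert(2, "B", 42)   # T2's concurrent insert
log_insert(3, "A", 7)    # a transaction that later rolls back
print(dict(apply_committed({1, 2})))  # {42: 1} -- the race is resolved
```

Because the join is evaluated against the replayed committed state rather than against each transaction's snapshot, the T1/T2 race from the earlier example no longer loses the matching row.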
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:

> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>
>> Note that the basic count algorithm assumes real-serializable
>> transactions for correctness. Example:

[..]

> Good point.
>
> It might be hard to detect when this type of race condition exists,
> since it could be hidden inside a function. Any thoughts on that?

I think that matviews with defining queries that use functions for which the system cannot prove whether they are safe for the given incremental maintenance technique (e.g., count algorithm) and situation (e.g., mandatory isolation level) should not be allowed to use incremental maintenance of that type.

Although I have only limited practical matview experience (with SQL Server (2005?) only, where it is called “indexed views”): I assume that in most other DBMSs, the queries that can be used for defining incrementally maintained matviews are very limited (they are in SQL Server).

>> Each of the following would fix this problem:
>>
>> (2) The matview’s defining query and the corresponding base tables
>> must adhere to certain (very commonly satisfied) conditions (for
>> example, if there is an FK between A.a and B.b, the example cannot be
>> made to fail; The FK must not be deferred in the case of
>> right-after-each-statement matview maintenance).
>
> Agreed. That would cover a high percentage of cases regardless of
> transaction isolation level. Knowing when you needed such a
> constraint to make incremental maintenance safe would be hard,
> though.

Reasoning about it becomes easier when you have a clean definition of which queries you accept, and which ones you don’t (see below). Then you can look at your definition and say “I allow X because it is always correct; I don’t allow Y because I didn’t prove it to be always correct (yet).”

>> (3) The count algorithm must be implemented in a way that understands
>> MVCC internals: Reading the base tables must be done using a technique
>> that reads all rows (i.e., also the ones not visible to the current
>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>> must be set in a smart way, so as to yield visibility results that
>> correspond to the visibility of the “originating” rows in the base
>> tables. Calculation of the matview deltas (especially the part where
>> the base tables are inspected for joining rows) must in this case
>> probably be done in a serialized manner.
>
> I will look at this option some more, but on the face of it, I'm
> not quite following how it would work; and it sounds invasive and
> fragile. If you know of any paper on this approach you could point
> me to, or if you could sketch it out in a little more detail, I
> would appreciate that.

Sorry, these are my own musings: I have never found any paper about it.

After a bit more thought, I think it wouldn’t be so easy for the count algorithm: It would be necessary to express (using physical MVCC rows) something like “you can only see this (logical) row if both transaction T1 and T2 are visible to you.” Disjunction would be easy: create two physical rows with the same data, one that is visible if T1 is visible, and one that is visible if T2 is visible. Unfortunately, we need conjunction here :-(.

(Upgrading the expressiveness of the MVCC-related information in the rows would help, but as that is more invasive than anyone wants to think about, I restrain those thoughts to the confines of my own head ;-).)

OTOH, I think that this technique could be used in the case of simple “aggregation” matviews (without joins, or *maybe* when all joins are constrained by FKs). (I assume an implementation such as the one that I sketched in a previous post[1] a few months back.)

[1] <URL:http://www.postgresql.org/message-id/CAP-rdTY+VZ8T_8JBvsksDRdz-_wr5ef80M-_UKgu5bHw+rkjpg@mail.gmail.com>

> There will, at least for the next several releases, be matviews
> defined with queries for which we cannot support incremental
> maintenance -- REFRESH will continue to be the only way to deal
> with these until we enhance incremental maintenance to support them.
> For example, in 9.4 I don't intend to support incremental
> maintenance of matviews defined with recursive queries. To me it
> makes sense to prohibit incremental maintenance of a matview for
> which there is no strategy to deal with concurrency problems.

+1

I think you need a strict definition, specific to your incremental maintenance algorithm, of which queries you accept and which ones you don’t. Such a definition could be based on SQL syntax, or on any of the intermediate forms that are produced by the parser/rewriter/analyzer/planner/etc. I think that basing your definition on a later processing stage (e.g., after subqueries are flattened, after views are expanded, or after “redundant” predicates are removed) would increase the number of queries that are allowed, but might make it somewhat more difficult for a user to predict whether a query allows incremental maintenance or not. (It might also make your definition depend on “internal stuff” that might change from one release to the next.)

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:

> 2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>
>>> Note that the basic count algorithm assumes real-serializable
>>> transactions for correctness. Example:
>
> [..]
>
>> Good point.
>>
>> It might be hard to detect when this type of race condition exists,
>> since it could be hidden inside a function. Any thoughts on that?
>
> I think that matviews with defining queries that use functions for
> which the system cannot prove whether they are safe for the given
> incremental maintenance technique (e.g., count algorithm) and
> situation (e.g., mandatory isolation level) should not be allowed to
> use incremental maintenance of that type.

It's clear enough that any IMMUTABLE function should be safe, but for anything less I don't see how we could trust it short of having a "white list" of which functions are known to be safe, or creating a new property for functions. Not something I want to require up front; maybe later....

> Although I have only limited practical matview experience (with SQL
> Server (2005?) only, where it is called “indexed views”): I assume
> that in most other DBMSs, the queries that can be used for defining
> incrementally maintained matviews are very limited (they are in SQL
> Server).

In researching this feature I have found a few comments by Oracle users that when incremental maintenance was first added (in 1999) the queries which it could support were severely limited, but that they have expanded coverage with each major release. People don't seem to have too many complaints about what is covered now; although how much of that is adapting to the limits and how much is due to expansion of the limits is not clear. You would think that somewhere they would define what the limits for a given version are, but I have not stumbled across anything like that.

>>> Each of the following would fix this problem:
>>>
>>> (2) The matview’s defining query and the corresponding base tables
>>> must adhere to certain (very commonly satisfied) conditions (for
>>> example, if there is an FK between A.a and B.b, the example cannot be
>>> made to fail; The FK must not be deferred in the case of
>>> right-after-each-statement matview maintenance).
>>
>> Agreed. That would cover a high percentage of cases regardless of
>> transaction isolation level. Knowing when you needed such a
>> constraint to make incremental maintenance safe would be hard,
>> though.
>
> Reasoning about it becomes easier when you have a clean definition of
> which queries you accept, and which ones you don’t (see below). Then
> you can look at your definition and say “I allow X because it is
> always correct; I don’t allow Y because I didn’t prove it to be
> always correct (yet).”

Agreed. I just haven't figured out precisely how concurrency issues adjust those boundaries yet.

>>> (3) The count algorithm must be implemented in a way that understands
>>> MVCC internals: Reading the base tables must be done using a technique
>>> that reads all rows (i.e., also the ones not visible to the current
>>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>>> must be set in a smart way, so as to yield visibility results that
>>> correspond to the visibility of the “originating” rows in the base
>>> tables. Calculation of the matview deltas (especially the part where
>>> the base tables are inspected for joining rows) must in this case
>>> probably be done in a serialized manner.
>>
>> I will look at this option some more, but on the face of it, I'm
>> not quite following how it would work; and it sounds invasive and
>> fragile. If you know of any paper on this approach you could point
>> me to, or if you could sketch it out in a little more detail, I
>> would appreciate that.
>
> Sorry, these are my own musings: I have never found any paper
> about it.
>
> [musings]

Will review and ponder these and what Claudio offered, and see where that takes me.

Thanks to both of you for the input!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin,

> The count_t column saves us from having to scan foo for all the old
> val values. It does not require any scan of the entire bar
> matview. It allows us to zero in on exactly the right rows, and
> lets us know what needs doing.

This sounds like a fairly good approach. It would require a couple of things though:

1) admins would need to be able to enable and disable incremental updating of matviews, so that if the creation of delta tables is bogging down writes, they can disable them temporarily as a performance workaround.

2) if an admin enables incremental updating of an existing matview, it would need to be REFRESHed immediately before we could start incrementally updating it. So an ENABLE should call a REFRESH.

> Hopefully this makes it fairly clear that the only thing that an
> optimization around the "append-only" assertion for a matview would
> be the ability to skip the probe for an existing record *related to
> rows which are in the delta*. As long as there is reasonable
> indexing on the matview, maintenance for the append-only case would
> not involve scanning the entire matview.

Yeah, given what you've explained, my first inclination would be just to let the counting do its thing and see how slow/expensive it is before we try further optimizations.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:

> This sounds like a fairly good approach. It would require a
> couple of things though:
>
> 1) admins would need to be able to enable and disable incremental
> updating of matviews, so that if the creation of delta tables is
> bogging down writes, they can disable them temporarily as a
> performance workaround.

Yes. This is the sort of thing I plan to put on ALTER MATERIALIZED VIEW, and perhaps an ALTER TABLE option which allows the table to generate deltas. Turning off the table option should probably have a CASCADE option with the sort of user feedback DROP options have.

Some other products have AMV options that do things like allow an attempt to scan a matview automatically cause a REFRESH if referenced tables have changed since the last REFRESH. That seems like something to defer until we have a more sophisticated notion of freshness (or staleness -- if we prefer to view the glass as half-empty).

> 2) if an admin enables incremental updating of an existing
> matview, it would need to be REFRESHed immediately before we
> could start incrementally updating it. So an ENABLE should call
> a REFRESH.

Right. We would need to coordinate the start of the incremental maintenance with the snapshot used for the REFRESH. As far as I can tell, it could be either type of REFRESH (truncate and create a new heap, or create the new data contents in a temporary relation and use a differential DELETE and INSERT technique), as long as the snapshot used for determining the new contents is used to determine which transactions can provide deltas.

> Yeah, given what you've explained, my first inclination would be
> just to let the counting do its thing and see how slow/expensive
> it is before we try further optimizations.

Agreed. The counting algorithm itself has some optional optimizations that I'm not sure we'll get to in 9.4. Anything this specialized should be evaluated only after we have all the more "generic" optimizations in place.

Thanks for the feedback!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
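The discard-or-keep behavior when coordinating queued deltas with a REFRESH can be modeled in a few lines of Python (a toy sketch, not actual code; a commit sequence number stands in for the REFRESH snapshot's visibility cutoff): deltas from commits at or before the cutoff are already reflected in the new contents and can be dropped, while later ones must still be applied.

```python
from collections import deque

# Queued deltas as (commit_seq, delta) pairs, in commit order.
pending = deque()

def enqueue(commit_seq, delta):
    pending.append((commit_seq, delta))

def refresh(snapshot_commit_seq):
    """Discard deltas already covered by the REFRESH snapshot; what
    remains must still be applied after the REFRESH completes."""
    while pending and pending[0][0] <= snapshot_commit_seq:
        pending.popleft()
    return list(pending)

enqueue(10, "delta-a")
enqueue(11, "delta-b")
enqueue(15, "delta-c")
print(refresh(11))  # [(15, 'delta-c')]
```

Commits too late for the snapshot (here, sequence 15) stay queued, matching the "block or be queued until completion of the REFRESH" behavior described earlier in the thread.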
2013/5/17 Claudio Freire <klaussfreire@gmail.com>:

> On Fri, May 17, 2013 at 4:25 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
>>> (3) The count algorithm must be implemented in a way that understands
>>> MVCC internals: Reading the base tables must be done using a technique
>>> that reads all rows (i.e., also the ones not visible to the current
>>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>>> must be set in a smart way, so as to yield visibility results that
>>> correspond to the visibility of the “originating” rows in the base
>>> tables. Calculation of the matview deltas (especially the part where
>>> the base tables are inspected for joining rows) must in this case
>>> probably be done in a serialized manner.
>>
>> I will look at this option some more, but on the face of it, I'm
>> not quite following how it would work; and it sounds invasive and
>> fragile.
>
> I don't believe it would be that problematic if deltas:
>
> 1 - contain all changes, matching join conditions or not, that could
> potentially alter the matview. This includes the example's alterations,
> since the columns being altered were part of the matview's definition,
> but it would not cover updates to columns that were not part of the
> definition (i.e., not involved in the join or the selected rows).
>
> 2 - have each update marked with its xid.
>
> Then, "serialization" can be achieved by only applying committed xids,
> discarding rolled-back xids, and evaluating join satisfaction only
> during the updates, and not during delta logging.

I think that your idea of postponing some of the processing to later commits might indeed make it possible to implement this without needing the “MVCC-expressiveness enhancement” that I mentioned in my previous post (by postponing the application to the matview until at least *someone* must be able to see that particular change).

Right now I have some difficulties working out the MVCC details of what to do when joining a base table delta row with the contents of the other base tables (to create the matview delta), but I’m probably just tired :-).

In any case, your proposal seems to require the following additional “infrastructural changes” relative to Kevin’s proposal:

(1) The base table deltas must be shared by all “in flight” transactions that are in the process of putting stuff in them and using them to apply changes to the matviews.

(2) Putting stuff in base table deltas must be transactionally safe (unless all matviews that care about that specific base table delta are unlogged).

(3) “Right-after-each-statement matview maintenance” is not possible (without additional effort).

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?