Thread: Drastic performance loss in assert-enabled build in HEAD
Using HEAD's pg_dump, I see "pg_dump -s regression" taking 5 seconds.
On the other hand, running the same executable against the regression
database on a 9.2 postmaster takes 1.2 seconds. Looks to me like we
broke something performance-wise.

A quick check with oprofile says it's all AllocSetCheck's fault:

samples  %        image name  symbol name
87777    83.6059  postgres    AllocSetCheck
 1140     1.0858  postgres    base_yyparse
  918     0.8744  postgres    AllocSetAlloc
  778     0.7410  postgres    SearchCatCache
  406     0.3867  postgres    pg_strtok
  394     0.3753  postgres    hash_search_with_hash_value
  387     0.3686  postgres    core_yylex
  373     0.3553  postgres    MemoryContextCheck
  256     0.2438  postgres    nocachegetattr
  231     0.2200  postgres    ScanKeywordLookup
  207     0.1972  postgres    palloc

So maybe I'm nuts to care about the performance of an assert-enabled backend, but I don't really find a 4X runtime degradation acceptable, even for development work. Does anyone want to fess up to having caused this, or do I need to start tracking down what changed?

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Using HEAD's pg_dump, I see "pg_dump -s regression" taking 5 seconds.
> On the other hand, running the same executable against the regression
> database on a 9.2 postmaster takes 1.2 seconds. Looks to me like we
> broke something performance-wise.
>
> A quick check with oprofile says it's all AllocSetCheck's fault:
>
> samples  %        image name  symbol name
> 87777    83.6059  postgres    AllocSetCheck
>  1140     1.0858  postgres    base_yyparse
>   918     0.8744  postgres    AllocSetAlloc
>   778     0.7410  postgres    SearchCatCache
>   406     0.3867  postgres    pg_strtok
>   394     0.3753  postgres    hash_search_with_hash_value
>   387     0.3686  postgres    core_yylex
>   373     0.3553  postgres    MemoryContextCheck
>   256     0.2438  postgres    nocachegetattr
>   231     0.2200  postgres    ScanKeywordLookup
>   207     0.1972  postgres    palloc
>
> So maybe I'm nuts to care about the performance of an assert-enabled
> backend, but I don't really find a 4X runtime degradation acceptable,
> even for development work. Does anyone want to fess up to having caused
> this, or do I need to start tracking down what changed?

I checked master HEAD for a dump of regression and got about 4 seconds. I checked right after my initial push of matview code and got 2.5 seconds. I checked just before that and got 1 second. There was some additional pg_dump work for matviews after the initial push which may or may not account for the rest of the time.

Investigating now.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So maybe I'm nuts to care about the performance of an assert-enabled
>> backend, but I don't really find a 4X runtime degradation acceptable,
>> even for development work. Does anyone want to fess up to having caused
>> this, or do I need to start tracking down what changed?

> I checked master HEAD for a dump of regression and got about 4
> seconds. I checked right after my initial push of matview code and
> got 2.5 seconds. I checked just before that and got 1 second.
> There was some additional pg_dump work for matviews after the
> initial push which may or may not account for the rest of the time.

I poked at this a bit, and eventually found that the performance differential goes away if I dike out the pg_relation_is_scannable() call in getTables()'s table-collection query.

What appears to be happening is that those calls cause a great many more relcache entries to be made than used to happen, plus many entries are made earlier in the run than they used to be. Many of those entries have subsidiary memory contexts, which results in an O(N^2) growth in the time spent in AllocSetCheck, since postgres.c does MemoryContextCheck(TopMemoryContext) once per received command, and pg_dump will presumably issue O(N) commands in an N-table database.

So one question that brings up is whether we need to dial back the amount of work done in memory context checking, but I'm loath to do so in development builds. That code has fingered an awful lot of bugs. OTOH, if the number of tables in the regression DB continues to grow, we may have little choice.

Anyway, the immediate takeaway is that this represents a horribly expensive way for pg_dump to find out which matviews aren't scannable. The cheap way for it to find out would be if we had a boolean flag for that in pg_class. Would you review the bidding as to why things were done the way they are?
Because in general, having to ask the kernel something is going to suck in any case, so basing it on the size of the disk file doesn't seem to me to be basically a good thing.

We could alleviate the pain by changing pg_dump's query to something like

	(case when c.relkind = 'm'
	 then pg_relation_is_scannable(c.oid)
	 else false end)

but TBH this feels like bandaging a bad design.

Another reason why I don't like this code is that pg_relation_is_scannable is broken by design:

	relid = PG_GETARG_OID(0);
	relation = RelationIdGetRelation(relid);
	result = relation->rd_isscannable;
	RelationClose(relation);

You can't do that: if the relcache entry doesn't already exist, this will try to construct one while not holding any lock on the relation, which is subject to all sorts of race conditions. (In general, direct calls on RelationIdGetRelation are probably broken. I see you introduced more than one such in the matviews patch, and I'm willing to bet they are all unsafe.)

			regards, tom lane
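[Editorial aside: the O(N^2) growth Tom describes — one MemoryContextCheck pass over every live context per received command, with the number of live contexts growing as relcache entries accumulate — can be illustrated with a toy model. This is plain Python, not PostgreSQL code; the function name and constants are illustrative only.]

```python
# Toy model of the quadratic checking cost: each command creates some
# relcache entries (each with a subsidiary memory context), and then a
# full check walks every live context. Total work over N commands is
# therefore on the order of N^2.
def total_check_work(n_commands, contexts_per_command=1):
    live_contexts = 0
    work = 0
    for _ in range(n_commands):
        live_contexts += contexts_per_command  # new relcache entries/contexts
        work += live_contexts                  # one full pass over all contexts
    return work

# Doubling the number of commands (roughly, tables in the database)
# roughly quadruples the total checking work:
print(total_check_work(2000) / total_check_work(1000))  # ~4
```

This is consistent with pg_dump issuing O(N) commands against an N-table database, so the per-command full-tree check dominates the assert-enabled profile.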
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So maybe I'm nuts to care about the performance of an
>>> assert-enabled backend, but I don't really find a 4X runtime
>>> degradation acceptable, even for development work. Does anyone
>>> want to fess up to having caused this, or do I need to start
>>> tracking down what changed?
>
>> I checked master HEAD for a dump of regression and got about 4
>> seconds. I checked right after my initial push of matview code
>> and got 2.5 seconds. I checked just before that and got 1
>> second. There was some additional pg_dump work for matviews
>> after the initial push which may or may not account for the rest
>> of the time.
>
> I poked at this a bit, and eventually found that the performance
> differential goes away if I dike out the
> pg_relation_is_scannable() call in getTables()'s table-collection
> query. What appears to be happening is that those calls cause a
> great many more relcache entries to be made than used to happen,
> plus many entries are made earlier in the run than they used to
> be. Many of those entries have subsidiary memory contexts, which
> results in an O(N^2) growth in the time spent in AllocSetCheck,
> since postgres.c does MemoryContextCheck(TopMemoryContext) once
> per received command, and pg_dump will presumably issue O(N)
> commands in an N-table database.
>
> So one question that brings up is whether we need to dial back
> the amount of work done in memory context checking, but I'm loath
> to do so in development builds. That code has fingered an awful
> lot of bugs. OTOH, if the number of tables in the regression DB
> continues to grow, we may have little choice.
>
> Anyway, the immediate takeaway is that this represents a horribly
> expensive way for pg_dump to find out which matviews aren't
> scannable. The cheap way for it to find out would be if we had a
> boolean flag for that in pg_class.
> Would you review the bidding
> as to why things were done the way they are? Because in general,
> having to ask the kernel something is going to suck in any case,
> so basing it on the size of the disk file doesn't seem to me to
> be basically a good thing.

The early versions of the patch had a boolean in pg_class to track this, but I got complaints from Robert and Noah (and possibly others?) that this got too ugly in combination with the support for unlogged matviews, and they suggested the current approach. For an unlogged matview we need to replace the heap (main fork) with the init fork before the database is up and running, so there would need to be some way for that to result in forcing the flag in pg_class. I was attempting to do that when a matview which was flagged as scannable in pg_class was opened, but I gotta agree that it wasn't pretty. Noah suggested a function based on testing the heap to see if it looked like the init fork, and the current state of affairs is my attempt to make it work that way.

We could definitely minimize the overhead by only testing this for matviews. It seemed potentially useful for the function to have some purpose for other types of relations, so I was trying to avoid that. Maybe that was a bad idea.

> We could alleviate the pain by changing pg_dump's query to
> something like
>
> (case when c.relkind = 'm'
>  then pg_relation_is_scannable(c.oid)
>  else false end)

Yeah, that's the sort of thing I was thinking of. If we're going to go that way, we may want to touch one or two other spots.

> but TBH this feels like bandaging a bad design.

So far nobody has been able to suggest a better way to support both unlogged matviews and some way to prevent a matview from being used if it does not contain materialized data for its query from *some* point in time. Suggestions welcome.
> Another reason why I don't like this code is that
> pg_relation_is_scannable is broken by design:
>
>     relid = PG_GETARG_OID(0);
>     relation = RelationIdGetRelation(relid);
>     result = relation->rd_isscannable;
>     RelationClose(relation);
>
> You can't do that: if the relcache entry doesn't already exist,
> this will try to construct one while not holding any lock on the
> relation, which is subject to all sorts of race conditions.

Hmm. I think I had that covered earlier but messed up in rearranging to respond to review comments. Will review both new calling locations.

> In general, direct calls on RelationIdGetRelation are probably
> broken.

If valid contexts for use of the function are that limited, it might merit a comment where the function is defined. I'm not sure what a good alternative to this would be.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another reason why I don't like this code is that
>> pg_relation_is_scannable is broken by design:
>>
>>     relid = PG_GETARG_OID(0);
>>     relation = RelationIdGetRelation(relid);
>>     result = relation->rd_isscannable;
>>     RelationClose(relation);
>>
>> You can't do that: if the relcache entry doesn't already exist,
>> this will try to construct one while not holding any lock on the
>> relation, which is subject to all sorts of race conditions.
>
> Hmm. I think I had that covered earlier but messed up in
> rearranging to respond to review comments. Will review both new
> calling locations.

For the SQL-level function, does this look OK?:

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..94e55f0 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -850,9 +850,13 @@ pg_relation_is_scannable(PG_FUNCTION_ARGS)
 	bool		result;
 
 	relid = PG_GETARG_OID(0);
-	relation = RelationIdGetRelation(relid);
+	relation = try_relation_open(relid, AccessShareLock);
+
+	if (relation == NULL)
+		PG_RETURN_BOOL(false);
+
 	result = relation->rd_isscannable;
-	RelationClose(relation);
+	relation_close(relation, AccessShareLock);
 
 	PG_RETURN_BOOL(result);
 }

I think the call in ExecCheckRelationsScannable() is safe because it comes after the tables are all already locked. I put it there so that the appropriate lock strength should be used based on whether it was locked by ExecInitNode() or something before it. Am I missing something? Can I not count on the lock being held at that point? Would the right level of API here be relation_open() with NoLock rather than RelationIdGetRelation()? Or is there some other call which is more appropriate there?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[ sorry for being slow to respond, things are crazy this week ]

Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, the immediate takeaway is that this represents a horribly
>> expensive way for pg_dump to find out which matviews aren't
>> scannable. The cheap way for it to find out would be if we had a
>> boolean flag for that in pg_class. Would you review the bidding
>> as to why things were done the way they are? Because in general,
>> having to ask the kernel something is going to suck in any case,
>> so basing it on the size of the disk file doesn't seem to me to
>> be basically a good thing.

> The early versions of the patch had a boolean in pg_class to track
> this, but I got complaints from Robert and Noah (and possibly
> others?) that this got too ugly in combination with the support for
> unlogged matviews, and they suggested the current approach.

Meh. I don't think that we should allow that corner case to drive the design like this. I would *far* rather not have unlogged matviews at all than this boondoggle. I'm not even convinced that the existing semantics are desirable: who's to say that having an unlogged matview revert to empty on crash renders it invalid? If it's a summary of underlying unlogged tables, then that's actually the right thing. (If someone is desperately unhappy with such a behavior, why are they insisting on it being unlogged, anyway?)

If we go with this implementation, I think we are going to be painting ourselves into a corner that will be difficult or impossible to get out of, because the scannability state of a matview is not just a matter of catalog contents but of on-disk files. That will make it difficult to impossible for pg_upgrade to cope reasonably with any future rethinking of scannability status.

In fact, I'm going to go further and say that I do not like the entire concept of scannability, either as to design or implementation, and I think we should just plain rip it out.
We can rethink that for 9.4 or later, but what we've got right now is a kluge, and I don't want to find ourselves required to be bug-compatible with it forevermore.

To take just the most salient point: assuming that we've somehow determined that some matview is too out-of-date to be usable, why is the appropriate response to that to cause queries on it to fail altogether? That seems like about the least useful feature that could be devised. Why not, say, have queries fall back to expanding the view definition as though it were a regular view?

I think matviews without any scannability control are already a pretty useful feature, and one I'd not be ashamed to ship just like that for 9.3. But the scannability stuff is clearly going to need to be revisited. IMO, driving a stake in the ground at the exact spot that this stake is placed is not a good plan. If we simply remove that aspect altogether for now, I think we'll have more room to maneuver in future releases.

I apologize for not having griped about this earlier, but I've really paid no attention to the matviews work up to now; there are only so many cycles in a day.

			regards, tom lane
On Wed, Apr 3, 2013 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In fact, I'm going to go further and say that I do not like the entire
> concept of scannability, either as to design or implementation, and
> I think we should just plain rip it out.

This has been my feeling from the beginning, so I'm happy to support this position. I think the current version - where scan-ability is tracked in just one way - is an improvement over the previous version of the patch - where it was tracked in two different ways with a confusing shuffle of information from one place to the other. But my favorite number of places to track it would be zero.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 3, 2013 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In fact, I'm going to go further and say that I do not like the entire
>> concept of scannability, either as to design or implementation, and
>> I think we should just plain rip it out.

> This has been my feeling from the beginning, so I'm happy to support
> this position. I think the current version - where scan-ability is
> tracked in just one way - is an improvement over the previous version
> of the patch - where it was tracked in two different ways with a
> confusing shuffle of information from one place to the other. But my
> favorite number of places to track it would be zero.

To be clear, I think we'll end up tracking some notion of scannability eventually. I just don't think the current notion is sufficiently baked to want to promise to be upward-compatible with it in future.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In fact, I'm going to go further and say that I do not like the
>>> entire concept of scannability, either as to design or
>>> implementation, and I think we should just plain rip it out.
>
>> This has been my feeling from the beginning, so I'm happy to
>> support this position. I think the current version - where
>> scan-ability is tracked in just one way - is an improvement over
>> the previous version of the patch - where it was tracked in two
>> different ways with a confusing shuffle of information from one
>> place to the other. But my favorite number of places to track it
>> would be zero.
>
> To be clear, I think we'll end up tracking some notion of
> scannability eventually. I just don't think the current notion
> is sufficiently baked to want to promise to be upward-compatible
> with it in future.

To be honest, I don't think I've personally seen a single use case for matviews where they could be used if you couldn't count on an error if attempting to use them without the contents reflecting a materialization of the associated query at *some* point in time. There probably are such, but I think removing this entirely takes the percentage of use cases covered by the implementation in this release down from 10% to 2%.

Consider where the Wisconsin Courts use "home-grown" materialized views currently:

(1) On the public web site for circuit court data, visibility is based on supreme court rules and the advice of a committee consisting of judges, representatives of the press, defense attorneys, prosecuting attorneys, etc. There are cases in the database which, for one reason or another, should not show up on the public web site. On a weekly basis, where monitoring shows the lowest usage, the table of cases which are "too old" to be shown according to the rules thus determined is regenerated.
If there was the possibility that a dump and load could fail to fill this, and the queries would run without error, they could not move from ad hoc matview techniques to the new feature without excessive risk.

(2) Individual judges have a "dashboard" showing such things as the number of court cases which have gone beyond certain thresholds without action. They can "drill down" to detail so that cases which have "slipped through the cracks" can be scheduled for some appropriate action. "Justice delayed..." and all of that. It would be much better to get an error which would result in "information currently unavailable" than to give the impression that there are no such cases.

Since the main point of this patch is to get the basis for a more complete matview feature into place while still supporting *some* use cases, and a high priority needs to be placed on not painting ourselves into a corner, I agree we should rip this out if we think it does so.

I have spent some time looking at what we will want to add in future releases, and a more sophisticated concept of what is "fresh" enough to allow use of the materialized data is certainly on the list, and falling back to running the underlying query like a "normal" view would be a reasonable option to be able to choose in some cases; but I think that will *always* need to start with information about whether data *has* been generated, and an empty set has to be considered valid data if it has been. If we come up with a way to track that which isn't too fragile, I'm confident that it will remain useful as the feature evolves.

Making sure that the heap has at least one page if data has been generated seems like a not-entirely-unreasonable way to do that, although there remains at least one vacuum bug to fix if we keep it, in addition to Tom's concerns. It has the advantage of playing nicely with unlogged tables. If this is not going to be what we use long term, do we have a clue what is?
Personally, I think it would be sad to reduce the use cases for which matviews in 9.3 can be used to those which can tolerate an error-free reference to a matview for which data has not been generated, but I'll rip out support for that distinction if that is the consensus.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In fact, I'm going to go further and say that I do not like the
>>> entire concept of scannability, either as to design or
>>> implementation, and I think we should just plain rip it out.

> To be honest, I don't think I've personally seen a single use case
> for matviews where they could be used if you couldn't count on an
> error if attempting to use them without the contents reflecting a
> materialization of the associated query at *some* point in time.

Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED VIEW, that minimum requirement is satisfied no?

I wouldn't actually want to remove that option, because pg_dump will need it to avoid circular-reference problems. But if you simply don't use it then you have the minimum guarantee. And I do not see where the current implementation is giving you any more guarantees. What it *is* doing is setting a rather dubious behavioral precedent that we will no doubt hear Robert complaining about when (not if) we decide we don't want to be backward-compatible with it anymore.

Granting that throwing an error is actually of some use to some people, I would not think that people would want to turn it on via a command that throws away the existing view contents altogether, nor turn it off with a full-throated REFRESH. There are going to need to be ways to incrementally update matviews, and ways to disable/enable access that are not tied to a complete rebuild, not to mention being based on user-determined rather than hard-wired criteria for what's too stale. So I don't think this is a useful base to build on.

If you feel that scannability disable is an absolute must for version 0, let's invent a matview reloption or some such to implement it and let users turn it on and off as they wish. That seems a lot more likely to still be useful two years from now.
And if you're absolutely convinced that unlogged matviews mustn't work as I suggest, we can lose those from 9.3, too.

What I'd actually rather see us spending time on right now is making some provision for incremental updates, which I will boldly propose could be supported by user-written triggers on the underlying tables if we only diked out the prohibitions against INSERT/UPDATE/DELETE on matviews, and allowed them to operate on a matview's contents just like it was a table. Now admittedly that would foreclose allowing matviews to be updatable in the updatable-view sense, but that's a feature I would readily give up if it meant users could build incremental update mechanisms this year and not two years down the road.

> (1) On the public web site for circuit court data, visibility is
> based on supreme court rules and the advice of a committee
> consisting of judges, representatives of the press, defense
> attorneys, prosecuting attorneys, etc. There are cases in the
> database which, for one reason or another, should not show up on
> the public web site. On a weekly basis, where monitoring shows the
> lowest usage, the table of cases which are "too old" to be shown
> according to the rules thus determined is regenerated. If there
> was the possibility that a dump and load could fail to fill this,
> and the queries would run without error, they could not move from
> ad hoc matview techniques to the new feature without excessive
> risk.

Why exactly do you think that what I'm suggesting would cause a dump and reload to not regenerate the data? (Personally, I think pg_dump ought to have an option selecting whether or not to repopulate matviews, but again, if that's not what you want *don't use it*.)
> (2) Individual judges have a "dashboard" showing such things as the
> number of court cases which have gone beyond certain thresholds
> without action. They can "drill down" to detail so that cases
> which have "slipped through the cracks" can be scheduled for some
> appropriate action. "Justice delayed..." and all of that. It
> would be much better to get an error which would result in
> "information currently unavailable" than to give the impression
> that there are no such cases.

If you need 100% accuracy, which is what this scenario appears to be demanding, matviews are not for you (yet). The existing implementation certainly is nowhere near satisfying this scenario.

> ... Making sure that
> the heap has at least one page if data has been generated seems
> like a not-entirely-unreasonable way to do that, although there
> remains at least one vacuum bug to fix if we keep it, in addition
> to Tom's concerns.

No. This is an absolute disaster. It's taking something we have always considered to be an irrelevant implementation detail and making it into user-visible DDL state, despite the fact that it doesn't begin to satisfy basic transactional behaviors. We *need* to get rid of that aspect of things. If you must have scannability state in version 0, okay, but it has to be a catalog property not this.

> It has the advantage of playing nicely with
> unlogged tables. If this is not going to be what we use long term,
> do we have a clue what is?

I don't, but I don't think I'm required to propose something, given that (a) we can certainly ship 9.3 without unlogged matviews, and (b) you still haven't convinced me that the current semantics for unlogged matviews are a requirement anyway. Again, somebody who doesn't want his matviews to go to empty on crash simply shouldn't use an unlogged matview.

			regards, tom lane
2013/4/3 Tom Lane <tgl@sss.pgh.pa.us>:
> Kevin Grittner <kgrittn@ymail.com> writes:
>
>> To be honest, I don't think I've personally seen a single use case
>> for matviews where they could be used if you couldn't count on an
>> error if attempting to use them without the contents reflecting a
>> materialization of the associated query at *some* point in time.
>
> Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED
> VIEW, that minimum requirement is satisfied no?

An argument against that is that computing the contents may be very expensive.

> Granting that throwing an error is actually of some use to some people,
> I would not think that people would want to turn it on via a command
> that throws away the existing view contents altogether, nor turn it off
> with a full-throated REFRESH. There are going to need to be ways to
> incrementally update matviews, and ways to disable/enable access that
> are not tied to a complete rebuild, not to mention being based on
> user-determined rather than hard-wired criteria for what's too stale.
> So I don't think this is a useful base to build on.

Am I correct when I think that you are saying here, that the “zero pages == unscannable” logic is not very future-proof? In that case I concur, and I also think that this knowledge leaks in way too many other places (the VACUUM bug mentioned by Kevin is a good example).

> If you feel that scannability disable is an absolute must for version 0,
> let's invent a matview reloption or some such to implement it and let
> users turn it on and off as they wish. That seems a lot more likely
> to still be useful two years from now.

(In the context of making an unlogged matview unscannable after a crash:) Is it imaginable that such a reloption could (in a future implementation) be changed during or right after crash recovery?
For example, by storing the set of “truncated by crash recovery” relations in a shared catalog table, which is then inspected when connecting to a database to continue the truncation (in the case of a matview by making it unscannable)?

> And if you're absolutely convinced that unlogged matviews mustn't work as I
> suggest, we can lose those from 9.3, too.

+1. Having unlogged matviews without having incremental updates yet, isn’t super useful anyway.

> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table. Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

Please make the syntax for updating the “extent” (physical representation) of a matview different from updating the view’s logical contents. Examples:

(1) Require to use a special function to update the extent:

	SELECT pg_mv_maintain('INSERT INTO example_matview ...');

While parsing the INSERT, the parser would know that it must interpret “example_matview” as the matview’s extent; as currently the extent and the view are the same, nothing must be done except for only allowing the INSERT when it is parsed in the context of pg_mv_maintain, and otherwise saying that matviews aren’t updatable yet (“NOTICE: did you mean to update the extent? in that case use pg_mv_maintain”).

(2) Use a different schema (cf. TOAST) for the extent, e.g., view “public.example_matview” vs. extent “pg_mv_extent.example_matview”.
I imagine future implementations to possibly require multiple extents anyway, e.g., for storing the “not yet applied changesets” or other intermediate things.

> Why exactly do you think that what I'm suggesting would cause a dump and
> reload to not regenerate the data?

Expensiveness: matviews are used in cases where the data is expensive to compute.

> We *need* to get rid of that aspect of things. If you must have
> scannability state in version 0, okay, but it has to be a catalog property
> not this.

+1

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Early versions of the matview patch had a relisvalid flag in pg_class to determine whether the relation was scannable. The name was chosen based on a similarity to the purpose of indisvalid, although the proliferation of new bools for related issues left me wondering if a "char" would be a better idea. Based on on-list reviews I removed that in favor of basing the state on a zero-length heap file, in an attempt to work better with unlogged matviews. I can easily look back through my development branch to find the commits which made this change and revert them if this approach is preferred. I realize this would need to be Real Soon Now, but before reverting to the earlier code I want to allow a *little* more time for opinions.

Responses to particular points below.

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Granting that throwing an error is actually of some use to some people,
> I would not think that people would want to turn it on via a command
> that throws away the existing view contents altogether, nor turn it off
> with a full-throated REFRESH.

There is no doubt that in future versions we will want to be able to disallow scans based on other criteria than just whether the data is valid as of *some* point in time. I can't imagine a circumstance under which we would want to allow scans if it doesn't. So, at least for this release and as a default for future releases, I think it makes sense that a matview last CREATEd or REFRESHed WITH NO DATA should not be scannable. Additional knobs for the users to control this can and should be added in future releases.

> There are going to need to be ways to incrementally update
> matviews, and ways to disable/enable access that are not tied to
> a complete rebuild, not to mention being based on user-determined
> rather than hard-wired criteria for what's too stale.

Absolutely. So far as I know, nobody has ever suggested or expected otherwise.
This paper provides a useful summary of techniques for incremental updates, with references to more detailed papers: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf I expect to be working on implementing the most obvious special cases and a "catch-all" general solution for 9.4. Efficient incremental update seems to depend on the ability to have at least one new "hidden" system column, so once we get to a good point for discussing 9.4 issues, I'll be on about that. > If you feel that scannability disable is an absolute must for version 0, > let's invent a matview reloption or some such to implement it and let > users turn it on and off as they wish. That seems a lot more likely > to still be useful two years from now. What about the idea of a relisvalid bool or some "char" column in pg_class? > And if you're absolutely convinced that unlogged matviews mustn't > work as I suggest, we can lose those from 9.3, too. I was loath to do that, but as Nicolas points out, they really aren't that interesting without incremental update. Perhaps it is better to drop those until next release and have a more sophisticated way to deal with invalidation of those -- or as you suggested, just punt them to be empty. (I would hate to do that, but it might be the lesser of evils.) > What I'd actually rather see us spending time on right now is making > some provision for incremental updates, which I will boldly propose > could be supported by user-written triggers on the underlying tables > if we only diked out the prohibitions against INSERT/UPDATE/DELETE on > matviews, and allowed them to operate on a matview's contents just like > it was a table. Now admittedly that would foreclose allowing matviews > to be updatable in the updatable-view sense, but that's a feature I > would readily give up if it meant users could build incremental update > mechanisms this year and not two years down the road. 
I think that should be the last resort, after we have a more automated declarative way of maintaining the majority of cases. Since I don't see that using that technique with matviews would give you much more than you can already do with tables and triggers, hand-coding the incremental maintenance is very low on my list of priorities. In any event, I don't want to rush into any such thing this close to release; that seems to me to be clearly 9.4 material.

>> Individual judges have a "dashboard" showing such things as the
>> number of court cases which have gone beyond certain thresholds
>> without action

> If you need 100% accuracy, which is what this scenario appears to be
> demanding, matviews are not for you (yet). The existing implementation
> certainly is nowhere near satisfying this scenario.

No, we're talking about timelines measured in weeks or months. A nightly update is acceptable, although there are occasional gripes by a judge that they would like to see the results of cleaning up such cases sooner than the next morning. An hour latency would probably make them happy. If async incremental update could give a new view of things in five or ten minutes they would be overjoyed.

>> ... Making sure that the heap has at least one page if data has
>> been generated seems like a not-entirely-unreasonable way to do
>> that, although there remains at least one vacuum bug to fix if
>> we keep it, in addition to Tom's concerns.
>
> No. This is an absolute disaster. It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors. We *need* to get rid of that aspect of
> things.

OK, I'm convinced.

--
Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Noah Misch
Subject updated to account for the wider topics now appearing.

On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table. Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I can't see taking MVs in the direction of allowing arbitrary DML; that's what tables are for. Users wishing to do that should keep using current methods:

CREATE VIEW mv_impl AS SELECT ...;
CREATE TABLE mv AS SELECT * FROM mv_impl;

-- REFRESH analogue: fancier approaches exist
BEGIN;
TRUNCATE mv;
INSERT INTO mv SELECT * FROM mv_impl;
COMMIT;

If anything, I'd help these users by introducing mechanisms for obtaining a TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW. Namely, bulk index rebuilds, skipping WAL, and frozen tuples.

> > ... Making sure that
> > the heap has at least one page if data has been generated seems
> > like a not-entirely-unreasonable way to do that, although there
> > remains at least one vacuum bug to fix if we keep it, in addition
> > to Tom's concerns.
>
> No. This is an absolute disaster. It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors. We *need* to get rid of that aspect of
> things. If you must have scannability state in version 0, okay, but
> it has to be a catalog property not this.
In large part, this ended up outside the catalogs due to key limitations of the startup process. This isn't the first time we've arranged an unusual dance for this reason, and it probably won't be the last. Sure, we could take out unlogged MVs to evade the problem, but re-adding them will mean either restoring relfilenode-based bookkeeping or attacking the startup process limitation directly. There exist fundamental challenges to a direct attack, like the potential inconsistency of system catalogs themselves. We could teach pg_relation_is_scannable() that unlogged MVs are always non-scannable during recovery, then somehow update system catalogs in all databases at the end of recovery. Not a project for 9.3, and I wouldn't be surprised to see it mushroom in complexity. The currently-committed approach is a good one given the applicable constraints. A slight variation on the committed approach would be to add a "_scannable" relation fork. The fork would either be absent (not scannable if an MV) or empty (possibly-scannable). Create it in CREATE MATERIALIZED VIEW sans WITH NO DATA and in REFRESH MATERIALIZED VIEW. Remove it in TRUNCATE. When the startup process reinitializes an unlogged relation, it would also remove any _scannable fork. This has a few advantages over the current approach: VACUUM won't need a special case, and pg_upgrade will be in a better position to blow away all traces if we introduce a better approach. The disadvantage is an extra inode per healthy MV. (Though it does not lead to a 9.3 solution, I'll note that an always-present relation metapage would help here.) Note that I said "possibly-scannable" -- the relfilenode-based indicator (whether the committed approach or something else) doesn't need to remain the only input to the question of scannability. 
If 9.5 introduces the concept of age-based scannability expiration, the applicable timestamp could go in pg_class, and pg_relation_is_scannable() could check both that and the relfilenode-based indicator.

Concerning the original $SUBJECT, I would look at fixing the performance problem by having pg_relation_is_scannable() use an algorithm like this:

1. Grab the pg_class entry from syscache. If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry. Return NULL on failure.
4. Return the scannability as reported by the relcache.

This boils down to the CASE statement you noted upthread, except putting the fast-exit logic in the function rather than in its caller(s).

nm

--
Noah Misch EnterpriseDB http://www.enterprisedb.com
On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote: > 2013/4/3 Tom Lane <tgl@sss.pgh.pa.us>: > > And if you're absolutely convinced that unlogged matviews mustn't work as I > > suggest, we can lose those from 9.3, too. > > +1. Having unlogged matviews without having incremental updates yet, > isn't super useful anyway. I would have surmised the opposite: since an unlogged MV requires a full refresh at unpredictable moments, logged MVs will be preferred where a refresh is prohibitively expensive. Why might unlogged-MV applications desire incremental updates more acutely than logged-MV applications? -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Tom Lane
Noah Misch <noah@leadboat.com> writes: > On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote: >> No. This is an absolute disaster. It's taking something we have always >> considered to be an irrelevant implementation detail and making it into >> user-visible DDL state, despite the fact that it doesn't begin to satisfy >> basic transactional behaviors. We *need* to get rid of that aspect of >> things. If you must have scannability state in version 0, okay, but >> it has to be a catalog property not this. > In large part, this ended up outside the catalogs due to key limitations of > the startup process. Yeah, I realize that there's no other (easy) way to make unlogged matviews reset to an invalid state on crash, but that doesn't make this design choice less of a disaster. It boxes us into something that's entirely unable to support transitions between scannable and unscannable states by any means short of a complete rewrite of the matview contents; which seems fundamentally incompatible with any sort of incremental update scenario. And I remain of the opinion that it's going to box us into not being able to fix the problems because of pg_upgrade on-disk-compatibility issues. We will be far better off to drop unlogged matviews until we can come up with a better design. If that's so far off that no one can see it happening, well, that's tough. Leaving the door open for incremental maintenance is more important. > A slight variation on the committed approach would be to add a "_scannable" > relation fork. Not very transaction-safe, I think (consider crash midway through a transaction that adds or removes the fork), and in any case orders of magnitude more expensive than looking at a pg_class field. This really needs to be catalog state, not filesystem state. regards, tom lane
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Alvaro Herrera
Tom Lane wrote:

> Noah Misch <noah@leadboat.com> writes:
> > A slight variation on the committed approach would be to add a "_scannable"
> > relation fork.
>
> Not very transaction-safe, I think (consider crash midway through a
> transaction that adds or removes the fork), and in any case orders of
> magnitude more expensive than looking at a pg_class field. This really
> needs to be catalog state, not filesystem state.

We could revive the pg_class_nt patch proposed a decade ago, perhaps ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2013/4/5 Noah Misch <noah@leadboat.com>:

> On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote:
>
>> +1. Having unlogged matviews without having incremental updates yet,
>> isn't super useful anyway.
>
> I would have surmised the opposite: since an unlogged MV requires a full
> refresh at unpredictable moments, logged MVs will be preferred where a refresh
> is prohibitively expensive.

That sounds like a good reason to choose your matview to be logged indeed.

I.e., very expensive to rebuild → choose logged

The opposite is also true: If your matview is not so expensive to rebuild, why would it matter that much if it is logged? (Rebuilding would be a tad slower, but it is not that slow to start with, so who cares?)

I.e., not so expensive to rebuild → logged or unlogged are fine

This would mean “always choose logged,” except for the restricted case of “incremental updates of a matview that is not so expensive to rebuild” that I describe next:

> Why might unlogged-MV applications desire incremental updates more acutely
> than logged-MV applications?

My reasoning was more like: If you have incremental updates, there will probably be some overhead linked to executing any transaction that updates the base tables, namely for storing the changesets somewhere. I imagined it could at least be this storing of changesets that one would want to be unlogged, lest it slow down the commit of most transactions that don’t even touch the matview. (Note that losing any changeset is the same as losing the contents of the whole matview in the “always guaranteed to be 100% up-to-date when read” case.)

Of course, because of the previous reasoning, we should always make a matview logged if it is very expensive to rebuild. That leaves the case of a not-so-expensive matview: It is smart to make it unlogged when the changesets are stored in a way similar to what I just described.
Anyway, I conclude that (especially as we don’t have incremental updates yet), unlogged matviews are, as things stand now, not very useful.

As a sidenote, I see two ways to avoid storing changesets as part of a commit that changes the base tables. Choosing either of those would invalidate my previous logic, and diminish the need for unlogged matviews even further:

(1) WAL is used as the source for the changesets; Andres’ logical replication work comes to mind. Probably mostly useful for the “always guaranteed to be 100% up-to-date when read” case.

(2) The base tables are scanned and the xmin/xmax values are used to determine what changed since last time. This probably requires keeping VACUUM from removing rows that are still needed to determine how to change any matviews that depend on that base table. Probably mostly useful for the case where bulk-incremental updates are performed only sporadically, and where the base tables are not so big that they cannot usefully be scanned in full.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Noah Misch
On Thu, Apr 04, 2013 at 06:07:17PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote: > >> No. This is an absolute disaster. It's taking something we have always > >> considered to be an irrelevant implementation detail and making it into > >> user-visible DDL state, despite the fact that it doesn't begin to satisfy > >> basic transactional behaviors. We *need* to get rid of that aspect of > >> things. If you must have scannability state in version 0, okay, but > >> it has to be a catalog property not this. > > > In large part, this ended up outside the catalogs due to key limitations of > > the startup process. > > Yeah, I realize that there's no other (easy) way to make unlogged > matviews reset to an invalid state on crash, but that doesn't make this > design choice less of a disaster. It boxes us into something that's > entirely unable to support transitions between scannable and unscannable > states by any means short of a complete rewrite of the matview contents; > which seems fundamentally incompatible with any sort of incremental > update scenario. I said: > > [...] the relfilenode-based indicator > > (whether the committed approach or something else) doesn't need to remain the > > only input to the question of scannability. If 9.5 introduces the concept of > > age-based scannability expiration, the applicable timestamp could go in > > pg_class, and pg_relation_is_scannable() could check both that and the > > relfilenode-based indicator. To make that concrete, suppose we implement something you suggested upthread, "ALTER MATERIALIZED VIEW foo SET (scannable = false)". Implementing that by replacing the relfilenode with a single empty page or removing a _scannable fork would indeed be all wrong. The former blows away data you may wish to make reappear later, and the latter is non-transactional. But what has us do it that way? 
Store the new indicator in reloptions, where it belongs, and make pg_relation_is_scannable() test "relfilenode_says_scannable && reloptions_say_scannable". > And I remain of the opinion that it's going to box us > into not being able to fix the problems because of pg_upgrade > on-disk-compatibility issues. I said: > > [a "_scannable" fork] has a few advantages over the current approach: VACUUM > > won't need a special case, and pg_upgrade will be in a better position to blow > > away all traces if we introduce a better approach. pg_upgrade looks for _scannable forks, then translates their presence into the new representation. The committed single-empty-page strategy doesn't have an acute problem here, either, though. > > A slight variation on the committed approach would be to add a "_scannable" > > relation fork. > > Not very transaction-safe, I think (consider crash midway through a > transaction that adds or removes the fork) The SQL commands I cited as responsible for creating or removing the fork all make a new relfilenode anyway. Thus, "add" actually means creating the fork with the new relfilenode, and "remove" actually means omitting the fork from the new relfilenode. The association between relfilenodes and relations is, of course, transactional. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Kevin Grittner
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I realize that there's no other (easy) way to make unlogged
> matviews reset to an invalid state on crash, but that doesn't
> make this design choice less of a disaster. It boxes us into
> something that's entirely unable to support transitions between
> scannable and unscannable states by any means short of a complete
> rewrite of the matview contents; which seems fundamentally
> incompatible with any sort of incremental update scenario.

That assertion makes absolutely no sense. Once we design (in a future release cycle) a way for users to declare how current a view must be to be usable, there is no reason the pg_relation_is_scannable() function cannot be modified to use those mechanisms. I continue to believe that it is a bad idea to ever allow a matview to be scanned when it is not a materialization of its query, but that does *not* mean that all matviews that are a materialization of the query would need to be considered scannable. My working assumption was that the isscannable field in relcache would be the first of many tests once we get there; if the matview actually does represent some materialization of data, the function would then proceed to check whether it is current enough to use.

It does mean that you could not start incremental maintenance on a matview without first generating a base by running the query to create initial data (as part of CREATE or REFRESH), but that seems like a *good* thing to me. To do otherwise would make no more sense than to try to recover a database using just WAL files without a base backup to apply them to.

> And I remain of the opinion that it's going to box us into not
> being able to fix the problems because of pg_upgrade on-disk-
> compatibility issues.

This argument also seems bogus to me.
Since it is a valid heap either way, the *worst case* would be recommending that after upgrading users take some special action on any materialized views which were not scannable; and I doubt that we would need to do that. If you see some risk I'm missing, please elaborate. > We will be far better off to drop unlogged matviews until we can > come up with a better design. If that's so far off that no one > can see it happening, well, that's tough. Leaving the door open > for incremental maintenance is more important. I've been looking at what is needed for incremental maintenance, and I'm not seeing the problem. Since you can't incrementally update a view which has never been populated, the only way in which it could create a problem in terms of transactional behavior, I think, is that this would make it harder to not hold an AccessExclusiveLock when transitioning between not having materialized data and having it (or vice versa), and I'm dubious we can avoid such a lock anyway. I don't see where it creates any problems for performing incremental updates once we are in a state which can allow them. > This really needs to be catalog state, not filesystem state. That may be true, but the arguments in this post are so off-base that I'm wondering whether it really is. When I read some earlier posts I was convinced, but now I think I need to review the whole thread again to make sure I wasn't too quick to concede the point. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Noah Misch <noah@leadboat.com> wrote: > On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote: >> +1. Having unlogged matviews without having incremental updates >> yet, isn't super useful anyway. > > I would have surmised the opposite Hmm. I was thinking about the fact that a full refresh can be unlogged anyway, but that's only true if you're not using WAL for replication. If you need a matview on the master but not on the replica(s), then there is still benefit to declaring it to be unlogged in the absence of incremental maintenance. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Kevin Grittner
Noah Misch <noah@leadboat.com> wrote: > The SQL commands I cited as responsible for creating or removing > the fork all make a new relfilenode anyway. Thus, "add" actually > means creating the fork with the new relfilenode, and "remove" > actually means omitting the fork from the new relfilenode. The > association between relfilenodes and relations is, of course, > transactional. The same argument applies to the currently-committed code. The goal is to not change a matview between zero-length and non-zero length once the heap exists; but to only have this state change when REFRESH replaces the heap in the style of CLUSTER, TRUNCATE, ALTER TABLE with heap rewrite, or the new VACUUM FULL. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Noah Misch
On Fri, Apr 05, 2013 at 07:00:38AM -0700, Kevin Grittner wrote: > Noah Misch <noah@leadboat.com> wrote: > > > The SQL commands I cited as responsible for creating or removing > > the fork all make a new relfilenode anyway. Thus, "add" actually > > means creating the fork with the new relfilenode, and "remove" > > actually means omitting the fork from the new relfilenode. The > > association between relfilenodes and relations is, of course, > > transactional. > > The same argument applies to the currently-committed code. True. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Kevin Grittner
Kevin Grittner <kgrittn@ymail.com> wrote:

> I think I need to review the whole thread again to make sure I
> wasn't too quick to concede the point.

On a fresh reading of this, I think a large part of what is at issue here stems from a bad name for the new bool field I added to the RelationData structure -- instead of rd_isscannable it should probably be called something like rd_ispopulated. The current name led to some fuzzy thinking on my part when it was referenced, and both the name and a couple ill-considered uses of it probably contributed to concerns about how it was generated and used.

I will post a draft patch today to see whether concerns abate. Basically, the name change should help make clear that this is not intended to be the only way to determine whether a matview is scannable. Second, there is at least one (and probably more) direct test of this field which should use a function for a scannability test. For 9.3, that will just wrap a test of this bool, but it makes clear what the longer-term intent is, and helps ensure that things don't get missed when patches are written in later releases. Third, some comments need to be corrected and added.

Hopefully it can help get us all onto the same page. If not, it should at least better focus the discussion.

--
Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
From: Kevin Grittner
Kevin Grittner <kgrittn@ymail.com> wrote:

> I think a large part of what is at issue here stems from a bad
> name for the new bool field I added to the RelationData structure
> -- instead of rd_isscannable it should probably be called
> something like rd_ispopulated. The current name led to some
> fuzzy thinking on my part when it was referenced, and both the
> name and a couple ill-considered uses of it probably contributed
> to concerns about how it was generated and used.
>
> I will post a draft patch today to see whether concerns abate.
> Basically, the name change should help make clear that this is
> not intended to be the only way to determine whether a matview is
> scannable. Second, there is at least one (and probably more)
> direct test of this field which should use a function for a
> scannability test. For 9.3, that will just wrap a test of this
> bool, but it makes clear what the longer-term intent is, and
> helps ensure that things don't get missed when patches are
> written in later releases. Third, some comments need to be
> corrected and added.
>
> Hopefully it can help get us all onto the same page. If not, it
> should at least better focus the discussion.

Attached is a first cut at drawing a bright line between the notion of whether a matview has been *populated* and whether it is *scannable*. In 9.3 one is true if and only if the other is, but I plan on doing work such that this will no longer be true in the next release, and not making a clear distinction now has been confusing everyone, including me.

This patch is light on functional changes, and heavier on name and comment changes. It does fix the lack of locking for the user-visible pg_relation_is_scannable() function, and it does correct the performance regression in pg_dump, but those should be the only user-visible changes. Internally I also tried to correct a modularity violation complained of by Tom.
Vacuum still needs to be taught not to truncate away the first page of a matview, but that seemed like it was better left for a separate patch; and if we decide not to use the current technique for identifying a non-populated matview, it would be one more thing to undo.

It passed `make check-world` and `make installcheck-world`, and a dump and load of the regression database works. I got sidetracked with some support issues, so I didn't have time to dig around for other weak comments, but I think I'm close on all other changes.

Comments?

--
Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, Apr 05, 2013 at 11:17:30AM +0200, Nicolas Barbier wrote: > 2013/4/5 Noah Misch <noah@leadboat.com>: > > On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote: > >> +1. Having unlogged matviews without having incremental updates yet, > >> isn't super useful anyway. > > > > I would have surmised the opposite: since an unlogged MV requires a full > > refresh at unpredictable moments, logged MVs will be preferred where a refresh > > is prohibitively expensive. > > That sounds like a good reason to choose your matview to be logged indeed. > > I.e., very expensive to rebuild → choose logged > > The opposite is also true: If your matview is not so expensive to > rebuild, why would it matter that much if it is logged? (Rebuilding > would be a tad slower, but it is not that slow to start with, so who > cares?) > > I.e., not so expensive to rebuild → logged or unlogged are fine > > This would mean “always choose logged,” except for the restricted case > of “incremental updates of a matview that is not so expensive to > rebuild” that I describe next: Cheap is good, but cheaper is better. Since a refresh locks out readers, mild wins do count. > > Why might unlogged-MV applications desire incremental updates more acutely > > than logged-MV applications? > > My reasoning was more like: If you have incremental updates, there > will probably be some overhead linked to executing any transaction > that updates the base tables, namely for storing the changesets > somewhere. I imagined it could at least be this storing of changesets > that one would want to be unlogged, lest it slowing down the commit of > most transactions that don’t even touch the matview. That's a good point. However, the same holds when refreshing a logged MV, especially at wal_level > minimal. The refresh exerts pressure on the WAL stream, likely delaying other WAL-using transactions. > As a sidenote, I see two ways to avoid storing changesets as part of a > commit that changes the base tables. 
Choosing any of those would > invalidate my previous logic, and even more diminish the need for > unlogged matviews: Regardless of the change storage method, actually applying incremental changes to a logged MV will require WAL. UNLOGGED MVs will stay relevant. -- Noah Misch EnterpriseDB http://www.enterprisedb.com