Thread: ALTER TABLE lock strength reduction patch is unsafe
If you set up a pgbench test case that hits the database with a lot of
concurrent selects and non-exclusive-locking ALTER TABLEs, 9.1 soon falls
over. For example:

$ cat foo.script
alter table pgbench_accounts set (fillfactor = 100);
SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
$ createdb bench
$ pgbench -i -s 10 bench
...
$ pgbench -c 50 -t 1000000 -f foo.script bench
starting vacuum...end.
Client 10 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 5 aborted in state 1: ERROR:  cache lookup failed for relation 46260
Client 44 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 3 aborted in state 1: ERROR:  relation "pgbench_accounts" does not exist
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 45 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 15 aborted in state 1: ERROR:  cache lookup failed for relation 46260
Client 34 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 43 aborted in state 1: ERROR:  cache lookup failed for relation 46260
Client 49 aborted in state 1: ERROR:  relation "pgbench_accounts" does not exist
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 12 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 23 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 14 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
Client 6 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 11 aborted in state 1: ERROR:  could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
                             ^
Client 4 aborted in state 0: ERROR:  relation "pgbench_accounts" does not exist
... etc etc ...

On my four-core workstation, the failures are infrequent at up to 30
clients but come pretty fast and furious at 50.

What is happening here is this:

1. Some backend commits an ALTER TABLE and sends out an sinval message.

2. In response, some other backend starts to reload its relcache entry
for pgbench_accounts when it begins its next command. It does an
indexscan with SnapshotNow on pg_class to find the updated pg_class row.

3. Meanwhile, some third backend commits another ALTER TABLE, updating
the pg_class row another time. Since we have removed the
AccessExclusiveLock that all variants of ALTER TABLE used to take, this
commit can happen while backend #2 is in process of scanning pg_class.

4. Backend #2 visits the new, about-to-be-committed version of
pgbench_accounts' pg_class row just before backend #3 commits. It sees
the row as not good and keeps scanning. By the time it reaches the
previous version of the row, however, backend #3 *has* committed. So
that version isn't good according to SnapshotNow either.

5. Thus, backend #2 fails to find any version of the pg_class row that
satisfies SnapshotNow, and it reports an error.

Depending on just when this happens during the cache load process, you
can get any of the errors displayed above, or probably some other ones.
The particular case I'm showing here only updates pg_class, but other
non-exclusive-lock variants of ALTER TABLE can probably provoke similar
failures with respect to other catalogs, leading to yet different
misbehaviors.

In typical cases where both versions of the row are on the same page, the
window for the concurrent commit to happen is very narrow --- that's why
you need so many clients to make it happen easily. With enough clients
there's a good chance of losing the CPU between tuple visits. But of
course Murphy's Law says this will happen in production situations even
if the load isn't so high.

I believe that this is fundamentally unavoidable so long as we use
SnapshotNow to read catalogs --- which is something we've talked about
changing, but it will require a pretty major R&D effort to make it
happen. In the meantime, we have to go back to using AccessExclusiveLock
for table alterations. It doesn't help to have a lower lock level if
that means that concurrent transactions will unpredictably fail instead
of waiting.

			regards, tom lane
On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> In typical cases where both versions of the row are on the same page,
> the window for the concurrent commit to happen is very narrow --- that's
> why you need so many clients to make it happen easily. With enough
> clients there's a good chance of losing the CPU between tuple visits.
> But of course Murphy's Law says this will happen in production
> situations even if the load isn't so high.

Thanks for doing the research on this. Much better we know now than
enter production like this.

> I believe that this is fundamentally unavoidable so long as we use
> SnapshotNow to read catalogs --- which is something we've talked about
> changing, but it will require a pretty major R&D effort to make it
> happen. In the meantime, we have to go back to using
> AccessExclusiveLock for table alterations. It doesn't help to have
> a lower lock level if that means that concurrent transactions will
> unpredictably fail instead of waiting.

If we were to change ALTER TABLE so it takes a session lock rather than a
normal lock, then we could commit the change and then wait until the
relcache invalidations have been received before we release the lock.
That way we would be able to avoid the concurrency issues you describe.

Or alternatively we could just re-scan if we can't find a valid row when
building the cache. We have time in the failure path...

Do you have the stomach for any of this in 9.1?
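In very rough terms, the session-lock idea might look like this (an
untested sketch; WaitForRelcacheInvalidationAck() does not exist and is
the piece we would have to write):

    LockRelId   relid = rel->rd_lockInfo.lockRelId;

    LockRelationIdForSession(&relid, ShareUpdateExclusiveLock);
    /* ... perform the ALTER TABLE work; commit the transaction ... */
    WaitForRelcacheInvalidationAck(relid.relId);    /* hypothetical */
    UnlockRelationIdForSession(&relid, ShareUpdateExclusiveLock);

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services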
On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I believe that this is fundamentally unavoidable so long as we use > SnapshotNow to read catalogs --- which is something we've talked about > changing, but it will require a pretty major R&D effort to make it > happen. In the meantime, we have to go back to using > AccessExclusiveLock for table alterations. It doesn't help to have > a lower lock level if that means that concurrent transactions will > unpredictably fail instead of waiting. Ouch. I wonder if we could avoid this anomaly by taking a throwaway MVCC snapshot at the beginning of each system catalog scan and using it just for the duration of that scan. If nothing that has touched the catalog commits while the scan is open, then this is logically equivalent to SnapshotNow. If something does commit in mid-scan, then we might not get the latest version of the row, but we should end up with exactly one. If it's not the latest one, we'll do the rebuild again upon seeing the next sinval message; in the meantime, the version we're using mustn't be too intolerably bad or it was an error not to use AccessExclusiveLock in the first place. IIUC, the problem with this approach is not correctness but performance. Taking snapshots is (currently) expensive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
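For a pg_class rebuild scan, the shape of the thing would be roughly this
(a sketch only, assuming the usual scan-key setup for the rel and key
variables; not tested):

    Snapshot    snap;

    /* fresh MVCC snapshot, used only for this one catalog scan */
    snap = RegisterSnapshot(GetLatestSnapshot());
    scan = systable_beginscan(rel, ClassOidIndexId, true, snap, 1, key);
    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        /* ... rebuild the cache entry from tup ... */
    }
    systable_endscan(scan);
    UnregisterSnapshot(snap);

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company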
On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2. In response, some other backend starts to reload its relcache entry
> for pgbench_accounts when it begins its next command. It does an
> indexscan with SnapshotNow on pg_class to find the updated pg_class row.
>
> 3. Meanwhile, some third backend commits another ALTER TABLE, updating
> the pg_class row another time. Since we have removed the
> AccessExclusiveLock that all variants of ALTER TABLE used to take, this
> commit can happen while backend #2 is in process of scanning pg_class.

This part is the core of the problem: we must not be able to update the
catalog entry while a relcache rebuild scan is in place.

So I'm prototyping something that allows

    LockRelationDefinitionOid(targetRelId, ShareLock);

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 9:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> 2. In response, some other backend starts to reload its relcache entry
>> for pgbench_accounts when it begins its next command. It does an
>> indexscan with SnapshotNow on pg_class to find the updated pg_class row.
>>
>> 3. Meanwhile, some third backend commits another ALTER TABLE, updating
>> the pg_class row another time. Since we have removed the
>> AccessExclusiveLock that all variants of ALTER TABLE used to take, this
>> commit can happen while backend #2 is in process of scanning pg_class.
>
> This part is the core of the problem: we must not be able to update the
> catalog entry while a relcache rebuild scan is in place.
>
> So I'm prototyping something that allows
>
>    LockRelationDefinitionOid(targetRelId, ShareLock);

Similar to the way we lock a relation for extension, as a sub-lock of the
main relation lock. Relcache rebuilds use a ShareLock, ALTER TABLE uses
an ExclusiveLock.

I've written the patch, just need to test later today... gotta step out
now.
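Something like this, with all names tentative (modeled on
SET_LOCKTAG_RELATION_EXTEND; LOCKTAG_RELATION_DEFINITION would be a new
LockTagType member that doesn't exist yet):

    #define SET_LOCKTAG_RELATION_DEFINITION(locktag, dboid, reloid) \
        ((locktag).locktag_field1 = (dboid), \
         (locktag).locktag_field2 = (reloid), \
         (locktag).locktag_field3 = 0, \
         (locktag).locktag_field4 = 0, \
         (locktag).locktag_type = LOCKTAG_RELATION_DEFINITION, \
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)

    void
    LockRelationDefinitionOid(Oid relid, LOCKMODE lockmode)
    {
        LOCKTAG     tag;

        SET_LOCKTAG_RELATION_DEFINITION(tag, MyDatabaseId, relid);
        (void) LockAcquire(&tag, lockmode, false, false);
    }

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services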
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I believe that this is fundamentally unavoidable so long as we use
>> SnapshotNow to read catalogs --- which is something we've talked about
>> changing, but it will require a pretty major R&D effort to make it
>> happen.

> Ouch.

> I wonder if we could avoid this anomaly by taking a throwaway MVCC
> snapshot at the beginning of each system catalog scan and using it
> just for the duration of that scan. If nothing that has touched the
> catalog commits while the scan is open, then this is logically
> equivalent to SnapshotNow. If something does commit in mid-scan, then
> we might not get the latest version of the row, but we should end up
> with exactly one. If it's not the latest one, we'll do the rebuild
> again upon seeing the next sinval message; in the meantime, the
> version we're using mustn't be too intolerably bad or it was an error
> not to use AccessExclusiveLock in the first place.

Yeah, this seems like a possibly workable direction to explore. I like
this better than what Simon is proposing, because it would fix the
generic issue for all types of catalog SnapshotNow scans.

> IIUC, the problem with this approach is not correctness but
> performance. Taking snapshots is (currently) expensive.

Yeah. After mulling it for awhile, what about this idea: we could
redefine SnapshotNow as a snapshot type that includes a list of
transactions-in-progress, somewhat like an MVCC snapshot, but we don't
fill that list from the PGPROC array. Instead, while running a scan with
SnapshotNow, anytime we determine that a particular XID is
still-in-progress, we add that XID to the snapshot's list. Subsequently,
the SnapshotNow code assumes that XID to be still-in-progress without
consulting its actual state. We reset the XID list to empty when
starting a new SnapshotNow scan. (We might be able to do so less often
than that, like only when we do AcceptInvalidationMessages, but it's not
clear to me that there's any real benefit in hanging onto the state
longer.)

This costs no performance; if anything it should be faster than now,
because we'll be replacing expensive transaction state probes with
relatively-cheap searches of an XID array that should almost always be
quite short.

With this approach, we would have no serialization anomalies from single
transactions committing while a scan is in progress. There could be
anomalies resulting from considering an earlier XID to be in-progress
while a later XID is considered committed (because we didn't observe it
until later). So far as I can see offhand, the impact of that would be
that there might be multiple versions of a tuple that are considered
good, but never that there would be no version considered good (so long
as the other XIDs simply updated the tuple and didn't delete it). I
think this would be all right, since the scan would just seize on the
first good version it finds. As you argue above, if that's not good
enough for our purposes then the updater(s) should have taken a stronger
lock.

I am not, however, particularly pleased with the idea of trying to make
this work in 9.1. I still think that we should back off the attempt to
reduce lock strength in 9.1, and take it up for 9.2. We need to be
stabilizing 9.1 for release, not introducing new untested mechanisms in
it.
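To be concrete, the core of the visibility change would be about like
this (pseudo-code only; the now_xip field and the list helpers are
invented names):

    /* In the SnapshotNow tuple-visibility code, wherever we currently
     * test whether an XID is still running: */
    static bool
    XidConsideredRunning(Snapshot snapshot, TransactionId xid)
    {
        /* once seen in progress during this scan, always in progress */
        if (XidListMember(snapshot->now_xip, xid))
            return true;
        if (TransactionIdIsInProgress(xid))
        {
            XidListAdd(snapshot->now_xip, xid);
            return true;
        }
        return false;
    }

			regards, tom lane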
On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, this seems like a possibly workable direction to explore. I like
> this better than what Simon is proposing, because it would fix the
> generic issue for all types of catalog SnapshotNow scans.

It would also avoid adding more lock manager traffic which - as recently
discussed on the relevant threads - turns out to be a significant
performance bottleneck for us right now on some workloads.

>> IIUC, the problem with this approach is not correctness but
>> performance. Taking snapshots is (currently) expensive.
>
> Yeah. After mulling it for awhile, what about this idea: we could
> redefine SnapshotNow as a snapshot type that includes a list of
> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
> fill that list from the PGPROC array. Instead, while running a scan
> with SnapshotNow, anytime we determine that a particular XID is
> still-in-progress, we add that XID to the snapshot's list.
> Subsequently, the SnapshotNow code assumes that XID to be
> still-in-progress without consulting its actual state. We reset the XID
> list to empty when starting a new SnapshotNow scan. (We might be able
> to do so less often than that, like only when we do
> AcceptInvalidationMessages, but it's not clear to me that there's any
> real benefit in hanging onto the state longer.)

I think that something like that might possibly work, but what if the
XID array overflows?

A while back I proposed the idea of a "lazy" snapshot, by which I had in
mind something similar to what you are suggesting but different in
detail (see the sketch below). Initially, when asked to acquire a
snapshot, the snapshot manager acknowledges having taken one but does not
actually do any work. As long as it sees only XIDs that either precede
the oldest XID still running anywhere in the cluster, or have aborted, it
can provide answers that are 100% correct without any further data. If
it ever sees a newer, non-aborted XID then it goes and really gets an
MVCC snapshot at that point, which it can use from that point onward. I
think that it might be possible to make such a system work even for MVCC
snapshots generally, but even if not, it might be sufficient for this
purpose. Unlike your approach, it would avoid both the "see no rows" and
the "see multiple rows" cases, which might be thought an advantage.

> I am not, however, particularly pleased with the idea of trying to make
> this work in 9.1. I still think that we should back off the attempt
> to reduce lock strength in 9.1, and take it up for 9.2. We need to be
> stabilizing 9.1 for release, not introducing new untested mechanisms in
> it.

I like this feature a lot, but it's hard to imagine that any of the fixes
anyone has so far suggested can be implemented without collateral damage.
Nor is there any certainty that this is the last bug.
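To flesh out the lazy-snapshot idea a little, the visibility test might
look roughly like this (a hand-wavy sketch; the LazySnapshot structure
and its helpers are invented names):

    static bool
    XidIsRunningLazy(LazySnapshot *snap, TransactionId xid)
    {
        /* XIDs older than the oldest XID running anywhere in the
         * cluster, or known-aborted XIDs, need no snapshot at all */
        if (TransactionIdPrecedes(xid, snap->cluster_oldest_xid) ||
            TransactionIdDidAbort(xid))
            return false;

        /* first interesting XID: now really take the MVCC snapshot */
        if (!snap->materialized)
            MaterializeLazySnapshot(snap);

        return XidInMVCCSnapshot(xid, &snap->mvcc);
    }

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company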
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah. After mulling it for awhile, what about this idea: we could
>> redefine SnapshotNow as a snapshot type that includes a list of
>> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
>> fill that list from the PGPROC array. Instead, while running a scan
>> with SnapshotNow, anytime we determine that a particular XID is
>> still-in-progress, we add that XID to the snapshot's list.

> I think that something like that might possibly work, but what if the
> XID array overflows?

Well, you repalloc it bigger. In either this idea or yours below, I fear
SnapshotNow snaps will have to become dynamically-allocated structures
instead of being simple references to a shared constant object. (This is
because we can sometimes do a SnapshotNow scan when another one is
already in progress, and we couldn't let the inner one change the outer
one's state.) That's not really a performance problem; one more palloc
to do a catalog scan is a non-issue. But it is likely to be a large
notational change compared to what we've got now.

> A while back I proposed the idea of a "lazy" snapshot, by which I had
> in mind something similar to what you are suggesting but different in
> detail. Initially, when asked to acquire a snapshot, the snapshot
> manager acknowledges having taken one but does not actually do any
> work. As long as it sees only XIDs that either precede the oldest XID
> still running anywhere in the cluster, or have aborted, it can provide
> answers that are 100% correct without any further data. If it ever
> sees a newer, non-aborted XID then it goes and really gets an MVCC
> snapshot at that point, which it can use from that point onward. I
> think that it might be possible to make such a system work even for
> MVCC snapshots generally, but even if not, it might be sufficient for
> this purpose. Unlike your approach, it would avoid both the "see no
> rows" and the "see multiple rows" cases, which might be thought an
> advantage.

Hmm, yeah, I think this idea is probably better than mine, just because
of the less dubious semantics. I don't see how you'd make it work for
generic MVCC scans, because the behavior will be "the database state as
of some hard-to-predict time after the scan starts", which is not what we
want for MVCC. But it ought to be fine for SnapshotNow.

			regards, tom lane
On Fri, Jun 17, 2011 at 7:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, this seems like a possibly workable direction to explore. I like
>> this better than what Simon is proposing, because it would fix the
>> generic issue for all types of catalog SnapshotNow scans.
>
> It would also avoid adding more lock manager traffic which - as
> recently discussed on the relevant threads - turns out to be a
> significant performance bottleneck for us right now on some workloads.

>> I am not, however, particularly pleased with the idea of trying to make
>> this work in 9.1. I still think that we should back off the attempt
>> to reduce lock strength in 9.1, and take it up for 9.2. We need to be
>> stabilizing 9.1 for release, not introducing new untested mechanisms in
>> it.
>
> I like this feature a lot, but it's hard to imagine that any of the
> fixes anyone has so far suggested can be implemented without
> collateral damage. Nor is there any certainty that this is the last
> bug.

Not so. The extra locking would only occur on the first lock acquisition
after DDL operations occur. If that was common, then your other
performance patch would not be an effective optimisation. There is no
additional locking from what I've proposed in the common code path -
that's why we have a relcache.

Any effects from the additional locking will only be felt by people
issuing a stream of DDL statements against a table, even assuming there
are any effects of real note. So there is no "collateral damage", and
additional locking is a viable solution for 9.1.

It's possible that we may have a better solution in 9.2+, but we've said
that before and had it never happen, many times.

Having spent a few hours mulling through this, I think there is a
reasonable solution for 9.1 and I continue to work on it.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm, yeah, I think this idea is probably better than mine, just because
> of the less dubious semantics. I don't see how you'd make it work for
> generic MVCC scans, because the behavior will be "the database state as
> of some hard-to-predict time after the scan starts", which is not what
> we want for MVCC. But it ought to be fine for SnapshotNow.

Department of second thoughts: I think I see a problem.

Suppose we have a tuple that has not been updated for a long time. Its
XMIN is committed and all-visible, and its XMAX is invalid. As we're
scanning the table (transaction T1), we see that tuple and say, oh, it's
visible. Now, another transaction (T2) begins, updates the tuple, and
commits. Our scan then reaches the page where the new tuple is located,
and says, oh, this is recent, I'd better take a real snapshot. Of
course, the new snapshot can see the new version of the tuple, too.

Of course, if T1 had taken its snapshot before starting the scan, the
second tuple would have been invisible. But since we didn't take it
until later, after T2 had already committed, we see a duplicate.

That's still no worse than your idea, which rests on the theory that
duplicates don't matter anyway, but the case for it being better is a lot
thinner. I'd sure prefer something that had less crazy semantics than
either of these ideas, if we can think of something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 17, 2011 at 3:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Not so. The extra locking would only occur on the first lock
> acquisition after DDL operations occur. If that was common, then your
> other performance patch would not be an effective optimisation. There
> is no additional locking from what I've proposed in the common code
> path - that's why we have a relcache.

The extra locking would also occur when *initially* building relcache
entries. In other words, this would increase - likely quite
significantly - the overhead of backend startup. It's not going to be
sufficient to do this just for pg_class; I think you'll have to do it for
pg_attribute, pg_attrdef, pg_constraint, pg_index, pg_trigger,
pg_rewrite, and maybe a few others I'm not thinking of right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 17, 2011 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2011 at 3:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Not so. The extra locking would only occur on the first lock
>> acquisition after DDL operations occur. If that was common, then your
>> other performance patch would not be an effective optimisation. There
>> is no additional locking from what I've proposed in the common code
>> path - that's why we have a relcache.
>
> The extra locking would also occur when *initially* building relcache
> entries. In other words, this would increase - likely quite
> significantly - the overhead of backend startup. It's not going to be
> sufficient to do this just for pg_class; I think you'll have to do it
> for pg_attribute, pg_attrdef, pg_constraint, pg_index, pg_trigger,
> pg_rewrite, and maybe a few others I'm not thinking of right now.

Nothing you say here is accurate, regrettably.

The "extra locking" would be one call to the lock manager per relation,
taken in shared mode, so it doesn't block. I see no reason at all to
have separate locks for each catalog table, since it's the relation lock
that is the top level lock.

Locking is a very well known solution to such problems. We use it
everywhere and we can use it here, and now.

I think you'd better wait to see the patch.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
> Department of second thoughts: I think I see a problem.

Um, yeah, so that doesn't really work any better than my idea.

On further reflection, there's a problem at a higher level than this
anyway. Even if we can get a single SnapshotNow scan to produce
guaranteed-self-consistent results, that doesn't ensure consistency
between the results of scans occurring serially. An example here is
ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
writers. However, suppose that a concurrent relcache load fetches the
pg_attribute row, notes that it has atthasdef = true, and then the ALTER
commits before we start to scan pg_attrdef. The consistency checks in
AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
rightly so. We could lobotomize those checks, but it doesn't feel right
to do so; and anyway there may be other cases that are harder to kluge
up.

So really we need consistency across *at least* one entire relcache load
cycle. We could maybe arrange to take an MVCC snap (or some lighter
weight version of that) at the start, and use that for all the resulting
scans, but I think that would be notationally messy. It's not clear that
it'd solve everything anyhow. There are parts of a relcache entry that
we fetch only on-demand, so they are typically loaded later than the core
items, and probably couldn't use the same snapshot. Worse, there are
lots of places where we assume that use of catcache entries or direct
examination of the catalogs will yield results consistent with the
relcache.

I suspect these latter problems will impact Simon's idea as well.

			regards, tom lane
On Fri, Jun 17, 2011 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Department of second thoughts: I think I see a problem.
>
> Um, yeah, so that doesn't really work any better than my idea.
>
> On further reflection, there's a problem at a higher level than this
> anyway. Even if we can get a single SnapshotNow scan to produce
> guaranteed-self-consistent results, that doesn't ensure consistency
> between the results of scans occurring serially. An example here is
> ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
> writers. However, suppose that a concurrent relcache load fetches the
> pg_attribute row, notes that it has atthasdef = true, and then the ALTER
> commits before we start to scan pg_attrdef. The consistency checks in
> AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
> rightly so. We could lobotomize those checks, but it doesn't feel right
> to do so; and anyway there may be other cases that are harder to kluge up.
>
> So really we need consistency across *at least* one entire relcache load
> cycle. We could maybe arrange to take an MVCC snap (or some lighter
> weight version of that) at the start, and use that for all the resulting
> scans, but I think that would be notationally messy. It's not clear
> that it'd solve everything anyhow. There are parts of a relcache entry
> that we fetch only on-demand, so they are typically loaded later than
> the core items, and probably couldn't use the same snapshot. Worse,
> there are lots of places where we assume that use of catcache entries or
> direct examination of the catalogs will yield results consistent with
> the relcache.
>
> I suspect these latter problems will impact Simon's idea as well.

I suspect we're going to be told that they don't. I suspect I'm not
going to believe it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Tom Lane's message of Fri Jun 17 13:22:40 -0400 2011:
> With this approach, we would have no serialization anomalies from single
> transactions committing while a scan is in progress. There could be
> anomalies resulting from considering an earlier XID to be in-progress
> while a later XID is considered committed (because we didn't observe
> it until later). So far as I can see offhand, the impact of that would
> be that there might be multiple versions of a tuple that are considered
> good, but never that there would be no version considered good (so long
> as the other XIDs simply updated the tuple and didn't delete it). I
> think this would be all right, since the scan would just seize on the
> first good version it finds. As you argue above, if that's not good
> enough for our purposes then the updater(s) should have taken a stronger
> lock.

Hmm, would there be a problem if a scan on catalog A yields results from
supposedly-running transaction X but another scan on catalog B yields
results from transaction Y (X != Y)? For example, a scan on pg_class
says that there are N triggers but scanning pg_trigger says N-1?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes:
> I like this feature a lot, but it's hard to imagine that any of the
> fixes anyone has so far suggested can be implemented without
> collateral damage. Nor is there any certainty that this is the last
> bug.

And in fact, here's something else to worry about: consider pg_dump.
pg_dump is pretty heavily reliant on backend catalog-interpretation code
(such as ruleutils.c) that mostly runs on SnapshotNow time. But it also
does a fair amount of work on the basis of its own inspection of the
catalogs, which is done according to the serializable snapshot it gets at
the beginning of the dump run. If these two views of a table's schema
aren't consistent, you might get a pg_dump error, but it's at least as
likely that you'll get a silently incorrect dump.

pg_dump tries to minimize the risk by taking AccessShareLock right away
on each table it's going to dump. This is not perfect but it at least
results in a narrow window for conflicting table changes to occur.
However, that strategy has been blown out of the water by the ALTER TABLE
lock strength reduction. There is now a *very* wide window for
concurrent ALTERs to occur and possibly break the dump results.

As far as I can see, the only simple way to return pg_dump to its
previous level of safety while retaining this patch is to make it take
ShareUpdateExclusiveLocks, so that it will still block all forms of ALTER
TABLE. This is rather unpleasant, since it will also block autovacuum
for the duration of the dump.

In the long run, we really ought to fix things so that ruleutils.c runs
on the transaction snapshot, but that's a massive rewrite that is
certainly not getting done for 9.1, and will likely result in
considerable code bloat :-(.

(BTW, I just noticed that dumpSchema does a pretty fair amount of work
before it gets around to calling getTables, which is where said locks get
taken. Seems like we'd better rearrange the order of operations
there...)
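To be concrete about the ShareUpdateExclusiveLock point: pg_dump would
have to take its per-table locks with the equivalent of

    LOCK TABLE sometable IN SHARE UPDATE EXCLUSIVE MODE;

rather than its traditional

    LOCK TABLE sometable IN ACCESS SHARE MODE;

(sometable standing in for whichever table it's about to dump).

			regards, tom lane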
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, would there be a problem if a scan on catalog A yields results from
> supposedly-running transaction X but another scan on catalog B yields
> results from transaction Y (X != Y)? For example, a scan on pg_class
> says that there are N triggers but scanning pg_trigger says N-1?

Yeah, I came to that same conclusion downthread.

			regards, tom lane
Excerpts from Tom Lane's message of Fri Jun 17 17:08:25 -0400 2011:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Hmm, would there be a problem if a scan on catalog A yields results from
>> supposedly-running transaction X but another scan on catalog B yields
>> results from transaction Y (X != Y)? For example, a scan on pg_class
>> says that there are N triggers but scanning pg_trigger says N-1?
>
> Yeah, I came to that same conclusion downthread.

Something is seriously wrong with my email :-(

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Jun 17, 2011 at 5:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As far as I can see, the only simple way to return pg_dump to its
> previous level of safety while retaining this patch is to make it take
> ShareUpdateExclusiveLocks, so that it will still block all forms of
> ALTER TABLE. This is rather unpleasant, since it will also block
> autovacuum for the duration of the dump.

I have been thinking for a while now that it would be sensible to make
vacuum use a different lock type, much as we do for relation extension.
DROP TABLE and CLUSTER and at least some forms of ALTER TABLE and maybe a
few other things like CREATE INDEX would need to grab that lock in
addition to the ones they already acquire, but a whole lot of other
things wouldn't. In particular, it's currently not possible to lock a
table against SELECT without also locking it against VACUUM - and booting
off any autovacuum worker that happens to already be processing it. If
you imagine a large table with a bunch of short-duration exclusive locks,
it's not too hard to see how you can end up with VACUUM starvation.

But that's not something I want to do in 9.1, and I doubt it would
completely solve this problem anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I have been thinking for a while now that it would be sensible to make
> vacuum use a different lock type, much as we do for relation
> extension.

Hmm. I had just been toying with the idea of introducing a new
user-visible locking level to allow separation of anti-vacuum locks from
anti-schema-alteration locks. But I think you're probably right that it
could be done as a specialized LockTag. That would make it not easily
user-accessible, but it's hard to think of reasons for users to lock out
vacuum anyway, unless they want to lock out everything via
AccessExclusiveLock.

> ... In particular, it's currently not
> possible to lock a table against SELECT without also locking it
> against VACUUM

Well, it still wouldn't be, since AccessExclusiveLock certainly had
better lock out vacuum. As said above, I think the important thing is to
distinguish vacuum from schema changes.

> But that's not something I want to do in 9.1,

Definitely.

			regards, tom lane
On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 4. Backend #2 visits the new, about-to-be-committed version of
> pgbench_accounts' pg_class row just before backend #3 commits.
> It sees the row as not good and keeps scanning. By the time it
> reaches the previous version of the row, however, backend #3
> *has* committed. So that version isn't good according to SnapshotNow
> either.

<thinks some more>

Why isn't this a danger for every pg_class update? For example, it would
seem that if VACUUM updates relpages/reltuples, it would be prone to this
same hazard.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 4. Backend #2 visits the new, about-to-be-committed version of
>> pgbench_accounts' pg_class row just before backend #3 commits.
>> It sees the row as not good and keeps scanning. By the time it
>> reaches the previous version of the row, however, backend #3
>> *has* committed. So that version isn't good according to SnapshotNow
>> either.

> <thinks some more>

> Why isn't this a danger for every pg_class update? For example, it
> would seem that if VACUUM updates relpages/reltuples, it would be
> prone to this same hazard.

VACUUM does that with an in-place, nontransactional update. But yes,
this is a risk for every transactional catalog update.
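(The in-place path is heap_inplace_update(), which overwrites the tuple
where it sits rather than creating a new row version, so there is never a
moment when a SnapshotNow scan can see zero good versions. Roughly:

    /* nontransactional overwrite; no new tuple version is created */
    heap_inplace_update(pg_class_rel, tuple);

as opposed to the regular heap_update() path used by transactional
catalog updates.)

			regards, tom lane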
On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 4. Backend #2 visits the new, about-to-be-committed version of
>>> pgbench_accounts' pg_class row just before backend #3 commits.
>>> It sees the row as not good and keeps scanning. By the time it
>>> reaches the previous version of the row, however, backend #3
>>> *has* committed. So that version isn't good according to SnapshotNow
>>> either.
>
>> <thinks some more>
>
>> Why isn't this a danger for every pg_class update? For example, it
>> would seem that if VACUUM updates relpages/reltuples, it would be
>> prone to this same hazard.
>
> VACUUM does that with an in-place, nontransactional update. But yes,
> this is a risk for every transactional catalog update.

Well, after various efforts to fix the problem, I notice that there are
two distinct problems brought out by your test case. One of them is
caused by my patch; one of them was already there in the code, and this
latter one is actually the hardest to fix. It took me about an hour to
fix the first bug, but it's taken a while of thinking about the second
before I realised it was a pre-existing bug.

The core problem is, as you observed, that a pg_class update can cause
rows to be lost with concurrent scans.

We scan pg_class in two ways: to rebuild a relcache entry based on a
relation's oid (easy fix), and to resolve the name to oid mapping. The
name to oid mapping is performed *without* a lock on the relation, since
we don't know which relation to lock. So the name lookup can fail if we
are in the middle of a pg_class update. This is an existing potential
bug in Postgres unrelated to my patch. Ref: SearchCatCache()

I've been looking at ways to lock the relation name and namespace prior
to the lookup (or more precisely the hash), but it's worth discussing
whether we want that at all?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jun 19, 2011 at 10:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> pre-existing bug

I've been able to reproduce the bug on 9.0 stable as well, using the OP's
test script. Multiple errors like this:

ERROR:  relation "pgbench_accounts" does not exist
STATEMENT:  alter table pgbench_accounts set (fillfactor = 100);

I presume we want to fix both bugs, not just the most recent one, so I
will continue with both patches.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Why isn't this a danger for every pg_class update? For example, it
>>> would seem that if VACUUM updates relpages/reltuples, it would be
>>> prone to this same hazard.
>>
>> VACUUM does that with an in-place, nontransactional update. But yes,
>> this is a risk for every transactional catalog update.
>
> Well, after various efforts to fix the problem, I notice that there
> are two distinct problems brought out by your test case.
>
> One of them is caused by my patch; one of them was already there in
> the code, and this latter one is actually the hardest to fix.
>
> It took me about an hour to fix the first bug, but it's taken a while
> of thinking about the second before I realised it was a pre-existing
> bug.
>
> The core problem is, as you observed, that a pg_class update can cause
> rows to be lost with concurrent scans.
>
> We scan pg_class in two ways: to rebuild a relcache entry based on a
> relation's oid (easy fix), and to resolve the name to oid mapping.
> The name to oid mapping is performed *without* a lock on the relation,
> since we don't know which relation to lock. So the name lookup can
> fail if we are in the middle of a pg_class update. This is an existing
> potential bug in Postgres unrelated to my patch. Ref: SearchCatCache()
>
> I've been looking at ways to lock the relation name and namespace
> prior to the lookup (or more precisely the hash), but it's worth
> discussing whether we want that at all?

If this is a pre-existing bug, then it's not clear to me why we need to
do anything about it at all right now. I mean, it would be nice to have
a fix, but it's hard to imagine that any proposed fix will be low-risk,
and I don't remember user complaints about this. I continue to think
that the root of the problem here is that SnapshotNow is Evil (TM). If
we get rid of that, then this problem goes away, but that strikes me as a
long-term project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 20, 2011 at 2:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If this is a pre-existing bug, then it's not clear to me why we need
> to do anything about it at all right now. I mean, it would be nice to
> have a fix, but it's hard to imagine that any proposed fix will be
> low-risk, and I don't remember user complaints about this. I continue
> to think that the root of the problem here is that SnapshotNow is Evil
> (TM). If we get rid of that, then this problem goes away, but that
> strikes me as a long-term project.

There are 2 bugs, one caused by my patch in 9.1, one that is
pre-existing.

The 9.1 bug can be fixed easily. I will edit my patch down and repost
here shortly.

The pre-existing bug is slightly harder and more contentious, because we
have to lock the name of a possible relation, even before we know it
exists. I've been looking to implement that as a lock on the uint32 hash
of the relation's name and namespace (sketch below).

I'm looking for opinions ranging from fix-now-and-backpatch thru to
ignore and discuss for 9.2.
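For illustration, the lock-the-name idea amounts to something like this
(untested; SET_LOCKTAG_RELATION_NAME is an invented lock tag):

    uint32      h;
    LOCKTAG     tag;

    /* hash the name and fold in the namespace, so the tag fits a uint32 */
    h = DatumGetUInt32(hash_any((const unsigned char *) relname,
                                strlen(relname))) ^ (uint32) namespaceId;
    SET_LOCKTAG_RELATION_NAME(tag, MyDatabaseId, h);    /* invented */
    (void) LockAcquire(&tag, ShareLock, false, false);

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services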
Simon Riggs <simon@2ndQuadrant.com> wrote:

> I'm looking for opinions ranging from fix-now-and-backpatch thru
> to ignore and discuss for 9.2.

If it's a pre-existing bug, I would think that one option would be to put
it into the next bug-fix release of each supported major release in which
it is manifest. Of course, if it is *safe* to work it into 9.1, that'd
be great.

-Kevin
On Mon, Jun 20, 2011 at 9:42 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>
>> I'm looking for opinions ranging from fix-now-and-backpatch thru
>> to ignore and discuss for 9.2.
>
> If it's a pre-existing bug, I would think that one option would be to
> put it into the next bug-fix release of each supported major release
> in which it is manifest. Of course, if it is *safe* to work it into
> 9.1, that'd be great.

I'm currently on the other end of the spectrum: ignore and consider for
9.2. But that's mostly based on the belief that there isn't going to be
a way of fixing this that isn't far too invasive to back-patch. Should
that turn out to be incorrect, that's a different matter, of course...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:

> I'm currently on the other end of the spectrum: ignore and
> consider for 9.2.

I guess part of my point was that if we can't safely get something into
the initial 9.1 release, it doesn't mean we necessarily need to wait for
9.2. Bugs can be fixed along the way in minor releases. A 9.1.1 fix
might give us more time to work through details and ensure that it is a
safe and well-targeted fix.

-Kevin
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We scan pg_class in two ways: to rebuild a relcache entry based on a
>> relation's oid (easy fix). We also scan pg_class to resolve the name
>> to oid mapping. The name to oid mapping is performed *without* a lock
>> on the relation, since we don't know which relation to lock. So the
>> name lookup can fail if we are in the middle of a pg_class update.
>> This is an existing potential bug in Postgres unrelated to my patch.

> If this is a pre-existing bug, then it's not clear to me why we need
> to do anything about it at all right now.

Yeah. This behavior has been there since day zero, and there have been
very few complaints about it. But note that there's only a risk for
pg_class updates, not any other catalog, and there is exactly one kind of
failure with very predictable consequences. The ALTER TABLE patch has
greatly expanded the scope of the issue, and that *is* a regression
compared to prior releases.

BTW, it seems to me that this issue is closely related to what Noah is
trying to fix here:
http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

			regards, tom lane
On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah. This behavior has been there since day zero, and there have been
> very few complaints about it. But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences.

I agree we shouldn't do anything about the name lookups for 9.1. That is
SearchCatCache using RELNAMENSP lookups, to be precise, as well as
triggers and a few other similar call types.

> The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

I agree the scope for RELOID errors increased with my 9.1 patch. I'm now
happy with the locking patch (attached), which significantly reduces the
scope - back to the original error scope, in my testing.

I tried to solve both, but I think that's a step too far given the
timing.

It seems likely that there will be objections to this patch. All I would
say is that issuing a stream of ALTER TABLEs against the same table is
not a common situation; if it were, we would have seen more of the
pre-existing bug. The ALTER TABLE command encompasses many subcommands,
and we should evaluate each subcommand differently when we decide what to
do.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 20, 2011 at 11:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I agree we shouldn't do anything about the name lookups for 9.1.
> That is SearchCatCache using RELNAMENSP lookups, to be precise, as
> well as triggers and a few other similar call types.

Name lookups give ERRORs that look like this:

    relation "pgbench_accounts" does not exist

>> The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.
>
> I agree the scope for RELOID errors increased with my 9.1 patch.

Which originally generated ERRORs like

    cache lookup failed for relation 16390

or

    could not open relation with OID 16390

which should now be absent from the server log.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah. This behavior has been there since day zero, and there have been
> very few complaints about it. But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences. The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

It's not entirely clear to me how many additional failure cases we've
bought ourselves with this patch. The particular one you've demonstrated
seems pretty similar to the one we already had, although possibly the
window for it is wider. Did you run the same test you used on 9.1 on 9.0
for comparison?

> BTW, it seems to me that this issue is closely related to what Noah is
> trying to fix here:
> http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

I think there are two general problems here. One, we use SnapshotNow to
scan system catalogs, and SnapshotNow semantics are a mess. Two, even if
we used an MVCC snapshot to scan the system catalogs, it's not
necessarily safe or correct to latch onto an old row version, because the
row update is combined with other actions that are not MVCC-safe, like
removing files on disk.

Breaking it down a bit more:

A. The problem with using a DDL lock level < AccessExclusiveLock, AFAICS,
is entirely due to SnapshotNow semantics. Any row version we can
possibly see is OK, but we had better see exactly one. Or at least not
less than one.

B. The problem with name lookups failing in the middle of a pg_class
update is also entirely due to SnapshotNow semantics.

C. The problem Noah is complaining about is NOT due to SnapshotNow
semantics. If someone does BEGIN; DROP TABLE foo; CREATE TABLE foo();
COMMIT, it is no longer OK for anyone to be looking at the old foo. An
MVCC snapshot is enough to guarantee that we'll see some row, but
examining the old one won't cut it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 20, 2011 at 6:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> now happy with the locking patch (attached), which significantly
> reduces the scope - back to the original error scope, in my testing.
>
> I tried to solve both, but I think that's a step too far given the timing.
>
> It seems likely that there will be objections to this patch. All I
> would say is that issuing a stream of ALTER TABLEs against the same
> table is not a common situation; if it were, we would have seen more of
> the pre-existing bug. The ALTER TABLE command encompasses many
> subcommands, and we should evaluate each subcommand differently when we
> decide what to do.

Well, my principal objection is that I think heavyweight locking is an
excessively expensive solution to this problem. I think the patch is
simple enough that I wouldn't object to applying it on those grounds even
at this late date, but I bet if we do some benchmarking on the right
workload we'll find a significant performance regression.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah. This behavior has been there since day zero, and there have been
>> very few complaints about it. But note that there's only a risk for
>> pg_class updates, not any other catalog, and there is exactly one kind
>> of failure with very predictable consequences. The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.

> It's not entirely clear to me how many additional failure cases we've
> bought ourselves with this patch. The particular one you've
> demonstrated seems pretty similar to the one we already had, although
> possibly the window for it is wider.

It's not so much the probability of failure that is bothering me, as the
variety of possible symptoms. There was exactly one failure mode before,
namely "no such relation". I'm not sure how many possible symptoms there
are now, but there's a lot, and most of them are going to be weird "what
the heck was that??" behaviors. If we let 9.1 ship like this, we are
going to be creating a support headache. Even worse, knowing that those
bugs exist will tempt us to write off reports of weird cache lookup
failures as being instances of this problem, when closer investigation
might show that they're something else.

Please note that this position should not be regarded as support for
Simon's proposed patch. I still think the right decision is to revert
the ALTER TABLE feature, mainly because I do not believe this is the last
bug in it. And the fact that there's a pre-existing bug with a vaguely
similar symptom is no justification for introducing more bugs.

			regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.

> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> now happy with the locking patch (attached), which significantly
> reduces the scope - back to the original error scope, in my testing.

> I tried to solve both, but I think that's a step too far given the timing.

> It seems likely that there will be objections to this patch.

Yup, you're right. Having read this patch, I have absolutely zero
confidence in it. It introduces some locks in random places, with no
rhyme or reason that I can see. There is no reason to think that this is
a complete solution, and considerable reason to think that it isn't
(notably, the RELOID syscache is hardly the only one at risk). Worse,
it's adding more locking in performance-critical places, which seems to
me to severely degrade the argument for the original feature, namely that
it was supposed to give us *less* locking.

			regards, tom lane
Excerpts from Robert Haas's message of Tue Jun 21 09:40:16 -0400 2011:
> On Mon, Jun 20, 2011 at 6:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
>> now happy with the locking patch (attached), which significantly
>> reduces the scope - back to the original error scope, in my testing.
>>
>> I tried to solve both, but I think that's a step too far given the timing.
>>
>> It seems likely that there will be objections to this patch. All I
>> would say is that issuing a stream of ALTER TABLEs against the same
>> table is not a common situation; if it were, we would have seen more of
>> the pre-existing bug. The ALTER TABLE command encompasses many
>> subcommands, and we should evaluate each subcommand differently when we
>> decide what to do.
>
> Well, my principal objection is that I think heavyweight locking is an
> excessively expensive solution to this problem. I think the patch is
> simple enough that I wouldn't object to applying it on those grounds
> even at this late date, but I bet if we do some benchmarking on the
> right workload we'll find a significant performance regression.

Yeah, taking a hw lock at each relcache item build is likely to be
prohibitively expensive. Heck, relation extension is expensive already
in some loads. (I'm guessing that things will tank when there's a
relcache reset.)

Still, this seems to be an overall better approach to the problem than
what's been proposed elsewhere. Maybe we can do something with a map of
relations that are protected by a bunch of lwlocks instead. (We could
use that for relation extension too; that'd rock.)

The patch may be simple, but is it complete? I think you need to have
lock acquisition on CREATE RULE and CREATE INDEX too. Right now it only
has locks on trigger stuff and ALTER TABLE.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Tom Lane's message of mar jun 21 11:06:22 -0400 2011: > Please note that this position should not be regarded as support for > Simon's proposed patch. I still think the right decision is to revert > the ALTER TABLE feature, mainly because I do not believe this is the > last bug in it. And the fact that there's a pre-existing bug with a > vaguely similar symptom is no justification for introducing more bugs. Note that this feature can be disabled by tweaking AlterTableGetLockLevel so that it always returns AccessExclusive. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of mar jun 21 11:06:22 -0400 2011: >> Please note that this position should not be regarded as support for >> Simon's proposed patch. I still think the right decision is to revert >> the ALTER TABLE feature, mainly because I do not believe this is the >> last bug in it. And the fact that there's a pre-existing bug with a >> vaguely similar symptom is no justification for introducing more bugs. > Note that this feature can be disabled by tweaking > AlterTableGetLockLevel so that it always returns AccessExclusive. I think Simon had also hacked a couple of other places such as CREATE TRIGGER, but yeah, I was thinking of just lobotomizing that function with an #ifdef. When and if we get these problems worked out, it'll be easy to re-enable the feature. regards, tom lane
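[Concretely, the "lobotomizing" could be as small as the sketch below. AlterTableGetLockLevel() is the real function in tablecmds.c; the #ifdef symbol and the helper name are invented for this sketch, not code from the tree.]

    LOCKMODE
    AlterTableGetLockLevel(List *cmds)
    {
    #ifdef REDUCED_ALTER_TABLE_LOCKS            /* hypothetical; not defined by default */
        /* hypothetical wrapper around the existing per-subcommand analysis */
        return AlterTableAnalyzeLockLevel(cmds);
    #else
        /* Revert to 9.0 behaviour: every ALTER TABLE takes the full lock. */
        return AccessExclusiveLock;
    #endif
    }

[Keeping the analysis code in the source under the #ifdef is what makes it easy to re-enable later, as Tom notes.]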
On Tue, Jun 21, 2011 at 4:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The ALTER TABLE patch >>> has greatly expanded the scope of the issue, and that *is* a regression >>> compared to prior releases. > >> I agree the scope for RELOID errors increased with my 9.1 patch. I'm >> now happy with the locking patch (attached), which significantly >> reduces the scope - back to the original error scope, in my testing. > >> I tried to solve both, but I think that's a step too far given the timing. > >> It seems likely that there will be objections to this patch. > > Yup, you're right. Having read this patch, I have absolutely zero > confidence in it. It introduces some locks in random places, with no > rhyme or reason that I can see. There is something to be salvaged here and I have additional analysis and suggestions. Please read all the way through. We already allowed INHERIT/NO INHERIT with a lower lock level than AccessExclusiveLock and have received no complaints to speak of. We aren't discussing moving that operation to AccessExclusiveLock in the light of this discussion, nor are we discussing TYPE OF operations. I don't think it's sensible to reset those operations to AccessExclusiveLock. The patch has extensive comments that explain the carefully analysed and tested locations for the locking. We take an ExclusiveLock on the RelationDefinition during an ALTER TABLE. We take a ShareLock on CatCache and RelCache rebuilds. The only time these can block is during an ALTER TABLE with a lower lock than AccessExclusiveLock that occurs immediately after other DDL. The locking has been placed to minimise the lock window for simple ALTER TABLE statements.

> There is no reason to think that this
> is a complete solution, and considerable reason to think that it isn't
> (notably, the RELOID syscache is hardly the only one at risk).

The whole of the relcache rebuild is protected, not just access to pg_class, so that is safe, no matter which catalog tables are accessed. Catcache does look more complex, so your fears here are worth looking into. I was already doing that since my last post and have now finished. Based on the above, I think we should limit the patch to these areas:

1. Simple operations on pg_class and/or pg_attribute. This includes:

case AT_SetStatistics: ATTNAME catcache
case AT_SetOptions: ATTNAME catcache
case AT_ResetOptions: ATTNAME catcache
case AT_SetStorage: ATTNAME catcache
case AT_SetRelOptions: RELOID catcache
case AT_ResetRelOptions: RELOID catcache

2. FK VALIDATION, which touches CONSTROID.

Limiting the scope of the patch to those areas provides good benefit, whilst at the same time limiting our risk footprint. Doing that means we can evaluate the patch against those aspects. For people that still think there is still risk, I've added a parameter called "relaxed_ddl_locking" which defaults to "off". That way people can use this feature in an informed way without exposing us to support risks from its use. [Patch needs minor docs and test documentation changes prior to commit]

> Worse,
> it's adding more locking in performance-critical places, which seems
> to me to severely degrade the argument for the original feature,
> namely that it was supposed to give us *less* locking.

I think we should be clear that this adds *one* lock/unlock for each object for each session, ONCE only after each transaction that executed a DDL statement against an object.
So with 200 sessions, a typical ALTER TABLE statement will cause 400 locks. The current lock manager maxes out above 50,000 locks per second, so any comparative analysis will show this is a minor blip on even a heavy workload. It isn't in a performance-critical place, and the lock held in case of a rebuild is a ShareLock, so it won't block except against an ALTER TABLE statement. This is a marked *reduction* in locking overhead, in comparison with ALTER TABLE taking an AccessExclusiveLock, which would make the SQL statement wait for potentially a very long time. With relaxed_ddl_locking = off (default) there is zero effect. So that is not a valid argument to refuse this patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
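[The attachment isn't reproduced in this archive. The shape of the locking Simon describes might be something like the sketch below; PostgreSQL has no "relation definition" lock tag, so SET_LOCKTAG_RELATION_DEFINITION and the helper are assumptions about the patch, not quotations from it. LockAcquire() and the lock mode constants are the real lock-manager API.]

    /* Hypothetical: serialize ALTER TABLE against cache rebuilds without
     * blocking ordinary DML on the table itself.  Rebuilds take ShareLock
     * on the definition; ALTER TABLE takes ExclusiveLock. */
    static void
    LockRelationDefinition(Oid relid, LOCKMODE lockmode)
    {
        LOCKTAG     tag;

        /* invented lock tag macro; no such tag exists in the tree */
        SET_LOCKTAG_RELATION_DEFINITION(tag, MyDatabaseId, relid);
        (void) LockAcquire(&tag, lockmode, false, false);
    }

    /* catcache/relcache rebuild: LockRelationDefinition(relid, ShareLock);     */
    /* reduced-lock ALTER TABLE:  LockRelationDefinition(relid, ExclusiveLock); */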
On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > For people that still think there is still risk, I've added a > parameter called "relaxed_ddl_locking" which defaults to "off". That > way people can use this feature in an informed way without exposing us > to support risks from its use. I can't get excited about adding a GUC that says "you can turn this off if you want but don't blame us if it breaks". That just doesn't seem like good software engineering to me. > I think we should be clear that this adds *one* lock/unlock for each > object for each session, ONCE only after each transaction that > executed a DDL statement against an object. So with 200 sessions, a > typical ALTER TABLE statement will cause 400 locks. The current lock > manager maxes out above 50,000 locks per second, so any comparative > analysis will show this is a minor blip on even a heavy workload. I think that using locks to address this problem is the wrong approach. I think the right fix is to use something other than SnapshotNow to do the system catalog scans. However, if we were going to use locking, then we'd need to ensure that:

(1) No transaction which has made changes to a system catalog may commit while some other transaction is in the middle of scanning that catalog.
(2) No transaction which has made changes to a set of system catalogs may commit while some other transaction is in the midst of fetching interrelated data from a subset of those catalogs.

It's important to note, I think, that the problem here doesn't occur at the time that the table rows are updated, because those rows aren't visible yet. The problem happens at commit time. So unless I'm missing something, ALTER TABLE would only need to acquire the relation definition lock after it has finished all of its other work. In fact, it doesn't even really need to acquire it then, either. It could just queue up a list of relation definition locks to acquire immediately before commit, which would be advantageous if the transaction went on to abort, since in that case we'd skip the work of acquiring the locks at all. In fact, without doing that, this patch would undo much of the point of the original ALTER TABLE lock strength reduction:

begin;
alter table foo alter column a set storage plain;
<start a new psql session in another window>
select * from foo;

In master, the SELECT completes without blocking. With this patch applied, the SELECT blocks, just as it did in 9.0, because it can't rebuild the relcache entry without getting the relation definition lock, which the alter table will hold until commit. In fact, the same thing happens even if foo has been accessed previously in the same session. If there is already an open *transaction* that has accessed foo, then, it seems, it can continue to do so until it commits. But as soon as it tries to start a new transaction, it blocks again. I don't quite understand why that is - I didn't think we invalidated the entire relcache on every commit. But that's what happens. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
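[The queue-before-commit idea could look roughly like this, reusing the LockRelationDefinition sketch earlier in the thread. All names are hypothetical; lappend_oid(), lfirst_oid(), and foreach() are the real list API.]

    /* Hypothetical: remember which definitions need locking, and only
     * take the locks on the way into commit. */
    static List *pending_defn_locks = NIL;

    /* Called by ALTER TABLE instead of locking immediately. */
    void
    RememberRelationDefinitionLock(Oid relid)
    {
        pending_defn_locks = lappend_oid(pending_defn_locks, relid);
    }

    /* Called from the pre-commit path.  An aborting transaction never
     * gets here, so a rolled-back ALTER TABLE skips the locking work. */
    void
    AtPreCommit_RelationDefinitionLocks(void)
    {
        ListCell   *lc;

        foreach(lc, pending_defn_locks)
            LockRelationDefinition(lfirst_oid(lc), ExclusiveLock);
        pending_defn_locks = NIL;   /* the locks themselves go away at commit */
    }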
On Thu, Jun 23, 2011 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> For people that still think there is still risk, I've added a >> parameter called "relaxed_ddl_locking" which defaults to "off". That >> way people can use this feature in an informed way without exposing us >> to support risks from its use. > > I can't get excited about adding a GUC that says "you can turn this > off if you want but don't blame us if it breaks". That just doesn't > seem like good software engineering to me. Nobody is suggesting we fix the pre-existing bug - we're just going to leave it. That doesn't sound like good software engineering either, but I'm not going to complain because I understand. We're in a bind here and I'm trying to suggest practical ways out, as well as cater for the levels of risk people are willing to accept. I don't think we need that parameter at all, but if you do then I'm willing to tolerate it. >> I think we should be clear that this adds *one* lock/unlock for each >> object for each session, ONCE only after each transaction that >> executed a DDL statement against an object. So with 200 sessions, a >> typical ALTER TABLE statement will cause 400 locks. The current lock >> manager maxes out above 50,000 locks per second, so any comparative >> analysis will show this is a minor blip on even a heavy workload. > > I think that using locks to address this problem is the wrong > approach. I think the right fix is to use something other than > SnapshotNow to do the system catalog scans. I agree with that, but it's a much bigger job than you think and I really doubt that you will do it for 9.2, definitely not for 9.1. There are so many call points, I would not bother personally to attempt such a critical and widely invasive fix. So I'd put that down as a Blue Sky will-fix-one-day thing.

> However, if we were going to use locking, then we'd need to ensure that:
>
> (1) No transaction which has made changes to a system catalog may commit while some other transaction is in the middle of scanning that catalog.
> (2) No transaction which has made changes to a set of system catalogs may commit while some other transaction is in the midst of fetching interrelated data from a subset of those catalogs.
>
> It's important to note, I think, that the problem here doesn't occur at the time that the table rows are updated, because those rows aren't visible yet. The problem happens at commit time. So unless I'm missing something, ALTER TABLE would only need to acquire the relation definition lock after it has finished all of its other work. In fact, it doesn't even really need to acquire it then, either. It could just queue up a list of relation definition locks to acquire immediately before commit, which would be advantageous if the transaction went on to abort, since in that case we'd skip the work of acquiring the locks at all.

That would work if the only thing ALTER TABLE does is write. There are various places where we read catalogs and all of those catalogs need to be relationally integrated. So the only safe way, without lots of incredibly detailed analysis (which you would then find fault with), is to lock at the top and hold the lock throughout most of the processing. That's the safe thing to do, at least.
I do already use the point you make in the patch, where it's used to unlock before and then relock after the FK constraint check, so that the RelationDefinition lock is never held for long periods in any code path.

> In fact, without doing that, this patch would undo much of the point of the original ALTER TABLE lock strength reduction:
>
> begin;
> alter table foo alter column a set storage plain;
> <start a new psql session in another window>
> select * from foo;
>
> In master, the SELECT completes without blocking. With this patch applied, the SELECT blocks, just as it did in 9.0, because it can't rebuild the relcache entry without getting the relation definition lock, which the alter table will hold until commit.

It's not the same at all. Yes, they are both locks; what counts is the frequency and duration of locking. With AccessExclusiveLock we prevent all SELECTs from accessing the table while we queue for the lock, which can be blocked behind a long-running query. So the total outage to the application for one minor change can be hours - unpredictably. (Which is why I never wrote the ALTER TABLE set ndistinct myself, even though I suggested it, cos it's mostly unusable). With the reduced locking level AND this patch the *rebuild* can be blocked behind the DDL. But the DDL itself is not blocked in the same way, so the total outage is much reduced and the whole set of actions goes through fairly quickly. If you only submit one DDL operation then the rebuild has nothing to block against, so the whole thing goes through smoothly. > In fact, the same thing happens even if foo has been accessed previously in the same session. If there is already an open *transaction* that has accessed foo, then, it seems, it can continue to do so until it commits. But as soon as it tries to start a new transaction, it blocks again. I don't quite understand why that is - I didn't think we invalidated the entire relcache on every commit. We don't invalidate the relcache on every commit. > But that's what happens. Test case please. I don't understand the problem you're describing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Test case please. I don't understand the problem you're describing.

S1: select * from foo;
S2: begin;
S2: alter table foo alter column a set storage plain;
S1: select * from foo;
<blocks>

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Test case please. I don't understand the problem you're describing.
>
> S1: select * from foo;
> S2: begin;
> S2: alter table foo alter column a set storage plain;
> S1: select * from foo;
> <blocks>

Er... yes, that's what locks do. Where is the bug? We have these choices of behaviour:

1. It doesn't error and doesn't block - not possible for 9.1, probably not for 9.2 either
2. It doesn't block, but may throw an error sometimes - the reported bug
3. It blocks in some cases for short periods where people do repeated DDL, but never throws errors - this patch
4. Full scale locking - human sacrifice, cats and dogs living together, mass hysteria

If you want to avoid the blocking, then don't hold open the transaction. Do this:

S1: select * from foo
S2: alter table.... run in its own transaction
S1: select * from foo

Doesn't block, no errors. Which is exactly what most people do on their production servers. The ALTER TABLE statements we're talking about are not schema changes. They don't need to be coordinated with other DDL. This patch has locking, but it's the most reduced form of locking that is available for a non-invasive patch for 9.1 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Test case please. I don't understand the problem you're describing.
>>
>> S1: select * from foo;
>> S2: begin;
>> S2: alter table foo alter column a set storage plain;
>> S1: select * from foo;
>> <blocks>
>
> Er... yes, that's what locks do. Where is the bug?

I didn't say it was a bug. I said that the additional locking you added acted to remove much of the benefit of reducing the lock strength in the first place. If a brief exclusive lock (such as would be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't acceptable, a brief exclusive lock on a different lock tag that blocks most of the same operations isn't going to be any better. You might still see some improvement in the cases where some complicated operation like scanning or rewriting the table can be performed without holding the relation definition lock, and then only get the relation definition lock afterward. But for something like the above example (or the ALTER TABLE SET (n_distinct) feature you mentioned in your previous email) the benefit is going to be darned close to zero.

> This patch has locking, but it's the most reduced form of locking that is available for a non-invasive patch for 9.1

I don't like the extra lock manager traffic this adds to operations that aren't even performing DDL, I'm not convinced that it is safe, the additional locking negates a significant portion of the benefit of the original patch, and Tom's made a fairly convincing argument that pg_dump will still become significantly less reliable than heretofore even if we do apply it. In my view, what we need to do is revert to using AccessExclusiveLock until some of these other details are worked out. I wasn't entirely convinced that that was necessary when Tom first posted this email, but having thought through the issues and looked over your patch, it's now my position that that is the best way forward. The fact that your proposed patch appears not to be thinking very clearly about when locks need to be taken and released only adds to my feeling that we are in for a bad time if we go down this route. In short: -1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 24, 2011 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Test case please. I don't understand the problem you're describing.
>>>
>>> S1: select * from foo;
>>> S2: begin;
>>> S2: alter table foo alter column a set storage plain;
>>> S1: select * from foo;
>>> <blocks>
>>
>> Er... yes, that's what locks do. Where is the bug?
>
> I didn't say it was a bug. I said that the additional locking you added acted to remove much of the benefit of reducing the lock strength in the first place. If a brief exclusive lock (such as would be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't acceptable, a brief exclusive lock on a different lock tag that blocks most of the same operations isn't going to be any better. You might still see some improvement in the cases where some complicated operation like scanning or rewriting the table can be performed without holding the relation definition lock, and then only get the relation definition lock afterward. But for something like the above example (or the ALTER TABLE SET (n_distinct) feature you mentioned in your previous email) the benefit is going to be darned close to zero.
>
>> This patch has locking, but it's the most reduced form of locking that is available for a non-invasive patch for 9.1
>
> I don't like the extra lock manager traffic this adds to operations that aren't even performing DDL, I'm not convinced that it is safe, the additional locking negates a significant portion of the benefit of the original patch, and Tom's made a fairly convincing argument that pg_dump will still become significantly less reliable than heretofore even if we do apply it. In my view, what we need to do is revert to using AccessExclusiveLock until some of these other details are worked out. I wasn't entirely convinced that that was necessary when Tom first posted this email, but having thought through the issues and looked over your patch, it's now my position that that is the best way forward. The fact that your proposed patch appears not to be thinking very clearly about when locks need to be taken and released only adds to my feeling that we are in for a bad time if we go down this route.
>
> In short: -1 from me.

I've explained all of the above points to you already and you're wrong. I don't think you understand this patch at all, nor even the original feature. Locking the table is completely different from locking the definition of a table. Do you advocate that all ALTER TABLE operations use AccessExclusiveLock, or just the operations I added? If you see problems here, then you must also be willing to increase the lock strength of things such as INHERITS, and back patch them to where the same problems exist. You'll wriggle out of that, I'm sure. There are, regrettably, many bugs here and they can't be fixed in the simple manner you propose. It's not me you block, Robert; I'm not actually a user and I will sleep well whatever happens, happy that I tried to resolve this. Users watch and remember. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2011 at 5:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I've explained all of the above points to you already and you're wrong. We're going to have to agree to disagree on that point. > Do you advocate that all ALTER TABLE operations use > AccessExclusiveLock, or just the operations I added? If you see > problems here, then you must also be willing to increase the lock > strength of things such as INHERITS, and back patch them to where the > same problems exist. You'll wriggle out of that, I'm sure. There are > regrettably, many bugs here and they can't be fixed in the simple > manner you propose. I think there is quite a lot of difference between realizing that we can't fix every problem, and deciding to put out a release that adds a whole lot more of them that we have no plans to fix. > It's not me you block Robert, I'm not actually a user and I will sleep > well whatever happens, happy that I tried to resolve this. Users watch > and remember. If you are proposing that I should worry about a posse of angry PostgreSQL users hunting me down (or abandoning the product) because I agreed with Tom Lane on the necessity of reverting one of your patches, then I'm willing to take that chance. For one thing, there's a pretty good chance they'll go after Tom first. For two things, there's at least an outside chance I might be rescued by an alternative posse who supports our tradition of putting out high quality releases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 24, 2011 at 10:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> It's not me you block Robert, I'm not actually a user and I will sleep >> well whatever happens, happy that I tried to resolve this. Users watch >> and remember. > > If you are proposing that I should worry about a posse of angry > PostgreSQL users hunting me down (or abandoning the product) because I > agreed with Tom Lane on the necessity of reverting one of your > patches, then I'm willing to take that chance. For one thing, there's > a pretty good chance they'll go after Tom first. For two things, > there's at least an outside chance I might be rescued by an > alternative posse who supports our tradition of putting out high > quality releases. Not sure where anger and violence entered the discussion, but a posse does sound amusing. If such groups ever met, I'm sure both would support the tradition of high quality, but I suspect they might define it differently and that would be the source of the problem. Much like Life of Brian. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Department of second thoughts: I think I see a problem. > > Um, yeah, so that doesn't really work any better than my idea. > > On further reflection, there's a problem at a higher level than this > anyway. Even if we can get a single SnapshotNow scan to produce > guaranteed-self-consistent results, that doesn't ensure consistency > between the results of scans occurring serially. An example here is > ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only > writers. However, suppose that a concurrent relcache load fetches the > pg_attribute row, notes that it has atthasdef = true, and then the ALTER > commits before we start to scan pg_attrdef. The consistency checks in > AttrDefaultFetch() will complain about a missing pg_attrdef entry, and > rightly so. We could lobotomize those checks, but it doesn't feel right > to do so; and anyway there may be other cases that are harder to kluge up. Locking the whole definition is at least one way of solving this problem. My locking fix does that. > So really we need consistency across *at least* one entire relcache load > cycle. We could maybe arrange to take an MVCC snap (or some lighter > weight version of that) at the start, and use that for all the resulting > scans, but I think that would be notationally messy. It's not clear > that it'd solve everything anyhow. There are parts of a relcache entry > that we fetch only on-demand, so they are typically loaded later than > the core items, and probably couldn't use the same snapshot. Worse, > there are lots of places where we assume that use of catcache entries or > direct examination of the catalogs will yield results consistent with > the relcache. > > I suspect these latter problems will impact Simon's idea as well. I think you're probably right, or at least, the suspicion is not something I can address quickly enough to be safe. I will revert to the AccessExclusiveLocks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 17, 2011 at 6:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I believe that this is fundamentally unavoidable so long as we use >>> SnapshotNow to read catalogs --- which is something we've talked about >>> changing, but it will require a pretty major R&D effort to make it >>> happen. > >> Ouch. > >> I wonder if we could avoid this anomaly by taking a throwaway MVCC >> snapshot at the beginning of each system catalog scan and using it >> just for the duration of that scan. If nothing that has touched the >> catalog commits while the scan is open, then this is logically >> equivalent to SnapshotNow. If something does commit in mid-scan, then >> we might not get the latest version of the row, but we should end up >> with exactly one. If it's not the latest one, we'll do the rebuild >> again upon seeing the next sinval message; in the meantime, the >> version we're using mustn't be too intolerably bad or it was an error >> not to use AccessExclusiveLock in the first place. > > Yeah, this seems like a possibly workable direction to explore. I like > this better than what Simon is proposing, because it would fix the > generic issue for all types of catalog SnapshotNow scans. > >> IIUC, the problem with this approach is not correctness but >> performance. Taking snapshots is (currently) expensive. > > Yeah. After mulling it for awhile, what about this idea: we could > redefine SnapshotNow as a snapshot type that includes a list of > transactions-in-progress, somewhat like an MVCC snapshot, but we don't > fill that list from the PGPROC array. Instead, while running a scan > with SnapshotNow, anytime we determine that a particular XID is > still-in-progress, we add that XID to the snapshot's list. > Subsequently, the SnapshotNow code assumes that XID to be > still-in-progress without consulting its actual state. We reset the XID > list to empty when starting a new SnapshotNow scan. (We might be able > to do so less often than that, like only when we do > AcceptInvalidationMessages, but it's not clear to me that there's any > real benefit in hanging onto the state longer.) > > This costs no performance; if anything it should be faster than now, > because we'll be replacing expensive transaction state probes with > relatively-cheap searches of an XID array that should almost always > be quite short. > > With this approach, we would have no serialization anomalies from single > transactions committing while a scan is in progress. There could be > anomalies resulting from considering an earlier XID to be in-progress > while a later XID is considered committed (because we didn't observe > it until later). So far as I can see offhand, the impact of that would > be that there might be multiple versions of a tuple that are considered > good, but never that there would be no version considered good (so long > as the other XIDs simply updated the tuple and didn't delete it). I > think this would be all right, since the scan would just seize on the > first good version it finds. As you argue above, if that's not good > enough for our purposes then the updater(s) should have taken a stronger > lock. I liked this idea, so began to prototype the code. My rough hack is attached, for the record. One thing that occurs to me about this is that SnapshotNow with or without these changes returns the latest committed row and ignores in-progress changes. 
Accepting an older version of the definition will always be potentially dangerous. I can't see a way of doing this that doesn't require locking - for changes such as new constraints we need to wait until in-progress changes are complete. So maybe this idea is worth doing, but I don't think it helps us much reduce lock levels for DDL. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
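[The attached prototype isn't reproduced in this archive, but Tom's XID-list idea sketches out roughly as below. The array bound, function names, and placement are guesses for illustration, not the patch's contents; TransactionIdEquals and TransactionIdIsInProgress are the real primitives.]

    /* Hypothetical: XIDs already observed as in-progress during the
     * current SnapshotNow scan.  They stay "in progress" for the rest of
     * the scan even if they commit meanwhile. */
    #define MAX_SCAN_XIDS 64            /* invented bound */
    static TransactionId snapnow_xids[MAX_SCAN_XIDS];
    static int  snapnow_nxids = 0;

    void
    ResetSnapshotNowXids(void)          /* call at the start of each scan */
    {
        snapnow_nxids = 0;
    }

    bool
    SnapshotNowXidInProgress(TransactionId xid)
    {
        int     i;

        /* cheap local search first */
        for (i = 0; i < snapnow_nxids; i++)
            if (TransactionIdEquals(snapnow_xids[i], xid))
                return true;

        /* expensive shared-state probe; remember a positive answer so
         * later probes in this scan give the same result */
        if (TransactionIdIsInProgress(xid))
        {
            if (snapnow_nxids < MAX_SCAN_XIDS)
                snapnow_xids[snapnow_nxids++] = xid;
            return true;
        }
        return false;
    }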
Simon Riggs wrote: > > This costs no performance; if anything it should be faster than now, > > because we'll be replacing expensive transaction state probes with > > relatively-cheap searches of an XID array that should almost always > > be quite short. > > > > With this approach, we would have no serialization anomalies from single > > transactions committing while a scan is in progress. There could be > > anomalies resulting from considering an earlier XID to be in-progress > > while a later XID is considered committed (because we didn't observe > > it until later). So far as I can see offhand, the impact of that would > > be that there might be multiple versions of a tuple that are considered > > good, but never that there would be no version considered good (so long > > as the other XIDs simply updated the tuple and didn't delete it). I > > think this would be all right, since the scan would just seize on the > > first good version it finds. As you argue above, if that's not good > > enough for our purposes then the updater(s) should have taken a stronger > > lock. > I liked this idea, so began to prototype the code. My rough hack is > attached, for the record. > One thing that occurs to me about this is that SnapshotNow with or > without these changes returns the latest committed row and ignores > in-progress changes. > Accepting an older version of the definition will always be > potentially dangerous. I can't see a way of doing this that doesn't > require locking - for changes such as new constraints we need to wait > until in-progress changes are complete. > So maybe this idea is worth doing, but I don't think it helps us much > reduce lock levels for DDL. Just confirming we decided not to pursue this. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <bruce@momjian.us> wrote: > Just confirming we decided not to pursue this. Doesn't sound like it. I've been thinking a lot about the more general problem here - namely, that allowing catalog changes without an access exclusive lock is unsafe - and trying to come up with a good solution, because I'd really like to see us make this work. It strikes me that there are really two separate problems here.

1. If you are scanning a system catalog using SnapshotNow, and a commit happens at just the right moment, you might see two versions of the row or zero rather than one. You'll see two if you scan the old row version, then the concurrent transaction commits, and then you scan the new one. You'll see zero if you scan the new row (ignoring it, since it isn't committed yet) and then the concurrent transaction commits, and then you scan the old row. On a related note, if you scan several interrelated catalogs (e.g. when building a relcache entry) and a transaction that has made matching modifications to multiple catalogs commits midway through, you might see the old version of the data from one table and the new version of the data from some other table. Using AccessExclusiveLock fixes the problem because we guarantee that anybody who wants to read those rows first takes some kind of lock, which they won't be able to do while the updater holds AccessExclusiveLock.

2. Other backends may have data in the relcache or catcaches which won't get invalidated until they do AcceptInvalidationMessages(). That will always happen no later than the next time they lock the relation, so if you are holding AccessExclusiveLock then you should be OK: no one else holds any lock, and they'll have to go get one before doing anything interesting. But if you are holding any weaker lock, there's nothing to force AcceptInvalidationMessages to happen before you reuse those cache entries.

In both cases, name lookups are an exception: we don't know what OID to lock until we've done the name lookup. Random ideas for solving the first problem:

1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've rejected this before because of the cost involved, but maybe with Pavan's patch and some of the other improvements that are on the way we could make this cheap enough to be acceptable.

1-B. Don't allow transactions that have modified system catalog X to commit while we're scanning system catalog X; make the scans take some kind of shared lock and commits an exclusive lock. This seems awfully bad for concurrency, though, not to mention the overhead of taking and releasing all those locks even when nothing conflicts.

1-C. Wrap a retry loop around every place where we scan a system catalog. If a transaction that has written rows in catalog X commits while we're scanning catalog X, discard the results of the first scan and declare a do-over (or maybe something more coarse-grained, or at the other extreme even more fine-grained). If we're doing several interrelated scans then the retry loop must surround the entire unit, and enforce a do-over of the whole operation if commits occur in any of the catalogs before the scan completes.

Unfortunately, any of these seem likely to require a Lot of Work, probably more than can be done in time for 9.2. However, perhaps it would be feasible to home in on a limited subset of the cases (e.g.
name lookups, which are mainly done in only a handful of places, and represent an existing bug) and implement the necessary code changes just for those cases. Then we could generalize that pattern to other cases as time and energy permit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
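[Idea 1-C might be shaped like the loop below. The per-catalog commit counter is invented; no such counter exists today, and a real design would also need it bumped at commit by every catalog writer. The point here is only the shape of the do-over loop.]

    /* Hypothetical: retry the catalog scan(s) if a writer committed
     * mid-scan.  GetCatalogWriteCommitCount() and
     * BuildCacheEntryFromCatalog() are invented names. */
    for (;;)
    {
        uint64  before = GetCatalogWriteCommitCount(catalogId);

        BuildCacheEntryFromCatalog(catalogId);

        if (GetCatalogWriteCommitCount(catalogId) == before)
            break;      /* nothing committed under us: result is consistent */
        /* else: discard the partial result and do over */
    }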
On Wed, Nov 30, 2011 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote: > It strikes me that there are really two separate problems here. > > 1. If you are scanning a system catalog using SnapshotNow, and a > commit happens at just the right moment, you might see two versions of > the row or zero rather than one. You'll see two if you scan the old > row version, then the concurrent transaction commits, and then you > scan the new one. You'll see zero if you scan the new row (ignoring > it, since it isn't committed yet) and then the concurrent transaction > commits, and then you scan the old row. That is a bug and one we should fix. I supplied a patch for that, written to Tom's idea for how to solve it. I will apply that, unless there are objections. > 2. Other backends may have data in the relcache or catcaches which > won't get invalidated until they do AcceptInvalidationMessages(). > That will always happen no later than the next time they lock the > relation, so if you are holding AccessExclusiveLock then you should be > OK: no one else holds any lock, and they'll have to go get one before > doing anything interesting. But if you are holding any weaker lock, > there's nothing to force AcceptInvalidationMessages to happen before > you reuse those cache entries. Yes, inconsistent cache entries will occur if we allow catalog updates while the table is already locked by others. The question is whether that is a problem in all cases. With these ALTER TABLE subcommands, I don't see any problem with different backends seeing different values.

/*
 * These subcommands affect general strategies for performance
 * and maintenance, though don't change the semantic results
 * from normal data reads and writes. Delaying an ALTER TABLE
 * behind currently active writes only delays the point where
 * the new strategy begins to take effect, so there is no
 * benefit in waiting. In this case the minimum restriction
 * applies: we don't currently allow concurrent catalog
 * updates.
 */
case AT_SetStatistics:      // only used by ANALYZE, which is shut out by the ShareUpdateExclusiveLock
case AT_ClusterOn:
case AT_DropCluster:        // only used by CLUSTER, which is shut out because it needs AccessExclusiveLock
case AT_SetRelOptions:
case AT_ResetRelOptions:
case AT_SetOptions:
case AT_ResetOptions:
case AT_SetStorage:         // not critical
case AT_ValidateConstraint: // not a problem if some people think it's invalid when it is valid

So ISTM that we can allow reduced lock levels for the above commands if we fix SnapshotNow. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > That is a bug and one we should fix. I supplied a patch for that, > written to Tom's idea for how to solve it. > > I will apply that, unless there are objections. I remember several attempts at that, but I don't remember any that didn't meet with objections. Do you have a link? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Dec 16, 2011 at 1:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> That is a bug and one we should fix. I supplied a patch for that, >> written to Tom's idea for how to solve it. >> >> I will apply that, unless there are objections. > > I remember several attempts at that, but I don't remember any that > didn't meet with objections. Do you have a link? My patch to implement SnapshotNow correctly, from Jun 27 on this thread, was never reviewed or commented upon by anybody. That was probably because it only fixes one of the problems, not all of them. But it does fix a current bug and that's why I'm asking now if there are objections to committing it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Dec 16, 2011 at 8:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, Dec 16, 2011 at 1:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> That is a bug and one we should fix. I supplied a patch for that, >>> written to Tom's idea for how to solve it. >>> >>> I will apply that, unless there are objections. >> >> I remember several attempts at that, but I don't remember any that >> didn't meet with objections. Do you have a link? > > My patch to implement SnapshotNow correctly, from Jun 27 on this > thread was never reviewed or commented upon by anybody. That was > probably because it only fixes one of the problems, not all of them. Well, I think it was mostly because you didn't sound terribly optimistic about the approach: "So maybe this idea is worth doing, but I don't think it helps us much reduce lock levels for DDL." And also because you described the patch as a "rough hack", and not something you thought ready to commit. I am also not entirely sure I believe that this is plugging all the failure cases. I think that it may just be making the failure cases more obscure, rather than really getting rid of them. Consider something like the following:

T1: Update row version A, creating new row version B.
T2: Begin scanning the catalog in question. We happen to encounter row version B first. We remember T1's XID as in progress, but don't see the row since T1 hasn't committed.
T1: Rollback.
T3: Update row version A, creating new row version C.
T3: Commit.
T2: Scan now comes to row version A; we don't see that version either, since T3 is committed.

I don't think there's any guarantee that T2's scan will see tuples inserted after the start of the scan. If I'm correct about that, and I'm pretty sure it's true for sequential scans anyway, then T2's scan might end without seeing C either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Dec 17, 2011 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I am also not entirely sure I believe that this is plugging all the > failure cases. I think that it may just be making the failure cases > more obscure, rather than really getting rid of them. Consider > something like the following:
>
> T1: Update row version A, creating new row version B.
> T2: Begin scanning the catalog in question. We happen to encounter row version B first. We remember T1's XID as in progress, but don't see the row since T1 hasn't committed.
> T1: Rollback.
> T3: Update row version A, creating new row version C.
> T3: Commit.
> T2: Scan now comes to row version A; we don't see that version either, since T3 is committed.
>
> I don't think there's any guarantee that T2's scan will see tuples inserted after the start of the scan. If I'm correct about that, and I'm pretty sure it's true for sequential scans anyway, then T2's scan might end without seeing C either.

A simpler example shows it better. Tom's idea prevents 2 rows being returned, but doesn't prevent zero rows. That's a useful improvement, but not the only thing needed. ISTM we can run a scan as we do now, without locking. If the scan returns zero rows we scan again, but take a definition lock first. ALTER TABLE holds the definition lock while running, which won't be a problem because we would only use that on shorter AT statements. Definition lock being the type of lock described earlier on this thread, that we already have code for. So we have code for all the parts we need to make this work. So that means we'd be able to plug the gaps noted, without reducing performance on the common code paths. I don't see that any objections made so far remain valid with that approach. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
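[In outline, the fallback Simon proposes might read like this. ScanCatalogForRow and the definition-lock calls are hypothetical stand-ins (see the LockRelationDefinition sketch earlier in the thread); HeapTupleIsValid and elog are real.]

    /* Optimistic first scan with no extra locking; fall back to a
     * retry under the definition lock only on the zero-rows anomaly. */
    HeapTuple   tuple = ScanCatalogForRow(relid, key);  /* hypothetical */

    if (!HeapTupleIsValid(tuple))
    {
        LockRelationDefinition(relid, ShareLock);       /* hypothetical */
        tuple = ScanCatalogForRow(relid, key);
        UnlockRelationDefinition(relid, ShareLock);     /* hypothetical */

        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for relation %u", relid);
    }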
On Sat, Dec 17, 2011 at 6:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > A simpler example shows it better. Tom's idea prevents 2 rows being > returned, but doesn't prevent zero rows. I don't think it prevents either one. Suppose we read and return a tuple, and then someone HOT-updates it and commits. SnapshotNow is very happy to also return the updated version of that same tuple on the next iteration of the scan. See commit c0f03aae0469e758964faac0fb741685170c39a5. > That's a useful improvement, but not the only thing needed. > > ISTM we can run a scan as we do now, without locking. If the scan returns > zero rows we scan again, but take a definition lock first. ALTER TABLE > holds the definition lock while running, which won't be a problem > because we would only use that on shorter AT statements. > > Definition lock being the type of lock described earlier on this > thread, that we already have code for. So we have code for all the > parts we need to make this work. > > So that means we'd be able to plug the gaps noted, without reducing > performance on the common code paths. > > I don't see that any objections made so far remain valid with that approach. I think a retry loop is a possibly useful tool for solving this problem, but I'm very skeptical about the definition locks, unless we can arrange to have them taken just before commit (regardless of when during the transaction ALTER TABLE was executed) and released immediately afterwards. What I think might be even better is something along the lines of what Noah Misch did with RangeVarGetRelid -- don't try to lock out concurrent activity, just detect it and redo the operation if anything bad happens while we're in process. In this case, that would mean having a mechanism to know whether any concurrent transaction had updated the catalog we're scanning during the scan. Yet another option, which I wonder whether we're dismissing too lightly, is to just call GetSnapshotData() and do the scan using a plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, but is it expensive enough to worry about? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
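[The "plain old MVCC snapshot" option is easy to express with existing snapmgr calls. GetLatestSnapshot(), RegisterSnapshot()/UnregisterSnapshot(), and the systable functions all exist in the tree; wiring them up per catalog scan like this is the proposal under discussion, not current behaviour.]

    /* Use a throwaway MVCC snapshot for one catalog scan instead of
     * SnapshotNow. */
    Snapshot    snap = RegisterSnapshot(GetLatestSnapshot());
    SysScanDesc scan = systable_beginscan(rel, indexId, true,
                                          snap, nkeys, key);
    HeapTuple   tup;

    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        /* ... build the cache entry from tup ... */
    }
    systable_endscan(scan);
    UnregisterSnapshot(snap);

[Whether the extra GetSnapshotData() call per scan is cheap enough is exactly the question the benchmarks below try to answer.]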
On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote: > Yet another option, which I wonder whether we're dismissing too > lightly, is to just call GetSnapshotData() and do the scan using a > plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, > but is it expensive enough to worry about? Good point. For the most part, we already regard a catalog scan as too expensive for bulk use, hence relcache and catcache. That's not license to slow them down recklessly, but it's worth discovering how much of a hit we'd actually face. I created a function that does this in a loop:

    HeapTuple t;

    CatalogCacheFlushCatalog(ProcedureRelationId);
    t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
    if (!HeapTupleIsValid(t))
        elog(ERROR, "cache lookup failed for function 42");
    ReleaseSysCache(t);

Then, I had pgbench call the function once per client with various numbers of clients and a loop iteration count such that the total number of scans per run was always 19200000. Results for master and for a copy patched to use MVCC snapshots in catcache.c only:

 2 clients, master: 4:30.66 elapsed
 4 clients, master: 4:26.82 elapsed
32 clients, master: 4:25.30 elapsed
 2 clients, master: 4:25.67 elapsed
 4 clients, master: 4:26.58 elapsed
32 clients, master: 4:26.40 elapsed
 2 clients, master: 4:27.54 elapsed
 4 clients, master: 4:26.60 elapsed
32 clients, master: 4:27.20 elapsed
 2 clients, mvcc-catcache: 4:35.13 elapsed
 4 clients, mvcc-catcache: 4:30.40 elapsed
32 clients, mvcc-catcache: 4:37.91 elapsed
 2 clients, mvcc-catcache: 4:28.13 elapsed
 4 clients, mvcc-catcache: 4:27.06 elapsed
32 clients, mvcc-catcache: 4:32.84 elapsed
 2 clients, mvcc-catcache: 4:32.47 elapsed
 4 clients, mvcc-catcache: 4:24.35 elapsed
32 clients, mvcc-catcache: 4:31.54 elapsed

I see roughly a 2% performance regression. However, I'd expect any bulk losses to come from increased LWLock contention, which just doesn't materialize in a big way on this 2-core box. If anyone would like to rerun this on a larger machine, I can package it up for reuse.
Noah Misch <noah@leadboat.com> writes: > On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote: >> Yet another option, which I wonder whether we're dismissing too >> lightly, is to just call GetSnapshotData() and do the scan using a >> plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, >> but is it expensive enough to worry about? That might actually be workable ...

> I created a function that does this in a loop:
>
>     HeapTuple t;
>
>     CatalogCacheFlushCatalog(ProcedureRelationId);
>     t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
>     if (!HeapTupleIsValid(t))
>         elog(ERROR, "cache lookup failed for function 42");
>     ReleaseSysCache(t);

... but this performance test seems to me to be entirely misguided, because it's testing a situation that isn't going to occur much in the field, precisely because the syscache should prevent constant reloads of the same syscache entry. Poking around a bit, it looks to me like one of the bigger users of non-cache-fronted SnapshotNow scans is dependency.c. So maybe testing the speed of large cascaded drops would be a more relevant test case. For instance, create a schema containing a few thousand functions, and see how long it takes to drop the schema. Another thing that would be useful to know is what effect such a change would have on the time to run the regression tests with CLOBBER_CACHE_ALWAYS. That has nothing to do with any real-world performance concerns, but it would be good to see if we're going to cause a problem for the long-suffering buildfarm member that does that. regards, tom lane
On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > I created a function that does this in a loop: > > > HeapTuple t; > > > CatalogCacheFlushCatalog(ProcedureRelationId); > > t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */); > > if (!HeapTupleIsValid(t)) > > elog(ERROR, "cache lookup failed for function 42"); > > ReleaseSysCache(t); > > ... but this performance test seems to me to be entirely misguided, > because it's testing a situation that isn't going to occur much in the > field, precisely because the syscache should prevent constant reloads of > the same syscache entry. > [ideas for more-realistic tests] Granted, but I don't hope to reliably measure a change in a macro-benchmark after seeing a rickety 2% change in a micro-benchmark.
Noah Misch <noah@leadboat.com> writes: > On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote: >> ... but this performance test seems to me to be entirely misguided, >> because it's testing a situation that isn't going to occur much in the >> field, precisely because the syscache should prevent constant reloads of >> the same syscache entry. >> [ideas for more-realistic tests] > Granted, but I don't hope to reliably measure a change in a macro-benchmark > after seeing a rickety 2% change in a micro-benchmark. No, I'm not sure about that at all. In particular I think that CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs could be a larger part of a more-realistic test. regards, tom lane
On Tue, Dec 20, 2011 at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Noah Misch <noah@leadboat.com> writes: >> On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote: >>> ... but this performance test seems to me to be entirely misguided, >>> because it's testing a situation that isn't going to occur much in the >>> field, precisely because the syscache should prevent constant reloads of >>> the same syscache entry. > >>> [ideas for more-realistic tests] > >> Granted, but I don't hope to reliably measure a change in a macro-benchmark >> after seeing a rickety 2% change in a micro-benchmark. > > No, I'm not sure about that at all. In particular I think that > CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs > could be a larger part of a more-realistic test. Attached patch makes SnapshotNow into an MVCC snapshot, initialised at the start of each scan iff SnapshotNow is passed as the scan's snapshot. It's fairly brief but seems to do the trick. Assuming that is the right approach, some timings

10,000 functions individually
  Create: 15.7s 14.3s 14.8s 14.8s
  Drop:   11.9s 11.7s 11.6s 12.0s

10,000 functions in a schema
  Create: ... same ...
  Drop Cascade: 2.5s

That seems OK to me. Any feedback? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
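[The attachment isn't reproduced here. Judging from the description above and from the InitSnapshotNowIfNeeded frame visible in a stack trace later in the thread, its core is presumably something like the sketch below; the function body is a guess at the patch, not its text. GetSnapshotData() really does fill in a caller-supplied snapshot, and SnapshotNow is the usual pointer to the shared SnapshotNowData.]

    /* Guessed sketch: if a scan is starting with SnapshotNow, refresh it
     * as a genuine MVCC snapshot before the scan begins. */
    void
    InitSnapshotNowIfNeeded(Snapshot snap)
    {
        if (snap == SnapshotNow)
            (void) GetSnapshotData(snap);
    }

[Because SnapshotNowData is a single shared variable, a nested scan that also starts with SnapshotNow would refresh it under the outer scan's feet, which is exactly the mid-stream problem Noah raises next.]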
Simon Riggs <simon@2ndQuadrant.com> writes: > Assuming that is the right approach, some timings Um ... timings of what? regards, tom lane
On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> Assuming that is the right approach, some timings > > Um ... timings of what? Apologies for being terse, no problem to give a full explanation. You suggested earlier that putting calls to GetSnapshotData() in place of using SnapshotNow might increase the time taken to do catalog scans. That matters most in places where stuff isn't cached; one such place you suggested might be the dependency check code, so a test of dropping thousands of functions might show up any bad regressions. From what I can see, there are no bad performance regressions from making SnapshotNow use MVCC. We're benefiting in 9.2 from good reductions in GetSnapshotData() contention and improved performance anyway, so it looks like a reasonable balance. I'm proposing that we apply the above patch (v4) to allow SnapshotNow scans to return consistent, trustable results. That fixes some of the corner case bugs we've seen and paves the way for me to re-enable the lock reduction code. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > the start of each scan iff SnapshotNow is passed as the scan's > snapshot. It's fairly brief but seems to do the trick. That's a neat trick. However, if you start a new SnapshotNow scan while one is ongoing, the primordial scan's snapshot will change mid-stream.
On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at >> the start of each scan iff SnapshotNow is passed as the scan's >> snapshot. It's fairly brief but seems to do the trick. > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > ongoing, the primordial scan's snapshot will change mid-stream. Do we ever do that? (and if so, Why?!? or perhaps just Where?) We can use more complex code if required, but we'll be adding complexity and code into the main path that I'd like to avoid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>> Attached patch makes SnapshotNow into an MVCC snapshot, >> That's a neat trick. However, if you start a new SnapshotNow scan while one is >> ongoing, the primordial scan's snapshot will change mid-stream. > Do we ever do that? Almost certainly yes. For example, a catcache load may invoke catcache or relcache reload operations on its way to opening the table or index needed to fetch the desired row. I think you can only safely do this if each caller has its own snapshot variable, a la SnapshotDirty, and that's going to be hugely more invasive. regards, tom lane
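[For readers not familiar with it, the "own snapshot variable, a la SnapshotDirty" pattern Tom means is the existing dirty-snapshot idiom; a SnapshotNow replacement would need the same shape. InitDirtySnapshot() and systable_beginscan() are real; the fragment assumes the usual backend headers.]

    /* Existing idiom: each caller owns a SnapshotData on its own stack,
     * so nested scans cannot clobber each other's snapshot state. */
    SnapshotData DirtySnapshot;
    SysScanDesc  scan;

    InitDirtySnapshot(DirtySnapshot);
    scan = systable_beginscan(rel, indexId, true,
                              &DirtySnapshot, nkeys, key);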
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Um ... timings of what? > Apologies for being terse, no problem to give a full explanation. But you still didn't. I wanted to know what those numbers were and how they show that there's not a performance regression. Presumably you meant that some were "before" and some "after", but they were not so labeled. regards, tom lane
On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote: > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > >> the start of each scan iff SnapshotNow is passed as the scan's > >> snapshot. It's fairly brief but seems to do the trick. > > > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > > ongoing, the primordial scan's snapshot will change mid-stream. > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) I hacked up your patch a bit, as attached, to emit a WARNING for any nested use of SnapshotNow. This made 97/127 test files fail. As one example, RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its SnapshotNow scan. That may need a detoast, which itself runs a scan. > We can use more complex code if required, but we'll be adding > complexity and code into the main path that I'd like to avoid. Agreed.
Excerpts from Noah Misch's message of Mon Jan 02 16:25:25 -0300 2012: > On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote: > > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > > >> the start of each scan iff SnapshotNow is passed as the scan's > > >> snapshot. It's fairly brief but seems to do the trick. > > > > > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > > > ongoing, the primordial scan's snapshot will change mid-stream. > > > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) > > I hacked up your patch a bit, as attached, to emit a WARNING for any nested > use of SnapshotNow. This made 97/127 test files fail. As one example, > RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its > SnapshotNow scan. That may need a detoast, which itself runs a scan. Uh, I thought detoasting had its own visibility test function .. I mean, otherwise, what is HeapTupleSatisfiesToast for? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote: > > Excerpts from Noah Misch's message of Mon Jan 02 16:25:25 -0300 2012: > > On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote: > > > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: > > > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: > > > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at > > > >> the start of each scan iff SnapshotNow is passed as the scan's > > > >> snapshot. It's fairly brief but seems to do the trick. > > > > > > > > That's a neat trick. However, if you start a new SnapshotNow scan while one is > > > > ongoing, the primordial scan's snapshot will change mid-stream. > > > > > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) > > > > I hacked up your patch a bit, as attached, to emit a WARNING for any nested > > use of SnapshotNow. This made 97/127 test files fail. As one example, > > RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its > > SnapshotNow scan. That may need a detoast, which itself runs a scan. > > Uh, I thought detoasting had its own visibility test function .. I mean, > otherwise, what is HeapTupleSatisfiesToast for? The SnapshotNow scan was actually to build the relcache entry for the toast table before scanning the toast table itself. Stack trace:
#0  InitSnapshotNowIfNeeded (snap=0xb7e8c0) at snapmgr.c:216
#1  0x000000000048f773 in index_beginscan_internal (indexRelation=0x7f9a091e5cc8, nkeys=2, norderbys=0, snapshot=0xb7e8c0) at indexam.c:295
#2  0x000000000048f7e9 in index_beginscan (heapRelation=0x7f9a091a6c60, indexRelation=0xb7e8c0, snapshot=0xb7e8c0, nkeys=152984776, norderbys=0) at indexam.c:238
#3  0x000000000048e1de in systable_beginscan (heapRelation=0x7f9a091a6c60, indexId=<value optimized out>, indexOK=<value optimized out>, snapshot=0xb7e8c0, nkeys=2, key=0x7fffe211b610) at genam.c:289
#4  0x00000000007287db in RelationBuildTupleDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:455
#5  RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:880
#6  0x000000000072a050 in RelationIdGetRelation (relationId=2838) at relcache.c:1567
#7  0x00000000004872c0 in relation_open (relationId=2838, lockmode=1) at heapam.c:910
#8  0x0000000000487328 in heap_open (relationId=12052672, lockmode=152984776) at heapam.c:1052
#9  0x000000000048a737 in toast_fetch_datum (attr=<value optimized out>) at tuptoaster.c:1586
#10 0x000000000048bf74 in heap_tuple_untoast_attr (attr=0xb7e8c0) at tuptoaster.c:135
#11 0x0000000000737d77 in pg_detoast_datum_packed (datum=0xb7e8c0) at fmgr.c:2266
#12 0x00000000006f0fb0 in text_to_cstring (t=0xb7e8c0) at varlena.c:135
#13 0x0000000000727083 in RelationBuildRuleLock (relation=0x7f9a091b3818) at relcache.c:673
#14 0x000000000072909e in RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:886
#15 0x000000000072a050 in RelationIdGetRelation (relationId=11119) at relcache.c:1567
#16 0x00000000004872c0 in relation_open (relationId=11119, lockmode=0) at heapam.c:910
#17 0x0000000000487483 in relation_openrv_extended (relation=<value optimized out>, lockmode=<value optimized out>, missing_ok=<value optimized out>) at heapam.c:1011
#18 0x000000000048749c in heap_openrv_extended (relation=0xb7e8c0, lockmode=152984776, missing_ok=5 '\005') at heapam.c:1110
#19 0x000000000053570c in parserOpenTable (pstate=0xc885f8, relation=0xc88258, lockmode=1) at parse_relation.c:835
#20 0x000000000053594d in addRangeTableEntry (pstate=0xc885f8, relation=0xc88258, alias=0x0, inh=1 '\001', inFromCl=1 '\001') at parse_relation.c:901
#21 0x0000000000524dbf in transformTableEntry (pstate=0xc885f8, n=0xc88258, top_rte=0x7fffe211bd78, top_rti=0x7fffe211bd84, relnamespace=0x7fffe211bd70, containedRels=0x7fffe211bd68) at parse_clause.c:445
#22 transformFromClauseItem (pstate=0xc885f8, n=0xc88258, top_rte=0x7fffe211bd78, top_rti=0x7fffe211bd84, relnamespace=0x7fffe211bd70, containedRels=0x7fffe211bd68) at parse_clause.c:683
#23 0x0000000000526124 in transformFromClause (pstate=0xc885f8, frmList=<value optimized out>) at parse_clause.c:129
#24 0x000000000050c321 in transformSelectStmt (pstate=0xb7e8c0, parseTree=0xc88340) at analyze.c:896
#25 transformStmt (pstate=0xb7e8c0, parseTree=0xc88340) at analyze.c:186
#26 0x000000000050d583 in parse_analyze (parseTree=0xc88340, sourceText=0xc87828 "table pg_stat_user_tables;", paramTypes=0x0, numParams=0) at analyze.c:94
#27 0x00000000006789c5 in pg_analyze_and_rewrite (parsetree=0xc88340, query_string=0xc87828 "table pg_stat_user_tables;", paramTypes=0x0, numParams=0) at postgres.c:583
#28 0x0000000000678c77 in exec_simple_query (query_string=0xc87828 "table pg_stat_user_tables;") at postgres.c:941
#29 0x0000000000679997 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=0xbe03c0 "nm") at postgres.c:3881
#30 0x0000000000633d41 in BackendRun () at postmaster.c:3587
#31 BackendStartup () at postmaster.c:3272
#32 ServerLoop () at postmaster.c:1350
#33 0x0000000000635330 in PostmasterMain (argc=1, argv=0xbdc170) at postmaster.c:1110
#34 0x00000000005d206b in main (argc=1, argv=0xbdc170) at main.c:199
On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>>> Attached patch makes SnapshotNow into an MVCC snapshot, > >>> That's a neat trick. However, if you start a new SnapshotNow scan while one is >>> ongoing, the primordial scan's snapshot will change mid-stream. > >> Do we ever do that? > > Almost certainly yes. For example, a catcache load may invoke catcache > or relcache reload operations on its way to opening the table or index > needed to fetch the desired row. Ah, of course. I was thinking they would be rare by design. > I think you can only safely do this if each caller has its own snapshot > variable, a la SnapshotDirty, and that's going to be hugely more > invasive. OK, will look into it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Um ... timings of what? > >> Apologies for being terse; happy to give a full explanation. > > But you still didn't. I wanted to know what those numbers were and how > they show that there's not a performance regression. Presumably you > meant that some were "before" and some "after", but they were not so > labeled. All timings were "after" applying the patch. Since all of the tests had very acceptable absolute values I didn't test without-patch. Anyway, looks like we need to bin that and retest with the new patch when it comes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Excerpts from Noah Misch's message of Mon Jan 02 16:39:09 -0300 2012: > > On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote: > > Uh, I thought detoasting had its own visibility test function .. I mean, > > otherwise, what is HeapTupleSatisfiesToast for? > > The SnapshotNow scan was actually to build the relcache entry for the toast > table before scanning the toast table itself. Stack trace: Oh, right, that makes sense. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jan 2, 2012 at 7:49 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > Excerpts from Noah Misch's message of lun ene 02 16:39:09 -0300 2012: >> >> On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote: > >> > Uh, I thought detoasting had its own visibility test function .. I mean, >> > otherwise, what is HeapTupleSatisfiesToast for? >> >> The SnapshotNow scan was actually to build the relcache entry for the toast >> table before scanning the toast table itself. Stack trace: > > Oh, right, that makes sense. Certainly does. Thanks Noah, much appreciated. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But you still didn't. I wanted to know what those numbers were and how >> they show that there's not a performance regression. Presumably you >> meant that some were "before" and some "after", but they were not so >> labeled. > All timings were "after" applying the patch. Since all of the tests > had very acceptable absolute values I didn't test without-patch. What is a "very acceptable absolute value", and how do you know it's acceptable if you don't know what the previous performance was? This reasoning makes no sense to me at all. regards, tom lane
On Mon, Jan 2, 2012 at 8:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> But you still didn't. I wanted to know what those numbers were and how >>> they show that there's not a performance regression. Presumably you >>> meant that some were "before" and some "after", but they were not so >>> labeled. > >> All timings were "after" applying the patch. Since all of the tests >> had very acceptable absolute values I didn't test without-patch. > > What is a "very acceptable absolute value", and how do you know it's > acceptable if you don't know what the previous performance was? This > reasoning makes no sense to me at all. I don't need to do things twice before deciding I enjoyed it the first time. A low value showed, to me, that there were no problems. If you want to see more or discuss, that's cool, no problem. But no need to beat me up for not guessing correctly the level of rigour that would be acceptable to you. Now that I have a first statement of your requirements, I will be able to judge further test results against the required standard. As I've said, this is all invalid now anyway. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>>> Attached patch makes SnapshotNow into an MVCC snapshot, > >>> That's a neat trick. However, if you start a new SnapshotNow scan while one is >>> ongoing, the primordial scan's snapshot will change mid-stream. > >> Do we ever do that? > > Almost certainly yes. For example, a catcache load may invoke catcache > or relcache reload operations on its way to opening the table or index > needed to fetch the desired row. > > I think you can only safely do this if each caller has its own snapshot > variable, a la SnapshotDirty, and that's going to be hugely more > invasive. I think I can avoid doing things that way and keep it non-invasive. If we #define SnapshotNow (GetSnapshotNow()) and make GetSnapshotNow() reuse the static SnapshotNowData if possible. If not available we can allocate a new snapshot in TopTransactionContext. We can free the snapshot at the end of scan. That hides most of the complexity without causing any problems, ISTM. Working on this now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
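A minimal sketch of that proposal, assuming names (GetSnapshotNow, an in-use flag for the static snapshot) that do not exist in the tree:

#include "postgres.h"
#include "storage/procarray.h"
#include "utils/memutils.h"
#include "utils/tqual.h"

/*
 * Sketch of the proposal: hand back the static SnapshotNowData when it is
 * free, else a per-scan copy in TopTransactionContext.  This only
 * illustrates the proposed control flow.
 */
static bool SnapshotNowDataInUse = false;

Snapshot
GetSnapshotNow(void)
{
    Snapshot    snap;

    if (!SnapshotNowDataInUse)
    {
        snap = &SnapshotNowData;    /* fast path: reuse the static copy */
        SnapshotNowDataInUse = true;
    }
    else
    {
        /* a nested scan already owns the static copy: allocate another */
        snap = (Snapshot) MemoryContextAllocZero(TopTransactionContext,
                                                 sizeof(SnapshotData));
    }

    snap->satisfies = HeapTupleSatisfiesMVCC;   /* MVCC visibility test */
    return GetSnapshotData(snap);               /* fill xmin/xmax/xip */
}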
On Mon, Jan 2, 2012 at 9:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Simon Riggs <simon@2ndQuadrant.com> writes: >>> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >>>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>>>> Attached patch makes SnapshotNow into an MVCC snapshot, >> >>>> That's a neat trick. However, if you start a new SnapshotNow scan while one is >>>> ongoing, the primordial scan's snapshot will change mid-stream. >> >>> Do we ever do that? >> >> Almost certainly yes. For example, a catcache load may invoke catcache >> or relcache reload operations on its way to opening the table or index >> needed to fetch the desired row. >> >> I think you can only safely do this if each caller has its own snapshot >> variable, a la SnapshotDirty, and that's going to be hugely more >> invasive. > > > I think I can avoid doing things that way and keep it non-invasive. > > If we > #define SnapshotNow (GetSnapshotNow()) > > and make GetSnapshotNow() reuse the static SnapshotNowData if possible. > If not available we can allocate a new snapshot in TopTransactionContext. > > We can free the snapshot at the end of scan. > > That hides most of the complexity without causing any problems, ISTM. > > Working on this now. Options for implementing this are:
1) Add FreeSnapshotNowIfNeeded() to every heap_endscan and index_endscan. Small code footprint, but touches every scan.
2) Change all heap_scans that use SnapshotNow into a systable_scan and add FreeSnapshotNowIfNeeded() to systable_endscan. 35 call points; touches only system table scans.
3) Code specific calls to create a snapshot for each SnapshotNow call, similar to InitDirtySnapshot. 185 call points; very fine-grained control, but probably overkill.
(2) seems the right way to do this, with selective use of (3), and has the side benefit of a reasonable code rationalisation. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
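Option (2) could take roughly the following shape, where each sysscan owns its snapshot and hands it back at endscan, with no stack discipline assumed. The struct and helper names here are invented for illustration; only RegisterSnapshot, UnregisterSnapshot and GetLatestSnapshot are real:

#include "postgres.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"

typedef struct SketchSysScan
{
    Snapshot    snapshot;       /* non-NULL if this scan owns it */
} SketchSysScan;

static void
sketch_sysscan_begin(SketchSysScan *scan, Snapshot requested)
{
    if (requested == SnapshotNow)
    {
        /* take and register a private MVCC snapshot for this scan */
        scan->snapshot = RegisterSnapshot(GetLatestSnapshot());
    }
    else
        scan->snapshot = NULL;  /* caller-supplied; not ours to free */
}

static void
sketch_sysscan_end(SketchSysScan *scan)
{
    if (scan->snapshot != NULL)
    {
        UnregisterSnapshot(scan->snapshot);
        scan->snapshot = NULL;
    }
}

Because the snapshot's ownership travels with the scan descriptor, scans can end in any order, which matches the non-FIFO concern raised below.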
On Mon, Jan 2, 2012 at 6:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah@leadboat.com> wrote: >> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote: >>> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at >>> the start of each scan iff SnapshotNow is passed as the scan's >>> snapshot. It's fairly brief but seems to do the trick. >> >> That's a neat trick. However, if you start a new SnapshotNow scan while one is >> ongoing, the primordial scan's snapshot will change mid-stream. > > Do we ever do that? (and if so, Why?!? or perhaps just Where?) Just for the record, yes we do run multiple catalog scans in some parts of the code. So I can see how we might trigger 4 nested scans, using cache replacement while scanning, so best assume more, with no guarantee of them being neatly stacked for pop/push type access. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote: > Just for the record, yes we do run multiple catalog scans in some > parts of the code. > > So I can see how we might trigger 4 nested scans, using cache > replacement while scanning, so best assume more, with no guarantee of > them being neatly stacked for pop/push type access. Yeah, I wouldn't want to commit to a nesting limit. However, I _would_ have expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot() is adequate for a great deal of the backend, after all. In what sort of situation do catalog scans not strictly nest? Thanks, nm
On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch <noah@leadboat.com> wrote: > On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote: >> Just for the record, yes we do run multiple catalog scans in some >> parts of the code. >> >> So I can see how we might trigger 4 nested scans, using cache >> replacement while scanning, so best assume more, with no guarantee of >> them being neatly stacked for pop/push type access. > > Yeah, I wouldn't want to commit to a nesting limit. Agreed > However, I _would_ have > expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot() > is adequate for a great deal of the backend, after all. In what sort of > situation do catalog scans not strictly nest? It's possible. Making assumptions about what is possible bit me before, as you showed. I've seen code where we are explicitly running two concurrent scans. I don't want to put unnecessary and subtle assumptions into catalog scan code so I'm working on the assumption that endscan may not be called in strict FIFO order. The difference is fairly minor and doesn't restrict us in other ways. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 3, 2012 at 5:47 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch <noah@leadboat.com> wrote: >> On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote: >>> Just for the record, yes we do run multiple catalog scans in some >>> parts of the code. >>> >>> So I can see how we might trigger 4 nested scans, using cache >>> replacement while scanning, so best assume more, with no guarantee of >>> them being neatly stacked for pop/push type access. >> >> Yeah, I wouldn't want to commit to a nesting limit. > > Agreed > >> However, I _would_ have >> expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot() >> is adequate for a great deal of the backend, after all. In what sort of >> situation do catalog scans not strictly nest? > > It's possible. Making assumptions about what is possible bit me > before, as you showed. > > I've seen code where we are explicitly running two concurrent scans. > > I don't want to put unnecessary and subtle assumptions into catalog > scan code so I'm working on the assumption that endscan may not be > called in strict FIFO order. The difference is fairly minor and > doesn't restrict us in other ways. I feel like the first thing we should be doing here is some benchmarking. If we change just the scans in dependency.c and then try the test case Tom suggested (dropping a schema containing a large number of functions), we can compare the patched code with master and get an idea of whether the performance is acceptable. If it is, changing everything else is a mostly mechanical process that we can simply grind through. If it's not, I'd rather learn that before we start grinding. I started to do this before Christmas, but then I ... went on vacation. Here's a Perl script that can be used to generate an SQL script to create as many functions as you'd like to try dropping at once, in case it's helpful. The idea is to run the resulting SQL script through psql and then test the speed of "DROP SCHEMA lots_of_functions CASCADE;". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I feel like the first thing we should be doing here is some > benchmarking. If we change just the scans in dependency.c and then > try the test case Tom suggested (dropping a schema containing a large > number of functions), we can compare the patched code with master and > get an idea of whether the performance is acceptable. Yes, I've done this and it takes 2.5s to drop 10,000 functions using an MVCC snapshot. That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. We can do a lot of tests, but at the end it's a human judgement. Are 100% correct results from catalog accesses worth having even when the real-world speed is not as good? (Whether it's x1000000 times slower or not is not relevant if it is still fast enough). Anybody with that many DDL statements probably cares about doing various operations with lower lock levels. Fastpath locking will slow down DDL, but we didn't measure the performance slowdown there. We understood the benefit and were willing to pay the price. So I'll proceed for now with the patch, which isn't as simple as you think. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 3, 2012 at 11:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I feel like the first thing we should be doing here is some >> benchmarking. If we change just the scans in dependency.c and then >> try the test case Tom suggested (dropping a schema containing a large >> number of functions), we can compare the patched code with master and >> get an idea of whether the performance is acceptable. > > Yes, I've done this and it takes 2.5s to drop 10,000 functions using > an MVCC snapshot. > > That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. > > We can do a lot of tests, but at the end it's a human judgement. Are 100% > correct results from catalog accesses worth having even when the real-world > speed is not as good? (Whether it's x1000000 > times slower or not is not relevant if it is still fast enough). Sure. But I don't see why that means it wouldn't be nice to know whether or not it is in fact a million times slower. If Tom's artificial worst case is a million times slower, then probably there are some cases we care about more that are going to be measurably impacted, and we're going to want to think about what to do about that. We can wait until you've finished the patch before we do that testing, or we can do it now and maybe get some idea whether the approach is likely to be viable or whether it's going to require some adjustment before we actually go trawl through all that code. On my laptop, dropping a schema with 10,000 functions using commit d5448c7d31b5af66a809e6580bae9bd31448bfa7 takes 400-500 ms. If my laptop is the same speed as your laptop, that would mean a 5-6x slowdown, but of course that's comparing apples and oranges... in any event, if the real number is anywhere in that ballpark, it's probably a surmountable obstacle, but I'd like to know rather than guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes: > That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. > We can do a lot of tests, but at the end it's a human judgement. Are 100% > correct results from catalog accesses worth having even when the real-world > speed is not as good? (Whether it's x1000000 > times slower or not is not relevant if it is still fast enough). That argument is just nonsense AFAICT. Yes, 2.5 s to drop 10000 functions is probably fine, but that is an artificial test case whose only real interest is to benchmark what a change in SnapshotNow scans might cost us. In the real world it's hard to guess what fraction of a real workload might consist of such scans, but I suspect there are some where a significant increase in that cost might hurt. So in my mind the point of the exercise is to find out how much the cost increased, and I'm just baffled as to why you won't use a benchmark case to, um, benchmark. Another point that requires some thought is that switching SnapshotNow to be MVCC-based will presumably result in a noticeable increase in each backend's rate of wanting to acquire snapshots. Hence, more contention in GetSnapshotData can be expected. A single-threaded test case doesn't prove anything at all about what that might cost under load. regards, tom lane
On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> That was acceptable to *me*, so I didn't try measuring using just SnapshotNow. > >> We can do a lot of tests, but at the end it's a human judgement. Are 100% >> correct results from catalog accesses worth having even when the real-world >> speed is not as good? (Whether it's x1000000 >> times slower or not is not relevant if it is still fast enough). > > That argument is just nonsense AFAICT. Yes, 2.5 s to drop 10000 > functions is probably fine, but that is an artificial test case whose > only real interest is to benchmark what a change in SnapshotNow scans > might cost us. In the real world it's hard to guess what fraction of a > real workload might consist of such scans, but I suspect there are some > where a significant increase in that cost might hurt. So in my mind the > point of the exercise is to find out how much the cost increased, and > I'm just baffled as to why you won't use a benchmark case to, um, > benchmark. Ditto. > Another point that requires some thought is that switching SnapshotNow > to be MVCC-based will presumably result in a noticeable increase in each > backend's rate of wanting to acquire snapshots. Hence, more contention > in GetSnapshotData can be expected. A single-threaded test case doesn't > prove anything at all about what that might cost under load. This is obviously true at some level, but I'm not sure that it really matters. It's not that difficult to construct a test case where we have lots of people concurrently reading a table, or reading many tables, or writing a table, or writing many tables, but what kind of realistic test case involves enough DDL for any of this to matter? If you're creating or dropping tables, for example, the filesystem costs are likely to be a much bigger problem than GetSnapshotData(), to the point where you probably can't get enough concurrency for GetSnapshotData() to matter. Maybe you could find a problem case involving creating or dropping lots and lots of functions concurrently, or something like that, but who does that? You'd have to be performing operations on hundreds of non-table SQL objects per second, and it is hard for me to imagine why anyone would be doing that. Maybe I'm not imaginative enough? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Tue Jan 03 12:24:52 -0300 2012: > I feel like the first thing we should be doing here is some > benchmarking. If we change just the scans in dependency.c and then > try the test case Tom suggested (dropping a schema containing a large > number of functions), we can compare the patched code with master and > get an idea of whether the performance is acceptable. If it is, > changing everything else is a mostly mechanical process that we can > simply grind through. If it's not, I'd rather learn that before we > start grinding. If there are many call sites, maybe it'd be a good idea to use a semantic patcher tool such as Coccinelle instead of doing it one by one. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Another point that requires some thought is that switching SnapshotNow >> to be MVCC-based will presumably result in a noticeable increase in each >> backend's rate of wanting to acquire snapshots. Hence, more contention >> in GetSnapshotData can be expected. A single-threaded test case doesn't >> prove anything at all about what that might cost under load. > This is obviously true at some level, but I'm not sure that it really > matters. It's not that difficult to construct a test case where we > have lots of people concurrently reading a table, or reading many > tables, or writing a table, or writing many tables, but what kind of > realistic test case involves enough DDL for any of this to matter? Um ... you're supposing that only DDL uses SnapshotNow, which is wrong. I refer you to the parser, the planner, execution functions for arrays, records, enums, any sort of relcache reload, etc etc etc. Yes, some of that is masked by backend-internal caching, some of the time, but it's folly to just assume that there are no SnapshotNow scans during normal queries. None of this is necessarily grounds to reject a patch along the proposed lines. I'm just asking for some benchmarking effort to establish what the costs might be, rather than naively hoping they are negligible. regards, tom lane
On Tue, Jan 3, 2012 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Um ... you're supposing that only DDL uses SnapshotNow, which is wrong. > I refer you to the parser, the planner, execution functions for arrays, > records, enums, any sort of relcache reload, etc etc etc. Yes, some > of that is masked by backend-internal caching, some of the time, but > it's folly to just assume that there are no SnapshotNow scans during > normal queries. Hmm. That's unfortunate, because it seems difficult to construct a test case that will exercise every feature in the system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Um ... you're supposing that only DDL uses SnapshotNow, which is >> wrong. I refer you to the parser, the planner, execution >> functions for arrays, records, enums, any sort of relcache >> reload, etc etc etc. Yes, some of that is masked by >> backend-internal caching, some of the time, but it's folly to >> just assume that there are no SnapshotNow scans during normal >> queries. > > Hmm. That's unfortunate, because it seems difficult to construct > a test case that will exercise every feature in the system. Would the result of IsMVCCSnapshot() change for these snapshots? If so, it might require work in the SSI code to avoid a performance hit there. In early profiling and stepping through execution I noticed that we had overhead in serializable transactions for the planner checks for the actual values at the beginning or end of an index. This went away when we avoided SSI work for reads using a non-MVCC snapshot. If we're going to start using MVCC snapshots for such things, we need some other way to avoid unnecessary work in this area (and possibly others). At a minimum, some comparative benchmarks at the serializable isolation level would be in order when considering a patch like this. -Kevin
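The cheap exit being referred to looks roughly like this, modelled loosely on the guards in predicate.c rather than copied from them; the function name is an assumption for illustration:

#include "postgres.h"
#include "access/xact.h"
#include "utils/tqual.h"

/*
 * Reads through non-MVCC snapshots skip SSI bookkeeping entirely.  If
 * SnapshotNow scans start reporting IsMVCCSnapshot() == true, this cheap
 * exit no longer fires for them.
 */
static bool
sketch_serialization_needed_for_read(Snapshot snapshot)
{
    if (!IsolationIsSerializable())
        return false;           /* SSI work only at SERIALIZABLE */
    if (!IsMVCCSnapshot(snapshot))
        return false;           /* the exit catalog scans rely on today */
    return true;
}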
I wrote: >> Another point that requires some thought is that switching SnapshotNow >> to be MVCC-based will presumably result in a noticeable increase in each >> backend's rate of wanting to acquire snapshots. BTW, I wonder if this couldn't be ameliorated by establishing some ground rules about how up-to-date a snapshot really needs to be. Arguably, it should be okay for successive SnapshotNow scans to use the same snapshot as long as we have not acquired a new lock in between. If not, reusing an old snap doesn't introduce any race condition that wasn't there already. regards, tom lane
On Tue, Jan 3, 2012 at 1:42 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Um ... you're supposing that only DDL uses SnapshotNow, which is >>> wrong. I refer you to the parser, the planner, execution >>> functions for arrays, records, enums, any sort of relcache >>> reload, etc etc etc. Yes, some of that is masked by >>> backend-internal caching, some of the time, but it's folly to >>> just assume that there are no SnapshotNow scans during normal >>> queries. >> >> Hmm. That's unfortunate, because it seems difficult to construct >> a test case that will exercise every feature in the system. > > Would the result of IsMVCCSnapshot() change for these snapshots? If > so, it might require work in the SSI code to avoid a performance hit > there. In early profiling and stepping through execution I noticed > that we had overhead in serializable transactions for the planner > checks for the actual values at the beginning or end of an index. > This went away when we avoided SSI work for reads using a non-MVCC > snapshot. If we're going to start using MVCC snapshots for such > things, we need some other way to avoid unnecessary work in this > area (and possibly others). > > At a minimum, some comparative benchmarks at the serializable > isolation level would be in order when considering a patch like > this. Ugh. Yeah, that sounds like a problem. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >>> Another point that requires some thought is that switching SnapshotNow >>> to be MVCC-based will presumably result in a noticeable increase in each >>> backend's rate of wanting to acquire snapshots. > > BTW, I wonder if this couldn't be ameliorated by establishing some > ground rules about how up-to-date a snapshot really needs to be. > Arguably, it should be okay for successive SnapshotNow scans to use the > same snapshot as long as we have not acquired a new lock in between. > If not, reusing an old snap doesn't introduce any race condition that > wasn't there already. Is that likely to help much? I think our usual pattern is to lock the catalog, scan it, and then release the lock, so there will normally be an AcceptInvalidationMessages() just before the scan. Or at least, I think there will. Another thought is that it should always be safe to reuse an old snapshot if no transactions have committed or aborted since it was taken (possibly we could narrow that to "no transactions have committed since it was taken", but I think that might cause some headaches with RecentGlobalXmin). I wonder if we could come up with a cheap, hopefully lock-free method of determining whether or not that's the case. For example, suppose we store the last XID to commit or abort in shared memory. In GetSnapshotData(), we check whether that value has changed (probably, after enforcing a memory barrier of some kind) before acquiring the lock. If it has, we proceed normally, but if not, we just reuse the results from the previous GetSnapshotData(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
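A sketch of that control flow, using latestCompletedXid as a stand-in for the proposed "last XID to commit or abort" counter; since latestCompletedXid only ever advances to a higher XID it is an imperfect proxy, and the unlocked read plus barrier pairing is exactly the unproven part:

#include "postgres.h"
#include "access/transam.h"
#include "storage/barrier.h"
#include "storage/procarray.h"

Snapshot
GetSnapshotDataReuse(Snapshot snapshot)
{
    static TransactionId prev_completed = InvalidTransactionId;
    TransactionId latest;

    pg_read_barrier();          /* would pair with a commit-side barrier */
    latest = ShmemVariableCache->latestCompletedXid;

    if (TransactionIdIsValid(prev_completed) &&
        TransactionIdEquals(latest, prev_completed))
        return snapshot;        /* nothing ended: reuse previous contents */

    prev_completed = latest;
    return GetSnapshotData(snapshot);   /* slow path: take the lock */
}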
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, I wonder if this couldn't be ameliorated by establishing some >> ground rules about how up-to-date a snapshot really needs to be. >> Arguably, it should be okay for successive SnapshotNow scans to use the >> same snapshot as long as we have not acquired a new lock in between. >> If not, reusing an old snap doesn't introduce any race condition that >> wasn't there already. > Is that likely to help much? I think our usual pattern is to lock the > catalog, scan it, and then release the lock, so there will normally be > an AcceptInvalidationMessages() just before the scan. Or at least, I > think there will. Um, good point. Those locks aren't meant to avoid race conditions, but the mechanism doesn't know that. > Another thought is that it should always be safe to reuse an old > snapshot if no transactions have committed or aborted since it was > taken Yeah, that might work better. And it'd be a win for all MVCC snaps, not just the ones coming from promoted SnapshotNow ... regards, tom lane
On Tue, Jan 3, 2012 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Another point that requires some thought is that switching SnapshotNow >>> to be MVCC-based will presumably result in a noticeable increase in each >>> backend's rate of wanting to acquire snapshots. Hence, more contention >>> in GetSnapshotData can be expected. A single-threaded test case doesn't >>> prove anything at all about what that might cost under load. > >> This is obviously true at some level, but I'm not sure that it really >> matters. It's not that difficult to construct a test case where we >> have lots of people concurrently reading a table, or reading many >> tables, or writing a table, or writing many tables, but what kind of >> realistic test case involves enough DDL for any of this to matter? > > Um ... you're supposing that only DDL uses SnapshotNow, which is wrong. > I refer you to the parser, the planner, execution functions for arrays, > records, enums, any sort of relcache reload, etc etc etc. Yes, some > of that is masked by backend-internal caching, some of the time, but > it's folly to just assume that there are no SnapshotNow scans during > normal queries. > > None of this is necessarily grounds to reject a patch along the proposed > lines. I'm just asking for some benchmarking effort to establish what > the costs might be, rather than naively hoping they are negligible. All of which is reasonable doubt. So far, all I'm saying is that on a couple of heavy duty cases that I've run, I haven't noticed a problem. That is sufficient for me to push forwards with a patch that does very close to what we want, so we can judge acceptability with wider tests. I'm not saying it will be acceptable a priori, only that it is worth trying because the benefits are high. The number of call points are also high. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > If there are many call sites, maybe it'd be a good idea to use a > semantic patcher tool such as Coccinelle instead of doing it one by one. Thanks for the suggestion; regrettably I've already made those changes. After examining the call sites, I identified 35 that might need changing. Of those, about 30 were changed to use systable_beginscan, while a few others use declared snapshots instead. So not a great effort, and worth doing the by-hand inspection. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Excerpts from Simon Riggs's message of Tue Jan 03 17:57:56 -0300 2012: > > On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: > > > If there are many call sites, maybe it'd be a good idea to use a > > semantic patcher tool such as Coccinelle instead of doing it one by one. > > Thanks for the suggestion; regrettably I've already made those changes. > > After examining the call sites, I identified 35 that might need > changing. Of those, about 30 were changed to use systable_beginscan, > while a few others use declared snapshots instead. So not a great > effort, and worth doing the by-hand inspection. Yeah, for 35 changes I wouldn't have gone to the lengths required to learn the new tool either. If they had been 200, OTOH ... -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Was this resolved? (Sorry to be bugging everyone.) --------------------------------------------------------------------------- On Tue, Nov 29, 2011 at 11:42:10PM -0500, Robert Haas wrote: > On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Just confirming we decided not to pursue this. > > Doesn't sound like it. > > I've been thinking a lot about the more general problem here - namely, > that allowing catalog changes without an access exclusive lock is > unsafe - and trying to come up with a good solution, because I'd > really like to see us make this work. It strikes me that there are > really two separate problems here. > > 1. If you are scanning a system catalog using SnapshotNow, and a > commit happens at just the right moment, you might see two versions of > the row or zero rather than one. You'll see two if you scan the old > row version, then the concurrent transaction commits, and then you > scan the new one. You'll see zero if you scan the new row (ignoring > it, since it isn't committed yet) and then the concurrent transaction > commits, and then you scan the old row. On a related note, if you > scan several interrelated catalogs (e.g. when building a relcache > entry) and a transaction that has made matching modifications to > multiple catalogs commits midway through, you might see the old > version of the data from one table and the new version of the data > from some other table. Using AccessExclusiveLock fixes the problem > because we guarantee that anybody who wants to read those rows first > takes some kind of lock, which they won't be able to do while the > updater holds AccessExclusiveLock. > > 2. Other backends may have data in the relcache or catcaches which > won't get invalidated until they do AcceptInvalidationMessages(). > That will always happen no later than the next time they lock the > relation, so if you are holding AccessExclusiveLock then you should be > OK: no one else holds any lock, and they'll have to go get one before > doing anything interesting. But if you are holding any weaker lock, > there's nothing to force AcceptInvalidationMessages to happen before > you reuse those cache entries. > > In both cases, name lookups are an exception: we don't know what OID > to lock until we've done the name lookup. > > Random ideas for solving the first problem: > > 1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've > rejected this before because of the cost involved, but maybe with > Pavan's patch and some of the other improvements that are on the way > we could make this cheap enough to be acceptable. > 1-B. Don't allow transactions that have modified system catalog X to > commit while we're scanning system catalog X; make the scans take some > kind of shared lock and commits an exclusive lock. This seems awfully > bad for concurrency, though, not to mention the overhead of taking and > releasing all those locks even when nothing conflicts. > 1-C. Wrap a retry loop around every place where we scan a system > catalog. If a transaction that has written rows in catalog X commits > while we're scanning catalog X, discard the results of the first scan > and declare a do-over (or maybe something more coarse-grained, or at > the other extreme even more fine-grained). If we're doing several > interrelated scans then the retry loop must surround the entire unit, > and enforce a do-over of the whole operation if commits occur in any > of the catalogs before the scan completes. 
> > Unfortunately, any of these seem likely to require a Lot of Work, > probably more than can be done in time for 9.2. However, perhaps it > would be feasible to home in on a limited subset of the cases (e.g. > name lookups, which are mainly done in only a handful of places, and > represent an existing bug) and implement the necessary code changes > just for those cases. Then we could generalize that pattern to other > cases as time and energy permit. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
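Option 1-C in the quoted mail amounts to a retry loop around each catalog scan. In control-flow terms, with every function below an assumed placeholder rather than real code:

#include "postgres.h"

typedef struct CatalogScanResult CatalogScanResult;

extern uint64 GetCatalogCommitCount(Oid catalogId);             /* assumed */
extern CatalogScanResult *RunCatalogScan(Oid catalogId);        /* assumed */
extern void DiscardCatalogScanResult(CatalogScanResult *res);   /* assumed */

CatalogScanResult *
ScanCatalogWithRetry(Oid catalogId)
{
    for (;;)
    {
        uint64      count_before = GetCatalogCommitCount(catalogId);
        CatalogScanResult *res = RunCatalogScan(catalogId);

        /* no writer to this catalog committed mid-scan: result is good */
        if (GetCatalogCommitCount(catalogId) == count_before)
            return res;

        DiscardCatalogScanResult(res);  /* declare a do-over */
    }
}

For interrelated scans, the loop would have to surround the whole unit, exactly as the quoted mail says.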
On Thu, Aug 16, 2012 at 9:11 PM, Bruce Momjian <bruce@momjian.us> wrote: > Was this resolved? (Sorry to be bugging everyone.) Nope. I've got some ideas, but not enough round tuits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 January 2012 18:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >>> Another point that requires some thought is that switching SnapshotNow >>> to be MVCC-based will presumably result in a noticeable increase in each >>> backend's rate of wanting to acquire snapshots. > > BTW, I wonder if this couldn't be ameliorated by establishing some > ground rules about how up-to-date a snapshot really needs to be. > Arguably, it should be okay for successive SnapshotNow scans to use the > same snapshot as long as we have not acquired a new lock in between. > If not, reusing an old snap doesn't introduce any race condition that > wasn't there already. Now that has been implemented using the above design, we can resubmit the lock level reduction patch, with thanks to Robert. Submitted patch passes original complaint/benchmark. Changes:
* various forms of ALTER TABLE, notably ADD constraint and VALIDATE
* CREATE TRIGGER
One minor correction to earlier thinking with respect to toast tables; that might be later relaxed. Full tests including proof of lock level reductions, plus docs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jul 7, 2013 at 9:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 January 2012 18:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>>> Another point that requires some thought is that switching SnapshotNow >>>> to be MVCC-based will presumably result in a noticeable increase in each >>>> backend's rate of wanting to acquire snapshots. >> >> BTW, I wonder if this couldn't be ameliorated by establishing some >> ground rules about how up-to-date a snapshot really needs to be. >> Arguably, it should be okay for successive SnapshotNow scans to use the >> same snapshot as long as we have not acquired a new lock in between. >> If not, reusing an old snap doesn't introduce any race condition that >> wasn't there already. > > Now that has been implemented using the above design, we can resubmit > the lock level reduction patch, with thanks to Robert. > > Submitted patch passes original complaint/benchmark. > > Changes > * various forms of ALTER TABLE, notably ADD constraint and VALIDATE > * CREATE TRIGGER > > One minor coirrections to earlier thinking with respect to toast > tables. That might be later relaxed. > > Full tests including proof of lock level reductions, plus docs. I'm quite glad to see this patch back again for another round. I haven't reviewed it in detail yet, so the purpose of this email is just to lay out some general areas of concern and food for thought rather than to critique anything in the patch too specifically. Generally, the question on the table is: to what extent do MVCC catalog scans make the world safe for concurrent DDL, or put negatively, what hazards remain? Noah discovered an interesting one recently: apparently, the relcache machinery has some logic in it that depends on the use of AccessExclusiveLock in subtle ways. I'm going to attempt to explain it, and maybe he can jump in and explain it better. Essentially, the problem is that when a relcache reload occurs, certain data structures (like the tuple descriptor, but there are others) are compared against the old version of the same data structure. If there are no changes, we do nothing; else, we free the old one and install the new one. The reason why we don't free the old one and install the new one unconditionally is because other parts of the backend might have pointers to the old data structure, so just replacing it all the time would result in crashes. Since DDL requires AccessExclusiveLock + CheckTableNotInUse(), any actual change to the data structure will happen when we haven't got any pointers anyway. A second thing I'm concerned about is that, although MVCC catalog scans guarantee that we won't miss a concurrently-updated row entirely, or see a duplicate, they don't guarantee that the rows we see during a scan of catalog A will be consistent with the rows we see during a scan of catalog B moments later. For example, hilarity will ensue if relnatts doesn't match what we see in pg_attribute. Now I don't think this particular example matters very much because I think there are probably lots of other things that would also break if we try to add a column without a full table lock, so we're probably doomed there anyway. But there might be other instances of this problem that are more subtle. I'll try to find some time to look at this in more detail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-07-15 10:06:31 -0400, Robert Haas wrote: > Noah discovered an interesting one recently: apparently, the relcache > machinery has some logic in it that depends on the use of > AccessExclusiveLock in subtle ways. I'm going to attempt to explain > it, and maybe he can jump in and explain it better. Essentially, the > problem is that when a relcache reload occurs, certain data structures > (like the tuple descriptor, but there are others) are compared against > the old version of the same data structure. If there are no changes, > we do nothing; else, we free the old one and install the new one. The > reason why we don't free the old one and install the new one > unconditionally is because other parts of the backend might have > pointers to the old data structure, so just replacing it all the time > would result in crashes. Since DDL requires AccessExclusiveLock + > CheckTableNotInUse(), any actual change to the data structure will > happen when we haven't got any pointers anyway. Aren't we swapping in the new data on a data level for that reason? See RelationClearRelation(). > A second thing I'm concerned about is that, although MVCC catalog > scans guarantee that we won't miss a concurrently-updated row > entirely, or see a duplicate, they don't guarantee that the rows we > see during a scan of catalog A will be consistent with the rows we see > during a scan of catalog B moments later. For example, hilarity will > ensue if relnatts doesn't match what we see in pg_attribute. Now I > don't think this particular example matters very much because I think > there are probably lots of other things that would also break if we > try to add a column without a full table lock, so we're probably > doomed there anyway. But there might be other instances of this > problem that are more subtle. Hm. Other transactions basically should be protected against this because they can't see uncommitted data anyway, right? ISTM that our own session basically already has to be safe against hilarity of this kind, right? I am concerned about stuff like violating constraints because we haven't yet seen the new constraint definition and the like... Or generating wrong plans because we haven't seen that somebody has dropped a constraint and inserted data violating the old one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 15, 2013 at 10:32 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-07-15 10:06:31 -0400, Robert Haas wrote: >> Noah discovered an interesting one recently: apparently, the relcache >> machinery has some logic in it that depends on the use of >> AccessExclusiveLock in subtle ways. I'm going to attempt to explain >> it, and maybe he can jump in and explain it better. Essentially, the >> problem is that when a relcache reload occurs, certain data structures >> (like the tuple descriptor, but there are others) are compared against >> the old version of the same data structure. If there are no changes, >> we do nothing; else, we free the old one and install the new one. The >> reason why we don't free the old one and install the new one >> unconditionally is because other parts of the backend might have >> pointers to the old data structure, so just replacing it all the time >> would result in crashes. Since DDL requires AccessExclusiveLock + >> CheckTableNotInUse(), any actual change to the data structure will >> happen when we haven't got any pointers anyway. > > Aren't we swapping in the new data on a data level for that reason? See > RelationClearRelation(). Look at the keep_tupdesc and keep_rules variables. >> A second thing I'm concerned about is that, although MVCC catalog >> scans guarantee that we won't miss a concurrently-updated row >> entirely, or see a duplicate, they don't guarantee that the rows we >> see during a scan of catalog A will be consistent with the rows we see >> during a scan of catalog B moments later. For example, hilarity will >> ensue if relnatts doesn't match what we see in pg_attribute. Now I >> don't think this particular example matters very much because I think >> there are probably lots of other things that would also break if we >> try to add a column without a full table lock, so we're probably >> doomed there anyway. But there might be other instances of this >> problem that are more subtle. > > Hm. Other transactions basically should be protected against this > because they can't see uncommitted data anyway, right? ISTM that our own > session basically already has to be safe against hilarity of this kind, > right? Other transactions are protected only within a single scan. If they do two or more separate scans one after the another, some other transaction can commit in the middle of things. Any commits that happen after starting the first scan and before starting the second scan will be reflected in the second scan, but not in the first. That's what I'm concerned about. Our own session can't have a problem of this kind, because we'll never commit a subtransaction (or, heaven forbid, a top-level transaction) while in the middle of a system catalog scan. > I am concerned about stuff like violating constraints because we haven't > yet seen the new constraint definition and the like... Or generating > wrong plans because we haven't seen that somebody has dropped a > constraint and inserted data violating the old one. Yes, we need to carefully audit the commands for dependencies of that type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
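Stripped to the one field at issue, the logic being described behaves like this (heavily simplified from RelationClearRelation, which swaps many more fields than the tuple descriptor):

#include "postgres.h"
#include "access/tupdesc.h"
#include "utils/rel.h"

/*
 * Keep the *old* TupleDesc pointer when nothing changed, so callers
 * holding that pointer stay valid; otherwise free it, which is only safe
 * if no one can still be pointing at it.
 */
static void
sketch_swap_tupdesc(Relation relation, Relation newrel)
{
    if (equalTupleDescs(relation->rd_att, newrel->rd_att))
    {
        /* logically identical: preserve the old pointer */
        FreeTupleDesc(newrel->rd_att);
        newrel->rd_att = relation->rd_att;
    }
    else
    {
        TupleDesc   olddesc = relation->rd_att;

        relation->rd_att = newrel->rd_att;
        FreeTupleDesc(olddesc);     /* any outside pointer now dangles */
    }
}

Under AccessExclusiveLock the "else" branch can only be reached when no one holds such a pointer; with reduced lock levels, that guarantee is exactly what goes away.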
On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote: > Generally, the question on the table is: to what extent do MVCC > catalog scans make the world safe for concurrent DDL, or put > negatively, what hazards remain? On Tom's test I've been unable to find a single problem. > Noah discovered an interesting one recently: apparently, the relcache > machinery has some logic in it that depends on the use of > AccessExclusiveLock in subtle ways. I'm going to attempt to explain > it, and maybe he can jump in and explain it better. Essentially, the > problem is that when a relcache reload occurs, certain data structures > (like the tuple descriptor, but there are others) are compared against > the old version of the same data structure. If there are no changes, > we do nothing; else, we free the old one and install the new one. The > reason why we don't free the old one and install the new one > unconditionally is because other parts of the backend might have > pointers to the old data structure, so just replacing it all the time > would result in crashes. Since DDL requires AccessExclusiveLock + > CheckTableNotInUse(), any actual change to the data structure will > happen when we haven't got any pointers anyway. > > A second thing I'm concerned about is that, although MVCC catalog > scans guarantee that we won't miss a concurrently-updated row > entirely, or see a duplicate, they don't guarantee that the rows we > see during a scan of catalog A will be consistent with the rows we see > during a scan of catalog B moments later. For example, hilarity will > ensue if relnatts doesn't match what we see in pg_attribute. Now I > don't think this particular example matters very much because I think > there are probably lots of other things that would also break if we > try to add a column without a full table lock, so we're probably > doomed there anyway. But there might be other instances of this > problem that are more subtle. If you look at this as a generalised problem you probably can find some issues, but that doesn't mean they apply in the specific cases we're addressing. The lock reductions we are discussing in all cases have nothing at all to do with structure and only relate to various options. Except in the case of constraints, though even there I see no issues as yet. I've no problem waiting awhile to apply while others try to find loopholes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 15, 2013 at 05:50:40PM +0100, Simon Riggs wrote: > On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote: > > Generally, the question on the table is: to what extent do MVCC > > catalog scans make the world safe for concurrent DDL, or put > > negatively, what hazards remain? > > On Tom's test I've been unable to find a single problem. > > > Noah discovered an interesting one recently: apparently, the relcache > > machinery has some logic in it that depends on the use of > > AccessExclusiveLock in subtle ways. I'm going to attempt to explain > > it, and maybe he can jump in and explain it better. Essentially, the > > problem is that when a relcache reload occurs, certain data structures > > (like the tuple descriptor, but there are others) are compared against > > the old version of the same data structure. If there are no changes, > > we do nothing; else, we free the old one and install the new one. The > > reason why we don't free the old one and install the new one > > unconditionally is because other parts of the backend might have > > pointers to the old data structure, so just replacing it all the time > > would result in crashes. Since DDL requires AccessExclusiveLock + > > CheckTableNotInUse(), any actual change to the data structure will > > happen when we haven't got any pointers anyway. > If you look at this as a generalised problem you probably can find > some issues, but that doesn't mean they apply in the specific cases > we're addressing. > > The lock reductions we are discussing in all cases have nothing at all > to do with structure and only relate to various options. Except in the > case of constraints, though even there I see no issues as yet. I was able to distill the above hypothesis into an actual crash with reduce_lock_levels.v13.patch. Test recipe:

1. Build with --enable-cassert and with -DCATCACHE_FORCE_RELEASE=1. An AcceptInvalidationMessages() will then happen at nearly every syscache lookup, making it far easier to hit a problem of this sort.

2. Run these commands as setup:
create table root (c int);
create table part (check (c > 0), check (c > 0)) inherits (root);

3. Attach a debugger to your session and set a breakpoint at plancat.c:660 (as of commit 16f38f72ab2b8a3b2d45ba727d213bb31111cea4).

4. Run this in your session; the breakpoint will trip:
select * from root where c = -1;

5. Start another session and run:
alter table part add check (c > 0);

6. Exit the debugger to release the first session. It will crash.

plancat.c:657 stashes a pointer to memory belonging to the rd_att of a relcache entry. It then proceeds to call eval_const_expressions(), which performs a syscache lookup in its simplify_function() subroutine. Under CATCACHE_FORCE_RELEASE, the syscache lookup reliably prompts an AcceptInvalidationMessages(). The ensuing RelationClearRelation() against "part" notices the new constraint, decides keep_tupdesc = false, and frees the existing tupdesc. plancat.c is now left holding a pointer into freed memory. The next loop iteration will crash when it dereferences a pointer stored within that freed memory at plancat.c:671.

A remediation strategy that seemed attractive when I last contemplated this problem is to repoint rd_att immediately but arrange to free the obsolete TupleDesc in AtEOXact_RelationCache(). -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote: > I was able to distill the above hypothesis into an actual crash with > reduce_lock_levels.v13.patch. Test recipe: Thank you for the report; thank you again for reporting in sufficient time to allow me to investigate and fix by the next CF. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote: > A remediation strategy that seemed attractive when I last contemplated this > problem is to repoint rd_att immediately but arrange to free the obsolete > TupleDesc in AtEOXact_RelationCache(). I agree that the best way to resolve this is to retain a copy of the TupleDesc, so that copied pointers to it remain valid. EOXact is actually longer than strictly necessary in some cases, but trying to work out a more minimal approach seems hard and possibly inefficient. Comments in relcache.c indicate that the Relation swapping concept might be replaced by a refcounting approach. I can't see how that differs from your suggested route. Which means I can't see any other way of doing this other than the way you suggest. Will implement. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 7 July 2013 14:24, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 January 2012 18:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>>> Another point that requires some thought is that switching SnapshotNow >>>> to be MVCC-based will presumably result in a noticeable increase in each >>>> backend's rate of wanting to acquire snapshots. >> >> BTW, I wonder if this couldn't be ameliorated by establishing some >> ground rules about how up-to-date a snapshot really needs to be. >> Arguably, it should be okay for successive SnapshotNow scans to use the >> same snapshot as long as we have not acquired a new lock in between. >> If not, reusing an old snap doesn't introduce any race condition that >> wasn't there already. > > Now that has been implemented using the above design, we can resubmit > the lock level reduction patch, with thanks to Robert. > > Submitted patch passes original complaint/benchmark. > > Changes > * various forms of ALTER TABLE, notably ADD constraint and VALIDATE > > * CREATE TRIGGER > > One minor correction to earlier thinking with respect to toast > tables. That might be later relaxed. > > Full tests including proof of lock level reductions, plus docs. Rebased to v14 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote: > On Mon, Jul 15, 2013 at 05:50:40PM +0100, Simon Riggs wrote: >> On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote: >> > Generally, the question on the table is: to what extent do MVCC >> > catalog scans make the world safe for concurrent DDL, or put >> > negatively, what hazards remain? >> >> On Tom's test I've been unable to find a single problem. >> >> > Noah discovered an interesting one recently: apparently, the relcache >> > machinery has some logic in it that depends on the use of >> > AccessExclusiveLock in subtle ways. I'm going to attempt to explain >> > it, and maybe he can jump in and explain it better. Essentially, the >> > problem is that when a relcache reload occurs, certain data structures >> > (like the tuple descriptor, but there are others) are compared against >> > the old version of the same data structure. If there are no changes, >> > we do nothing; else, we free the old one and install the new one. The >> > reason why we don't free the old one and install the new one >> > unconditionally is because other parts of the backend might have >> > pointers to the old data structure, so just replacing it all the time >> > would result in crashes. Since DDL requires AccessExclusiveLock + >> > CheckTableNotInUse(), any actual change to the data structure will >> > happen when we haven't got any pointers anyway. > >> If you look at this as a generalised problem you probably can find >> some issues, but that doesn't mean they apply in the specific cases >> we're addressing. >> >> The lock reductions we are discussing in all cases have nothing at all >> to do with structure and only relate to various options. Except in the >> case of constraints, though even there I see no issues as yet. > > I was able to distill the above hypothesis into an actual crash with > reduce_lock_levels.v13.patch. Test recipe: > > 1. Build with --enable-cassert and with -DCATCACHE_FORCE_RELEASE=1. An > AcceptInvalidationMessages() will then happen at nearly every syscache lookup, > making it far easier to hit a problem of this sort. > > 2. Run these commands as setup: > create table root (c int); > create table part (check (c > 0), check (c > 0)) inherits (root); > > 3. Attach a debugger to your session and set a breakpoint at plancat.c:660 (as > of commit 16f38f72ab2b8a3b2d45ba727d213bb31111cea4). > > 4. Run this in your session; the breakpoint will trip: > select * from root where c = -1; > > 5. Start another session and run: > alter table part add check (c > 0); > > 6. Exit the debugger to release the first session. It will crash. > > plancat.c:657 stashes a pointer to memory belonging to the rd_att of a > relcache entry. It then proceeds to call eval_const_expressions(), which > performs a syscache lookup in its simplify_function() subroutine. Under > CATCACHE_FORCE_RELEASE, the syscache lookup reliably prompts an > AcceptInvalidationMessages(). The ensuing RelationClearRelation() against > "part" notices the new constraint, decides keep_tupdesc = false, and frees the > existing tupdesc. plancat.c is now left holding a pointer into freed > memory. The next loop iteration will crash when it dereferences a pointer > stored within that freed memory at plancat.c:671. > > > A remediation strategy that seemed attractive when I last contemplated this > problem is to repoint rd_att immediately but arrange to free the obsolete > TupleDesc in AtEOXact_RelationCache(). v15 to fix the above problem.
Looking at other potential problems around plancache but nothing found as yet. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > v15 to fix the above problem. Minor quibble: I get a compiler warning with the patch applied. "relcache.c: In function ‘RememberToFreeTupleDescAtEOX’: relcache.c:2317:3: warning: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]". -- Peter Geoghegan
On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> v15 to fix the above problem. > > Minor quibble: I get a compiler warning with the patch applied. > "relcache.c: In function 'RememberToFreeTupleDescAtEOX': > relcache.c:2317:3: warning: ISO C90 forbids mixed declarations and > code [-Werror=declaration-after-statement]". Thanks, that was a wart, now fixed. v16 attached -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 24 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: >> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> v15 to fix the above problem. >> > v16 attached v17 attached This version adds a GUC called ddl_exclusive_locks which allows us to keep the 9.3 behaviour if we wish it. Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so. I'll commit this on Thurs 30 Jan unless I hear objections. Thanks -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
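A sketch of how the proposed setting would presumably be used (ddl_exclusive_locks exists only in the v17 patch and was never part of a released PostgreSQL; whether it is settable with SET rather than only in postgresql.conf is an assumption here):

-- hypothetical: requires the v17 patch; restores the 9.3-style locking
SET ddl_exclusive_locks = on;
ALTER TABLE pgbench_accounts SET (fillfactor = 100);  -- blocks concurrent SELECTs again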
On 27 January 2014 17:58, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: >>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> v15 to fix the above problem. >>> >> v16 attached > > v17 attached Frostbite... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 24 January 2014 07:08, Peter Geoghegan <pg@heroku.com> wrote: >>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> v15 to fix the above problem. >>> >> v16 attached > > v17 attached > > This version adds a GUC called ddl_exclusive_locks which allows us to > keep the 9.3 behaviour if we wish it. Some people may be surprised > that their programs don't wait in the same places they used to. We > hope that is a positive and useful behaviour, but it may not always be > so. > > I'll commit this on Thurs 30 Jan unless I hear objections. I haven't reviewed the patch, but -1 for adding a GUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I haven't reviewed the patch, but -1 for adding a GUC. > I'm pretty surprised that it's been suggested that some people might > prefer AccessExclusiveLocks. Why would anyone prefer that? For one thing, so they can back this out if it proves to be broken, as the last committed version was. Given that this patch was marked (by its author) as Ready for Committer without any review in the current CF, I can't say that I have any faith in it. regards, tom lane
On 27 January 2014 20:35, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I haven't reviewed the patch, but -1 for adding a GUC. > > I'm pretty surprised that it's been suggested that some people might > prefer AccessExclusiveLocks. Why would anyone prefer that? Nobody has said "prefer". I said... > Some people may be surprised > that their programs don't wait in the same places they used to. We > hope that is a positive and useful behaviour, but it may not always be > so. We get the new behaviour by default and I expect we'll be very happy with it. A second thought is that if we have problems of some kind in the field as a result of the new lock modes then we will be able to turn them off. I'm happy to fix any problems that occur, but that doesn't mean there won't be any. If everybody is confident that we've foreseen every bug, then no problem, lets remove it. I recall being asked to add hot_standby = on | off for similar reasons. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 27 January 2014 20:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@heroku.com> writes: >> On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I haven't reviewed the patch, but -1 for adding a GUC. > >> I'm pretty surprised that it's been suggested that some people might >> prefer AccessExclusiveLocks. Why would anyone prefer that? > > For one thing, so they can back this out if it proves to be broken, > as the last committed version was. Agreed > Given that this patch was marked > (by its author) as Ready for Committer without any review in the current > CF True. The main review happened in a previous commitfest and there was a small addition for this CF. It was my understanding that you wanted us to indicate that to allow you to review. I am happy to set status differently, as you wish, presumably back to needs review. >I can't say that I have any faith in it. That's a shame. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 27, 2014 at 08:57:26PM +0000, Simon Riggs wrote: > We get the new behaviour by default and I expect we'll be very happy with it. > > A second thought is that if we have problems of some kind in the field > as a result of the new lock modes then we will be able to turn them > off. I'm happy to fix any problems that occur, but that doesn't mean > there won't be any. If everybody is confident that we've foreseen > every bug, then no problem, lets remove it. I recall being asked to > add hot_standby = on | off for similar reasons. Addressing a larger issue, I have no problem with systematically adding GUCs to turn off features we add in each major release if we can also systematically remove those GUCs in the next major release. This would require putting all these settings in the compatibility section of postgresql.conf. However, I don't think it makes sense to do this in a one-off manner. It is also possible that there are enough cases where we _can't_ turn the feature off with a GUC that this would be unworkable. So, if we can't do it systematically, that means we will have enough breakage cases that we just need to rush out new versions to fix major breakage and one-off GUCs just don't buy us much, and add confusion. Does that make sense? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 28 January 2014 14:55, Bruce Momjian <bruce@momjian.us> wrote: > On Mon, Jan 27, 2014 at 08:57:26PM +0000, Simon Riggs wrote: >> We get the new behaviour by default and I expect we'll be very happy with it. >> >> A second thought is that if we have problems of some kind in the field >> as a result of the new lock modes then we will be able to turn them >> off. I'm happy to fix any problems that occur, but that doesn't mean >> there won't be any. If everybody is confident that we've foreseen >> every bug, then no problem, lets remove it. I recall being asked to >> add hot_standby = on | off for similar reasons. > > Addressing a larger issue, I have no problem with systematically adding > GUCs to turn off features we add in each major release if we can also > systematically remove those GUCs in the next major release. Agreed. I propose 2 releases since the time between release of 9.4 and the feature freeze of 9.5 is only 4 months, not usually enough for subtle bugs to be discovered. > This would require putting all these settings in the compatibility > section of postgresql.conf. Agreed, that is where I have added the parameter. > However, I don't think it makes sense to do this in a one-off manner. > It is also possible that there are enough cases where we _can't_ turn > the feature off with a GUC that this would be unworkable. > > So, if we can't do it systematically, that means we will have enough > breakage cases that we just need to rush out new versions to fix major > breakage and one-off GUCs just don't buy us much, and add confusion. > > Does that make sense? For me, reducing the strength of DDL locking is a major change in RDBMS behaviour that could both delight and surprise our users. Maybe a few actually depend upon the locking behaviour, maybe. After some years of various people looking at this, I think we've got it right. Experience tells me that while I think this is the outcome, we are well advised to protect against the possibility that it is not correct and that if we have corner case issues, it would be good to easily disable this in the field. In the current case, a simple parameter works very well to disable the feature; in other cases, not. Summary: This is an atypical case. I do not normally propose such things - this is the third time in 10 years, IIRC. I have no problem removing the parameter if required to. In that case, I would like to leave the parameter in until mid beta, to allow greater certainty. In any case, I would wish to retain as a minimum an extern bool variable allowing it to be turned off by C function if desired. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 28, 2014 at 04:36:39PM +0000, Simon Riggs wrote: > For me, reducing the strength of DDL locking is a major change in > RDBMS behaviour that could both delight and surprise our users. Maybe > a few actually depend upon the locking behaviour, maybe. After some > years of various people looking at this, I think we've got it right. > Experience tells me that while I think this is the outcome, we are > well advised to protect against the possibility that it is not correct > and that if we have corner case issues, it would be good to easily > disable this in the field. In the current case, a simple parameter > works very well to disable the feature; in other cases, not. > > Summary: This is an atypical case. I do not normally propose such > things - this is the third time in 10 years, IIRC. Uh, in my memory, you are the person who is most likely to suggest a GUC of all our developers. > I have no problem removing the parameter if required to. In that case, > I would like to leave the parameter in until mid beta, to allow > greater certainty. In any case, I would wish to retain as a minimum an > extern bool variable allowing it to be turned off by C function if > desired. Anything changed to postgresql.conf during beta is going to require an initdb. Also, lots of backward-compatibility infrastructure, as you are suggesting above, add complexity to the system. I am basically against a GUC on this. We have far larger problems with 9.3 than backward compatibility, and limited resources. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 01/28/2014 07:15 PM, Bruce Momjian wrote: > On Tue, Jan 28, 2014 at 04:36:39PM +0000, Simon Riggs wrote: >> For me, reducing the strength of DDL locking is a major change in >> RDBMS behaviour that could both delight and surprise our users. Maybe >> a few actually depend upon the locking behaviour, maybe. After some >> years of various people looking at this, I think we've got it right. >> Experience tells me that while I think this is the outcome, we are >> well advised to protect against the possibility that it is not correct >> and that if we have corner case issues, it would be good to easily >> disable this in the field. In the current case, a simple parameter >> works very well to disable the feature; in other cases, not. I don't understand why anyone would want to turn this feature off, ie. require stronger locks than necessary for a DDL change. If we're not confident that the patch is correct, then it should not be applied. If we are confident enough to commit it, and a bug crops up later, we'll fix the bug as usual. A user savvy enough to fiddle with such a GUC can also do their DDL with:

BEGIN;
LOCK TABLE <table>;
<DDL>
COMMIT;

>> I have no problem removing the parameter if required to. In that case, >> I would like to leave the parameter in until mid beta, to allow >> greater certainty. In any case, I would wish to retain as a minimum an >> extern bool variable allowing it to be turned off by C function if >> desired. > > Anything changed to postgresql.conf during beta is going to require an > initdb. Huh? Surely not. - Heikki
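With the example table from upthread, that workaround looks like this; the explicit LOCK TABLE restores the old blocking behaviour regardless of what lock ALTER TABLE itself takes:

BEGIN;
LOCK TABLE pgbench_accounts IN ACCESS EXCLUSIVE MODE;
ALTER TABLE pgbench_accounts SET (fillfactor = 100);
COMMIT;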
On Tue, Jan 28, 2014 at 07:21:50PM +0200, Heikki Linnakangas wrote: > >>I have no problem removing the parameter if required to. In that case, > >>I would like to leave the parameter in until mid beta, to allow > >>greater certainty. In any case, I would wish to retain as a minimum an > >>extern bool variable allowing it to be turned off by C function if > >>desired. > > > >Anything changed to postgresql.conf during beta is going to require an > >initdb. > > Huh? Surely not. Uh, if we ship beta1 with a GUC in postgresql.conf, and then we remove support for the GUC in beta2, anyone starting a server initdb-ed with beta1 is going to get an error and the server is not going to start:

LOG: unrecognized configuration parameter "xxx" in file "/u/pgsql/data/postgresql.conf" line 1
FATAL: configuration file "/u/pgsql/data/postgresql.conf" contains errors

so, yeah, it isn't going to require an initdb, but it is going to require everyone to edit their postgresql.conf. My guess is a lot of people are going to assume the old postgresql.conf is not compatible and are going to initdb and reload. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 01/28/2014 07:26 PM, Bruce Momjian wrote: > On Tue, Jan 28, 2014 at 07:21:50PM +0200, Heikki Linnakangas wrote: >>>> I have no problem removing the parameter if required to. In that case, >>>> I would like to leave the parameter in until mid beta, to allow >>>> greater certainty. In any case, I would wish to retain as a minimum an >>>> extern bool variable allowing it to be turned off by C function if >>>> desired. >>> >>> Anything changed to postgresql.conf during beta is going to require an >>> initdb. >> >> Huh? Surely not. > > Uh, if we ship beta1 with a GUC in postgresql.conf, and then we remove > support for the GUC in beta2, anyone starting a server initdb-ed with > beta1 is going to get an error and the server is not going to start: > > LOG: unrecognized configuration parameter "xxx" in file "/u/pgsql/data/postgresql.conf" line 1 > FATAL: configuration file "/u/pgsql/data/postgresql.conf" contains errors > > so, yeah, it isn't going to require an initdb, but it is going to > require everyone to edit their postgresql.conf. Only if you uncommented the value in the first place. - Heikki
Bruce Momjian escribió: > > I have no problem removing the parameter if required to. In that case, > > I would like to leave the parameter in until mid beta, to allow > > greater certainty. Uhm. If we remove a GUC during beta we don't need to force an initdb. At worst we will need to keep a no-op GUC variable that is removed in the next devel cycle. That said, if we're going to have a GUC that's going to disappear later, I think it's better to wait for 2 releases as proposed, not remove mid-beta. > > In any case, I would wish to retain as a minimum an extern bool > > variable allowing it to be turned off by C function if desired. I think this amounts to having an undocumented GUC that is hard to change. I prefer the GUC, myself. > Anything changed to postgresql.conf during beta is going to require an > initdb. > Also, lots of backward-compatibility infrastructure, as you are > suggesting above, add complexity to the system. > > I am basically against a GUC on this. We have far larger problems with > 9.3 than backward compatibility, and limited resources. If we have a clear plan on removing the parameter, it's easy enough to follow through. I don't think lack of resources is a good argument, because at that point there will be little to discuss and it's an easy change to make. And I think the plan is clear: if no bug is found the parameter is removed. If a bug is found, it is fixed and the parameter is removed anyway. Honestly, I would prefer that we push a patch that has been thoroughly reviewed and in which we have more confidence, so that we can push without a GUC. This means mark in CF as needs-review, then some other developer reviews it and marks it as ready-for-committer. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 28, 2014 at 07:30:47PM +0200, Heikki Linnakangas wrote: > On 01/28/2014 07:26 PM, Bruce Momjian wrote: > >On Tue, Jan 28, 2014 at 07:21:50PM +0200, Heikki Linnakangas wrote: > >>>>I have no problem removing the parameter if required to. In that case, > >>>>I would like to leave the parameter in until mid beta, to allow > >>>>greater certainty. In any case, I would wish to retain as a minimum an > >>>>extern bool variable allowing it to be turned off by C function if > >>>>desired. > >>> > >>>Anything changed to postgresql.conf during beta is going to require an > >>>initdb. > >> > >>Huh? Surely not. > > > >Uh, if we ship beta1 with a GUC in postgresql.conf, and then we remove > >support for the GUC in beta2, anyone starting a server initdb-ed with > >beta1 is going to get an error and the server is not going to start: > > > > LOG: unrecognized configuration parameter "xxx" in file "/u/pgsql/data/postgresql.conf" line 1 > > FATAL: configuration file "/u/pgsql/data/postgresql.conf" contains errors > > > >so, yeah, it isn't going to require an initdb, but it is going to > >require everyone to edit their postgresql.conf. > > Only if you uncommented the value in the first place. Oh, I had forgotten that. Right. It would still be odd to have a commented-out line in postgresql.conf that was not supported. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Honestly, I would prefer that we push a patch that has been thoroughly > reviewed and in which we have more confidence, so that we can push > without a GUC. This means mark in CF as needs-review, then some other > developer reviews it and marks it as ready-for-committer. FWIW, that was the point I was trying to make as well ;-) regards, tom lane
On 28 January 2014 17:15, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jan 28, 2014 at 04:36:39PM +0000, Simon Riggs wrote: >> For me, reducing the strength of DDL locking is a major change in >> RDBMS behaviour that could both delight and surprise our users. Maybe >> a few actually depend upon the locking behaviour, maybe. After some >> years of various people looking at this, I think we've got it right. >> Experience tells me that while I think this is the outcome, we are >> well advised to protect against the possibility that it is not correct >> and that if we have corner case issues, it would be good to easily >> disable this in the field. In the current case, a simple parameter >> works very well to disable the feature; in other cases, not. >> >> Summary: This is an atypical case. I do not normally propose such >> things - this is the third time in 10 years, IIRC. > > Uh, in my memory, you are the person who is most likely to suggest a > GUC of all our developers. (smiles) I have suggested parameters for many features, mostly in the early days of my developments before I saw the light of autotuning. But those were control parameters for tuning features. So yes, I have probably introduced more parameters than most, amongst the many things I've done. I'm guessing you don't recall how much trouble I went to in order to allow sync rep to have only 1 parameter, c'est la vie. What I'm discussing here is a compatibility parameter to allow new features to be disabled. This would be the third time in 10 years I suggested a parameter for that reason, i.e. one that the user would hardly ever wish to set. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-01-27 15:25:22 -0500, Robert Haas wrote: > On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > This version adds a GUC called ddl_exclusive_locks which allows us to > > keep the 9.3 behaviour if we wish it. Some people may be surprised > > that their programs don't wait in the same places they used to. We > > hope that is a positive and useful behaviour, but it may not always be > > so. > > > > I'll commit this on Thurs 30 Jan unless I hear objections. > > I haven't reviewed the patch, but -1 for adding a GUC. Ditto. I don't think the patch is in bad shape otherwise. I'd very quickly looked at a previous version and it did look rather sane, and several other people had looked at earlier versions as well. IIRC Noah had a fairly extensive look at some intricate race conditions... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 January 2014 17:21, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > I don't understand why anyone would want to turn this feature off, ie. > require stronger locks than necessary for a DDL change. Nobody would *want* to turn it off. They might need to, as explained. > If we're not confident that the patch is correct, then it should not be > applied. If we are confident enough to commit it, and a bug crops up later, > we'll fix the bug as usual. I'd like to point out here that my own customers are already well covered by the support services we offer. They will receive any such fix very quickly. My proposal was of assistance only to those without such contracts in place, as are many of my proposals. It doesn't bother me at all if you insist it should not be added. Just choose v16 of the patch for review rather than v17. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Bruce Momjian escribió: >> > I have no problem removing the parameter if required to. In that case, >> > I would like to leave the parameter in until mid beta, to allow >> > greater certainty. > > Uhm. If we remove a GUC during beta we don't need to force an initdb. > At worst we will need to keep a no-op GUC variable that is removed in > the next devel cycle. That said, if we're going to have a GUC that's > going to disappear later, I think it's better to wait for 2 releases as > proposed, not remove mid-beta. > >> > In any case, I would wish to retain as a minimum an extern bool >> > variable allowing it to be turned off by C function if desired. > > I think this amounts to having an undocumented GUC that is hard to > change. I prefer the GUC, myself. > >> Anything changed to postgresql.conf during beta is going to require an >> initdb. >> Also, lots of backward-compatibility infrastructure, as you are >> suggesting above, add complexity to the system. >> >> I am basically against a GUC on this. We have far larger problems with >> 9.3 than backward compatibility, and limited resources. > > If we have a clear plan on removing the parameter, it's easy enough to > follow through. I don't think lack of resources is a good argument, > because at that point there will be little to discuss and it's an easy > change to make. And I think the plan is clear: if no bug is found the > parameter is removed. If a bug is found, it is fixed and the parameter > is removed anyway. > > Honestly, I would prefer that we push a patch that has been thoroughly > reviewed and in which we have more confidence, so that we can push > without a GUC. This means mark in CF as needs-review, then some other > developer reviews it and marks it as ready-for-committer. I also believe that would be the correct procedure. Personally, I think it would be great if Noah can review this, as he's very good at finding the kind of problems that are likely to crop up here, and has examined previous versions. I also have some interest in looking at it myself. But I doubt that either of us (or any other senior hacker) can do that by Thursday. I think all such people are hip-deep in other patches at the moment; I certainly am. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Honestly, I would prefer that we push a patch that has been thoroughly >> reviewed and in which we have more confidence, so that we can push >> without a GUC. This means mark in CF as needs-review, then some other >> developer reviews it and marks it as ready-for-committer. > I also believe that would be the correct procedure. Personally, I > think it would be great if Noah can review this, as he's very good at > finding the kind of problems that are likely to crop up here, and has > examined previous versions. I also have some interest in looking at > it myself. But I doubt that either of us (or any other senior hacker) > can do that by Thursday. I think all such people are hip-deep in > other patches at the moment; I certainly am. Yeah. I actually have little doubt that the patch is sane on its own terms of discussion, that is that Simon has chosen locking levels that are mutually consistent in an abstract sense. What sank the previous iteration was that he'd failed to consider an implementation detail, namely possible inconsistencies in SnapshotNow-based catalog fetches. I'm afraid that there may be more such problems lurking under the surface. Noah's pretty good at finding such things, but really two or three of us need to sit down and think about it for awhile before I'd have much confidence in such a fundamental change. And I sure don't have time for this patch right now myself. regards, tom lane
On 29 January 2014 05:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 28, 2014 at 12:36 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> Honestly, I would prefer that we push a patch that has been thoroughly >>> reviewed and in which we have more confidence, so that we can push >>> without a GUC. This means mark in CF as needs-review, then some other >>> developer reviews it and marks it as ready-for-committer. > >> I also believe that would be the correct procedure. Personally, I >> think it would be great if Noah can review this, as he's very good at >> finding the kind of problems that are likely to crop up here, and has >> examined previous versions. I also have some interest in looking at >> it myself. But I doubt that either of us (or any other senior hacker) >> can do that by Thursday. I think all such people are hip-deep in >> other patches at the moment; I certainly am. > > Yeah. I actually have little doubt that the patch is sane on its own > terms of discussion, that is that Simon has chosen locking levels that > are mutually consistent in an abstract sense. What sank the previous > iteration was that he'd failed to consider an implementation detail, > namely possible inconsistencies in SnapshotNow-based catalog fetches. > I'm afraid that there may be more such problems lurking under the > surface. Agreed > Noah's pretty good at finding such things, but really two > or three of us need to sit down and think about it for awhile before > I'd have much confidence in such a fundamental change. And I sure don't > have time for this patch right now myself. I've reviewed Noah's earlier comments, fixed things and also further reviewed for any similar relcache related issues. I've also reviewed Hot Standby to see if any locking issues arise, since the ALTER TABLE now won't generate an AccessExclusiveLock as it used to do for certain operations. I can't see any problems there. While doing those reviews, I'd remind everybody that this only affects roughly half of ALTER TABLE subcommands and definitely nothing that affects SELECTs. So the risk profile is much less than it sounds at first glance. If anybody else has any hints or clues about where to look, please mention them and I will investigate. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, I took a quick peek at this, and noticed the following things:

* I am pretty sure this patch doesn't compile anymore after the latest set of releases.
* This definitely should include isolationtester tests actually performing concurrent ALTER TABLEs. All that's currently there is tests that the locklevel isn't too high, but not that it actually works.
* So far no DDL uses ShareRowExclusiveLocks, why do so now? Not sure if there aren't relevant cases with foreign key checks (which afair *do* acquire SRE locks).
* Why is AddConstraint "so complicated" when it all uses SRE locks?
* Are you sure AlterConstraint is generally safe without an AEL? It should be safe to change whether something is deferred, but not necessarily whether it's deferrable?
* Why does ChangeOwner need AEL?
* You changed several places to take out lower locks, which in itself is fine, but doesn't that lead to a lock upgrade hazard if a later step acquires a stronger lock? Or do we take out a strong enough lock from the beginning.
* There's no explanation why the EOXact TupleDesc stuff is needed?

That's it for now, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24 February 2014 16:01, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > I took a quick peek at this, and noticed the following things: > * I am pretty sure this patch doesn't compile anymore after the latest > set of releases. Refreshed to v18, will post shortly. > * This definitely should include isolationtester tests actually > performing concurrent ALTER TABLEs. All that's currently there is > tests that the locklevel isn't too high, but not that it actually works. There is no concurrent behaviour here, hence no code that would be exercised by concurrent tests. Lock levels are proven in regression tests, so no further tests needed. > * So far no DDL uses ShareRowExclusiveLocks, why do so now? Not sure if > there aren't relevant cases with foreign key checks (which afair > *do* acquire SRE locks). That was discussed long ago. We can relax it more in the future, if that is considered safe. > * Why is AddConstraint "so complicated" when it all uses SRE locks? To ensure each case was considered, rather than just assume all cases are the same. > * Are you sure AlterConstraint is generally safe without an AEL? It > should be safe to change whether something is deferred, but not > necessarily whether it's deferrable? We change the lock levels for individual commands. This patch provides some initial settings and infrastructure. It is possible you are correct that changing the deferrability is not safe without an AEL. That was enabled for the first time in this release in a patch added by me after this patch was written. I will think on that and change, if required. > * Why does ChangeOwner need AEL? Ownership affects privileges, which includes SELECTs, hence AEL. This is not a critical aspect of the patch. > * You changed several places to take out lower locks, which in itself is > fine, but doesn't that lead to a lock upgrade hazard if a later step > acquires a stronger lock? Or do we take out a strong enough lock from > the beginning. We assess the lock needed at parse time, then use it consistently later. There is no lock upgrade hazard since that has been specifically considered and removed. > * There's no explanation why the EOXact TupleDesc stuff is needed? I will update comments. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
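For what it's worth, a lock level can be spot-checked from SQL along these lines (a sketch, not the patch's actual regression test; the expected mode assumes the patch gives SET (fillfactor) ShareUpdateExclusiveLock, per the thread's intent):

BEGIN;
ALTER TABLE pgbench_accounts SET (fillfactor = 100);
SELECT mode FROM pg_locks
 WHERE relation = 'pgbench_accounts'::regclass AND granted;
-- expect ShareUpdateExclusiveLock rather than AccessExclusiveLock
COMMIT;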
Hi, On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: > > * This definitely should include isolationtester tests actually > > performing concurrent ALTER TABLEs. All that's currently there is > > tests that the locklevel isn't too high, but not that it actually works. > > There is no concurrent behaviour here, hence no code that would be > exercised by concurrent tests. Huh? There's most definitely new concurrent behaviour. Previously no other backends could have a relation open (and locked) while it got altered (which then sends out relcache invalidations). That's something that should be tested. > > * Why does ChangeOwner need AEL? > > Ownership affects privileges, which includes SELECTs, hence AEL. So? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >> > * This definitely should include isolationtester tests actually >> > performing concurrent ALTER TABLEs. All that's currently there is >> > tests that the locklevel isn't too high, but not that it actually works. >> >> There is no concurrent behaviour here, hence no code that would be >> exercised by concurrent tests. > > Huh? There's most definitely new concurrent behaviour. Previously no > other backends could have a relation open (and locked) while it got > altered (which then sends out relcache invalidations). That's something > that should be tested. It has been. High volume concurrent testing has been performed, per Tom's original discussion upthread, but that's not part of the test suite. For other tests I have no guide as to how to write a set of automated regression tests. Anything could cause a failure, so I'd need to write an infinite set of tests to prove there is no bug *somewhere*. How many tests are required? 0, 1, 3, 30? >> > * Why does ChangeOwner need AEL? >> >> Ownership affects privileges, which includes SELECTs, hence AEL. > > So? That reply could be added to any post. Please explain your concern. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: > On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: > > Hi, > > > > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: > >> > * This definitely should include isolationtester tests actually > >> > performing concurrent ALTER TABLEs. All that's currently there is > >> > tests that the locklevel isn't too high, but not that it actually works. > >> > >> There is no concurrent behaviour here, hence no code that would be > >> exercised by concurrent tests. > > > > Huh? There's most definitely new concurrent behaviour. Previously no > > other backends could have a relation open (and locked) while it got > > altered (which then sends out relcache invalidations). That's something > > that should be tested. > > It has been. High volume concurrent testing has been performed, per > Tom's original discussion upthread, but that's not part of the test > suite. Yea, that's not what I am looking for. > For other tests I have no guide as to how to write a set of automated > regression tests. Anything could cause a failure, so I'd need to write > an infinite set of tests to prove there is no bug *somewhere*. How > many tests are required? 0, 1, 3, 30? I think some isolationtester tests for the most important changes in lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, ... while a query is in progress in another session. > >> > * Why does ChangeOwner need AEL? > >> > >> Ownership affects privileges, which includes SELECTs, hence AEL. > > > > So? > > That reply could be added to any post. Please explain your concern. I don't understand why that means it needs an AEL. After all, e.g. changes in table inheritance do *not* require an AEL. I think it's perfectly ok to not go for the minimally required locklevel for all subcommands, but then it should be commented as such and not with "change visible to SELECT" where other operations that do so as well require lower locklevels. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
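Sketched as plain SQL rather than an isolationtester spec, one interleaving of the kind being asked for, reusing Noah's root/part tables from the crash recipe above:

-- session 1
BEGIN;
SELECT * FROM root WHERE c = -1;  -- takes AccessShareLock on root and part

-- session 2, while session 1's transaction remains open; with the patch,
-- this no longer blocks, since ADD CONSTRAINT takes ShareRowExclusiveLock
ALTER TABLE part ADD CHECK (c > 0);

-- session 1
SELECT * FROM root WHERE c = -1;  -- must still behave sanely after the invalidation
COMMIT;

An isolationtester spec would encode the same steps as named permutations.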
On 26 February 2014 07:32, Simon Riggs <simon@2ndquadrant.com> wrote: >> * Are you sure AlterConstraint is generally safe without an AEL? It >> should be safe to change whether something is deferred, but not >> necessarily whether it's deferrable? > > We change the lock levels for individual commands. This patch provides > some initial settings and infrastructure. > > It is possible you are correct that changing the deferrability is not > safe without an AEL. That was enabled for the first time in this > release in a patch added by me after this patch was written. I will > think on that and change, if required. Thoughts... It would be a problem to change a DEFERRABLE constraint into a NOT DEFERRABLE constraint concurrently with a transaction that was currently deferring its constraint checks. It would not be a problem to go in the other direction, relaxing the constraint attributes. The patch uses ShareRowExclusive for AlterConstraint, so no writes are possible concurrently with the table being ALTERed, hence the problem situation cannot arise. So in my understanding, no issue is possible. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
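In SQL terms, the two directions being distinguished look like this (table and constraint names are hypothetical; ALTER CONSTRAINT applies to foreign key constraints in this release):

-- relaxing the attributes: safe per the reasoning above
ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk
    DEFERRABLE INITIALLY DEFERRED;

-- tightening: the direction that would be unsafe if a concurrent
-- transaction could still be deferring its checks on this constraint
ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk NOT DEFERRABLE;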
On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >> >> > * Why does ChangeOwner need AEL? >> >> >> >> Ownership affects privileges, which includes SELECTs, hence AEL. >> > >> > So? >> >> That reply could be added to any post. Please explain your concern. > > I don't understand why that means it needs an AEL. After all, > e.g. changes in table inheritance do *not* require an AEL. I think it's > perfectly ok to not go for the minimally required locklevel for all > subcommands, but then it should be commented as such and not with > "change visible to SELECT" where other operations that do so as well > require lower locklevels. Those are two separate cases, with separate lock levels, so that argument doesn't hold. My understanding of the argument as to why Inheritance doesn't need AEL is that adding/removing children is akin to inserting or deleting rows from the parent. Removing SELECT privilege while running a SELECT would be a different matter. This is all a matter of definition; we can make up any rules we like. Doing so is IMHO a separate patch and not something to hold up the main patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >> > Hi, >> > >> > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >> >> > * This definitely should include isolationtester tests actually >> >> > performing concurrent ALTER TABLEs. All that's currently there is >> >> > tests that the locklevel isn't too high, but not that it actually works. >> >> >> >> There is no concurrent behaviour here, hence no code that would be >> >> exercised by concurrent tests. >> > >> > Huh? There's most definitely new concurrent behaviour. Previously no >> > other backends could have a relation open (and locked) while it got >> > altered (which then sends out relcache invalidations). That's something >> > that should be tested. >> >> It has been. High volume concurrent testing has been performed, per >> Tom's original discussion upthread, but that's not part of the test >> suite. > > Yea, that's not what I am looking for. > >> For other tests I have no guide as to how to write a set of automated >> regression tests. Anything could cause a failure, so I'd need to write >> an infinite set of tests to prove there is no bug *somewhere*. How >> many tests are required? 0, 1, 3, 30? > > I think some isolationtester tests for the most important changes in > lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, > ... while a query is in progress in a nother session. OK, I'll work on some tests. v18 attached, with v19 coming soon -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 27 February 2014 08:48, Simon Riggs <simon@2ndquadrant.com> wrote: > On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >>> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >>> > Hi, >>> > >>> > On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >>> >> > * This definitely should include isolationtester tests actually >>> >> > performing concurrent ALTER TABLEs. All that's currently there is >>> >> > tests that the locklevel isn't too high, but not that it actually works. >>> >> >>> >> There is no concurrent behaviour here, hence no code that would be >>> >> exercised by concurrent tests. >>> > >>> > Huh? There's most definitely new concurrent behaviour. Previously no >>> > other backends could have a relation open (and locked) while it got >>> > altered (which then sends out relcache invalidations). That's something >>> > that should be tested. >>> >>> It has been. High volume concurrent testing has been performed, per >>> Tom's original discussion upthread, but that's not part of the test >>> suite. >> >> Yea, that's not what I am looking for. >> >>> For other tests I have no guide as to how to write a set of automated >>> regression tests. Anything could cause a failure, so I'd need to write >>> an infinite set of tests to prove there is no bug *somewhere*. How >>> many tests are required? 0, 1, 3, 30? >> >> I think some isolationtester tests for the most important changes in >> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, >> ... while a query is in progress in a nother session. > > OK, I'll work on some tests. > > v18 attached, with v19 coming soon v19 complete apart from requested comment additions -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/01/2014 12:06 PM, Simon Riggs wrote: > On 27 February 2014 08:48, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >>> On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >>>> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >>>>> Hi, >>>>> >>>>> On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >>>>>>> * This definitely should include isolationtester tests actually >>>>>>> performing concurrent ALTER TABLEs. All that's currently there is >>>>>>> tests that the locklevel isn't too high, but not that it actually works. >>>>>> There is no concurrent behaviour here, hence no code that would be >>>>>> exercised by concurrent tests. >>>>> Huh? There's most definitely new concurrent behaviour. Previously no >>>>> other backends could have a relation open (and locked) while it got >>>>> altered (which then sends out relcache invalidations). That's something >>>>> that should be tested. >>>> It has been. High volume concurrent testing has been performed, per >>>> Tom's original discussion upthread, but that's not part of the test >>>> suite. >>> Yea, that's not what I am looking for. >>> >>>> For other tests I have no guide as to how to write a set of automated >>>> regression tests. Anything could cause a failure, so I'd need to write >>>> an infinite set of tests to prove there is no bug *somewhere*. How >>>> many tests are required? 0, 1, 3, 30? >>> I think some isolationtester tests for the most important changes in >>> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, >>> ... while a query is in progress in a nother session. >> OK, I'll work on some tests. >> >> v18 attached, with v19 coming soon > v19 complete apart from requested comment additions I've started to look at this patch and re-read the thread. The first thing I noticed is what seems like an automated replace error. The docs say "This form requires only an SHARE x EXCLUSIVE lock." where the "an" was not changed to "a". Attached is a patch-on-patch to fix this. A more complete review will come later. -- Vik
On 1 March 2014 21:25, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 03/01/2014 12:06 PM, Simon Riggs wrote: >> On 27 February 2014 08:48, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On 26 February 2014 15:25, Andres Freund <andres@2ndquadrant.com> wrote: >>>> On 2014-02-26 15:15:00 +0000, Simon Riggs wrote: >>>>> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote: >>>>>> Hi, >>>>>> >>>>>> On 2014-02-26 07:32:45 +0000, Simon Riggs wrote: >>>>>>>> * This definitely should include isolationtester tests actually >>>>>>>> performing concurrent ALTER TABLEs. All that's currently there is >>>>>>>> tests that the locklevel isn't too high, but not that it actually works. >>>>>>> There is no concurrent behaviour here, hence no code that would be >>>>>>> exercised by concurrent tests. >>>>>> Huh? There's most definitely new concurrent behaviour. Previously no >>>>>> other backends could have a relation open (and locked) while it got >>>>>> altered (which then sends out relcache invalidations). That's something >>>>>> that should be tested. >>>>> It has been. High volume concurrent testing has been performed, per >>>>> Tom's original discussion upthread, but that's not part of the test >>>>> suite. >>>> Yea, that's not what I am looking for. >>>> >>>>> For other tests I have no guide as to how to write a set of automated >>>>> regression tests. Anything could cause a failure, so I'd need to write >>>>> an infinite set of tests to prove there is no bug *somewhere*. How >>>>> many tests are required? 0, 1, 3, 30? >>>> I think some isolationtester tests for the most important changes in >>>> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT, >>>> ... while a query is in progress in a nother session. >>> OK, I'll work on some tests. >>> >>> v18 attached, with v19 coming soon >> v19 complete apart from requested comment additions > > I've started to look at this patch and re-read the thread. The first > thing I noticed is what seems like an automated replace error. The docs > say "This form requires only an SHARE x EXCLUSIVE lock." where the "an" > was not changed to "a". > > Attached is a patch-on-patch to fix this. A more complete review will > come later. v20 includes slightly re-ordered checks in GetLockLevel, plus more detailed comments on each group of subcommands. Also corrects grammar as noted by Vik. Plus adds an example of usage to the docs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Removing SELECT privilege while running a SELECT would be a different > matter. This is all a matter of definition; we can make up any rules > we like. Doing so is IMHO a separate patch and not something to hold > up the main patch. So I think this is an interesting point. There are various things that could go wrong as a result of using the wrong lock level. Worst would be that the server crashes or corrupts data. A little less bad would be that sessions error out with inexplicable error conditions, as in SnapshotNow days. Alternatively, we could just have arguably wrong behavior, like basing query results on the old version of the table's metadata even after it's been changed. I don't really care about that second category of behavior. If somebody changes some property of a table and existing sessions continue to use the old value until eoxact, well, we can argue about that, but at least until we have concrete reports of really undesirable behavior, I don't think it's the primary issue. What I'm really concerned about is whether there are other things like the SnapshotNow issues that can cause stuff to halt and catch fire. I don't know whether there are or are not, but that's my concern. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 March 2014 15:19, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > What I'm > really concerned about is whether there are other things like the > SnapshotNow issues that can cause stuff to halt and catch fire. I > don't know whether there are or are not, but that's my concern. Of course its a concern, I feel it also. But that's why we have beta period to handle the unknowns. The question is are there any specific areas of concern here? If not, then we commit because we've done a lot of work on it and at the moment the balance is high benefit to users against a non-specific feeling of risk. @Noah - Last call... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 3 March 2014 15:19, Robert Haas <robertmhaas@gmail.com> wrote: >> What I'm >> really concerned about is whether there are other things like the >> SnapshotNow issues that can cause stuff to halt and catch fire. I >> don't know whether there are or are not, but that's my concern. > Of course its a concern, I feel it also. But that's why we have beta > period to handle the unknowns. I have exactly zero faith that beta testing would catch low-probability problems in this area. What's needed, and hasn't happened AFAIK, is detailed study of the patch by assorted senior hackers. > The question is are there any specific areas of concern here? If not, > then we commit because we've done a lot of work on it and at the > moment the balance is high benefit to users against a non-specific > feeling of risk. This is backwards. The default decision around here has never been to commit when in doubt. regards, tom lane
On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > v20 includes slightly re-ordered checks in GetLockLevel, plus more > detailed comments on each group of subcommands. > > Also corrects grammar as noted by Vik. > > Plus adds an example of usage to the docs. This patch contains a one line change to src/bin/pg_dump/pg_backup_archiver.c which seems not to belong. This hunk in ATRewriteCatalogs() looks scary: + /* + * If we think we might need to add/re-add toast tables then + * we currently need to hold an AccessExclusiveLock. + */ + if (lockmode < AccessExclusiveLock) + return; It would make sense to me to add an Assert() or elog() check inside the subsequent loop to verify that the lock level is adequate ... but just returning silently seems like a bad idea. I have my doubts about whether it's safe to do AT_AddInherit, AT_DropInherit, AT_AddOf, or AT_DropOf without a full lock. All of those can change the tuple descriptor, and we discussed, back when we did this the first time, the fact that the executor may get *very* unhappy if the tuple descriptor changes in mid-execution. I strongly suspect these are unsafe with less than a full AccessExclusiveLock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 March 2014 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 3 March 2014 15:19, Robert Haas <robertmhaas@gmail.com> wrote: >>> What I'm >>> really concerned about is whether there are other things like the >>> SnapshotNow issues that can cause stuff to halt and catch fire. I >>> don't know whether there are or are not, but that's my concern. > >> Of course its a concern, I feel it also. But that's why we have beta >> period to handle the unknowns. > > I have exactly zero faith that beta testing would catch low-probability > problems in this area. What's needed, and hasn't happened AFAIK, is > detailed study of the patch by assorted senior hackers. > >> The question is are there any specific areas of concern here? If not, >> then we commit because we've done a lot of work on it and at the >> moment the balance is high benefit to users against a non-specific >> feeling of risk. > > This is backwards. The default decision around here has never been > to commit when in doubt. I'm not in doubt. If anybody can give me some more pointers of things to look at, I will. But I've looked and I can't see anything more. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 3 March 2014 16:06, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> v20 includes slightly re-ordered checks in GetLockLevel, plus more >> detailed comments on each group of subcommands. >> >> Also corrects grammar as noted by Vik. >> >> Plus adds an example of usage to the docs. > > This patch contains a one line change to > src/bin/pg_dump/pg_backup_archiver.c which seems not to belong. > > This hunk in ATRewriteCatalogs() looks scary: > > + /* > + * If we think we might need to add/re-add toast tables then > + * we currently need to hold an AccessExclusiveLock. > + */ > + if (lockmode < AccessExclusiveLock) > + return; > > It would make sense to me to add an Assert() or elog() check inside > the subsequent loop to verify that the lock level is adequate ... but > just returning silently seems like a bad idea. OK, I will check elog. > I have my doubts about whether it's safe to do AT_AddInherit, > AT_DropInherit, AT_AddOf, or AT_DropOf with a full lock. All of those > can change the tuple descriptor, and we discussed, back when we did > this the first time, the fact that the executor may get *very* unhappy > if the tuple descriptor changes in mid-execution. I strongly suspect > these are unsafe with less than a full AccessExclusiveLock. I'm happy to change those if you feel there is insufficient evidence. I don't personally feel that it would matter to usability to keep locks for those at AccessExclusiveLock, especially since they are otherwise fast. Some others might be kept higher also. I'm merely trying to balance between requests to reduce to minimal theoretical level and fears that anything less than AccessExclusiveLock is a problem. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 3, 2014 at 11:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 March 2014 16:06, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> v20 includes slightly re-ordered checks in GetLockLevel, plus more >>> detailed comments on each group of subcommands. >>> >>> Also corrects grammar as noted by Vik. >>> >>> Plus adds an example of usage to the docs. >> >> This patch contains a one line change to >> src/bin/pg_dump/pg_backup_archiver.c which seems not to belong. >> >> This hunk in ATRewriteCatalogs() looks scary: >> >> + /* >> + * If we think we might need to add/re-add toast tables then >> + * we currently need to hold an AccessExclusiveLock. >> + */ >> + if (lockmode < AccessExclusiveLock) >> + return; >> >> It would make sense to me to add an Assert() or elog() check inside >> the subsequent loop to verify that the lock level is adequate ... but >> just returning silently seems like a bad idea. > > OK, I will check elog. > >> I have my doubts about whether it's safe to do AT_AddInherit, >> AT_DropInherit, AT_AddOf, or AT_DropOf with a full lock. All of those >> can change the tuple descriptor, and we discussed, back when we did >> this the first time, the fact that the executor may get *very* unhappy >> if the tuple descriptor changes in mid-execution. I strongly suspect >> these are unsafe with less than a full AccessExclusiveLock. > > I'm happy to change those if you feel there is insufficient evidence. Not sure what that means, but yes, I think the lock levels on those need to be increased. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote: > On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Removing SELECT privilege while running a SELECT would be a different > > matter. This is all a matter of definition; we can make up any rules > > we like. Doing so is IMHO a separate patch and not something to hold > > up the main patch. > > So I think this is an interesting point. There are various things > that could go wrong as a result of using the wrong lock level. Worst > would be that the server crashes or corrupts data. A little less bad > would be that sessions error out with inexplicable error conditions, > as in SnapshotNow days. Alternatively, we could just have arguably > wrong behavior, like basing query results on the old version of the > table's metadata even after it's been changed. I would order the concerns like this: 1. Data corruption 2. Transient, clearly-wrong answers without an error 3. Server crash 4. Catalog logical inconsistency 5. Inexplicable, transient errors 6. Valid behavior capable of surprising more than zero upgraders > I don't really care about that second category of behavior. If > somebody changes some property of a table and existing sessions > continue to use the old value until eoxact, well, we can argue about > that, but at least until we have concrete reports of really > undesirable behavior, I don't think it's the primary issue. What I'm > really concerned about is whether there are other things like the > SnapshotNow issues that can cause stuff to halt and catch fire. I > don't know whether there are or are not, but that's my concern. Since we can't know whether something qualifies as (2) or (6) without analyzing it, I don't find waiting for user complaints to be a good strategy here. An ownership change not immediately affecting ACL checks does fall under (6), for me. (However, changing ownership without AccessExclusiveLock might also create hazards in category (4) for concurrent DDL that performs owner checks.) -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 03, 2014 at 03:43:46PM +0000, Simon Riggs wrote: > The question is are there any specific areas of concern here? If not, > then we commit because we've done a lot of work on it and at the > moment the balance is high benefit to users against a non-specific > feeling of risk. > > @Noah - Last call... I am not specifically aware of any outstanding problems. I have planned to give this a close look, but it will be at least two weeks before I dig out far enough to do so. If that makes it a post-commit review, so be it. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 3 March 2014 18:57, Noah Misch <noah@leadboat.com> wrote: > On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote: >> On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Removing SELECT privilege while running a SELECT would be a different >> > matter. This is all a matter of definition; we can make up any rules >> > we like. Doing so is IMHO a separate patch and not something to hold >> > up the main patch. >> >> So I think this is an interesting point. There are various things >> that could go wrong as a result of using the wrong lock level. Worst >> would be that the server crashes or corrupts data. A little less bad >> would be that sessions error out with inexplicable error conditions, >> as in SnapshotNow days. Alternatively, we could just have arguably >> wrong behavior, like basing query results on the old version of the >> table's metadata even after it's been changed. > > I would order the concerns like this: > > 1. Data corruption > 2. Transient, clearly-wrong answers without an error > 3. Server crash > 4. Catalog logical inconsistency > 5. Inexplicable, transient errors > 6. Valid behavior capable of surprising more than zero upgraders I like your model for risk assessment. How can we apply it in detail in a way that helps us decide? Or do we just go on gut feel? My experience with mentioning such topics is that without structure it results in an assessment of "unacceptable risk" just simply because somebody has mentioned some scary words. >> I don't really care about that second category of behavior. If >> somebody changes some property of a table and existing sessions >> continue to use the old value until eoxact, well, we can argue about >> that, but at least until we have concrete reports of really >> undesirable behavior, I don't think it's the primary issue. What I'm >> really concerned about is whether there are other things like the >> SnapshotNow issues that can cause stuff to halt and catch fire. I >> don't know whether there are or are not, but that's my concern. > > Since we can't know whether something qualifies as (2) or (6) without > analyzing it, I don't find waiting for user complaints to be a good strategy > here. An ownership change not immediately affecting ACL checks does fall > under (6), for me. (However, changing ownership without AccessExclusiveLock > might also create hazards in category (4) for concurrent DDL that performs > owner checks.) err, guys, you do realise that changing ownership is staying at AccessExclusiveLock in this patch? (and I haven't ever suggested lowering that). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 3 March 2014 16:36, Robert Haas <robertmhaas@gmail.com> wrote: >>> This hunk in ATRewriteCatalogs() looks scary: >>> >>> + /* >>> + * If we think we might need to add/re-add toast tables then >>> + * we currently need to hold an AccessExclusiveLock. >>> + */ >>> + if (lockmode < AccessExclusiveLock) >>> + return; >>> >>> It would make sense to me to add an Assert() or elog() check inside >>> the subsequent loop to verify that the lock level is adequate ... but >>> just returning silently seems like a bad idea. >> >> OK, I will check elog. >> >>> I have my doubts about whether it's safe to do AT_AddInherit, >>> AT_DropInherit, AT_AddOf, or AT_DropOf with a full lock. All of those >>> can change the tuple descriptor, and we discussed, back when we did >>> this the first time, the fact that the executor may get *very* unhappy >>> if the tuple descriptor changes in mid-execution. I strongly suspect >>> these are unsafe with less than a full AccessExclusiveLock. >> >> I'm happy to change those if you feel there is insufficient evidence. > > Not sure what that means, but yes, I think the lock levels on those > need to be increased. v21 with all requested changes, comments and cleanup -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Mon, Mar 03, 2014 at 07:19:45PM +0000, Simon Riggs wrote: > On 3 March 2014 18:57, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote: > >> On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> > Removing SELECT privilege while running a SELECT would be a different > >> > matter. This is all a matter of definition; we can make up any rules > >> > we like. Doing so is IMHO a separate patch and not something to hold > >> > up the main patch. > >> > >> So I think this is an interesting point. There are various things > >> that could go wrong as a result of using the wrong lock level. Worst > >> would be that the server crashes or corrupts data. A little less bad > >> would be that sessions error out with inexplicable error conditions, > >> as in SnapshotNow days. Alternatively, we could just have arguably > >> wrong behavior, like basing query results on the old version of the > >> table's metadata even after it's been changed. > > > > I would order the concerns like this: > > > > 1. Data corruption > > 2. Transient, clearly-wrong answers without an error > > 3. Server crash > > 4. Catalog logical inconsistency > > 5. Inexplicable, transient errors > > 6. Valid behavior capable of surprising more than zero upgraders > > I like your model for risk assessment. How can we apply it in detail > in a way that helps us decide? Or do we just go on gut feel? > > My experience with mentioning such topics is that without structure it > results in an assessment of "unacceptable risk" just simply because > somebody has mentioned some scary words. True; it's tough to make use of such a prioritization. By the time you can confidently assign something to a category, you can probably just fix it. (Or, in the case of category (6), document/release-note it.) Just to be clear, that list is not a commentary on the particular patch at hand. Those are merely the kinds of regressions to look for in a patch affecting this area of the code. > err, guys, you do realise that changing ownership is staying at > AccessExclusiveLock in this patch? > (and I haven't ever suggested lowering that). Yep. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Noah Misch <noah@leadboat.com> writes: > Just to be clear, that list is not a commentary on the particular patch at > hand. Those are merely the kinds of regressions to look for in a patch > affecting this area of the code. A complaint on pgsql-bugs just now reminded me of a specific area that needs to be looked at hard: how bad are the implications for pg_dump? Up to now, pg_dump could be reasonably confident that once it had AccessShareLock on every table it intended to dump, there would be no schema changes happening on those tables until it got done. This greatly ameliorates the snapshot-skew problems that arise from its habit of doing some things for itself and other things via backend-internal functions (which historically used SnapshotNow and now use a fresh MVCC snapshot, either way potentially quite newer than the transaction snapshot pg_dump's own queries will use). I suspect that lowering the lock requirements for anything that affects what pg_dump can see is going to make things a whole lot worse in terms of consistency of pg_dump output in the face of concurrent DDL. Admittedly, we're not perfect in that area now, but I'm not sure that's an adequate excuse for making it worse. regards, tom lane
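The skew Tom describes can be sketched as follows (object names are illustrative): pg_dump's own queries run under its transaction snapshot, while server-side helpers consult current catalog state:

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    -- pg_dump's own catalog queries see the transaction snapshot:
    SELECT relname FROM pg_class WHERE relname = 'accounts';
    -- ... concurrent DDL commits here ...
    -- but a server-side helper such as pg_get_indexdef() reads the
    -- catalogs with a current snapshot, so it can reflect DDL committed
    -- after the transaction snapshot was taken:
    SELECT pg_get_indexdef(indexrelid) FROM pg_index
    WHERE indrelid = 'accounts'::regclass;
    COMMIT;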
On 2014-03-03 19:15:27 -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > Just to be clear, that list is not a commentary on the particular patch at > > hand. Those are merely the kinds of regressions to look for in a patch > > affecting this area of the code. > > A complaint on pgsql-bugs just now reminded me of a specific area that > needs to be looked at hard: how bad are the implications for pg_dump? > > Up to now, pg_dump could be reasonably confident that once it had > AccessShareLock on every table it intended to dump, there would be no > schema changes happening on those tables until it got done. The guarantee wasn't actually that strong. It already was quite possible that indexes got created/dropped during that time, which probably is the by far most frequent DDL run in production. > This greatly > ameliorates the snapshot-skew problems that arise from its habit of doing > some things for itself and other things via backend-internal functions > (which historically used SnapshotNow and now use a fresh MVCC snapshot, > either way potentially quite newer than the transaction snapshot pg_dump's > own queries will use). Yea, I wonder if we shouldn't start to make them use a different snapshot. It's the pg_get_*def() functions, or is there something else? Afair (I really haven't rechecked) all the actions that have a changed locklevels affect things that pg_dump recreates clientside, using a repeatable read snapshot, so there shouldn't be much change there? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-03 19:15:27 -0500, Tom Lane wrote: >> This greatly >> ameliorates the snapshot-skew problems that arise from its habit of doing >> some things for itself and other things via backend-internal functions >> (which historically used SnapshotNow and now use a fresh MVCC snapshot, >> either way potentially quite newer than the transaction snapshot pg_dump's >> own queries will use). > Yea, I wonder if we shouldn't start to make them use a different > snapshot. It's the pg_get_*def() functions, or is there something else? See past discussions. By the time you trace all of ruleutils.c's dependencies, a dauntingly large fraction of the backend's basic catalog access support is implicated. For example, you'd need some way of making the catcaches return data that they know to be outdated. And I'm pretty sure pg_dump is making free use of stuff that isn't even in ruleutils. I would like to see pg_dump doing something much more bulletproof, but I'm afraid that there isn't any nice simple fix available. One relatively narrow fix that would probably make it a lot better *in the current state of affairs* is to provide a way whereby, once pg_dump has locks on everything it wants to dump, it can advance its transaction snapshot to current time. Then at least both its own queries and the backend's lookups will see post-locking state. However, if AccessShareLock isn't enough to block DDL on the tables then we're still hosed. > Afair (I really haven't rechecked) all the actions that have a changed > locklevels affect things that pg_dump recreates clientside, using a > repeatable read snapshot, so there shouldn't be much change there? You're missing the point entirely if you think pg_dump recreates everything client-side. It's never done that 100%, and we've migrated more and more stuff to the server side over time to avoid duplicate implementations of things like index expression decompilation. So while a theoretical answer would be to remove all of pg_dump's dependency on server-side functions, I am pretty sure that's never going to happen. regards, tom lane
On 2014-03-03 20:32:13 -0500, Tom Lane wrote: > > Afair (I really haven't rechecked) all the actions that have a changed > > locklevels affect things that pg_dump recreates clientside, using a > > repeatable read snapshot, so there shouldn't be much change there? > > You're missing the point entirely if you think pg_dump recreates > everything client-side. No, I am obviously not thinking that. What I mean is that the things that actually change their locking requirement in the proposed patch primarily influence things that are reconstructed clientside by pg_dump. E.g. ALTER TABLE ... CLUSTER ON, SET(...), ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-03 20:32:13 -0500, Tom Lane wrote: >> You're missing the point entirely if you think pg_dump recreates >> everything client-side. > No, I am obviously not thinking that. What I mean is that the things > that actually change their locking requirement in the proposed patch > primarily influence things that are reconstructed clientside by > pg_dump. E.g. ALTER TABLE ... CLUSTER ON, SET(...), ... [ raised eyebrow... ] I'm pretty sure that no such constraint was part of the design discussion to start with. Even if it accidentally happens to be the case now, it sounds utterly fragile. regards, tom lane
On 4 March 2014 01:07, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-03-03 19:15:27 -0500, Tom Lane wrote: >> Noah Misch <noah@leadboat.com> writes: >> > Just to be clear, that list is not a commentary on the particular patch at >> > hand. Those are merely the kinds of regressions to look for in a patch >> > affecting this area of the code. >> >> A complaint on pgsql-bugs just now reminded me of a specific area that >> needs to be looked at hard: how bad are the implications for pg_dump? >> >> Up to now, pg_dump could be reasonably confident that once it had >> AccessShareLock on every table it intended to dump, there would be no >> schema changes happening on those tables until it got done. > > The guarantee wasn't actually that strong. It already was quite possible > that indexes got created/dropped during that time, which probably is the > by far most frequent DDL run in production. Good points. In most cases, DDL is applied manually after careful thought, so people seldom dump at the same time they upgrade the database. This is especially true for pg_dump since it captures the logical definition of tables. So most people will be happy with the default locking, but we could make the lock level optional. Currently we use AccessShareLock. Locking out all DDL, even with this patch applied would only require ShareUpdateExclusiveLock. Looking at the code, it will take about an hour to add an option to pg_dump that specifies the lock level used when dumping. I would be happy to include that as part of this patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
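Per dumped table, the proposed pg_dump option would boil down to roughly this (a sketch; the option itself is not yet written):

    BEGIN;
    -- instead of the usual ACCESS SHARE:
    LOCK TABLE pgbench_accounts IN SHARE UPDATE EXCLUSIVE MODE;
    -- this conflicts with the reduced-lock ALTER TABLE forms (which take
    -- ShareUpdateExclusiveLock under the patch) for the life of the dump,
    -- while normal reads and writes continue unimpeded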
> Good points.
>
> In most cases, DDL is applied manually after careful thought, so
> people seldom dump at the same time they upgrade the database. This is
> especially true for pg_dump since it captures the logical definition
> of tables. So most people will be happy with the default locking, but
> we could make the lock level optional.
>
> Currently we use AccessShareLock. Locking out all DDL, even with this
> patch applied would only require ShareUpdateExclusiveLock.
>
> Looking at the code, it will take about an hour to add an option to
> pg_dump that specifies the lock level used when dumping. I would be
> happy to include that as part of this patch.

I think the use case for specifying multiple locks is pretty slim given
that a ShareUpdateExclusiveLock is good enough mostly for everybody.

If its not the case, the user should be more careful about when he is
scheduling backups to so that they dont conflict with DDL changes.

I am not too comfortable with exposing the locking type to the user.
That may be just me though.

Regards,
Atri

--
Regards,
Atri
l'apprenant
On 4 March 2014 08:39, Atri Sharma <atri.jiit@gmail.com> wrote: > >> >> Good points. >> >> In most cases, DDL is applied manually after careful thought, so >> people seldom dump at the same time they upgrade the database. This is >> especially true for pg_dump since it captures the logical definition >> of tables. So most people will be happy with the default locking, but >> we could make the lock level optional. >> >> Currently we use AccessShareLock. Locking out all DDL, even with this >> patch applied would only require ShareUpdateExclusiveLock. >> >> Looking at the code, it will take about an hour to add an option to >> pg_dump that specifies the lock level used when dumping. I would be >> happy to include that as part of this patch. > > > > I think the use case for specifying multiple locks is pretty slim given that > a ShareUpdateExclusiveLock is good enough mostly for everybody. Increasing the lock strength would be a change in behaviour that might adversely affect existing users. > If its not the case, the user should be more careful about when he is > scheduling backups to so that they dont conflict with DDL changes. That is most certainly the wise choice. > I am not too comfortable with exposing the locking type to the user. That > may be just me though. Why would that be a problem? Hard reasons, please. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
>> If its not the case, the user should be more careful about when he is
>> scheduling backups to so that they dont conflict with DDL changes.
>
> That is most certainly the wise choice.
>
>> I am not too comfortable with exposing the locking type to the user. That
>> may be just me though.
>
> Why would that be a problem? Hard reasons, please.

Should we genuinely depend on the user's good judgement to decide the
locking types?

Regards,
Atri
l'apprenant
On 4 March 2014 09:31, Simon Riggs <simon@2ndquadrant.com> wrote: > On 4 March 2014 08:39, Atri Sharma <atri.jiit@gmail.com> wrote: >> >>> >>> Good points. >>> >>> In most cases, DDL is applied manually after careful thought, so >>> people seldom dump at the same time they upgrade the database. This is >>> especially true for pg_dump since it captures the logical definition >>> of tables. So most people will be happy with the default locking, but >>> we could make the lock level optional. >>> >>> Currently we use AccessShareLock. Locking out all DDL, even with this >>> patch applied would only require ShareUpdateExclusiveLock. >>> >>> Looking at the code, it will take about an hour to add an option to >>> pg_dump that specifies the lock level used when dumping. I would be >>> happy to include that as part of this patch. >> >> >> >> I think the use case for specifying multiple locks is pretty slim given that >> a ShareUpdateExclusiveLock is good enough mostly for everybody. > > Increasing the lock strength would be a change in behaviour that might > adversely affect existing users. The main impact I see is that this would block VACUUM while pg_dump runs. But then, while pg_dump runs VACUUM is ineffective anyway so perhaps that is no bad thing. Autovacuum requests VACOPT_NOWAIT so would skip the relations being dumped rather than waiting. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 4, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The main impact I see is that this would block VACUUM while pg_dump runs. > > But then, while pg_dump runs VACUUM is ineffective anyway so perhaps > that is no bad thing. Well, a vacuum that's already running when pg_dump starts up may be doing a lot of good, so it would be a shame to see pg_dump kill them all off. Also, this would put us in the surprising situation that you can't run two simultaneous dumps of overlapping sets of tables, which doesn't strike me as a great thing. I'd really like to see us find a way to apply some version of this patch. I was in favor of the concept 3 years ago when we did this the first time, and I've subsequently done quite a bit of work (viz., MVCC catalog snapshots) to eliminate the main objection that was raised at that time. But it's really hard to reason about what might happen with lowered lock levels, and convince yourself that there's absolutely nothing that can ever go wrong. I don't know what to do about that tension, but I think even modest improvements in this area stand to benefit an awful lot of users. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I'd really like to see us find a way to apply some version of this
> patch. I was in favor of the concept 3 years ago when we did this the
> first time, and I've subsequently done quite a bit of work (viz., MVCC
> catalog snapshots) to eliminate the main objection that was raised at
> that time. But it's really hard to reason about what might happen
> with lowered lock levels, and convince yourself that there's
> absolutely nothing that can ever go wrong. I don't know what to do
> about that tension, but I think even modest improvements in this area
> stand to benefit an awful lot of users.

Wouldn't MVCC's strictness rules pose harder restrictions on pg_dump
instead of relaxing them?

Regards,
Atri

--
Regards,
Atri
l'apprenant
On 4 March 2014 12:18, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 4, 2014 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The main impact I see is that this would block VACUUM while pg_dump runs. >> >> But then, while pg_dump runs VACUUM is ineffective anyway so perhaps >> that is no bad thing. > > Well, a vacuum that's already running when pg_dump starts up may be > doing a lot of good, so it would be a shame to see pg_dump kill them > all off. > > Also, this would put us in the surprising situation that you can't run > two simultaneous dumps of overlapping sets of tables, which doesn't > strike me as a great thing. These changes in concurrency are the most serious objections and a definite change in previous behaviour. So we cannot pick a single lock level that suits all goals the user may have. > I'd really like to see us find a way to apply some version of this > patch. I was in favor of the concept 3 years ago when we did this the > first time, and I've subsequently done quite a bit of work (viz., MVCC > catalog snapshots) to eliminate the main objection that was raised at > that time. But it's really hard to reason about what might happen > with lowered lock levels, and convince yourself that there's > absolutely nothing that can ever go wrong. I don't know what to do > about that tension, but I think even modest improvements in this area > stand to benefit an awful lot of users. Agreed. The question is, which subset? The issue just raised would affect whichever subset we choose, so reducing the scope of the patch does nothing to the impact of the pg_dump issue. I will add the option to change lock level for pg_dump. It's simple to use, clear as to why it would be needed and effective at removing this as an obstacle. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Atri Sharma (atri.jiit@gmail.com) wrote: > If its not the case, the user should be more careful about when he is > scheduling backups to so that they dont conflict with DDL changes. I'm not following this as closely as I'd like to, but I wanted to voice my opinion that this is just not acceptable as a general answer. There are a good many applications out there which do DDL as part of ongoing activity (part of ETL, or something else) and still need to be able to get a pg_dump done. It's not a design I'd recommend, but I don't think we get to just write it off either. Thanks, Stephen
On Tue, Mar 4, 2014 at 8:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Atri Sharma (atri.jiit@gmail.com) wrote:
>> If its not the case, the user should be more careful about when he is
>> scheduling backups to so that they dont conflict with DDL changes.
>
> I'm not following this as closely as I'd like to, but I wanted to voice
> my opinion that this is just not acceptable as a general answer. There
> are a good many applications out there which do DDL as part of ongoing
> activity (part of ETL, or something else) and still need to be able to
> get a pg_dump done. It's not a design I'd recommend, but I don't think
> we get to just write it off either.

Well, that will require something like MVCC or stricter locking in
general. That is not in line with the aim of this patch, hence I raised
this point.

Regards,
Atri
l'apprenant
Stephen Frost escribió: > * Atri Sharma (atri.jiit@gmail.com) wrote: > > If its not the case, the user should be more careful about when he is > > scheduling backups to so that they dont conflict with DDL changes. > > I'm not following this as closely as I'd like to, but I wanted to voice > my opinion that this is just not acceptable as a general answer. There > are a good many applications out there which do DDL as part of ongoing > activity (part of ETL, or something else) and still need to be able to > get a pg_dump done. It's not a design I'd recommend, but I don't think > we get to just write it off either. Agreed -- "user caution" is a recipe for trouble, because these things cannot always be planned in minute detail (or such planning creates an excessive cost.) One concern is schema changes that make a dump unrestorable, for instance if there's a foreign key relationship between tables A and B, such that pg_dump dumps the FK for table A but by the time it dumps table B the unique index has gone and thus restoring the FK fails. If this is a realistic failure scenario, then we need some mechanism to avoid it. One possible idea would be to create a new lock level which conflicts with DDL changes but not with regular operation including dumps; so it wouldn't self-conflict but it would conflict with ShareUpdateExclusive. pg_dump would acquire a lock of that level instead of AccessShare; thus two pg_dumps would be able to run on the same table simultaneously, but it would block and be blocked by DDL changes that grab SUE. The big hole in this is that pg_dump would still block vacuum, which is a problem. I hesitate to suggest two extra levels, one for dumps (which wouldn't conflict with SUE) and one for non-exclusive DDL changes (which would.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
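Alvaro's unrestorable-dump scenario, made concrete (table names are illustrative):

    -- Schema when the dump starts:
    CREATE TABLE a (id int PRIMARY KEY);
    CREATE TABLE b (a_id int REFERENCES a (id));

    -- pg_dump emits b's foreign key early on.  If concurrent DDL then
    -- drops the constraint and its underlying unique index before the
    -- dump reaches the corresponding definitions for table a, the dump
    -- contains a FOREIGN KEY clause with no unique index to back it,
    -- and the restore fails.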
On Tue, Mar 4, 2014 at 10:17 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > One possible idea would be to create a new lock level which conflicts > with DDL changes but not with regular operation including dumps; so it > wouldn't self-conflict but it would conflict with ShareUpdateExclusive. > pg_dump would acquire a lock of that level instead of AccessShare; thus > two pg_dumps would be able to run on the same table simultaneously, but > it would block and be blocked by DDL changes that grab SUE. The big > hole in this is that pg_dump would still block vacuum, which is a > problem. I hesitate to suggest two extra levels, one for dumps (which > wouldn't conflict with SUE) and one for non-exclusive DDL changes (which > would.) AFAIK, the only reason why vacuum takes ShareUpdateExclusive lock is because it can't run at the same time as another vacuum. I tend to think (and have thought for some time) that we really ought to have vacuum take AccessShareLock on the relation plus some other lock that is specific to vacuum (say, a "relation vacuum" lock, just as we have "relation extension" locks). Your idea of a lock strong enough to conflict with DDL but not self-conflicting is interesting, too, but I can't claim to have thought through it all that carefully just yet. I think this is all too late for 9.4, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
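The self-conflict Robert refers to is easy to see (table name taken from the earlier pgbench example):

    -- Session 1
    VACUUM pgbench_accounts;          -- takes ShareUpdateExclusiveLock

    -- Session 2, concurrently
    ALTER TABLE pgbench_accounts SET (fillfactor = 90);
    -- under the patch this also wants ShareUpdateExclusiveLock, which is
    -- self-conflicting, so it waits for the VACUUM (and vice versa)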
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > One concern is schema changes that make a dump unrestorable, for > instance if there's a foreign key relationship between tables A and B, Yeah. Ideally, what pg_dump would produce would be a consistent snapshot of the database state as of its transaction snapshot time. We have always had that guarantee so far as user data was concerned, but it's been shaky (and getting worse) so far as the database schema is concerned. What bothers me about the current patch is that it's going to make it a whole lot more worse. Also, I don't have any love at all for proposals that we increase the lock level that pg_dump holds. pg_dump tends to run for a long time. I've not been paying all that much attention to the logical-decoding patches, but wasn't there something in there about being able to see the catalog state as it was at some point in the past? If so, maybe we could leverage that to allow a backend to enter a "pg_dump state" wherein its view of the catalogs was frozen at its transaction start snapshot. We'd have to restrict it to read-only operation for safety, but that's surely no problem for pg_dump. If we had that, then this whole problem of server-side computations producing inconsistent results would go away. There might still be a window wherein tables visible at transaction start could be dropped before AccessShareLock could be acquired, but I think we could let pg_dump error out in that case. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Yeah. Ideally, what pg_dump would produce would be a consistent snapshot > of the database state as of its transaction snapshot time. We have always > had that guarantee so far as user data was concerned, but it's been shaky > (and getting worse) so far as the database schema is concerned. What > bothers me about the current patch is that it's going to make it a whole > lot more worse. > Also, I don't have any love at all for proposals that we increase the lock > level that pg_dump holds. pg_dump tends to run for a long time. Agreed. > I've not been paying all that much attention to the logical-decoding > patches, but wasn't there something in there about being able to see > the catalog state as it was at some point in the past? If so, maybe > we could leverage that to allow a backend to enter a "pg_dump state" > wherein its view of the catalogs was frozen at its transaction start > snapshot. We'd have to restrict it to read-only operation for safety, > but that's surely no problem for pg_dump. If we had that, then this > whole problem of server-side computations producing inconsistent > results would go away. That certainly sounds like a tempting idea. > There might still be a window wherein tables visible at transaction start > could be dropped before AccessShareLock could be acquired, but I think > we could let pg_dump error out in that case. I don't have too much of an issue with the above, but I would like to have us figure out a solution to the deadlock problem with parallel pg_dump. The issue arises when pg_dump gets an AccessShareLock and then another process attempts to acquire an AccessExclusiveLock, which then blocks, and then the pg_dump worker process tries to get its AccessShareLock- we end up not being able to make any progress on anything at that point. One suggestion that was discussed at PGConf.EU was having processes which share the same snapshot (the pg_dump master and worker processes) able to either share the same locks or at least be able to "jump" the lock queue (that is, the worker process wouldn't have to wait being the AEL to get an ASL, since the ASL was already aquired for the snapshot which was exported and shared with it). Thanks, Stephen
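The deadlock Stephen describes comes down to lock-queue ordering; a sketch with an arbitrary dumped table t (names illustrative):

    -- pg_dump master:
    BEGIN;
    LOCK TABLE t IN ACCESS SHARE MODE;   -- granted

    -- unrelated session:
    ALTER TABLE t ADD COLUMN c int;      -- wants AccessExclusiveLock,
                                         -- queues behind the master's lock

    -- pg_dump worker, on its own connection:
    LOCK TABLE t IN ACCESS SHARE MODE;   -- queues behind the waiting
                                         -- AccessExclusiveLock request, so
                                         -- the dump cannot proceed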
Robert Haas <robertmhaas@gmail.com> writes: > I think this is all too late for 9.4, though. I agree with the feeling that a meaningful fix for pg_dump isn't going to get done for 9.4. So that leaves us with the alternatives of (1) put off the lock-strength-reduction patch for another year; (2) push it anyway and accept a reduction in pg_dump reliability. I don't care for (2). I'd like to have lock strength reduction as much as anybody, but it can't come at the price of reduction of reliability. The bigger picture here is that it seems like anytime I've thought for more than five minutes about the lock strength reduction patch, I've come up with some fundamental problem. That doesn't leave me with a warm feeling that we're getting close to having something committable. regards, tom lane
Tom Lane escribió: > I'd like to have lock strength reduction as much as anybody, but it > can't come at the price of reduction of reliability. Can we have at least a cut-down version of it? If we can just reduce the lock level required for ALTER TABLE / VALIDATE, that would be an enormous improvement already. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
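The VALIDATE case Alvaro singles out is the two-step pattern (constraint name and predicate here are hypothetical):

    -- Step 1: quick, does not scan the table
    ALTER TABLE pgbench_accounts
      ADD CONSTRAINT abalance_nonneg CHECK (abalance >= 0) NOT VALID;

    -- Step 2: scans existing rows; under the patch this takes only
    -- ShareUpdateExclusiveLock, so reads and writes continue while it runs
    ALTER TABLE pgbench_accounts VALIDATE CONSTRAINT abalance_nonneg;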
Stephen Frost <sfrost@snowman.net> writes: > I don't have too much of an issue with the above, but I would like to > have us figure out a solution to the deadlock problem with parallel > pg_dump. The issue arises when pg_dump gets an AccessShareLock and then > another process attempts to acquire an AccessExclusiveLock, which then > blocks, and then the pg_dump worker process tries to get its > AccessShareLock- we end up not being able to make any progress on > anything at that point. The original ASL was acquired by the pg_dump master, right? > One suggestion that was discussed at PGConf.EU was having processes > which share the same snapshot (the pg_dump master and worker processes) > able to either share the same locks or at least be able to "jump" the > lock queue (that is, the worker process wouldn't have to wait being the > AEL to get an ASL, since the ASL was already aquired for the snapshot > which was exported and shared with it). Yeah, it seems like we need lock export not only snapshot export to make this work nicely. But that sounds orthogonal to the issues being discussed in this thread. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I don't have too much of an issue with the above, but I would like to > > have us figure out a solution to the deadlock problem with parallel > > pg_dump. The issue arises when pg_dump gets an AccessShareLock and then > > another process attempts to acquire an AccessExclusiveLock, which then > > blocks, and then the pg_dump worker process tries to get its > > AccessShareLock- we end up not being able to make any progress on > > anything at that point. > > The original ASL was acquired by the pg_dump master, right? Yup. It goes through and gets ASLs on everything first. > > One suggestion that was discussed at PGConf.EU was having processes > > which share the same snapshot (the pg_dump master and worker processes) > > able to either share the same locks or at least be able to "jump" the > > lock queue (that is, the worker process wouldn't have to wait being the > > AEL to get an ASL, since the ASL was already aquired for the snapshot > > which was exported and shared with it). > > Yeah, it seems like we need lock export not only snapshot export to make > this work nicely. But that sounds orthogonal to the issues being > discussed in this thread. Indeed, just figured I'd mention it since we're talking about pg_dump-related locking. Thanks, Stephen
On Tue, Mar 4, 2014 at 10:20 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> I don't have too much of an issue with the above, but I would like to
>>> have us figure out a solution to the deadlock problem with parallel
>>> pg_dump. The issue arises when pg_dump gets an AccessShareLock and then
>>> another process attempts to acquire an AccessExclusiveLock, which then
>>> blocks, and then the pg_dump worker process tries to get its
>>> AccessShareLock- we end up not being able to make any progress on
>>> anything at that point.
>>
>> The original ASL was acquired by the pg_dump master, right?
>
> Yup. It goes through and gets ASLs on everything first.
>
>>> One suggestion that was discussed at PGConf.EU was having processes
>>> which share the same snapshot (the pg_dump master and worker processes)
>>> able to either share the same locks or at least be able to "jump" the
>>> lock queue (that is, the worker process wouldn't have to wait being the
>>> AEL to get an ASL, since the ASL was already aquired for the snapshot
>>> which was exported and shared with it).
>>
>> Yeah, it seems like we need lock export not only snapshot export to make
>> this work nicely. But that sounds orthogonal to the issues being
>> discussed in this thread.
>
> Indeed, just figured I'd mention it since we're talking about
> pg_dump-related locking.

What happens for foreign key constraints? For pg_dump, do we lock the
tables referenced by the table which is being dumped right now? If that
is the case, wouldn't an MVCC-based approach be the best way for this?

Please ignore this if I said anything silly. I am just trying to
understand how it works here.

Regards,
Atri
On 2014-03-04 11:40:10 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think this is all too late for 9.4, though.
>
> I agree with the feeling that a meaningful fix for pg_dump isn't going
> to get done for 9.4. So that leaves us with the alternatives of (1)
> put off the lock-strength-reduction patch for another year; (2) push
> it anyway and accept a reduction in pg_dump reliability.
>
> I don't care for (2). I'd like to have lock strength reduction as
> much as anybody, but it can't come at the price of reduction of
> reliability.

I am sorry, but I think this is vastly overstating the scope of the
pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the
amount of problems that has caused in the past is surprisingly low. If
such a frequently used command doesn't cause problems, why are you
assuming other commands to be that problematic? And I think it's hard
to argue that the proposed changes are more likely to cause problems.

Let's try to go at this a bit more methodically. The commands that -
afaics - change their locklevel due to the latest patch (v21) are:

01) ALTER TABLE .. ADD CONSTRAINT
02) ALTER TABLE .. ADD CONSTRAINT ... USING
03) ALTER TABLE .. ENABLE | DISABLE [ REPLICA | ALWAYS ] TRIGGER [ ALL ]
04) ALTER TABLE .. ALTER CONSTRAINT
05) ALTER TABLE .. REPLICA IDENTITY
06) ALTER TABLE .. ALTER COLUMN .. SET NOT NULL (*not* DROP NOT NULL)
    => cmd_lockmode = ShareRowExclusiveLock;

07) ALTER TABLE ... ALTER COLUMN ... SET STATISTICS
08) ALTER TABLE ... CLUSTER ON ...
09) ALTER TABLE ... SET WITHOUT CLUSTER
10) ALTER TABLE ... SET (...)
11) ALTER TABLE ... RESET (...)
12) ALTER TABLE ... ALTER COLUMN ... SET (...)
13) ALTER TABLE ... ALTER COLUMN ... RESET (...)
14) ALTER TABLE ... VALIDATE CONSTRAINT constraint_name
    => cmd_lockmode = ShareUpdateExclusiveLock;

I have my reservations about ADD CONSTRAINT (including SET NOT NULL)
being unproblematic (mostly because I haven't thought through possible
consequences for the planner making different choices with added
constraints). From the perspective of pg_dump consistency, except ADD
CONSTRAINT, none of those seem to have graver possible consequences
than CREATE INDEX (and DROP INDEX CONCURRENTLY) already being unsafe.
In fact all of those should actually end up being *safer*, even ending
up always being dumped consistently, since they are all reconstructed
clientside by pg_dump.

You argue elsewhere that that's a fragile coincidence. But so what:
even if it changes, the consequences are still going to be *far* less
significant than missing various index, trigger, and whatnot commands.

I think the set of problems you mention is going to be really important
when we someday get around to making stuff like ALTER TABLE ... ADD/DROP
COLUMN require lower lock levels, but that's not what's proposed.

Greetings,

Andres Freund

--
Andres Freund                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
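Any entry on this list can be spot-checked from psql without reading the code (the storage parameter is just an example):

    -- Session 1: run the command and hold its transaction open
    BEGIN;
    ALTER TABLE pgbench_accounts SET (fillfactor = 90);

    -- Session 2: inspect the lock actually taken
    SELECT mode, granted FROM pg_locks
    WHERE relation = 'pgbench_accounts'::regclass;
    -- with the patch applied one expects ShareUpdateExclusiveLock here
    -- rather than AccessExclusiveLock

    -- Session 1:
    COMMIT;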
On 4 March 2014 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> One concern is schema changes that make a dump unrestorable, for >> instance if there's a foreign key relationship between tables A and B, > > Yeah. Ideally, what pg_dump would produce would be a consistent snapshot > of the database state as of its transaction snapshot time. We have always > had that guarantee so far as user data was concerned, but it's been shaky > (and getting worse) so far as the database schema is concerned. What > bothers me about the current patch is that it's going to make it a whole > lot more worse. While thinking this through it occurs to me that there is no problem at all. Your earlier claim that the dump is inconsistent just isn't accurate. We now have MVCC catalogs, so any dump is going to see a perfectly consistent set of data plus DDL. OK the catalogs may change AFTER the snapshot was taken for the dump, but then so can the data change - that's just MVCC. I was going to add an option to increase lock level, but I can't see why you'd want it even. The dumps are consistent... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On March 4, 2014 8:39:55 PM CET, Simon Riggs <simon@2ndQuadrant.com> wrote: >On 4 March 2014 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> One concern is schema changes that make a dump unrestorable, for >>> instance if there's a foreign key relationship between tables A and B, >> >> Yeah. Ideally, what pg_dump would produce would be a consistent snapshot >> of the database state as of its transaction snapshot time. We have always >> had that guarantee so far as user data was concerned, but it's been shaky >> (and getting worse) so far as the database schema is concerned. What >> bothers me about the current patch is that it's going to make it a whole >> lot more worse. > >While thinking this through it occurs to me that there is no problem at >all. > >Your earlier claim that the dump is inconsistent just isn't accurate. >We now have MVCC catalogs, so any dump is going to see a perfectly >consistent set of data plus DDL. OK the catalogs may change AFTER the >snapshot was taken for the dump, but then so can the data change - >that's just MVCC. > >I was going to add an option to increase lock level, but I can't see >why you'd want it even. The dumps are consistent... Mvcc scans only guarantee that individual scans are consistent, not that separate scans are. Each individual scan takes a new snapshot if there's been ddl. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-04 11:40:10 -0500, Tom Lane wrote: >> I don't care for (2). I'd like to have lock strength reduction as >> much as anybody, but it can't come at the price of reduction of >> reliability. > I am sorry, but I think this is vastly overstating the scope of the > pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the > amount of problems that has caused in the past is surprisingly low. CREATE INDEX happens to be okay because pg_dump won't try to dump indexes it doesn't see in its snapshot, ie the list of indexes to dump is created client-side. CREATE INDEX CONCURRENTLY, otoh, already did break pg_dump, and we had to hack things to fix it; see commit 683abc73dff549e94555d4020dae8d02f32ed78b. regards, tom lane
On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 4 March 2014 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> One concern is schema changes that make a dump unrestorable, for >>> instance if there's a foreign key relationship between tables A and B, >> >> Yeah. Ideally, what pg_dump would produce would be a consistent snapshot >> of the database state as of its transaction snapshot time. We have always >> had that guarantee so far as user data was concerned, but it's been shaky >> (and getting worse) so far as the database schema is concerned. What >> bothers me about the current patch is that it's going to make it a whole >> lot more worse. > > While thinking this through it occurs to me that there is no problem at all. > > Your earlier claim that the dump is inconsistent just isn't accurate. > We now have MVCC catalogs, so any dump is going to see a perfectly > consistent set of data plus DDL. OK the catalogs may change AFTER the > snapshot was taken for the dump, but then so can the data change - > that's just MVCC. Unfortunately, this isn't correct. The MVCC snapshots taken for catalog scans are "instantaneous"; that is, we take a new, current snapshot for each catalog scan. If all of the ruleutils.c stuff were using the transaction snapshot rather than instantaneous snapshots, this would be right. But as has been previously discussed, that's not the case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Your earlier claim that the dump is inconsistent just isn't accurate. >> We now have MVCC catalogs, so any dump is going to see a perfectly >> consistent set of data plus DDL. OK the catalogs may change AFTER the >> snapshot was taken for the dump, but then so can the data change - >> that's just MVCC. > Unfortunately, this isn't correct. The MVCC snapshots taken for > catalog scans are "instantaneous"; that is, we take a new, current > snapshot for each catalog scan. If all of the ruleutils.c stuff were > using the transaction snapshot rather than instantaneous snapshots, > this would be right. But as has been previously discussed, that's not > the case. Yeah. And that's *necessary* for catalog lookups in a normally functioning backend, because we have to see latest data (eg, it wouldn't do for a backend to fail to enforce a just-added CHECK constraint because it was committed after the backend's transaction started). However, it seems possible that we could have a mode in which a read-only session did all its catalog fetches according to the transaction snapshot. That would get us to a situation where the backend-internal lookups that ruleutils relies on would give the same answers as queries done by pg_dump. Robert's work on getting rid of SnapshotNow has probably moved that much closer than it was before, but it's still not exactly a trivial patch. Meanwhile, Andres claimed upthread that none of the currently-proposed reduced-lock ALTER commands affect data that pg_dump is using ruleutils to fetch. If that's the case, then maybe this is a problem that we can punt till later. I've not gone through the list to verify it though. regards, tom lane
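Tom's CHECK-constraint point, spelled out as two sessions (table and constraint are illustrative):

    -- Session 1
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM t;     -- establishes the transaction snapshot

    -- Session 2 (with the reduced lock levels this no longer waits for
    -- session 1's AccessShareLock)
    ALTER TABLE t ADD CHECK (a > 0);

    -- Session 1
    INSERT INTO t VALUES (-1);  -- must still fail: constraints are
                                -- enforced from current catalog state,
                                -- not from the transaction snapshot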
On 03/04/2014 11:43 AM, Andres Freund wrote: > On March 4, 2014 8:39:55 PM CET, Simon Riggs <simon@2ndQuadrant.com> wrote: >> I was going to add an option to increase lock level, but I can't see >> why you'd want it even. The dumps are consistent... > > Mvcc scans only guarantee that individual scans are consistent, not that separate scans are. Each individual scan takes a new snapshot if there's been ddl. > > Andres > I thought that we were sharing the same snapshot, for parallel dump? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-03-04 14:29:31 -0800, Josh Berkus wrote: > On 03/04/2014 11:43 AM, Andres Freund wrote: > > On March 4, 2014 8:39:55 PM CET, Simon Riggs <simon@2ndQuadrant.com> wrote: > >> I was going to add an option to increase lock level, but I can't see > >> why you'd want it even. The dumps are consistent... > > > > Mvcc scans only guarantee that individual scans are consistent, not that separate scans are. Each individual scan takes a new snapshot if there's been ddl. > I thought that we were sharing the same snapshot, for parallel dump? That snapshot is about data, not the catalog. And no, we can't easily reuse one for the other; see elsewhere in this thread for some of the reasons. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
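For reference, the snapshot parallel dump shares is the data snapshot, handed around via the exported-snapshot machinery (9.2 and later), roughly:

  -- leader
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SELECT pg_export_snapshot();            -- returns an id such as '00000003-1'

  -- each worker
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '00000003-1';  -- id copied from the leader

That keeps the *data* reads aligned; the per-scan catalog snapshots Andres describes are taken independently of it.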
On Tue, Mar 4, 2014 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > CREATE INDEX CONCURRENTLY, otoh, already did break pg_dump, > and we had to hack things to fix it; see commit > 683abc73dff549e94555d4020dae8d02f32ed78b. Well pg_dump was only broken in that there was a new catalog state to deal with. But the commit you linked to was fixing pg_upgrade which was broken because the on-disk schema was then out of sync with what pg_dump would generate. But that should only matter for creating or deleting whole relations. -- greg
On 2014-03-04 16:37:48 -0500, Tom Lane wrote: > However, it seems possible that we could have a mode in which a read-only > session did all its catalog fetches according to the transaction snapshot. > That would get us to a situation where the backend-internal lookups that > ruleutils relies on would give the same answers as queries done by > pg_dump. Robert's work on getting rid of SnapshotNow has probably moved > that much closer than it was before, but it's still not exactly a trivial > patch. The most interesting bit seems to be the cache invalidation handling. It would need to be something like PushCatalogSnapshot() which disables applying invals, including catchup interrupts and all. If the sinval queue hasn't overflowed while that snapshot was up, everything is fine; we just need to apply all pending invalidations. If it has, we need to do an InvalidateSystemCaches(). > Meanwhile, Andres claimed upthread that none of the currently-proposed > reduced-lock ALTER commands affect data that pg_dump is using ruleutils > to fetch. If that's the case, then maybe this is a problem that we can > punt till later. I've not gone through the list to verify it though. I think it's true for all but T_AddConstraint, but I am wary about that one anyway. But somebody else should definitely check the list. I wonder if AddConstraintUsing would possibly be safe... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Greg Stark <stark@mit.edu> writes: > On Tue, Mar 4, 2014 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> CREATE INDEX CONCURRENTLY, otoh, already did break pg_dump, >> and we had to hack things to fix it; see commit >> 683abc73dff549e94555d4020dae8d02f32ed78b. > Well pg_dump was only broken in that there was a new catalog state to > deal with. But the commit you linked to was fixing pg_upgrade which > was broken because the on-disk schema was then out of sync with what > pg_dump would generate. No, it was fixing cases that would cause problems with or without pg_upgrade. Arguably that patch made it worse for pg_upgrade, which needed a followon patch (203d8ae2d). regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-04 16:37:48 -0500, Tom Lane wrote: >> However, it seems possible that we could have a mode in which a read-only >> session did all its catalog fetches according to the transaction snapshot. >> That would get us to a situation where the backend-internal lookups that >> ruleutils relies on would give the same answers as queries done by >> pg_dump. Robert's work on getting rid of SnapshotNow has probably moved >> that much closer than it was before, but it's still not exactly a trivial >> patch. > The most interesting bit seems to be the cache invalidation handling. It > would need to be something like PushCatalogSnapshot() which disables > applying invals, including catchup interrupts and all. If the sinval > queue hasn't overflowed while that snapshot was up, everything is fine; we > just need to apply all pending invalidations. If it has, we need to do an > InvalidateSystemCaches(). Yeah, at least within the transaction we would simply ignore invals. To avoid causing sinval queue overrun (which would hurt everyone system-wide), my inclination would be to just drop invals on the floor all the time when in this mode, and forcibly do InvalidateSystemCaches at transaction end. For pg_dump, at least, there is no value in working any harder than that, since it's going to quit at transaction end anyhow. regards, tom lane
On 4 March 2014 21:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Mar 4, 2014 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Your earlier claim that the dump is inconsistent just isn't accurate. >>> We now have MVCC catalogs, so any dump is going to see a perfectly >>> consistent set of data plus DDL. OK the catalogs may change AFTER the >>> snapshot was taken for the dump, but then so can the data change - >>> that's just MVCC. > >> Unfortunately, this isn't correct. The MVCC snapshots taken for >> catalog scans are "instantaneous"; that is, we take a new, current >> snapshot for each catalog scan. If all of the ruleutils.c stuff were >> using the transaction snapshot rather than instantaneous snapshots, >> this would be right. But as has been previously discussed, that's not >> the case. > > Yeah. And that's *necessary* for catalog lookups in a normally > functioning backend, because we have to see latest data (eg, it wouldn't > do for a backend to fail to enforce a just-added CHECK constraint because > it was committed after the backend's transaction started). OK, thanks for explaining. A valuable point to note for us all. > However, it seems possible that we could have a mode in which a read-only > session did all its catalog fetches according to the transaction snapshot. > That would get us to a situation where the backend-internal lookups that > ruleutils relies on would give the same answers as queries done by > pg_dump. Robert's work on getting rid of SnapshotNow has probably moved > that much closer than it was before, but it's still not exactly a trivial > patch. > > Meanwhile, Andres claimed upthread that none of the currently-proposed > reduced-lock ALTER commands affect data that pg_dump is using ruleutils > to fetch. If that's the case, then maybe this is a problem that we can > punt till later. I've not gone through the list to verify it though. So that returns us to solving the catalog consistency problem in pg_dump and similar applications. We could (1) change the lock taken by pg_dump to be ShareUpdateExclusive. As discussed, this would be optional. (Trivial implementation) The catalog accesses are all in a rather isolated piece of code in pg_dump and run for a short period. That allows us to consider locking *always* at ShareUpdateExclusive but only for the period of catalog access and then release the higher level lock before transaction end. Since pg_dump is a client program, any action we take to resolve this would need to be done in a user accessible way. That is acceptable since there may be other user programs that wish/need to read a consistent view of the definition of a table. This can be implemented in a few ways: (2) Implement a server function that allows you to lock a table for a short duration. e.g. pg_lock_catalog(Oid) and pg_unlock_catalog(Oid). We can already do this in server-side code, so this is simply a matter of exposing that same functionality for users. (3) A new variant of the LOCK command: LOCK CATALOG FOR tablename IN lockmode MODE NOWAIT, which then would have a matching UNLOCK CATALOG FOR tablename command. This is just a sugar-coated version of (2). (4) Implement functions to suspend invalidation message handling for a short period. That's basically the same as (2) in profile. My feeling is that sounds rather dangerous and not something I'd want to go near now or in the future.
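For illustration, option (2) as seen from pg_dump might look like the following; pg_lock_catalog() and pg_unlock_catalog() are proposed names only, not an existing API:

  BEGIN;
  SELECT pg_lock_catalog('pgbench_accounts'::regclass);    -- hypothetical
  -- ... read the table's definition consistently here ...
  SELECT pg_unlock_catalog('pgbench_accounts'::regclass);  -- hypothetical: the lock
                                                           -- is released before commit
  COMMIT;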
We tried to avoid locking the catalog some years back, which is how we went off down this MVCC catalog access path, which now seems to have been something of a red-shifted herring. ISTM that the user would need to specifically request a "consistent catalog". Using (2) in pg_dump is pretty easy - patch attached. So we can solve this problem completely in about another hour of work, so I suggest we implement (2) and be done. Happy to document this in a new subsection of docs to describe how to dump a consistent view of a database object from a user application. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 5 March 2014 09:28, Simon Riggs <simon@2ndquadrant.com> wrote: > So that returns us to solving the catalog consistency problem in > pg_dump and similar applications. No answer, guys, and time is ticking away here. I'd like to get a communal solution to this rather than just punting the whole patch. If we have to strip it down to the bare essentials, so be it. For me, the biggest need here is to make VALIDATE CONSTRAINT take only a ShareUpdateExclusiveLock while it runs. Almost everything else needs a full AccessExclusiveLock anyway, or doesn't run for very long so isn't a critical problem. (Perhaps we can then wrap ADD CONSTRAINT ... NOT VALID and VALIDATE into a single command using the CONCURRENTLY keyword so it runs two transactions to complete the task). Validating FKs on big tables can take hours and it really isn't acceptable for us to lock out access while we do that. FKs are *supposed* to be a major reason people use RDBMS, so keeping them in a state where they are effectively unusable is a major debilitating point against adoption of PostgreSQL. If there are issues with pg_dump we can just document them. Guide me with your thoughts. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote: > On 2014-03-04 11:40:10 -0500, Tom Lane wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > I think this is all too > > late for 9.4, though. > > > > I agree with the feeling that a meaningful fix for pg_dump isn't going > > to get done for 9.4. So that leaves us with the alternatives of (1) > > put off the lock-strength-reduction patch for another year; (2) push > > it anyway and accept a reduction in pg_dump reliability. > > > > I don't care for (2). I'd like to have lock strength reduction as > > much as anybody, but it can't come at the price of reduction of > > reliability. > > I am sorry, but I think this is vastly overstating the scope of the > pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the > amount of problems that has caused in the past is surprisingly low. If > such a frequently used command doesn't cause problems, why are you > assuming other commands to be that problematic? And I think it's hard to > argue that the proposed changes are more likely to cause problems. > > Let's try to go at this a bit more methodically. The commands that - > afaics - change their locklevel due to latest patch (v21) are: [snip] Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c deparse worker functions. As a cross-check to your study, I looked briefly at the use of those functions in pg_dump and how this patch might affect them: -- pg_get_constraintdef() pg_dump reads the constraint OID with its transaction snapshot, so we will never see a too-new constraint. Dropping a constraint still requires AccessExclusiveLock. Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its transaction snapshot and uses that to decide whether to dump the CHECK constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD CONSTRAINT following the data load. However, pg_get_constraintdef() reads the latest convalidated to decide whether to emit NOT VALID. Consequently, one can get a dump in which the dumped table data did not yet conform to the constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails. (Suppose you deleted the last invalid rows just before executing the VALIDATE CONSTRAINT. I tested this by committing the DELETE + VALIDATE CONSTRAINT with pg_dump stopped at getTableAttrs().) One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did not do so. It's, conveniently, the last part of the definition. I would tend to choose this. We could also just decide this isn't bad enough to worry about. The consequence is that an ALTER TABLE ADD CONSTRAINT fails. Assuming no --single-transaction for the original restoral, you just add NOT VALID to the command and rerun. Like most of the potential new pg_dump problems, this can already happen today if the relevant database changes happen between taking the pg_dump transaction snapshot and locking the tables. -- pg_get_expr() for default expressions pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will never see a too-new default. This does allow us to read a dropped default. That's not a problem directly. However, suppose the default references a function dropped at the same time as the default. pg_dump could fail in pg_get_expr(). -- pg_get_indexdef() As you explained elsewhere, new indexes are no problem. DROP INDEX still requires AccessExclusiveLock. Overall, no problems here. 
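The constraintdef race described above under pg_get_constraintdef(), as a two-session sketch (t, c, and cc are assumed names):

  -- session A, pg_dump-like
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT convalidated FROM pg_constraint WHERE conname = 'cc';  -- f under our
      -- snapshot, so the constraint will be dumped as a separate ALTER TABLE

  -- session B
  DELETE FROM t WHERE c <= 0;             -- remove the last invalid rows
  ALTER TABLE t VALIDATE CONSTRAINT cc;   -- ShareUpdateExclusiveLock under the
                                          -- patch, so A's locks don't stop it

  -- session A
  SELECT pg_get_constraintdef(oid) FROM pg_constraint WHERE conname = 'cc';
  -- emits CHECK (c > 0) with no NOT VALID, per the latest convalidated, even
  -- though the data dumped under A's snapshot still contains violating rows
  COMMIT;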
-- pg_get_ruledef() The patch changes lock requirements for enabling and disabling of rules, but that is all separate from the rule expression handling. No problems. -- pg_get_triggerdef() The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock. The implications for pg_dump are similar to those for pg_get_expr(). -- pg_get_viewdef() Untamed: pg_dump does not lock views at all. One thing not to forget is that you can always get the old mutual exclusion back by issuing LOCK TABLE just before a DDL operation. If some unlucky user regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a workaround. There's no comparable way for someone who would not experience that problem to weaken the now-hardcoded AccessExclusiveLock. Many consequences of insufficient locking are too severe for that workaround to bring comfort, but the pg_dump failure scenarios around pg_get_expr() and pg_get_triggerdef() seem mild enough. Restore-time failures are more serious, hence my recommendation to put a hack in pg_dump around VALIDATE CONSTRAINT. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
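The workaround from that last paragraph, spelled out with assumed names:

  BEGIN;
  LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  -- reinstate the old mutual exclusion
  DROP TRIGGER trg ON t;
  COMMIT;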
On 6 March 2014 22:43, Noah Misch <noah@leadboat.com> wrote: > Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c > deparse worker functions. Ah, good. We're thinking along the same lines. I was already working on this analysis also. I'll complete mine and then compare notes. > One thing not to forget is that you can always get the old mutual exclusion > back by issuing LOCK TABLE just before a DDL operation. If some unlucky user > regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a > workaround. There's no comparable way for someone who would not experience > that problem to weaken the now-hardcoded AccessExclusiveLock. Good point. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 6 March 2014 22:43, Noah Misch <noah@leadboat.com> wrote: > On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote: >> On 2014-03-04 11:40:10 -0500, Tom Lane wrote: >> > Robert Haas <robertmhaas@gmail.com> writes: > I think this is all too >> > late for 9.4, though. >> > >> > I agree with the feeling that a meaningful fix for pg_dump isn't going >> > to get done for 9.4. So that leaves us with the alternatives of (1) >> > put off the lock-strength-reduction patch for another year; (2) push >> > it anyway and accept a reduction in pg_dump reliability. >> > >> > I don't care for (2). I'd like to have lock strength reduction as >> > much as anybody, but it can't come at the price of reduction of >> > reliability. >> >> I am sorry, but I think this is vastly overstating the scope of the >> pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the >> amount of problems that has caused in the past is surprisingly low. If >> such a frequently used command doesn't cause problems, why are you >> assuming other commands to be that problematic? And I think it's hard to >> argue that the proposed changes are more likely to cause problems. >> >> Let's try to go at this a bit more methodically. The commands that - >> afaics - change their locklevel due to latest patch (v21) are: > [snip] > > Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c > deparse worker functions. As a cross-check to your study, I looked briefly at > the use of those functions in pg_dump and how this patch might affect them: > > -- pg_get_constraintdef() > > pg_dump reads the constraint OID with its transaction snapshot, so we will > never see a too-new constraint. Dropping a constraint still requires > AccessExclusiveLock. Agreed > Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its > transaction snapshot and uses that to decide whether to dump the CHECK > constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD > CONSTRAINT following the data load. However, pg_get_constraintdef() reads the > latest convalidated to decide whether to emit NOT VALID. Consequently, one > can get a dump in which the dumped table data did not yet conform to the > constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails. > (Suppose you deleted the last invalid rows just before executing the VALIDATE > CONSTRAINT. I tested this by committing the DELETE + VALIDATE CONSTRAINT with > pg_dump stopped at getTableAttrs().) > > One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT > problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did > not do so. It's, conveniently, the last part of the definition. I would tend > to choose this. We could also just decide this isn't bad enough to worry > about. The consequence is that an ALTER TABLE ADD CONSTRAINT fails. Assuming > no --single-transaction for the original restoral, you just add NOT VALID to > the command and rerun. Like most of the potential new pg_dump problems, this > can already happen today if the relevant database changes happen between > taking the pg_dump transaction snapshot and locking the tables. Too hacky for me, but some good thinking. My proposed solution is below. > -- pg_get_expr() for default expressions > > pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will > never see a too-new default. This does allow us to read a dropped default. > That's not a problem directly. 
However, suppose the default references a > function dropped at the same time as the default. pg_dump could fail in > pg_get_expr(). > > -- pg_get_indexdef() > > As you explained elsewhere, new indexes are no problem. DROP INDEX still > requires AccessExclusiveLock. Overall, no problems here. > > -- pg_get_ruledef() > > The patch changes lock requirements for enabling and disabling of rules, but > that is all separate from the rule expression handling. No problems. > > -- pg_get_triggerdef() > > The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock. > The implications for pg_dump are similar to those for pg_get_expr(). These are certainly concerning. What surprises me the most is that pg_dump has been so happy to randomly mix SQL using the transaction snapshot with sys cache access code using a different snapshot. If that was intentional, there is no documentation in the code or in the docs to explain it. > -- pg_get_viewdef() > > Untamed: pg_dump does not lock views at all. OMG, it's really a wonder pg_dump works at all. > One thing not to forget is that you can always get the old mutual exclusion > back by issuing LOCK TABLE just before a DDL operation. If some unlucky user > regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a > workaround. There's no comparable way for someone who would not experience > that problem to weaken the now-hardcoded AccessExclusiveLock. Many > consequences of insufficient locking are too severe for that workaround to > bring comfort, but the pg_dump failure scenarios around pg_get_expr() and > pg_get_triggerdef() seem mild enough. Restore-time failures are more serious, > hence my recommendation to put a hack in pg_dump around VALIDATE CONSTRAINT. The right thing to do here is to not push to the extremes. If we mess too much with the ruleutil stuff it will just be buggy. A more considered analysis in a later release is required for a full and complete approach. As I indicated earlier, an 80/20 solution is better for this release. Slimming down the patch, I've removed changes to lock levels for almost all variants. The only lock levels now reduced are those for VALIDATE, plus setting of relation and attribute level options. VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a slightly modified variant of pg_get_constraintdef that uses the transaction snapshot. I propose this rather than Noah's solution solely because this will allow any user to request the MVCC data, rather than implement a hack that only works for pg_dump. I will post the patch later today. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
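Usage would presumably look like this; pg_get_constraintdef_mvcc() is the name proposed in this mail, not a shipped function:

  SELECT conname, pg_get_constraintdef_mvcc(oid)
    FROM pg_constraint
   WHERE conrelid = 'distributors'::regclass;  -- table name assumed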
On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: > The right thing to do here is to not push to the extremes. If we mess > too much with the ruleutil stuff it will just be buggy. A more > considered analysis in a later release is required for a full and > complete approach. As I indicated earlier, an 80/20 solution is better > for this release. > > Slimming down the patch, I've removed changes to lock levels for > almost all variants. The only lock levels now reduced are those for > VALIDATE, plus setting of relation and attribute level options. > > VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a > slightly modified variant of pg_get_constraintdef that uses the > transaction snapshot. I propose this rather than Noah's solution > solely because this will allow any user to request the MVCC data, > rather than implement a hack that only works for pg_dump. I will post > the patch later today. Implemented in attached patch, v22 The following commands (only) are allowed with ShareUpdateExclusiveLock, patch includes doc changes. ALTER TABLE ... VALIDATE CONSTRAINT constraint_name covered by isolation test, plus verified manually with pg_dump ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ALTER TABLE ... ALTER COLUMN ... SET (...) ALTER TABLE ... ALTER COLUMN ... RESET (...) ALTER TABLE ... CLUSTER ON ... ALTER TABLE ... SET WITHOUT CLUSTER ALTER TABLE ... SET (...) covered by isolation test ALTER TABLE ... RESET (...) ALTER INDEX ... SET (...) ALTER INDEX ... RESET (...) All other ALTER commands take AccessExclusiveLock I commend this patch to you for final review; I would like to commit this in a few days. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/06/2014 10:47 AM, Simon Riggs wrote: > On 5 March 2014 09:28, Simon Riggs <simon@2ndquadrant.com> wrote: > >> So that returns us to solving the catalog consistency problem in >> pg_dump and similar applications. > No answer, guys, and time is ticking away here. Sorry, I've accumulated several days of backlog (and it'll only get worse in the coming week) so I haven't had time to look at all this in the detail I wanted to. > I'd like to get a communal solution to this rather than just punting > the whole patch. > > If we have to strip it down to the bare essentials, so be it. For me, > the biggest need here is to make VALIDATE CONSTRAINT take only a > ShareUpdateExclusiveLock while it runs. Almost everything else needs a > full AccessExclusiveLock anyway, or doesn't run for very long so isn't > a critical problem. (Perhaps we can then wrap ADD CONSTRAINT ... NOT > VALID and VALIDATE into a single command using the CONCURRENTLY > keyword so it runs two transactions to complete the task). > > Validating FKs on big tables can take hours and it really isn't > acceptable for us to lock out access while we do that. FKs are > *supposed* to be a major reason people use RDBMS, so keeping them in a > state where they are effectively unusable is a major debilitating > point against adoption of PostgreSQL. > > If there are issues with pg_dump we can just document them. > > Guide me with your thoughts. I think committing VALIDATE CONSTRAINT is essential for 9.4; the rest can be delayed until 9.5. None of the discussion in this thread has been about that subcommand, and I don't personally see a problem with it. I don't care much about ADD CONSTRAINT CONCURRENTLY. If it's there, fine. If it's not, that's fine, too. My personal use case for this, and I even started writing a patch before I realized I was re-writing this one, is adding a CHECK constraint NOT VALID so that future commits respect it, then UPDATEing the existing rows to "fix" them, and then VALIDATE CONSTRAINTing it. There is zero need for an AccessExclusiveLock on that last step. My original idea was to concurrently create a partial index on the "bad" rows, and then validate the constraint using that index. The AEL would only be held long enough to check if the index is empty or not. Obviously, reducing the lock level is a cleaner solution, so I'd like to see that happen. -- Vik
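Vik's workflow as SQL, a sketch with assumed names:

  ALTER TABLE t ADD CONSTRAINT c_pos CHECK (c > 0) NOT VALID;  -- new writes obey it
  UPDATE t SET c = 1 WHERE c <= 0;                             -- "fix" existing rows
  ALTER TABLE t VALIDATE CONSTRAINT c_pos;  -- with the patch, the scan runs
                                            -- without an AccessExclusiveLock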
On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: > On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: > >> The right thing to do here is to not push to the extremes. If we mess >> too much with the ruleutil stuff it will just be buggy. A more >> considered analysis in a later release is required for a full and >> complete approach. As I indicated earlier, an 80/20 solution is better >> for this release. >> >> Slimming down the patch, I've removed changes to lock levels for >> almost all variants. The only lock levels now reduced are those for >> VALIDATE, plus setting of relation and attribute level options. >> >> VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a >> slightly modified variant of pg_get_constraintdef that uses the >> transaction snapshot. I propose this rather than Noah's solution >> solely because this will allow any user to request the MVCC data, >> rather than implement a hack that only works for pg_dump. I will post >> the patch later today. > > Implemented in attached patch, v22 > > The following commands (only) are allowed with > ShareUpdateExclusiveLock, patch includes doc changes. > > ALTER TABLE ... VALIDATE CONSTRAINT constraint_name > covered by isolation test, plus verified manually with pg_dump > > ALTER TABLE ... ALTER COLUMN ... SET STATISTICS > ALTER TABLE ... ALTER COLUMN ... SET (...) > ALTER TABLE ... ALTER COLUMN ... RESET (...) > > ALTER TABLE ... CLUSTER ON ... > ALTER TABLE ... SET WITHOUT CLUSTER > ALTER TABLE ... SET (...) > covered by isolation test > > ALTER TABLE ... RESET (...) > > ALTER INDEX ... SET (...) > ALTER INDEX ... RESET (...) > > All other ALTER commands take AccessExclusiveLock > > I commend this patch to you for final review; I would like to commit > this in a few days. I'm planning to commit this today at 1500UTC barring objections or negative reviews. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 18, 2014 at 10:39:03AM +0000, Simon Riggs wrote: > On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: > > Implemented in attached patch, v22 > > I commend this patch to you for final review; I would like to commit > > this in a few days. > > I'm planning to commit this today at 1500UTC barring objections or > negative reviews. Not an objection, but FYI, I'm currently in the midst of reviewing this. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 18 March 2014 21:21, Noah Misch <noah@leadboat.com> wrote: > On Tue, Mar 18, 2014 at 10:39:03AM +0000, Simon Riggs wrote: >> On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Implemented in attached patch, v22 > >> > I commend this patch to you for final review; I would like to commit >> > this in a few days. >> >> I'm planning to commit this today at 1500UTC barring objections or >> negative reviews. > > Not an objection, but FYI, I'm currently in the midst of reviewing this. Thanks, I'll hold off until I hear from you. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/18/2014 11:39 AM, Simon Riggs wrote: > On 8 March 2014 11:14, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: >> >>> The right thing to do here is to not push to the extremes. If we mess >>> too much with the ruleutil stuff it will just be buggy. A more >>> considered analysis in a later release is required for a full and >>> complete approach. As I indicated earlier, an 80/20 solution is better >>> for this release. >>> >>> Slimming down the patch, I've removed changes to lock levels for >>> almost all variants. The only lock levels now reduced are those for >>> VALIDATE, plus setting of relation and attribute level options. >>> >>> VALIDATE is implemented by calling pg_get_constraintdef_mvcc(), a >>> slightly modified variant of pg_get_constraintdef that uses the >>> transaction snapshot. I propose this rather than Noah's solution >>> solely because this will allow any user to request the MVCC data, >>> rather than implement a hack that only works for pg_dump. I will post >>> the patch later today. >> Implemented in attached patch, v22 >> >> The following commands (only) are allowed with >> ShareUpdateExclusiveLock, patch includes doc changes. >> >> ALTER TABLE ... VALIDATE CONSTRAINT constraint_name >> covered by isolation test, plus verified manually with pg_dump >> >> ALTER TABLE ... ALTER COLUMN ... SET STATISTICS >> ALTER TABLE ... ALTER COLUMN ... SET (...) >> ALTER TABLE ... ALTER COLUMN ... RESET (...) >> >> ALTER TABLE ... CLUSTER ON ... >> ALTER TABLE ... SET WITHOUT CLUSTER >> ALTER TABLE ... SET (...) >> covered by isolation test >> >> ALTER TABLE ... RESET (...) >> >> ALTER INDEX ... SET (...) >> ALTER INDEX ... RESET (...) >> >> All other ALTER commands take AccessExclusiveLock >> >> I commend this patch to you for final review; I would like to commit >> this in a few days. > I'm planning to commit this today at 1500UTC barring objections or > negative reviews. At my current level of competence, this patch looks good to me. I'm looking forward to reading Noah's review to see what I may have missed. The attached patch fixes two typos in the code comments. -- Vik
On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote: > On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: > > The right thing to do here is to not push to the extremes. If we mess > > too much with the ruleutil stuff it will just be buggy. A more > > considered analysis in a later release is required for a full and > > complete approach. As I indicated earlier, an 80/20 solution is better > > for this release. > > > > Slimming down the patch, I've removed changes to lock levels for > > almost all variants. The only lock levels now reduced are those for > > VALIDATE, plus setting of relation and attribute level options. Good call. > The following commands (only) are allowed with > ShareUpdateExclusiveLock, patch includes doc changes. > > ALTER TABLE ... VALIDATE CONSTRAINT constraint_name > covered by isolation test, plus verified manually with pg_dump I found a pre-existing bug aggravated by this, which I will shortly report on a new thread. > ALTER TABLE ... ALTER COLUMN ... SET STATISTICS > ALTER TABLE ... ALTER COLUMN ... SET (...) > ALTER TABLE ... ALTER COLUMN ... RESET (...) > > ALTER TABLE ... CLUSTER ON ... > ALTER TABLE ... SET WITHOUT CLUSTER A comment at check_index_is_clusterable() still mentions "exclusive lock". > ALTER TABLE ... SET (...) > covered by isolation test > > ALTER TABLE ... RESET (...) > > ALTER INDEX ... SET (...) > ALTER INDEX ... RESET (...) See discussion below. > All other ALTER commands take AccessExclusiveLock > --- a/doc/src/sgml/mvcc.sgml > +++ b/doc/src/sgml/mvcc.sgml > @@ -865,7 +865,8 @@ ERROR: could not serialize access due to read/write dependencies among transact > <para> > Acquired by <command>VACUUM</command> (without <option>FULL</option>), > <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and > - some forms of <command>ALTER TABLE</command>. > + <command>ALTER TABLE VALIDATE</command> and other > + <command>ALTER TABLE</command> variants that set options. ALTER TABLE's documentation covers the details, so the old text sufficed for here. I find "variants that set options" too vague, considering the nuances of the actual list of affected forms. > </para> > </listitem> > </varlistentry> > @@ -951,7 +952,7 @@ ERROR: could not serialize access due to read/write dependencies among transact > </para> > > <para> > - Acquired by the <command>ALTER TABLE</>, <command>DROP TABLE</>, > + Acquired by the <command>ALTER TABLE</> for rewriting, <command>DROP TABLE</>, > <command>TRUNCATE</command>, <command>REINDEX</command>, > <command>CLUSTER</command>, and <command>VACUUM FULL</command> > commands. Forms that rewrite the table are just one class of examples. I would write "Acquired by DROP TABLE, ..., VACUUM FULL, and some forms of ALTER TABLE." > --- a/doc/src/sgml/ref/alter_table.sgml > +++ b/doc/src/sgml/ref/alter_table.sgml > @@ -420,6 +439,9 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> > index specification from the table. This affects > future cluster operations that don't specify an index. > </para> > + <para> > + Changing cluster options uses a <literal>SHARE UPDATE EXCLUSIVE</literal> lock. > + </para> > </listitem> > </varlistentry> > > @@ -467,6 +489,10 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> > FULL</>, <xref linkend="SQL-CLUSTER"> or one of the forms > of <command>ALTER TABLE</> that forces a table rewrite. > </para> > + <para> > + Changing storage parameters requires only a > + <literal>SHARE UPDATE EXCLUSIVE</literal> lock. 
> + </para> Some places say "requires only", while others say "uses". Please adopt one of those consistently. I somewhat prefer the latter. > @@ -1075,6 +1105,14 @@ ALTER TABLE distributors ADD CONSTRAINT distfk FOREIGN KEY (address) REFERENCES > </para> > > <para> > + To add a foreign key constraint to a table minimising impact on other work: Our documentation has used the "minimize" spelling exclusively. > --- a/src/backend/catalog/toasting.c > +++ b/src/backend/catalog/toasting.c > @@ -52,7 +53,7 @@ static bool needs_toast_table(Relation rel); > * to end with CommandCounterIncrement if it makes any changes. > */ > void > -AlterTableCreateToastTable(Oid relOid, Datum reloptions) > +AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode) > { > Relation rel; > > @@ -63,10 +64,10 @@ AlterTableCreateToastTable(Oid relOid, Datum reloptions) > * concurrent readers of the pg_class tuple won't have visibility issues, > * so let's be safe. > */ The comment ending right here is falsified by the change. > - rel = heap_open(relOid, AccessExclusiveLock); > + rel = heap_open(relOid, lockmode); We now request whatever lock our caller has already taken. If that is AccessExclusiveLock, create_toast_table() could actually add a toast table. Otherwise, it will either deduce that no change is required or raise an error. > @@ -161,6 +164,14 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio > return false; > > /* > + * We need to create a toast table. We shouldn't > + * have got this far if we're in ALTER TABLE unless > + * we are holding AccessExclusiveLock. > + */ > + if (lockmode < AccessExclusiveLock) > + elog(ERROR, "AccessExclusiveLock required to add toast tables."); Our model of lock levels has not regarded them as being ordered, so I recommend "lockmode != AccessExclusiveLock". (I don't object to using incidental order in AlterTableGetLockLevel(), where all the relevant levels originate in the same function.) > --- a/src/backend/commands/tablecmds.c > +++ b/src/backend/commands/tablecmds.c > @@ -2687,8 +2687,7 @@ AlterTableLookupRelation(AlterTableStmt *stmt, LOCKMODE lockmode) > * The caller must lock the relation, with an appropriate lock level > * for the subcommands requested. Any subcommand that needs to rewrite > * tuples in the table forces the whole command to be executed with > - * AccessExclusiveLock (actually, that is currently required always, but > - * we hope to relax it at some point). We pass the lock level down > + * AccessExclusiveLock. We pass the lock level down The set of commands needing AccessExclusiveLock is much broader than those rewriting the table. Consider replacing the quoted paragraph with something like: * The caller has locked the relation with "lockmode", which must be at least* as strong as AlterTableGetLockLevel(stmt->cmds). We pass the lock level* down so ... > @@ -2750,35 +2772,29 @@ AlterTableInternal(Oid relid, List *cmds, bool recurse) > * ALTER TABLE RENAME which is treated as a different statement type T_RenameStmt > * and does not travel through this section of code and cannot be combined with > * any of the subcommands given here. > + * > + * Note that Hot Standby only knows about AccessExclusiveLocks on the master > + * so any changes that might affect SELECTs running on standbys need to use > + * AccessExclusiveLocks even if you think a lesser lock would do, unless you > + * have a solution for that also. 
Out of curiosity, do SELECTs on hot standbys impose known challenges in this area not shared with local SELECTs? > + * > + * Also note that pg_dump uses only an AccessShareLock, meaning that anything > + * that takes a lock less than AccessExclusiveLock can change object definitions > + * while pg_dump is running. Be careful to check that the appropriate data is > + * derived by pg_dump using an MVCC snapshot, rather than syscache lookups, > + * otherwise we might end up with an inconsistent dump that can't restore. > + * > + * Be careful to ensure this function is called for Tables and Indexes only. > + * It is not currently safe to be called for Views because security_barrier > + * is listed as an option and so would be allowed to be set at a level lower > + * than AccessExclusiveLock, which would not be correct. This statement is accepted and takes only ShareUpdateExclusiveLock: alter table information_schema.triggers set (security_barrier = true); I suggest adding a LOCKMODE field to relopt_gen and adding a reloptions_locklevel() function to determine the required level from an options list. That puts control of the lock level near the code that understands the implications for each option. You can then revert the addition of AlterViewInternal() and some of the utility.c changes. If I had to pick a reloption most likely to benefit from AccessExclusiveLock, it would be user_catalog_table rather than security_barrier. I wonder if output plugins would want a clear moment in the WAL stream -- the commit of the transaction that sets the reloption -- after which they can assume that all future records pertaining to the table in question have the needed bits. If so, that implies AccessExclusiveLock for this reloption, because heap_page_prune_opt() issues affected WAL records under AccessShareLock. AT_ReplaceRelOptions could require the highest lock level required by any option. Its usage seems to be limited to CREATE OR REPLACE VIEW, so I would give it AccessExclusiveLock and be done. > - * 2. Relcache needs to be internally consistent, so unless we lock the > - * definition during reads we have no way to guarantee that. I looked for hazards like this, but I found none in the ALTER forms covered by this patch. None of them modify multiple catalog rows affecting the same relcache entry. However, thinking about that did lead me to ponder another class of hazards. When backends can use one or more relations concurrently with a DDL operation affecting those relations, those backends can find themselves running with a subset of the catalog changes made within a particular DDL operation. Consider VALIDATE CONSTRAINT against an inherited constraint of an inheritance parent. It validates child table constraints, modifying one catalog row per table. At COMMIT time, we queue sinval messages for all affected tables. We add to the queue in atomic groups of WRITE_QUANTUM (64) messages. Between two such groups joining the queue, another backend may process the first group of messages. If the original DDL used AccessExclusiveLock, this is always harmless. The DDL-issuing backend still holds its lock, which means the inval-accepting backend must not have the relation open. If the inval-accepting backend later opens the affected relation, it will first acquire some lock and process the rest of the invalidations from the DDL operation. When doing DDL under a weaker lock, the inval-accepting backend might apply half the invalidations and immediately use them in the context of an open relation.
For VALIDATE CONSTRAINT, this means a backend might briefly recognize only a subset of the inheritance tree becoming valid. (I did not actually build a test case to confirm this.) Considering that constraint exclusion is the sole consumer of convalidated/ccvalid that can run in parallel with VALIDATE CONSTRAINT, I think this is harmless. I did not find problems of this nature in any ALTER TABLE forms affected by the patch. Let's just keep it in mind during future lock level changes. > --- a/src/backend/utils/adt/ruleutils.c > +++ b/src/backend/utils/adt/ruleutils.c > static char * > pg_get_constraintdef_worker(Oid constraintId, bool fullCommand, > - int prettyFlags) > + int prettyFlags, bool useSyscache) > { > HeapTuple tup; > Form_pg_constraint conForm; > StringInfoData buf; > + SysScanDesc scandesc; > + ScanKeyData scankey[1]; > + Relation relation; gcc 4.2.4 gives me these warnings: ruleutils.c: In function `pg_get_constraintdef_worker': ruleutils.c:1318: warning: `relation' may be used uninitialized in this function ruleutils.c:1316: warning: `scandesc' may be used uninitialized in this function > + > + if (useSyscache) > + tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintId)); > + else > + { > + Snapshot snapshot = RegisterSnapshot(GetTransactionSnapshot()); > + PushActiveSnapshot(snapshot); There's no need to push the snapshot; registration is enough. > + > + relation = heap_open(ConstraintRelationId, AccessShareLock); > + > + ScanKeyInit(&scankey[0], > + ObjectIdAttributeNumber, > + BTEqualStrategyNumber, F_OIDEQ, > + ObjectIdGetDatum(constraintId)); > + > + scandesc = systable_beginscan(relation, > + ConstraintOidIndexId, > + true, > + snapshot, > + 1, > + scankey); > + tup = systable_getnext(scandesc); > + > + PopActiveSnapshot(); > + UnregisterSnapshot(snapshot); > + } Later code uses SysCacheGetAttr() regardless of how we acquired the tuple. That works, but it deserves a brief comment mention. pg_get_constraintdef_mvcc() still does syscache lookups by way of decompile_column_index_array(), get_constraint_index(), and deparse_expression_pretty(). It uses MVCC for things that matter for pg_dump vs. reduced lock levels, but not comprehensively. I recommend not adding a new function and instead changing pg_get_constraintdef() to use the transaction snapshot unconditionally. It's hard to imagine an application author making a reasoned choice between the two. Our in-tree callers, psql and pg_dump, have no cause to prefer the less-MVCC behavior at any time. > --- a/src/backend/utils/cache/relcache.c > +++ b/src/backend/utils/cache/relcache.c > @@ -161,6 +161,14 @@ static bool eoxact_list_overflowed = false; > eoxact_list_overflowed = true; \ > } while (0) > > +/* > + * EOXactTupleDescArray stores *TupleDescs that (might) need AtEOXact Stray asterisk. 
> @@ -2312,6 +2334,37 @@ RelationCloseSmgrByOid(Oid relationId) > RelationCloseSmgr(relation); > } > > +void > +RememberToFreeTupleDescAtEOX(TupleDesc td) > +{ > + if (EOXactTupleDescArray == NULL) > + { > + MemoryContext oldcxt; > + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > + > + EOXactTupleDescArray = (TupleDesc *) palloc(16 * sizeof(TupleDesc)); > + EOXactTupleDescArrayLen = 16; > + NextEOXactTupleDescNum = 0; > + MemoryContextSwitchTo(oldcxt); > + } > + else if (NextEOXactTupleDescNum >= EOXactTupleDescArrayLen) > + { > + MemoryContext oldcxt; > + int32 newlen = EOXactTupleDescArrayLen * 2; > + > + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > + > + Assert(EOXactTupleDescArrayLen > 0); > + > + EOXactTupleDescArray = (TupleDesc *) repalloc(EOXactTupleDescArray, > + newlen * sizeof(TupleDesc)); > + EOXactTupleDescArrayLen = newlen; > + MemoryContextSwitchTo(oldcxt); No need to switch contexts for repalloc(). > --- /dev/null > +++ b/src/test/isolation/specs/alter-table-1.spec > @@ -0,0 +1,34 @@ > +# ALTER TABLE - Add and Validate constraint with concurrent writes > +# > +# ADD FOREIGN KEY allows a minimum of ShareRowExclusiveLock, > +# so we mix writes with it to see what works or waits. > +# VALIDATE allows a minimum of ShareUpdateExclusiveLock > +# so we mix reads with it to see what works or waits This comment is obsolete regarding the ADD FOREIGN KEY lock type. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 21 March 2014 03:45, Noah Misch <noah@leadboat.com> wrote: > On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote: >> On 7 March 2014 09:04, Simon Riggs <simon@2ndquadrant.com> wrote: >> > The right thing to do here is to not push to the extremes. If we mess >> > too much with the ruleutil stuff it will just be buggy. A more >> > considered analysis in a later release is required for a full and >> > complete approach. As I indicated earlier, an 80/20 solution is better >> > for this release. >> > >> > Slimming down the patch, I've removed changes to lock levels for >> > almost all variants. The only lock levels now reduced are those for >> > VALIDATE, plus setting of relation and attribute level options. > > Good call. Thanks for the review. I'll respond to each point in a later email, but it looks like nothing major, apart from the point raised on a separate thread. >> + * Be careful to ensure this function is called for Tables and Indexes only. >> + * It is not currently safe to be called for Views because security_barrier >> + * is listed as an option and so would be allowed to be set at a level lower >> + * than AccessExclusiveLock, which would not be correct. > > This statement is accepted and takes only ShareUpdateExclusiveLock: > > alter table information_schema.triggers set (security_barrier = true); I find it hard to justify why we accept such a statement. Surely it's a bug when the named table turns out to be a view? Presumably ALTER SEQUENCE and ALTER <other stuff> have checks for the correct object type? OMG. > I suggest adding a LOCKMODE field to relopt_gen and adding a > reloptions_locklevel() function to determine the required level from an > options list. That puts control of the lock level near the code that > understands the implications for each option. You can then revert the > addition of AlterViewInternal() and some of the utility.c changes. Sure, that's how we code it, but I'm not sure we should introduce that feature. The above weirdness is not itself justification. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
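One way to see the lock such a statement actually takes, on a server with the patch applied:

  BEGIN;
  ALTER TABLE information_schema.triggers SET (security_barrier = true);
  SELECT mode FROM pg_locks
   WHERE relation = 'information_schema.triggers'::regclass;
  -- reports ShareUpdateExclusiveLock, matching Noah's observation
  ROLLBACK;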
On Fri, Mar 21, 2014 at 04:11:12PM +0000, Simon Riggs wrote: > On 21 March 2014 03:45, Noah Misch <noah@leadboat.com> wrote: > > On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote: > Thanks for the review. I'll respond to each point in a later email, but > it looks like nothing major, apart from the point raised on a separate > thread. Yep. > >> + * Be careful to ensure this function is called for Tables and Indexes only. > >> + * It is not currently safe to be called for Views because security_barrier > >> + * is listed as an option and so would be allowed to be set at a level lower > >> + * than AccessExclusiveLock, which would not be correct. > > > > This statement is accepted and takes only ShareUpdateExclusiveLock: > > > > alter table information_schema.triggers set (security_barrier = true); > > I find it hard to justify why we accept such a statement. Surely it's a > bug when the named table turns out to be a view? Presumably ALTER > SEQUENCE and ALTER <other stuff> have checks for the correct object > type? OMG. We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move toward stricter checks, ALTER TABLE refused to operate on foreign tables in 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER TABLE is nearly a union of the relation ALTER possibilities. That choice is well-entrenched. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 21 March 2014 17:49, Noah Misch <noah@leadboat.com> wrote: >> >> + * Be careful to ensure this function is called for Tables and Indexes only. >> >> + * It is not currently safe to be called for Views because security_barrier >> >> + * is listed as an option and so would be allowed to be set at a level lower >> >> + * than AccessExclusiveLock, which would not be correct. >> > >> > This statement is accepted and takes only ShareUpdateExclusiveLock: >> > >> > alter table information_schema.triggers set (security_barrier = true); >> >> I find it hard to justify why we accept such a statement. Surely it's a >> bug when the named table turns out to be a view? Presumably ALTER >> SEQUENCE and ALTER <other stuff> have checks for the correct object >> type? OMG. > > We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move > toward stricter checks, ALTER TABLE refused to operate on foreign tables in > 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER > TABLE is nearly a union of the relation ALTER possibilities. That choice is > well-entrenched. By "well entrenched", I think you mean undocumented, untested, unintentional? Do we think anyone *relies* on being able to say the word TABLE when in fact they mean VIEW or SEQUENCE? How is that artefact anything but a bug? i.e. is anyone going to stop me fixing it? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 21 March 2014 03:45, Noah Misch <noah@leadboat.com> wrote: >> + * Note that Hot Standby only knows about AccessExclusiveLocks on the master >> + * so any changes that might affect SELECTs running on standbys need to use >> + * AccessExclusiveLocks even if you think a lesser lock would do, unless you >> + * have a solution for that also. > > Out of curiosity, do SELECTs on hot standbys impose known challenges in this > area not shared with local SELECTs? No, but locks less than AccessExclusiveLock won't happen at all, so it's a difference that, if improperly handled, could cause a bug. Plus I wanted to indicate I'd thought about it. >> - * 2. Relcache needs to be internally consistent, so unless we lock the >> - * definition during reads we have no way to guarantee that. > > I looked for hazards like this, but I found none in the ALTER forms covered by > this patch. None of them modify multiple catalog rows affecting the same > relcache entry. However, thinking about that did lead me to ponder another > class of hazards. When backends can use one or more relations concurrently > with a DDL operation affecting those relations, those backends can find > themselves running with a subset of the catalog changes made within a > particular DDL operation. Consider VALIDATE CONSTRAINT against an inherited > constraint of an inheritance parent. It validates child table constraints, > modifying one catalog row per table. At COMMIT time, we queue sinval messages > for all affected tables. We add to the queue in atomic groups of > WRITE_QUANTUM (64) messages. Between two such groups joining the queue, > another backend may process the first group of messages. If the original DDL > used AccessExclusiveLock, this is always harmless. The DDL-issuing backend > still holds its lock, which means the inval-accepting backend must not have > the relation open. If the inval-accepting backend later opens the affected > relation, it will first acquire some lock and process the rest of the > invalidations from the DDL operation. When doing DDL under a weaker lock, the > inval-accepting backend might apply half the invalidations and immediately use > them in the context of an open relation. For VALIDATE CONSTRAINT, this means > a backend might briefly recognize only a subset of the inheritance tree > becoming valid. (I did not actually build a test case to confirm this.) > > Considering that constraint exclusion is the sole consumer of > convalidated/ccvalid that can run in parallel with VALIDATE CONSTRAINT, I > think this is harmless. I did not find problems of this nature in any ALTER > TABLE forms affected by the patch. Let's just keep it in mind during future > lock level changes. I'll document that. > pg_get_constraintdef_mvcc() still does syscache lookups by way of > decompile_column_index_array(), get_constraint_index(), and > deparse_expression_pretty(). It uses MVCC for things that matter for pg_dump > vs. reduced lock levels, but not comprehensively. I recommend not adding a > new function and instead changing pg_get_constraintdef() to use the > transaction snapshot unconditionally. OK -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 21, 2014 at 06:53:27PM +0000, Simon Riggs wrote: > On 21 March 2014 17:49, Noah Misch <noah@leadboat.com> wrote: > > >> > alter table information_schema.triggers set (security_barrier = true); > >> > >> I find it hard to justify why we accept such a statement. Surely it's a > >> bug when the named table turns out to be a view? Presumably ALTER > >> SEQUENCE and ALTER <other stuff> have checks for the correct object > >> type? OMG. > > > > We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move > > toward stricter checks, ALTER TABLE refused to operate on foreign tables in > > 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER > > TABLE is nearly a union of the relation ALTER possibilities. That choice is > > well-entrenched. > > By "well entrenched", I think you mean undocumented, untested, unintentional? It's deliberate; a -hackers discussion revisits it perhaps once a year. The ALTER VIEW documentation says: For historical reasons, ALTER TABLE can be used with views too; but the only variants of ALTER TABLE that are allowed with views are equivalent to the ones shown above. ALTER INDEX and ALTER SEQUENCE say something similar. > Do we think anyone *relies* on being able to say the word TABLE when > in fact they mean VIEW or SEQUENCE? pg_dump emits statements that exercise it: psql -c 'create view v as select 1 as c; alter view v alter c set default 0;' pg_dump --table v | grep ALTER > How is that artefact anything but a bug? i.e. is anyone going to stop > me fixing it? It's not the behavior I would choose for a new product, but I can't see benefits sufficient to overturn previous decisions to keep it. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
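The grep in that recipe turns up pg_dump itself issuing ALTER TABLE against the view, something like:

  ALTER TABLE ONLY v ALTER COLUMN c SET DEFAULT 0;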
On 21 March 2014 20:58, Noah Misch <noah@leadboat.com> wrote: > On Fri, Mar 21, 2014 at 06:53:27PM +0000, Simon Riggs wrote: >> On 21 March 2014 17:49, Noah Misch <noah@leadboat.com> wrote: >> >> >> > alter table information_schema.triggers set (security_barrier = true); >> >> >> >> I find it hard to justify why we accept such a statement. Surely its a >> >> bug when the named table turns out to be a view? Presumably ALTER >> >> SEQUENCE and ALTER <other stuff> has checks for the correct object >> >> type? OMG. >> > >> > We've framed ALTER TABLE's relkind leniency as a historic artifact. As a move >> > toward stricter checks, ALTER TABLE refused to operate on foreign tables in >> > 9.1 and 9.2. 9.3 reversed that course, though. For better or worse, ALTER >> > TABLE is nearly a union of the relation ALTER possibilities. That choice is >> > well-entrenched. >> >> By "well entrenched", I think you mean undocumented, untested, unintentional? > > It's deliberate; a -hackers discussion revisits it perhaps once a year. The > ALTER VIEW documentation says: > > For historical reasons, ALTER TABLE can be used with views too; but the only > variants of ALTER TABLE that are allowed with views are equivalent to the > ones shown above. > > ALTER INDEX and ALTER SEQUENCE say something similar. > >> Do we think anyone *relies* on being able to say the word TABLE when >> in fact they mean VIEW or SEQUENCE? > > pg_dump emits statements that exercise it: > > psql -c 'create view v as select 1 as c; alter view v alter c set default 0;' > pg_dump --table v | grep ALTER > >> How is that artefact anything but a bug? i.e. is anyone going to stop >> me fixing it? > > It's not the behavior I would choose for a new product, but I can't see > benefits sufficient to overturn previous decisions to keep it. Speechless -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 21 March 2014 20:58, Noah Misch <noah@leadboat.com> wrote:
>> It's not the behavior I would choose for a new product, but I can't see
>> benefits sufficient to overturn previous decisions to keep it.

> Speechless

The key argument for not "fixing" this is that it would break existing
pg_dump files.  That's a pretty hard argument to overcome, unfortunately,
even if you're willing to blow off the possibility that client
applications might contain similar shortcuts.  We still do our best to
read dump files from the 7.0 era (see ConvertTriggerToFK() for one example
of going above and beyond for that); and every so often we do hear of
people trying to get data out of such ancient servers.  So even if you
went and fixed pg_dump tomorrow, it'd probably be ten or fifteen years
before people would let you stop reading dumps from existing versions.

			regards, tom lane
On 21 March 2014 23:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 21 March 2014 20:58, Noah Misch <noah@leadboat.com> wrote:
>>> It's not the behavior I would choose for a new product, but I can't see
>>> benefits sufficient to overturn previous decisions to keep it.
>
>> Speechless
>
> The key argument for not "fixing" this is that it would break existing
> pg_dump files.  That's a pretty hard argument to overcome, unfortunately,
> even if you're willing to blow off the possibility that client
> applications might contain similar shortcuts.  We still do our best to
> read dump files from the 7.0 era (see ConvertTriggerToFK() for one example
> of going above and beyond for that); and every so often we do hear of
> people trying to get data out of such ancient servers.  So even if you
> went and fixed pg_dump tomorrow, it'd probably be ten or fifteen years
> before people would let you stop reading dumps from existing versions.

Noah had already convinced me, but thank you for explaining the reasoning
behind it.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21 March 2014 16:11, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> + * Be careful to ensure this function is called for Tables and Indexes only.
>>> + * It is not currently safe to be called for Views because security_barrier
>>> + * is listed as an option and so would be allowed to be set at a level lower
>>> + * than AccessExclusiveLock, which would not be correct.
>>
>> This statement is accepted and takes only ShareUpdateExclusiveLock:
>>
>>     alter table information_schema.triggers set (security_barrier = true);
>>
>> I suggest adding a LOCKMODE field to relopt_gen and adding a
>> reloptions_locklevel() function to determine the required level from an
>> options list.  That puts control of the lock level near the code that
>> understands the implications for each option.  You can then revert the
>> addition of AlterViewInternal() and some of the utility.c changes.
>
> Sure, that's how we code it, but I'm not sure we should introduce that
> feature.  The above weirdness is not itself justification.

OK, will follow this path.  It's a good idea since it encourages authors
of new "options" to consider properly the lock level that will be required.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
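For concreteness, the proposal amounts to something like the sketch below.
The reloptions_locklevel() name and the LOCKMODE field come from Noah's
mail; everything else (field placement, the matching loop, the starting
lock level) is illustrative rather than a finished patch.

    /*
     * Sketch of the proposal (illustrative, not a committed patch): every
     * reloption records the strongest lock needed to change it, and the
     * ALTER TABLE path takes the maximum over the options actually named
     * in the command.
     */
    typedef struct relopt_gen
    {
        const char *name;
        const char *desc;
        bits32      kinds;
        LOCKMODE    lockmode;   /* proposed: lock required to SET/RESET */
        int         namelen;
        relopt_type type;
    } relopt_gen;

    static LOCKMODE
    reloptions_locklevel(List *defList)
    {
        /* assume the weakest level any SET (...) variant may use */
        LOCKMODE    lockmode = ShareUpdateExclusiveLock;
        ListCell   *cell;

        foreach(cell, defList)
        {
            DefElem    *def = (DefElem *) lfirst(cell);
            int         i;

            /* relOpts is reloptions.c's table of all known options */
            for (i = 0; relOpts[i]; i++)
            {
                if (pg_strcasecmp(relOpts[i]->name, def->defname) == 0)
                    lockmode = Max(lockmode, relOpts[i]->lockmode);
            }
        }
        return lockmode;
    }

With that in place, security_barrier can simply declare AccessExclusiveLock
while fillfactor stays at ShareUpdateExclusiveLock, so the
information_schema.triggers statement above would automatically take the
stronger lock, with no special-casing of views in utility.c.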
On 2/26/14, 9:15 AM, Simon Riggs wrote:
> On 26 February 2014 13:38, Andres Freund <andres@2ndquadrant.com> wrote:
>> Hi,
>>
>> On 2014-02-26 07:32:45 +0000, Simon Riggs wrote:
>>>> * This definitely should include isolationtester tests actually
>>>>   performing concurrent ALTER TABLEs. All that's currently there is
>>>>   tests that the locklevel isn't too high, but not that it actually works.
>>>
>>> There is no concurrent behaviour here, hence no code that would be
>>> exercised by concurrent tests.
>>
>> Huh? There's most definitely new concurrent behaviour. Previously no
>> other backends could have a relation open (and locked) while it got
>> altered (which then sends out relcache invalidations). That's something
>> that should be tested.
>
> It has been. High-volume concurrent testing has been performed, per
> Tom's original discussion upthread, but that's not part of the test
> suite.
> For other tests I have no guide as to how to write a set of automated
> regression tests. Anything could cause a failure, so I'd need to write
> an infinite set of tests to prove there is no bug *somewhere*. How
> many tests are required? 0, 1, 3, 30?

ISTM that we don't want hand-written tests here, but rather generated
tests that actually hit all potential cases.  Obviously we'd never run
that as part of normal regression, but buildfarm animals certainly could.

--
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net