Thread: Assertion failure on hot standby
Hi, http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php When I did unusual operations (e.g., suspend bgwriter by gdb, pgbench -i and issue txid_current many times) on the master in order to try to reproduce the above HS error, I encountered the following assertion error. Since I compiled the standby postgres with WAL_DEBUG and ran it with wal_debug = on, all the replayed WAL records were logged. ------------------------ sby LOG: REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09 sby LOG: REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09 sby LOG: consistent recovery state reached at 0/134C0510 sby LOG: REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09 sby LOG: REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0; len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409 sby LOG: REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0; len 20: Standby - running xacts: nextXid 23256 latestCompletedXid 23255 oldestRunningXid 23256 TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1209) sby LOG: database system is ready to accept read only connections sby LOG: startup process (PID 32666) was terminated by signal 6: Aborted sby LOG: terminating any other active server processes ------------------------ Does anyone know what the cause of the problem is? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Nov 24, 2010 at 1:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > Hi, > > http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php > > When I did unusual operations (e.g., suspend bgwriter by gdb, > pgbench -i and issue txid_current many times) on the master > in order to try to reproduce the above HS error, I encountered > the following assertion error. > > Since I compiled the standby postgres with WAL_DEBUG and > ran it with wal_debug = on, all the replayed WAL records were > logged. > > ------------------------ > sby LOG: REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid > 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09 > sby LOG: REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid > 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09 > sby LOG: consistent recovery state reached at 0/134C0510 > sby LOG: REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid > 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09 > sby LOG: REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0; > len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409 > sby LOG: REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0; > len 20: Standby - running xacts: nextXid 23256 latestCompletedXid > 23255 oldestRunningXid 23256 > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: > "twophase.c", Line: 1209) > sby LOG: database system is ready to accept read only connections > sby LOG: startup process (PID 32666) was terminated by signal 6: Aborted > sby LOG: terminating any other active server processes > ------------------------ > > Does anyone know what the cause of the problem is? I was able to reproduce this problem. This happens because CHECKPOINT can write the WAL record indicating that the transaction with XID = 0 has taken the AccessExclusive lock. This WAL record causes that assertion failure in the standby. Here is the procedure to reproduce the problem: ------------------- 1. Execute "DROP TABLE" and suspend the execution before calling RemoveRelations -> LockRelationOid -> LockAcquire -> LockAcquireExtended -> LogAccessExclusiveLock by, for example, using gdb. 2. While "DROP TABLE" is being suspended, execute CHECKPOINT. This CHECKPOINT will generate the above-mentioned WAL record. ------------------- To solve the problem, ISTM that XID should be assigned before the information about AccessExclusive lock becomes visible to another process. Or CHECKPOINT (i.e., GetRunningTransactionLocks) should ignore the locks with XID = 0. Thought? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, 2010-11-25 at 16:59 +0900, Fujii Masao wrote: > To solve the problem, ISTM that XID should be assigned before the > information about AccessExclusive lock becomes visible to another > process. Or CHECKPOINT (i.e., GetRunningTransactionLocks) should > ignore the locks with XID = 0. First, thanks for pursuing this. I realise I made the mistake of assuming there was just one bug; I see that the bug Heikki was discussing is a separate issue. As to solutions, it cannot be acceptable to ignore some locks just because an xid has not been assigned. If I understand you correctly, it seems possible to generate an AccessExclusiveLock before an xid is assigned, so that its possible to log that situation before the transaction assigns an xid slightly later. So there's a narrow window where we can generate a lock WAL record with xid 0. The sensible resolution is to ensure that all AccessExclusiveLocks have an xid assigned prior to them registering their proclock. That would mean running GetCurrentTransactionId() inside LockAcquire() if (lockmode >= AccessExclusiveLock && locktag->locktag_type == LOCKTAG_RELATION && !RecoveryInProgress())(void) GetCurrentTransactionId(); Any objections to that fix? -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Nov 26, 2010 at 7:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > As to solutions, it cannot be acceptable to ignore some locks just > because an xid has not been assigned. Even if GetRunningTransactionLocks ignores such a lock, it's eventually WAL-logged by LogAccessExclusiveLock, isn't it? > If I understand you correctly, it seems possible to generate an > AccessExclusiveLock before an xid is assigned, so that its possible to > log that situation before the transaction assigns an xid slightly later. > So there's a narrow window where we can generate a lock WAL record with > xid 0. Right. > The sensible resolution is to ensure that all > AccessExclusiveLocks have an xid assigned prior to them registering > their proclock. > > That would mean running GetCurrentTransactionId() inside LockAcquire() > > if (lockmode >= AccessExclusiveLock && > locktag->locktag_type == LOCKTAG_RELATION && > !RecoveryInProgress()) > (void) GetCurrentTransactionId(); s/GetCurrentTransactionId/GetTopTransactionId? > Any objections to that fix? Or can we call LogAccessExclusiveLock before registering the lock? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Simon Riggs <simon@2ndQuadrant.com> writes: > That would mean running GetCurrentTransactionId() inside LockAcquire() > if (lockmode >= AccessExclusiveLock && > locktag->locktag_type == LOCKTAG_RELATION && > !RecoveryInProgress()) > (void) GetCurrentTransactionId(); > Any objections to that fix? Could we have a wal level test in there too please? It's pretty awful in any case... regards, tom lane
On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> That would mean running GetCurrentTransactionId() inside LockAcquire() > >> if (lockmode >= AccessExclusiveLock && >> locktag->locktag_type == LOCKTAG_RELATION && >> !RecoveryInProgress()) >> (void) GetCurrentTransactionId(); > >> Any objections to that fix? > > Could we have a wal level test in there too please? It's pretty awful > in any case... +1. Incidentally, I haven't been able to wrap my head around why we need to propagate AccessExclusiveLocks to the standby in the first place. Can someone explain? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Friday 26 November 2010 13:32:18 Robert Haas wrote: > On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Simon Riggs <simon@2ndQuadrant.com> writes: > >> That would mean running GetCurrentTransactionId() inside LockAcquire() > >> > >> if (lockmode >= AccessExclusiveLock && > >> locktag->locktag_type == LOCKTAG_RELATION && > >> !RecoveryInProgress()) > >> (void) GetCurrentTransactionId(); > >> > >> Any objections to that fix? > > > > Could we have a wal level test in there too please? It's pretty awful > > in any case... > Incidentally, I haven't been able to wrap my head around why we need > to propagate AccessExclusiveLocks to the standby in the first place. > Can someone explain? To make the standby stop applying WAL when a local transaction on the standby uses an object. E.g. dropping a table on the master need the standby top stop applying wal (or kill the local client using the table). How would you want to protect against something like that otherwise? Andres
On Fri, Nov 26, 2010 at 7:41 AM, Andres Freund <andres@anarazel.de> wrote: > On Friday 26 November 2010 13:32:18 Robert Haas wrote: >> On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > Simon Riggs <simon@2ndQuadrant.com> writes: >> >> That would mean running GetCurrentTransactionId() inside LockAcquire() >> >> >> >> if (lockmode >= AccessExclusiveLock && >> >> locktag->locktag_type == LOCKTAG_RELATION && >> >> !RecoveryInProgress()) >> >> (void) GetCurrentTransactionId(); >> >> >> >> Any objections to that fix? >> > >> > Could we have a wal level test in there too please? It's pretty awful >> > in any case... >> Incidentally, I haven't been able to wrap my head around why we need >> to propagate AccessExclusiveLocks to the standby in the first place. >> Can someone explain? > To make the standby stop applying WAL when a local transaction on the standby > uses an object. > E.g. dropping a table on the master need the standby top stop applying wal (or > kill the local client using the table). > How would you want to protect against something like that otherwise? Hmm. But it seems like that it would be enough to log any exclusive locks held at commit time, rather than logging them as they're acquired. By then, the XID will be assigned (if you need it - if you don't then you probably don't need to XLOG it anyway) and you avoid holding the lock for more than a moment on the standby. But it seems like an even better idea would be to actually XLOG the operations that are problematic specifically. Because, for example, if a user session on the master does LOCK TABLE ... IN ACCESS EXCLUSIVE MODE, AFAICS there's no reason for the standby to care. Or am I confused? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 2010-11-26 at 11:19 +0900, Fujii Masao wrote: > On Fri, Nov 26, 2010 at 7:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > As to solutions, it cannot be acceptable to ignore some locks just > > because an xid has not been assigned. > > Even if GetRunningTransactionLocks ignores such a lock, it's eventually > WAL-logged by LogAccessExclusiveLock, isn't it? If it were true always, I would much prefer your solution. Assuming that would then cause a race condition between the logging of the RunningXactsData and the lock, which wouldn't move us forwards. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On 26.11.2010 17:28, Robert Haas wrote: > On Fri, Nov 26, 2010 at 7:41 AM, Andres Freund<andres@anarazel.de> wrote: >> On Friday 26 November 2010 13:32:18 Robert Haas wrote: >>> On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> Simon Riggs<simon@2ndQuadrant.com> writes: >>>>> That would mean running GetCurrentTransactionId() inside LockAcquire() >>>>> >>>>> if (lockmode>= AccessExclusiveLock&& >>>>> locktag->locktag_type == LOCKTAG_RELATION&& >>>>> !RecoveryInProgress()) >>>>> (void) GetCurrentTransactionId(); >>>>> >>>>> Any objections to that fix? >>>> >>>> Could we have a wal level test in there too please? It's pretty awful >>>> in any case... >>> Incidentally, I haven't been able to wrap my head around why we need >>> to propagate AccessExclusiveLocks to the standby in the first place. >>> Can someone explain? >> To make the standby stop applying WAL when a local transaction on the standby >> uses an object. >> E.g. dropping a table on the master need the standby top stop applying wal (or >> kill the local client using the table). >> How would you want to protect against something like that otherwise? > > Hmm. But it seems like that it would be enough to log any exclusive > locks held at commit time, rather than logging them as they're > acquired. By then, the XID will be assigned (if you need it - if you > don't then you probably don't need to XLOG it anyway) and you avoid > holding the lock for more than a moment on the standby. > > But it seems like an even better idea would be to actually XLOG the > operations that are problematic specifically. Because, for example, > if a user session on the master does LOCK TABLE ... IN ACCESS > EXCLUSIVE MODE, AFAICS there's no reason for the standby to care. Or > am I confused? Let's approach this from a different direction: If you have operation A in the master that currently acquires an AccessExclusiveLock on a table, do you think it's safe for another transaction to peek at the table at the same time? Say, run a heap scan simultaneously. If yes, why did you take an AccessExclusiveLock in the first place. If not, then surely it's not safe to have a heap scan running against the table in the standby either. The read-only query in the standby sees the same actions as a read-only query on the master would see. You can only take AccessShareLocks on the standby, and the only locks that conflict with an AccessShareLock is the AccessExclusiveLock. (Hmm, looking at the code, we also allow RowShareLock and RowExclusiveLock in the standby. You can't actually insert/update/delete tuples or set xmax as SELECT FOR SHARE does on standby, though, so why do we allow that? ) As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants to truncate the relation. A sequential scan running against the table in the standby will get upset, if the startup process replays a truncation record on the table without warning. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2010-11-26 at 17:53 +0200, Heikki Linnakangas wrote: > Hmm, > looking at the code, we also allow RowShareLock and RowExclusiveLock > in > the standby. You can't actually insert/update/delete tuples or set > xmax > as SELECT FOR SHARE does on standby, though, so why do we allow that? It was considered sensible to allow PREPARE on DML on a standby, which required those lock levels. Archives... -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Nov 26, 2010 at 10:53 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >>>> Incidentally, I haven't been able to wrap my head around why we need >>>> to propagate AccessExclusiveLocks to the standby in the first place. >>>> Can someone explain? >>> >>> To make the standby stop applying WAL when a local transaction on the >>> standby >>> uses an object. >>> E.g. dropping a table on the master need the standby top stop applying >>> wal (or >>> kill the local client using the table). >>> How would you want to protect against something like that otherwise? >> >> Hmm. But it seems like that it would be enough to log any exclusive >> locks held at commit time, rather than logging them as they're >> acquired. By then, the XID will be assigned (if you need it - if you >> don't then you probably don't need to XLOG it anyway) and you avoid >> holding the lock for more than a moment on the standby. >> >> But it seems like an even better idea would be to actually XLOG the >> operations that are problematic specifically. Because, for example, >> if a user session on the master does LOCK TABLE ... IN ACCESS >> EXCLUSIVE MODE, AFAICS there's no reason for the standby to care. Or >> am I confused? > > Let's approach this from a different direction: > > If you have operation A in the master that currently acquires an > AccessExclusiveLock on a table, do you think it's safe for another > transaction to peek at the table at the same time? Beep, time out. The notion of "at the same time" is extremely fuzzy here. The operations on the master and slave are not simultaneous, or anything close to it. Let's go back to the case of a dropped table. Suppose that, on the master, someone begins a transaction, drops a table, and heads out to lunch. Upon returning, they commit the transaction. At what point does it became unsafe for readers on the standby to be looking at the table? Surely, the whole time the guy is out to lunch, readers on the standby are free to do whatever they want. Only at the point when we actually remove the file does it become a problem for somebody to be in the middle of using it. In fact, you could apply the same logic to the master, if you were willing to defer the removal of the actual physical file until all transactions that were using it released their locks. The reason we don't do that - aside from complexity - is that it would result in an unpredictable and indefinite delay between issuing the DROP TABLE command and OS-level storage reclamation. But in the standby situation, there is *already* an unpredictable and indefinite delay. The standby can fall behind in applying WAL, lose connectivity, have replay paused, etc. You lose nothing by waiting until the last possible moment to kick everyone out. (In fact, you gain something: the standby is more usable.) The problem here is not propagating operations from the master, but making sure that actions performed by the startup process on the standby are properly locked. In the case of dropping a relation, the problem is that the startup process only knows which relfilenode it needs to blow away, not which relation that relfilenode is associated with. If the AccessShareLock were against the relfilenode rather than the relation itself, the startup process would have no problem at all generating a conflicting lock - it would simply lock each relfilenode before dropping it, without any additional XLOG information at all. > As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants > to truncate the relation. A sequential scan running against the table in the > standby will get upset, if the startup process replays a truncation record > on the table without warning. This case is similar. xl_smgr_truncate has only a relfilenode number, not a relation OID, so there's no way for the startup process to generate a conflicting lock request itself. But if the standby backends locked the relfilenode, or if the xl_smgr_truncate WAL record included the relation OID, it would be simple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 26.11.2010 20:17, Robert Haas wrote: > On Fri, Nov 26, 2010 at 10:53 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> If you have operation A in the master that currently acquires an >> AccessExclusiveLock on a table, do you think it's safe for another >> transaction to peek at the table at the same time? > > Beep, time out. The notion of "at the same time" is extremely fuzzy > here. The operations on the master and slave are not simultaneous, or > anything close to it. In this context, "at the same time" means "at the same point in WAL". >> As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants >> to truncate the relation. A sequential scan running against the table in the >> standby will get upset, if the startup process replays a truncation record >> on the table without warning. > > This case is similar. xl_smgr_truncate has only a relfilenode number, > not a relation OID, so there's no way for the startup process to > generate a conflicting lock request itself. But if the standby > backends locked the relfilenode, or if the xl_smgr_truncate WAL record > included the relation OID, it would be simple. If you go down that path, you're going to spend a lot of time thinking through every single case that uses an AccessExclusiveLock, ensuring that the standby has enough information, and tinkering with the replay code to acquire locks at the right moment. And gain what, exactly? WAL-logging AccessExclusiveLocks is a bit ugly, but it beats having to infer that information from other records hands down. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >>> As a concrete example, VACUUM acquires an AccessExclusiveLock when it >>> wants >>> to truncate the relation. A sequential scan running against the table in >>> the >>> standby will get upset, if the startup process replays a truncation >>> record >>> on the table without warning. >> >> This case is similar. xl_smgr_truncate has only a relfilenode number, >> not a relation OID, so there's no way for the startup process to >> generate a conflicting lock request itself. But if the standby >> backends locked the relfilenode, or if the xl_smgr_truncate WAL record >> included the relation OID, it would be simple. > > If you go down that path, you're going to spend a lot of time thinking > through every single case that uses an AccessExclusiveLock, ensuring that > the standby has enough information, and tinkering with the replay code to > acquire locks at the right moment. And gain what, exactly? Well, fewer useless locks on the standby, for one thing, in all likelihood, and less WAL traffic. We probably don't have much of a choice but to put in Simon's suggested fix (with a wal_level check added) for 9.0. But I suspect we're going to end up revisiting this down the road. Beyond the usability issues, the current approach sounds brittle as heck to me. All somebody has to do is introduce a mechanism that drops or rewrites a relation file without an access exclusive lock, and this whole approach snaps right off (e.g. CLUSTER CONCURRENTLY, taking only an EXCLUSIVE lock, with some kind of garbage collection for the old files left behind; or an on-line table-compaction operation that tries to systematically move tuples to lower CTIDs by - for example - writing the new tuple, waiting until all scans in progress at the time of the write have finished, and then pruning the old tuples). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> If you go down that path, you're going to spend a lot of time thinking >> through every single case that uses an AccessExclusiveLock, ensuring that >> the standby has enough information, and tinkering with the replay code to >> acquire locks at the right moment. And gain what, exactly? > Well, fewer useless locks on the standby, for one thing, in all > likelihood, and less WAL traffic. I think it's not only useless from a performance standpoint, but probably actually dangerous, to not take AccessExclusiveLock on the standby when it's taken on the master. If you try to delay taking the lock, then locks will be taken in a different order on master and standby, which is quite likely to lead to deadlock situations. Speaking of which, is there any code in there to ensure that a deadlock in the standby is resolved by killing HS queries and not the replay process? Because deadlocks are certainly going to be possible no matter what. > All somebody has to do is introduce a > mechanism that drops or rewrites a relation file without an access > exclusive lock, and this whole approach snaps right off ... as would queries on the master, so that's not ever happening. regards, tom lane
On Fri, Nov 26, 2010 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> If you go down that path, you're going to spend a lot of time thinking >>> through every single case that uses an AccessExclusiveLock, ensuring that >>> the standby has enough information, and tinkering with the replay code to >>> acquire locks at the right moment. And gain what, exactly? > >> Well, fewer useless locks on the standby, for one thing, in all >> likelihood, and less WAL traffic. > > I think it's not only useless from a performance standpoint, but > probably actually dangerous, to not take AccessExclusiveLock on the > standby when it's taken on the master. If you try to delay taking the > lock, then locks will be taken in a different order on master and > standby, which is quite likely to lead to deadlock situations. As far as I can see, that's complete nonsense. Deadlocks between what and what? To get a deadlock on the standby, you have to have a cycle in the lock-wait graph; and since recovery is single-threaded, all locks from the master are held by the startup process. The only possible cycle is between the startup process and some HS backend; and why should we assume that the HS backend will acquire locks in the same order as the transactions running on the master rather than any other order? Perhaps that could be accomplished by very careful application design, but in normal usage it doesn't sound very likely. In fact, now that I think about it, what I'm proposing would actually substantially REDUCE the risk of deadlock on the standby, because the master would only ever need to lock a backing file long enough to drop or truncate it, whereas under the present system the startup process might need to hold many locks at once. It'd help reduce the chance of lock table overflow, too. > Speaking of which, is there any code in there to ensure that a deadlock > in the standby is resolved by killing HS queries and not the replay > process? Because deadlocks are certainly going to be possible no matter > what. I believe the place to look is ResolveRecoveryConflictWithLock(). >> All somebody has to do is introduce a >> mechanism that drops or rewrites a relation file without an access >> exclusive lock, and this whole approach snaps right off > > ... as would queries on the master, so that's not ever happening. There might be a problem with what I wrote in the part you didn't quote here, but if there is, an explanation would be a lot more helpful than a categorical statement unsupported by any argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Nov 26, 2010 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think it's not only useless from a performance standpoint, but >> probably actually dangerous, to not take AccessExclusiveLock on the >> standby when it's taken on the master. > As far as I can see, that's complete nonsense. Deadlocks between what > and what? master locks table A, then table B. Some transaction on the standby locks table A, then table B (perhaps only with AccessShareLock). This will work reliably only as long as we don't change the order in which replay acquires locks. Now, if the standby process takes its locks in the other order, then it's at risk anyway. But coding that works reliably against the master should not randomly fail on the slave. > In fact, now that I think about it, what I'm proposing would actually > substantially REDUCE the risk of deadlock on the standby, because the > master would only ever need to lock a backing file long enough to drop > or truncate it, whereas under the present system the startup process > might need to hold many locks at once. Now you're the one spouting nonsense. Consider a master transaction that does begin;lock table foo;alter table foo --- some rewriting table operation;alter table foo --- some rewriting table operation;altertable foo --- some rewriting table operation;alter table foo --- some rewriting table operation;alter tablefoo --- some rewriting table operation;alter table foo --- some rewriting table operation;alter table foo --- some rewritingtable operation;commit; On the master, no other transaction can see the intermediate states. We don't want that property to disappear on the slave. regards, tom lane
On Fri, 2010-11-26 at 18:35 -0500, Tom Lane wrote: > Speaking of which, is there any code in there to ensure that a > deadlock > in the standby is resolved by killing HS queries and not the replay > process? Because deadlocks are certainly going to be possible no > matter > what. Locks cause query conflicts, so any attempt to take a lock that is blocked will be resolved before a deadlock takes place. So that situation does not arise in the startup process. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Nov 26, 2010 at 7:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In fact, now that I think about it, what I'm proposing would actually >> substantially REDUCE the risk of deadlock on the standby, because the >> master would only ever need to lock a backing file long enough to drop >> or truncate it, whereas under the present system the startup process >> might need to hold many locks at once. > > Now you're the one spouting nonsense. Consider a master transaction > that does > > begin; > lock table foo; > alter table foo --- some rewriting table operation; > alter table foo --- some rewriting table operation; > alter table foo --- some rewriting table operation; > alter table foo --- some rewriting table operation; > alter table foo --- some rewriting table operation; > alter table foo --- some rewriting table operation; > alter table foo --- some rewriting table operation; > commit; > > On the master, no other transaction can see the intermediate states. > We don't want that property to disappear on the slave. And why would it? Which states are visible is something that gets controlled by MVCC, not by the order in which we remove backing files. It's possible that to make what I'm describing work correctly, you need to acquire the locks and then drop all the backing files, rather than interleaving those operations. I'm not sure about that. But acquiring them at the same point in WAL that master did seems quite pointless - AFAICS, you lose nothing by postponing until commit time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 2010-11-26 at 01:11 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > That would mean running GetCurrentTransactionId() inside LockAcquire() > > > if (lockmode >= AccessExclusiveLock && > > locktag->locktag_type == LOCKTAG_RELATION && > > !RecoveryInProgress()) > > (void) GetCurrentTransactionId(); > > > Any objections to that fix? > > Could we have a wal level test in there too please? It's pretty awful > in any case... Slightly neater version of same idea applied to resolve this. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services