Re: Hot Standby (v9d) - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | Re: Hot Standby (v9d) |
Date | |
Msg-id | 87mydbjm5x.fsf@oxford.xeocode.com Whole thread Raw |
In response to | Re: Hot Standby (v9d) (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Hot Standby (v9d)
Re: Hot Standby (v9d) Re: Hot Standby (v9d) |
List | pgsql-hackers |
I skimmed through the Hot Standby patch for a preliminary review. I noted the following things, some minor tweaks, some just questions. None of the things I noted are big issues unless some of the questions uncover issues. 1) This code is obviously a cut-pasto: + else if (strcmp(tok1, "max_standby_delay") == 0) + { + errno = 0; + maxStandbyDelay = (TransactionId) strtoul(tok2, NULL, 0); + if (errno == EINVAL || errno == ERANGE) + ereport(FATAL, + (errmsg("max_standby_delay is not a valid number: \"%s\"", + tok2))); Have you ever tested the behaviour with max_standby_delay = -1 ? Also, the default max_standby_delay is currently 0 -- ie, kill queries immediately upon detecting any conflicts at all -- which I don't really think anyone would be happy with. I still *strongly* feel the default has to be the non-destructive conservative -1. 2) This hard coded constant of 500ms seems arbitrary to me. If your use case is a heavily-used reporting engine you might get this message all the time. I think this either has to be configurable (warn_standby_delay?) or based on max_standby_delay or some other parameter somehow. + /* + * log that we have been waiting for a while now... + */ + if (!logged && standbyWait_ms > 500) 3) These two blocks of code seem unsatisfactory: ! /* ! * Keep track of the block number of the lastBlockVacuumed, so ! * we can scan those blocks as well during WAL replay. This then ! * provides concurrency protection and allows btrees to be used ! * while in recovery. ! */ ! if (lastBlockVacuumed > vstate->lastBlockVacuumed) ! vstate->lastBlockVacuumed = lastBlockVacuumed; ! + /* + * XXXHS we don't actually need to read the block, we + * just need to confirm it is unpinned. If we had a special call + * into the buffer manager we could optimise this so that + * if the block is not in shared_buffers we confirm it as unpinned. + */ + buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno, RBM_NORMAL); + if (BufferIsValid(buffer)) + { + LockBufferForCleanup(buffer); + UnlockReleaseBuffer(buffer); + } Aside from the XXX comment (I thought we actually had such a call now, but if not shouldn't we just add one instead of carping?) I'm not convinced this handles all the cases that can arise. Notable, what happens if a previous vacuum died in the middle of the scan? I think we have a vacuum id which we use already with btrees to the same issue. It seems to me this be more robust if you stamped the xlog record with that id number and ensured it matched the id of the lastBloockVacuumed? Then you could start from 0 if the id changes. 4) Why is this necessary? + if (IsRecoveryProcessingMode() && + locktag->locktag_type == LOCKTAG_OBJECT && + lockmode > AccessShareLock) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot acquire lockmode %s on database objects while recovery is in progress", + lockMethodTable->lockModeNames[lockmode]), + errhint("Only AccessShareLock can be acquired on database objects during recovery."))); + Obviously we can't lock records (SELECT FOR UPDATE) since that requires write access. But shouldn't we be able to do manual LOCK TABLE calls? I hope this isn't the only interlock we have against trying to do write i/o or DDL against tables...? 5) The code for killing off conflicting transactions looks odd to me but I haven't really traced through it to see what it's doing. It seems to kill off just one process? What if there are several holding the lock in question? Also, it doesn't seem to take into account how long the various transactions have held the lock? Is there a risk of, for example, if you have a long report running against a table and lots of OLTP requests against the same table it seems the conflict resolution code would fire randomly into the crowd and hit mostly innocent OLTP transactions until eventually it found the culprit report? Also, it kills of backends using SIGINT which I *think* Tom went through and made safe earlier this release, right? In any case if the backend doesn't die off promptly we wait forever with no further warnings or log messages. I would think we should at least print some sort of message here, even if it's a "can't happen" elog. The doubling thing is probably unnecessary too in this case. + if (!XLogRecPtrIsValid(conflict_LSN)) + { + /* wait awhile for it to die */ + pg_usleep(wontDieWait * 5000L); + wontDieWait *= 2; + } + } 6) I still don't understand why you need unobserved_xids. We don't need this in normal running, an xid we don't know for certain is committed is exactly the same as a transaction we know is currently running or aborted. So why do you need it during HS? The comment says: + * This is very important because if we leave + * those xids out of the snapshot then they will appear to be already complete. + * Later, when they have actually completed this could lead to confusion as to + * whether those xids are visible or not, blowing a huge hole in MVCC. + * We need 'em. But that doesn't sound rational to me. I'm not sure what "confusion" this would cause. If they later actually complete then any existing snapshots would still not see them. And any later snapshots wouldn't be confused by the earlier conclusions. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
pgsql-hackers by date: