Re: hot_standby_feedback vs excludeVacuum and snapshots - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: hot_standby_feedback vs excludeVacuum and snapshots |
Date | |
Msg-id | CAA4eK1K9qpkYBgO1JB4tT+s_WmJz-UgpWL5RWTAAd=oJgF5RXg@mail.gmail.com Whole thread Raw |
In response to | Re: hot_standby_feedback vs excludeVacuum and snapshots (Andres Freund <andres@anarazel.de>) |
Responses |
Re: hot_standby_feedback vs excludeVacuum and snapshots
|
List | pgsql-hackers |
On Fri, Jun 8, 2018 at 2:55 AM, Andres Freund <andres@anarazel.de> wrote: > > On 2018-06-07 14:19:18 -0700, Andres Freund wrote: > > Hi, > > > > On 2018-03-29 12:17:24 +0100, Greg Stark wrote: > > > I'm poking around to see debug a vacuuming problem and wondering if > > > I've found something more serious. > > > > > > As far as I can tell the snapshots on HOT standby are built using a > > > list of running xids that the primary builds and puts in the WAL and > > > that seems to include all xids from transactions running in all > > > databases. The HOT standby would then build a snapshot and eventually > > > send the xmin of that snapshot back to the primary in the hot standby > > > feedback and that would block vacuuming tuples that might be visible > > > to the standby. > > > > > Many ages ago Alvaro sweated blood to ensure vacuums could run for > > > long periods of time without holding back the xmin horizon and > > > blocking other vacuums from cleaning up tuples. That's the purpose of > > > the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible > > > because we know vacuums won't insert any tuples that queries might try > > > to view and also vacuums won't try to perform any sql queries on other > > > tables. > > > > > I can't find anywhere that the standby snapshot building mechanism > > > gets this same information about which xids are actually vacuums that > > > can be ignored when building a snapshot. So I'm concerned that the hot > > > standby sending back its xmin would be effectively undermining this > > > mechanism and forcing vacuum xids to be included in the xmin horizon > > > and prevent vacuuming of tuples. > > > > > Am I missing something obvious? Is this a known problem? > > > > Maybe I'm missing something, but the running transaction data reported > > to the standby does *NOT* include anything about lazy vacuums - they > > don't have an xid. The reason there's PROC_IN_VACUUM etc isn't the xid, > > it's *xmin*, no? > > > > We currently do acquire an xid when truncating the relation - but I > > think it'd somewhat fair to argue that that's somewhat of a bug. The > > reason a log is acquired is that we need to log AEL locks, and that > > currently means they have to be assigned to a transaction. > > > > Given that the truncation happens at the end of VACUUM and it *NEEDS* to > > be present on the standby - otherwise the locking stuff is useless - I > > don't think the fix commited in this thread is correct. > > > > Wonder if the right thing here wouldn't be to instead transiently > > acquire an AEL lock during replay when truncating a relation? > If we go that route, then don't we need to bother about when to release the lock which is right now done at the commit/abort of the transaction. In master, for vacuum, we do release the lock immediately after truncate, but I think that is not true generally. So, if we release the lock at a time other than commit/abort, we might end up releasing them at the different times in master and standby. > Isn't the fact that vacuum truncation requires an AEL, and that the > change committed today excludes those transactions from running xacts > records, flat out broken? > > Look at: > > void > ProcArrayApplyRecoveryInfo(RunningTransactions running) > ... > /* > * Remove stale locks, if any. > * > * Locks are always assigned to the toplevel xid so we don't need to care > * about subxcnt/subxids (and by extension not about ->suboverflowed). > */ > StandbyReleaseOldLocks(running->xcnt, running->xids); > > by excluding running transactions you have, as far as I can tell, > effectively removed the vacuum truncation AEL from the standby. Now that > only happens when a running xact record is logged, but as that happens > in the background... > Yeah, that seems problematic. I think we can avoid that if before releasing the lock in StandbyReleaseOldLocks, we also have an additional check to see whether the transaction is committed or aborted as we do before adding it to KnownAssignedXids. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: