Thread: old_snapshot_threshold allows heap:toast disagreement
On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote: >> This PostgreSQL 9.6 open item now needs a permanent owner. Would any other >> committer like to take ownership? If this role interests you, please read >> this thread and the policy linked above, then send an initial status update >> bearing a date for your subsequent status update. If the item does not have a >> permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting >> commit 848ef42 and followups. > > I will adopt this item. I will provide an initial patch for this > issue, or convince someone else to do so, within one week. Therefore, > expect a further status update from me on or before July 28th. I > expect that the patch will be based on ideas from these emails: > > https://www.postgresql.org/message-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de > https://www.postgresql.org/message-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de Here is a patch. Please review. I'm not suffering from an overwhelming excess of confidence that this is correct. In particular, I'm concerned about the fact that there isn't always a registered snapshot - if you make it ERROR out when that happens instead of doing what it actually does, the regression tests fail in CLUSTER and CREATE VIEW. Also, I wonder why it's right to use pairingheap_first() instead of looking at the oldest snapshot on the active snapshot stack, or conversely why that is right and this is not. Or maybe we need to check both. The basic principle of testing both MVCC and TOAST snapshots for snapshot-too-old problems seems sound. Furthermore, it seems clear enough that if we're ever looking up a datum in the TOAST table whose xmin is no longer included in our MyPgXact->xmin, then we're vulnerable to having TOAST chunks vacuumed away under us even without snapshot_too_old enabled. But there's a bit of spooky action at a distance: if we don't see any snapshots, then we have to assume that the scan is being performed with a non-MVCC snapshot and therefore we don't need to worry. But there's no way to verify that via an assertion, because the connection between the relevant MVCC snapshot and the corresponding TOAST snapshot is pretty indirect, mediated only by snapmgr.c. It's a bit like a 9-1-1 operator (or whatever your local emergency number is) saying "hey, thanks for calling. if I don't hear back from you about this issue again, I'll just assume everything's OK." That might actually the correct approach in some cases, but it certainly comes with a bit of risk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi, On 2016-07-26 15:13:41 -0400, Robert Haas wrote: > On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote: > >> This PostgreSQL 9.6 open item now needs a permanent owner. Would any other > >> committer like to take ownership? If this role interests you, please read > >> this thread and the policy linked above, then send an initial status update > >> bearing a date for your subsequent status update. If the item does not have a > >> permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting > >> commit 848ef42 and followups. > > > > I will adopt this item. I will provide an initial patch for this > > issue, or convince someone else to do so, within one week. Therefore, > > expect a further status update from me on or before July 28th. I > > expect that the patch will be based on ideas from these emails: > > > > https://www.postgresql.org/message-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de > > https://www.postgresql.org/message-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de > > Here is a patch. Cool! Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using InvalidXLogRecPtr do the trick? That already prevents errors. > Please review. I'm not suffering from an overwhelming excess of > confidence that this is correct. In particular, I'm concerned about > the fact that there isn't always a registered snapshot - if you make > it ERROR out when that happens instead of doing what it actually does, > the regression tests fail in CLUSTER and CREATE VIEW. Right. > Also, I wonder why it's right to use > pairingheap_first() instead of looking at the oldest snapshot on the > active snapshot stack, or conversely why that is right and this is > not. Or maybe we need to check both. Well, generally, the older the snapshot we use is, the more we'll error out spuriously. The reason to use the oldest from the pairing heap is that that'll be the most conservative value, right? If there's neither an active, nor a registered snapshot, we'll not prevent pruning in the first place (and xmin ought to be invalid). As registered snapshots are usually "older" than active ones, we definitely have to check those. But you're right, it seems safer to also check active ones - which squares with SnapshotResetXmin(). > The basic principle of testing both MVCC and TOAST snapshots for > snapshot-too-old problems seems sound. Furthermore, it seems clear > enough that if we're ever looking up a datum in the TOAST table whose > xmin is no longer included in our MyPgXact->xmin, then we're > vulnerable to having TOAST chunks vacuumed away under us even without > snapshot_too_old enabled. Exactly. > But there's a bit of spooky action at a > distance: if we don't see any snapshots, then we have to assume that > the scan is being performed with a non-MVCC snapshot and therefore we > don't need to worry. But there's no way to verify that via an > assertion, because the connection between the relevant MVCC snapshot > and the corresponding TOAST snapshot is pretty indirect, mediated only > by snapmgr.c. Hm. Could we perhaps assert that the session has a valid xmin? Greetings, Andres Freund
On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote: > Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using > InvalidXLogRecPtr do the trick? That already prevents errors. Oh, right. >> Also, I wonder why it's right to use >> pairingheap_first() instead of looking at the oldest snapshot on the >> active snapshot stack, or conversely why that is right and this is >> not. Or maybe we need to check both. > > Well, generally, the older the snapshot we use is, the more we'll error > out spuriously. The reason to use the oldest from the pairing heap is > that that'll be the most conservative value, right? If there's neither > an active, nor a registered snapshot, we'll not prevent pruning in the > first place (and xmin ought to be invalid). As registered snapshots are > usually "older" than active ones, we definitely have to check those. But > you're right, it seems safer to also check active ones - which squares > with SnapshotResetXmin(). OK. That's a bit inconvenient because we don't have an O(1) way to access the bottom of the active snapshot stack; I've attempted to add such a mechanism here. >> But there's a bit of spooky action at a >> distance: if we don't see any snapshots, then we have to assume that >> the scan is being performed with a non-MVCC snapshot and therefore we >> don't need to worry. But there's no way to verify that via an >> assertion, because the connection between the relevant MVCC snapshot >> and the corresponding TOAST snapshot is pretty indirect, mediated only >> by snapmgr.c. > > Hm. Could we perhaps assert that the session has a valid xmin? I don't think so. CLUSTER? New version attached. (Official status update: I expect to have this issue wrapped up in the next few days, assuming that review and discussion continue. If for some reason that fails to be the case, I will provide a further official status update no later than Tuesday of next week: August 2, 2016.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2016-07-28 15:40:48 -0400, Robert Haas wrote: > On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote: > >> Also, I wonder why it's right to use > >> pairingheap_first() instead of looking at the oldest snapshot on the > >> active snapshot stack, or conversely why that is right and this is > >> not. Or maybe we need to check both. > > > > Well, generally, the older the snapshot we use is, the more we'll error > > out spuriously. The reason to use the oldest from the pairing heap is > > that that'll be the most conservative value, right? If there's neither > > an active, nor a registered snapshot, we'll not prevent pruning in the > > first place (and xmin ought to be invalid). As registered snapshots are > > usually "older" than active ones, we definitely have to check those. But > > you're right, it seems safer to also check active ones - which squares > > with SnapshotResetXmin(). > > OK. That's a bit inconvenient because we don't have an O(1) way to > access the bottom of the active snapshot stack; I've attempted to add > such a mechanism here. I think just iterating through the active snapshots would have been fine. Afaics there's no guarantee that the first active snapshot pushed is the relevant one - in contrast to registered one, which are ordered by virtue of the heap. > >> But there's a bit of spooky action at a > >> distance: if we don't see any snapshots, then we have to assume that > >> the scan is being performed with a non-MVCC snapshot and therefore we > >> don't need to worry. But there's no way to verify that via an > >> assertion, because the connection between the relevant MVCC snapshot > >> and the corresponding TOAST snapshot is pretty indirect, mediated only > >> by snapmgr.c. > > > > Hm. Could we perhaps assert that the session has a valid xmin? > > I don't think so. CLUSTER? That should have one during any toast lookups afaics - the relevant code is /* Start a new transaction for each relation. */ StartTransactionCommand(); /* functions in indexesmay want a snapshot set */ PushActiveSnapshot(GetTransactionSnapshot()); /* Do the job. */ cluster_rel(rvtc->tableOid,rvtc->indexOid, true, stmt->verbose); PopActiveSnapshot(); CommitTransactionCommand(); right? And Snapshot GetSnapshotData(Snapshot snapshot) { ... if (!TransactionIdIsValid(MyPgXact->xmin)) MyPgXact->xmin = TransactionXmin = xmin; sets xmin. Greetings, Andres Freund
On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote: > I think just iterating through the active snapshots would have been > fine. Afaics there's no guarantee that the first active snapshot pushed > is the relevant one - in contrast to registered one, which are ordered > by virtue of the heap. I think the oldest snapshot has to be on the bottom of the stack; how not? >> > Hm. Could we perhaps assert that the session has a valid xmin? >> >> I don't think so. CLUSTER? > > That should have one during any toast lookups afaics - the relevant code > is > /* Start a new transaction for each relation. */ > StartTransactionCommand(); > /* functions in indexes may want a snapshot set */ > PushActiveSnapshot(GetTransactionSnapshot()); > /* Do the job. */ > cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose); > PopActiveSnapshot(); > CommitTransactionCommand(); > right? And > Snapshot > GetSnapshotData(Snapshot snapshot) > { > ... > > if (!TransactionIdIsValid(MyPgXact->xmin)) > MyPgXact->xmin = TransactionXmin = xmin; > sets xmin. Hmm, OK, I'll have to check on that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-07-28 23:08:13 -0400, Robert Haas wrote: > On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote: > > I think just iterating through the active snapshots would have been > > fine. Afaics there's no guarantee that the first active snapshot pushed > > is the relevant one - in contrast to registered one, which are ordered > > by virtue of the heap. > > I think the oldest snapshot has to be on the bottom of the stack; how not? Well, one might push a previously acuired (and registered) snapshot onto the stack. Afaics that'll only happen if the snapshot is already registered, but I'm not sure.
On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote: > > New version attached. > +static inline void +InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn) +{ + snapshot->satisfies = HeapTupleSatisfiesToast; + snapshot->lsn = lsn; +} Here, don't you need to initialize whenTaken as that is also used in TestForOldSnapshot_impl() to report error "snapshot too old". -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote: >> >> New version attached. > > +static inline void > +InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn) > +{ > + snapshot->satisfies = HeapTupleSatisfiesToast; > + snapshot->lsn = lsn; > +} > > Here, don't you need to initialize whenTaken as that is also used in > TestForOldSnapshot_impl() to report error "snapshot too old". Hmm, yeah. This is actually a bit confusing. We want the "oldest" snapshot, but there are three different notions of "oldest": 1. Smallest LSN. 2. Smallest whenTaken. 3. Smallest xmin. Which one do we use? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 2, 2016 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote: >>> >>> New version attached. >> >> +static inline void >> +InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn) >> +{ >> + snapshot->satisfies = HeapTupleSatisfiesToast; >> + snapshot->lsn = lsn; >> +} >> >> Here, don't you need to initialize whenTaken as that is also used in >> TestForOldSnapshot_impl() to report error "snapshot too old". > > Hmm, yeah. This is actually a bit confusing. We want the "oldest" > snapshot, but there are three different notions of "oldest": > > 1. Smallest LSN. > 2. Smallest whenTaken. > 3. Smallest xmin. > > Which one do we use? > Which ever notion we choose, I think we should use the LSN and whenTaken from the same snapshot. I think we can choose the snapshot with smallest xmin and then use the LSN and whenTaken from it. I think xmin comparison makes more sense because you are already retrieving smallest xmin snapshot from the registered snapshots. Whatever value we choose here, I think we can't guarantee that it will be in sync with the value used for heap. So there is a chance that we might spuriously raise "snapshot too old" error. I think we should mentioned this as a caveat in docs [1] where we explain behaviour of about old_snapshot_threshold. [1] - https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote: > I think just iterating through the active snapshots would have been > fine. Afaics there's no guarantee that the first active snapshot pushed > is the relevant one - in contrast to registered one, which are ordered > by virtue of the heap. After a bit of scrutiny, I don't think this is a problem, and I don't want to introduce belt-and-suspenders protections against can't-happen conditions that may get cargo-culted into other places. (How's that for cramming as many Tom-Lane-isms as possible into one sentence? More might be done, but it would be some sort of bespoke crock of the first water.) Anyway, the reason I think it's OK is that PushActiveSnapshot() does not copy the input snapshot unless it's CurrentSnapshot or SecondarySnapshot -- so every snapshot that gets pushed on the active snapshot stack is either one that's already registered or one that was just taken (because CurrentSnapshot will get overwritten on next call to GetTransactionSnapshot). In the former case, it's included in the registered snapshots so it doesn't even matter whether we notice that it is also on the active snapshot stack. In the latter case, having just been taken, it must be newer than the oldest registered snapshot, which was necessarily taken earlier. I think that the only time it matters whether we even look at the active snapshot stack is when there are no registered snapshots whatsoever. In many cases -- like QueryDescs -- we register the snapshot first and then later put it on the active snapshot stack, but there are some things like CLUSTER that only make the snapshot active and don't register it. But if they do that, they can only do it in first-in, first-out fashion. >> >> But there's a bit of spooky action at a >> >> distance: if we don't see any snapshots, then we have to assume that >> >> the scan is being performed with a non-MVCC snapshot and therefore we >> >> don't need to worry. But there's no way to verify that via an >> >> assertion, because the connection between the relevant MVCC snapshot >> >> and the corresponding TOAST snapshot is pretty indirect, mediated only >> >> by snapmgr.c. >> > >> > Hm. Could we perhaps assert that the session has a valid xmin? >> >> I don't think so. CLUSTER? > > That should have one during any toast lookups afaics - the relevant code > is > /* Start a new transaction for each relation. */ > StartTransactionCommand(); > /* functions in indexes may want a snapshot set */ > PushActiveSnapshot(GetTransactionSnapshot()); > /* Do the job. */ > cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose); > PopActiveSnapshot(); > CommitTransactionCommand(); > right? And > Snapshot > GetSnapshotData(Snapshot snapshot) > { > ... > > if (!TransactionIdIsValid(MyPgXact->xmin)) > MyPgXact->xmin = TransactionXmin = xmin; > sets xmin. Yeah. Actually, consistent with the above, I discovered that as long as we consult both the active snapshot stack and the pairingheap of registered snapshots, it seems to be fine to just insist that we always get a snapshot back from the snapmgr and use that to initialize the TOAST snapshot. So I did it that way in the attached version. New patch attached. Barring objections, I will commit this tomorrow. If there are objections, I will send another status update tomorrow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Wed, Aug 3, 2016 at 2:22 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote: > > Yeah. Actually, consistent with the above, I discovered that as long > as we consult both the active snapshot stack and the pairingheap of > registered snapshots, it seems to be fine to just insist that we > always get a snapshot back from the snapmgr and use that to initialize > the TOAST snapshot. So I did it that way in the attached version. > > New patch attached. > Code looks good to me. I have not tested the patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com