Thread: old_snapshot_threshold allows heap:toast disagreement

old_snapshot_threshold allows heap:toast disagreement

From
Robert Haas
Date:
On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote:
>> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
>> committer like to take ownership?  If this role interests you, please read
>> this thread and the policy linked above, then send an initial status update
>> bearing a date for your subsequent status update.  If the item does not have a
>> permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting
>> commit 848ef42 and followups.
>
> I will adopt this item.  I will provide an initial patch for this
> issue, or convince someone else to do so, within one week.  Therefore,
> expect a further status update from me on or before July 28th.  I
> expect that the patch will be based on ideas from these emails:
>
> https://www.postgresql.org/message-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de
> https://www.postgresql.org/message-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de

Here is a patch.  Please review.  I'm not suffering from an
overwhelming excess of confidence that this is correct.  In
particular, I'm concerned about the fact that there isn't always a
registered snapshot - if you make it ERROR out when that happens
instead of doing what it actually does, the regression tests fail in
CLUSTER and CREATE VIEW.  Also, I wonder why it's right to use
pairingheap_first() instead of looking at the oldest snapshot on the
active snapshot stack, or conversely why that is right and this is
not.  Or maybe we need to check both.

The basic principle of testing both MVCC and TOAST snapshots for
snapshot-too-old problems seems sound.  Furthermore, it seems clear
enough that if we're ever looking up a datum in the TOAST table whose
xmin is no longer included in our MyPgXact->xmin, then we're
vulnerable to having TOAST chunks vacuumed away under us even without
snapshot_too_old enabled.  But there's a bit of spooky action at a
distance: if we don't see any snapshots, then we have to assume that
the scan is being performed with a non-MVCC snapshot and therefore we
don't need to worry.  But there's no way to verify that via an
assertion, because the connection between the relevant MVCC snapshot
and the corresponding TOAST snapshot is pretty indirect, mediated only
by snapmgr.c.  It's a bit like a 9-1-1 operator (or whatever your
local emergency number is) saying "hey, thanks for calling.  if I
don't hear back from you about this issue again, I'll just assume
everything's OK."  That might actually the correct approach in some
cases, but it certainly comes with a bit of risk.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: old_snapshot_threshold allows heap:toast disagreement

From
Andres Freund
Date:
Hi,

On 2016-07-26 15:13:41 -0400, Robert Haas wrote:
> On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote:
> >> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
> >> committer like to take ownership?  If this role interests you, please read
> >> this thread and the policy linked above, then send an initial status update
> >> bearing a date for your subsequent status update.  If the item does not have a
> >> permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting
> >> commit 848ef42 and followups.
> >
> > I will adopt this item.  I will provide an initial patch for this
> > issue, or convince someone else to do so, within one week.  Therefore,
> > expect a further status update from me on or before July 28th.  I
> > expect that the patch will be based on ideas from these emails:
> >
> > https://www.postgresql.org/message-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de
> > https://www.postgresql.org/message-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de
> 
> Here is a patch.

Cool!


Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using
InvalidXLogRecPtr do the trick? That already prevents errors.


> Please review.  I'm not suffering from an overwhelming excess of
> confidence that this is correct.  In particular, I'm concerned about
> the fact that there isn't always a registered snapshot - if you make
> it ERROR out when that happens instead of doing what it actually does,
> the regression tests fail in CLUSTER and CREATE VIEW.

Right.


> Also, I wonder why it's right to use
> pairingheap_first() instead of looking at the oldest snapshot on the
> active snapshot stack, or conversely why that is right and this is
> not.  Or maybe we need to check both.

Well, generally, the older the snapshot we use is, the more we'll error
out spuriously. The reason to use the oldest from the pairing heap is
that that'll be the most conservative value, right?  If there's neither
an active, nor a registered snapshot, we'll not prevent pruning in the
first place (and xmin ought to be invalid).  As registered snapshots are
usually "older" than active ones, we definitely have to check those. But
you're right, it seems safer to also check active ones - which squares
with SnapshotResetXmin().


> The basic principle of testing both MVCC and TOAST snapshots for
> snapshot-too-old problems seems sound.  Furthermore, it seems clear
> enough that if we're ever looking up a datum in the TOAST table whose
> xmin is no longer included in our MyPgXact->xmin, then we're
> vulnerable to having TOAST chunks vacuumed away under us even without
> snapshot_too_old enabled.

Exactly.


> But there's a bit of spooky action at a
> distance: if we don't see any snapshots, then we have to assume that
> the scan is being performed with a non-MVCC snapshot and therefore we
> don't need to worry.  But there's no way to verify that via an
> assertion, because the connection between the relevant MVCC snapshot
> and the corresponding TOAST snapshot is pretty indirect, mediated only
> by snapmgr.c.

Hm. Could we perhaps assert that the session has a valid xmin?


Greetings,

Andres Freund



Re: old_snapshot_threshold allows heap:toast disagreement

From
Robert Haas
Date:
On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using
> InvalidXLogRecPtr do the trick? That already prevents errors.

Oh, right.

>> Also, I wonder why it's right to use
>> pairingheap_first() instead of looking at the oldest snapshot on the
>> active snapshot stack, or conversely why that is right and this is
>> not.  Or maybe we need to check both.
>
> Well, generally, the older the snapshot we use is, the more we'll error
> out spuriously. The reason to use the oldest from the pairing heap is
> that that'll be the most conservative value, right?  If there's neither
> an active, nor a registered snapshot, we'll not prevent pruning in the
> first place (and xmin ought to be invalid).  As registered snapshots are
> usually "older" than active ones, we definitely have to check those. But
> you're right, it seems safer to also check active ones - which squares
> with SnapshotResetXmin().

OK.  That's a bit inconvenient because we don't have an O(1) way to
access the bottom of the active snapshot stack; I've attempted to add
such a mechanism here.

>> But there's a bit of spooky action at a
>> distance: if we don't see any snapshots, then we have to assume that
>> the scan is being performed with a non-MVCC snapshot and therefore we
>> don't need to worry.  But there's no way to verify that via an
>> assertion, because the connection between the relevant MVCC snapshot
>> and the corresponding TOAST snapshot is pretty indirect, mediated only
>> by snapmgr.c.
>
> Hm. Could we perhaps assert that the session has a valid xmin?

I don't think so.  CLUSTER?

New version attached.

(Official status update: I expect to have this issue wrapped up in the
next few days, assuming that review and discussion continue.  If for
some reason that fails to be the case, I will provide a further
official status update no later than Tuesday of next week: August 2,
2016.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: old_snapshot_threshold allows heap:toast disagreement

From
Andres Freund
Date:
On 2016-07-28 15:40:48 -0400, Robert Haas wrote:
> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
> >> Also, I wonder why it's right to use
> >> pairingheap_first() instead of looking at the oldest snapshot on the
> >> active snapshot stack, or conversely why that is right and this is
> >> not.  Or maybe we need to check both.
> >
> > Well, generally, the older the snapshot we use is, the more we'll error
> > out spuriously. The reason to use the oldest from the pairing heap is
> > that that'll be the most conservative value, right?  If there's neither
> > an active, nor a registered snapshot, we'll not prevent pruning in the
> > first place (and xmin ought to be invalid).  As registered snapshots are
> > usually "older" than active ones, we definitely have to check those. But
> > you're right, it seems safer to also check active ones - which squares
> > with SnapshotResetXmin().
> 
> OK.  That's a bit inconvenient because we don't have an O(1) way to
> access the bottom of the active snapshot stack; I've attempted to add
> such a mechanism here.

I think just iterating through the active snapshots would have been
fine. Afaics there's no guarantee that the first active snapshot pushed
is the relevant one - in contrast to registered one, which are ordered
by virtue of the heap.


> >> But there's a bit of spooky action at a
> >> distance: if we don't see any snapshots, then we have to assume that
> >> the scan is being performed with a non-MVCC snapshot and therefore we
> >> don't need to worry.  But there's no way to verify that via an
> >> assertion, because the connection between the relevant MVCC snapshot
> >> and the corresponding TOAST snapshot is pretty indirect, mediated only
> >> by snapmgr.c.
> >
> > Hm. Could we perhaps assert that the session has a valid xmin?
> 
> I don't think so.  CLUSTER?

That should have one during any toast lookups afaics - the relevant code
is        /* Start a new transaction for each relation. */        StartTransactionCommand();        /* functions in
indexesmay want a snapshot set */        PushActiveSnapshot(GetTransactionSnapshot());        /* Do the job. */
cluster_rel(rvtc->tableOid,rvtc->indexOid, true, stmt->verbose);        PopActiveSnapshot();
CommitTransactionCommand();
right? And
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...
if (!TransactionIdIsValid(MyPgXact->xmin))    MyPgXact->xmin = TransactionXmin = xmin;
sets xmin.


Greetings,

Andres Freund



Re: old_snapshot_threshold allows heap:toast disagreement

From
Robert Haas
Date:
On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:
> I think just iterating through the active snapshots would have been
> fine. Afaics there's no guarantee that the first active snapshot pushed
> is the relevant one - in contrast to registered one, which are ordered
> by virtue of the heap.

I think the oldest snapshot has to be on the bottom of the stack; how not?

>> > Hm. Could we perhaps assert that the session has a valid xmin?
>>
>> I don't think so.  CLUSTER?
>
> That should have one during any toast lookups afaics - the relevant code
> is
>                         /* Start a new transaction for each relation. */
>                         StartTransactionCommand();
>                         /* functions in indexes may want a snapshot set */
>                         PushActiveSnapshot(GetTransactionSnapshot());
>                         /* Do the job. */
>                         cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose);
>                         PopActiveSnapshot();
>                         CommitTransactionCommand();
> right? And
> Snapshot
> GetSnapshotData(Snapshot snapshot)
> {
> ...
>
>         if (!TransactionIdIsValid(MyPgXact->xmin))
>                 MyPgXact->xmin = TransactionXmin = xmin;
> sets xmin.

Hmm, OK, I'll have to check on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: old_snapshot_threshold allows heap:toast disagreement

From
Andres Freund
Date:
On 2016-07-28 23:08:13 -0400, Robert Haas wrote:
> On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:
> > I think just iterating through the active snapshots would have been
> > fine. Afaics there's no guarantee that the first active snapshot pushed
> > is the relevant one - in contrast to registered one, which are ordered
> > by virtue of the heap.
> 
> I think the oldest snapshot has to be on the bottom of the stack; how not?

Well, one might push a previously acuired (and registered) snapshot onto
the stack. Afaics that'll only happen if the snapshot is already
registered, but I'm not sure.



Re: old_snapshot_threshold allows heap:toast disagreement

From
Amit Kapila
Date:
On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
>
> New version attached.
>

+static inline void
+InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
+{
+ snapshot->satisfies = HeapTupleSatisfiesToast;
+ snapshot->lsn = lsn;
+}

Here, don't you need to initialize whenTaken as that is also used in
TestForOldSnapshot_impl() to report error "snapshot too old".


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: old_snapshot_threshold allows heap:toast disagreement

From
Robert Haas
Date:
On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> New version attached.
>
> +static inline void
> +InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
> +{
> + snapshot->satisfies = HeapTupleSatisfiesToast;
> + snapshot->lsn = lsn;
> +}
>
> Here, don't you need to initialize whenTaken as that is also used in
> TestForOldSnapshot_impl() to report error "snapshot too old".

Hmm, yeah.  This is actually a bit confusing.  We want the "oldest"
snapshot, but there are three different notions of "oldest":

1. Smallest LSN.
2. Smallest whenTaken.
3. Smallest xmin.

Which one do we use?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: old_snapshot_threshold allows heap:toast disagreement

From
Amit Kapila
Date:
On Tue, Aug 2, 2016 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> New version attached.
>>
>> +static inline void
>> +InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
>> +{
>> + snapshot->satisfies = HeapTupleSatisfiesToast;
>> + snapshot->lsn = lsn;
>> +}
>>
>> Here, don't you need to initialize whenTaken as that is also used in
>> TestForOldSnapshot_impl() to report error "snapshot too old".
>
> Hmm, yeah.  This is actually a bit confusing.  We want the "oldest"
> snapshot, but there are three different notions of "oldest":
>
> 1. Smallest LSN.
> 2. Smallest whenTaken.
> 3. Smallest xmin.
>
> Which one do we use?
>

Which ever notion we choose, I think we should use the LSN and
whenTaken from the same snapshot.  I think we can choose the snapshot
with smallest xmin and then use the LSN and whenTaken from it.  I
think xmin comparison makes more sense because you are already
retrieving smallest xmin snapshot from the registered snapshots.

Whatever value we choose here, I think we can't guarantee that it will
be in sync with the value used for heap.  So there is a chance that we
might spuriously raise "snapshot too old" error.  I think we should
mentioned this as a caveat in docs [1] where we explain behaviour of
about old_snapshot_threshold.

[1] - https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: old_snapshot_threshold allows heap:toast disagreement

From
Robert Haas
Date:
On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:
> I think just iterating through the active snapshots would have been
> fine. Afaics there's no guarantee that the first active snapshot pushed
> is the relevant one - in contrast to registered one, which are ordered
> by virtue of the heap.

After a bit of scrutiny, I don't think this is a problem, and I don't
want to introduce belt-and-suspenders protections against can't-happen
conditions that may get cargo-culted into other places.  (How's that
for cramming as many Tom-Lane-isms as possible into one sentence?
More might be done, but it would be some sort of bespoke crock of the
first water.)

Anyway, the reason I think it's OK is that PushActiveSnapshot() does
not copy the input snapshot unless it's CurrentSnapshot or
SecondarySnapshot -- so every snapshot that gets pushed on the active
snapshot stack is either one that's already registered or one that was
just taken (because CurrentSnapshot will get overwritten on next call
to GetTransactionSnapshot).  In the former case, it's included in the
registered snapshots so it doesn't even matter whether we notice that
it is also on the active snapshot stack.  In the latter case, having
just been taken, it must be newer than the oldest registered snapshot,
which was necessarily taken earlier.  I think that the only time it
matters whether we even look at the active snapshot stack is when
there are no registered snapshots whatsoever.  In many cases -- like
QueryDescs -- we register the snapshot first and then later put it on
the active snapshot stack, but there are some things like CLUSTER that
only make the snapshot active and don't register it.  But if they do
that, they can only do it in first-in, first-out fashion.

>> >> But there's a bit of spooky action at a
>> >> distance: if we don't see any snapshots, then we have to assume that
>> >> the scan is being performed with a non-MVCC snapshot and therefore we
>> >> don't need to worry.  But there's no way to verify that via an
>> >> assertion, because the connection between the relevant MVCC snapshot
>> >> and the corresponding TOAST snapshot is pretty indirect, mediated only
>> >> by snapmgr.c.
>> >
>> > Hm. Could we perhaps assert that the session has a valid xmin?
>>
>> I don't think so.  CLUSTER?
>
> That should have one during any toast lookups afaics - the relevant code
> is
>                         /* Start a new transaction for each relation. */
>                         StartTransactionCommand();
>                         /* functions in indexes may want a snapshot set */
>                         PushActiveSnapshot(GetTransactionSnapshot());
>                         /* Do the job. */
>                         cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose);
>                         PopActiveSnapshot();
>                         CommitTransactionCommand();
> right? And
> Snapshot
> GetSnapshotData(Snapshot snapshot)
> {
> ...
>
>         if (!TransactionIdIsValid(MyPgXact->xmin))
>                 MyPgXact->xmin = TransactionXmin = xmin;
> sets xmin.

Yeah.  Actually, consistent with the above, I discovered that as long
as we consult both the active snapshot stack and the pairingheap of
registered snapshots, it seems to be fine to just insist that we
always get a snapshot back from the snapmgr and use that to initialize
the TOAST snapshot.  So I did it that way in the attached version.

New patch attached.  Barring objections, I will commit this tomorrow.
If there are objections, I will send another status update tomorrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: old_snapshot_threshold allows heap:toast disagreement

From
Amit Kapila
Date:
On Wed, Aug 3, 2016 at 2:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Yeah.  Actually, consistent with the above, I discovered that as long
> as we consult both the active snapshot stack and the pairingheap of
> registered snapshots, it seems to be fine to just insist that we
> always get a snapshot back from the snapmgr and use that to initialize
> the TOAST snapshot.  So I did it that way in the attached version.
>
> New patch attached.
>

Code looks good to me.  I have not tested the patch.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com