Thread: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

[COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Alvaro Herrera
Date:
Fix freezing of a dead HOT-updated tuple

Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples.  But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.

(This is especially bad if there's any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)

One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it.  But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.

Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead).  In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.

An isolation test is included, authored by Michael Paquier, loosely
based on Daniel Wood's original reproducer.  It only tests one
particular scenario, though, not all the possible ways for this problem
to surface; it be good to have a more reliable way to test this more
fully, but it'd require more work.
In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql
I outlined another test case (more closely matching Dan Wood's) that
exposed a few more ways for the problem to occur.

Backpatch all the way back to 9.3, where this problem was introduced by
multixact juggling.  In branches 9.3 and 9.4, this includes a backpatch
of commit e5ff9fefcd50 (of 9.5 era), since the original is not
correctable without matching the coding pattern in 9.5 up.

Reported-by: Daniel Wood
Diagnosed-by: Daniel Wood
Reviewed-by: Yi Wen Wong, Michaël Paquier
Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/20b655224249e6d2daf7ef0595995228baddb381

Modified Files
--------------
src/backend/access/heap/heapam.c                |  57 +++++++++----
src/backend/commands/vacuumlazy.c               |  20 ++---
src/test/isolation/expected/freeze-the-dead.out | 101 ++++++++++++++++++++++++
src/test/isolation/isolation_schedule           |   1 +
src/test/isolation/specs/freeze-the-dead.spec   |  27 +++++++
5 files changed, 179 insertions(+), 27 deletions(-)


--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Fix freezing of a dead HOT-updated tuple

If I run Dan Wood's test case again, the obvious symptom (spurious
duplicates) goes away. However, the enhanced amcheck, and thus CREATE
INDEX/REINDEX, still isn't happy about this:

postgres=# select bt_index_check('t_pkey', true);
DEBUG:  00000: verifying presence of required tuples in index "t_pkey"
LOCATION:  bt_check_every_level, verify_nbtree.c:424
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,6) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597
Time: 3.699 ms

-- 
Peter Geoghegan


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Andres Freund
Date:
Hi,

On 2017-09-28 14:47:53 +0000, Alvaro Herrera wrote:
> Fix freezing of a dead HOT-updated tuple
>
> Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
> liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples.  But
> concurrent transaction commit/abort may turn DEAD some of the HOT tuples
> that survived the prune, before HeapTupleSatisfiesVacuum tests them.
> This happens to activate the code that decides to freeze the tuple ...
> which resuscitates it, duplicating data.
>
> (This is especially bad if there's any unique constraints, because those
> are now internally violated due to the duplicate entries, though you
> won't know until you try to REINDEX or dump/restore the table.)
>
> One possible fix would be to simply skip doing anything to the tuple,
> and hope that the next HOT prune would remove it.  But there is a
> problem: if the tuple is older than freeze horizon, this would leave an
> unfrozen XID behind, and if no HOT prune happens to clean it up before
> the containing pg_clog segment is truncated away, it'd later cause an
> error when the XID is looked up.
>
> Fix the problem by having the tuple freezing routines cope with the
> situation: don't freeze the tuple (and keep it dead).  In the cases that
> the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
> so that there is no need to look up the XID in pg_clog later on.

I think this is the wrong fix - the assumption that ctid chains can be
validated based on the prev-xmax = cur-xmin is fairly ingrained into the
system, and we shouldn't just be breaking it. The need to later
lobotomize the checks, in a5736bf754, is some evidence of that.

I spent some time discussing this with Robert today (with both of us
alternating between feeling the other and ourselves as stupid), and the
conclusion I think is that the problem is on the pruning, rather than
the freezing side.

FWIW, I don't think the explanation in the commit message of how the
problem triggers is actually correct - the testcase you added doesn't
have any transactions concurrently committing / aborting when
crashing. Rather the problem is that the liveliness checks for freezing
is different from the ones for pruning. HTSV considers xmin
RECENTLY_DEAD when there's a multi xmax with at least one alive locker,
whereas pruning thinks it has to do something because there's the member
xid below the cutoff. No concurrent activity is needed to trigger that.

I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakining this, we should be fixing the pruning to not have the alive
predecessor.

The relevant logic in HTSV is:
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI){    TransactionId xmax;
    if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))    {        /* already checked above */
Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
        xmax = HeapTupleGetUpdateXid(tuple);
        /* not LOCKED_ONLY, so it has to have an xmax */        Assert(TransactionIdIsValid(xmax));
        if (TransactionIdIsInProgress(xmax))            return HEAPTUPLE_DELETE_IN_PROGRESS;        else if
(TransactionIdDidCommit(xmax))           /* there are still lockers around -- can't return DEAD here */
returnHEAPTUPLE_RECENTLY_DEAD;        /* updating transaction aborted */        return HEAPTUPLE_LIVE;
 

with the problematic branch being the TransactionIdDidCommit()
case. Rather than unconditionally returning HEAPTUPLE_RECENTLY_DEAD, we
should test the update xid against OldestXmin and return DEAD /
RECENTLY_DEAD according to that.

If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple.  I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.


I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
the face of previously corrupted tuple chains and pg_upgraded clusters -
it can lead to tuples being considered related, even though they they're
from entirely independent hot chains. Especially when upgrading 9.3 post
your fix, to current releases.


In short, I think the two commits should be reverted, and replaced with
a fix along what I'm outlining above.

There'll be some trouble for people that upgraded to an unreleased
version, but I don't really see what we could do about that.

I could be entirely wrong - I've been travelling for the last two weeks
and my brain is somewhat more fried than usual.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Robert Haas
Date:
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund <andres@anarazel.de> wrote:
> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they they're
> from entirely independent hot chains. Especially when upgrading 9.3 post
> your fix, to current releases.

I think this is a key point.  If the new behavior were merely not
entirely correct, we could perhaps refine it later.  But it's not only
not correct - it actually has the potential to create new problems
that didn't exist before those commits.  And if we release without
reverting those commits then we can't change our mind later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Alvaro Herrera
Date:
Andres Freund wrote:

> I spent some time discussing this with Robert today (with both of us
> alternating between feeling the other and ourselves as stupid), and the
> conclusion I think is that the problem is on the pruning, rather than
> the freezing side.

Thanks both for spending some more time on this.

> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakining this, we should be fixing the pruning to not have the alive
> predecessor.

I gave a look at HTSV back then, but I didn't find what the right tweak
was, but then I only tried changing the return value to DEAD and
DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
on OldestXmin didn't occur to me ... I was thinking that the fact that
there were live lockers meant that the tuple could not be removed,
obviously failing to notice that the subsequent versions of the tuple
would be good enough.

> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple.  I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.

Sounds good.

I'll revert those commits then, keeping the test, and then you can
commit your change.  OK?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
> Andres Freund wrote:
> > I think the problem is on the pruning, rather than the freezing side. We
> > can't freeze a tuple if it has an alive predecessor - rather than
> > weakining this, we should be fixing the pruning to not have the alive
> > predecessor.
> 
> I gave a look at HTSV back then, but I didn't find what the right tweak
> was, but then I only tried changing the return value to DEAD and
> DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
> on OldestXmin didn't occur to me ... I was thinking that the fact that
> there were live lockers meant that the tuple could not be removed,
> obviously failing to notice that the subsequent versions of the tuple
> would be good enough.

I'll try to write up a commit based on that idea. I think there's some
comment work needed too, Robert and I were both confused by a few
things.
I'm unfortunately travelling atm - it's evening here, and I'll flying
back to the US all Saturday. I'm fairly sure I'll be able to come up
with a decent patch tomorrow, but I'll need review and testing by
others.


> > If the update xmin is actually below the cutoff we can remove the tuple
> > even if there's live lockers - the lockers will also be present in the
> > newer version of the tuple.  I verified that for me that fixes the
> > problem. Obviously that'd require some comment work and more careful
> > diagnosis.
> 
> Sounds good.
> 
> I'll revert those commits then, keeping the test, and then you can
> commit your change.  OK?

Generally that sounds good - but you can't keep the testcase in without
the new fix - the buildfarm would immediately turn red. I guess the best
thing would be to temporarily remove it from the schedule?


Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Do we care about people upgrading to unreleased versions? We could do
> nothing, document it in the release notes, or ???

Do nothing.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Do we care about people upgrading to unreleased versions? We could do
> > nothing, document it in the release notes, or ???
>
> Do nothing.

Agreed.  Not much we can do there.

Thanks!

Stephen

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Do we care about people upgrading to unreleased versions? We could do
> > > nothing, document it in the release notes, or ???
> > 
> > Do nothing.
> 
> Agreed.  Not much we can do there.

Pushed the reverts.

I noticed while doing so that REL_10_STABLE contains the bogus commits.  
Does that change our opinion regarding what to do for people upgrading
to a version containing the broken commits?  I don't think so, because
 1) we hope that not many people will trust their data to 10.0    immediately after release 2) the bug is very low
probability3) it doesn't look like we can do a lot about it anyway.
 

I'll experiment with Andres' proposed fix now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Robert Haas
Date:
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Pushed the reverts.
>
> I noticed while doing so that REL_10_STABLE contains the bogus commits.
> Does that change our opinion regarding what to do for people upgrading
> to a version containing the broken commits?  I don't think so, because
>
>   1) we hope that not many people will trust their data to 10.0
>      immediately after release
>   2) the bug is very low probability
>   3) it doesn't look like we can do a lot about it anyway.

Just to be clear, it looks like "Fix freezing of a dead HOT-updated
tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
was stamped, but "Fix traversal of half-frozen update chains"
(22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
therefore unreleased at present.

Users of 10.0 who hit the code introduced by
46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the
xmax fields of tuples that predate relfrozenxid.  Those tuples will be
hinted-committed.  That's not good, but it might not really have much
in the way of consequences.  *IF* the next VACUUM doesn't get confused
by the old XID, then it will prune the tuple then and I think we'll be
OK.  And I think it won't, because it should just call
HeapTupleSatisfiesVacuum() and that should see that
HEAP_XMAX_COMMITTED is set and not actually try to consult the old
CLOG.  If that hint bit can ever get lost - or fail to propagate to a
standby - then we have more trouble, but the fact that it's set by a
logged operation makes me hope that can't happen. Furthermore, that
follow-on VACUUM should indeed arrive in due time, because we will not
have marked the page all-visible -- HeapTupleSatisfiesVacuum() will
NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(),
and therefore we will have set all_visible = false.

The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous.  The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news.  That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs).  Fortunately, that commit is (I think) not
released anywhere.

Personally, I think it would be best to push the release out a week.
I think we understand this well enough now that we can fix it
relatively easily, but haste makes bugs, and (I know you're all tired
of hearing me say this) patches that implicate the on-disk format are
scary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund <andres@anarazel.de> wrote:
> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakining this, we should be fixing the pruning to not have the alive
> predecessor.

Excellent catch.

> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple.  I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.

I didn't even know that that was safe.

> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they they're
> from entirely independent hot chains. Especially when upgrading 9.3 post
> your fix, to current releases.

Frankly, I'm relieved that you got to this. I was highly suspicious of
a5736bf754c82d8b86674e199e232096c679201d, even beyond my specific,
actionable concern about how it failed to handle the
9.3/FrozenTransactionId xmin case as special. As I went into in the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, these commits left us with a situation where there didn't
seem to be a reliable way of knowing whether or not it is safe to
interrogate clog for a given heap tuple using a tool like amcheck.
And, it wasn't obvious that you couldn't have a codepath that failed
to account for pre-cutoff non-frozen tuples -- codepaths that call
TransactionIdDidCommit() despite it actually being unsafe.

If I'm not mistaken, your proposed fix restores sanity there.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
> I think things get a lot more dangerous.  The problem (as Andres
> pointed out to me this afternoon) is that it seems possible to end up
> with a situation where there should be two HOT chains on a page, and
> because of the weakened xmin/xmax checking rules, we end up thinking
> that the second one is a continuation of the first one, which will be
> all kinds of bad news.  That would be particularly likely to happen in
> releases from before we invented HEAP_XMIN_FROZEN, when there's no
> xmin/xmax matching at all, but could happen on later releases if we
> are extraordinarily unlucky (i.e. xmin of the first tuple in the
> second chain happens to be the same as the pre-freeze xmax in the old
> chain, probably because the same XID was used to update the page in
> two consecutive epochs).  Fortunately, that commit is (I think) not
> released anywhere.

FWIW, if you look at the second commit
(22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
that it doesn't even treat those two cases differently. It was buggy
even on its own terms. The FrozenTransactionId test used an xmin from
HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Just to be clear, it looks like "Fix freezing of a dead HOT-updated
> tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
> was stamped, but "Fix traversal of half-frozen update chains"
> (22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
> therefore unreleased at present.

Thanks for doing this analysis of the actual effects in 10.0.

> Personally, I think it would be best to push the release out a week.

I would only be in favor of that if there were some reason to think that
the bug is worse now than it's been in the four years since 9.3 was
released.  Otherwise, we should ship the bug fixes we have on-schedule.
I think it's a very very safe bet that there are other data-loss-causing
bugs in there, so I see no good reason for panicking over this one.

> ... and (I know you're all tired of hearing me say this) patches that
> implicate the on-disk format are scary.

Agreed, but how is that relevant now that the bogus patches are reverted?
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Robert Haas
Date:
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
>> I think things get a lot more dangerous.  The problem (as Andres
>> pointed out to me this afternoon) is that it seems possible to end up
>> with a situation where there should be two HOT chains on a page, and
>> because of the weakened xmin/xmax checking rules, we end up thinking
>> that the second one is a continuation of the first one, which will be
>> all kinds of bad news.  That would be particularly likely to happen in
>> releases from before we invented HEAP_XMIN_FROZEN, when there's no
>> xmin/xmax matching at all, but could happen on later releases if we
>> are extraordinarily unlucky (i.e. xmin of the first tuple in the
>> second chain happens to be the same as the pre-freeze xmax in the old
>> chain, probably because the same XID was used to update the page in
>> two consecutive epochs).  Fortunately, that commit is (I think) not
>> released anywhere.
>
> FWIW, if you look at the second commit
> (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
> that it doesn't even treat those two cases differently. It was buggy
> even on its own terms. The FrozenTransactionId test used an xmin from
> HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().

Oh, wow.  You seem to be correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Robert Haas
Date:
On Thu, Nov 2, 2017 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Personally, I think it would be best to push the release out a week.
>
> I would only be in favor of that if there were some reason to think that
> the bug is worse now than it's been in the four years since 9.3 was
> released.  Otherwise, we should ship the bug fixes we have on-schedule.
> I think it's a very very safe bet that there are other data-loss-causing
> bugs in there, so I see no good reason for panicking over this one.

Well, my thought was that delaying this release for a week would be
better than either (a) doing an extra minor release just to get this
fix out or (b) waiting another three months to release this fix.  The
former seems like fairly unnecessary work, and the latter doesn't seem
particularly responsible.  Users can't reasonably expect us to fix
data-loss-causing bugs that we don't know about yet, but they can
reasonably expect us to issue fixes promptly for ones that we do know
about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Well, my thought was that delaying this release for a week would be
> better than either (a) doing an extra minor release just to get this
> fix out or (b) waiting another three months to release this fix.  The
> former seems like fairly unnecessary work, and the latter doesn't seem
> particularly responsible.  Users can't reasonably expect us to fix
> data-loss-causing bugs that we don't know about yet, but they can
> reasonably expect us to issue fixes promptly for ones that we do know
> about.

Our experience with "hold the release waiting for a fix for bug X"
decisions has been consistently bad.  Furthermore, if we can't produce
a patch we trust by Monday, I would much rather that we not do it
in a rushed fashion at all.  I think it would be entirely reasonable
to consider making off-cadence releases in, perhaps, a month, once
the dust is entirely settled.
        regards, tom lane


-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-02 06:05:51 -0700, Andres Freund wrote:
> Hi,
> 
> On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > I think the problem is on the pruning, rather than the freezing side. We
> > > can't freeze a tuple if it has an alive predecessor - rather than
> > > weakining this, we should be fixing the pruning to not have the alive
> > > predecessor.
> > 
> > I gave a look at HTSV back then, but I didn't find what the right tweak
> > was, but then I only tried changing the return value to DEAD and
> > DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
> > on OldestXmin didn't occur to me ... I was thinking that the fact that
> > there were live lockers meant that the tuple could not be removed,
> > obviously failing to notice that the subsequent versions of the tuple
> > would be good enough.
> 
> I'll try to write up a commit based on that idea. I think there's some
> comment work needed too, Robert and I were both confused by a few
> things.
> I'm unfortunately travelling atm - it's evening here, and I'll flying
> back to the US all Saturday. I'm fairly sure I'll be able to come up
> with a decent patch tomorrow, but I'll need review and testing by
> others.

Here's that patch.  I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.

I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.

Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running.  It does so without verifying liveliness
of members.  Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one.  Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly.  I hope I'm missing something here?

Greetings,

Andres Freund

-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Peter Geoghegan
Date:
Andres Freund <andres@anarazel.de> wrote:
>Here's that patch.  I've stared at this some, and Robert did too. Robert
>mentioned that the commit message might need some polish and I'm not
>100% sure about the error message texts yet.

The commit message should probably say that the bug involves the
resurrection of previously dead tuples, which is different to there
being duplicates because a constraint is not enforced because HOT chains
are broken (that's a separate, arguably less serious problem).

>Staring at the vacuumlazy hunk I think I might have found a related bug:
>heap_update_tuple() just copies the old xmax to the new tuple's xmax if
>a multixact and still running.  It does so without verifying liveliness
>of members.  Isn't that buggy? Consider what happens if we have three
>blocks: 1 has free space, two is being vacuumed and is locked, three is
>full and has a tuple that's key share locked by a live tuple and is
>updated by a dead xmax from before the xmin horizon. In that case afaict
>the multi will be copied from the third page to the first one.  Which is
>quite bad, because vacuum already processed it, and we'll set
>relfrozenxid accordingly.  I hope I'm missing something here?

Can you be more specific about what you mean here? I think that I
understand where you're going with this, but I'm not sure.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Peter Geoghegan wrote:
> Andres Freund <andres@anarazel.de> wrote:

> > Staring at the vacuumlazy hunk I think I might have found a related bug:
> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> > a multixact and still running.  It does so without verifying liveliness
> > of members.  Isn't that buggy? Consider what happens if we have three
> > blocks: 1 has free space, two is being vacuumed and is locked, three is
> > full and has a tuple that's key share locked by a live tuple and is
> > updated by a dead xmax from before the xmin horizon. In that case afaict
> > the multi will be copied from the third page to the first one.  Which is
> > quite bad, because vacuum already processed it, and we'll set
> > relfrozenxid accordingly.  I hope I'm missing something here?
> 
> Can you be more specific about what you mean here? I think that I
> understand where you're going with this, but I'm not sure.

He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Peter Geoghegan
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>He means that the tuple that heap_update moves to page 1 (which will no
>longer be processed by vacuum) will contain a multixact that's older
>than relminmxid -- because it is copied unchanged by heap_update instead
>of properly checking against age limit.

I see. The problem is more or less with this heap_update() code:
   /*    * And also prepare an Xmax value for the new copy of the tuple.  If there    * was no xmax previously, or
therewas one but all lockers are now gone,    * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
* rare cases that might also be InvalidXid and yet not have the    * HEAP_XMAX_INVALID bit set; that's fine.)    */
if((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||       HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
(checked_lockers&& !locker_remains))       xmax_new_tuple = InvalidTransactionId;   else       xmax_new_tuple =
HeapTupleHeaderGetRawXmax(oldtup.t_data);

My naive guess is that we have to create a new MultiXactId here in at
least some cases, just like FreezeMultiXactId() sometimes does.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Andres Freund
Date:

On November 4, 2017 1:22:04 AM GMT+05:30, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>Peter Geoghegan wrote:
>> Andres Freund <andres@anarazel.de> wrote:
>
>> > Staring at the vacuumlazy hunk I think I might have found a related
>bug:
>> > heap_update_tuple() just copies the old xmax to the new tuple's
>xmax if
>> > a multixact and still running.  It does so without verifying
>liveliness
>> > of members.  Isn't that buggy? Consider what happens if we have
>three
>> > blocks: 1 has free space, two is being vacuumed and is locked,
>three is
>> > full and has a tuple that's key share locked by a live tuple and is
>> > updated by a dead xmax from before the xmin horizon. In that case
>afaict
>> > the multi will be copied from the third page to the first one.
>Which is
>> > quite bad, because vacuum already processed it, and we'll set
>> > relfrozenxid accordingly.  I hope I'm missing something here?
>>
>> Can you be more specific about what you mean here? I think that I
>> understand where you're going with this, but I'm not sure.
>
>He means that the tuple that heap_update moves to page 1 (which will no
>longer be processed by vacuum) will contain a multixact that's older
>than relminmxid -- because it is copied unchanged by heap_update
>instead
>of properly checking against age limit.

Right. That, or a member xid below relminxid. I think both scenarios are possible.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-03 12:36:59 -0700, Peter Geoghegan wrote:
> Andres Freund <andres@anarazel.de> wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> 
> The commit message should probably say that the bug involves the
> resurrection of previously dead tuples, which is different to there
> being duplicates because a constraint is not enforced because HOT chains
> are broken (that's a separate, arguably less serious problem).

The reason for that is that I hadn't yet quite figured out how the bug I
described in the commit message (and the previously committed testcase)
would cause that. I was planning to diagnose / experiment more with this
and write an email if I couldn't come up with an explanation.   The
committed test does *not* actually trigger that.

The reason I couldn't quite figure out how the problem triggers is that
the xmax removing branch in FreezeMultiXactId() can only be reached if
the multi is from before the cutoff - which it can't have been for a
single vacuum execution to trigger the bug, because that'd require the
multi to be running to falsely return RECENTLY_DEAD rather than DEAD (by
definition a multi can't be below the cutoff if running).

For the problem to occur I think vacuum has to be executed *twice*: The
first time through HTSV mistakenly returns RECENTLY_DEAD preventing the
tuple from being pruned. That triggers FreezeMultiXactId() to create a
*new* multi with dead members. At this point the database already is in
a bad state. Then in a second vacuum HTSV returns DEAD, but                 * Ordinarily, DEAD tuples would have been
removedby                 * heap_page_prune(), but it's possible that the tuple                 * state changed since
heap_page_prune()looked.  In                 * particular an INSERT_IN_PROGRESS tuple could have                 *
changedto DEAD if the inserter aborted.  So this                 * cannot be considered an error condition.
   *
 
..                if (HeapTupleIsHotUpdated(&tuple) ||                    HeapTupleIsHeapOnly(&tuple))                {
                  nkeep += 1;
 

prevents the tuple from being removed. If now the multi xmax is below
the xmin horizon it triggers        /*         * If the xid is older than the cutoff, it has to have aborted,         *
otherwisethe tuple would have gotten pruned away.         */        if (TransactionIdPrecedes(xid, cutoff_xid))
{           if (TransactionIdDidCommit(xid))                elog(ERROR, "can't freeze committed xmax");
*flags|= FRM_INVALIDATE_XMAX;
 
in FreezeMultiXact. Without the new elog, this then causes xmax to be
removed, reviving the tuple.


The current testcase, and I think the descriptions in the relevant
threads, all actually fail to point out the precise way the bug is
triggered.  As e.g. evidenced that the freeze-the-dead case appears to
not cause any failures in !assertion builds even if the bug is present.


The good news is that the error checks I added in the patch upthread
prevent all of this from happening, even though I'd not yet understood
the mechanics fully - it's imnsho pretty clear that we need to be more
paranoid in production builds around this.   A bunch of users that
triggered largely "invisible" corruption (the first vacuum described
above) will possibly run into one of these elog()s, but that seems far
preferrable to making the corruption a lot worse.


I think unfortunately the check + elog() in the                    HeapTupleIsHeapOnly(&tuple))                {
           nkeep += 1;
 
                    /*                     * If this were to happen for a tuple that actually                     *
neededto be frozen, we'd be in trouble, because                     * it'd leave a tuple below the relation's xmin
              * horizon alive.                     */                    if (heap_tuple_needs_freeze(tuple.t_data,
FreezeLimit,                                               MultiXactCutoff, buf))                    {
     elog(ERROR, "encountered tuple from before xid cutoff, sleeping");
 
case needs to go, because it's too coarse - there very well could be
lockers or such that need to be removed and where it's fine to do
so. The checks the actual freezing code ought to be sufficient however.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The current testcase, and I think the descriptions in the relevant
> threads, all actually fail to point out the precise way the bug is
> triggered.  As e.g. evidenced that the freeze-the-dead case appears to
> not cause any failures in !assertion builds even if the bug is present.

Trying to write up tests reproducing more of the issues in the area, I
think I might have found a third issue - although I'm not sure how
practically relevant it is:

When FreezeMultiXactId() decides it needs to create a new multi because
the old one is below the cutoff, that attempt can be defeated by the
multixact id caching. If the new multi has exactly the same members the
multixact id cache will just return the existing multi with the same
members. The cache will routinely be primed from the lookup of its
members.

I'm not yet sure how easily this can be hit in practice, because
commonly the multixact horizon should prevent a multi with all its
members living from being below the horizon. I found a situation where
that's not the case with the current bug, but I'm not sif that can
happen otherwise.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation.   The
> committed test does *not* actually trigger that.
> 
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving.  I don't see any need for letting this run
with arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to go over
multiple lines and allow comments. I was wondering about adding that as
a feature just to discover it's already there ;)


What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing.  There's no way we can fix hot chains after the fact, but
preventing dead rows from reapparing seems important.  A minimal version
of that is fairly easy - we slap a bunch of if if
!TransactionIdDidCommit() elog(ERROR) at various code paths. But that'll
often trigger clog access errors when the problem occurred - if we want
to do better we need to pass down relfrozenxid/relminmxid to a few
functions.  I'm inclined to do so, but it'll make the patch larger...

Comments?

Greetings,

Andres Freund

-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund <andres@anarazel.de> wrote:
> Attached is a version of the already existing regression test that both
> reproduces the broken hot chain (and thus failing index lookups) and
> then also the tuple reviving.  I don't see any need for letting this run
> with arbitrary permutations.

I thought that the use of every possible permutation was excessive,
myself. It left us with an isolation test that didn't precisely
describe the behavior that is tested. What you came up with seems far,
far better, especially because of the comments you included. The mail
message-id references seem to add a lot, too.

> What I'm currently wondering about is how much we need to harden
> postgres against such existing corruption. If e.g. the hot chains are
> broken somebody might have reindexed thinking the problem is fixed - but
> if they then later vacuum everything goes to shit again, with dead rows
> reappearing.

I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time? It
might do so for index scans, I suppose, but not for sequential scans.
Are you concerned about a risk of somebody not noticing that
sequential scans are still broken?

Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice. (Before this bug was discovered, I would
have expected amcheck to catch problems like it slightly later, during
the Bloom filter probe for that HOT chain...but, in fact, it never
gets there with corruption from this bug in practice, AFAIK.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-09 16:02:17 -0800, Peter Geoghegan wrote:
> > What I'm currently wondering about is how much we need to harden
> > postgres against such existing corruption. If e.g. the hot chains are
> > broken somebody might have reindexed thinking the problem is fixed - but
> > if they then later vacuum everything goes to shit again, with dead rows
> > reappearing.
> 
> I don't follow you here. Why would REINDEXing make the rows that
> should be dead disappear again, even for a short period of time?

It's not the REINDEX that makes them reappear. It's the second
vacuum. The reindex part was about $user trying to fix the problem...
As you need two vacuums with appropriate cutoffs to hit the "rows
revive" problem, that'll often in practice not happen immediately.


> Actually, on second thought, I take that back -- I don't think that
> REINDEXing will even finish once a HOT chain is broken by the bug.
> IndexBuildHeapScan() actually does quite a good job of making sure
> that HOT chains are sane, which is how the enhanced amcheck notices
> the bug here in practice.

I think that's too optimistic.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
>> I don't follow you here. Why would REINDEXing make the rows that
>> should be dead disappear again, even for a short period of time?
>
> It's not the REINDEX that makes them reappear.

Of course. I was just trying to make sense of what you said.

> It's the second
> vacuum. The reindex part was about $user trying to fix the problem...
> As you need two vacuums with appropriate cutoffs to hit the "rows
> revive" problem, that'll often in practice not happen immediately.

This explanation clears things up, though.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
>> Actually, on second thought, I take that back -- I don't think that
>> REINDEXing will even finish once a HOT chain is broken by the bug.
>> IndexBuildHeapScan() actually does quite a good job of making sure
>> that HOT chains are sane, which is how the enhanced amcheck notices
>> the bug here in practice.
>
> I think that's too optimistic.

Why? Because the "find the TID of the root" logic in
IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
actual root (it might be some other HOT chain root following TID
recycling by VACUUM)?

Assuming that's what you meant: I would have thought that the
xmin/xmax matching within heap_get_root_tuples() makes the sanity
checking fairly reliable in practice.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-09 16:45:07 -0800, Peter Geoghegan wrote:
> On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
> >> Actually, on second thought, I take that back -- I don't think that
> >> REINDEXing will even finish once a HOT chain is broken by the bug.
> >> IndexBuildHeapScan() actually does quite a good job of making sure
> >> that HOT chains are sane, which is how the enhanced amcheck notices
> >> the bug here in practice.
> >
> > I think that's too optimistic.
> 
> Why? Because the "find the TID of the root" logic in
> IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
> actual root (it might be some other HOT chain root following TID
> recycling by VACUUM)?

Primarily because it's not an anti-corruption tool. I'd be surprised if
there weren't ways to corrupt the page using these corruptions that
aren't detected by it.  But even if it were, I don't think there's
enough information to do so in the general case.  You very well can end
up with pages where subsequent hot pruning has removed a good bit of the
direct evidence of this bug.

But I'm not really sure why the error detection capabilities of matter
much for the principal point I raised, which is how much work we need to
do to not further worsen the corruption.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:


On Thu, Nov 9, 2017 at 4:53 PM, Andres Freund <andres@anarazel.de> wrote:
> Primarily because it's not an anti-corruption tool. I'd be surprised if
> there weren't ways to corrupt the page using these corruptions that
> aren't detected by it.

It's very hard to assess the risk of missing something that's actually detectable with total confidence, but I think that the check is actually very thorough.

> But even if it were, I don't think there's
> enough information to do so in the general case. You very well can end
> up with pages where subsequent hot pruning has removed a good bit of the
> direct evidence of this bug.

Sure, but maybe those are cases that can't get any worse anyway. So the question of avoiding making it worse doesn't arise. 

> But I'm not really sure why the error detection capabilities of matter
> much for the principal point I raised, which is how much work we need to
> do to not further worsen the corruption.

You're right. Just trying to put the risk in context, and to understand the extent of the concern that you have. 

--
Peter Geoghegan

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> Here's that patch.  I've stared at this some, and Robert did too. Robert
> mentioned that the commit message might need some polish and I'm not
> 100% sure about the error message texts yet.
> 
> I'm not yet convinced that the new elog in vacuumlazy can never trigger
> - but I also don't think we want to actually freeze the tuple in that
> case.

I'm fairly sure it could be triggered, therefore I've rewritten that.

I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse.  It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required.  The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.


Please review!


One thing I'm wondering about is whether we have any way for users to
diagnose whether they need to dump & restore - I can't really think of
anything actually meaningful. Reindexing will find some instances, but
far from all. VACUUM (disable_page_skipping, freeze) pg_class will
also find a bunch.  Not a perfect story.


> Staring at the vacuumlazy hunk I think I might have found a related bug:
> heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> a multixact and still running.  It does so without verifying liveliness
> of members.  Isn't that buggy? Consider what happens if we have three
> blocks: 1 has free space, two is being vacuumed and is locked, three is
> full and has a tuple that's key share locked by a live tuple and is
> updated by a dead xmax from before the xmin horizon. In that case afaict
> the multi will be copied from the third page to the first one.  Which is
> quite bad, because vacuum already processed it, and we'll set
> relfrozenxid accordingly.  I hope I'm missing something here?

Trying to write a testcase for that now.

Greetings,

Andres Freund

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> Here's that patch.  I've stared at this some, and Robert did too. Robert
> mentioned that the commit message might need some polish and I'm not
> 100% sure about the error message texts yet.
> 
> I'm not yet convinced that the new elog in vacuumlazy can never trigger
> - but I also don't think we want to actually freeze the tuple in that
> case.

I'm fairly sure it could be triggered, therefore I've rewritten that.

I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse.  It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required.  The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.


Please review!


One thing I'm wondering about is whether we have any way for users to
diagnose whether they need to dump & restore - I can't really think of
anything actually meaningful. Reindexing will find some instances, but
far from all. VACUUM (disable_page_skipping, freeze) pg_class will
also find a bunch.  Not a perfect story.


> Staring at the vacuumlazy hunk I think I might have found a related bug:
> heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> a multixact and still running.  It does so without verifying liveliness
> of members.  Isn't that buggy? Consider what happens if we have three
> blocks: 1 has free space, two is being vacuumed and is locked, three is
> full and has a tuple that's key share locked by a live tuple and is
> updated by a dead xmax from before the xmin horizon. In that case afaict
> the multi will be copied from the third page to the first one.  Which is
> quite bad, because vacuum already processed it, and we'll set
> relfrozenxid accordingly.  I hope I'm missing something here?

Trying to write a testcase for that now.

Greetings,

Andres Freund

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> > 
> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > - but I also don't think we want to actually freeze the tuple in that
> > case.
> 
> I'm fairly sure it could be triggered, therefore I've rewritten that.
> 
> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.
> 
> 
> Please review!

Please note:
I'd forgotten to git add the change to isolation_schedule re-activating
the freeze-the-dead regression test.

- Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> > 
> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > - but I also don't think we want to actually freeze the tuple in that
> > case.
> 
> I'm fairly sure it could be triggered, therefore I've rewritten that.
> 
> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.
> 
> 
> Please review!

Please note:
I'd forgotten to git add the change to isolation_schedule re-activating
the freeze-the-dead regression test.

- Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > Staring at the vacuumlazy hunk I think I might have found a related bug:
> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> > a multixact and still running.  It does so without verifying liveliness
> > of members.  Isn't that buggy? Consider what happens if we have three
> > blocks: 1 has free space, two is being vacuumed and is locked, three is
> > full and has a tuple that's key share locked by a live tuple and is
> > updated by a dead xmax from before the xmin horizon. In that case afaict
> > the multi will be copied from the third page to the first one.  Which is
> > quite bad, because vacuum already processed it, and we'll set
> > relfrozenxid accordingly.  I hope I'm missing something here?
> 
> Trying to write a testcase for that now.

This indeed happens, but I can't quite figure out a way to write an
isolationtester test for this. The problem is that to have something
reproducible one has to make vacuum block on a cleanup lock, but that
currently doesn't register as waiting for the purpose of
isolationtester's logic.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > Staring at the vacuumlazy hunk I think I might have found a related bug:
> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> > a multixact and still running.  It does so without verifying liveliness
> > of members.  Isn't that buggy? Consider what happens if we have three
> > blocks: 1 has free space, two is being vacuumed and is locked, three is
> > full and has a tuple that's key share locked by a live tuple and is
> > updated by a dead xmax from before the xmin horizon. In that case afaict
> > the multi will be copied from the third page to the first one.  Which is
> > quite bad, because vacuum already processed it, and we'll set
> > relfrozenxid accordingly.  I hope I'm missing something here?
> 
> Trying to write a testcase for that now.

This indeed happens, but I can't quite figure out a way to write an
isolationtester test for this. The problem is that to have something
reproducible one has to make vacuum block on a cleanup lock, but that
currently doesn't register as waiting for the purpose of
isolationtester's logic.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> > 
> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > - but I also don't think we want to actually freeze the tuple in that
> > case.
> 
> I'm fairly sure it could be triggered, therefore I've rewritten that.
> 
> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.
> 
> 
> Please review!

Ping? Alvaro, it'd be good to get some input here.

- Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> > 
> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > - but I also don't think we want to actually freeze the tuple in that
> > case.
> 
> I'm fairly sure it could be triggered, therefore I've rewritten that.
> 
> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.
> 
> 
> Please review!

Ping? Alvaro, it'd be good to get some input here.

- Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Tue, Nov 21, 2017 at 4:18 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
>> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
>> > Here's that patch.  I've stared at this some, and Robert did too. Robert
>> > mentioned that the commit message might need some polish and I'm not
>> > 100% sure about the error message texts yet.
>> >
>> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
>> > - but I also don't think we want to actually freeze the tuple in that
>> > case.
>>
>> I'm fairly sure it could be triggered, therefore I've rewritten that.
>>
>> I've played around quite some with the attached patch. So far, after
>> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
>> the situation worse for already existing corruption. HOT pruning can
>> change the exact appearance of existing corruption a bit, but I don't
>> think it can make the corruption meaningfully worse.  It's a bit
>> annoying and scary to add so many checks to backbranches but it kinda
>> seems required.  The error message texts aren't perfect, but these are
>> "should never be hit" type elog()s so I'm not too worried about that.
>>
>> Please review!
>
> Ping? Alvaro, it'd be good to get some input here.

Note that I will be able to jump on the ship after being released from
commit fest duties. This is likely a multi-day task for testing and
looking at it, and I am not the most knowledgeable human being with
this code.
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Tue, Nov 21, 2017 at 4:18 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
>> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
>> > Here's that patch.  I've stared at this some, and Robert did too. Robert
>> > mentioned that the commit message might need some polish and I'm not
>> > 100% sure about the error message texts yet.
>> >
>> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
>> > - but I also don't think we want to actually freeze the tuple in that
>> > case.
>>
>> I'm fairly sure it could be triggered, therefore I've rewritten that.
>>
>> I've played around quite some with the attached patch. So far, after
>> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
>> the situation worse for already existing corruption. HOT pruning can
>> change the exact appearance of existing corruption a bit, but I don't
>> think it can make the corruption meaningfully worse.  It's a bit
>> annoying and scary to add so many checks to backbranches but it kinda
>> seems required.  The error message texts aren't perfect, but these are
>> "should never be hit" type elog()s so I'm not too worried about that.
>>
>> Please review!
>
> Ping? Alvaro, it'd be good to get some input here.

Note that I will be able to jump on the ship after being released from
commit fest duties. This is likely a multi-day task for testing and
looking at it, and I am not the most knowledgeable human being with
this code.
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-20 11:18:45 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > Hi,
> > 
> > On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > > mentioned that the commit message might need some polish and I'm not
> > > 100% sure about the error message texts yet.
> > > 
> > > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > > - but I also don't think we want to actually freeze the tuple in that
> > > case.
> > 
> > I'm fairly sure it could be triggered, therefore I've rewritten that.
> > 
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse.  It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required.  The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
> > 
> > 
> > Please review!
> 
> Ping? Alvaro, it'd be good to get some input here.

Ping.  I'm a bit surprised that a bug fixing a significant data
corruption issue has gotten no reviews at all.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-11-20 11:18:45 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > Hi,
> > 
> > On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > > mentioned that the commit message might need some polish and I'm not
> > > 100% sure about the error message texts yet.
> > > 
> > > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > > - but I also don't think we want to actually freeze the tuple in that
> > > case.
> > 
> > I'm fairly sure it could be triggered, therefore I've rewritten that.
> > 
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse.  It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required.  The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
> > 
> > 
> > Please review!
> 
> Ping? Alvaro, it'd be good to get some input here.

Ping.  I'm a bit surprised that a bug fixing a significant data
corruption issue has gotten no reviews at all.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Wed, Dec 6, 2017 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
> Ping.  I'm a bit surprised that a bug fixing a significant data
> corruption issue has gotten no reviews at all.

Note that I was planning to look at this problem today and tomorrow my
time, getting stuck for CF handling last week and conference this
week. If you think that helps, I'll be happy to help at the extent of
what I can do.
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Wed, Dec 6, 2017 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
> Ping.  I'm a bit surprised that a bug fixing a significant data
> corruption issue has gotten no reviews at all.

Note that I was planning to look at this problem today and tomorrow my
time, getting stuck for CF handling last week and conference this
week. If you think that helps, I'll be happy to help at the extent of
what I can do.
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
I think you've done a stellar job of identifying what the actual problem
was.  I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.

freeze-the-dead is not listed in isolation_schedule; an easy fix.
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

            if (!TransactionIdPrecedes(xmax, OldestXmin))
                return HEAPTUPLE_RECENTLY_DEAD;

            return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody).  Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function.  There is no reason for the
"else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
I think you've done a stellar job of identifying what the actual problem
was.  I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.

freeze-the-dead is not listed in isolation_schedule; an easy fix.
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

            if (!TransactionIdPrecedes(xmax, OldestXmin))
                return HEAPTUPLE_RECENTLY_DEAD;

            return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody).  Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function.  There is no reason for the
"else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Andres Freund wrote:

> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.

Looking at 0002: I agree with the stuff being done here.  I think a
couple of these checks could be moved one block outerwards in term of
scope; I don't see any reason why the check should not apply in that
case.  I didn't catch any place missing additional checks.

Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about.  No translation required,
though.  Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.

I started thinking it'd be good to report block number whenever anything
happened while scanning the relation.  The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that.  (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)

It appears that you got all the places that seem to reasonably need
additional checks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Andres Freund wrote:

> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.

Looking at 0002: I agree with the stuff being done here.  I think a
couple of these checks could be moved one block outerwards in term of
scope; I don't see any reason why the check should not apply in that
case.  I didn't catch any place missing additional checks.

Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about.  No translation required,
though.  Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.

I started thinking it'd be good to report block number whenever anything
happened while scanning the relation.  The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that.  (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)

It appears that you got all the places that seem to reasonably need
additional checks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Put together, I propose the attached delta for 0001.

I have been looking at Andres' 0001 and your tweaks here for some time
since yesterday...

I have also performed sanity checks using all the scripts that have
accumulated on my archives for this stuff. This looks solid to me. I
have not seen failures with broken hot chains, REINDEX, etc.

> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody).  Also, this way it is consistent with the other comparison to
> OldestXmin at the bottom of the function.  There is no reason for the
> "else" or the extra braces.

+1. It would be nice to add a comment in the patched portion
mentioning that the new code had better match what is at the bottom of
the function.

+       else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+       {
+           /*
+            * Not in Progress, Not Committed, so either Aborted or crashed.
+            * Mark the Xmax as invalid.
+            */
+           SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
        }

-       /*
-        * Not in Progress, Not Committed, so either Aborted or crashed.
-        * Remove the Xmax.
-        */
-       SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
        return HEAPTUPLE_LIVE;

I would find cleaner if the last "else if" is put into its own
separate if condition, and that for a multixact still running this
refers to an updating transaction aborted so hint bits are not set.

> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

(Nit: I have extracted from the test case of Dan an isolation test,
which Andres has reduced to a subset of permutations. Peter G. also
complained about what is visibly the same bug we are discussing here
but without a test case.)
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Put together, I propose the attached delta for 0001.

I have been looking at Andres' 0001 and your tweaks here for some time
since yesterday...

I have also performed sanity checks using all the scripts that have
accumulated on my archives for this stuff. This looks solid to me. I
have not seen failures with broken hot chains, REINDEX, etc.

> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody).  Also, this way it is consistent with the other comparison to
> OldestXmin at the bottom of the function.  There is no reason for the
> "else" or the extra braces.

+1. It would be nice to add a comment in the patched portion
mentioning that the new code had better match what is at the bottom of
the function.

+       else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+       {
+           /*
+            * Not in Progress, Not Committed, so either Aborted or crashed.
+            * Mark the Xmax as invalid.
+            */
+           SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
        }

-       /*
-        * Not in Progress, Not Committed, so either Aborted or crashed.
-        * Remove the Xmax.
-        */
-       SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
        return HEAPTUPLE_LIVE;

I would find cleaner if the last "else if" is put into its own
separate if condition, and that for a multixact still running this
refers to an updating transaction aborted so hint bits are not set.

> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

(Nit: I have extracted from the test case of Dan an isolation test,
which Andres has reduced to a subset of permutations. Peter G. also
complained about what is visibly the same bug we are discussing here
but without a test case.)
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Looking at 0002: I agree with the stuff being done here.

The level of details you are providing with a proper error code is an
improvement over the first version proposed in my opinion.

> I think a
> couple of these checks could be moved one block outerwards in term of
> scope; I don't see any reason why the check should not apply in that
> case. I didn't catch any place missing additional checks.

In FreezeMultiXactId() wouldn't it be better to issue an error as well
for this assertion?
Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about.  No translation required,
> though.  Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

+           if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+               TransactionIdDidCommit(xid))
+               ereport(ERROR,
+                       (errcode(ERRCODE_DATA_CORRUPTED),
+                        errmsg("can't freeze committed xmax %u", xid)));
The usual wording used in errmsg is not the "can't" but "cannot".

+               ereport(ERROR,
+                       (errcode(ERRCODE_DATA_CORRUPTED),
+                        errmsg_internal("uncommitted Xmin %u from
before xid cutoff %u needs to be frozen",
+                                        xid, cutoff_xid)));
"Xmin" I have never seen, but "xmin" I did.

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation.  The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that.  (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

I would recommend to start a new thread and to add that patch to the
next commit fest as you would get more visibility and input from other
folks on -hackers. It looks like a good idea to me.
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Looking at 0002: I agree with the stuff being done here.

The level of details you are providing with a proper error code is an
improvement over the first version proposed in my opinion.

> I think a
> couple of these checks could be moved one block outerwards in term of
> scope; I don't see any reason why the check should not apply in that
> case. I didn't catch any place missing additional checks.

In FreezeMultiXactId() wouldn't it be better to issue an error as well
for this assertion?
Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about.  No translation required,
> though.  Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

+           if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+               TransactionIdDidCommit(xid))
+               ereport(ERROR,
+                       (errcode(ERRCODE_DATA_CORRUPTED),
+                        errmsg("can't freeze committed xmax %u", xid)));
The usual wording used in errmsg is not the "can't" but "cannot".

+               ereport(ERROR,
+                       (errcode(ERRCODE_DATA_CORRUPTED),
+                        errmsg_internal("uncommitted Xmin %u from
before xid cutoff %u needs to be frozen",
+                                        xid, cutoff_xid)));
"Xmin" I have never seen, but "xmin" I did.

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation.  The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that.  (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

I would recommend to start a new thread and to add that patch to the
next commit fest as you would get more visibility and input from other
folks on -hackers. It looks like a good idea to me.
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-12-06 13:21:15 -0300, Alvaro Herrera wrote:
> I think you've done a stellar job of identifying what the actual problem
> was.  I like the new (simpler) coding of that portion of
> HeapTupleSatisfiesVacuum.

Thanks!

> freeze-the-dead is not listed in isolation_schedule; an easy fix.

Yea, I'd sent an update about that, stupidly forgot git amend the
commit...


> I confirm that the test crashes with an assertion failure without the
> code fix, and that it doesn't with it.
> 
> I think the comparison to OldestXmin should be reversed:
> 
>             if (!TransactionIdPrecedes(xmax, OldestXmin))
>                 return HEAPTUPLE_RECENTLY_DEAD;
> 
>             return HEAPTUPLE_DEAD;
> 
> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody).

Yes, I think you're right. That's a bug.


> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

Yea, you're right. I was writing it with 14h of jetlag, apparently that
does something to your brain...

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse.  It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required.  The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
> 
> Looking at 0002: I agree with the stuff being done here.  I think a
> couple of these checks could be moved one block outerwards in term of
> scope; I don't see any reason why the check should not apply in that
> case.  I didn't catch any place missing additional checks.

I think I largely put them into the inner blocks because they were
guaranteed to be reached in those case (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.


> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about.  No translation required,
> though.  Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

Hm, I don't really care one way or another. I do see that you used
errmsg() in some places, errmsg_internal() in others. Was that
intentional?

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation.  The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that.  (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

That seems like a good idea. There's some cases where that could
increase log spam noticeably (unitialized blocks), but that seems
acceptable.


> +static void
> +lazy_scan_heap_cb(void *arg)
> +{
> +    LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg;
> +
> +    if (info->blkno != InvalidBlockNumber)
> +        errcontext("while scanning page %u of relation %s",
> +                   info->blkno, RelationGetRelationName(info->relation));
> +    else
> +        errcontext("while vacuuming relation %s",
> +                   RelationGetRelationName(info->relation));
> +}

Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
replacing scanning with vacuuming?

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-07 12:08:38 +0900, Michael Paquier wrote:
> > Your commit message does a poor job of acknowledging prior work on
> > diagnosing the problem starting from Dan's initial test case and patch.
> 
> (Nit: I have extracted from the test case of Dan an isolation test,
> which Andres has reduced to a subset of permutations. Peter G. also
> complained about what is visibly the same bug we are discussing here
> but without a test case.)

The previous versions of the test case didn't actually hit the real
issues, so I don't think that matter much.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Hello,

Andres Freund wrote:

> On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > > I've played around quite some with the attached patch. So far, after
> > > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > > the situation worse for already existing corruption. HOT pruning can
> > > change the exact appearance of existing corruption a bit, but I don't
> > > think it can make the corruption meaningfully worse.  It's a bit
> > > annoying and scary to add so many checks to backbranches but it kinda
> > > seems required.  The error message texts aren't perfect, but these are
> > > "should never be hit" type elog()s so I'm not too worried about that.
> > 
> > Looking at 0002: I agree with the stuff being done here.  I think a
> > couple of these checks could be moved one block outerwards in term of
> > scope; I don't see any reason why the check should not apply in that
> > case.  I didn't catch any place missing additional checks.
> 
> I think I largely put them into the inner blocks because they were
> guaranteed to be reached in those case (the horizon has to be before the
> cutoff etc), and that way additional branches are avoided.

Hmm, it should be possible to call vacuum with a very low freeze_min_age
(which sets a very recent relfrozenxid), then shortly thereafter call it
with a large one, no?  So it's not really guaranteed ...


> > Despite these being "shouldn't happen" conditions, I think we should
> > turn these up all the way to ereports with an errcode and all, and also
> > report the XIDs being complained about.  No translation required,
> > though.  Other than those changes and minor copy editing a commit
> > (attached), 0002 looks good to me.
> 
> Hm, I don't really care one way or another. I do see that you used
> errmsg() in some places, errmsg_internal() in others. Was that
> intentional?

Eh, no, my intention was to make these all errmsg_internal() to avoid
translation (serves no purpose here).  Feel free to update the remaining
ones.


> > I started thinking it'd be good to report block number whenever anything
> > happened while scanning the relation.  The best way to go about this
> > seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> > a WIP untested patch to add that.  (I'm not proposing this for
> > back-patch for now, mostly because I don't have the time/energy to push
> > for it right now.)
> 
> That seems like a good idea. There's some cases where that could
> increase log spam noticeably (unitialized blocks), but that seems
> acceptable.

Yeah, I noticed that and I agree it seems ok.

> > +    if (info->blkno != InvalidBlockNumber)
> > +        errcontext("while scanning page %u of relation %s",
> > +                   info->blkno, RelationGetRelationName(info->relation));
> > +    else
> > +        errcontext("while vacuuming relation %s",
> > +                   RelationGetRelationName(info->relation));
> 
> Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
> replacing scanning with vacuuming?

Makes sense.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-07 17:41:56 -0300, Alvaro Herrera wrote:
> > > Looking at 0002: I agree with the stuff being done here.  I think a
> > > couple of these checks could be moved one block outerwards in term of
> > > scope; I don't see any reason why the check should not apply in that
> > > case.  I didn't catch any place missing additional checks.
> > 
> > I think I largely put them into the inner blocks because they were
> > guaranteed to be reached in those case (the horizon has to be before the
> > cutoff etc), and that way additional branches are avoided.
> 
> Hmm, it should be possible to call vacuum with a very low freeze_min_age
> (which sets a very recent relfrozenxid), then shortly thereafter call it
> with a large one, no?  So it's not really guaranteed ...

Fair point!

- Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-07 18:32:51 +0900, Michael Paquier wrote:
> On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Looking at 0002: I agree with the stuff being done here.
> 
> The level of details you are providing with a proper error code is an
> improvement over the first version proposed in my opinion.
> 
> > I think a
> > couple of these checks could be moved one block outerwards in term of
> > scope; I don't see any reason why the check should not apply in that
> > case. I didn't catch any place missing additional checks.
> 
> In FreezeMultiXactId() wouldn't it be better to issue an error as well
> for this assertion?
> Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

I'm not really concerned that much about pure lockers, they don't cause
permanent data corruption...


> > Despite these being "shouldn't happen" conditions, I think we should
> > turn these up all the way to ereports with an errcode and all, and also
> > report the XIDs being complained about.  No translation required,
> > though.  Other than those changes and minor copy editing a commit
> > (attached), 0002 looks good to me.

If you want to go around doing that in some more places we can do so in
master only...


> +           if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
> +               TransactionIdDidCommit(xid))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                        errmsg("can't freeze committed xmax %u", xid)));
> The usual wording used in errmsg is not the "can't" but "cannot".
> 
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                        errmsg_internal("uncommitted Xmin %u from
> before xid cutoff %u needs to be frozen",
> +                                        xid, cutoff_xid)));
> "Xmin" I have never seen, but "xmin" I did.

Changed...

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-07 18:32:51 +0900, Michael Paquier wrote:
> On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Looking at 0002: I agree with the stuff being done here.
> 
> The level of details you are providing with a proper error code is an
> improvement over the first version proposed in my opinion.
> 
> > I think a
> > couple of these checks could be moved one block outerwards in term of
> > scope; I don't see any reason why the check should not apply in that
> > case. I didn't catch any place missing additional checks.
> 
> In FreezeMultiXactId() wouldn't it be better to issue an error as well
> for this assertion?
> Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

I'm not really concerned that much about pure lockers, they don't cause
permanent data corruption...


> > Despite these being "shouldn't happen" conditions, I think we should
> > turn these up all the way to ereports with an errcode and all, and also
> > report the XIDs being complained about.  No translation required,
> > though.  Other than those changes and minor copy editing a commit
> > (attached), 0002 looks good to me.

If you want to go around doing that in some more places we can do so in
master only...


> +           if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
> +               TransactionIdDidCommit(xid))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                        errmsg("can't freeze committed xmax %u", xid)));
> The usual wording used in errmsg is not the "can't" but "cannot".
> 
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                        errmsg_internal("uncommitted Xmin %u from
> before xid cutoff %u needs to be frozen",
> +                                        xid, cutoff_xid)));
> "Xmin" I have never seen, but "xmin" I did.

Changed...

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index f93c194e182..7d163c91379 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
>       * While we have our hands on the tuple, we may as well freeze any
>       * eligible xmin or xmax, so that future VACUUM effort can be saved.
>       */
> -    heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
> +    heap_freeze_tuple(new_tuple->t_data,
> +                      state->rs_old_rel->rd_rel->relfrozenxid,
> +                      state->rs_old_rel->rd_rel->relminmxid,
> +                      state->rs_freeze_xid,
>                        state->rs_cutoff_multi);

Hm. So this requires backpatching the introduction of
RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
argument to begin_heap_rewrite().  It originally was added in the
logical decoding commit (i.e. 9.4).

I'm fine with that, but it could theoretically cause issues for somebody
with an extension that calls begin_heap_rewrite() - which seems unlikely
and I couldn't find any that does so.

Does anybody have a problem with that?

Regards,

Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
Hi,

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index f93c194e182..7d163c91379 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
>       * While we have our hands on the tuple, we may as well freeze any
>       * eligible xmin or xmax, so that future VACUUM effort can be saved.
>       */
> -    heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
> +    heap_freeze_tuple(new_tuple->t_data,
> +                      state->rs_old_rel->rd_rel->relfrozenxid,
> +                      state->rs_old_rel->rd_rel->relminmxid,
> +                      state->rs_freeze_xid,
>                        state->rs_cutoff_multi);

Hm. So this requires backpatching the introduction of
RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
argument to begin_heap_rewrite().  It originally was added in the
logical decoding commit (i.e. 9.4).

I'm fine with that, but it could theoretically cause issues for somebody
with an extension that calls begin_heap_rewrite() - which seems unlikely
and I couldn't find any that does so.

Does anybody have a problem with that?

Regards,

Andres


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-14 17:00:29 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> > index f93c194e182..7d163c91379 100644
> > --- a/src/backend/access/heap/rewriteheap.c
> > +++ b/src/backend/access/heap/rewriteheap.c
> > @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
> >       * While we have our hands on the tuple, we may as well freeze any
> >       * eligible xmin or xmax, so that future VACUUM effort can be saved.
> >       */
> > -    heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
> > +    heap_freeze_tuple(new_tuple->t_data,
> > +                      state->rs_old_rel->rd_rel->relfrozenxid,
> > +                      state->rs_old_rel->rd_rel->relminmxid,
> > +                      state->rs_freeze_xid,
> >                        state->rs_cutoff_multi);
> 
> Hm. So this requires backpatching the introduction of
> RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
> argument to begin_heap_rewrite().  It originally was added in the
> logical decoding commit (i.e. 9.4).
> 
> I'm fine with that, but it could theoretically cause issues for somebody
> with an extension that calls begin_heap_rewrite() - which seems unlikely
> and I couldn't find any that does so.
> 
> Does anybody have a problem with that?

Pushed this way.  Moved some more relfrozenxid/relminmxid tests outside
of the cutoff changes, polished some error messages.


Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-14 17:00:29 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> > index f93c194e182..7d163c91379 100644
> > --- a/src/backend/access/heap/rewriteheap.c
> > +++ b/src/backend/access/heap/rewriteheap.c
> > @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
> >       * While we have our hands on the tuple, we may as well freeze any
> >       * eligible xmin or xmax, so that future VACUUM effort can be saved.
> >       */
> > -    heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
> > +    heap_freeze_tuple(new_tuple->t_data,
> > +                      state->rs_old_rel->rd_rel->relfrozenxid,
> > +                      state->rs_old_rel->rd_rel->relminmxid,
> > +                      state->rs_freeze_xid,
> >                        state->rs_cutoff_multi);
> 
> Hm. So this requires backpatching the introduction of
> RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
> argument to begin_heap_rewrite().  It originally was added in the
> logical decoding commit (i.e. 9.4).
> 
> I'm fine with that, but it could theoretically cause issues for somebody
> with an extension that calls begin_heap_rewrite() - which seems unlikely
> and I couldn't find any that does so.
> 
> Does anybody have a problem with that?

Pushed this way.  Moved some more relfrozenxid/relminmxid tests outside
of the cutoff changes, polished some error messages.


Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres@anarazel.de> wrote:
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have gone through the backpatched versions for the fixes in tuple
pruning, running some tests on the way and those look good to me. I
have not taken the time to go through the ones changing the assertions
to ereport() though...
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Michael Paquier
Date:
On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres@anarazel.de> wrote:
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have gone through the backpatched versions for the fixes in tuple
pruning, running some tests on the way and those look good to me. I
have not taken the time to go through the ones changing the assertions
to ereport() though...
-- 
Michael


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres@anarazel.de> wrote:
> Pushed this way.  Moved some more relfrozenxid/relminmxid tests outside
> of the cutoff changes, polished some error messages.
>
>
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have one minor piece of feedback on the upgrading of assertions to
ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
upgrade the raw elog() "can't happen" error within
IndexBuildHeapRangeScan() to be an ereport() with
ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
tuple for heap-only tuple" error, which I think merits being a real
user-visible error, just like the relfrozenxid/relminmxid tests are
now. As you know, the enhanced amcheck will sometimes detect
corruption due to this bug by hitting that error.

It would be nice if we could tighten up the number of errcodes that
can be involved in an error that amcheck detects. I know that elog()
implicitly has an errcode, that could be included in the errcodes to
check when verification occurs in an automated fashion across a large
number of databases. However, this is a pretty esoteric point, and I'd
prefer to just try to limit it to ERRCODE_DATA_CORRUPTION and
ERRCODE_INDEX_CORRUPTED, insofar as that's practical. When we ran
amcheck on the Heroku fleet, back when I worked there, there were all
kinds of non-interesting errors that could occur that needed to be
filtered out. I want to try to make that process somewhat less painful.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres@anarazel.de> wrote:
> Pushed this way.  Moved some more relfrozenxid/relminmxid tests outside
> of the cutoff changes, polished some error messages.
>
>
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have one minor piece of feedback on the upgrading of assertions to
ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
upgrade the raw elog() "can't happen" error within
IndexBuildHeapRangeScan() to be an ereport() with
ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
tuple for heap-only tuple" error, which I think merits being a real
user-visible error, just like the relfrozenxid/relminmxid tests are
now. As you know, the enhanced amcheck will sometimes detect
corruption due to this bug by hitting that error.

It would be nice if we could tighten up the number of errcodes that
can be involved in an error that amcheck detects. I know that elog()
implicitly has an errcode, that could be included in the errcodes to
check when verification occurs in an automated fashion across a large
number of databases. However, this is a pretty esoteric point, and I'd
prefer to just try to limit it to ERRCODE_DATA_CORRUPTION and
ERRCODE_INDEX_CORRUPTED, insofar as that's practical. When we ran
amcheck on the Heroku fleet, back when I worked there, there were all
kinds of non-interesting errors that could occur that needed to be
filtered out. I want to try to make that process somewhat less painful.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-15 20:25:22 +0900, Michael Paquier wrote:
> On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres@anarazel.de> wrote:
> > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> > could have a look at the backported version, just about everything but
> > v10 had conflicts, some of them not insubstantial.
> 
> I have gone through the backpatched versions for the fixes in tuple
> pruning, running some tests on the way and those look good to me.

Thanks.


> I have not taken the time to go through the ones changing the
> assertions to ereport() though...

Those were the ones with a lot of conflicts tho - I'd temporarily broken
freezing for 9.3, but both review and testing caught it...

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-15 20:25:22 +0900, Michael Paquier wrote:
> On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres@anarazel.de> wrote:
> > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> > could have a look at the backported version, just about everything but
> > v10 had conflicts, some of them not insubstantial.
> 
> I have gone through the backpatched versions for the fixes in tuple
> pruning, running some tests on the way and those look good to me.

Thanks.


> I have not taken the time to go through the ones changing the
> assertions to ereport() though...

Those were the ones with a lot of conflicts tho - I'd temporarily broken
freezing for 9.3, but both review and testing caught it...

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-15 10:46:05 -0800, Peter Geoghegan wrote:
> On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres@anarazel.de> wrote:
> > Pushed this way.  Moved some more relfrozenxid/relminmxid tests outside
> > of the cutoff changes, polished some error messages.
> >
> >
> > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> > could have a look at the backported version, just about everything but
> > v10 had conflicts, some of them not insubstantial.
> 
> I have one minor piece of feedback on the upgrading of assertions to
> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
> upgrade the raw elog() "can't happen" error within
> IndexBuildHeapRangeScan() to be an ereport() with
> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
> tuple for heap-only tuple" error, which I think merits being a real
> user-visible error, just like the relfrozenxid/relminmxid tests are
> now. As you know, the enhanced amcheck will sometimes detect
> corruption due to this bug by hitting that error.

I'm not opposed to that, it just seems independent from this thread. Not
sure I really want to go around and backpatch such a change, that code
has changed a bit between branches. Happy to do so on master.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-15 10:46:05 -0800, Peter Geoghegan wrote:
> On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres@anarazel.de> wrote:
> > Pushed this way.  Moved some more relfrozenxid/relminmxid tests outside
> > of the cutoff changes, polished some error messages.
> >
> >
> > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> > could have a look at the backported version, just about everything but
> > v10 had conflicts, some of them not insubstantial.
> 
> I have one minor piece of feedback on the upgrading of assertions to
> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
> upgrade the raw elog() "can't happen" error within
> IndexBuildHeapRangeScan() to be an ereport() with
> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
> tuple for heap-only tuple" error, which I think merits being a real
> user-visible error, just like the relfrozenxid/relminmxid tests are
> now. As you know, the enhanced amcheck will sometimes detect
> corruption due to this bug by hitting that error.

I'm not opposed to that, it just seems independent from this thread. Not
sure I really want to go around and backpatch such a change, that code
has changed a bit between branches. Happy to do so on master.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Fri, Dec 15, 2017 at 10:58 AM, Andres Freund <andres@anarazel.de> wrote:
>> I have one minor piece of feedback on the upgrading of assertions to
>> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
>> upgrade the raw elog() "can't happen" error within
>> IndexBuildHeapRangeScan() to be an ereport() with
>> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
>> tuple for heap-only tuple" error, which I think merits being a real
>> user-visible error, just like the relfrozenxid/relminmxid tests are
>> now. As you know, the enhanced amcheck will sometimes detect
>> corruption due to this bug by hitting that error.
>
> I'm not opposed to that, it just seems independent from this thread. Not
> sure I really want to go around and backpatch such a change, that code
> has changed a bit between branches. Happy to do so on master.

The elog(), which was itself upgraded from a simple Assert by commit
d70cf811, appears in exactly the same form in 9.3+. Things did change
there, but they were kept in sync.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Fri, Dec 15, 2017 at 10:58 AM, Andres Freund <andres@anarazel.de> wrote:
>> I have one minor piece of feedback on the upgrading of assertions to
>> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
>> upgrade the raw elog() "can't happen" error within
>> IndexBuildHeapRangeScan() to be an ereport() with
>> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
>> tuple for heap-only tuple" error, which I think merits being a real
>> user-visible error, just like the relfrozenxid/relminmxid tests are
>> now. As you know, the enhanced amcheck will sometimes detect
>> corruption due to this bug by hitting that error.
>
> I'm not opposed to that, it just seems independent from this thread. Not
> sure I really want to go around and backpatch such a change, that code
> has changed a bit between branches. Happy to do so on master.

The elog(), which was itself upgraded from a simple Assert by commit
d70cf811, appears in exactly the same form in 9.3+. Things did change
there, but they were kept in sync.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> The elog(), which was itself upgraded from a simple Assert by commit
> d70cf811, appears in exactly the same form in 9.3+. Things did change
> there, but they were kept in sync.

BTW, if you're going to do it, I would target the similar error within
validate_index_heapscan(), too. That was also added by d70cf811.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Peter Geoghegan
Date:
On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> The elog(), which was itself upgraded from a simple Assert by commit
> d70cf811, appears in exactly the same form in 9.3+. Things did change
> there, but they were kept in sync.

BTW, if you're going to do it, I would target the similar error within
validate_index_heapscan(), too. That was also added by d70cf811.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-15 11:15:47 -0800, Peter Geoghegan wrote:
> On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> > The elog(), which was itself upgraded from a simple Assert by commit
> > d70cf811, appears in exactly the same form in 9.3+. Things did change
> > there, but they were kept in sync.
> 
> BTW, if you're going to do it, I would target the similar error within
> validate_index_heapscan(), too. That was also added by d70cf811.

Please send a patch for master on a *new* thread.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Andres Freund
Date:
On 2017-12-15 11:15:47 -0800, Peter Geoghegan wrote:
> On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> > The elog(), which was itself upgraded from a simple Assert by commit
> > d70cf811, appears in exactly the same form in 9.3+. Things did change
> > there, but they were kept in sync.
> 
> BTW, if you're going to do it, I would target the similar error within
> validate_index_heapscan(), too. That was also added by d70cf811.

Please send a patch for master on a *new* thread.

Greetings,

Andres Freund


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that.  Are you expecting me to commit it?  I
thought you'd do it, but if you want me to assume the responsibility I
can do that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that.  Are you expecting me to commit it?  I
thought you'd do it, but if you want me to assume the responsibility I
can do that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Andres Freund
Date:

On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>I noticed that I'm committer for this patch in the commitfest, though I
>don't remember setting that.  Are you expecting me to commit it?  I
>thought you'd do it, but if you want me to assume the responsibility I
>can do that.

I thought I pushed it to all branches? Do you see anything missing? Didn't know there's a CF entry...

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From
Andres Freund
Date:

On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>I noticed that I'm committer for this patch in the commitfest, though I
>don't remember setting that.  Are you expecting me to commit it?  I
>thought you'd do it, but if you want me to assume the responsibility I
>can do that.

I thought I pushed it to all branches? Do you see anything missing? Didn't know there's a CF entry...

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Andres Freund wrote:

> On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >I noticed that I'm committer for this patch in the commitfest, though I
> >don't remember setting that.  Are you expecting me to commit it?  I
> >thought you'd do it, but if you want me to assume the responsibility I
> >can do that.
> 
> I thought I pushed it to all branches? Do you see anything missing?
> Didn't know there's a CF entry...

Ah, no, looks correct to me in all branches.  Updated the CF now too.

Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Alvaro Herrera
Date:
Andres Freund wrote:

> On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >I noticed that I'm committer for this patch in the commitfest, though I
> >don't remember setting that.  Are you expecting me to commit it?  I
> >thought you'd do it, but if you want me to assume the responsibility I
> >can do that.
> 
> I thought I pushed it to all branches? Do you see anything missing?
> Didn't know there's a CF entry...

Ah, no, looks correct to me in all branches.  Updated the CF now too.

Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Michael Paquier
Date:
On Sat, Dec 23, 2017 at 05:26:22PM -0300, Alvaro Herrera wrote:
> I noticed that I'm committer for this patch in the commitfest, though I
> don't remember setting that.  Are you expecting me to commit it?  I
> thought you'd do it, but if you want me to assume the responsibility I
> can do that.

I tend to create CF entries for all patches that we need to track as bug
fixes. No things lost this way. Alvaro, you have been marked as a committer
of this patch as you were the one to work and push the first version that
has finished reverted.
--
Michael

Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple

From
Michael Paquier
Date:
On Sat, Dec 23, 2017 at 05:26:22PM -0300, Alvaro Herrera wrote:
> I noticed that I'm committer for this patch in the commitfest, though I
> don't remember setting that.  Are you expecting me to commit it?  I
> thought you'd do it, but if you want me to assume the responsibility I
> can do that.

I tend to create CF entries for all patches that we need to track as bug
fixes. No things lost this way. Alvaro, you have been marked as a committer
of this patch as you were the one to work and push the first version that
has finished reverted.
--
Michael

Attachment