Thread: [BUGS] Old row version in hot chain become visible after a freeze

[BUGS] Old row version in hot chain become visible after a freeze

From

"Wood, Dan"

Date:

01 September 2017, 04:36:43

From: Wood, Dan hexpert(at)amazon(dot)com

I’ve found a bug in Postgres which causes old row versions to appear in a table. DEAD rows in a hot chain are getting frozen and becoming visible. I’ve repro’d this in both 9.6.1 and 11-devel.

The repro consists of two short psql scripts.

While the repro does an explicit VACUUM FREEZE, this bug also happens with autovacuum.

FILE: lock.sql

begin;

select id from t where id=3 for key share;

select pg_sleep(1);

update t set x=x+1 where id=3;

commit;

vacuum freeze t;

select ctid, xmin, xmax, id from t;

FILE: repro.sql

drop table t;

create table t (id int primary key, name char(3), x integer);

insert into t values (1, '111', 0);

insert into t values (3, '333', 0);

\! psql -p 5432 postgres -f lock.sql &

It’s about 50-50 whether any given run of repro.sql will produce output like:

…

ctid | xmin | xmax | id | x

-------+------+------+----+---

(0,1) | 984 | 0 | 1 | 0

(0,7) | 990 | 0 | 3 | 5

(2 rows)

ctid | xmin | xmax | id | x

-------+------+------+----+---

(0,1) | 984 | 0 | 1 | 0

(0,3) | 986 | 0 | 3 | 1 // This, and x = 2, 3 and 4 came back from the DEAD

(0,4) | 987 | 0 | 3 | 2

(0,5) | 988 | 0 | 3 | 3

(0,6) | 989 | 0 | 3 | 4

(0,7) | 990 | 0 | 3 | 5

(6 rows)

Root cause analysis: lazy_scan_heap() deletes DEAD tuples in heap_page_prune(). However, it is possible for concurrent commits/rollbacks to render a tuple DEAD by the time we reach the switch statement on HeapTupleSatisfiesVacuum(). If such a row IsHotUpdated or IsHeapOnly we can’t delete it below, and must allow a later prune to take care of it.

if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple))

nkeep += 1; // Don't delete, allow later prune to delete it

else

tupgone = true; // We can delete it below

Because tupgone is false we freeze instead of deleting. Freezing a DEAD tuple makes it visible. Here is a comment in heap_prepare_freeze_tuple()

* It is assumed that the caller has checked the tuple with

* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD

* (else we should be removing the tuple, not freezing it).

It is rare that we run into a DEAD tuple in this way during a freeze. More often RECENTLY_DEAD is returned. But we did see this with a more realistic long running test and I was able to create the simplified test case above. Skipping the Freeze on a DEAD tuple that IsHotUpdated or IsHeapOnly does fix the problem. I’ve attached a patch with this fix.

Attachment

update_freeze.patch

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Peter Geoghegan

Date:

01 September 2017, 04:56:20

Hi Dan,

Nice to hear from you.

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
> Because tupgone is false we freeze instead of deleting.  Freezing a DEAD
> tuple makes it visible.  Here is a comment in heap_prepare_freeze_tuple()
>
>
>
>   * It is assumed that the caller has checked the tuple with
>
>   * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD

Funny that there'd be another bug associated with
heap_prepare_freeze_tuple() so soon after the last one was discovered.
Are you aware of the other one [1]?

BTW, I just posted a patch to enhance amcheck, to allow it to verify
that an index has all the entries that it ought to [2]. Perhaps you'd
find it useful for this kind of thing. I welcome your feedback on
that.

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=31b8db8e6c1fa4436116f4be5ca789f3a01b9ebf;hp=f1dae097f2945ffcb59a9f236843e0e0bbf0920d
[2] https://commitfest.postgresql.org/14/1263/
-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wood, Dan"

Date:

01 September 2017, 05:20:04

I was aware of the other one and, in fact, reverted the change just to make sure it wasn’t Involved.  A strange
coincidenceindeed.
 

On 8/31/17, 3:57 PM, "Peter Geoghegan" <pg@bowt.ie> wrote:
   Hi Dan,      Nice to hear from you.      On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:   >
Becausetupgone is false we freeze instead of deleting.  Freezing a DEAD   > tuple makes it visible.  Here is a comment
inheap_prepare_freeze_tuple()   >   >   >   >   * It is assumed that the caller has checked the tuple with   >   >   *
HeapTupleSatisfiesVacuum()and determined that it is not HEAPTUPLE_DEAD      Funny that there'd be another bug
associatedwith   heap_prepare_freeze_tuple() so soon after the last one was discovered.   Are you aware of the other
one[1]?      BTW, I just posted a patch to enhance amcheck, to allow it to verify   that an index has all the entries
thatit ought to [2]. Perhaps you'd   find it useful for this kind of thing. I welcome your feedback on   that.      [1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=31b8db8e6c1fa4436116f4be5ca789f3a01b9ebf;hp=f1dae097f2945ffcb59a9f236843e0e0bbf0920d
 [2] https://commitfest.postgresql.org/14/1263/   --    Peter Geoghegan   
 


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Peter Geoghegan

Date:

01 September 2017, 06:07:39

On Thu, Aug 31, 2017 at 4:20 PM, Wood, Dan <hexpert@amazon.com> wrote:
> I was aware of the other one and, in fact, reverted the change just to make sure it wasn’t Involved.  A strange
coincidenceindeed. 

FWIW, the enhanced amcheck does actually detect this:

postgres=# select bt_index_check('t_pkey', true);
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,3) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597

The good news is that this happens in a codepath that is also used by REINDEX:

postgres=# reindex index t_pkey ;
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,3) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597

Perhaps the lack of field reports of this error suggests that it's
fairly rare. Very hard to be sure about that, though.

--
Peter Geoghegan


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Peter Geoghegan

Date:

01 September 2017, 09:25:25

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
> I’ve found a bug in Postgres which causes old row versions to appear in a
> table.  DEAD rows in a hot chain are getting frozen and becoming visible.
> I’ve repro’d this in both 9.6.1 and 11-devel.

I can confirm that this goes all the way back to 9.3, where "for key
share"/foreign key locks were introduced.

--
Peter Geoghegan


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

01 September 2017, 09:46:44

On Fri, Sep 1, 2017 at 12:25 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
>> I’ve found a bug in Postgres which causes old row versions to appear in a
>> table.  DEAD rows in a hot chain are getting frozen and becoming visible.
>> I’ve repro’d this in both 9.6.1 and 11-devel.
>
> I can confirm that this goes all the way back to 9.3, where "for key
> share"/foreign key locks were introduced.

Hm. That looks pretty bad to me... It is bad luck that this report
happens just after the last round of minor releases has been out, and
we would have needed at least a couple of days and a couple of pairs
of eyes to come up with a correct patch. (I haven't looked at the
proposed solution and the attached patch yet, so no opinions yet).
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

05 September 2017, 18:44:39

On Fri, Sep 1, 2017 at 12:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Sep 1, 2017 at 12:25 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
>>> I’ve found a bug in Postgres which causes old row versions to appear in a
>>> table.  DEAD rows in a hot chain are getting frozen and becoming visible.
>>> I’ve repro’d this in both 9.6.1 and 11-devel.
>>
>> I can confirm that this goes all the way back to 9.3, where "for key
>> share"/foreign key locks were introduced.
>
> Hm. That looks pretty bad to me... It is bad luck that this report
> happens just after the last round of minor releases has been out, and
> we would have needed at least a couple of days and a couple of pairs
> of eyes to come up with a correct patch. (I haven't looked at the
> proposed solution and the attached patch yet, so no opinions yet).

Using a build where assertions are enabled, the provided test case
blows up before even returning results:   frame #3: 0x0000000109e8df90
postgres`ExceptionalCondition(conditionName="!(!((update_xid) !=
((TransactionId) 0)) || !TransactionIdPrecedes(update_xid,
cutoff_xid))", errorType="FailedAssertion", fileName="heapam.c",
lineNumber=6533) + 128 at assert.c:54   frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
flags=0x00007fff56372fae) + 1179 at heapam.c:6532   frame #5: 0x00000001098fb2cd
postgres`heap_prepare_freeze_tuple(tuple=0x000000010b04d0b0,
cutoff_xid=897, cutoff_multi=30, frz=0x00007fae0d800040,
totally_frozen_p="\x01\x02") + 285 at heapam.c:6673   frame #6: 0x0000000109af356d
postgres`lazy_scan_heap(onerel=0x000000010a6fe630, options=9,
vacrelstats=0x00007fae0a0580a8, Irel=0x00007fae0a058688, nindexes=1,
aggressive='\x01') + 4413 at vacuumlazy.c:1081   frame #7: 0x0000000109af1ef2
postgres`lazy_vacuum_rel(onerel=0x000000010a6fe630, options=9,
params=0x00007fff56373608, bstrategy=0x00007fae0a047c40) + 626 at
vacuumlazy.c:253
(lldb) up 1
frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
flags=0x00007fff56372fae) + 1179 at heapam.c:6532  6529                 * Since the tuple wasn't marked HEAPTUPLE_DEAD
by vacuum, the  6530                 * update Xid cannot possibly be older than the
xid cutoff.  6531                 */
-> 6532                Assert(!TransactionIdIsValid(update_xid) ||  6533
!TransactionIdPrecedes(update_xid,cutoff_xid));  6534  6535                /* 
(lldb) p update_xid
(TransactionId) $0 = 896
(lldb) p cutoff_xid
(TransactionId) $1 = 897

As the portion doing vacuum freeze is the one blowing up, I think that
it is possible to extract an isolation test and include it in the
patch with repro.sql as the initialization phase.
                /*
-                 * Each non-removable tuple must be checked to see if it needs
+                 * Each non-removable tuple that we do not keep must
be checked to see if it needs                 * freezing.  Note we already have exclusive buffer lock.
*/
-                if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
+                if (!tupkeep)
+                {
+                    if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
  MultiXactCutoff, 
&frozen[nfrozen],                                              &tuple_totally_frozen))
-                    frozen[nfrozen++].offset = offnum;
+                        frozen[nfrozen++].offset = offnum;

Wouldn't it be better to check if freeze is needed for the tuple
scanned with heap_tuple_needs_freeze() instead of inventing a new
custom condition? Please note as well that heap_tuple_needs_freeze()
does not need its last "buf" argument.
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

06 September 2017, 10:02:57

On Tue, Sep 5, 2017 at 9:44 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> As the portion doing vacuum freeze is the one blowing up, I think that
> it is possible to extract an isolation test and include it in the
> patch with repro.sql as the initialization phase.

Indeed, I have been able to reduce that to an isolation test which
crashes with a rate of 100% if assertions are enabled.

I have also spent a couple more hours looking at the proposed patch
and eye-balling the surrounding code, and my suggestion about
heap_tuple_needs_freeze() is proving to be wrong. So I am arriving at
the conclusion that your patch is taking the right approach to skip
freezing completely if the tuple is not to be removed yet if it is for
vacuum either DEAD or RECENTLY_DEAD.

I am adding as well in CC Álvaro and Andres, who reworked tuple
freezing in 3b97e682 a couple of years back for 9.3. Could you look
more at the bug and the attached patch? It does not fundamentally
change from what has been proposed first, just did the following:
- Updated set of comments that incorrectly assumed that only DEAD
tuples should have freeze skipped.
- Added isolation test for the failure.

An entry has been added in the next open CF to track this problem.
This should not fall into the cracks!
--
Michael

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

update_freeze_v2.patch

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

06 September 2017, 16:40:20

Michael Paquier wrote:

> I have also spent a couple more hours looking at the proposed patch
> and eye-balling the surrounding code, and my suggestion about
> heap_tuple_needs_freeze() is proving to be wrong. So I am arriving at
> the conclusion that your patch is taking the right approach to skip
> freezing completely if the tuple is not to be removed yet if it is for
> vacuum either DEAD or RECENTLY_DEAD.

I think in the "tupkeep" case we must not mark the page as frozen in VM;
in other words I think that block needs to look like this:
           // tupgone = false           {               bool        tuple_totally_frozen;
               num_tuples += 1;               hastup = true;
               /*                * Each non-removable tuple that we do not keep must be checked                * to see
ifit needs freezing.  Note we already have exclusive                * buffer lock.                */               if
(!tupkeep&&                   heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
      MultiXactCutoff,                                             &frozen[nfrozen],
        &tuple_totally_frozen))                       frozen[nfrozen++].offset = offnum;
 
               if (tupkeep || !tuple_totally_frozen)                   all_frozen = false;           }

Otherwise, we risk marking the page as all-frozen, and it would be
skipped by vacuum.  If we never come around to HOT-pruning the page, a
non-permanent xid (or a multixact? not sure that that can happen;
probably not) would linger unnoticed and cause a DoS condition later
("cannot open file pg_clog/1234") when the tuple header is read.

Now, it is possible that HOT pruning would fix the page promptly without
causing an actual DoS, but nonetheless it seems dangerous to leave
things like this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

06 September 2017, 19:27:38

Another thing is that if you're going through heap_copy_data, the tuple
is only checked for DEAD, not RECENTLY_DEAD.  Maybe there is no bug here
but I'm not sure yet.

I attach the patch as I have it now -- mostly comment tweaks, but also
the change to not mark page as all-frozen when we skip freezing a
recently read tuple.  I also renamed the test case, because I think it
may be possible to reach the problem code without involving a multixact.
Haven't tried too hard though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wood, Dan"

Date:

06 September 2017, 23:24:23

First new reply

On 9/6/17, 3:41 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
   Michael Paquier wrote:      > I have also spent a couple more hours looking at the proposed patch   > and
eye-ballingthe surrounding code, and my suggestion about   > heap_tuple_needs_freeze() is proving to be wrong. So I am
arrivingat   > the conclusion that your patch is taking the right approach to skip   > freezing completely if the tuple
isnot to be removed yet if it is for   > vacuum either DEAD or RECENTLY_DEAD.      I think in the "tupkeep" case we
mustnot mark the page as frozen in VM;   in other words I think that block needs to look like this:                  //
tupgone= false               {                   bool        tuple_totally_frozen;                      num_tuples +=
1;                  hastup = true;                      /*                    * Each non-removable tuple that we do not
keepmust be checked                    * to see if it needs freezing.  Note we already have exclusive
* buffer lock.                    */                   if (!tupkeep &&
heap_prepare_freeze_tuple(tuple.t_data,FreezeLimit,                                                 MultiXactCutoff,
                                            &frozen[nfrozen],
&tuple_totally_frozen))                          frozen[nfrozen++].offset = offnum;                      if (tupkeep ||
!tuple_totally_frozen)                      all_frozen = false;               }      Otherwise, we risk marking the
pageas all-frozen, and it would be   skipped by vacuum.  If we never come around to HOT-pruning the page, a
non-permanentxid (or a multixact? not sure that that can happen;   probably not) would linger unnoticed and cause a DoS
conditionlater   ("cannot open file pg_clog/1234") when the tuple header is read.      Now, it is possible that HOT
pruningwould fix the page promptly without   causing an actual DoS, but nonetheless it seems dangerous to leave
thingslike this.      --    Álvaro Herrera                https://www.2ndQuadrant.com/   PostgreSQL Development, 24x7
Support,Remote DBA, Training & Services   
 


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wood, Dan"

Date:

06 September 2017, 23:27:39

Sorry for my mistaken last mail 


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

07 September 2017, 04:12:17

Michael Paquier wrote:

> frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
> t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
> flags=0x00007fff56372fae) + 1179 at heapam.c:6532
>    6529                 * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
>    6530                 * update Xid cannot possibly be older than the xid cutoff.
>    6531                 */
> -> 6532                Assert(!TransactionIdIsValid(update_xid) ||
>    6533                       !TransactionIdPrecedes(update_xid, cutoff_xid));
>    6534
>    6535                /*
> (lldb) p update_xid
> (TransactionId) $0 = 896
> (lldb) p cutoff_xid
> (TransactionId) $1 = 897

So, looking at this closely, I think there is a bigger problem here: if
we use any of the proposed patches or approaches, we risk leaving an old
Xid in a tuple (because of skipping the heap_tuple_prepare_freeze on a
tuple which remains in the heap with live Xids), followed by later
truncating pg_multixact / pg_clog removing a segment that might be
critical to resolving this tuple status later on.

I think doing the tuple freezing dance for any tuple we don't remove
from the heap is critical, not optional.  Maybe a later HOT pruning
would save you from actual disaster, but I think it'd be a bad idea to
rely on that.

So ISTM we need a different solution than what's been proposed so far;
and I think that solution is different for each of the possible problem
cases, which are two: HEAPTUPLE_DEAD and HEAPTUPLE_RECENTLY_DEAD.

I think we can cover the HEAPTUPLE_DEAD case by just redoing the
heap_page_prune (just add a "goto" back to it if we detect the case).
It's a bit wasteful because we'd re-process all the prior tuples in the
loop below, but since it's supposed to be an infrequent condition, I
think it should be okay.

The RECENTLY_DEAD case is interesting.  We know that the updater is
committed, and since the update XID is older than the cutoff XID, then
we know nobody else can see the tuple.  So we can simply remove it ...
and we already have a mechanism for that: return FRM_MARK_COMMITTED in
FreezeMultiXactId.  But the code already does that!  The only thing we
need in order for this to be handled correctly is to remove the assert.

A case I didn't think about yet is RECENTLY_DEAD if the xmax is a plain
Xid (not a multi).  My vague feeling is that there is no bug here.

I haven't actually tested this.  Planning to look into it tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

07 September 2017, 07:23:46

On Wed, Sep 6, 2017 at 7:40 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Otherwise, we risk marking the page as all-frozen, and it would be
> skipped by vacuum.  If we never come around to HOT-pruning the page, a
> non-permanent xid (or a multixact? not sure that that can happen;
> probably not) would linger unnoticed and cause a DoS condition later
> ("cannot open file pg_clog/1234") when the tuple header is read.

Yes, you are definitely right here.

On Wed, Sep 6, 2017 at 10:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Another thing is that if you're going through heap_copy_data, the tuple
> is only checked for DEAD, not RECENTLY_DEAD.  Maybe there is no bug here
> but I'm not sure yet.

Not sure to what you are referring here. Perhaps heap_prune_chain()?
It seems to me that it does the right thing.

> I attach the patch as I have it now -- mostly comment tweaks, but also
> the change to not mark page as all-frozen when we skip freezing a
> recently read tuple.  I also renamed the test case, because I think it
> may be possible to reach the problem code without involving a multixact.
> Haven't tried too hard though.

+test: freeze-the-dead
This one has made me smile..

+++ b/src/test/isolation/expected/freeze-the-dead.spec
@@ -0,0 +1,101 @@
+Parsed test spec with 2 sessions
The test fails as expected/ files need to be named with .out, not .spec.
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Jeff Frost

Date:

07 September 2017, 09:23:28

On Sep 6, 2017, at 6:27 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
> Another thing is that if you're going through heap_copy_data, the tuple
> is only checked for DEAD, not RECENTLY_DEAD.  Maybe there is no bug here
> but I'm not sure yet.

Any idea on how to identify affected rows on a running system?


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Jeff Frost

Date:

07 September 2017, 10:45:45

> On Sep 6, 2017, at 8:23 PM, Jeff Frost <jeff@pgexperts.com> wrote:
> 
> 
> Any idea on how to identify affected rows on a running system?

I guess better questions would be:

Is it as simple as:

SELECT id, count(*) FROM foo GROUP BY id HAVING count(*) > 1;

Maybe also need to:

set enable_indexscan = 0;
set enable_indexonlyscan = 0;

before running the SELECT?

Is it possible to affect a DELETE or does it need to be a HOT updated row?


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wong, Yi Wen"

Date:

08 September 2017, 02:02:34

________________________________________
From: pgsql-bugs-owner@postgresql.org <pgsql-bugs-owner@postgresql.org> on behalf of Alvaro Herrera
<alvherre@alvh.no-ip.org>
Sent: Wednesday, September 6, 2017 3:12 PM
To: Michael Paquier
Cc: Peter Geoghegan; Wood, Dan; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Old row version in hot chain become visible after a freeze

>Michael Paquier wrote:

>> frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
>> t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
>> flags=0x00007fff56372fae) + 1179 at heapam.c:6532
>>    6529                 * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
>>    6530                 * update Xid cannot possibly be older than the xid cutoff.
>>    6531                 */
>> -> 6532                Assert(!TransactionIdIsValid(update_xid) ||
>>    6533                       !TransactionIdPrecedes(update_xid, cutoff_xid));
>>    6534
>>    6535                /*
>> (lldb) p update_xid
>> (TransactionId) $0 = 896
>> (lldb) p cutoff_xid
>> (TransactionId) $1 = 897

>So, looking at this closely, I think there is a bigger problem here: if
>we use any of the proposed patches or approaches, we risk leaving an old
>Xid in a tuple (because of skipping the heap_tuple_prepare_freeze on a
>tuple which remains in the heap with live Xids), followed by later
>truncating pg_multixact / pg_clog removing a segment that might be
>critical to resolving this tuple status later on.

>I think doing the tuple freezing dance for any tuple we don't remove
>from the heap is critical, not optional.  Maybe a later HOT pruning
>would save you from actual disaster, but I think it'd be a bad idea to
>rely on that.

There is actually another case where HEAPTUPLE_DEAD tuples may be kept and have
prepare_freeze skipped on them entirely.

lazy_record_dead_tuple may fail to record the heap for later pruning
for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples
in the array:

/** The array shouldn't overflow under normal behavior, but perhaps it* could if we are given a really small
maintenance_work_mem.In that* case, just forget the last few tuples (we'll get 'em next time).*/ 
if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
...

It looks like we don't even check if the lazy_record_dead_tuple operation actually did any
actual recording in the tupgone case...
if (tupgone){   lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
                  &vacrelstats->latestRemovedXid);   tups_vacuumed += 1;   has_dead_tuples = true; 
}

It's probably unsafe for any operations calling TruncateXXX code to assume that old Xids don't stick around
after a vacuum in itself.

Thanks,
Yi Wen

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

08 September 2017, 22:42:44

Wong, Yi Wen wrote:

> There is actually another case where HEAPTUPLE_DEAD tuples may be kept and have
> prepare_freeze skipped on them entirely.
> 
> lazy_record_dead_tuple may fail to record the heap for later pruning
> for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples
> in the array:

Hmm, ouch, good catch.

AFAICS this is a shouldn't-happen condition, since we bail out of the
loop pessimistically as soon as we would be over the array limit if the
next page were to be full of dead tuples (i.e., we never give the chance
for overflow to actually happen).  So unless I misunderstand, this could
only fail if you give maint_work_mem smaller than necessary for one
pageful of dead tuples, which should be about 1800 bytes ...

If we wanted to be very sure about this we could add a test and perhaps
abort the vacuum, but I'm not sure it's worth the trouble.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Tom Lane

Date:

08 September 2017, 23:04:10

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Wong, Yi Wen wrote:
>> lazy_record_dead_tuple may fail to record the heap for later pruning
>> for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples
>> in the array:

> Hmm, ouch, good catch.

> AFAICS this is a shouldn't-happen condition, since we bail out of the
> loop pessimistically as soon as we would be over the array limit if the
> next page were to be full of dead tuples (i.e., we never give the chance
> for overflow to actually happen).  So unless I misunderstand, this could
> only fail if you give maint_work_mem smaller than necessary for one
> pageful of dead tuples, which should be about 1800 bytes ...

> If we wanted to be very sure about this we could add a test and perhaps
> abort the vacuum, but I'm not sure it's worth the trouble.

I think if we're going to depend on that, we should change the logic from
"don't record tuple if no space" to "throw error on no space".
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

11 September 2017, 20:01:03

Michael Paquier wrote:
> On Wed, Sep 6, 2017 at 7:40 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Otherwise, we risk marking the page as all-frozen, and it would be
> > skipped by vacuum.  If we never come around to HOT-pruning the page, a
> > non-permanent xid (or a multixact? not sure that that can happen;
> > probably not) would linger unnoticed and cause a DoS condition later
> > ("cannot open file pg_clog/1234") when the tuple header is read.
> 
> Yes, you are definitely right here.
> 
> On Wed, Sep 6, 2017 at 10:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Another thing is that if you're going through heap_copy_data, the tuple
> > is only checked for DEAD, not RECENTLY_DEAD.  Maybe there is no bug here
> > but I'm not sure yet.
> 
> Not sure to what you are referring here. Perhaps heap_prune_chain()?
> It seems to me that it does the right thing.

I meant copy_heap_data, which also "freezes" tuples while copying the
heap.  In the end, I concluded the fact that it runs with access
exclusive lock is sufficient protection.  I was worried that in the
future we might improve that code and enable it to run without AEL, but
after looking at it a little more, it sounds like a big enough project
that this is going to be the least of its troubles.

Anyway, in order to test this, I tried running this one-liner (files
attached):
  psql -f /tmp/repro.sql ; for i in `seq 1 50`; do psql --no-psqlrc -f /tmp/lock.sql & done

(I also threw in a small sleep between heap_page_prune and
HeapTupleSatisfiesVacuum while testing, just to widen the problem window
to hopefully make any remaining problems more evident.)

This turned up a few different failure modes, which I fixed until no
further problems arose.  With the attached patch, I no longer see any
failures (assertion failures) or misbehavior (additional rows), in a few
dozen runs, which were easy to come by with the original code.  The
resulting patch, which I like better than the previously proposed idea
of skipping the freeze, takes the approach of handling freeze correctly
for the cases where the tuple still exists after pruning.

I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
because of the way the array is allocated, so an elog(ERROR) is
sufficient.

I regret my inability to turn the oneliner into a committable test case,
but I think that's beyond what I can do for now.

FWIW running this against 9.2 I couldn't detect any problems, which I
suppose was expected.  Changing the lock mode in the test file to FOR
SHARE I only see an enormous amount of deadlocks reported (which makes
me confident that doing multixact stuff in 9.3 was a good thing, despite
what bug chasers may think.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

12 September 2017, 13:45:50

On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> (I also threw in a small sleep between heap_page_prune and
> HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> to hopefully make any remaining problems more evident.)

I am understanding that you mean heap_prepare_freeze_tuple here
instead of heap_page_prune.

> This turned up a few different failure modes, which I fixed until no
> further problems arose.  With the attached patch, I no longer see any
> failures (assertion failures) or misbehavior (additional rows), in a few
> dozen runs, which were easy to come by with the original code.

Well, you simply removed the assertion ;), and my tests don't show
additional rows as well, which is nice.

> The
> resulting patch, which I like better than the previously proposed idea
> of skipping the freeze, takes the approach of handling freeze correctly
> for the cases where the tuple still exists after pruning.

That's also something I was wondering when looking at the first patch.
I am unfortunately not as skilled as you are with this area of the
code (this thread has brought its quantity of study!), so I was not
able to draw a clear line with what needs to be done. But I am clearly
+1 with this approach.

> I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> because of the way the array is allocated, so an elog(ERROR) is
> sufficient.
>
> I regret my inability to turn the oneliner into a committable test case,
> but I think that's beyond what I can do for now.

Here are some comments about your last patch.

heap_tuple_needs_freeze looks to be still consistent with
heap_prepare_freeze_tuple even after what you have changed, which is
good.

Using again the test of Dan at the top of the thread, I am seeing from
time to time what looks like garbage data in xmax, like that:ctid  | xmin | xmax | id
-------+------+------+----(0,1) |  620 |    0 |  1(0,7) |  625 |   84 |  3
(2 rows)
[...]ctid  | xmin | xmax | id
-------+------+------+----(0,1) |  656 |    0 |  1(0,6) |  661 |  128 |  3
(2 rows)
Putting manual sleeps in lazy_scan_heap does not change the frequency
of their appearances.
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

12 September 2017, 14:22:03

Michael Paquier wrote:
> On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
> > (I also threw in a small sleep between heap_page_prune and
> > HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> > to hopefully make any remaining problems more evident.)
> 
> I am understanding that you mean heap_prepare_freeze_tuple here
> instead of heap_page_prune.

Hmm ... no, I meant adding a sleep after the page is pruned, before
HeapTupleSatisfiesVacuum call that determines the action with regards to
freezing.

> > This turned up a few different failure modes, which I fixed until no
> > further problems arose.  With the attached patch, I no longer see any
> > failures (assertion failures) or misbehavior (additional rows), in a few
> > dozen runs, which were easy to come by with the original code.
> 
> Well, you simply removed the assertion ;), and my tests don't show
> additional rows as well, which is nice.

Yeah, the assertion was wrong -- it was essentially assuming that the
window above (between page pruning and HTSV) was of zero size, which is
evidently what caused this whole disaster.

> > The
> > resulting patch, which I like better than the previously proposed idea
> > of skipping the freeze, takes the approach of handling freeze correctly
> > for the cases where the tuple still exists after pruning.
> 
> That's also something I was wondering when looking at the first patch.
> I am unfortunately not as skilled as you are with this area of the
> code (this thread has brought its quantity of study!), so I was not
> able to draw a clear line with what needs to be done. But I am clearly
> +1 with this approach.

Great, thanks.

> > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > because of the way the array is allocated, so an elog(ERROR) is
> > sufficient.
> >
> > I regret my inability to turn the oneliner into a committable test case,
> > but I think that's beyond what I can do for now.
> 
> Here are some comments about your last patch.
> 
> heap_tuple_needs_freeze looks to be still consistent with
> heap_prepare_freeze_tuple even after what you have changed, which is
> good.

Thanks for confirming.

> Using again the test of Dan at the top of the thread, I am seeing from
> time to time what looks like garbage data in xmax, like that:
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  620 |    0 |  1
>  (0,7) |  625 |   84 |  3
> (2 rows)
> [...]
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  656 |    0 |  1
>  (0,6) |  661 |  128 |  3
> (2 rows)

I bet those are multixact values rather than garbage.  You should see
HEAP_XMAX_IS_MULTI in t_infomask in those cases, and the values should
be near NextMultiXactId (make sure to checkpoint if you examine with
pg_controldata; I think it's easier to obtain it from shmem.  Or you
could just use pg_get_multixact_members()).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

12 September 2017, 18:32:12

On Tue, Sep 12, 2017 at 5:22 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Michael Paquier wrote:
>> On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
>> <alvherre@alvh.no-ip.org> wrote:
>> > (I also threw in a small sleep between heap_page_prune and
>> > HeapTupleSatisfiesVacuum while testing, just to widen the problem window
>> > to hopefully make any remaining problems more evident.)
>>
>> I am understanding that you mean heap_prepare_freeze_tuple here
>> instead of heap_page_prune.
>
> Hmm ... no, I meant adding a sleep after the page is pruned, before
> HeapTupleSatisfiesVacuum call that determines the action with regards to
> freezing.

Well, I have tested both. With the test case of Dan adding a sleep
after calling HeapTupleSatisfiesVacuum() helped in increasing the
window. Of course feel free to correct me.

>> Using again the test of Dan at the top of the thread, I am seeing from
>> time to time what looks like garbage data in xmax, like that:
>>  ctid  | xmin | xmax | id
>> -------+------+------+----
>>  (0,1) |  620 |    0 |  1
>>  (0,7) |  625 |   84 |  3
>> (2 rows)
>> [...]
>>  ctid  | xmin | xmax | id
>> -------+------+------+----
>>  (0,1) |  656 |    0 |  1
>>  (0,6) |  661 |  128 |  3
>> (2 rows)
>
> I bet those are multixact values rather than garbage.  You should see
> HEAP_XMAX_IS_MULTI in t_infomask in those cases, and the values should
> be near NextMultiXactId (make sure to checkpoint if you examine with
> pg_controldata; I think it's easier to obtain it from shmem.  Or you
> could just use pg_get_multixact_members()).

Yes, you're right. That's the case. So I guess that I don't have much
complains about the patch as presented.
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

12 September 2017, 18:41:36

Michael Paquier wrote:
> On Tue, Sep 12, 2017 at 5:22 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Michael Paquier wrote:
> >> On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
> >> <alvherre@alvh.no-ip.org> wrote:
> >> > (I also threw in a small sleep between heap_page_prune and
> >> > HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> >> > to hopefully make any remaining problems more evident.)
> >>
> >> I am understanding that you mean heap_prepare_freeze_tuple here
> >> instead of heap_page_prune.
> >
> > Hmm ... no, I meant adding a sleep after the page is pruned, before
> > HeapTupleSatisfiesVacuum call that determines the action with regards to
> > freezing.
> 
> Well, I have tested both. With the test case of Dan adding a sleep
> after calling HeapTupleSatisfiesVacuum() helped in increasing the
> window. Of course feel free to correct me.

OK, great, thanks for testing.


> > I bet those are multixact values rather than garbage.  You should see
> > HEAP_XMAX_IS_MULTI in t_infomask in those cases, and the values should
> > be near NextMultiXactId (make sure to checkpoint if you examine with
> > pg_controldata; I think it's easier to obtain it from shmem.  Or you
> > could just use pg_get_multixact_members()).
> 
> Yes, you're right. That's the case. So I guess that I don't have much
> complains about the patch as presented.

Thanks for the confirmation.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wong, Yi Wen"

Date:

13 September 2017, 00:38:02

> > The
> > resulting patch, which I like better than the previously proposed idea
> > of skipping the freeze, takes the approach of handling freeze correctly
> > for the cases where the tuple still exists after pruning.
>
> That's also something I was wondering when looking at the first patch.
> I am unfortunately not as skilled as you are with this area of the
> code (this thread has brought its quantity of study!), so I was not
> able to draw a clear line with what needs to be done. But I am clearly
> +1 with this approach.

I have tried out the new patch and it works great as well! +1 from me.

> > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > because of the way the array is allocated, so an elog(ERROR) is
> > sufficient.

I agree the fail is rare (and probably doesn't happen in real cases, although the comment
does imply with a sufficiently low working_memory it might?). However, I'd dispute that ERROR
is sufficient --PANIC is probably appropriate here because FREEZE is not WAL-logged until
the end of the page; so we'd end up with unfrozen Xids hanging around with an appropriately
timed crash. I wouldn't worry much about the PANIC tripping because like you said,
this seems unreachable normally.

The errmsg should come with a errhint saying "Increase maintenance_work_mem".

I've done some code inspection to try to figure out if crashing at any point around before FREEZE and
the later lazy_vacuum_page phase is safe (i.e. older Xids might be left behind on restart/crash); it seems to me it
shouldn't be a problem until new_min_xid and new_min_multi isn't updated until later for the truncate.

Yi Wen

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wong, Yi Wen"

Date:

13 September 2017, 02:54:58

> > > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > > because of the way the array is allocated, so an elog(ERROR) is
> > > sufficient.
>
> I agree the fail is rare (and probably doesn't happen in real cases, although the comment
> does imply with a sufficiently low working_memory it might?). However, I'd dispute that ERROR
> is sufficient --PANIC is probably appropriate here because FREEZE is not WAL-logged until
> the end of the page; so we'd end up with unfrozen Xids hanging around with an appropriately
> timed crash. I wouldn't worry much about the PANIC tripping because like you said,
> this seems unreachable normally.
>
> The errmsg should come with a errhint saying "Increase maintenance_work_mem".

On closer inspection, I misread the code. The freezes are collected instead of executed, so a ERROR would
suffice. We don't need to change it to PANIC. errhint comment remains.

Yi Wen

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

"Wong, Yi Wen"

Date:

13 September 2017, 05:29:25

I've spent some time reviewing the new patch and have made a couple of minor changes.

I've made some changes to this chunk:

        else if (TransactionIdIsNormal(xid))
        {
                if (TransactionIdPrecedes(xid, cutoff_xid))
-                       freeze_xmax = true;
+               {
+                       if (HeapTupleHeaderIsHeapOnly(tuple) ||
+                               HeapTupleHeaderIsHotUpdated(tuple))
+                       {
+                               frz->t_infomask |= HEAP_XMAX_COMMITTED;
+                               totally_frozen = false;
+                               changed = true;
+                       }
+                       else
+                               freeze_xmax = true;
+               }
                else
                        totally_frozen = false;
        }

The changes are:

1) To add the line changed = true;
This is necessary to actually set the bits of to HEAP_XMAX_COMMITTED. In repro.sql, this is set anyway
because the xmin is frozen. However, consider a scenario where we have xmin frozen and xmax still live in a
previous VACUUM FREEZE pass, if we do a second pass where HEAP_XMAX_COMMITTED needs to be set,
we will lose that change.

2) Flip the order of HeapTupleHeaderIsHeapOnly(tuple) and HeapTupleHeaderIsHotUpdated(tuple). This is merely a
microoptimization on a non-performance critical path because HeapOnly is a broader condititon than HotUpdated :-)

I've also changed the elog ERROR. This seems necessary for a user to know, because it does mean VACUUM is broken on
their 
system:

+       if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
+               ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+                               errmsg ("dead tuple array overflow: %d dead, %d max",
+                                               vacrelstats->num_dead_tuples,
+                                               vacrelstats->max_dead_tuples),
+                               errhint("Increase maintenance_work_mem")));

I've also made some changes to comments in the code.

Thanks,
Yi Wen

________________________________________
From: pgsql-bugs-owner@postgresql.org <pgsql-bugs-owner@postgresql.org> on behalf of Wong, Yi Wen <yiwong@amazon.com>
Sent: Tuesday, September 12, 2017 1:54 PM
To: Alvaro Herrera; Michael Paquier
Cc: Peter Geoghegan; Wood, Dan; pgsql-bugs@postgresql.org; Andres Freund
Subject: Re: [BUGS] Old row version in hot chain become visible after a freeze

> > > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > > because of the way the array is allocated, so an elog(ERROR) is
> > > sufficient.
>
> I agree the fail is rare (and probably doesn't happen in real cases, although the comment
> does imply with a sufficiently low working_memory it might?). However, I'd dispute that ERROR
> is sufficient --PANIC is probably appropriate here because FREEZE is not WAL-logged until
> the end of the page; so we'd end up with unfrozen Xids hanging around with an appropriately
> timed crash. I wouldn't worry much about the PANIC tripping because like you said,
> this seems unreachable normally.
>
> The errmsg should come with a errhint saying "Increase maintenance_work_mem".

On closer inspection, I misread the code. The freezes are collected instead of executed, so a ERROR would
suffice. We don't need to change it to PANIC. errhint comment remains.

Yi Wen

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

update_freeze_v4.patch

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

23 September 2017, 20:05:48

This fails in backbranches.  I think I should back-patch
c2581794f37e ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Michael Paquier

Date:

27 September 2017, 11:18:04

On Sat, Sep 23, 2017 at 11:05 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> This fails in backbranches.  I think I should back-patch
> c2581794f37e ...

If you mean to merge those patches directly in the final versions to
be committed, yes that looks adapted.
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

28 September 2017, 19:44:00

Wong, Yi Wen wrote:

> 2) Flip the order of HeapTupleHeaderIsHeapOnly(tuple) and HeapTupleHeaderIsHotUpdated(tuple). This is merely a
> microoptimization on a non-performance critical path because HeapOnly is a broader condititon than HotUpdated :-)

Actually, is this correct?  AFAICS, IsHeapOnly refers to the *new*
version of the tuple in a HOT update, whereas IsHotUpdated refers to the
*old* version.  We surely must never freeze a IsHotUpdated tuple
(because that would bring a dead tuple back to life, which is the bug at
hand), but what is the argument against freezing a IsHeapOnly tuple?

I think the way this test is written comes from some fuzzy thinking --
somebody (probably me) copied the test from lazy_scan_heap, but that
routine has two different reasons for each of those two conditions (one
is merely an optimization, the other is necessary for correctness).  But
in the spot we're patching, one of them is wrong.

I could observe that this bug causes xids older than cutoff xid to
remain in xmax occasionally with that test in place.  Removing the
"IsHeapOnly" part and keeping only IsHotUpdated makes that go away.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

28 September 2017, 20:56:57

Okay, I have pushed this patch now backpatched to all branches since
9.3, after staring at this code for way too long.  I spent a lot of time
testing it to ensure things are correct, but please double-check.

Many thanks to Dan Wood for the initial diagnosys, and to Yi Wen and
Michael for the review and discussion.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

28 September 2017, 21:00:29

Jeff Frost wrote:

> Any idea on how to identify affected rows on a running system?

> Is it as simple as:
> 
> SELECT id, count(*) FROM foo GROUP BY id HAVING count(*) > 1;
> 
> Maybe also need to:
> 
> set enable_indexscan = 0;
> set enable_indexonlyscan = 0;
> 
> before running the SELECT?

Did you find out?  Your proposed query/settings seems like a good way to
determine if any supposed unique column is still unique, but I wonder if
there's a better way.  I suppose a procedure would involve seeing which
tables have HOT updates, and scan those.

Of course, this could theoretically happen to tables without any UNIQUE,
and then the recipe doesn't work.

> Is it possible to affect a DELETE or does it need to be a HOT updated row?

AFAICS it needs to be a HOT update.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

28 September 2017, 21:24:33

Hmm.  In going over this once more while looking at the coverage report
site, I wonder if the "NB" comment on top of heap_prepare_freeze_tuple
has important consequences for some of this new code.  It probably does,
but I'm not sure what to do about it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Old row version in hot chain become visible after a freeze

From

Alvaro Herrera

Date:

28 September 2017, 22:47:13

Alvaro Herrera wrote:
> Okay, I have pushed this patch now backpatched to all branches since
> 9.3, after staring at this code for way too long.

By the way, as a side-effect of this commit, heapam.c has gone from
76.6% covered to 78.6% covered; in particular, FreezeMultiXactId() has
gone from absolutely not touched at all to having about half its lines
covered; also some newly covered paths in heap_prepare_freeze_tuple.
It's not a huge increase, but it's a start.  IMO we need some more test
cases to cover a few more branches here ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs