Thread: [BUGS] Old row versions in a hot chain become visible after a freeze
From: Wood, Dan hexpert(at)amazon(dot)com
I’ve found a bug in Postgres which causes old row versions to appear in a table. DEAD rows in a hot chain are getting frozen and becoming visible. I’ve repro’d this in both 9.6.1 and 11-devel.
The repro consists of two short psql scripts.
While the repro does an explicit VACUUM FREEZE, this bug also happens with autovacuum.
FILE: lock.sql
begin;
select id from t where id=3 for key share;
select pg_sleep(1);
update t set x=x+1 where id=3;
commit;
vacuum freeze t;
select ctid, xmin, xmax, id, x from t;
FILE: repro.sql
drop table t;
create table t (id int primary key, name char(3), x integer);
insert into t values (1, '111', 0);
insert into t values (3, '333', 0);
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
It’s about 50-50 whether any given run of repro.sql will produce output like:
…
ctid | xmin | xmax | id | x
-------+------+------+----+---
(0,1) | 984 | 0 | 1 | 0
(0,7) | 990 | 0 | 3 | 5
(2 rows)
ctid | xmin | xmax | id | x
-------+------+------+----+---
(0,1) | 984 | 0 | 1 | 0
(0,3) | 986 | 0 | 3 | 1 // This, and x = 2, 3 and 4 came back from the DEAD
(0,4) | 987 | 0 | 3 | 2
(0,5) | 988 | 0 | 3 | 3
(0,6) | 989 | 0 | 3 | 4
(0,7) | 990 | 0 | 3 | 5
(6 rows)
Root cause analysis: lazy_scan_heap() deletes DEAD tuples in heap_page_prune(). However, it is possible for concurrent commits/rollbacks to render a tuple DEAD by the time we reach the switch statement on HeapTupleSatisfiesVacuum(). If such a row IsHotUpdated or IsHeapOnly we can’t delete it below, and must allow a later prune to take care of it.
if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple))
nkeep += 1; // Don't delete, allow later prune to delete it
else
tupgone = true; // We can delete it below
Because tupgone is false we freeze instead of deleting. Freezing a DEAD tuple makes it visible. Here is a comment in heap_prepare_freeze_tuple()
* It is assumed that the caller has checked the tuple with
* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
* (else we should be removing the tuple, not freezing it).
It is rare that we run into a DEAD tuple in this way during a freeze; more often RECENTLY_DEAD is returned. But we did see this with a more realistic long-running test, and I was able to create the simplified test case above. Skipping the freeze on a DEAD tuple that IsHotUpdated or IsHeapOnly does fix the problem. I’ve attached a patch with this fix.
Attachment
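To make the mechanism Dan describes concrete, here is a toy model of the failure (Python; the tuple fields, the visibility rule, and the freeze routine are drastic simplifications I made up for illustration, not actual PostgreSQL APIs — the real logic lives in heapam.c and the visibility routines):

```python
# Toy model of the bug: a DEAD heap-only tuple that VACUUM cannot prune
# (it is a HOT-chain member, so tupgone stays false) is frozen anyway.
# Freezing overwrites xmin with FrozenTransactionId and clears xmax,
# which makes the dead version visible to everyone, forever.
FROZEN_XID = 2          # FrozenTransactionId in PostgreSQL

class Tuple:
    def __init__(self, xmin, xmax, heap_only=False):
        self.xmin, self.xmax, self.heap_only = xmin, xmax, heap_only

def satisfies_vacuum(tup, oldest_xmin):
    # Grossly simplified HeapTupleSatisfiesVacuum: a committed deleter
    # older than every snapshot means the tuple is DEAD.
    if tup.xmax and tup.xmax < oldest_xmin:
        return "DEAD"
    return "LIVE"

def buggy_freeze(tup, cutoff_xid):
    # Mirrors the bug: for a HOT-chain member tupgone stays false, so the
    # tuple falls through to freezing even though it is DEAD.
    if tup.xmin < cutoff_xid:
        tup.xmin = FROZEN_XID
        tup.xmax = 0        # xmax is cleared too -> tuple looks live

def visible(tup):
    # A frozen xmin with no xmax is visible in every snapshot.
    return tup.xmin == FROZEN_XID and tup.xmax == 0

old = Tuple(xmin=986, xmax=987, heap_only=True)   # dead HOT-chain member
assert satisfies_vacuum(old, oldest_xmin=991) == "DEAD"
buggy_freeze(old, cutoff_xid=991)
print(visible(old))   # True -- the DEAD version "came back"
```

The model reproduces exactly the symptom in the repro output above: rows x = 1..4, which should have been pruned, reappear with frozen headers.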
Hi Dan,

Nice to hear from you.

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
> Because tupgone is false we freeze instead of deleting. Freezing a DEAD
> tuple makes it visible. Here is a comment in heap_prepare_freeze_tuple()
>
> * It is assumed that the caller has checked the tuple with
> * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD

Funny that there'd be another bug associated with heap_prepare_freeze_tuple() so soon after the last one was discovered. Are you aware of the other one [1]?

BTW, I just posted a patch to enhance amcheck, to allow it to verify that an index has all the entries that it ought to [2]. Perhaps you'd find it useful for this kind of thing. I welcome your feedback on that.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=31b8db8e6c1fa4436116f4be5ca789f3a01b9ebf;hp=f1dae097f2945ffcb59a9f236843e0e0bbf0920d
[2] https://commitfest.postgresql.org/14/1263/

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
I was aware of the other one and, in fact, reverted the change just to make sure it wasn't involved. A strange coincidence indeed.

On 8/31/17, 3:57 PM, "Peter Geoghegan" <pg@bowt.ie> wrote:
> Funny that there'd be another bug associated with heap_prepare_freeze_tuple()
> so soon after the last one was discovered. Are you aware of the other one [1]?
On Thu, Aug 31, 2017 at 4:20 PM, Wood, Dan <hexpert@amazon.com> wrote:
> I was aware of the other one and, in fact, reverted the change just to
> make sure it wasn't involved. A strange coincidence indeed.

FWIW, the enhanced amcheck does actually detect this:

postgres=# select bt_index_check('t_pkey', true);
ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,3) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

The good news is that this happens in a codepath that is also used by REINDEX:

postgres=# reindex index t_pkey ;
ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,3) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

Perhaps the lack of field reports of this error suggests that it's fairly rare. Very hard to be sure about that, though.

--
Peter Geoghegan
On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
> I’ve found a bug in Postgres which causes old row versions to appear in a
> table. DEAD rows in a hot chain are getting frozen and becoming visible.
> I’ve repro’d this in both 9.6.1 and 11-devel.

I can confirm that this goes all the way back to 9.3, where "for key share"/foreign key locks were introduced.

--
Peter Geoghegan
On Fri, Sep 1, 2017 at 12:25 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:
>> I’ve found a bug in Postgres which causes old row versions to appear in a
>> table. DEAD rows in a hot chain are getting frozen and becoming visible.
>> I’ve repro’d this in both 9.6.1 and 11-devel.
>
> I can confirm that this goes all the way back to 9.3, where "for key
> share"/foreign key locks were introduced.

Hm. That looks pretty bad to me... It is bad luck that this report arrives just after the last round of minor releases went out; we would have needed at least a couple of days and a couple of pairs of eyes to come up with a correct patch. (I haven't looked at the proposed solution and the attached patch yet, so no opinions yet.)

--
Michael
On Fri, Sep 1, 2017 at 12:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Hm. That looks pretty bad to me... It is bad luck that this report
> happens just after the last round of minor releases went out; we
> would have needed at least a couple of days and a couple of pairs
> of eyes to come up with a correct patch. (I haven't looked at the
> proposed solution and the attached patch yet, so no opinions yet.)

Using a build where assertions are enabled, the provided test case blows up before even returning results:

frame #3: 0x0000000109e8df90 postgres`ExceptionalCondition(conditionName="!(!((update_xid) != ((TransactionId) 0)) || !TransactionIdPrecedes(update_xid, cutoff_xid))", errorType="FailedAssertion", fileName="heapam.c", lineNumber=6533) + 128 at assert.c:54
frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34, t_infomask=4930, cutoff_xid=897, cutoff_multi=30, flags=0x00007fff56372fae) + 1179 at heapam.c:6532
frame #5: 0x00000001098fb2cd postgres`heap_prepare_freeze_tuple(tuple=0x000000010b04d0b0, cutoff_xid=897, cutoff_multi=30, frz=0x00007fae0d800040, totally_frozen_p="\x01\x02") + 285 at heapam.c:6673
frame #6: 0x0000000109af356d postgres`lazy_scan_heap(onerel=0x000000010a6fe630, options=9, vacrelstats=0x00007fae0a0580a8, Irel=0x00007fae0a058688, nindexes=1, aggressive='\x01') + 4413 at vacuumlazy.c:1081
frame #7: 0x0000000109af1ef2 postgres`lazy_vacuum_rel(onerel=0x000000010a6fe630, options=9, params=0x00007fff56373608, bstrategy=0x00007fae0a047c40) + 626 at vacuumlazy.c:253

(lldb) up 1
frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34, t_infomask=4930, cutoff_xid=897, cutoff_multi=30, flags=0x00007fff56372fae) + 1179 at heapam.c:6532
   6529        * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
   6530        * update Xid cannot possibly be older than the xid cutoff.
   6531        */
-> 6532       Assert(!TransactionIdIsValid(update_xid) ||
   6533              !TransactionIdPrecedes(update_xid, cutoff_xid));
   6534
   6535       /*
(lldb) p update_xid
(TransactionId) $0 = 896
(lldb) p cutoff_xid
(TransactionId) $1 = 897

As the portion doing vacuum freeze is the one blowing up, I think that it is possible to extract an isolation test and include it in the patch, with repro.sql as the initialization phase.

 /*
- * Each non-removable tuple must be checked to see if it needs
+ * Each non-removable tuple that we do not keep must be checked to see if it needs
  * freezing. Note we already have exclusive buffer lock.
  */
-if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
-                              MultiXactCutoff, &frozen[nfrozen],
-                              &tuple_totally_frozen))
-    frozen[nfrozen++].offset = offnum;
+if (!tupkeep)
+{
+    if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
+                                  MultiXactCutoff, &frozen[nfrozen],
+                                  &tuple_totally_frozen))
+        frozen[nfrozen++].offset = offnum;

Wouldn't it be better to check if freeze is needed for the tuple scanned with heap_tuple_needs_freeze() instead of inventing a new custom condition? Please note as well that heap_tuple_needs_freeze() does not need its last "buf" argument.

--
Michael
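The comparison in the Assert above is circular: TransactionIdPrecedes() works modulo 2^32. Here is a small sketch of that comparison (my own Python rendering of the signed-difference logic in transam.c; the reserved permanent XIDs are special-cased there and omitted here), using the values from the backtrace:

```python
# Sketch of TransactionIdPrecedes: xids live on a 32-bit circle, and
# id1 "precedes" id2 when the wrapped difference (id1 - id2) is negative
# as a signed 32-bit value.
def transaction_id_precedes(id1, id2):
    diff = (id1 - id2) & 0xFFFFFFFF
    # diff >= 2**31 means the signed difference is negative.
    return diff != 0 and diff >= 0x80000000

# Values from the backtrace: update_xid 896 precedes cutoff_xid 897,
# so the Assert's second disjunct is false and the assertion fires.
print(transaction_id_precedes(896, 897))   # True
assert not transaction_id_precedes(897, 896)
# The comparison also works across the wraparound boundary.
assert transaction_id_precedes(0xFFFFFF00, 0x00000100)
```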
On Tue, Sep 5, 2017 at 9:44 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> As the portion doing vacuum freeze is the one blowing up, I think that
> it is possible to extract an isolation test and include it in the
> patch with repro.sql as the initialization phase.

Indeed, I have been able to reduce that to an isolation test which crashes with a rate of 100% if assertions are enabled.

I have also spent a couple more hours looking at the proposed patch and eye-balling the surrounding code, and my suggestion about heap_tuple_needs_freeze() is proving to be wrong. So I am arriving at the conclusion that your patch is taking the right approach: skip freezing completely if the tuple is not to be removed yet, i.e. if vacuum sees it as either DEAD or RECENTLY_DEAD.

I am adding as well in CC Álvaro and Andres, who reworked tuple freezing in 3b97e682 a couple of years back for 9.3. Could you look more at the bug and the attached patch? It does not fundamentally change from what was first proposed; I just did the following:
- Updated the set of comments that incorrectly assumed that only DEAD tuples should have freeze skipped.
- Added an isolation test for the failure.

An entry has been added in the next open CF to track this problem. This should not fall through the cracks!

--
Michael
Attachment
Michael Paquier wrote:
> I have also spent a couple more hours looking at the proposed patch
> and eye-balling the surrounding code, and my suggestion about
> heap_tuple_needs_freeze() is proving to be wrong. So I am arriving at
> the conclusion that your patch is taking the right approach: skip
> freezing completely if the tuple is not to be removed yet, i.e. if
> vacuum sees it as either DEAD or RECENTLY_DEAD.

I think in the "tupkeep" case we must not mark the page as frozen in VM; in other words I think that block needs to look like this:

// tupgone = false
{
    bool        tuple_totally_frozen;

    num_tuples += 1;
    hastup = true;

    /*
     * Each non-removable tuple that we do not keep must be checked
     * to see if it needs freezing. Note we already have exclusive
     * buffer lock.
     */
    if (!tupkeep &&
        heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
                                  MultiXactCutoff, &frozen[nfrozen],
                                  &tuple_totally_frozen))
        frozen[nfrozen++].offset = offnum;

    if (tupkeep || !tuple_totally_frozen)
        all_frozen = false;
}

Otherwise, we risk marking the page as all-frozen, and it would be skipped by vacuum. If we never come around to HOT-pruning the page, a non-permanent xid (or a multixact? not sure that that can happen; probably not) would linger unnoticed and cause a DoS condition later ("cannot open file pg_clog/1234") when the tuple header is read. Now, it is possible that HOT pruning would fix the page promptly without causing an actual DoS, but nonetheless it seems dangerous to leave things like this.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
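The page-level rule proposed above can be paraphrased in a tiny model (Python; the field names "keep" and "freezable" are mine, not the actual vacuumlazy.c variables): a page may be marked all-frozen in the visibility map only if no tuple was held back from pruning and every tuple was actually frozen.

```python
# Toy bookkeeping for the all-frozen decision: a single tuple that is
# kept for a later prune (and therefore skipped freezing) must veto the
# all-frozen bit, or an old xid lingers on a page vacuum never revisits.
def page_all_frozen(tuples):
    all_frozen = True
    for t in tuples:
        tupkeep = t["keep"]                       # kept for later pruning
        totally_frozen = (not tupkeep) and t["freezable"]
        if tupkeep or not totally_frozen:
            all_frozen = False
    return all_frozen

page = [{"keep": False, "freezable": True},
        {"keep": True,  "freezable": True}]       # one kept tuple
print(page_all_frozen(page))   # False -- the kept tuple vetoes the bit
assert page_all_frozen([{"keep": False, "freezable": True}]) is True
```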
Another thing is that if you're going through heap_copy_data, the tuple is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here but I'm not sure yet.

I attach the patch as I have it now -- mostly comment tweaks, but also the change to not mark the page as all-frozen when we skip freezing a recently dead tuple. I also renamed the test case, because I think it may be possible to reach the problem code without involving a multixact. Haven't tried too hard though.

--
Álvaro Herrera
Attachment
First new reply

On 9/6/17, 3:41 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> I think in the "tupkeep" case we must not mark the page as frozen in VM;
> in other words I think that block needs to look like this: [...]
Sorry for my mistaken last mail
Michael Paquier wrote:
> frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
> t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
> flags=0x00007fff56372fae) + 1179 at heapam.c:6532
>    6529     * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
>    6530     * update Xid cannot possibly be older than the xid cutoff.
>    6531     */
> -> 6532    Assert(!TransactionIdIsValid(update_xid) ||
>    6533           !TransactionIdPrecedes(update_xid, cutoff_xid));
>    6534
>    6535    /*
> (lldb) p update_xid
> (TransactionId) $0 = 896
> (lldb) p cutoff_xid
> (TransactionId) $1 = 897

So, looking at this closely, I think there is a bigger problem here: if we use any of the proposed patches or approaches, we risk leaving an old Xid in a tuple (because of skipping heap_prepare_freeze_tuple on a tuple which remains in the heap with live Xids), followed by later truncation of pg_multixact / pg_clog removing a segment that might be critical to resolving this tuple's status later on.

I think doing the tuple freezing dance for any tuple we don't remove from the heap is critical, not optional. Maybe a later HOT pruning would save you from actual disaster, but I think it'd be a bad idea to rely on that.

So ISTM we need a different solution than what's been proposed so far; and I think that solution is different for each of the possible problem cases, which are two: HEAPTUPLE_DEAD and HEAPTUPLE_RECENTLY_DEAD.

I think we can cover the HEAPTUPLE_DEAD case by just redoing the heap_page_prune (just add a "goto" back to it if we detect the case). It's a bit wasteful because we'd re-process all the prior tuples in the loop below, but since it's supposed to be an infrequent condition, I think it should be okay.

The RECENTLY_DEAD case is interesting. We know that the updater is committed, and since the update XID is older than the cutoff XID, we know nobody else can see the tuple. So we can simply remove it ... and we already have a mechanism for that: return FRM_MARK_COMMITTED in FreezeMultiXactId. But the code already does that! The only thing we need in order for this to be handled correctly is to remove the assert.

A case I didn't think about yet is RECENTLY_DEAD where the xmax is a plain Xid (not a multi). My vague feeling is that there is no bug here. I haven't actually tested this. Planning to look into it tomorrow.

--
Álvaro Herrera
On Wed, Sep 6, 2017 at 7:40 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Otherwise, we risk marking the page as all-frozen, and it would be
> skipped by vacuum. If we never come around to HOT-pruning the page, a
> non-permanent xid (or a multixact? not sure that that can happen;
> probably not) would linger unnoticed and cause a DoS condition later
> ("cannot open file pg_clog/1234") when the tuple header is read.

Yes, you are definitely right here.

On Wed, Sep 6, 2017 at 10:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Another thing is that if you're going through heap_copy_data, the tuple
> is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here
> but I'm not sure yet.

Not sure to what you are referring here. Perhaps heap_prune_chain()? It seems to me that it does the right thing.

> I attach the patch as I have it now -- mostly comment tweaks, but also
> the change to not mark the page as all-frozen when we skip freezing a
> recently dead tuple. I also renamed the test case, because I think it
> may be possible to reach the problem code without involving a multixact.
> Haven't tried too hard though.

+test: freeze-the-dead
This one has made me smile..

+++ b/src/test/isolation/expected/freeze-the-dead.spec
@@ -0,0 +1,101 @@
+Parsed test spec with 2 sessions

The test fails; expected/ files need to be named with .out, not .spec.

--
Michael
On Sep 6, 2017, at 6:27 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Another thing is that if you're going through heap_copy_data, the tuple
> is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here
> but I'm not sure yet.

Any idea on how to identify affected rows on a running system?
> On Sep 6, 2017, at 8:23 PM, Jeff Frost <jeff@pgexperts.com> wrote:
> Any idea on how to identify affected rows on a running system?

I guess better questions would be:

Is it as simple as:

SELECT id, count(*) FROM foo GROUP BY id HAVING count(*) > 1;

Maybe also need to:

set enable_indexscan = 0;
set enable_indexonlyscan = 0;

before running the SELECT?

Is it possible to affect a DELETE or does it need to be a HOT updated row?
________________________________________
From: pgsql-bugs-owner@postgresql.org <pgsql-bugs-owner@postgresql.org> on behalf of Alvaro Herrera <alvherre@alvh.no-ip.org>
Sent: Wednesday, September 6, 2017 3:12 PM
To: Michael Paquier
Cc: Peter Geoghegan; Wood, Dan; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Old row version in hot chain become visible after a freeze

>So, looking at this closely, I think there is a bigger problem here: if
>we use any of the proposed patches or approaches, we risk leaving an old
>Xid in a tuple (because of skipping the heap_prepare_freeze_tuple on a
>tuple which remains in the heap with live Xids), followed by later
>truncating pg_multixact / pg_clog removing a segment that might be
>critical to resolving this tuple status later on.

>I think doing the tuple freezing dance for any tuple we don't remove
>from the heap is critical, not optional. Maybe a later HOT pruning
>would save you from actual disaster, but I think it'd be a bad idea to
>rely on that.

There is actually another case where HEAPTUPLE_DEAD tuples may be kept and have prepare_freeze skipped on them entirely.

lazy_record_dead_tuple may fail to record the heap for later pruning for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples in the array:

/*
 * The array shouldn't overflow under normal behavior, but perhaps it
 * could if we are given a really small maintenance_work_mem. In that
 * case, just forget the last few tuples (we'll get 'em next time).
 */
if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
...

It looks like we don't even check whether the lazy_record_dead_tuple operation did any actual recording in the tupgone case:

if (tupgone)
{
    lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
    HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
                                           &vacrelstats->latestRemovedXid);
    tups_vacuumed += 1;
    has_dead_tuples = true;
}

It's probably unsafe for any operations calling TruncateXXX code to assume that old Xids don't stick around after a vacuum in itself.

Thanks,
Yi Wen
Wong, Yi Wen wrote:
> There is actually another case where HEAPTUPLE_DEAD tuples may be kept and have
> prepare_freeze skipped on them entirely.
>
> lazy_record_dead_tuple may fail to record the heap for later pruning
> for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples
> in the array:

Hmm, ouch, good catch.

AFAICS this is a shouldn't-happen condition, since we bail out of the loop pessimistically as soon as we would be over the array limit if the next page were to be full of dead tuples (i.e., we never give the chance for overflow to actually happen). So unless I misunderstand, this could only fail if you give maint_work_mem smaller than necessary for one pageful of dead tuples, which should be about 1800 bytes ...

If we wanted to be very sure about this we could add a test and perhaps abort the vacuum, but I'm not sure it's worth the trouble.

--
Álvaro Herrera
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> AFAICS this is a shouldn't-happen condition, since we bail out of the
> loop pessimistically as soon as we would be over the array limit if the
> next page were to be full of dead tuples (i.e., we never give the chance
> for overflow to actually happen). So unless I misunderstand, this could
> only fail if you give maint_work_mem smaller than necessary for one
> pageful of dead tuples, which should be about 1800 bytes ...
>
> If we wanted to be very sure about this we could add a test and perhaps
> abort the vacuum, but I'm not sure it's worth the trouble.

I think if we're going to depend on that, we should change the logic from "don't record tuple if no space" to "throw error on no space".

regards, tom lane
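Tom's suggested hardening amounts to turning a silent drop into a hard failure. A toy version (Python; the class name, capacity, and TID tuples are illustrative stand-ins, not the actual vacuumlazy.c structures):

```python
# Toy dead-tuple array: instead of silently "forgetting the last few
# tuples" on overflow (which would let a dead tuple be skipped and later
# frozen), record() raises a hard error as Tom suggests.
class DeadTupleArray:
    def __init__(self, max_tuples):
        self.max_tuples = max_tuples
        self.tids = []

    def record(self, tid):
        if len(self.tids) >= self.max_tuples:
            # old behaviour: silently drop; new behaviour: hard error
            raise RuntimeError("dead-tuple array overflow during vacuum")
        self.tids.append(tid)

arr = DeadTupleArray(max_tuples=2)
arr.record((0, 3))
arr.record((0, 4))
try:
    arr.record((0, 5))     # third TID does not fit
except RuntimeError as e:
    print("ERROR:", e)     # ERROR: dead-tuple array overflow during vacuum
```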
Michael Paquier wrote:
> Not sure to what you are referring here. Perhaps heap_prune_chain()?
> It seems to me that it does the right thing.

I meant copy_heap_data, which also "freezes" tuples while copying the heap. In the end, I concluded that the fact that it runs with access exclusive lock is sufficient protection. I was worried that in the future we might improve that code and enable it to run without AEL, but after looking at it a little more, it sounds like a big enough project that this is going to be the least of its troubles.

Anyway, in order to test this, I tried running this one-liner (files attached):

psql -f /tmp/repro.sql ; for i in `seq 1 50`; do psql --no-psqlrc -f /tmp/lock.sql & done

(I also threw in a small sleep between heap_page_prune and HeapTupleSatisfiesVacuum while testing, just to widen the problem window to hopefully make any remaining problems more evident.)

This turned up a few different failure modes, which I fixed until no further problems arose. With the attached patch, I no longer see any failures (assertion failures) or misbehavior (additional rows) in a few dozen runs, which were easy to come by with the original code.

The resulting patch, which I like better than the previously proposed idea of skipping the freeze, takes the approach of handling freeze correctly for the cases where the tuple still exists after pruning.

I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple cannot be recorded, as observed by Yi Wen. AFAICS that's not reachable because of the way the array is allocated, so an elog(ERROR) is sufficient.

I regret my inability to turn the one-liner into a committable test case, but I think that's beyond what I can do for now.

FWIW, running this against 9.2 I couldn't detect any problems, which I suppose was expected. Changing the lock mode in the test file to FOR SHARE, I only see an enormous amount of deadlocks reported (which makes me confident that doing multixact stuff in 9.3 was a good thing, despite what bug chasers may think.)

--
Álvaro Herrera
Attachment
On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> (I also threw in a small sleep between heap_page_prune and
> HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> to hopefully make any remaining problems more evident.)

I am understanding that you mean heap_prepare_freeze_tuple here instead of heap_page_prune.

> This turned up a few different failure modes, which I fixed until no
> further problems arose. With the attached patch, I no longer see any
> failures (assertion failures) or misbehavior (additional rows) in a few
> dozen runs, which were easy to come by with the original code.

Well, you simply removed the assertion ;), and my tests don't show additional rows as well, which is nice.

> The resulting patch, which I like better than the previously proposed idea
> of skipping the freeze, takes the approach of handling freeze correctly
> for the cases where the tuple still exists after pruning.

That's also something I was wondering when looking at the first patch. I am unfortunately not as skilled as you are with this area of the code (this thread has brought its quantity of study!), so I was not able to draw a clear line with what needs to be done. But I am clearly +1 with this approach.

> I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> cannot be recorded, as observed by Yi Wen. AFAICS that's not reachable
> because of the way the array is allocated, so an elog(ERROR) is
> sufficient.
>
> I regret my inability to turn the one-liner into a committable test case,
> but I think that's beyond what I can do for now.

Here are some comments about your last patch. heap_tuple_needs_freeze looks to be still consistent with heap_prepare_freeze_tuple even after what you have changed, which is good.
Using Dan's test from the top of the thread again, I am seeing from time to time what looks like garbage data in xmax, like this:

 ctid  | xmin | xmax | id
-------+------+------+----
 (0,1) |  620 |    0 |  1
 (0,7) |  625 |   84 |  3
(2 rows)
[...]
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,1) |  656 |    0 |  1
 (0,6) |  661 |  128 |  3
(2 rows)

Putting manual sleeps in lazy_scan_heap does not change the frequency of their appearance.

-- Michael
Michael Paquier wrote:
> On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
> > (I also threw in a small sleep between heap_page_prune and
> > HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> > to hopefully make any remaining problems more evident.)
>
> I understand you to mean heap_prepare_freeze_tuple here
> instead of heap_page_prune.

Hmm ... no, I meant adding a sleep after the page is pruned, before the HeapTupleSatisfiesVacuum call that determines the action with regard to freezing.

> > This turned up a few different failure modes, which I fixed until no
> > further problems arose.  With the attached patch, I no longer see any
> > failures (assertion failures) or misbehavior (additional rows), in a few
> > dozen runs, which were easy to come by with the original code.
>
> Well, you simply removed the assertion ;), and my tests don't show
> additional rows either, which is nice.

Yeah, the assertion was wrong -- it was essentially assuming that the window above (between page pruning and HTSV) was of zero size, which is evidently what caused this whole disaster.

> > The resulting patch, which I like better than the previously proposed idea
> > of skipping the freeze, takes the approach of handling freeze correctly
> > for the cases where the tuple still exists after pruning.
>
> That's also something I was wondering about when looking at the first patch.
> I am unfortunately not as skilled as you are in this area of the code
> (this thread has prompted its share of study!), so I was not able to
> draw a clear line around what needs to be done.  But I am clearly +1 on
> this approach.

Great, thanks.

> > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > because of the way the array is allocated, so an elog(ERROR) is
> > sufficient.
> >
> > I regret my inability to turn the one-liner into a committable test case,
> > but I think that's beyond what I can do for now.
>
> Here are some comments about your latest patch.
>
> heap_tuple_needs_freeze still looks consistent with
> heap_prepare_freeze_tuple after your changes, which is good.

Thanks for confirming.

> Using Dan's test from the top of the thread again, I am seeing from
> time to time what looks like garbage data in xmax, like this:
>
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  620 |    0 |  1
>  (0,7) |  625 |   84 |  3
> (2 rows)
> [...]
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  656 |    0 |  1
>  (0,6) |  661 |  128 |  3
> (2 rows)

I bet those are multixact values rather than garbage. You should see HEAP_XMAX_IS_MULTI set in t_infomask in those cases, and the values should be near NextMultiXactId (make sure to checkpoint first if you examine it with pg_controldata; I think it's easier to obtain it from shmem, or you could just use pg_get_multixact_members()).

-- Álvaro Herrera
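[Editor's note: for readers following along, the check Álvaro suggests boils down to testing one bit of the tuple header's t_infomask. The sketch below uses illustrative copies of the flag values from src/include/access/htup_details.h; it is a simplified stand-in for exposition, not actual PostgreSQL source.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the relevant t_infomask bits (htup_details.h). */
#define HEAP_XMAX_COMMITTED  0x0400  /* xmax transaction committed */
#define HEAP_XMAX_INVALID    0x0800  /* xmax invalid or aborted */
#define HEAP_XMAX_IS_MULTI   0x1000  /* xmax is a MultiXactId, not an XID */

/*
 * Returns 1 if the raw xmax field holds a MultiXactId rather than a plain
 * transaction id -- the situation Michael observed as seemingly "garbage"
 * xmax values after running Dan's FOR KEY SHARE test.
 */
static int
xmax_is_multixact(uint16_t t_infomask)
{
    return (t_infomask & HEAP_XMAX_IS_MULTI) != 0;
}
```

A tuple locked by several concurrent FOR KEY SHARE sessions gets a multixact in xmax, so the numeric value is unrelated to the xmin/xmax XIDs around it -- it only looks like garbage.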
On Tue, Sep 12, 2017 at 5:22 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Michael Paquier wrote:
>> I understand you to mean heap_prepare_freeze_tuple here
>> instead of heap_page_prune.
>
> Hmm ... no, I meant adding a sleep after the page is pruned, before the
> HeapTupleSatisfiesVacuum call that determines the action with regard to
> freezing.

Well, I have tested both. With Dan's test case, adding a sleep after calling HeapTupleSatisfiesVacuum() helped in widening the window. Of course, feel free to correct me.

> I bet those are multixact values rather than garbage.  You should see
> HEAP_XMAX_IS_MULTI set in t_infomask in those cases, and the values should
> be near NextMultiXactId (make sure to checkpoint first if you examine it
> with pg_controldata; I think it's easier to obtain it from shmem, or you
> could just use pg_get_multixact_members()).

Yes, you're right. That's the case. So I don't have many complaints about the patch as presented.

-- Michael
Michael Paquier wrote:
> Well, I have tested both.  With Dan's test case, adding a sleep after
> calling HeapTupleSatisfiesVacuum() helped in widening the window.
> Of course, feel free to correct me.

OK, great, thanks for testing.

> Yes, you're right.  That's the case.  So I don't have many complaints
> about the patch as presented.

Thanks for the confirmation.

-- Álvaro Herrera
> > The resulting patch, which I like better than the previously proposed idea
> > of skipping the freeze, takes the approach of handling freeze correctly
> > for the cases where the tuple still exists after pruning.
>
> That's also something I was wondering about when looking at the first patch.
> [...]  But I am clearly +1 on this approach.

I have tried out the new patch and it works great as well! +1 from me.

> > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > because of the way the array is allocated, so an elog(ERROR) is
> > sufficient.

I agree the failure is rare (and probably doesn't happen in real cases, although the comment does imply that with a sufficiently low maintenance_work_mem it might?). However, I'd dispute that ERROR is sufficient; PANIC is probably more appropriate here, because the freezes are not WAL-logged until the end of the page, so we'd end up with unfrozen xids hanging around given an appropriately timed crash. I wouldn't worry much about the PANIC tripping because, like you said, this seems unreachable normally.

The errmsg should come with an errhint saying "Increase maintenance_work_mem".

I've done some code inspection to try to figure out whether crashing at any point between the FREEZE and the later lazy_vacuum_page phase is safe (i.e. whether older xids might be left behind on restart/crash); it seems to me it shouldn't be a problem, because new_min_xid and new_min_multi aren't updated until later, for the truncate.

Yi Wen
> > > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > > because of the way the array is allocated, so an elog(ERROR) is
> > > sufficient.
>
> I agree the failure is rare (and probably doesn't happen in real cases,
> although the comment does imply that with a sufficiently low
> maintenance_work_mem it might?).  However, I'd dispute that ERROR is
> sufficient; PANIC is probably more appropriate here, because the freezes
> are not WAL-logged until the end of the page, so we'd end up with unfrozen
> xids hanging around given an appropriately timed crash.  I wouldn't worry
> much about the PANIC tripping because, like you said, this seems
> unreachable normally.
>
> The errmsg should come with an errhint saying "Increase maintenance_work_mem".

On closer inspection, I misread the code. The freezes are collected rather than executed immediately, so an ERROR would suffice; we don't need to change it to PANIC. The errhint comment still stands.

Yi Wen
I've spent some time reviewing the new patch and have made a couple of minor changes.

I've made some changes to this chunk:

 	else if (TransactionIdIsNormal(xid))
 	{
 		if (TransactionIdPrecedes(xid, cutoff_xid))
-			freeze_xmax = true;
+		{
+			if (HeapTupleHeaderIsHeapOnly(tuple) ||
+			    HeapTupleHeaderIsHotUpdated(tuple))
+			{
+				frz->t_infomask |= HEAP_XMAX_COMMITTED;
+				totally_frozen = false;
+				changed = true;
+			}
+			else
+				freeze_xmax = true;
+		}
 		else
 			totally_frozen = false;
 	}

The changes are:

1) Add the line "changed = true;". This is necessary to actually persist the HEAP_XMAX_COMMITTED bit. In repro.sql the change happens to be persisted anyway, because the xmin is frozen in the same pass. However, consider a scenario where xmin was frozen and xmax was still live in a previous VACUUM FREEZE pass: if we then do a second pass in which HEAP_XMAX_COMMITTED needs to be set, we would lose that change.

2) Flip the order of HeapTupleHeaderIsHeapOnly(tuple) and HeapTupleHeaderIsHotUpdated(tuple). This is merely a micro-optimization on a non-performance-critical path, because HeapOnly is a broader condition than HotUpdated :-)

I've also changed the elog to an ereport ERROR. This seems necessary for the user to know about, because it does mean VACUUM is broken on their system:

+	if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("dead tuple array overflow: %d dead, %d max",
+						vacrelstats->num_dead_tuples,
+						vacrelstats->max_dead_tuples),
+				 errhint("Increase maintenance_work_mem.")));

I've also made some changes to comments in the code.
Thanks,
Yi Wen
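[Editor's note: to illustrate why the "changed = true;" line matters, here is a minimal, self-contained C sketch of the mechanism: heap_prepare_freeze_tuple records its intended modifications in a freeze plan and reports whether anything changed; if it reports no change, the caller never applies (or WAL-logs) the plan, and any infomask bits set in it are silently lost. The struct and function names below are simplified stand-ins invented for this example, not the actual PostgreSQL definitions.]

```c
#include <stdint.h>

/* Illustrative copy of the t_infomask bit being set (htup_details.h). */
#define HEAP_XMAX_COMMITTED 0x0400

/*
 * Simplified stand-in for the freeze plan (xl_heap_freeze_tuple): the
 * set of header changes prepared now and executed later in one batch.
 */
typedef struct
{
    uint16_t t_infomask;    /* flag bits to apply to the tuple later */
} freeze_plan;

/*
 * Sketch of the patched branch: when xmax precedes the cutoff but the
 * tuple is part of a HOT chain, hint xmax as committed instead of
 * freezing it.  The return value plays the role of "changed": without
 * it, the caller would discard the plan and the bit would never reach
 * the page -- the omission Yi Wen's change fixes.
 */
static int
plan_mark_xmax_committed(freeze_plan *frz)
{
    frz->t_infomask |= HEAP_XMAX_COMMITTED;
    return 1;               /* changed = true */
}
```

The key design point is that preparation and execution are decoupled; any prepared modification that fails to report "changed" is simply dropped at execution time.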
This fails on the back-branches. I think I should back-patch c2581794f37e ...

-- Álvaro Herrera
On Sat, Sep 23, 2017 at 11:05 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> This fails on the back-branches.  I think I should back-patch
> c2581794f37e ...

If you mean merging those patches directly into the final versions to be committed, yes, that seems appropriate.

-- Michael
Wong, Yi Wen wrote:
> 2) Flip the order of HeapTupleHeaderIsHeapOnly(tuple) and
> HeapTupleHeaderIsHotUpdated(tuple).  This is merely a micro-optimization
> on a non-performance-critical path, because HeapOnly is a broader
> condition than HotUpdated :-)

Actually, is this correct? AFAICS, IsHeapOnly refers to the *new* version of the tuple in a HOT update, whereas IsHotUpdated refers to the *old* version. We surely must never freeze an IsHotUpdated tuple (because that would bring a dead tuple back to life, which is the bug at hand), but what is the argument against freezing an IsHeapOnly tuple?

I think the way this test is written comes from some fuzzy thinking -- somebody (probably me) copied the test from lazy_scan_heap, but that routine has two different reasons for the two conditions (one is merely an optimization; the other is necessary for correctness). In the spot we're patching, one of them is wrong.

I observed that, with that test in place, this bug occasionally causes xids older than the cutoff xid to remain in xmax. Removing the "IsHeapOnly" part and keeping only IsHotUpdated makes that go away.

-- Álvaro Herrera
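[Editor's note: Álvaro's distinction can be made concrete with a small sketch. HEAP_HOT_UPDATED marks a tuple that has a HOT successor -- an old version of the row -- while HEAP_ONLY_TUPLE marks a tuple with no separate index entry, typically the newer end of the chain. The flag values below mirror t_infomask2 in src/include/access/htup_details.h but are shown for illustration only.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the t_infomask2 flags discussed above. */
#define HEAP_HOT_UPDATED  0x4000  /* has a HOT successor: an OLD version */
#define HEAP_ONLY_TUPLE   0x8000  /* no index entry: reached via the chain */

/*
 * Per the analysis above: only a HOT-updated (old) tuple must never be
 * frozen, since freezing it would resurrect a dead row version.  A
 * heap-only tuple that is not itself HOT-updated is the live end of the
 * chain, and there is no correctness reason to skip freezing it.
 */
static int
must_not_freeze(uint16_t t_infomask2)
{
    return (t_infomask2 & HEAP_HOT_UPDATED) != 0;
}
```

Note how the condition deliberately ignores HEAP_ONLY_TUPLE -- the part Álvaro removed -- because a heap-only tuple at the end of a chain can carry an old xmax that must still be frozen before the cutoff advances.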
Okay, I have pushed this patch, backpatched to all branches since 9.3, after staring at this code for way too long. I spent a lot of time testing it to ensure things are correct, but please double-check.

Many thanks to Dan Wood for the initial diagnosis, and to Yi Wen and Michael for the review and discussion.

-- Álvaro Herrera
Jeff Frost wrote:
> Any idea how to identify affected rows on a running system?
> Is it as simple as:
>
> SELECT id, count(*) FROM foo GROUP BY id HAVING count(*) > 1;
>
> Maybe we also need to:
>
> set enable_indexscan = 0;
> set enable_indexonlyscan = 0;
>
> before running the SELECT?

Did you find out? Your proposed query and settings seem like a good way to determine whether a supposedly unique column is still unique, but I wonder if there's a better way. I suppose a procedure would involve finding which tables receive HOT updates, and scanning those. Of course, this could theoretically happen to tables without any UNIQUE constraint, and then the recipe doesn't work.

> Is it possible to affect a DELETE, or does it need to be a HOT-updated row?

AFAICS it needs to be a HOT update.

-- Álvaro Herrera
Hmm. In going over this once more while looking at the coverage report site, I wonder if the "NB" comment on top of heap_prepare_freeze_tuple has important consequences for some of this new code. It probably does, but I'm not sure what to do about it.

-- Álvaro Herrera
Alvaro Herrera wrote:
> Okay, I have pushed this patch, backpatched to all branches since 9.3,
> after staring at this code for way too long.

By the way, as a side effect of this commit, heapam.c has gone from 76.6% covered to 78.6% covered; in particular, FreezeMultiXactId() has gone from not being exercised at all to having about half its lines covered, and there are some newly covered paths in heap_prepare_freeze_tuple. It's not a huge increase, but it's a start. IMO we need some more test cases to cover a few more branches here ...

-- Álvaro Herrera