Thread: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
Per this comment in lazy_scan_heap(), almost all tuple removal these days
happens in heap_page_prune():

			case HEAPTUPLE_DEAD:

				/*
				 * Ordinarily, DEAD tuples would have been removed by
				 * heap_page_prune(), but it's possible that the tuple
				 * state changed since heap_page_prune() looked.  In
				 * particular an INSERT_IN_PROGRESS tuple could have
				 * changed to DEAD if the inserter aborted.  So this
				 * cannot be considered an error condition.

vacuumlazy.c remains responsible for noticing the LP_DEAD line pointers left
by heap_page_prune(), removing corresponding index entries, and marking those
line pointers LP_UNUSED.  Nonetheless, lazy_vacuum_heap() retains the ability
to remove actual HEAPTUPLE_DEAD tuples and reclaim their LP_NORMAL line
pointers.  This support gets exercised only in the scenario described in the
above comment.  For hot standby, this capability requires its own WAL record,
XLOG_HEAP2_CLEANUP_INFO, to generate the necessary conflicts[1].

There is a bug in lazy_scan_heap()'s bookkeeping for the xid to place in that
WAL record.  Each call to heap_page_prune() simply overwrites
vacrelstats->latestRemovedXid, but lazy_scan_heap() expects it to only ever
increase the value.  I have attached a minimal fix to be backpatched.  It has
lazy_scan_heap() ignore heap_page_prune()'s actions for the purpose of this
conflict xid, because heap_page_prune() emitted an XLOG_HEAP2_CLEAN record
covering them.

At that point in the investigation, I realized that the cost of being able to
remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit.
Again, the benefit is being able to remove tuples whose inserting transaction
aborted between the HeapTupleSatisfiesVacuum() call in heap_page_prune() and
the one in lazy_scan_heap().  To make that possible, lazy_vacuum_heap() grabs
a cleanup lock, calls PageRepairFragmentation(), and emits a WAL record for
every page containing LP_DEAD line pointers or HEAPTUPLE_DEAD tuples.
If we take it out of the business of removing tuples, lazy_vacuum_heap() can
skip WAL and update lp_flags under a mere shared lock.  The second attached
patch, for HEAD, implements that.  Besides optimizing things somewhat, it
simplifies the code and removes rarely-tested branches.  (This patch
supersedes the backpatch-oriented patch rather than stacking with it.)

The bookkeeping behind the "page containing dead tuples is marked as
all-visible in relation" warning is also faulty; it only fires when
lazy_scan_heap() saw the HEAPTUPLE_DEAD tuple; again, heap_page_prune() will
be the one to see it in almost every case.  I changed the warning to fire
whenever the page cannot be marked all-visible for a reason other than the
presence of too-recent live tuples.

I considered renaming lazy_vacuum_heap() to lazy_heap_clear_dead_items(),
reflecting its narrower role.  Ultimately, I left function names unchanged.
This patch conflicts textually with Pavan's "Setting visibility map in
VACUUM's second phase" patch, but I don't see any conceptual incompatibility.

I can't give a simple statement of the performance improvement here.  The
XLOG_HEAP2_CLEAN record is fairly compact, so the primary benefit of avoiding
it is the possibility of avoiding a full-page image.  For example, if a
checkpoint lands just before the VACUUM and again during the index-cleaning
phase (assume just one such phase in this example), this patch reduces
heap-related WAL volume by almost 50%.  Improvements as extreme as 2% and 97%
are possible given other timings of checkpoints relative to the VACUUM.  In
general, expect this to help VACUUMs spanning several checkpoint cycles more
than it helps shorter VACUUMs.  I have attached a script I used as a
reference workload for testing different checkpoint timings.

There should also be some improvement from keeping off WALInsertLock, not
requiring WAL flushes to evict from the ring buffer during the
lazy_vacuum_heap() phase, and not taking a second buffer cleanup lock.  I did
not attempt to quantify those.

Thanks,
nm

[1] Normally, heap_page_prune() removes the tuple first (leaving an LP_DEAD
line pointer), and vacuumlazy.c removes index entries afterward.  When the
removal happens in this order, the XLOG_HEAP2_CLEAN record takes care of
conflicts.  However, in the rarely-used code path, we remove the index
entries before removing the tuple.  XLOG_HEAP2_CLEANUP_INFO conflicts with
standby snapshots that might need the vanishing index entries.
Attachment
On Tue, Jan 8, 2013 at 8:19 AM, Noah Misch <noah@leadboat.com> wrote:
> At that point in the investigation, I realized that the cost of being able
> to remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit.
> Again, the benefit is being able to remove tuples whose inserting
> transaction aborted between the HeapTupleSatisfiesVacuum() call in
> heap_page_prune() and the one in lazy_scan_heap().  To make that possible,
> lazy_vacuum_heap() grabs a cleanup lock, calls PageRepairFragmentation(),
> and emits a WAL record for every page containing LP_DEAD line pointers or
> HEAPTUPLE_DEAD tuples.  If we take it out of the business of removing
> tuples, lazy_vacuum_heap() can skip WAL and update lp_flags under a mere
> shared lock.  The second attached patch, for HEAD, implements that.
> Besides optimizing things somewhat, it simplifies the code and removes
> rarely-tested branches.  (This patch supersedes the backpatch-oriented
> patch rather than stacking with it.)
+1. I'd also advocated removing the line pointer array scan in
lazy_scan_heap() because heap_page_prune() does most of the work.  The reason
why we still have that scan in lazy_scan_heap() is to check for new dead
tuples (which should be rare), check for all-visibility of the page, and
freeze tuples if possible.  I think we can move all of that to
heap_page_prune().
But while you are at it, I wonder if you have time and energy to look at the
single pass vacuum work that I, Robert and others tried earlier but failed to
get to the final stage [1][2].  If we do something like that, we might not
need the second scan of the heap at all, which, as you rightly pointed out,
does not do much today anyway, other than marking LP_DEAD line pointers to
LP_UNUSED.  We can push that work to the next vacuum or even a foreground
task.
BTW, with respect to your idea of not WAL logging the operation of setting
LP_DEAD to LP_UNUSED in lazy_vacuum_page(), I wonder if this would create
problems with crash recovery.  During normal vacuum operation, you may have
converted LP_DEAD to LP_UNUSED.  Then you actually used one of those line
pointers for a subsequent PageAddItem() on the page.  If you crash after that
but before the page gets written to disk, then during crash recovery, AFAICS
PageAddItem() will complain with a "will not overwrite a used ItemId" warning
and a subsequent PANIC of the recovery.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Hi Pavan,

Thanks for reviewing.

On Tue, Jan 08, 2013 at 02:41:54PM +0530, Pavan Deolasee wrote:
> On Tue, Jan 8, 2013 at 8:19 AM, Noah Misch <noah@leadboat.com> wrote:
> > At that point in the investigation, I realized that the cost of being
> > able to remove entire tuples in lazy_vacuum_heap() greatly exceeds the
> > benefit.  Again, the benefit is being able to remove tuples whose
> > inserting transaction aborted between the HeapTupleSatisfiesVacuum()
> > call in heap_page_prune() and the one in lazy_scan_heap().  To make that
> > possible, lazy_vacuum_heap() grabs a cleanup lock, calls
> > PageRepairFragmentation(), and emits a WAL record for every page
> > containing LP_DEAD line pointers or HEAPTUPLE_DEAD tuples.  If we take
> > it out of the business of removing tuples, lazy_vacuum_heap() can skip
> > WAL and update lp_flags under a mere shared lock.  The second attached
> > patch, for HEAD, implements that.  Besides optimizing things somewhat,
> > it simplifies the code and removes rarely-tested branches.  (This patch
> > supersedes the backpatch-oriented patch rather than stacking with it.)
>
> +1. I'd also advocated removing the line pointer array scan in
> lazy_scan_heap() because heap_page_prune() does most of the work.  The
> reason why we still have that scan in lazy_scan_heap() is to check for new
> dead tuples (which should be rare), check for all-visibility of the page,
> and freeze tuples if possible.  I think we can move all of that to
> heap_page_prune().

Yes; that was an interesting thread.

> But while you are at it, I wonder if you have time and energy to look at
> the single pass vacuum work that I, Robert and others tried earlier but
> failed to get to the final stage [1][2].  If we do something like that, we
> might not need the second scan of the heap at all, which, as you rightly
> pointed out, does not do much today anyway, other than marking LP_DEAD
> line pointers to LP_UNUSED.  We can push that work to the next vacuum or
> even a foreground task.

I don't wish to morph this patch into a removal of lazy_vacuum_heap(), but
that will remain promising for the future.

> BTW, with respect to your idea of not WAL logging the operation of setting
> LP_DEAD to LP_UNUSED in lazy_vacuum_page(), I wonder if this would create
> problems with crash recovery.  During normal vacuum operation, you may
> have converted LP_DEAD to LP_UNUSED.  Then you actually used one of those
> line pointers for a subsequent PageAddItem() on the page.  If you crash
> after that but before the page gets written to disk, then during crash
> recovery, AFAICS PageAddItem() will complain with a "will not overwrite a
> used ItemId" warning and a subsequent PANIC of the recovery.

Good catch; I can reproduce that problem.  I've now taught PageAddItem() to
tolerate replacing an LP_DEAD item until recovery reaches consistency.  Also,
since lazy_vacuum_page() no longer calls PageRepairFragmentation(), it should
call PageSetHasFreeLinePointers() itself.

The attached update fixes both problems.  (I have also attached the unchanged
backpatch-oriented fix to keep things together.)

nm
Attachment
On 8 January 2013 02:49, Noah Misch <noah@leadboat.com> wrote:
> There is a bug in lazy_scan_heap()'s bookkeeping for the xid to place in
> that WAL record.  Each call to heap_page_prune() simply overwrites
> vacrelstats->latestRemovedXid, but lazy_scan_heap() expects it to only
> ever increase the value.  I have attached a minimal fix to be backpatched.
> It has lazy_scan_heap() ignore heap_page_prune()'s actions for the purpose
> of this conflict xid, because heap_page_prune() emitted an
> XLOG_HEAP2_CLEAN record covering them.

Interesting.  Yes, a bug, and one of mine also.

ISTM the right fix is to correctly initialize on pruneheap.c line 176:

    prstate.latestRemovedXid = *latestRemovedXid;

Better to make it work than to just leave stuff hanging.

I very much like your patch to remove all that cruft altogether; good
riddance.  I think you're missing removing a few calls to
HeapTupleHeaderAdvanceLatestRemovedXid(), perhaps even that routine as well.

Not sure about the skipping WAL records and share locking part, that's too
radical for me.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 10, 2013 at 02:45:36AM +0000, Simon Riggs wrote:
> On 8 January 2013 02:49, Noah Misch <noah@leadboat.com> wrote:
> > There is a bug in lazy_scan_heap()'s bookkeeping for the xid to place in
> > that WAL record.  Each call to heap_page_prune() simply overwrites
> > vacrelstats->latestRemovedXid, but lazy_scan_heap() expects it to only
> > ever increase the value.  I have attached a minimal fix to be
> > backpatched.  It has lazy_scan_heap() ignore heap_page_prune()'s actions
> > for the purpose of this conflict xid, because heap_page_prune() emitted
> > an XLOG_HEAP2_CLEAN record covering them.
>
> Interesting.  Yes, a bug, and one of mine also.
>
> ISTM the right fix is to correctly initialize on pruneheap.c line 176:
>     prstate.latestRemovedXid = *latestRemovedXid;
> better to make it work than to just leave stuff hanging.

That works, too.

> I very much like your patch to remove all that cruft altogether; good
> riddance.  I think you're missing removing a few calls to
> HeapTupleHeaderAdvanceLatestRemovedXid(), perhaps even that routine as
> well.

Thanks.  Did you have a particular HeapTupleHeaderAdvanceLatestRemovedXid()
call site in mind?  The three calls remaining in the tree look right to me.

> Not sure about the skipping WAL records and share locking part, that's
> too radical for me.

Certainly a fair point of discussion.  In particular, using a plain
exclusive lock wouldn't be much worse.
On Wednesday, January 9, 2013, Noah Misch wrote:
On Thu, Jan 10, 2013 at 02:45:36AM +0000, Simon Riggs wrote:
> On 8 January 2013 02:49, Noah Misch <noah@leadboat.com> wrote:
> > There is a bug in lazy_scan_heap()'s
> > bookkeeping for the xid to place in that WAL record. Each call to
> > heap_page_prune() simply overwrites vacrelstats->latestRemovedXid, but
> > lazy_scan_heap() expects it to only ever increase the value. I have
> > attached a minimal fix to be backpatched. It has lazy_scan_heap() ignore
> > heap_page_prune()'s actions for the purpose of this conflict xid, because
> > heap_page_prune() emitted an XLOG_HEAP2_CLEAN record covering them.
>
> Interesting. Yes, a bug, and one of mine also.
>
> ISTM the right fix is to correctly initialize on pruneheap.c line 176
> prstate.latestRemovedXid = *latestRemovedXid;
> better to make it work than to just leave stuff hanging.
That works, too.
As bug fixes don't usually go through the commit-fest process, will someone be committing one of these two ideas for the back-branches? And to HEAD, in case the more invasive patch doesn't make it in?
I have a preliminary nit-pick on the big patch. It generates a compiler warning:
vacuumlazy.c: In function ‘lazy_scan_heap’:
vacuumlazy.c:445:9: warning: variable ‘prev_dead_count’ set but not used [-Wunused-but-set-variable]
Thanks,
Jeff
Noah,

* Noah Misch (noah@leadboat.com) wrote:
> The attached update fixes both problems.  (I have also attached the
> unchanged backpatch-oriented fix to keep things together.)

I've just started looking at/playing with this patch and was wondering if
you'd missed Jeff's comments on it..?  I note that prev_dead_count is still
in this patch- any particular reason..?

Also, perhaps we should consider Simon's one-liner fix for backpatching this
instead of the original patch you posted?

To be honest, I'm also a bit concerned about applying the patch you provided
for HEAD this late in the 9.3 cycle, especially if the plan is to make
further changes in this area to simplify things moving forward.  Perhaps we
could do all of those changes early in the 9.4 cycle?

Thanks,

Stephen
On Sat, Jan 19, 2013 at 02:58:32PM -0800, Jeff Janes wrote:
> I have a preliminary nit-pick on the big patch.  It generates a compiler
> warning:
>
> vacuumlazy.c: In function 'lazy_scan_heap':
> vacuumlazy.c:445:9: warning: variable 'prev_dead_count' set but not used
> [-Wunused-but-set-variable]

Thanks.  That variable can just go away.  Since a full review may turn up
more things, I'll hold off on sending a version fixing that alone.
On Tue, Jan 22, 2013 at 09:45:37PM -0500, Stephen Frost wrote:
> * Noah Misch (noah@leadboat.com) wrote:
> > The attached update fixes both problems.  (I have also attached the
> > unchanged backpatch-oriented fix to keep things together.)
>
> I've just started looking at/playing with this patch and was wondering
> if you'd missed Jeff's comments on it..?  I note that prev_dead_count is
> still in this patch- any particular reason..?

Thanks for the reminder; I've now replied to Jeff.

> Also, perhaps we should consider Simon's one-liner fix for backpatching
> this instead of the original patch you posted?

I have no nontrivial preference between the two approaches.

> To be honest, I'm also a bit concerned about applying the patch you
> provided for HEAD this late in the 9.3 cycle, especially if the plan is
> to make further changes in this area to simplify things moving forward.
> Perhaps we could do all of those changes early in the 9.4 cycle?

Concerning "further changes," I suppose you refer to Pavan's two designs
noted upthread?  I don't recommend going out of our way to consider all
these changes together; there are disadvantages, too, of making several
VACUUM performance changes in short succession.  I'm also not aware of
concrete plans to implement those designs.

You're the second commentator to be skittish about the patch's correctness,
so I won't argue against a conservatism-motivated bounce of the patch.

Thanks,
nm
On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch <noah@leadboat.com> wrote:
> You're the second commentator to be skittish about the patch's
> correctness, so I won't argue against a conservatism-motivated bounce of
> the patch.

Can you please rebase the patch against the latest head?  I see Alvaro's
and Simon's recent changes have bit-rotted the patch.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On 23 January 2013 04:35, Noah Misch <noah@leadboat.com> wrote:
>> Also, perhaps we should consider Simon's one-liner fix for backpatching
>> this instead of the original patch you posted?
>
> I have no nontrivial preference between the two approaches.

Sorry, I didn't see this.  I guess you saw I applied my one liner and
backpatched it.

I'm expecting to apply Noah's larger patch to HEAD with the suggested edits.
I'll do that last thing in the CF.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 28, 2013 at 02:12:33PM +0000, Simon Riggs wrote:
> On 23 January 2013 04:35, Noah Misch <noah@leadboat.com> wrote:
> >> Also, perhaps we should consider Simon's one-liner fix for
> >> backpatching this instead of the original patch you posted?
> >
> > I have no nontrivial preference between the two approaches.
>
> Sorry, I didn't see this.  I guess you saw I applied my one liner and
> backpatched it.

Yes; thanks.

> I'm expecting to apply Noah's larger patch to HEAD with the suggested
> edits.  I'll do that last thing in the CF.

What "suggested edits" do you have in mind?
On Mon, Jan 28, 2013 at 07:24:04PM +0530, Pavan Deolasee wrote:
> On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch <noah@leadboat.com> wrote:
>
> > You're the second commentator to be skittish about the patch's
> > correctness, so I won't argue against a conservatism-motivated bounce
> > of the patch.
>
> Can you please rebase the patch against the latest head?  I see Alvaro's
> and Simon's recent changes have bit-rotted the patch.

Attached.
Attachment
On Wed, Jan 30, 2013 at 7:34 AM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Jan 28, 2013 at 07:24:04PM +0530, Pavan Deolasee wrote:
>> Can you please rebase the patch against the latest head?  I see
>> Alvaro's and Simon's recent changes have bit-rotted the patch.
>
> Attached.

Thanks.  Here are a few comments.

1. I saw you took care of the bug that I reported in the first email by
allowing overwriting an LP_DEAD itemid during recovery.  While this should
be OK given that we had never seen reports of recovery trying to
accidentally overwrite an LP_DEAD itemid, are you sure after this change we
can't get into this situation post reachedConsistency?  If we ever do, I
think the recovery would just fail.

2. In lazy_scan_heap(), after detecting a DEAD tuple you now try to confirm
that the tuple must not require a freeze.  Is that really safe?  I think
this assumes that the HOT prune would have already truncated *every* DEAD
tuple to LP_DEAD, except an INSERT_IN_PROGRESS one which may have turned
into DEAD between two calls to HeapTupleSatisfiesVacuum().  While this
might be true, we never tried hard to prove that before because it wasn't
necessary.  I remember Heikki raising this concern when I proposed setting
the VM bit after HOT prune.  So a proof of that would probably help.

3. Converting LP_DEAD to LP_UNUSED in lazy_vacuum_page() while holding just
a SHARE lock seems to be OK, but is a bit adventurous.  I would rather just
get an EX lock and do it.  Also, it's probably more appropriate to just
mark the buffer dirty instead of SetBufferCommitInfoNeedsSave().  It may
cause line pointer bloat and vacuum may not come to process this page
again.  This will also be kind of unnecessary after the patch to set the VM
bit in the second phase gets in.

4. Are changes in the variable names and logic around them in
lazy_scan_heap() really required?  Just makes me a bit uncomfortable.  See
if we can minimize those changes or do it in a separate patch, if possible.

I haven't run tests with the patch yet.  Will see if I can try a few.  I
share others' views on making these changes late in the cycle, but if we
can reduce the footprint of the patch, we should be OK.

I see the following (and similar) messages while applying the patch, but
maybe they are harmless.

(Stripping trailing CRs from patch.)

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On Wed, Jan 30, 2013 at 11:37:41PM +0530, Pavan Deolasee wrote:
> Thanks.  Here are a few comments.
>
> 1. I saw you took care of the bug that I reported in the first email by
> allowing overwriting an LP_DEAD itemid during recovery.  While this
> should be OK given that we had never seen reports of recovery trying to
> accidentally overwrite an LP_DEAD itemid, are you sure after this change
> we can't get into this situation post reachedConsistency?  If we ever do,
> I think the recovery would just fail.

Having pondered this again, I conclude that checking reachedConsistency is
wrong.  As a simple example, during ongoing streaming replication, the
startup process will regularly be asked to overwrite LP_DEAD items.  Those
items were LP_UNUSED on the master, but only a full-page image would
transfer that fact to the standby before something overwrites the item.

That shows another flaw.  VACUUM marking a page all-visible is WAL-logged,
so the standby will receive that change.  But the standby may still have
LP_DEAD pointers on the affected page, where the master has LP_UNUSED.  I'm
not aware of any severe consequences; no scan type would be fooled.  Still,
this violates a current assumption of the system.

> 2. In lazy_scan_heap(), after detecting a DEAD tuple you now try to
> confirm that the tuple must not require a freeze.  Is that really safe?
> I think this assumes that the HOT prune would have already truncated
> *every* DEAD tuple to LP_DEAD, except an INSERT_IN_PROGRESS one which may
> have turned into DEAD between two calls to HeapTupleSatisfiesVacuum().
> While this might be true, we never tried hard to prove that before
> because it wasn't necessary.  I remember Heikki raising this concern when
> I proposed setting the VM bit after HOT prune.  So a proof of that would
> probably help.

Yes, the patch assumes all that.  Since lazy_scan_heap() already refuses to
remove a heap-only or HOT-updated HEAPTUPLE_DEAD tuple, heap_page_prune()
had better not miss truncating those.  Otherwise, we would pass such a
tuple to heap_freeze_tuple(), which assumes the tuple is not
HEAPTUPLE_DEAD.  The case where heap_page_prune() could be leaving a
HEAPTUPLE_DEAD tuple today without such problems is a simple tuple not
participating in HOT.  I think it's clear from inspection that
heap_page_prune() will visit all such tuples, and heap_prune_chain() will
truncate them when visited.

Since a tuple can move from HEAPTUPLE_INSERT_IN_PROGRESS to HEAPTUPLE_DEAD
at any instant here, heap_freeze_tuple() must not misbehave on such tuples.
Indeed, it does not; since the xmin was running at some point after we
chose a freeze horizon, the xmin will be above that horizon.  Therefore,
the try_freeze dance I added ought to be unnecessary.

> 3. Converting LP_DEAD to LP_UNUSED in lazy_vacuum_page() while holding
> just a SHARE lock seems to be OK, but is a bit adventurous.  I would
> rather just get an EX lock and do it.

Fair enough.

> Also, it's probably more appropriate to just mark the buffer dirty
> instead of SetBufferCommitInfoNeedsSave().

There exist precedents for both styles.

> It may cause line pointer bloat and vacuum may not come to process this
> page again.

How would that happen?

> This will also be kind of unnecessary after the patch to set the VM bit
> in the second phase gets in.

To the extent that pages tend to become all-visible after removing dead
material, yes.  When a long-running transaction is holding back the
all-visible horizon, it will still matter.

> 4. Are changes in the variable names and logic around them in
> lazy_scan_heap() really required?  Just makes me a bit uncomfortable.
> See if we can minimize those changes or do it in a separate patch, if
> possible.

I assume you refer to the replacement of "all_visible" and
"has_dead_tuples" with "has_nonlive" and "has_too_recent".  As ever, it
would be hard to honestly say that those changes were *required*.  They
serve the side goal of fixing the bookkeeping behind one of the warnings,
which, independent of this patch, isn't firing as often as it should.  I
could split that out as an independent patch.

> I haven't run tests with the patch yet.  Will see if I can try a few.  I
> share others' views on making these changes late in the cycle, but if we
> can reduce the footprint of the patch, we should be OK.

Given those reactions and my shortness of time to solidly fix everything
noted above, let's table this patch.  I'm marking it Returned with
Feedback.

> I see the following (and similar) messages while applying the patch, but
> maybe they are harmless.
>
> (Stripping trailing CRs from patch.)

I don't see anything like that.

Thanks,
nm