Re: New vacuum option to do only freezing - Mailing list pgsql-hackers

From Robert Haas
Subject Re: New vacuum option to do only freezing
Msg-id CA+Tgmoa796XctOC0JdEiYJUi-rX=CrGjM7C4=k_s0A1iCZb+WQ@mail.gmail.com
In response to Re: New vacuum option to do only freezing  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: New vacuum option to do only freezing  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Jan 16, 2019 at 3:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> As the above comment says, it's possible that the state of an
> INSERT_IN_PROGRESS tuple could be changed to 'dead' after
> heap_page_prune(). Since such tuple is not truncated at this point we
> record it and set it as UNUSED in lazy_vacuum_page(). I think that the
> DISABLE_INDEX_CLEANUP case is the same; we need to process them after
> recorded. Am I missing something?

I believe you are.  Think about it this way.  After the first pass
over the heap has been completed but before we've done anything to the
indexes, let alone started the second pass over the heap, somebody
could kill the vacuum process.  Somebody could in fact yank the plug
out of the wall, stopping the entire server in its tracks.  If they do
that, then lazy_vacuum_page() will never get executed.  Yet, the heap
can't be in any kind of corrupted state at this point, right?  We know
that the system is resilient against crashes, and killing a vacuum or
even the whole server midway through does not leave the system in any
kind of bad state.  If it's fine for lazy_vacuum_page() to never be
reached in that case, it must also be fine for it never to be reached
if we ask for vacuum to stop cleanly before lazy_vacuum_page().
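
To make the equivalence concrete, here is a toy sketch of the argument. It is not PostgreSQL code: the item states, the page layout, and every function name below are hypothetical stand-ins, and the only thing assumed from the discussion is that the first pass records dead items while the second pass (the role played by lazy_vacuum_page()) reclaims them. The point is that a reader sees the same rows whether or not the second pass ever runs.

```python
# Toy model only: item states and function names are hypothetical,
# not PostgreSQL's actual data structures or code.
LIVE, DEAD, UNUSED = "live", "dead", "unused"


def first_heap_pass(page):
    """Record the offsets of dead items; the page itself is left unchanged."""
    return [i for i, (state, _) in enumerate(page) if state == DEAD]


def second_heap_pass(page, recorded):
    """Reclaim the recorded items (standing in for lazy_vacuum_page())."""
    for i in recorded:
        page[i] = (UNUSED, None)


def visible_rows(page):
    """What a reader sees: only live tuples, whatever else is on the page."""
    return [value for state, value in page if state == LIVE]


def run_vacuum(page, stop_after_first_pass):
    """Run the toy vacuum on a copy, optionally stopping before the second pass."""
    page = list(page)
    recorded = first_heap_pass(page)
    if not stop_after_first_pass:
        second_heap_pass(page, recorded)
    return page


heap_page = [(LIVE, "a"), (DEAD, "b"), (LIVE, "c")]
full = run_vacuum(heap_page, stop_after_first_pass=False)
# Models both a crash and a clean stop: in either case the second pass never runs.
stopped = run_vacuum(heap_page, stop_after_first_pass=True)
```

In this model the stopped run still carries the dead item, so space is not reclaimed, but visible_rows() returns the same rows for both results. Nothing is lost by stopping early except the cleanup, which a later vacuum can do.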

In the case of the particular comment to which you are referring, that
comment is part of lazy_scan_heap(), not lazy_vacuum_page(), so I
don't see how it bears on the question of whether we need to call
lazy_vacuum_page().  It's true that, at any point in time, an
in-progress transaction could abort.  And if it does, then some
insert-in-progress tuples could become dead.  But if that happens,
then the next vacuum will remove them, just as it will remove any
tuples that become dead for that reason when vacuum isn't running in
the first place.  You can't use that as a justification for needing a
second heap pass, because if it were, then you'd also need a THIRD
heap pass in case a transaction aborts after the second heap pass has
visited the pages, and a fourth heap pass in case a transaction aborts
after the third heap pass has visited the pages, etc. etc. forever.
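
The regress can be shown with an equally small sketch. Again, this is a hypothetical model rather than anything from the vacuum code: it simply assumes that each heap pass removes every tuple that is dead at that moment, and that some in-progress insert aborts right after each pass.

```python
def dead_after_passes(passes):
    """Count dead tuples left if one in-progress insert aborts after every pass."""
    dead = 0
    for _ in range(passes):
        dead = 0   # this pass removes every tuple that is dead right now
        dead += 1  # hypothetical: a transaction aborts just after the pass
    return dead


remaining = [dead_after_passes(n) for n in (1, 2, 3, 10)]
```

Under that assumption, the count left behind does not depend on how many passes were made, which is why a possible abort cannot justify any particular number of heap passes.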

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company