Thread: VACUUMs and WAL

VACUUMs and WAL

From
Simon Riggs
Date:
Looking at a VACUUM's WAL records makes me think twice about the way we
issue a VACUUM.

1. First we scan the heap, issuing a HEAP2 clean record for every block
that needs cleaning.

2. Then we scan the index, issuing WAL records as appropriate.

3. Then we rescan the heap, issuing a HEAP2 clean record for every
block.

I don't see a reason why we would issue 2 WAL records per block for a
VACUUM, nor why we would prune and remove in two steps, dirtying the
block each time. Seems like we could write approximately half the amount
of data that we do.

Surely we can come up with a better plan than that one?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: VACUUMs and WAL

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 08:49 +0000, Simon Riggs wrote:
> Looking at a VACUUM's WAL records makes me think twice about the way we
> issue a VACUUM.
> 
> 1. First we scan the heap, issuing a HEAP2 clean record for every block
> that needs cleaning.

IIRC the first heap pass just collects info and does nothing else. 
Is this just an empty/do-nothing WAL record ?

> 2. Then we scan the index, issuing WAL records as appropriate.
> 
> 3. Then we rescan the heap, issuing a HEAP2 clean record for every
> block.
> 
> I don't see a reason why we would issue 2 WAL records per block for a
> VACUUM, nor why we would prune and remove in two steps, dirtying the
> block each time. 

The first pass should just be collecting info and not dirtying anything.
Could it be side effect of setting some transaction visibility bits on
first visit ? 

In that case It would be good, if we could disable doing that that for
vacuum.

> Seems like we could write approximately half the amount
> of data that we do.
> 
> Surely we can come up with a better plan than that one?

-------------------
Hannu Krosing 

http://www.2ndQuadrant.com
PostgreSQL Scalability Training, Services and Support




Re: VACUUMs and WAL

From
Simon Riggs
Date:
On Tue, 2008-10-28 at 11:45 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 08:49 +0000, Simon Riggs wrote:
> > Looking at a VACUUM's WAL records makes me think twice about the way we
> > issue a VACUUM.
> > 
> > 1. First we scan the heap, issuing a HEAP2 clean record for every block
> > that needs cleaning.
> 
> IIRC the first heap pass just collects info and does nothing else. 
> Is this just an empty/do-nothing WAL record ?

8.3 changed that; it used to work that way. I guess I never looked at
the amount of WAL being generated.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: VACUUMs and WAL

From
Gregory Stark
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:

> I don't see a reason why we would issue 2 WAL records per block for a
> VACUUM, nor why we would prune and remove in two steps, dirtying the
> block each time. Seems like we could write approximately half the amount
> of data that we do.
>
> Surely we can come up with a better plan than that one?

This sounds like the same issue Pavan identified and proposed solving by
rotating the three passes so that we do step 3 at the beginning of the next
vacuum run, so it can be done in the same pass as step 1.

To do that he proposed we do:

1. scan heap doing two things: a) remove any marked tuples if they were marked  by a previous vacuum which committed
andb) prune and mark any tuples we  find are deletable for a future vacuum to remove.
 

2. scan indexes and remove the tuples we marked in 1b.

The main blocking issue IIRC was:

How to mark the tuples in a way which is safe if vacuum aborts. He suggested
putting a vacuum xid in pg_class. Other suggestions were to mark the pages in
the page header (which I thought made the most sense) or to put the xid in the
line pointer (since nothing else needs to go there, the tuples are already
cleaned).

There might also have been a question of how to deal with the statistics.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


Re: VACUUMs and WAL

From
Simon Riggs
Date:
On Tue, 2008-10-28 at 10:59 +0000, Gregory Stark wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> 
> > I don't see a reason why we would issue 2 WAL records per block for a
> > VACUUM, nor why we would prune and remove in two steps, dirtying the
> > block each time. Seems like we could write approximately half the amount
> > of data that we do.
> >
> > Surely we can come up with a better plan than that one?
> 
> This sounds like the same issue Pavan identified and proposed solving by
> rotating the three passes so that we do step 3 at the beginning of the next
> vacuum run, so it can be done in the same pass as step 1.
> 
> To do that he proposed we do:
> 
> 1. scan heap doing two things: a) remove any marked tuples if they were marked
>    by a previous vacuum which committed and b) prune and mark any tuples we
>    find are deletable for a future vacuum to remove.
> 
> 2. scan indexes and remove the tuples we marked in 1b.

It's fairly hard to remove the second heap pass completely. 

I think what I am suggesting is two heap passes, but writing WAL and
dirtying blocks on only one of the passes.

The biggest I/O cost comes from the writes, not the reads, ISTM.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: VACUUMs and WAL

From
Gregory Stark
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:

> On Tue, 2008-10-28 at 10:59 +0000, Gregory Stark wrote:
>> To do that he proposed we do:
>> 
>> 1. scan heap doing two things: a) remove any marked tuples if they were marked
>>    by a previous vacuum which committed and b) prune and mark any tuples we
>>    find are deletable for a future vacuum to remove.
>> 
>> 2. scan indexes and remove the tuples we marked in 1b.
>
> It's fairly hard to remove the second heap pass completely. 
>
> I think what I am suggesting is two heap passes, but writing WAL and
> dirtying blocks on only one of the passes.

How small a patch would it be? I guess you just need to disable all pruning in
the first pass and do it in the second patch? 

I would still rather see Pavan's optimizationo if we can do it cleanly, but if
it's not going to happen and this is trivial then sure, we may as well.

>
> The biggest I/O cost comes from the writes, not the reads, ISTM.

It's counter-intuitive but actually it's usually the other way around. Writes
can be buffered, re-ordered, and scheduled during otherwise idle time. Reads
however are always blocking.

However in this situation I think you may be right. Vacuum is doing a
sequential scan through the table so if the OS has to interrupt that scan to
go do some writes it'll end up having to go back and forth. That would be a
*lot* slower than just doing a sequential scan.



--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!


Re: VACUUMs and WAL

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 10:10 +0000, Simon Riggs wrote:
> On Tue, 2008-10-28 at 11:45 +0200, Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 08:49 +0000, Simon Riggs wrote:
> > > Looking at a VACUUM's WAL records makes me think twice about the way we
> > > issue a VACUUM.
> > > 
> > > 1. First we scan the heap, issuing a HEAP2 clean record for every block
> > > that needs cleaning.
> > 
> > IIRC the first heap pass just collects info and does nothing else. 
> > Is this just an empty/do-nothing WAL record ?
> 
> 8.3 changed that; it used to work that way. I guess I never looked at
> the amount of WAL being generated.


I can't see how it is safe to do anything more than just lookups on
first pass. 

There will be dangling index pointers if the system crashes/is rebooted
or the vacuum is just interrupted after cleaning some heap pages but
before cleaning corresponding index pages.

---------------
Hannu




Re: VACUUMs and WAL

From
Heikki Linnakangas
Date:
Hannu Krosing wrote:
> On Tue, 2008-10-28 at 10:10 +0000, Simon Riggs wrote:
>> On Tue, 2008-10-28 at 11:45 +0200, Hannu Krosing wrote:
>>> On Tue, 2008-10-28 at 08:49 +0000, Simon Riggs wrote:
>>>> Looking at a VACUUM's WAL records makes me think twice about the way we
>>>> issue a VACUUM.
>>>>
>>>> 1. First we scan the heap, issuing a HEAP2 clean record for every block
>>>> that needs cleaning.
>>> IIRC the first heap pass just collects info and does nothing else. 
>>> Is this just an empty/do-nothing WAL record ?
>> 8.3 changed that; it used to work that way. I guess I never looked at
>> the amount of WAL being generated.
> 
> I can't see how it is safe to do anything more than just lookups on
> first pass. 

What's done in the first pass is the same HOT pruning that is done 
opportunistically on other page accesses as well. IIRC it's required for 
correctness, though I can't remember what exactly the issue was.

I don't think the extra WAL volume is a problem; VACUUM doesn't generate 
much WAL, anyway. As for the extra data page writes it causes; yeah, 
that might cause some I/O that could be avoided, but remember that the 
first pass often dirties buffers anyway to set hint bits.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: VACUUMs and WAL

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I think what I am suggesting is two heap passes, but writing WAL and
> dirtying blocks on only one of the passes.

I think you've all forgotten about hint-bit setting.  The assumption is
that the first VACUUM pass is going to update a lot of hint bits and we
might as well get some other work done with the same write.

Now of course that doesn't necessarily entail a WAL write too, but
it makes this less than a slam-dunk win.

Also, I think that the reason the code ended up this way is that there
were pretty severe difficulties in making the VACUUM code cope correctly
with un-pruned tuples.  Pavan might remember more about that.
        regards, tom lane


Re: VACUUMs and WAL

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Looking at a VACUUM's WAL records makes me think twice about the way we
> issue a VACUUM.

> 1. First we scan the heap, issuing a HEAP2 clean record for every block
> that needs cleaning.

> 2. Then we scan the index, issuing WAL records as appropriate.

> 3. Then we rescan the heap, issuing a HEAP2 clean record for every
> block.

The first pass removes dead HOT tuples.  The second pass removes dead
normal tuples (it does NOT write "every block", only those with dead
tuples).  In principle the set of pages written in pass 1 might be
completely disjoint from the set of pages written in pass 2 (though
I admit that's probably not real likely).

> Surely we can come up with a better plan than that one?

Maybe, but it's not as bad as you're painting it.
        regards, tom lane


Re: VACUUMs and WAL

From
Hannu Krosing
Date:
On Tue, 2008-10-28 at 14:28 +0200, Heikki Linnakangas wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 10:10 +0000, Simon Riggs wrote:
> >> On Tue, 2008-10-28 at 11:45 +0200, Hannu Krosing wrote:
> >>> On Tue, 2008-10-28 at 08:49 +0000, Simon Riggs wrote:
> >>>> Looking at a VACUUM's WAL records makes me think twice about the way we
> >>>> issue a VACUUM.
> >>>>
> >>>> 1. First we scan the heap, issuing a HEAP2 clean record for every block
> >>>> that needs cleaning.
> >>> IIRC the first heap pass just collects info and does nothing else. 
> >>> Is this just an empty/do-nothing WAL record ?
> >> 8.3 changed that; it used to work that way. I guess I never looked at
> >> the amount of WAL being generated.
> > 
> > I can't see how it is safe to do anything more than just lookups on
> > first pass. 
> 
> What's done in the first pass is the same HOT pruning that is done 
> opportunistically on other page accesses as well. IIRC it's required for 
> correctness, though I can't remember what exactly the issue was.

Are you sure it is a correctness thing ? Maybe HOT pruning just happened
to be in a path used by vacuum to read pages.

> I don't think the extra WAL volume is a problem; 

Probably not ( unless you need to ship your WAL records via a very
expensive network connection ).

If it is a simple performance problem, then it can probably be fixed by
just running VACUUM slower.

> VACUUM doesn't generate 
> much WAL, anyway. As for the extra data page writes it causes; yeah, 
> that might cause some I/O that could be avoided, but remember that the 
> first pass often dirties buffers anyway to set hint bits.

Still, can't we special-case HOT pruning and hint-bit change WAL-logging
for first the pass of vacuum ? They both seem redundant in case of
VACUUM.

---------------
Hannu