Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Melanie Plageman
Subject Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw@mail.gmail.com
Whole thread Raw
Responses Re: Emit fewer vacuum records by reaping removable tuples during pruning
Re: Emit fewer vacuum records by reaping removable tuples during pruning
List pgsql-hackers
Hi,

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

See this example:

  drop table if exists foo;
  create table foo(a int) with (autovacuum_enabled=false);
  insert into foo select i from generate_series(1,10000000)i;
  update foo set a = 10;
  \timing on
  vacuum foo;

On my machine, the attached patch set provides a 10% speedup for vacuum
for this example -- and a 40% decrease in WAL bytes emitted.

Admittedly, this case is probably unusual in the real world. On-access
pruning would preclude it. Throw a SELECT * FROM foo before the vacuum
and the patch has no performance benefit.

However, it has no downside as far as I can tell. And, IMHO, it is a
code clarity improvement. This change means that lazy_vacuum_heap_page()
is only called when we are actually doing a second pass and reaping dead
items. I found it quite confusing that lazy_vacuum_heap_page() was
called by lazy_scan_heap() to set dead items unused in a block that we
just pruned.

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM. I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

I am planning to add a VM update into the freeze record, at which point
I will move the VM update code into lazy_scan_prune(). This will then
allow us to consolidate the freespace map update code for the prune and
noprune cases and make lazy_scan_heap() short and sweet.

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: lazy_scan_heap() should release lock on buffer before vacuuming FSM
Next
From: Nathan Bossart
Date:
Subject: Re: archive modules loose ends