Home > mailing lists

Improving vacuum/VM/etc - Mailing list pgsql-hackers

From	Jim Nasby
Subject	Improving vacuum/VM/etc
Date	April 23, 2015 19:10:09
Msg-id	5539437D.7090501@BlueTreble.com Whole thread Raw
Responses	Re: Improving vacuum/VM/etc
List	pgsql-hackers

Tree view

I mentioned this idea in the "other"[1] vacuum thread [2], but I think
it got lost.

Kevin Grittner pointed out that there's a potentially huge number of
writes we incur over the life of a tuple [3]:

(1) WAL log the insert.
(2) Write the tuple.
(3) Hint and rewrite the tuple.
(4) WAL log the freeze of the tuple.
(5) Rewrite the frozen tuple.
(6) WAL-log the delete.
(7) Rewrite the deleted tuple.
(8) Prune and rewrite the page.
(9) Free line pointers and rewrite the page.

He mentioned that a lot of these writes could be combined if they
happened close enough together. We can further add an all-visible state
in at 3.5.

Instead of simply adding all-frozen information to the VM we could
instead store 4 different page states and potentially improve a lot of
different cleanup woes at one time.

Unfortunately, the states I came up with using existing semantics don't
look hugely useful[4], but if we take Robert's idea and make all-visible
mean all-frozen, we can do much better:

0: Newly inserted tuples
Tracking this state allows us to aggressively set hint bits.

1: Newly deleted
There are tuples that have been deleted but not pruned. There may also
be newly inserted tuples that need hinting (state 0).

Similar to state 0, we'd want to be fairly aggressive with these pages,
because as soon as the deleting XID is committed and older than all
snapshots we can prune. Because we can prune without hitting indexes,
this is still a fairly cheap operation, though not as cheap as 0.

2: Fully hinted, not frozen
This is the really painful state to clean up, because we have to deal
with indexes. We must enter this state after being in 1.

3: All-visible-frozen
Every tuple on the page is visible and frozen. Pages in this state need
no maintenance at all. We might be able to enter this state directly
from state 0.

BENEFITS
This tracking should help at least 3 problems: the need to set hint bits
after insert, SELECT queries doing pruning (Simon's recent complaint),
and needing to scan an entire table for freezing.

The improvement in hinting and pruning is based on the idea that
normally there would not be a lot of pages in state 0 or 1, and pages
that were in those states are very likely to still be in disk cache (if
not shared buffers). That means we can have a background process (or 2)
that is very aggressive at targeting pages in these states.

Not needing to scan everything that's frozen is thanks to state 3. I
think it's OK (at least for now) if only vacuum puts pages into this
state, which means it can actually freeze the tuples when it does it
(thanks to 37484ad we won't lose forensic data doing this). That means
there's no extra work necessary by a foreground process that's dirtying
a page.

Because of 37484ad, I think as part of this we should also deprecate
vacuum_freeze_min_age, or at least change it's behavior. AFAIK the only
objection to aggressive freezing was loss of forensic data, and that's
gone now. So vacuum (and presumably the bg process(es) than handle state
0 and 1) should freeze tuples if it would allow the whole page to be
frozen. Possibly it should just do it any time it's dirtying the page.
(We could actually do this right now; it would let us eliminate the GUC,
but I'm not sure there'd be other benefit without the rest of this.)

DOWNSIDES
This does mean doubling the size of the VM. It would still be 32,000
times smaller than the heap with 8k pages (and 128,000 times smaller
with the common warehouse 32k page size), so I suspect this is a
non-issue, but it's worth mentioning. It might have some effect on a
almost entirely read-only system; but I suspect in most other cases the
other benefits will outweigh this.

This approach still does nothing to help the index related activity in
vacuum. My gut says state 2 should be further split; but I'm not sure
why. Perhaps if we had another state we could do something more
intelligent with index cleanup...

This might put a lot more read pressure on the VMs. We might want some
way to summarize per-table VMs (or ranges of VMs) so that we're not
constantly scanning them.

We'd still have to freeze, as opposed to what might be possible with
XID-LSN. OTOH, most of the changes to do this would be limited to
current VM code and callers. I don't think vacuum itself would need a
lot of changes, and I hope the BG code for state 0/1 would be that
complicated; it shouldn't need the complexity of autovacuum or vacuum.
So this should be much lower risk than something like XID-LSN.

So... what am I missing? :)

[1]
http://postgresql.org/message-id/flat/20140912135413.GK4701@eldon.alvh.no-ip.org
[2]
http://postgresql.org/message-id/flat/2011829201.2201963.1429726992897.JavaMail.yahoo@mail.yahoo.com
[3]
http://postgresql.org/message-id/771351984.2266772.1429728671811.JavaMail.yahoo@mail.yahoo.com

[4]
1a: All-visible
What we have today. Page still needs to be visited for freeze, but has
no newly inserted nor newly deleted tuples.

2a: All-frozen
Not only is the page all-visible, it's also all-frozen.

3a: "Other"
Either we couldn't mark the page all-visible after hinting everything
from step 0, or there's newly deleted tuples
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 23 April 2015, 19:05:51
Subject: Moving ExecInsertIndexTuples and friends to new file

From: Peter Geoghegan
Date: 23 April 2015, 19:11:27
Subject: Re: Moving ExecInsertIndexTuples and friends to new file

Improving vacuum/VM/etc - Mailing list pgsql-hackers

Previous

Next