Re: MVCC overheads - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: MVCC overheads
Msg-id 20160708183306.GA732779@alvherre.pgsql
In response to Re: MVCC overheads  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: MVCC overheads
List pgsql-hackers
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Pete Stevenson wrote:
> >> Maybe I could figure out the lines of code that add versions into a
> >> table and then those that collect old versions (they do get collected,
> >> right?). Anyway, thought being I could profile while running TPC-C or
> >> similar. I was hoping that someone might be able to jump on this with
> >> a response that they already did something similar.
> 
> > Old tuple versions are "collected" (removed) by either vacuum (see
> > vacuumlazy.c) or heap_page_prune.  The latter is one thing that could
> > perhaps somehow be offloaded, as it's quite independent from the other
> > stuff.  You can prune removable tuples at no additional cost from an
> > unlocked dirty page, which is a useful optimization because then
> > client-connected backends don't need to prune them later.
> 
> VACUUM in itself is an offloading optimization; the whole point of it
> is to do maintenance in a background process not foreground queries.

Well, if VACUUM worked so great, we wouldn't get so many trouble reports
with it.  There's substantial improvement we could make in that area.

> AFAIR, heap_page_prune is just a small subset of VACUUM work that
> we decided we could afford to do in foreground.

Sure, but we could *also* do it separately, splitting VACUUM's work of
tuple freezing, page compaction, and index entry removal into separate
tasks.

Currently, vacuuming a 4TB table can take weeks.  Meanwhile, dead tuples
accumulate in already-scanned pages, which leads to further bloat, then
to Xid wraparound danger later, and finally to emergency vacuuming that
leaves applications blocking on DDL.
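
To put rough numbers on that, here is a back-of-envelope sketch in Python.
It is a toy model, not a measurement: the effective scan rate and the rate
at which dead tuples appear are made-up assumptions, and the function names
are hypothetical.  It assumes dead tuples appear uniformly across the table
while the scan advances at constant speed.

```python
def scan_days(table_bytes: float, bytes_per_sec: float) -> float:
    """Days needed to read the whole table once at a steady rate."""
    return table_bytes / bytes_per_sec / 86_400


def dead_tuples_left_behind(dead_per_sec: float, scan_seconds: float) -> float:
    """Dead tuples sitting in already-scanned pages when the scan ends.

    Toy model: a page visited at fraction x of the scan keeps collecting
    new dead tuples for the remaining (1 - x) of the scan time, so on
    average half of everything created during the scan is left behind.
    """
    return dead_per_sec * scan_seconds / 2


TABLE_BYTES = 4 * 1024**4   # the 4TB table mentioned above
RATE = 4 * 1024**2          # assumption: 4 MiB/s effective, after throttling
DEAD_PER_SEC = 100          # assumption: dead tuples created per second

days = scan_days(TABLE_BYTES, RATE)
left = dead_tuples_left_behind(DEAD_PER_SEC, days * 86_400)
print(f"{days:.1f} days, about {left:,.0f} dead tuples left behind")
```

Under these assumptions, a single pass already takes well over a week, and
the longer the pass, the more dead tuples pile up behind it.  Shorter,
independent passes would shrink that window.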

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


