Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers

From: Hiroshi Inoue
Subject: Re: Plans for solving the VACUUM problem
Msg-id: 3B04B147.53E82D49@tpf.co.jp
In response to: Plans for solving the VACUUM problem (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Plans for solving the VACUUM problem (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane wrote:
> 
> I have been thinking about the problem of VACUUM and how we might fix it
> for 7.2.  Vadim has suggested that we should attack this by implementing
> an overwriting storage manager and transaction UNDO, but I'm not totally
> comfortable with that approach: 

IIRC, Vadim doesn't intend to implement an overwriting
smgr, at least not in the near future, though he seems to
have a plan to implement functionality that allows space
re-use without vacuum, as marked in TODO. IMHO, UNDO
functionality under a non-overwriting smgr (ISTM much
easier than under an overwriting one) has the highest
priority. Savepoints were planned for 7.0, but we still
don't have them.

[snip]

> 
> Locking issues
> --------------
> 
> We will need two extensions to the lock manager:
> 
> 1. A new lock type that allows concurrent reads and writes
> (AccessShareLock, RowShareLock, RowExclusiveLock) but not
> anything else.

What's different from RowExclusiveLock?
Does it conflict with itself?
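
To make the question concrete, here is how I would picture the
conflict table, assuming a bitmask representation (the mode
numbering and layout below are purely illustrative, not the actual
lock-manager code):

/* Illustrative lock modes, in roughly increasing strength. */
typedef enum LockMode
{
    NoLock = 0,
    AccessShareLock,            /* plain SELECT */
    RowShareLock,               /* SELECT FOR UPDATE */
    RowExclusiveLock,           /* INSERT/UPDATE/DELETE */
    VacuumLock,                 /* proposed: lazy VACUUM */
    ShareLock,                  /* CREATE INDEX */
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock         /* old VACUUM, ALTER TABLE, ... */
} LockMode;

#define LOCKBIT(m)  (1 << (m))

/* conflict_table[m] = set of modes that conflict with mode m */
static const int conflict_table[] = {
    0,                                          /* NoLock */
    LOCKBIT(AccessExclusiveLock),               /* AccessShareLock */
    LOCKBIT(ExclusiveLock)
      | LOCKBIT(AccessExclusiveLock),           /* RowShareLock */
    LOCKBIT(ShareLock)
      | LOCKBIT(ShareRowExclusiveLock)
      | LOCKBIT(ExclusiveLock)
      | LOCKBIT(AccessExclusiveLock),           /* RowExclusiveLock */
    LOCKBIT(VacuumLock)                         /* self-conflict? */
      | LOCKBIT(ShareLock)
      | LOCKBIT(ShareRowExclusiveLock)
      | LOCKBIT(ExclusiveLock)
      | LOCKBIT(AccessExclusiveLock)            /* VacuumLock */
    /* remaining (stronger) modes omitted */
};

If the VacuumLock bit is set in its own entry, only one lazy VACUUM
can run on a table at a time, and that self-conflict would be the
one thing distinguishing it from RowExclusiveLock.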

> 
> The conditional lock will be used by lazy VACUUM to try to upgrade its
> VacuumLock to an AccessExclusiveLock at step D (truncate table).  If it's
> able to get exclusive lock, it's safe to truncate any unused end pages.
> Without exclusive lock, it's not, since there might be concurrent
> transactions scanning or inserting into the empty pages.  We do not want
> lazy VACUUM to block waiting to do this, since if it does that it will
> create a lockout situation (reader/writer transactions will stack up
> behind it in the lock queue while everyone waits for the existing
> reader/writer transactions to finish).  Better to not do the truncation.
> 

And would the truncation occur that often in reality under
this scheme (without tuple movement)?
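
For reference, my reading of step D in sketch form. Here
try_lock_relation(), truncate_relation(), and unlock_relation() are
placeholders for whatever the conditional-lock API turns out to be,
not existing calls:

/*
 * Try to upgrade to AccessExclusiveLock without blocking.  If the
 * lock isn't free right now, skip the truncation entirely; the
 * empty end pages simply remain available for later re-use.
 */
static void
lazy_truncate_heap(Relation rel,
                   BlockNumber pages_at_scan,   /* length our scan saw */
                   BlockNumber new_rel_pages)   /* first unused page */
{
    if (!try_lock_relation(rel, AccessExclusiveLock))
        return;                 /* would block -> lockout risk, give up */

    /*
     * Re-check the length: a concurrent inserter may have extended
     * the table between the end of our scan and the lock upgrade.
     * (A real implementation would also re-verify that the end pages
     * are still empty before cutting them off.)
     */
    if (RelationGetNumberOfBlocks(rel) == pages_at_scan)
        truncate_relation(rel, new_rel_pages);

    unlock_relation(rel, AccessExclusiveLock);
}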

[snip]

> 
> Index access method improvements
> --------------------------------
> 
> Presently, VACUUM deletes index tuples by doing a standard index scan
> and checking each returned index tuple to see if it points at any of
> the tuples to be deleted.  If so, the index AM is called back to delete
> the tested index tuple.  This is horribly inefficient: it means one trip
> into the index AM (with associated buffer lock/unlock and search overhead)
> for each tuple in the index, plus another such trip for each tuple actually
> deleted.
> 
> This is mainly a problem of a poorly chosen API.  The index AMs should
> offer a "bulk delete" call, which is passed a sorted array of main-table
> TIDs.  The loop over the index tuples should happen internally to the
> index AM.  At least in the case of btree, this could be done by a
> sequential scan over the index pages, which avoids the random I/O of an
> index-order scan and so should offer additional speedup.
> 

???? Isn't the current implementation already a "bulk delete"?
Fast access to individual index tuples also seems to be
needed for the case of only a few dead tuples.
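
To be concrete about the API difference as I read it: one sequential
pass over the index file, probing a sorted TID array, instead of one
trip into the AM per index tuple. Everything below is only a sketch
under that assumption -- btbulkdelete() does not exist as an AM entry
point, and WAL logging, the leaf/internal distinction, and the finer
locking rules are all ignored:

#include "postgres.h"

#include "access/itup.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/* comparator for bsearch() over the caller's sorted TID array */
static int
tid_compare(const void *a, const void *b)
{
    return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

void
btbulkdelete(Relation index, ItemPointerData *dead_tids, int ndead)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(index);
    BlockNumber blkno;

    for (blkno = 1; blkno < nblocks; blkno++)   /* block 0 is the metapage */
    {
        Buffer          buf = ReadBuffer(index, blkno);
        Page            page;
        OffsetNumber    offnum;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        page = BufferGetPage(buf);

        /* walk backwards so deletions don't renumber what's left */
        for (offnum = PageGetMaxOffsetNumber(page);
             offnum >= FirstOffsetNumber;
             offnum--)
        {
            IndexTuple  itup = (IndexTuple)
                PageGetItem(page, PageGetItemId(page, offnum));

            if (bsearch(&itup->t_tid, dead_tids, ndead,
                        sizeof(ItemPointerData), tid_compare))
                PageIndexTupleDelete(page, offnum);
        }

        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        WriteBuffer(buf);       /* marks dirty and releases the buffer */
    }
}

For the few-dead-tuples case, the AM could presumably fall back to
ordinary index descents below some threshold, which would address my
second point.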

> Further out (possibly not for 7.2), we should also look at making the
> index AMs responsible for shrinking indexes during deletion, or perhaps
> via a separate "vacuum index" API.  This can be done without exclusive
> locks on the index --- the original Lehman & Yao concurrent-btrees paper
> didn't describe how, but more recent papers show how to do it.  As with
> the main tables, I think it's sufficient to recycle freed space within
> the index, and not necessarily try to give it back to the OS.
> 

Great. There would be few disadvantages to doing this in our
btree implementation.
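
If freed space is only recycled within the index, I imagine something
like a free-page chain rooted in the metapage. A stripped-down sketch
only: BTMetaSketch is not the real BTMetaPageData layout,
freelist_next() and extend_relation() are placeholders, and the hard
part -- unlinking a page safely under L&Y-style concurrency, which
the newer papers cover -- is not shown:

typedef struct BTMetaSketch
{
    uint32      btm_magic;
    BlockNumber btm_root;
    BlockNumber btm_freelist;   /* head of chain of reusable pages */
} BTMetaSketch;

/* On a page split: pop a freed page if available, else extend. */
static BlockNumber
bt_get_free_page(Relation index, BTMetaSketch *meta)
{
    if (meta->btm_freelist != InvalidBlockNumber)
    {
        BlockNumber blkno = meta->btm_freelist;

        /* next-link could live in the freed page's special space */
        meta->btm_freelist = freelist_next(index, blkno);
        return blkno;
    }
    return extend_relation(index);      /* grow the file at EOF */
}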

regards,
Hiroshi Inoue

