Re: Open issues for HOT patch - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Open issues for HOT patch
Msg-id 46F0F197.2030707@enterprisedb.com
In response to Re: Open issues for HOT patch  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> We could lift the limitation that you can't defragment a page that's
>> pinned, if we play some smoke and mirrors in the buffer manager. When
>> you prune a page, make a *copy* of the page you're pruning, and keep
>> both versions in the buffer cache. Old pointers keep pointing to the old
>> version. Any new calls to ReadBuffer will return the new copy, and the
>> old copy can be dropped when its pin count drops to zero.
> 
> No, that's way too wacky.  How do you prevent people from making further
> changes to the "old" version?  For instance, marking a tuple deleted?

To make any changes to the "old" version, you need to lock the page with
LockBuffer. LockBuffer needs to return a buffer with the latest version
of the page, and the caller has to use that version for any changes.
Changing all callers of LockBuffer (that lock heap pages) to do that is
the biggest change involved, AFAICS.
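
To make the idea concrete, here is a rough toy model of it, not a patch against the real buffer manager. Every name in it (PageVersion, read_page, prune_page, lock_for_update, can_drop) is made up for illustration; none of them are existing bufmgr functions. It only shows the rule that old pins keep seeing the old copy, new readers get the new copy, and locking hands back the latest version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical toy structure; this is NOT PostgreSQL's buffer descriptor. */
typedef struct PageVersion
{
    char        data[TOY_PAGE_SIZE];
    int         pin_count;      /* number of holders still pointing here */
    int         superseded;     /* nonzero once a pruned copy replaced it */
    struct PageVersion *newer;  /* next version, or NULL if this is latest */
} PageVersion;

static PageVersion *
page_create(const char *contents)
{
    PageVersion *v = calloc(1, sizeof *v);

    assert(v != NULL);
    /* calloc zeroed the buffer, so the copy stays NUL-terminated */
    strncpy(v->data, contents, TOY_PAGE_SIZE - 1);
    return v;
}

static PageVersion *
latest_version(PageVersion *v)
{
    while (v->newer != NULL)
        v = v->newer;
    return v;
}

/* Stand-in for ReadBuffer: new readers always get and pin the latest copy. */
static PageVersion *
read_page(PageVersion *any)
{
    PageVersion *v = latest_version(any);

    v->pin_count++;
    return v;
}

/* Prune by creating a new copy; existing pins keep seeing the old one. */
static PageVersion *
prune_page(PageVersion *any, const char *defragmented)
{
    PageVersion *old = latest_version(any);
    PageVersion *copy = page_create(defragmented);

    old->newer = copy;
    old->superseded = 1;
    return copy;
}

/*
 * Stand-in for LockBuffer: the caller passes the version it has pinned and
 * must use the returned version for any change. The pin moves along.
 */
static PageVersion *
lock_for_update(PageVersion *pinned)
{
    PageVersion *v = latest_version(pinned);

    if (v != pinned)
    {
        v->pin_count++;
        pinned->pin_count--;
    }
    return v;
}

/* An old copy can be dropped once nobody has it pinned anymore. */
static int
can_drop(const PageVersion *v)
{
    return v->superseded && v->pin_count == 0;
}
```

The awkward part in a real implementation is exactly what the toy glosses over: every caller has to switch to the buffer returned by the lock call.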

Hint bit updates made to the old version could simply be discarded, since
hint bits can always be set again later.

> The actual practical application we have, I think, would only require
> being able to defrag a page that our own backend has pins on, which is
> something that might be more workable --- but it still seems awfully
> fragile.  It could maybe be made to work in the simplest case of a
> plain UPDATE, because in practice I think the executor will never
> reference the old tuple's contents after heap_update() returns.  But
> this falls down in more complex situations involving joins --- we might
> continue to try to join the same "old" tuple to other rows, and then any
> pass-by-reference Datums we are using are corrupt if the tuple got
> moved.

Ugh, yeah that's too fragile.

Another wacky idea:

Within our own backend, we could keep track of which tuples we've
accessed, and defrag could move all other tuples as long as the ones
that we might still have pointers to are not touched. The bookkeeping
wouldn't have to be exact, as long as it's conservative.
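
A minimal sketch of what "conservative" bookkeeping could look like, purely as an illustration: a small fixed-size bitmap keyed by line pointer offset, where collisions only ever make us treat more tuples as untouchable. AccessedSet and the functions below are hypothetical names, not anything in the HOT patch or the backend.

```c
#include <assert.h>
#include <stdint.h>

#define TRACK_BITS 64

/* Hypothetical per-page record of tuples this backend may still point to. */
typedef struct AccessedSet
{
    uint64_t    bits;
} AccessedSet;

static void
accessed_reset(AccessedSet *s)
{
    s->bits = 0;
}

/* Remember that we handed out a pointer to the tuple at this offset. */
static void
accessed_note(AccessedSet *s, unsigned offnum)
{
    s->bits |= UINT64_C(1) << (offnum % TRACK_BITS);
}

/*
 * May defrag move this tuple? Offsets that collide modulo TRACK_BITS share
 * a bit, so this can say "no" for a tuple we never touched (a false
 * positive), but it never says "yes" for one we did touch.
 */
static int
may_move(const AccessedSet *s, unsigned offnum)
{
    return (s->bits & (UINT64_C(1) << (offnum % TRACK_BITS))) == 0;
}

/* Collect the 1-based offsets defrag is allowed to move; returns the count. */
static unsigned
collect_movable(const AccessedSet *s, unsigned ntuples, unsigned *out)
{
    unsigned    n = 0;

    for (unsigned offnum = 1; offnum <= ntuples; offnum++)
    {
        if (may_move(s, offnum))
            out[n++] = offnum;
    }
    return n;
}
```

The cost of being inexact is only lost defragmentation opportunities: a tuple wrongly marked as accessed stays where it is.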

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

