Re: Not HOT enough - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Not HOT enough
Msg-id CA+U5nMLJtHq2V+JBn-hrL1O6QytxKK_GiJnZruWJujLtWfWZGg@mail.gmail.com
In response to Re: Not HOT enough  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Nov 23, 2011 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Nov 23, 2011 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> What I think might make more sense is to keep two variables,
>>> RecentGlobalXmin with its current meaning and RecentDatabaseWideXmin
>>> which considers only xmins of transactions in the current database.
>>> Then HOT cleanup could select the appropriate cutoff depending on
>>> whether it's working on a shared or non-shared relation.
>
>> Unfortunately, that would have the effect of lengthening the time for
>> which ProcArrayLock is held, and as benchmark results from Pavan's
>> patch in that area show, that makes a very big difference to total
>> throughput on write-heavy workloads.
>
> [ shrug... ]  Simon's patch already adds nearly as many cycles in the
> hot spot as would be required to do what I suggest.

Well, it's deeper than that.

My patch actually skips xids that aren't in the user's database. That
avoids other work in GetSnapshotData(), so will in many cases make it
faster. The snapshots returned will be smaller, which also means more
speed.
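
To illustrate the filtering idea, here is a minimal sketch in C. It is
a toy model, not the actual patch and not the real GetSnapshotData():
the types ToyProc and ToySnapshot, the function toy_get_snapshot(), and
the MAX_PROCS limit are all hypothetical names invented for this
example. It only shows that skipping backends attached to other
databases yields a shorter in-progress list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* toy stand-in for a transaction id */
typedef uint32_t DbId;  /* toy stand-in for a database OID */

#define INVALID_XID 0
#define MAX_PROCS   64

/* Hypothetical, simplified per-backend entry; not the real PGPROC. */
typedef struct {
    Xid  xid;   /* running transaction id, or INVALID_XID if idle */
    DbId db;    /* database this backend is connected to */
} ToyProc;

/* Hypothetical snapshot: just the list of in-progress xids. */
typedef struct {
    Xid    xip[MAX_PROCS];
    size_t xcnt;
} ToySnapshot;

/*
 * Collect in-progress xids, skipping idle backends and backends that
 * are connected to a database other than my_db.
 */
static void
toy_get_snapshot(const ToyProc *procs, size_t nprocs, DbId my_db,
                 ToySnapshot *snap)
{
    assert(procs != NULL && snap != NULL && nprocs <= MAX_PROCS);

    snap->xcnt = 0;
    for (size_t i = 0; i < nprocs; i++) {
        if (procs[i].xid == INVALID_XID)
            continue;               /* no transaction running */
        if (procs[i].db != my_db)
            continue;               /* other database: skipped entirely */
        snap->xip[snap->xcnt++] = procs[i].xid;
    }
}
```

In this model the cost of the copy loop and the size of the result
both scale with the number of active backends in the caller's own
database rather than in the whole cluster.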

As you point out upthread, that generates an MVCC snapshot that is not
safe for user queries against shared catalog tables. Standard catalog
access is safe, but user access isn't. The way to solve that problem
is to make all scans against shared catalog tables use SnapshotNow,
whatever the snapshot says. That would also be more useful, since
you'll see exactly what the DBMS sees. Given how rarely those tables
change and how rarely users access them, it seems like a very good
thing.
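
The rule proposed here can be sketched as a tiny C function. This is
only an illustration under assumed names: the enum ToySnapshotKind and
the function toy_snapshot_for_scan() are hypothetical and do not
correspond to any existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags for the two kinds of snapshot discussed above. */
typedef enum {
    TOY_SNAPSHOT_MVCC,  /* the (database-filtered) MVCC snapshot */
    TOY_SNAPSHOT_NOW    /* stand-in for SnapshotNow semantics */
} ToySnapshotKind;

/*
 * Pick the snapshot kind for a scan: shared catalogs always get
 * SnapshotNow-style visibility, whatever the caller asked for;
 * everything else uses the requested snapshot unchanged.
 */
static ToySnapshotKind
toy_snapshot_for_scan(bool rel_is_shared, ToySnapshotKind requested)
{
    assert(requested == TOY_SNAPSHOT_MVCC || requested == TOY_SNAPSHOT_NOW);
    return rel_is_shared ? TOY_SNAPSHOT_NOW : requested;
}
```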

If we do as you suggest, snapshots would contain all xids from all
databases, so no effort would be skipped, but we would pay the cost of
deriving two values just in case we ever decide to read a shared
catalog table, which happens once in a blue moon, so it's a net loss.
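
For comparison, the two-variable alternative can be sketched as
follows. Again this is a toy model with hypothetical names (XminProc,
XminPair, toy_compute_xmins()); it uses plain integer comparison and
ignores xid wraparound, which real code would have to handle. The
point is only that every backend entry must be examined, and two
running minimums maintained, on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-backend entry; not the real PGPROC. */
typedef struct {
    uint32_t xmin;  /* oldest xid this backend still needs; 0 if none */
    uint32_t db;    /* database this backend is connected to */
} XminProc;

/* The two cutoffs from the proposal quoted above. */
typedef struct {
    uint32_t global_xmin;    /* minimum over all databases */
    uint32_t database_xmin;  /* minimum over my_db only */
} XminPair;

/*
 * Derive both cutoffs in one pass. No entry can be skipped, because
 * backends in other databases still contribute to global_xmin.
 * next_xid is the starting value used when no backend holds an xmin.
 */
static XminPair
toy_compute_xmins(const XminProc *procs, size_t nprocs, uint32_t my_db,
                  uint32_t next_xid)
{
    assert(procs != NULL);

    XminPair result = { next_xid, next_xid };
    for (size_t i = 0; i < nprocs; i++) {
        uint32_t x = procs[i].xmin;

        if (x == 0)
            continue;                       /* backend holds no xmin */
        if (x < result.global_xmin)
            result.global_xmin = x;
        if (procs[i].db == my_db && x < result.database_xmin)
            result.database_xmin = x;
    }
    return result;
}
```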

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
