Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold < - Mailing list pgsql-hackers

On Tue, Apr 19, 2016 at 6:11 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
>>> > That is more controversial than the potential ~2% regression for
>>> > old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay releasing
>>> > that way, and Andres[4] is not.
>>>
>>> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
>>> a clear proposal on the table how to solve the scalability issue around
>>> MaintainOldSnapshotTimeMapping().
>>
>> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
>> takes EXCLUSIVE LWLock which seems to be a probable reason for a performance
>> regression.  Now, here the question is do we need to acquire that lock if
>> xmin is not changed since the last time value of
>> oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to
>> oldSnapshotControl->latest_xmin?
>> If we don't need it for above cases, I think it can address the performance
>> regression to a good degree for read-only workloads when the feature is
>> enabled.
>
> Thanks, Amit -- I think something along those lines is the right
> solution to the scaling issues when the feature is enabled.  For
> now I'm focusing on the back-patching issues and the performance
> regression when the feature is disabled, but I'll shift focus to
> this once the "killer" issues are in hand.

I had an idea I wanted to test out. The gist of it is to effectively
have the last slot of timestamp to xid map stored in the latest_xmin
field and only update the mapping when slot boundaries are crossed.
See attached WIP patch for details. This way the exclusive lock only
needs to be acquired once per minute. The common case is a spinlock
that could be replaced with atomics later. And it seems to me that the
mutex_threshold taken in TestForOldSnapshot() can also get pretty hot
under some workloads, so that may also need some tweaking.

I think a better approach would be to base the whole mechanism on a
periodically updated counter, instead of timestamps. Autovacuum
launcher looks like a good candidate to play the clock keeper, without
it the feature has little point anyway. AFAICS only the clock keeper
needs to have the timestamp xid mapping, others can make do with a
couple of periodically updated values. I haven't worked it out in
detail, but it feels like the code would be simpler. But this was a
larger change than I felt comfortable trying out, so I went with the
simple change first.

However, while checking out if my proof of concept patch actually
works I hit another issue. I couldn't get my test for the feature to
actually work. The test script I used is attached. Basically I have a
table with 1000 rows, one high throughput worker deleting old rows and
inserting new ones, one long query that acquires a snapshot and sleeps
for 30min, and one worker that has a repeatable read snapshot and
periodically does count(*) on the table. Based on documentation I
would expect the following:

* The interfering query gets cancelled
* The long running query gets to run
* Old rows will start to be cleaned up after the threshold expires.

However, testing on commit 9c75e1a36b6b2f3ad9f76ae661f42586c92c6f7c,
I'm seeing that the old rows do not get cleaned up, and that I'm only
seeing the interfering query get cancelled when old_snapshot_threshold
= 0. Larger values do not result in cancellation. Am I doing something
wrong or is the feature just not working at all?

Regards,
Ants Aasma

Attachment

pgsql-hackers by date:

Previous
From: Eric Ridge
Date:
Subject: Re: Disallow unique index on system columns
Next
From: Tom Lane
Date:
Subject: Re: Disallow unique index on system columns