Re: advance local xmin more aggressively - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: advance local xmin more aggressively
Date
Msg-id 548575D8.7060103@vmware.com
Whole thread Raw
In response to Re: advance local xmin more aggressively  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: advance local xmin more aggressively  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 12/05/2014 05:05 PM, Robert Haas wrote:
> [ reviving a very old thread ]
>
> On Tue, Feb 10, 2009 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> For example, maybe we could keep track of counts of snapshots removed
>>> since the last xmin calculation, and only run this routine if the number
>>> is different from zero (or some small positive integer).
>>
>> I think most of the callers of SnapshotResetXmin already know they
>> removed something.
>>
>> It might be interesting for FreeSnapshot or something nearby to note
>> whether the snapshot being killed has xmin = proc's xmin, and only do
>> the update calculation if so.
>>
>> I still dislike the assumption that all resource owners are children of
>> a known owner.  I suspect in fact that it's demonstrably wrong right
>> now, let alone in future (cf comments in PortalRun).  If we're going to
>> do this then snapmgr.c needs to track the snapshots for itself.  Of
>> course that's going to make the "is it worth it" question even more
>> pressing.
>
> I've run into essentially the same problem Jeff originally complained
> about with a large customer who has long-running transactions that
> make extensive use of cursors.  Cursors are opened and closed over
> time but it is rare for the number open to reach exactly zero, so what
> ends up happening is that the backend xmin does not advance.  As you
> can imagine, that doesn't work out well.
>
> The approach I came up with initially was similar to Jeff's: start at
> the topmost resource owner and traverse them all, visiting every
> snapshot along the way.  But then I found this thread and saw the
> comment that this might be "demonstrably wrong" and referring to the
> comments in PortalRun.  Having reviewed those comments, which don't
> seem to have changed much in the last five years, I can't understand
> how they related to this issue.  It's true that the TopTransaction
> resource owner could get swapped out under us during an internal
> commit, but why should SnapshotResetXmin() have to care? It just
> traverses the one that is in effect at the time it gets called.  The
> only real danger I see here is that there could be *more than one*
> toplevel resource owner.  I wonder if we could solve that problem by
> adding a registry of active toplevel resource owners, so that if we
> have a forest rather than a tree we can still find everything.

I don't immediately see the problem either, but I have to say that 
grovelling through all the resource owners seems ugly anyway. Resource 
owners are not meant to be traversed like that. And there could be a lot 
of them, and you have to visit every one of them. That could get 
expensive if there are a lot of resource owners.

BTW, you could easily detect that you haven't seen all the registered 
snapshots, after traversing the resource owner, as we keep the counter 
of them. So you could just fall back to not advancing the xmin if it 
happens.

> The problem I see with having snapmgr.c track the snapshots for itself
> is that it is mostly duplicating of bookkeeping which is already being
> done.  Since this problem doesn't affect the majority of users, it's
> not desirable to add a lot of extra bookkeeping to cater to it - but
> even if it did affect a lot of users, we still want it to be as cheap
> as possible, and reusing the tracking that resource owners are already
> doing seems like the way to get there.

I would prefer doing separate bookkeeping in snapmgr.c. It seems like it 
wouldn't be difficult to do. It has to be cheap, but I don't see any 
reason to believe that it'd be more expensive than traversing through 
all resource owners. A binary heap or some other priority queue 
implementation should work pretty well for this.

> I think there are a couple of things we can do to make this cheaper.
> Suppose we keep track of the oldest xmin of any registered snapshot
> and the number of registered snapshots that have that particular xmin.
> Every time we deregister a snapshot, we check whether it is one of the
> ones with the minimum xmin; if it is, we decrement the count.  When
> the count reaches 0, we know that a traversal of all registered
> snapshots is guaranteed to find a newer value to advertise in
> MyProc->xmin; that way, we can arrange to do the work only when it's
> likely to pay off.  In most cases this won't happen until the last
> snapshot is unregistered, because our snapshots normally form a stack,
> with newer snapshots having been taken later.  But if somebody
> unregisters the oldest snapshot we'll immediately notice and
> recalculate.

Yeah, that's a reasonable optimization. It's a reasonable optimization 
even if you do the bookkeeping in snapmgr.c.

And that optimization won't save you in the cases where it doesn't 
apply. For example, what if snapshots happen to form a queue, rather 
than a stack:

DECLARE c1 CURSOR FOR ...;
DECLARE c2 CURSOR FOR ...;
DECLARE c3 CURSOR FOR ...;
...
DECLARE c1000 CURSOR FOR ...;

CLOSE c1;
CLOSE c2;
CLOSE c3;
...
CLOSE c1000;

It's not hard to imagine an application doing that. Sure, you could 
rewrite it to close the cursors in different order, but on the face of 
it that's not an unreasonable thing for an application to do. I think we 
should avoid the O(n^2) behaviour in that case.

(But +1 for addressing this problem, with the extra bookkeeping in 
snapmgr.c)

- Heikki



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Compression of full-page-writes
Next
From: Thomas Reiss
Date:
Subject: Casting issues with domains