Re: advance local xmin more aggressively - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: advance local xmin more aggressively
Date:
Msg-id: CA+Tgmoak8aoP+HOpHa0r=RvAC3Exz4ptusnm-ZeE97=1dfjyiA@mail.gmail.com
In response to: Re: advance local xmin more aggressively (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: advance local xmin more aggressively (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, Dec 10, 2014 at 7:34 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
>>> 1. I don't really see why resource owners shouldn't be traversed like
>>> that.  They are clearly intended to form a hierarchy, and there's
>>> existing code that recurses through the hierarchy from a given level
>>> downward.  What's ugly about that?
>
> I can't exactly put my finger on it, but it just feels wrong from a
> modularity point of view. Their purpose is to make sure that we don't leak
> resources on abort, by allowing an easy "release everything" operation.
> They're not designed for finding objects based on some other criteria.
>
> There is similar double bookkeeping of many other things that are tracked by
> resource owners. Heavy-weight locks are tracked by LOCALLOCK structs, buffer
> pins in the PrivateRefCount array, etc. Those things existed before resource
> owners were invented, but if we were starting from scratch, that design
> would still make sense, as different objects have different access criteria.
> fd.c needs to be able to find the least-recently-used open file, for
> example, and you need to find the snapshot with the lowest xmin.

I think all of these are performance trade-offs rather than questions
of style.  LOCALLOCK structs and private buffer pin tracking existed
before resource owners because we need to do lookups of locks by lock
tag or buffers by buffer number frequently enough that, if we had to
grovel through all the locks or buffer pins in the system without any
indexing, it would be slow.  We *could* do it that way now that we
have resource owners, but it's pretty obvious that it would suck.

That's less obvious here.  If we throw all of the registered snapshots
into a binary heap, finding the lowest xmin will be cheaper every time
we do it, but we'll have the overhead of maintaining that binary heap
even if it never gets used.
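
For illustration only, here is a sketch of what that alternative could look
like using the existing lib/binaryheap.h facility.  This is not the actual
patch: every other name is invented for the example, the heap capacity is
arbitrary (and fixed, in this API), and removing an arbitrary snapshot when
it gets unregistered would still need machinery that binaryheap doesn't
provide.

/*
 * Hypothetical sketch: keep registered snapshots in a heap ordered by xmin,
 * so the snapshot with the oldest xmin is always at the top.
 */
#include "postgres.h"

#include "access/transam.h"
#include "lib/binaryheap.h"
#include "utils/snapshot.h"

static binaryheap *snapshot_xmin_heap = NULL;

/*
 * binaryheap keeps the element that compares greatest at the top, so return
 * 1 when a's xmin is older than b's to get a min-heap on xmin.
 */
static int
snapshot_xmin_cmp(Datum a, Datum b, void *arg)
{
    Snapshot    sa = (Snapshot) DatumGetPointer(a);
    Snapshot    sb = (Snapshot) DatumGetPointer(b);

    if (TransactionIdPrecedes(sa->xmin, sb->xmin))
        return 1;
    if (TransactionIdPrecedes(sb->xmin, sa->xmin))
        return -1;
    return 0;
}

/* Pay the maintenance cost whenever a snapshot is registered ... */
static void
track_registered_snapshot(Snapshot snap)
{
    if (snapshot_xmin_heap == NULL)
        snapshot_xmin_heap = binaryheap_allocate(64, snapshot_xmin_cmp, NULL);
    binaryheap_add(snapshot_xmin_heap, PointerGetDatum(snap));
}

/* ... so that finding the lowest xmin is O(1) whenever it's needed. */
static TransactionId
oldest_tracked_xmin(void)
{
    if (snapshot_xmin_heap == NULL || binaryheap_empty(snapshot_xmin_heap))
        return InvalidTransactionId;

    return ((Snapshot) DatumGetPointer(
                binaryheap_first(snapshot_xmin_heap)))->xmin;
}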

> I think you're confusing the N and the M above. It's true that deleting a
> resource owner is O(M), where M is the number of children of that
> resource owner. It does not follow that releasing N resource owners is
> O(N^2), where N is the number of resource owners released. Calling
> ResourceOwnerDelete(x) will only visit each resource owner in that tree
> once, so it's O(N), where N is the total number of resource owners in the
> tree.

Not if the portals are dropped in separate commands with an
unpredictable ordering.  Then you have N separate calls to
ResourceOwnerDelete.
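
To make the shape of the problem concrete, here is a rough, hypothetical
sketch of the kind of recursive walk over the resource owner hierarchy that
recomputing the minimum xmin implies.  The struct and function names are
invented (the real ResourceOwnerData is private to resowner.c); the point is
only that each call is O(size of the subtree it walks), so doing one walk per
dropped cursor over N owners adds up to O(N^2), and the recursion itself is
what runs out of stack in the test below.

#include "postgres.h"

#include "access/transam.h"

/* Invented stand-in for a resource owner node; names are illustrative. */
typedef struct OwnerNode
{
    struct OwnerNode *firstchild;       /* head of this owner's child list */
    struct OwnerNode *nextchild;        /* next sibling under the parent */
    TransactionId oldest_local_xmin;    /* oldest xmin registered here, or
                                         * InvalidTransactionId if none */
} OwnerNode;

/*
 * Recursively find the oldest xmin registered anywhere in this subtree.
 * Every node in the subtree is visited, so each call is O(subtree size).
 */
static TransactionId
owner_tree_oldest_xmin(OwnerNode *owner)
{
    TransactionId result = owner->oldest_local_xmin;
    OwnerNode  *child;

    for (child = owner->firstchild; child != NULL; child = child->nextchild)
    {
        TransactionId childmin = owner_tree_oldest_xmin(child);

        if (TransactionIdIsValid(childmin) &&
            (!TransactionIdIsValid(result) ||
             TransactionIdPrecedes(childmin, result)))
            result = childmin;
    }

    return result;
}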

> I did some testing of the worst case scenario. The attached script first
> creates a lot of cursors, then a lot of savepoints, and finally closes the
> cursors in FIFO order. When the number of savepoints is high enough, this
> actually segfaults with your patch, because you run out of stack space when
> recursing into the subxact resource owners. That's hardly this patch's
> fault; I'm actually surprised it doesn't crash without it, because we
> recurse into all resource owners in ResourceOwnerRelease too. Apparently the
> subxacts are closed in LIFO order at commit, but there might be other cases
> where you could trigger that. In any case, a stack-depth check would be nice.

Thanks for the test case; that's helpful.

I added a stack depth check, but it doesn't help much:

ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
STATEMENT:  CLOSE cur0;
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
WARNING:  AbortSubTransaction while in ABORT state
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
WARNING:  AbortSubTransaction while in ABORT state
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
WARNING:  AbortSubTransaction while in ABORT state
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
PANIC:  ERRORDATA_STACK_SIZE exceeded

That's at 100k subtransactions.  I took some performance data for
lower numbers of subtransactions:

-- at 1000 subtransactions --
master 0.156s 0.157s 0.154s
advance-xmin 0.177s 0.175s 0.177s

-- at 10000 subtransactions --
master 0.458s 0.457s 0.456s
advance-xmin 0.639s 0.637s 0.633s

I guess this bears some further thought.  I certainly don't like the
fact that this makes the whole system crap out at a lower number of
subtransactions than presently.  The actual performance numbers don't
bother me very much; I'm comfortable with the possibility that closing
a cursor will be some modest percentage slower if you've got thousands
of active savepoints.
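
For reference, the stack depth check mentioned above is presumably just the
usual guard at the top of the recursive function; a minimal, hypothetical
illustration (the walker name is invented):

#include "postgres.h"

#include "miscadmin.h"

static void
walk_owner_subtree(void *owner)         /* hypothetical recursive walker */
{
    /*
     * check_stack_depth() raises the "stack depth limit exceeded" error
     * shown above once the recursion gets close to max_stack_depth.
     */
    check_stack_depth();

    /* ... visit this owner, then recurse into each of its children ... */
}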

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


