Re: advance local xmin more aggressively - Mailing list pgsql-hackers

From Robert Haas
Subject Re: advance local xmin more aggressively
Date
Msg-id CA+TgmoZ94mzJCs0nZeaXoUqDUf_hYNOzPdWq4m5tCq09-vK81Q@mail.gmail.com
Whole thread Raw
In response to Re: advance local xmin more aggressively  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: advance local xmin more aggressively  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Dec 8, 2014 at 4:56 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I don't immediately see the problem either, but I have to say that
> grovelling through all the resource owners seems ugly anyway. Resource
> owners are not meant to be traversed like that. And there could be a lot of
> them, and you have to visit every one of them. That could get expensive if
> there are a lot of resource owners.

1. I don't really see why resource owners shouldn't be traversed like
that.  They are clearly intended to form a hierarchy, and there's
existing code that recurses through the hierarchy from a given level
downward.  What's ugly about that?

2. If you have a lot of resource owners, you probably have a lot of
snapshots, so walking a list will be expensive, too.  It will be
disproportionately expensive to walk the resource owner tree only if
there are lots of resource owners but very few of them have any
snapshots.  But I don't think that can really happen.  If you've got
lots of resource owners and each one has a snapshot, you'll traverse
~3 pointers per snapshot: ~1 to find the next ResourceOwner, 1 to find
the snapshot array, and 1 to reach the snapshot itself.  A non-inlined
list would traverse only 2 pointers per snapshot, but that doesn't
seem like enough of a difference to get excited about.

> BTW, you could easily detect that you haven't seen all the registered
> snapshots, after traversing the resource owner, as we keep the counter of
> them. So you could just fall back to not advancing the xmin if it happens.

Not a bad idea.  Or we could elog(FATAL) or fail an assertion if we
don't see them all, and then if it happens we call it a bug and fix
it.

> I would prefer doing separate bookkeeping in snapmgr.c. It seems like it
> wouldn't be difficult to do. It has to be cheap, but I don't see any reason
> to believe that it'd be more expensive than traversing through all resource
> owners. A binary heap or some other priority queue implementation should
> work pretty well for this.

That's optimizing for making the xmin recomputation cheap, but we
don't expect xmin recomputation to happen very often, so I'm not sure
that's the right trade-off.

> And that optimization won't save you in the cases where it doesn't apply.
> For example, what if snapshots happen to form a queue, rather than a stack:
>
> DECLARE c1 CURSOR FOR ...;
> DECLARE c2 CURSOR FOR ...;
> DECLARE c3 CURSOR FOR ...;
> ...
> DECLARE c1000 CURSOR FOR ...;
>
> CLOSE c1;
> CLOSE c2;
> CLOSE c3;
> ...
> CLOSE c1000;
>
> It's not hard to imagine an application doing that. Sure, you could rewrite
> it to close the cursors in different order, but on the face of it that's not
> an unreasonable thing for an application to do. I think we should avoid the
> O(n^2) behaviour in that case.

You won't actually get O(n^2) behavior here unless those DECLARE
CURSOR statements all get snapshots with different xmins; if many of
them have snapshots that share an xmin, then the optimization of
recomputing only when there are no snaps with a given xmin will save
you.  That's a bit pathological, but maybe we should try to cater to
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: inherit support for foreign tables
Next
From: Euler Taveira
Date:
Subject: pg_recvlogical description