Thread: Advisory locks seem rather broken

Advisory locks seem rather broken

From
Tom Lane
Date:
According to
http://archives.postgresql.org/pgsql-general/2012-04/msg00374.php
advisory locks now cause problems for prepared transactions, which
ought to ignore them.  It appears to me that this got broken by
commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590, which marked the
userlock lock method as transactional, which seems just about 100%
misguided to me.  At the very least this would require reconsidering
every single place that tests lock transactionality, and that evidently
did not happen.

If this patch weren't already in a released branch I would be arguing
for reverting it.  As is, I think we're going to have to clean it up.
I don't have time to look at it in detail right now, though.
        regards, tom lane


Re: Advisory locks seem rather broken

From
Simon Riggs
Date:
On Thu, May 3, 2012 at 1:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> According to
> http://archives.postgresql.org/pgsql-general/2012-04/msg00374.php
> advisory locks now cause problems for prepared transactions, which
> ought to ignore them.  It appears to me that this got broken by
> commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590, which marked the
> userlock lock method as transactional, which seems just about 100%
> misguided to me.  At the very least this would require reconsidering
> every single place that tests lock transactionality, and that evidently
> did not happen.
>
> If this patch weren't already in a released branch I would be arguing
> for reverting it.  As is, I think we're going to have to clean it up.
> I don't have time to look at it in detail right now, though.

There was an attempt to add a transactional advisory lock call type,
but my understanding of the plan for that was not to change the
existing advisory lock mechanism.

It seems that was bungled, so some change is required, but maybe not
total revoke.

If the change was actually intended that way then I object to it and I
also want it changed back.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, May 3, 2012 at 1:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If this patch weren't already in a released branch I would be arguing
>> for reverting it. �As is, I think we're going to have to clean it up.
>> I don't have time to look at it in detail right now, though.

> There was an attempt to add a transactional advisory lock call type,
> but my understanding of the plan for that was not to change the
> existing advisory lock mechanism.

> It seems that was bungled, so some change is required, but maybe not
> total revoke.

> If the change was actually intended that way then I object to it and I
> also want it changed back.

After studying the patch a bit more I have the definite feeling that
it needs to be rewritten from scratch.  It has turned the
LockMethodData.transactional flag into something completely useless for
telling whether a lock is session-level or transaction-local.  And,
instead of removing that flag and forcing all the code that checks it to
be rewritten, it's dropped ad-hoc code into just some of those places.

And, as far as I can tell, the ad-hoc test that it's replaced the
transactionality tests with is "is this lock held by a ResourceOwner",
which is a flagrant abuse of the ResourceOwner mechanism.
ResourceOwners should only be used to make sure resources are released
at appropriate times, they should not cause fundamental changes in the
semantics of those resources.

I'm inclined to think that a saner implementation would involve
splitting the userlock lockmethod into two, one transactional and one
not.  That gets rid of the when-to-release kluges, but instead we have
to think of a way for two different lockmethods to share the same
lock keyspace.  If we don't split it then we definitely need to figure
out someplace else to keep the transactionality flag.

Anyway, I'm going to go work on this now ...
        regards, tom lane


Re: Advisory locks seem rather broken

From
Merlin Moncure
Date:
On Thu, May 3, 2012 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm inclined to think that a saner implementation would involve
> splitting the userlock lockmethod into two, one transactional and one
> not.  That gets rid of the when-to-release kluges, but instead we have
> to think of a way for two different lockmethods to share the same
> lock keyspace.  If we don't split it then we definitely need to figure
> out someplace else to keep the transactionality flag.

hm, would that be exposed through the pg_locks view?  some users might
be running queries like "select * from pg_locks where
locktype='advisory' and ..."

it's a minor point, but ideally if they share the same lockspace the
same locktype would be reported in the view.

merlin


Re: Advisory locks seem rather broken

From
Simon Riggs
Date:
On Thu, May 3, 2012 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'm inclined to think that a saner implementation would involve
> splitting the userlock lockmethod into two, one transactional and one
> not.

Agreed

> That gets rid of the when-to-release kluges, but instead we have
> to think of a way for two different lockmethods to share the same
> lock keyspace.  If we don't split it then we definitely need to figure
> out someplace else to keep the transactionality flag.

Is that even an issue? Do we really want an overlapping lock space?

AFAICS you'd either use transactional or session level, but to use
both seems bizarre. And if you really did need both, you can put a
wrapper around the function to check whether a session level exists
before you grant the transaction level lock, or vice versa.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, May 3, 2012 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That gets rid of the when-to-release kluges, but instead we have
>> to think of a way for two different lockmethods to share the same
>> lock keyspace.  If we don't split it then we definitely need to figure
>> out someplace else to keep the transactionality flag.

> Is that even an issue? Do we really want an overlapping lock space?

> AFAICS you'd either use transactional or session level, but to use
> both seems bizarre.

I dunno.  That's the existing user-visible semantics, and I wasn't
proposing that we revisit the behavior.  It's a bit late for such
a proposal given this already shipped in 9.1.
        regards, tom lane


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> On Thu, May 3, 2012 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm inclined to think that a saner implementation would involve
>> splitting the userlock lockmethod into two, one transactional and one
>> not.

> hm, would that be exposed through the pg_locks view?  some users might
> be running queries like "select * from pg_locks where
> locktype='advisory' and ..."

I don't think we can or should change what pg_locks reports.  So they'd
have to look like just one lockmethod at that level.

I'm not actually sure that a split is a practical idea anyway, given
that assorted places use a LockMethod as an identifier for a class of
locks; unless all of those happen to want to distinguish transactional
and session-level userlocks, it'd be problematic.  I plan to look also
at the idea of removing the "transactional" field and seeing what that
breaks...
        regards, tom lane


Re: Advisory locks seem rather broken

From
Josh Berkus
Date:
> AFAICS you'd either use transactional or session level, but to use
> both seems bizarre. And if you really did need both, you can put a
> wrapper around the function to check whether a session level exists
> before you grant the transaction level lock, or vice versa.

You wouldn't want to *intentionally*.  On a large complex codebase,
though, who knows?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Advisory locks seem rather broken

From
Robert Haas
Date:
On Thu, May 3, 2012 at 12:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> AFAICS you'd either use transactional or session level, but to use
> both seems bizarre.

I'm a bit confused by all this, because we use both transaction and
session level locks internally - on the same lock tags - so I don't
know why we think it wouldn't be useful for user code to do the same.

In fact I'm a bit confused by the original complaint for the same
reason - if LockRelationOid and LockRelationIdForSession can coexist,
why doesn't the same thing work for advisory locks?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 3, 2012 at 12:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> AFAICS you'd either use transactional or session level, but to use
>> both seems bizarre.

> I'm a bit confused by all this, because we use both transaction and
> session level locks internally - on the same lock tags - so I don't
> know why we think it wouldn't be useful for user code to do the same.

Yeah.  I'm too lazy to go look up the original discussion for the
feature, but it seems to me that having session-lifetime and
transaction-lifetime advisory locks conflict is exactly what was wanted.
If you want some that don't conflict, just choose distinct key values.

> In fact I'm a bit confused by the original complaint for the same
> reason - if LockRelationOid and LockRelationIdForSession can coexist,
> why doesn't the same thing work for advisory locks?

The problem (or problems) is bad implementation, not the specification.
In particular, at least one place that should have been patched was not.
        regards, tom lane


Re: Advisory locks seem rather broken

From
Andres Freund
Date:
On Thursday, May 03, 2012 06:12:04 PM Simon Riggs wrote:
> AFAICS you'd either use transactional or session level, but to use
> both seems bizarre. And if you really did need both, you can put a
> wrapper around the function to check whether a session level exists
> before you grant the transaction level lock, or vice versa.
I don't think at all that this is crazy. For queues it very well might make 
sense for a dequeuing side to hold a lock in a session mode while the putting 
side uses normal transaction scope (because its done inside a trigger or 
such).

Andres


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In fact I'm a bit confused by the original complaint for the same
>> reason - if LockRelationOid and LockRelationIdForSession can coexist,
>> why doesn't the same thing work for advisory locks?

> The problem (or problems) is bad implementation, not the specification.
> In particular, at least one place that should have been patched was not.

After calming down a bit and reading the patch more, I think the only
place that was really seriously overlooked was PREPARE TRANSACTION,
specifically AtPrepare_Locks/PostPrepare_Locks.  To some extent this is
just a matter of missing code, but there is one assumption in there that
seems hard to get around: the code expects that any given lock object
will be held at session level or at transaction level, never both.
If it is held at session level then ownership stays with the current
session, otherwise ownership of the lock is transferred to the prepared
transaction (the gxact object).  Since advisory-lock objects can be held
at session and transaction levels concurrently, this assumption fails.
It might seem obvious to move the transaction lock to the prepared xact
while keeping the session ownership, but that doesn't look workable
because it would require an additional ProcLock object in shared memory,
which we cannot guarantee in advance is available (and failing at the
PostPrepare stage is not acceptable).

I'm inclined to say that you can PREPARE if your session holds a given
advisory lock at either session or transaction level, but not both.
This is a bit annoying but doesn't seem likely to be a real problem in
practice, so thinking of a hack to support the case seems like more
work than is justified.
        regards, tom lane


Re: Advisory locks seem rather broken

From
Robert Haas
Date:
On Thu, May 3, 2012 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm inclined to say that you can PREPARE if your session holds a given
> advisory lock at either session or transaction level, but not both.
> This is a bit annoying but doesn't seem likely to be a real problem in
> practice, so thinking of a hack to support the case seems like more
> work than is justified.

I'd be more inclined to say that if you have a session-level lock, you
can't prepare, period.  Doesn't a rollback release session-level
locks?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 3, 2012 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm inclined to say that you can PREPARE if your session holds a given
>> advisory lock at either session or transaction level, but not both.
>> This is a bit annoying but doesn't seem likely to be a real problem in
>> practice, so thinking of a hack to support the case seems like more
>> work than is justified.

> I'd be more inclined to say that if you have a session-level lock, you
> can't prepare, period.

The bug report that started this investigation was precisely that
preparing in the presence of a session-level lock failed, where it has
worked in every release before 9.1; the prepare is supposed to simply
ignore session locks.

> Doesn't a rollback release session-level locks?

No, it doesn't.  Read
http://www.postgresql.org/docs/devel/static/explicit-locking.html#ADVISORY-LOCKS
(which could use some wordsmithing, but the specification is clear
enough)
        regards, tom lane


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
... btw, it appears to me that the "fast path" patch has broken things
rather badly in LockReleaseAll.  AFAICS it's not honoring either the
lockmethodid restriction nor the allLocks restriction with respect to
fastpath locks.  Perhaps user locks and session locks are never taken
fast path, but still it would be better to be making those checks
further up, no?
        regards, tom lane


Re: Advisory locks seem rather broken

From
Robert Haas
Date:
On Thu, May 3, 2012 at 5:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... btw, it appears to me that the "fast path" patch has broken things
> rather badly in LockReleaseAll.  AFAICS it's not honoring either the
> lockmethodid restriction nor the allLocks restriction with respect to
> fastpath locks.  Perhaps user locks and session locks are never taken
> fast path, but still it would be better to be making those checks
> further up, no?

User locks are never taken fast path, but session locks can be, so I
think you're right that there is a bug here.  I think what we should
probably do is put the nLocks == 0 test before the lockmethodid and
allLocks checks, and then the fast path stuff after those two checks.

In 9.1, we just did this:
               if (locallock->proclock == NULL || locallock->lock == NULL)               {                       /*
                  * We must've run out of shared memory while 
trying to set up this                        * lock.  Just forget the local entry.                        */
          Assert(locallock->nLocks == 0);                       RemoveLocalLock(locallock);
continue;              } 

...and I just shoved the new logic into that stanza without thinking
hard enough about what order to do things in.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> In 9.1, we just did this:

>                 if (locallock->proclock == NULL || locallock->lock == NULL)
>                 {
>                         /*
>                          * We must've run out of shared memory while
> trying to set up this
>                          * lock.  Just forget the local entry.
>                          */
>                         Assert(locallock->nLocks == 0);
>                         RemoveLocalLock(locallock);
>                         continue;
>                 }

> ...and I just shoved the new logic into that stanza without thinking
> hard enough about what order to do things in.

Right.  The other thing that was bothering me about that was that it's
not clear now how to tell a broken locallock entry (which is what this
logic originally intended to clean up) from a fastpath one.  Perhaps
it'd be a good idea to add a "valid" flag?  And while I'm wondering
about such things, what happens when it's necessary to convert a
fastpath lock to a regular one, but there's no room in shared memory
for more lock objects?
        regards, tom lane


Re: Advisory locks seem rather broken

From
Robert Haas
Date:
On Fri, May 4, 2012 at 9:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In 9.1, we just did this:
>
>>                 if (locallock->proclock == NULL || locallock->lock == NULL)
>>                 {
>>                         /*
>>                          * We must've run out of shared memory while
>> trying to set up this
>>                          * lock.  Just forget the local entry.
>>                          */
>>                         Assert(locallock->nLocks == 0);
>>                         RemoveLocalLock(locallock);
>>                         continue;
>>                 }
>
>> ...and I just shoved the new logic into that stanza without thinking
>> hard enough about what order to do things in.
>
> Right.  The other thing that was bothering me about that was that it's
> not clear now how to tell a broken locallock entry (which is what this
> logic originally intended to clean up) from a fastpath one.  Perhaps
> it'd be a good idea to add a "valid" flag?

Well, I think nLocks == 0 should serve that purpose adequately.

> And while I'm wondering
> about such things, what happens when it's necessary to convert a
> fastpath lock to a regular one, but there's no room in shared memory
> for more lock objects?

Then you error out.  Of course, if the fast path mechanism didn't
exist at all, you would have started erroring out much sooner.  Now,
there is some rub here, because the mechanism isn't "fair": strong
lockers will error out instead of weak lockers, and in the worst case
where the lock table remains perpetually on the edge of overflowing,
strong lock requests could be fail repeatedly, essentially a DOS.
Originally, I thought that the patch should include some kind of
accounting mechanism to prevent that from happening, where we'd keep
track of the number of fast-path locks that were outstanding and make
sure to keep that many slots free in the main lock table, but Noah
talked me out of it, on theory that (1) it was very unlikely to occur
in practice and (2) if it did occur, then you probably need to bump up
max_locks_per_transaction anyway and (3) it amounted to forcing
failures in cases where that might not be strictly necessary, which is
usually not a great thing to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Advisory locks seem rather broken

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Originally, I thought that the patch should include some kind of
> accounting mechanism to prevent that from happening, where we'd keep
> track of the number of fast-path locks that were outstanding and make
> sure to keep that many slots free in the main lock table, but Noah
> talked me out of it, on theory that (1) it was very unlikely to occur
> in practice and (2) if it did occur, then you probably need to bump up
> max_locks_per_transaction anyway and (3) it amounted to forcing
> failures in cases where that might not be strictly necessary, which is
> usually not a great thing to do.

I agree with that, as long as we can be sure that the system behaves
sanely (doesn't leave the data structures in a corrupt state) when an
out-of-memory condition does occur.
        regards, tom lane


Re: Advisory locks seem rather broken

From
Robert Haas
Date:
On Fri, May 4, 2012 at 10:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Originally, I thought that the patch should include some kind of
>> accounting mechanism to prevent that from happening, where we'd keep
>> track of the number of fast-path locks that were outstanding and make
>> sure to keep that many slots free in the main lock table, but Noah
>> talked me out of it, on theory that (1) it was very unlikely to occur
>> in practice and (2) if it did occur, then you probably need to bump up
>> max_locks_per_transaction anyway and (3) it amounted to forcing
>> failures in cases where that might not be strictly necessary, which is
>> usually not a great thing to do.
>
> I agree with that, as long as we can be sure that the system behaves
> sanely (doesn't leave the data structures in a corrupt state) when an
> out-of-memory condition does occur.

OK.  I believe that commit 53c5b869b464d567c3b8f617201b49a395f437ab
robustified this code path quite a bit; but it is certainly possible
that there are remaining oversights, and I would certainly appreciate
any further review you have time to do.  Basically, it seems like the
likely failure modes, if there are further bugs, would be either (1)
failing to track the strong lock counts properly, leading to
performance degradation if the counters become permanently stuck at a
value other than zero even after all the locks are gone or (2) somehow
muffing the migration of a lock from the fast-path mechanism to the
regular mechanism.

When taking a strong lock, the idea is that the strong locker first
bumps the strong lock count.  That bump must be unwound if we fail to
acquire the lock, which means it has to be cleaned up in the error
path and any case where we give up (e.g. conditional acquire of a
contended lock).  Next, we iterate through all the backend slots and
transfer fast path locks for each one individually.  If we fail midway
through, the strong locker must simply make sure to unwind the strong
lock count.  The weak lockers whose locks got transferred are fine:
they need to know how to cope with releasing a transferred lock
anyway; whether the backend that did the transfer subsequently blew up
is not something they have any need to care about.  Once all the
transfers are complete, the strong locker queues for the lock using
the main mechanism, which now includes all possible conflicting locks.Again, if we blow up while waiting for the lock,
theonly extra thing
 
we need to do is unwind the strong lock count acquisition.

Of course, I may be missing some other kind of bug altogether...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Advisory locks seem rather broken

From
Robert Haas
Date:
On Fri, May 4, 2012 at 9:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 3, 2012 at 5:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... btw, it appears to me that the "fast path" patch has broken things
>> rather badly in LockReleaseAll.  AFAICS it's not honoring either the
>> lockmethodid restriction nor the allLocks restriction with respect to
>> fastpath locks.  Perhaps user locks and session locks are never taken
>> fast path, but still it would be better to be making those checks
>> further up, no?
>
> User locks are never taken fast path, but session locks can be, so I
> think you're right that there is a bug here.  I think what we should
> probably do is put the nLocks == 0 test before the lockmethodid and
> allLocks checks, and then the fast path stuff after those two checks.
>
> In 9.1, we just did this:
>
>                if (locallock->proclock == NULL || locallock->lock == NULL)
>                {
>                        /*
>                         * We must've run out of shared memory while
> trying to set up this
>                         * lock.  Just forget the local entry.
>                         */
>                        Assert(locallock->nLocks == 0);
>                        RemoveLocalLock(locallock);
>                        continue;
>                }
>
> ...and I just shoved the new logic into that stanza without thinking
> hard enough about what order to do things in.

This issue had slipped my mind, but Erik's report about another
fast-path locking problem jogged my memory, so I repaired this while I
was at it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company