Thread: Rethinking LOCK TABLE's behavior on views

Rethinking LOCK TABLE's behavior on views

From

Tom Lane

Date:

07 November 2020, 16:57:20

The problems discussed in bug #16703 [1] show that pg_dump needs a
version of LOCK TABLE that behaves differently for views than
what we have now.  Since v11, LOCK TABLE on a view recurses to all
tables and views named in the view, and it does so using the view
owner's permissions, meaning that a view that would have permissions
failures if executed will also have permissions failures when locked.
That's probably fine for ordinary usage, but it's disastrous for
pg_dump --- even a superuser can't lock such a view.

Moreover, pg_dump doesn't really need the recursive behavior.  It just
needs the view's definition to hold still; and in any case, a typical
pg_dump run would have independently acquired locks on all the other
relations anyway.  The recursion is buying us nothing, except perhaps
an increased risk of deadlocks against concurrent DDL operations.
(I'm not quite sure if that's significant, given that pg_dump pays
no attention to the order in which it locks things.  But it sure as
heck isn't *decreasing* the risk; and it's a behavior that we could
not hope to improve with more smarts about pg_dump's lock ordering.)

So we need a way to turn that off.  What I proposed in that thread
was

> (1) Make pg_dump use LOCK TABLE ONLY, not LOCK TABLE.
> (2) Make LOCK TABLE ONLY on a view not recurse to the view's dependencies.

which would each be quite trivial to do.  An objection to this approach
is that ONLY typically controls recursion to a table's inheritance or
partitioning children, which is a different animal from recursion to
a view's dependencies.  That argument would lead to wanting some other
syntax to control this.  I do not find that argument compelling enough
to justify making pg_dump deal with two different commands depending
on the relation's relkind, but it is a plausible concern.

Closely related to this is whether pg_dump ought to be using ONLY for
locking regular tables too.  I tend to think that it should be, again
on the grounds that any child tables we may be interested in will get
locked separately, so that we're not doing anything by recursing except
expending extra cycles and perhaps increasing the chance of a deadlock.

A completely different approach we could consider is to weaken the
permissions requirements for LOCK on a view, say "allow it if either
the calling user or the view owner has the needed permission".  This
seems generally pretty messy and so I don't much like it, but we
should consider as many solutions as we can think of.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org

Re: Rethinking LOCK TABLE's behavior on views

From

Noah Misch

Date:

07 November 2020, 22:46:06

On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote:
> The problems discussed in bug #16703 [1] show that pg_dump needs a
> version of LOCK TABLE that behaves differently for views than
> what we have now.  Since v11, LOCK TABLE on a view recurses to all
> tables and views named in the view, and it does so using the view
> owner's permissions, meaning that a view that would have permissions
> failures if executed will also have permissions failures when locked.
> That's probably fine for ordinary usage, but it's disastrous for
> pg_dump --- even a superuser can't lock such a view.
> 
> Moreover, pg_dump doesn't really need the recursive behavior.  It just
> needs the view's definition to hold still; and in any case, a typical
> pg_dump run would have independently acquired locks on all the other
> relations anyway.  The recursion is buying us nothing, except perhaps
> an increased risk of deadlocks against concurrent DDL operations.

The getTables() locking aims to take the locks that will be taken later.  That
avoids failing after expensive work.  For views, the later lock-taker is
pg_get_viewdef(), which locks more than just the view but less than[2] LOCK
TABLE.  Recursion buys us more than nothing for "pg_dump --table=viewname", so
abandoning recursion unconditionally is a step in the wrong direction.  I
don't expect --table to be as excellent as complete dumps, but a change that
makes it worse does lose points.  I want to keep the recursion.

> (I'm not quite sure if that's significant, given that pg_dump pays
> no attention to the order in which it locks things.  But it sure as
> heck isn't *decreasing* the risk; and it's a behavior that we could
> not hope to improve with more smarts about pg_dump's lock ordering.)

Reordering to avoid deadlocks would be best-effort, so it's fine not to have
full control over the order.

> Closely related to this is whether pg_dump ought to be using ONLY for
> locking regular tables too.  I tend to think that it should be, again
> on the grounds that any child tables we may be interested in will get
> locked separately, so that we're not doing anything by recursing except
> expending extra cycles and perhaps increasing the chance of a deadlock.

Agreed.  "pg_dump --table=inheritance_parent" never queries inheritance
children, so it's nice not to lock them.

> A completely different approach we could consider is to weaken the
> permissions requirements for LOCK on a view, say "allow it if either
> the calling user or the view owner has the needed permission".  This
> seems generally pretty messy and so I don't much like it, but we
> should consider as many solutions as we can think of.

This is the best of what you've listed by a strong margin, and I don't know of
better options you've not listed.  +1 for it.  Does it work for you?  I think
the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" as
a family of use cases.  For views only, different $ACTIONS want different
behavior.  $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants
shallower recursion and caller permissions; DROP VIEW wants no recursion.

> [1] https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org

[2] For example, pg_get_viewdef('pg_user') locks pg_shadow, but "LOCK TABLE
pg_user" additionally locks pg_authid and pg_db_role_setting.

Re: Rethinking LOCK TABLE's behavior on views

From

Alvaro Herrera

Date:

09 November 2020, 14:42:33

On 2020-Nov-07, Noah Misch wrote:

> On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote:

> > A completely different approach we could consider is to weaken the
> > permissions requirements for LOCK on a view, say "allow it if either
> > the calling user or the view owner has the needed permission".  This
> > seems generally pretty messy and so I don't much like it, but we
> > should consider as many solutions as we can think of.
> 
> This is the best of what you've listed by a strong margin, and I don't know of
> better options you've not listed.  +1 for it.  Does it work for you?

It does sound attractive from a user complexity perspective, even if it
does sound messy form an implementation perspective.

> I think
> the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" as
> a family of use cases.  For views only, different $ACTIONS want different
> behavior.  $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants
> shallower recursion and caller permissions; DROP VIEW wants no recursion.

Maybe we can tackle this problem directly, by adding a clause to LOCK
TABLE to indicate a purpose for the lock that the server can use to
determine the level of recursion.  For example
  LOCK TABLE xyz IN <mode> FOR <purpose>
where <purpose> can be READ, DROP, DEFINE.

(For back-patch purposes we could store the purpose in LockStmt->mode,
which has more than enough unused bits).

Re: Rethinking LOCK TABLE's behavior on views

From

Noah Misch

Date:

11 November 2020, 02:21:32

On Mon, Nov 09, 2020 at 11:42:33AM -0300, Alvaro Herrera wrote:
> On 2020-Nov-07, Noah Misch wrote:
> > On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote:
> > > A completely different approach we could consider is to weaken the
> > > permissions requirements for LOCK on a view, say "allow it if either
> > > the calling user or the view owner has the needed permission".  This
> > > seems generally pretty messy and so I don't much like it, but we
> > > should consider as many solutions as we can think of.
> > 
> > This is the best of what you've listed by a strong margin, and I don't know of
> > better options you've not listed.  +1 for it.  Does it work for you?
> 
> It does sound attractive from a user complexity perspective, even if it
> does sound messy form an implementation perspective.
> 
> > I think
> > the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" as
> > a family of use cases.  For views only, different $ACTIONS want different
> > behavior.  $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants
> > shallower recursion and caller permissions; DROP VIEW wants no recursion.
> 
> Maybe we can tackle this problem directly, by adding a clause to LOCK
> TABLE to indicate a purpose for the lock that the server can use to
> determine the level of recursion.  For example
>   LOCK TABLE xyz IN <mode> FOR <purpose>
> where <purpose> can be READ, DROP, DEFINE.

Possible.  Regrettably, we're not set up for it; running pg_get_viewdef() to
completion is today's way to determine what it will lock.  Each of these modes
probably would have condensed copies of the operation they mimic, which I'd
find sadder than locking somewhat more than pg_dump needs (via today's "LOCK
TABLE viewname" behavior).  Is it plausible to do without that duplication?