On 3/20/23 10:43, Tom Lane wrote:
> I'd be more willing to consider the proposed patch if it weren't such
> a hack --- as you say, it doesn't fix the problem when the table has
> policies, so it's hardly a general-purpose solution. I fear that it's
> also fairly expensive: adding sub-selects to the query we must do
> before we can lock any tables is not appetizing, because making that
> window wider adds to the risk of deadlocks, dump failures, etc.
(moving to -hackers and July CF)
= Recap for Potential Reviewers =
The timescaledb extension has an internal table that's owned by the
superuser. It's not dumpable, and other users can only access its
contents through a filtered view. For our cloud deployments, we
additionally have that common trope where the most privileged users
aren't actually superusers, but we still want them to be able to perform
a subset of maintenance tasks, including pg_dumping their data.
When cloud users try to dump that data, pg_dump sees that this internal
table is an extension member and plans to dump ACLs, security labels,
and RLS policies for it. (This behavior cannot be overridden with
--exclude-table. pg_dump ignores that flag for extension members.)
Dumping policies requires the use of pg_get_expr() on the backend; this,
in turn, requires a lock on the table with ACCESS SHARE.
So pg_dump tries to lock a table, with no policies, that it's not going
to dump the schema or data for anyway, and it fails because our users
don't have (and shouldn't need) SELECT access to it. For an example of
this in action, I've attached a test case in v2-0001.
= Proposal =
Since this is affecting users on released Postgres versions, my end goal
is to find a fix that's backportable.
This situation looks very similar to [1], where non-superusers couldn't
perform a dump because we were eagerly grabbing table locks to read
(non-existent) ACLs. But that was solved with the realization that ACLs
don't need locks anyway, which is unfortunately not applicable to policies.
My initial patch to -bugs was a riff on a related performance fix [2],
which figured out which tables had interesting ACLs and skipped that
part if nothing was found. I added the same kind of subselect for RLS
policies as well, but that had nasty corner cases where it would perform
terribly, as Tom alluded to above. (In a cluster of 200k tables, where
one single table had 10M policies, the query ground to a halt.)
So v2-0002 is instead inspired by Tom's rewrite of that ACL dump logic
[3]. It scans pg_policy separately, stores the tables it finds into the
catalog map on the client side, and then again skips the policy dump
(and therefore the lock) if no policies exist. The performance hit now
scales with the size of pg_policy alone.
This is a bit more invasive than the subselect, but hopefully still
straightforward enough to be applicable to the back branches' old
catalog map strategy. It's still not a general-purpose fix, as Tom
pointed out above, but that was true of the discussion in [1] as well,
so I'm optimistic.
WDYT?
--Jacob
[1]
https://postgr.es/m/CAGPqQf3Uzo-yU1suYyoZR83h6QTxXxkGTtEyeMV7EAVBqn%3DPcQ%40mail.gmail.com
[2] https://git.postgresql.org/cgit/postgresql.git/commit/?id=5d589993
[3] https://git.postgresql.org/cgit/postgresql.git/commit/?id=0c9d8442