Thread: Improve behavior of concurrent ANALYZE/VACUUM

Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
Hi all,
(In CC are the folks who have reviewed the first patch versions, Nathan
and Horiguchi-san.)

After TRUNCATE and REINDEX, here is the third and last thread I am
spawning for the previous thread "Canceling authentication due to
timeout aka Denial of Service Attack":
https://www.postgresql.org/message-id/152512087100.19803.12733865831237526317%40wrigleys.postgresql.org

And this time the discussion is about VACUUM/ANALYZE.  In this case, we
also check relation ownership after queuing for a lock, which can allow
any user to potentially lock a relation which others could use,
particularly with VACUUM FULL which needs an AEL (access exclusive
lock).

In the previous thread, we discussed a couple of approaches, but I was
not happy with any of those, hence I have been spending more time in
getting to a solution which has no user-facing changes, and still solves
the problems folks have been complaining about, and the result is the
patch attached.  The patch changes a couple of things regarding ACL
checks, by simply centralizing the ownership checks into a single
routine used by both ANALYZE and VACUUM.  This routine is then used in
two more places for manual ANALYZE and VACUUM:
- When specifying directly one or more relations in the command, in
expand_vacuum_rel().
- When building the complete list of relations to work on in the case of
a database-wide operation, in get_all_vacuum_rels().

analyze_rel() and vacuum_rel() have been using the same logic to check
for relation ownership, so refactoring things into a single routine is a
win in my opinion.

While reviewing the code, I have of course noticed that analyze_rel()
makes an effort to not produce a WARNING if both VACOPT_VACUUM and
VACOPT_ANALYZE are specified in VacuumStmt->options, however we can
never see that scenario as analyze_rel() never gets called at the same
time as vacuum_rel() for a single relation.

The patch attached includes tests which I have used to also check that
correct error messages are produced for VACUUM, VACUUM ANALYZE and
ANALYZE.

Please note that like the previous one for TRUNCATE, I would no plans
for a back-patch with the same arguments as previously.  There are also
serious bugs being worked on for REL_11_STABLE so I don't want to take
any risk for this branch.

Thoughts?
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Robert Haas
Date:
On Sun, Aug 12, 2018 at 10:21 PM, Michael Paquier <michael@paquier.xyz> wrote:
> In the previous thread, we discussed a couple of approaches, but I was
> not happy with any of those, hence I have been spending more time in
> getting to a solution which has no user-facing changes, and still solves
> the problems folks have been complaining about, and the result is the
> patch attached.  The patch changes a couple of things regarding ACL
> checks, by simply centralizing the ownership checks into a single
> routine used by both ANALYZE and VACUUM.  This routine is then used in
> two more places for manual ANALYZE and VACUUM:
> - When specifying directly one or more relations in the command, in
> expand_vacuum_rel().
> - When building the complete list of relations to work on in the case of
> a database-wide operation, in get_all_vacuum_rels().

I feel like you're not being very clear about exactly what this new
approach is.  Sorry if I'm being dense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Tue, Aug 14, 2018 at 03:26:29PM +0000, Robert Haas wrote:
> I feel like you're not being very clear about exactly what this new
> approach is.  Sorry if I'm being dense.

On HEAD, we check the ownership of the relation vacuumed or analyzed
after taking a lock on it in respectively vacuum_rel() and
analyze_rel(), where we already know the OID of the relation and there
may be no RangeVar which we could use with RangeVarGetRelidExtended
(like partitions).  I don't think that we want to use again
RangeVarGetRelidExtended once the relation OID is known anyway.  So My
proposal is to add more ownership checks when building the list of
VacuumRelations, and skip the relations the user has no right to work on
at an earlier stage.  Looking at the code may be easier to understand
than a comment, please remember that there are three code paths used to
build the list of VacuumRelations (each one may be processed in its own
transaction):
1) autovacuum, which specifies only one relation at a time with its OID,
and we build the list in expand_vacuum_rel(), which finishes with a
single element.
2) Manual VACUUM with a list of relation specified, where the list of
elements is built in the second part of expand_vacuum_rel(), which is
able to expand partitioned tables as well.
3) Manual VACUUM with no list specified, where the list of relations is
built in get_all_vacuum_rels().

My proposal is to add two more ownership checks in 2) and 3), and also
refactor the code so as we use a single routine for ANALYZE and VACUUM.
This has the advantage of not making the code of ANALYZE and VACUUM
diverge anymore for the existing ownership checks, and we still generate
WARNINGs if a relation needs to be skipped.

(Thinking about it, it could make sense to add an extra assert at the
beginning of expand_vacuum_rel like I did in 6551f3d...)
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Robert Haas
Date:
On Tue, Aug 14, 2018 at 11:59 AM, Michael Paquier <michael@paquier.xyz> wrote:
> On HEAD, we check the ownership of the relation vacuumed or analyzed
> after taking a lock on it in respectively vacuum_rel() and
> analyze_rel(), where we already know the OID of the relation and there
> may be no RangeVar which we could use with RangeVarGetRelidExtended
> (like partitions).  I don't think that we want to use again
> RangeVarGetRelidExtended once the relation OID is known anyway.

We definitely don't want to use RangeVarGetRelidExtended more than
necessary.  It is important that we use that function only when
necessary - that is, to look up names supplied by users - and it is
also important that we look up each user-supplied name only once, lest
we get different answers on different occasions, possibly introducing
either outright security problems or at the least ludicrous behavior.

In the case where we have an OID already, I think we could just
perform a permissions test before locking the OID.  It's true that
permissions might be revoked after we test them and before the lock is
acquired, but that doesn't seem terrible.  The real point of all of
this stuff is to keep users from locking objects which they never had
any right to access, not to worry about what happens if permissions
are concurrently revoked while we're getting the lock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Thu, Aug 16, 2018 at 03:07:18PM -0400, Robert Haas wrote:
> We definitely don't want to use RangeVarGetRelidExtended more than
> necessary.  It is important that we use that function only when
> necessary - that is, to look up names supplied by users - and it is
> also important that we look up each user-supplied name only once, lest
> we get different answers on different occasions, possibly introducing
> either outright security problems or at the least ludicrous behavior.

Thanks, that matches my feelings about this stuff.

> In the case where we have an OID already, I think we could just
> perform a permissions test before locking the OID.  It's true that
> permissions might be revoked after we test them and before the lock is
> acquired, but that doesn't seem terrible.  The real point of all of
> this stuff is to keep users from locking objects which they never had
> any right to access, not to worry about what happens if permissions
> are concurrently revoked while we're getting the lock.

One thing is that neither pg_class_ownercheck nor pg_database_ownercheck
are fail-safe, and would issue an ERROR when the relation does not
exist, and this can happen when using multiple transactions for VACUUM
FULL or such, so we cannot simply swap the owner checks before trying to
lock the relation in vacuum_rel() and analyze_rel().  Or we invent new
flavors of those routine able to handle missing relations, then swap the
ACL checks to happen before the relations are locked.

For VACUUM/ANALYZE, I tend to think that it is incorrect to include from
the start in the list of relations to process all the ones a user is not
an owner of, so my approach seems quite natural, at least to me.  Each
one of the two approaches has its good and bad sides.
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Mon, Aug 13, 2018 at 12:21:42AM +0200, Michael Paquier wrote:
> The patch attached includes tests which I have used to also check that
> correct error messages are produced for VACUUM, VACUUM ANALYZE and
> ANALYZE.

I have reworked the patch on this side, clarifying the use of the new
common API for the logs.  One thing I am wondering about is what do we
want to do when VACUUM ANALYZE is used.  As of HEAD, if vacuum_rel()
stops, then analyze_rel() is never called, and the only log showing up
to a non-owner user would be:
skipping "foo" --- only superuser can vacuum it

With this patch, things become perhaps more confusing by generating two
WARNING log entries:
skipping "foo" --- only superuser can vacuum it
skipping "foo" --- only superuser can analyze it

We could either combine both in a single message, or just generate the
message for vacuum as HEAD does now.  I have also added some simple
regression tests triggering the skipping logs for shared catalogs,
non-shared catalogs and non-owners.  This could be a separate patch as
well.

Input is welcome.
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
"Bossart, Nathan"
Date:
Hi,

Sorry for the delay!  I looked through the latest patch.

On 8/17/18, 1:43 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> I have reworked the patch on this side, clarifying the use of the new
> common API for the logs.  One thing I am wondering about is what do we
> want to do when VACUUM ANALYZE is used.  As of HEAD, if vacuum_rel()
> stops, then analyze_rel() is never called, and the only log showing up
> to a non-owner user would be:
> skipping "foo" --- only superuser can vacuum it
>
> With this patch, things become perhaps more confusing by generating two
> WARNING log entries:
> skipping "foo" --- only superuser can vacuum it
> skipping "foo" --- only superuser can analyze it
>
> We could either combine both in a single message, or just generate the
> message for vacuum as HEAD does now.  I have also added some simple
> regression tests triggering the skipping logs for shared catalogs,
> non-shared catalogs and non-owners.  This could be a separate patch as
> well.

I like the idea of emitting a separate WARNING for each for clarity on
what operations are being skipped.  However, I think it could be a bit
confusing with the current wording.  Perhaps something like "skipping
vacuum of..." and "skipping analyze of..." would make things clearer.
Another thing to keep in mind is how often only one of these messages
will apply.  IIUC the vast majority of VACUUM (ANALYZE) statements
that need to emit such log statements would emit both.  Plus, while
VACUUM (ANALYZE) is clearly documented as performing both operations,
I can easily see the argument that users may view it as one big
command and that emitting multiple log entries could be a confusing
change in behavior.

In short, my vote would be to maintain the current behavior for now
and to bring up any logging improvements separately.

+       /*
+        * Depending on the permission checks done while building the list
+        * of relations to work on, it could be possible that the list is
+        * empty, hence do nothing in this case.
+        */
+       if (relations == NIL)
+               return;

It might be better to let the command go through normally so that we
don't miss any cleanup at the end (e.g. deleting vac_context).

+/*
+ * Check if a given relation can be safely vacuumed or not.  If the
+ * user is not the relation owner, issue a WARNING log message and return
+ * false to let the caller decide what to do with this relation.
+ */
+bool
+vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
+{
+       Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
+
+       /*
+        * Check permissions.
+        *
+        * We allow the user to vacuum a table if he is superuser, the table
+        * owner, or the database owner (but in the latter case, only if it's not
+        * a shared relation).  pg_class_ownercheck includes the superuser case.
+        *
+        * Note we choose to treat permissions failure as a WARNING and keep
+        * trying to vacuum the rest of the DB --- is this appropriate?
+        */

Do you think it's worth adding ANALYZE to these comments as well?

+       if (!(pg_class_ownercheck(relid, GetUserId()) ||
+                 (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared)))

Returning true right away when the role does have permissions might be
a nice way to save a level of indentation.

+               /*
+                * To check whether the relation is a partitioned table and its
+                * ownership, fetch its syscache entry.
+                */
+               tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+               if (!HeapTupleIsValid(tuple))
+                       elog(ERROR, "cache lookup failed for relation %u", relid);
+               classForm = (Form_pg_class) GETSTRUCT(tuple);
+
+               /* check permissions of relation */
+               if (!vacuum_is_relation_owner(relid, classForm, options))
+               {
+                       ReleaseSysCache(tuple);
+
+                       /*
+                        * Release lock again with AccessShareLock -- see below for
+                        * the reason why this lock is released.
+                        */
+                       UnlockRelationOid(relid, AccessShareLock);
+                       return vacrels;
+               }

I think this actually changes the behavior for partitioned tables.
Presently, we still go through and collect all the partitions in the
vacrels list.  With this change, we will skip collecting a table's
partitions if the current role doesn't have the required permissions.
Perhaps we should skip adding the current relation to vacrels if
vacuum_is_relation_owner() returns false, and then we could go through
the partitions and check permissions on those as well.  Since we don't
take any locks on the individual partitions yet, getting the relation
name and calling pg_class_ownercheck() safely might be tricky, though.

Nathan


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Mon, Aug 20, 2018 at 08:57:00PM +0000, Bossart, Nathan wrote:
> Sorry for the delay!  I looked through the latest patch.

Thanks a lot for the review!

> I like the idea of emitting a separate WARNING for each for clarity on
> what operations are being skipped.  However, I think it could be a bit
> confusing with the current wording.  Perhaps something like "skipping
> vacuum of..." and "skipping analyze of..." would make things clearer.
> Another thing to keep in mind is how often only one of these messages
> will apply.  IIUC the vast majority of VACUUM (ANALYZE) statements
> that need to emit such log statements would emit both.  Plus, while
> VACUUM (ANALYZE) is clearly documented as performing both operations,
> I can easily see the argument that users may view it as one big
> command and that emitting multiple log entries could be a confusing
> change in behavior.
>
> In short, my vote would be to maintain the current behavior for now
> and to bring up any logging improvements separately.

On the other hand, it would be useful for the user to know exactly what
is getting skipped.  For example if VACUUM ANALYZE is used then both
operations would happen, but now the user would only know that VACUUM
has been skipped, and may miss the fact that ANALYZE was not attempted.
Let's do as you suggest at the end, aka if both VACOPT_VACUUM and
VACOPT_ANALYZE are passed down to vacuum_is_relation_owner, then only
the log for VACUUM is generated, which is consistent.  Any other changes
could happen later on if necessary.

> +       /*
> +        * Depending on the permission checks done while building the list
> +        * of relations to work on, it could be possible that the list is
> +        * empty, hence do nothing in this case.
> +        */
> +       if (relations == NIL)
> +               return;
>
> It might be better to let the command go through normally so that we
> don't miss any cleanup at the end (e.g. deleting vac_context).

Right, that was a bad idea.  Updating datfrozenxid can actually be a
good thing.

> +/*
> + * Check if a given relation can be safely vacuumed or not.  If the
> + * user is not the relation owner, issue a WARNING log message and return
> + * false to let the caller decide what to do with this relation.
> + */
> +bool
> +vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple, int options)
> +{
> +       Assert((options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0);
> +
> +       /*
> +        * Check permissions.
> +        *
> +        * We allow the user to vacuum a table if he is superuser, the table
> +        * owner, or the database owner (but in the latter case, only if it's not
> +        * a shared relation).  pg_class_ownercheck includes the superuser case.
> +        *
> +        * Note we choose to treat permissions failure as a WARNING and keep
> +        * trying to vacuum the rest of the DB --- is this appropriate?
> +        */
>
> Do you think it's worth adding ANALYZE to these comments as well?

Done.

> +       if (!(pg_class_ownercheck(relid, GetUserId()) ||
> +                 (pg_database_ownercheck(MyDatabaseId, GetUserId()) && !reltuple->relisshared)))
>
> Returning true right away when the role does have permissions might be
> a nice way to save a level of indentation.

Done.

> I think this actually changes the behavior for partitioned tables.
> Presently, we still go through and collect all the partitions in the
> vacrels list.  With this change, we will skip collecting a table's
> partitions if the current role doesn't have the required permissions.
> Perhaps we should skip adding the current relation to vacrels if
> vacuum_is_relation_owner() returns false, and then we could go through
> the partitions and check permissions on those as well.  Since we don't
> take any locks on the individual partitions yet, getting the relation
> name and calling pg_class_ownercheck() safely might be tricky, though.

Yes, that's actually intentional on my side as this keeps the logic more
simple, and we avoid risks around deadlocks when working on partitions.
It seems to me that it is also more intuitive to *not* scan a full
partition tree if the user does not have ownership on its root if the
relation is listed, instead of trying to scan all leafs to find perhaps
some of them.  In most data models it would matter much anyway, no?
This is also more consistent with what is done for TRUNCATE where the
ACLs of the parent are considered first.  The documentation also
actually mentions that:
"To vacuum a table, one must ordinarily be the table's owner or a
superuser."
Perhaps we could make that clearer for partitions, with something like:
"If listed explicitly, the vacuum of a partitioned table will include
all its partitions if the user is the owner of the partitioned table."

If we don't want to change the current behavior, then one simple
solution would be close to what you mention, aka skip adding the
partitioned table to the list, include *all* the partitions in the list
as we cannot sanely check their ACLs at this stage, and rely on the
checks already happening in vacuum_rel() and analyze_rel().  This would
cause the original early lock attempts to not be solved for partitions,
which is why the approach taken in the patches makes the most sense.

I have split the patch into two parts:
- 0001 includes new tests which generate WARNING messages for VACUUM,
ANALYZE and VACUUM (ANALYZE).  That's useful separately.
- 0002 is the original patch discussed here.

Thanks,
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
"Bossart, Nathan"
Date:
On 8/20/18, 8:29 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> In short, my vote would be to maintain the current behavior for now
>> and to bring up any logging improvements separately.
>
> On the other hand, it would be useful for the user to know exactly what
> is getting skipped.  For example if VACUUM ANALYZE is used then both
> operations would happen, but now the user would only know that VACUUM
> has been skipped, and may miss the fact that ANALYZE was not attempted.
> Let's do as you suggest at the end, aka if both VACOPT_VACUUM and
> VACOPT_ANALYZE are passed down to vacuum_is_relation_owner, then only
> the log for VACUUM is generated, which is consistent.  Any other changes
> could happen later on if necessary.

Sounds good.

> If we don't want to change the current behavior, then one simple
> solution would be close to what you mention, aka skip adding the
> partitioned table to the list, include *all* the partitions in the list
> as we cannot sanely check their ACLs at this stage, and rely on the
> checks already happening in vacuum_rel() and analyze_rel().  This would
> cause the original early lock attempts to not be solved for partitions,
> which is why the approach taken in the patches makes the most sense.

I think my biggest concern with this approach is that we'd be
introducing inconsistent behavior whenever there are concurrent
changes.  If a user never had permissions to VACUUM the partitioned
table, the partitions are skipped outright.  However, if the user
loses permissions to VACUUM the partitioned table between
expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
each individual partition.

I'll admit I don't have a great alternative proposal that doesn't
involve adding deadlock risk or complexity, but it still seems worth
mulling over.

> I have split the patch into two parts:
> - 0001 includes new tests which generate WARNING messages for VACUUM,
> ANALYZE and VACUUM (ANALYZE).  That's useful separately.

0001 looks good to me.

> - 0002 is the original patch discussed here.

I'd suggest even splitting 0002 into two patches: one for refactoring
the existing permissions checks into vacuum_is_relation_owner() and
another for the new checks.

+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.

Can we add tests for concurrent changes that cause the relation to be
skipped in vacuum_rel() and analyze_rel() instead of
expand_vacuum_rel()?

Nathan


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Tue, Aug 21, 2018 at 04:01:50PM +0000, Bossart, Nathan wrote:
> I think my biggest concern with this approach is that we'd be
> introducing inconsistent behavior whenever there are concurrent
> changes.  If a user never had permissions to VACUUM the partitioned
> table, the partitions are skipped outright.  However, if the user
> loses permissions to VACUUM the partitioned table between
> expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
> each individual partition.
>
> I'll admit I don't have a great alternative proposal that doesn't
> involve adding deadlock risk or complexity, but it still seems worth
> mulling over.

That counts only for a manual vacuum/analyze listing directly the
relation in question.  If running a system-wide VACUUM then all the
relations are still processed.  This is a rather edge case in my opinion
but..  I don't mind mulling over it (as you say).  So please let me
think over it for a couple of days.  I don't see a smart solution which
does not create risks of lock upgrades and deadlocks now, there may be
one able to preserve the existing behavior.

>> I have split the patch into two parts:
>> - 0001 includes new tests which generate WARNING messages for VACUUM,
>> ANALYZE and VACUUM (ANALYZE).  That's useful separately.
>
> 0001 looks good to me.

Thanks, I have pushed this one.

>> - 0002 is the original patch discussed here.
>
> I'd suggest even splitting 0002 into two patches: one for refactoring
> the existing permissions checks into vacuum_is_relation_owner() and
> another for the new checks.

Hmmm.  The second patch changes also some comment blocks when calling
vacuum_is_relation_owner(), so we finish by changing the same code
areas, resulting in more code churn for no real gain.

> +# The role doesn't have privileges to vacuum the table, so VACUUM should
> +# immediately skip the table without waiting for a lock.
>
> Can we add tests for concurrent changes that cause the relation to be
> skipped in vacuum_rel() and analyze_rel() instead of
> expand_vacuum_rel()?

Doing that deterministically with concurrent tests look difficult to me
as doing ALTER TABLE OWNER TO to a relation in a first session causes a
second session running VACUUM to block in expand_vacuum_rel(), be it
with a plain table or a partitioned table (doing the ALTER TABLE on a
leaf will block scanning the parent as well).
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
"Bossart, Nathan"
Date:
On 8/21/18, 7:44 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Tue, Aug 21, 2018 at 04:01:50PM +0000, Bossart, Nathan wrote:
>> I think my biggest concern with this approach is that we'd be
>> introducing inconsistent behavior whenever there are concurrent
>> changes.  If a user never had permissions to VACUUM the partitioned
>> table, the partitions are skipped outright.  However, if the user
>> loses permissions to VACUUM the partitioned table between
>> expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
>> each individual partition.
>> 
>> I'll admit I don't have a great alternative proposal that doesn't
>> involve adding deadlock risk or complexity, but it still seems worth
>> mulling over.
>
> That counts only for a manual vacuum/analyze listing directly the
> relation in question.  If running a system-wide VACUUM then all the
> relations are still processed.  This is a rather edge case in my opinion
> but..  I don't mind mulling over it (as you say).  So please let me
> think over it for a couple of days.  I don't see a smart solution which
> does not create risks of lock upgrades and deadlocks now, there may be
> one able to preserve the existing behavior.

Right.  If we don't come up with anything, the behavior change for
this edge case is probably reasonable as long as we update the
documentation like you proposed earlier.

>>> - 0002 is the original patch discussed here.
>> 
>> I'd suggest even splitting 0002 into two patches: one for refactoring
>> the existing permissions checks into vacuum_is_relation_owner() and
>> another for the new checks.
>
> Hmmm.  The second patch changes also some comment blocks when calling
> vacuum_is_relation_owner(), so we finish by changing the same code
> areas, resulting in more code churn for no real gain.

I see.  I only made this suggestion so that we could get some of the
easy changes out of the way, but there's no need if it's just adding
unnecessary code churn.

>> +# The role doesn't have privileges to vacuum the table, so VACUUM should
>> +# immediately skip the table without waiting for a lock.
>> 
>> Can we add tests for concurrent changes that cause the relation to be
>> skipped in vacuum_rel() and analyze_rel() instead of
>> expand_vacuum_rel()?
>
> Doing that deterministically with concurrent tests look difficult to me
> as doing ALTER TABLE OWNER TO to a relation in a first session causes a
> second session running VACUUM to block in expand_vacuum_rel(), be it
> with a plain table or a partitioned table (doing the ALTER TABLE on a
> leaf will block scanning the parent as well).

I think this is doable by locking the table in SHARE mode.  That won't
conflict with the AccessShareLock that expand_vacuum_rel() obtains,
but it will conflict with the ShareUpdateExclusiveLock or
AccessExclusiveLock that vacuum_rel() takes.

    session 1> BEGIN; LOCK test IN SHARE MODE;
    session 2> VACUUM test;
    session 1> ALTER TABLE test OWNER TO not_session_2_user; COMMIT;

Nathan


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Wed, Aug 22, 2018 at 02:17:44AM +0000, Bossart, Nathan wrote:
> I think this is doable by locking the table in SHARE mode.  That won't
> conflict with the AccessShareLock that expand_vacuum_rel() obtains,
> but it will conflict with the ShareUpdateExclusiveLock or
> AccessExclusiveLock that vacuum_rel() takes.

Good point.  Still is that really worth adding?  This implies a test
which has at least two roles, one switching the ownership to the other
and do so back-and-forth.  At least that should be on a different
isolation spec file to not complicate the first one.
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
"Bossart, Nathan"
Date:
On 8/22/18, 12:37 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Wed, Aug 22, 2018 at 02:17:44AM +0000, Bossart, Nathan wrote:
>> I think this is doable by locking the table in SHARE mode.  That won't
>> conflict with the AccessShareLock that expand_vacuum_rel() obtains,
>> but it will conflict with the ShareUpdateExclusiveLock or
>> AccessExclusiveLock that vacuum_rel() takes.
>
> Good point.  Still is that really worth adding?  This implies a test
> which has at least two roles, one switching the ownership to the other
> and do so back-and-forth.  At least that should be on a different
> isolation spec file to not complicate the first one.

I think so, since this is the only ownership checks we do on
individual partitions.  Another simple way to test this would be to
create a partitioned table with a different owner than the partitions
and to run VACUUM as the partitioned table owner.  In this case, we
would still rely on the checks in vacuum_rel() and analyze_rel().  IMO
this is a reason to avoid skipping gathering the individual partitions
based upon the ownership of the partitioned table.  It's true that
this wouldn't fix the locking issue for partitions, but the
aforementioned edge case is still present with 0002 anyway.  Plus, it
would add a bit more consistency to partition handling in VACUUM.

I've attached a patch that applies on top of 0002 that adds a simple
test to exercise the checks in vacuum_rel() and analyze_rel().

+        /*
+         * For VACUUM ANALYZE, both logs could show up, but just generate
+         * information for VACUUM as that would be the first one to
+         * process.
+         */
+        return;

We should probably return false here.

Nathan


Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Wed, Aug 22, 2018 at 03:49:16PM +0000, Bossart, Nathan wrote:
> I think so, since this is the only ownership checks we do on
> individual partitions.  Another simple way to test this would be to
> create a partitioned table with a different owner than the partitions
> and to run VACUUM as the partitioned table owner.  In this case, we
> would still rely on the checks in vacuum_rel() and analyze_rel().  IMO
> this is a reason to avoid skipping gathering the individual partitions
> based upon the ownership of the partitioned table.  It's true that
> this wouldn't fix the locking issue for partitions, but the
> aforementioned edge case is still present with 0002 anyway.  Plus, it
> would add a bit more consistency to partition handling in VACUUM.

Normal regression tests are less costly than isolation tests, so let's
use them as possible.  What you attached is covering only a portion of
all the scenarios though, as it is as well interesting to see what
happens if another user owns only the partitioned table, only one
partition, and the partitioned as well as at least one partition.  I
have extended your patch as attached.  It applies on top of HEAD.  Once
applied with the other patch one can easily stop the difference in
behavior, and this stresses the ownership checks in vacuum_rel() and
analyze_rel() as well.  Perhaps we could begin by that?

> We should probably return false here.

Oh, my compiler complained here as well.  Fixed it on my branch.
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
"Bossart, Nathan"
Date:
On 8/23/18, 12:08 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Normal regression tests are less costly than isolation tests, so let's
> use them as possible.  What you attached is covering only a portion of
> all the scenarios though, as it is as well interesting to see what
> happens if another user owns only the partitioned table, only one
> partition, and the partitioned as well as at least one partition.  I
> have extended your patch as attached.  It applies on top of HEAD.  Once
> applied with the other patch one can easily stop the difference in
> behavior, and this stresses the ownership checks in vacuum_rel() and
> analyze_rel() as well.  Perhaps we could begin by that?

This seems reasonable to me.  I think establishing the expected
behavior here is a good idea.

Nathan


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Thu, Aug 23, 2018 at 09:53:57PM +0000, Bossart, Nathan wrote:
> This seems reasonable to me.  I think establishing the expected
> behavior here is a good idea.

Thanks, I have pushed the new test series, and reused it to check the
new version of the main patch, which is attached.  I have added a commit
message and I have indented the thing.

After pondering about it, I have also reworked the portion for
partitioned tables so as the list of partitions processed is unchanged
on HEAD, and we keep a consistent behavior compared to past versions.
If VACUUM processing for partitioned tables was something new in 11, I
think that we could have considered it, but changing silently something
that people may rely on for more than one year now is not very
appealing.

I can personally imagine data models with multiple layers of partitions
where the top-most parent has the most restricted access, and then
things get more permitted the more down you get.  For example let's
imagine a table listing a population, which is split by cities.  The
top-most partitioned table references the whole country, which say only
the president has access to.  Then there are partitions which can be
accessed only by the majors of each city.  In this case, even if a mayor
does a VACUUM FULL of its leaf partition then a full read would be
blocked even for the president.

The reverse is technically possible, aka the top-most parent is not
really restrictive, and leafs get more and more restricted, but
logically that does not make much sense as the top-most parent would be
just useless for any kind of operations so as a full table scan.

Still, in the first case, say that each city major uses the same
application layer which vacuums the top-most parent, then we'd break
something that worked in 10 and 11.
--
Michael

Attachment

Re: Improve behavior of concurrent ANALYZE/VACUUM

From
"Bossart, Nathan"
Date:
On 8/23/18, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Thanks, I have pushed the new test series, and reused it to check the
> new version of the main patch, which is attached.  I have added a commit
> message and I have indented the thing.

Thanks for the new version!

> After pondering about it, I have also reworked the portion for
> partitioned tables so as the list of partitions processed is unchanged
> on HEAD, and we keep a consistent behavior compared to past versions.
> If VACUUM processing for partitioned tables was something new in 11, I
> think that we could have considered it, but changing silently something
> that people may rely on for more than one year now is not very
> appealing.

Agreed.  Even though we're not fixing the issue for partitions yet,
this patch should still fix the originally reported authentication
issue (which I see is highlighted in your commit message).  I think
there's still a slight behavior change with the ordering of the
"skipped" log messages in some cases, but that doesn't seem terribly
important.  We might be able to work around this by storing all the
information we need for the log message in the VacuumRelation and
waiting to emit it until vacuum_rel() or analyze_rel(), but I doubt
it's worth the complexity.

  Without patch:
    postgres=> VACUUM parted1, parted2;
    WARNING:  skipping "parted1" --- only table or database owner can vacuum it
    WARNING:  skipping "parted1_part1" --- only table or database owner can vacuum it
    WARNING:  skipping "parted1_part2" --- only table or database owner can vacuum it
    WARNING:  skipping "parted2" --- only table or database owner can vacuum it
    WARNING:  skipping "parted2_part1" --- only table or database owner can vacuum it
    WARNING:  skipping "parted2_part2" --- only table or database owner can vacuum it
    VACUUM

  With patch:
    postgres=> VACUUM parted1, parted2;
    WARNING:  skipping "parted1" --- only table or database owner can vacuum it
    WARNING:  skipping "parted2" --- only table or database owner can vacuum it
    WARNING:  skipping "parted1_part1" --- only table or database owner can vacuum it
    WARNING:  skipping "parted1_part2" --- only table or database owner can vacuum it
    WARNING:  skipping "parted2_part1" --- only table or database owner can vacuum it
    WARNING:  skipping "parted2_part2" --- only table or database owner can vacuum it
    VACUUM

The new version of the patch applies cleanly, builds cleanly, and
'make check-world' succeeds.  Also, I'm no longer able to reproduce
the authentication issue involving 'VACUUM FULL' run by non-
superusers, so it looks good to me.

Nathan


Re: Improve behavior of concurrent ANALYZE/VACUUM

From
Michael Paquier
Date:
On Fri, Aug 24, 2018 at 05:30:01PM +0000, Bossart, Nathan wrote:
> On 8/23/18, 9:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Thanks, I have pushed the new test series, and reused it to check the
>> new version of the main patch, which is attached.  I have added a commit
>> message and I have indented the thing.
>
> Thanks for the new version!

Finally, I have been able to come back to it, and pushed the latest
version.  We have come a long way...  I'll check the rest of the backend
code for weird calls of relation_open or such.  We may have other cases
with similar problems.

>> After pondering about it, I have also reworked the portion for
>> partitioned tables so as the list of partitions processed is unchanged
>> on HEAD, and we keep a consistent behavior compared to past versions.
>> If VACUUM processing for partitioned tables was something new in 11, I
>> think that we could have considered it, but changing silently something
>> that people may rely on for more than one year now is not very
>> appealing.
>
> Agreed.  Even though we're not fixing the issue for partitions yet,
> this patch should still fix the originally reported authentication
> issue (which I see is highlighted in your commit message).  I think
> there's still a slight behavior change with the ordering of the
> "skipped" log messages in some cases, but that doesn't seem terribly
> important.  We might be able to work around this by storing all the
> information we need for the log message in the VacuumRelation and
> waiting to emit it until vacuum_rel() or analyze_rel(), but I doubt
> it's worth the complexity.

This one is definitely not worth worrying in my opinion, we still
process the same relations, and the order is preserved when using a
single relation.

> The new version of the patch applies cleanly, builds cleanly, and
> 'make check-world' succeeds.  Also, I'm no longer able to reproduce
> the authentication issue involving 'VACUUM FULL' run by non-
> superusers, so it looks good to me.

Thanks for the help!
--
Michael

Attachment