Thread: Fwd: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
Fwd: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Jeremy Schneider
Date:
I'd like to bump this old bug that Lloyd filed for more discussion. It seems serious enough to me that we should at least talk about it. Anyone with simply the login privilege and the ability to run SQL can instantly block all new incoming connections to a DB including new superuser connections. session 1: select pg_sleep(9999999999) from pg_stat_activity; session 2: vacuum full pg_authid; -or- truncate table pg_authid; (there are likely other SQL you could run in session 2 as well.) -------- Forwarded Message -------- Subject: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack Date: Mon, 30 Apr 2018 20:41:11 +0000 From: PG Bug reporting form <noreply@postgresql.org> Reply-To: lalbin@scharp.org, pgsql-bugs@lists.postgresql.org To: pgsql-bugs@lists.postgresql.org CC: lalbin@scharp.org The following bug has been logged on the website: Bug reference: 15182 Logged by: Lloyd Albin Email address: lalbin@scharp.org PostgreSQL version: 10.3 Operating system: OpenSUSE Description: Over the last several weeks our developers caused a Denial of Service Attack against ourselves by accident. When looking at the log files, I noticed that we had authentication timeouts during these time periods. In researching the problem I found this is due to locks being held on shared system catalog items, aka system catalog items that are shared between all databases on the same cluster/server. This can be caused by beginning a long running transaction that queries pg_stat_activity, pg_roles, pg_database, etc and then another connection that runs either a REINDEX DATABASE, REINDEX SYSTEM, or VACUUM FULL. This issue is of particular importance to database resellers who use the same cluster/server for multiple clients, as two clients can cause this issue to happen inadvertently or a single client can either cause it to happen maliciously or inadvertently. Note: The large cloud providers give each of their clients their own cluster/server so this will not affect across cloud clients but can affect an individual client. The problem is that traditional hosting companies will have all clients from one or more web servers share the same PostgreSQL cluster/server. This means that one or two clients could inadvertently stop all the other clients from being able to connect to their databases until the first client does either a COMMIT or ROLLBACK of their transaction which they could hold open for hours, which is what happened to us internally. In Connection 1 we need to BEGIN a transaction and then query a shared system item; pg_authid, pg_database, etc; or a view that depends on a shared system item; pg_stat_activity, pg_roles, etc. Our developers were accessing pg_roles. Connection 1 (Any database, Any User) BEGIN; SELECT * FROM pg_stat_activity; Connection 2 (Any database will do as long as you are the database owner) REINDEX DATABASE postgres; Connection 3 (Any Database, Any User) psql -h sqltest-alt -d sandbox All future Connection 3's will hang for however long the transaction in Connection 1 runs. In our case this was hours and denied everybody else the ability to log into the server until Connection 1 was committed. psql will just hang for hours, even overnight in my testing, but our apps would get the "Canceling authentication due to timeout" after 1 minute. Connection 2 can also do any of these commands to also cause the same issue: REINDEX SYSTEM postgres; VACUUM FULL pg_authid; vacuumdb -f -h sqltest-alt -d lloyd -U lalbin Even worse is that the VACUUM FULL pg_authid; can be started by an unprivileged user and it will wait for the AccessShareLock by connection 1 to be released before returning the error that you don't have permission to perform this action, so even an unprivileged user can cause this to happen. The privilege check needs to happen before the waiting for the AccessExclusiveLock happens. This bug report has been simplified and shorted drastically. To read the full information about this issue please see my blog post: http://lloyd.thealbins.com/Canceling%20authentication%20due%20to%20timeout Lloyd Albin Database Administrator Statistical Center for HIV/AIDS Research and Prevention (SCHARP) Fred Hutchinson Cancer Research Center -- Jeremy Schneider Database Engineer Amazon Web Services
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
Marko Tiikkaja
Date:
On Fri, Jul 20, 2018 at 2:17 AM, Jeremy Schneider <schnjere@amazon.com> wrote:
I'd like to bump this old bug that Lloyd filed for more discussion. It
seems serious enough to me that we should at least talk about it.
Anyone with simply the login privilege and the ability to run SQL can
instantly block all new incoming connections to a DB including new
superuser connections.
So.. don't VACUUM FULL pg_authid without lock_timeout?
I can come up with dozens of ways to achieve the same effect, all of them silly.
.m
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
Robert Haas
Date:
On Thu, Jul 19, 2018 at 7:17 PM, Jeremy Schneider <schnjere@amazon.com> wrote: > I'd like to bump this old bug that Lloyd filed for more discussion. It > seems serious enough to me that we should at least talk about it. > > Anyone with simply the login privilege and the ability to run SQL can > instantly block all new incoming connections to a DB including new > superuser connections. > > session 1: > select pg_sleep(9999999999) from pg_stat_activity; > > session 2: > vacuum full pg_authid; -or- truncate table pg_authid; > > (there are likely other SQL you could run in session 2 as well.) ExecuteTruncate needs to be refactored to use RangeVarGetRelidExtended with a non-NULL callback rather than heap_openrv, and expand_vacuum_rel needs to use RangeVarGetRelidExtended with a callback instead of RangeVarGetRelid. See cbe24a6dd8fb224b9585f25b882d5ffdb55a0ba5 as an example of what to do. I fixed a large number of cases of this problem back around that time, but then ran out of steam and had to move onto other things before I got them all. Patches welcome. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
Jeff Janes
Date:
On Fri, Jul 20, 2018 at 5:56 PM, Marko Tiikkaja <marko@joh.to> wrote:
On Fri, Jul 20, 2018 at 2:17 AM, Jeremy Schneider <schnjere@amazon.com> wrote:I'd like to bump this old bug that Lloyd filed for more discussion. It
seems serious enough to me that we should at least talk about it.
Anyone with simply the login privilege and the ability to run SQL can
instantly block all new incoming connections to a DB including new
superuser connections.So.. don't VACUUM FULL pg_authid without lock_timeout?
That's like saying the solution to a security hole is for no one to attempt to exploit it.
Note that you do not need to have permissions to do the vacuum full. This works merely from the attempt to do so, before the permissions are checked.
Cheers,
Jeff
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 11:29:33AM -0400, Robert Haas wrote: > ExecuteTruncate needs to be refactored to use RangeVarGetRelidExtended > with a non-NULL callback rather than heap_openrv, and > expand_vacuum_rel needs to use RangeVarGetRelidExtended with a > callback instead of RangeVarGetRelid. See > cbe24a6dd8fb224b9585f25b882d5ffdb55a0ba5 as an example of what to do. > I fixed a large number of cases of this problem back around that time, > but then ran out of steam and had to move onto other things before I > got them all. Patches welcome. Thanks for pointing those out, I looked at both code paths recently for some other work... The amount of work does not consist just in using for example RangeVarCallbackOwnsRelation for VACUUM and TRUNCATE. There are a couple of reasons behind that: - While it would make sense, at least to me, to make VACUUM fall into if allow_system_table_mods is allowed, that's not the case of ANALYZE as I think that we should be able to call ANALYZE on a system catalog as well. So we would basically a new flavor of RangeVarCallbackOwnsRelation for VACUUM which makes this difference between vacuum and analyze with an argument in the callback, the options of VacuumStmt would be nice. This would not be used by autovacuum anyway, but adding an assertion and mentioning that in the comments would not hurt. There is an argument for just restricting VACUUM FULL as well and not plain VACUUM, as that's the one hurting here. - TRUNCATE is closer to a solution, as it has its own flavor of relation checks with truncate_check_rel. So the callback would replace truncate_check_rel but CheckTableNotInUse should be moved out of it. TRUNCATE already uses allow_system_table_mods for its checks. Thoughts? -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack
From
Andres Freund
Date:
On July 23, 2018 9:14:03 PM PDT, Michael Paquier <michael@paquier.xyz> wrote: >- While it would make sense, at least to me, to make VACUUM fall into >if >allow_system_table_mods is allowed, I might be mis-parsing this due to typos. Are you actually suggesting vacuum on system tables should depend on that GUC?If so, why? That's seems like a terrible idea. It's pretty normal to occasionally have to vacuum them? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 09:17:53PM -0700, Andres Freund wrote: > I might be mis-parsing this due to typos. Are you actually suggesting > vacuum on system tables should depend on that GUC? If so, why? That's > seems like a terrible idea. It's pretty normal to occasionally have > to vacuum them? Oh, yes, that would be bad. My mind has slipped here. I have seen manual VACUUMs on system catalogs for applications using many temp tables... So we would want to have only VACUUM FULL being conditionally happening? The question comes then about what to do when a VACUUM FULL is run without a list of relations because expand_vacuum_rel() is not actually the only problem. Would we want to ignore system tables as well except if allow_system_table_mods is on? When no relation list is specified, get_all_vacuum_rels() builds the list of relations which causes vacuum_rel() to complain on try_relation_open(), so patching just expand_vacuum_rel() solves only half of the problem for manual VACUUMs. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack
From
Andres Freund
Date:
On July 23, 2018 9:50:10 PM PDT, Michael Paquier <michael@paquier.xyz> wrote: >On Mon, Jul 23, 2018 at 09:17:53PM -0700, Andres Freund wrote: >> I might be mis-parsing this due to typos. Are you actually suggesting >> vacuum on system tables should depend on that GUC? If so, why? That's >> seems like a terrible idea. It's pretty normal to occasionally have >> to vacuum them? > >Oh, yes, that would be bad. My mind has slipped here. I have seen >manual VACUUMs on system catalogs for applications using many temp >tables... So we would want to have only VACUUM FULL being >conditionally >happening? The question comes then about what to do when a VACUUM FULL >is run without a list of relations because expand_vacuum_rel() is not >actually the only problem. Would we want to ignore system tables as >well except if allow_system_table_mods is on? When no relation list is >specified, get_all_vacuum_rels() builds the list of relations which >causes vacuum_rel() to complain on try_relation_open(), so patching >just expand_vacuum_rel() solves only half of the problem for manual >VACUUMs. I think any such restriction is entirely unacceptable. FULL or not. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 09:51:54PM -0700, Andres Freund wrote: > On July 23, 2018 9:50:10 PM PDT, Michael Paquier <michael@paquier.xyz> wrote: >> Oh, yes, that would be bad. My mind has slipped here. I have seen >> manual VACUUMs on system catalogs for applications using many temp >> tables... So we would want to have only VACUUM FULL being >> conditionally >> happening? The question comes then about what to do when a VACUUM FULL >> is run without a list of relations because expand_vacuum_rel() is not >> actually the only problem. Would we want to ignore system tables as >> well except if allow_system_table_mods is on? When no relation list is >> specified, get_all_vacuum_rels() builds the list of relations which >> causes vacuum_rel() to complain on try_relation_open(), so patching >> just expand_vacuum_rel() solves only half of the problem for manual >> VACUUMs. > > I think any such restriction is entirely unacceptable. FULL or not. Well, letting any users take an exclusive lock on system catalogs at will is not acceptable either, so two possible answers would be to fail or skip such relations. The first concept applies if a relation list is given by the user, and the second if no list is given. Do you have any thoughts on the matter? -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Tue, Jul 24, 2018 at 02:23:02PM +0900, Michael Paquier wrote: > Well, letting any users take an exclusive lock on system catalogs at > will is not acceptable either, so two possible answers would be to fail > or skip such relations. The first concept applies if a relation list is > given by the user, and the second if no list is given. The first sentence is incorrect. That's actually "letting any users attempt to take an exclusive lock which makes others to be stuck as well". -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Alvaro Herrera
Date:
On 2018-Jul-24, Michael Paquier wrote: > On Mon, Jul 23, 2018 at 11:29:33AM -0400, Robert Haas wrote: > > ExecuteTruncate needs to be refactored to use RangeVarGetRelidExtended > > with a non-NULL callback rather than heap_openrv, and > > expand_vacuum_rel needs to use RangeVarGetRelidExtended with a > > callback instead of RangeVarGetRelid. See > > cbe24a6dd8fb224b9585f25b882d5ffdb55a0ba5 as an example of what to do. > > I fixed a large number of cases of this problem back around that time, > > but then ran out of steam and had to move onto other things before I > > got them all. Patches welcome. > > Thanks for pointing those out, I looked at both code paths recently for > some other work... The amount of work does not consist just in using > for example RangeVarCallbackOwnsRelation for VACUUM and TRUNCATE. I don't think we're forced to reuse the existing callbacks -- maybe write a specific callback for each case, if really needed. But anyway like Andres I don't think this is related to allow_system_table_mods at all; you just need to do the checks in the right order, no? But I don't see why RangeVarCallbackOwnsTable isn't sufficient. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Tue, Jul 24, 2018 at 01:34:13AM -0400, Alvaro Herrera wrote: > But I don't see why RangeVarCallbackOwnsTable isn't sufficient. The set of relkinds checked by truncate_check_rel and RangeVarCallbackOwnsTable is different (toast and matviews). And in the case of VACUUM, partitioned tables can call RangeVarGetRelidExtended when one is listed as part of a manual VACUUM command. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Kyotaro HORIGUCHI
Date:
At Tue, 24 Jul 2018 14:23:02 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180724052302.GB4736@paquier.xyz> > On Mon, Jul 23, 2018 at 09:51:54PM -0700, Andres Freund wrote: > > On July 23, 2018 9:50:10 PM PDT, Michael Paquier <michael@paquier.xyz> wrote: > >> Oh, yes, that would be bad. My mind has slipped here. I have seen > >> manual VACUUMs on system catalogs for applications using many temp > >> tables... So we would want to have only VACUUM FULL being > >> conditionally > >> happening? The question comes then about what to do when a VACUUM FULL > >> is run without a list of relations because expand_vacuum_rel() is not > >> actually the only problem. Would we want to ignore system tables as > >> well except if allow_system_table_mods is on? When no relation list is > >> specified, get_all_vacuum_rels() builds the list of relations which > >> causes vacuum_rel() to complain on try_relation_open(), so patching > >> just expand_vacuum_rel() solves only half of the problem for manual > >> VACUUMs. > > > > I think any such restriction is entirely unacceptable. FULL or not. > > Well, letting any users attempt to take an exclusive lock which > makes others to be stuck as well is not acceptable either, so > two possible answers would be to fail > or skip such relations. The first concept applies if a relation list is > given by the user, and the second if no list is given. > > Do you have any thoughts on the matter? I'm not sure what is the exact problem here. If it is that a non-privilege user can stuck on VACUUM FULL on the scenario, we should just give up VACUUM_FULLing on unprivileged relations *before* trying to take a lock on it. There's no reason for allowing VACUUM FULL on a relation on that the same user is not allowed to run VACUUM. I don't think it's a prbolem that superuser can be involed in the blocking scenario running VACUUM FULL. I may be misunderstanding something because (really ?) it's still extremely hot in Japan, today. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Tue, Jul 24, 2018 at 06:27:16PM +0900, Kyotaro HORIGUCHI wrote: > I may be misunderstanding something because (really ?) it's still > extremely hot in Japan, today. It is better since yesterday, in exchange of a typhoon heading straight to Tokyo :) -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
Hi, So, I have spent the last couple of days trying to figure out a nice solution for VACUUM, TRUNCATE and REINDEX, and attached is a set of three patches to address the issues of this thread: 1) 0001 does the work for TRUNCATE, but using RangeVarGetRelidExtended with a custom callback based on the existing truncate_check_rel(). I had to split the session-level checks into a separate routine as no Relation is available, but that finishes by being a neat result without changing any user-facing behavior. 2) 0002 does the work for VACUUM. It happens that vacuum_rel() already does all the skip-related checks we need to know about to decide if a relation needs to be vacuum'ed or not, so I refactored things as follows: 2-1) For a VACUUM manual listing relations, issue an ERROR if it cannot be vacuum'ed. Previously vacuum_rel() would just log a WARNING and call it a day *after* locking the relation. But as we need to rely on RangeVarGetRelidExtended() an ERROR is necessary. The ERROR happens only if VACUUM FULL is used. 2-2) When a relation list is not specified in a manual VACUUM command, then the decision to skip the relation is done in get_all_vacuum_rels() when building the relation list with the pg_class lookup. This logs a DEBUG message when the relation is skipped, which is more information that what we have now. The checks need to happen again in vacuum_rel as the VACUUM work could be spawned across multiple transactions, where a WARNING is logged. 3) REINDEX is already smart enough to check for ownership of relations if one is manually listed and reports an ERROR. However it can cause the instance to be stuck when doing a database-wide REINDEX on a database using just the owner of this database. In this case it seems to me that we need to make ReindexMultipleTables in terms of ACL checks, as per 0003. I quite like the shape of the patches proposed here, and the refactoring is I think pretty clear. Each patch can be treated independently as well. Comments are welcome. (Those patches are not indented yet, which does not matter much at this stage anyway.) Thanks, -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
"Bossart, Nathan"
Date:
I took a look at 0001. On 7/26/18, 12:24 AM, "Michael Paquier" <michael@paquier.xyz> wrote: > 1) 0001 does the work for TRUNCATE, but using RangeVarGetRelidExtended > with a custom callback based on the existing truncate_check_rel(). I > had to split the session-level checks into a separate routine as no > Relation is available, but that finishes by being a neat result without > changing any user-facing behavior. Splitting the checks like this seems reasonable. As you pointed out, it doesn't change the behavior of the session checks, which AFAICT aren't necessary for the kind of permissions checks we want to add to the RangeVarGetRelidExtended() call. - myrelid = RelationGetRelid(rel); + myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock, + false, RangeVarCallbackForTruncate, + NULL); Should the flags argument be 0 instead of false? + /* Nothing to do if the relation was not found. */ + if (!OidIsValid(relId)) + return; + + tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relId)); + if (!HeapTupleIsValid(tuple)) /* should not happen */ + elog(ERROR, "cache lookup failed for relation %u", relId); The first time we use this callback, the relation won't be locked, so isn't it possible that we won't get a valid tuple here? I did notice that callbacks like RangeVarCallbackForRenameRule, RangeVarCallbackForPolicy, and RangeVarCallbackForRenameTrigger assume that the relation can be concurrently dropped, but RangeVarCallbackOwnsRelation does not. Instead, we assume that the syscache search will succeed if the given OID is valid. Is this a bug, or am I missing something? Nathan
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
"Bossart, Nathan"
Date:
On 7/26/18, 10:07 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > The first time we use this callback, the relation won't be locked, so > isn't it possible that we won't get a valid tuple here? I did notice > that callbacks like RangeVarCallbackForRenameRule, > RangeVarCallbackForPolicy, and RangeVarCallbackForRenameTrigger assume > that the relation can be concurrently dropped, but > RangeVarCallbackOwnsRelation does not. Instead, we assume that the > syscache search will succeed if the given OID is valid. Is this a > bug, or am I missing something? Please pardon the noise. I see that we don't accept invalidation messages until later on in RangeVarGetRelidExtended(), at which point we'll retry and get InvalidOid for concurrently dropped relations. Nathan
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Thu, Jul 26, 2018 at 03:06:59PM +0000, Bossart, Nathan wrote: > I took a look at 0001. Thanks for the lookup. 0003 is the most simple in the set by the way. > On 7/26/18, 12:24 AM, "Michael Paquier" <michael@paquier.xyz> wrote: > - myrelid = RelationGetRelid(rel); > + myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock, > + false, RangeVarCallbackForTruncate, NULL); > > Should the flags argument be 0 instead of false? Yes, those should be 0. All patches are missing that. It does not have a bad consequence on the patch, still that's incorrect. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
"Bossart, Nathan"
Date:
On 7/26/18, 3:42 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Thanks for the lookup. 0003 is the most simple in the set by the way. Minus a possible documentation update, 0003 seems almost ready, too. For 0002, would adding vacuum_skip_rel() before and after try_relation_open() in vacuum_rel() be enough? That way, we could avoid emitting an ERROR for only the VACUUM FULL case (and skip it with a WARNING like everything else). Nathan
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Thu, Jul 26, 2018 at 11:14:31PM +0000, Bossart, Nathan wrote: > On 7/26/18, 3:42 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Minus a possible documentation update, 0003 seems almost ready, too. The docs mentioned that shared catalogs are processed, so I did not bother, but visibly your comment is that we could be more precise about the ownership in this case? An attempt is attached. > For 0002, would adding vacuum_skip_rel() before and after > try_relation_open() in vacuum_rel() be enough? That way, we could > avoid emitting an ERROR for only the VACUUM FULL case (and skip it > with a WARNING like everything else). Er, we need a lock on it while looking at its data in vacuum_rel(), no? So the call to vacuum_skip_rel() needs to happen after try_relation_open(), once we are sure that the relation is opened. Having two set of checks is actually better as the operation can involve multiple operations. The error messages generated by vacuum_skip_rel are not especially smart when elevel >= ERROR. As we need a proper errcode for that case as well just using a separate error message is less confusing. I have implemented my idea in the updated set attached. Another issue I have found is that when doing for example a system-wide analyze, we would finish with spurious warnings, as toast relations need to be discarded from the first set of relations built. Anyway, I have done more work on the patches, mainly I have fixed the calls to RangeVarGetRelidExtended using booleans. I have added isolation tests for cases which are cheap, aka those not involving a system-wide operation. Running those tests on HEAD, it is easy to see that TRUNCATE or VACUUM complete after a session doing a catalog lookup commits its transaction. VACUUM skips a relation and VACUUM FULL issues an error. Regarding those patches, I am pretty happy how things turn out for TRUNCATE and REINDEX, way less for VACUUM, so getting 0001 and 0003 committed first makes the most sense to me as their logic is rather straight-forward (well way of speaking ;p ). -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Kyotaro HORIGUCHI
Date:
Hello. At Fri, 27 Jul 2018 11:31:23 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180727023123.GE1754@paquier.xyz> > On Thu, Jul 26, 2018 at 11:14:31PM +0000, Bossart, Nathan wrote: > > On 7/26/18, 3:42 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > Minus a possible documentation update, 0003 seems almost ready, too. > > The docs mentioned that shared catalogs are processed, so I did not > bother, but visibly your comment is that we could be more precise about > the ownership in this case? An attempt is attached. > > > For 0002, would adding vacuum_skip_rel() before and after > > try_relation_open() in vacuum_rel() be enough? That way, we could > > avoid emitting an ERROR for only the VACUUM FULL case (and skip it > > with a WARNING like everything else). > > Er, we need a lock on it while looking at its data in vacuum_rel(), no? > So the call to vacuum_skip_rel() needs to happen after > try_relation_open(), once we are sure that the relation is opened. > Having two set of checks is actually better as the operation can involve > multiple operations. I intended the same as Bossart said. It is not reasonable that VACUUM and VACUUM FULL behaves differently on rejection for the same reason. The objective of the first (or early) check is rejecting (non-superuser's) unprivileged VACUUM FULL. Any possible modifications on a syscache tuple before taking a lock doesn't seem to harm in the point of view. Anyway the lock acquired in expand_vacuum_rel is not held until vacuum_rel. > The error messages generated by vacuum_skip_rel are not especially smart > when elevel >= ERROR. As we need a proper errcode for that case as well > just using a separate error message is less confusing. I have Apart from the discussion above, it is uneasy for me that the messages seem heavily affected by the callers context. > implemented my idea in the updated set attached. Another issue I have > found is that when doing for example a system-wide analyze, we would > finish with spurious warnings, as toast relations need to be discarded > from the first set of relations built. It seems reasonable. > Anyway, I have done more work on the patches, mainly I have fixed the > calls to RangeVarGetRelidExtended using booleans. I have added I think that it is useful to let callback reutrn bool value that indicates whether stop or go. This allows WARNING in the callback. > isolation tests for cases which are cheap, aka those not involving a > system-wide operation. Running those tests on HEAD, it is easy to see > that TRUNCATE or VACUUM complete after a session doing a catalog lookup > commits its transaction. VACUUM skips a relation and VACUUM FULL issues > an error. > > Regarding those patches, I am pretty happy how things turn out for > TRUNCATE and REINDEX, way less for VACUUM, so getting 0001 and 0003 > committed first makes the most sense to me as their logic is rather > straight-forward (well way of speaking ;p ). About 0001, I'm not sure what the "session-level" means but it looks fine and works as expected. 0003 looks fine overall. I think I heard that gender-free wording is appreciated but it might be only about documentation. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Fri, Jul 27, 2018 at 12:37:02PM +0900, Kyotaro HORIGUCHI wrote: > I intended the same as Bossart said. It is not reasonable that > VACUUM and VACUUM FULL behaves differently on rejection for the > same reason. The objective of the first (or early) check is > rejecting (non-superuser's) unprivileged VACUUM FULL. Any > possible modifications on a syscache tuple before taking a lock > doesn't seem to harm in the point of view. Anyway the lock > acquired in expand_vacuum_rel is not held until vacuum_rel. Thinking about it harder, it is possible to pass down a status pointer that the callback of RangeVarGetRelidExtended could fill if we pass it down using the argument list. This would need special handling for the case where expand_vacuum_rel() is NIL but that would be workable. If vacuum_check_rel() does not need to work with elevel >= ERROR, then the set of log messages remains the same. > Apart from the discussion above, it is uneasy for me that the > messages seem heavily affected by the callers context. Sure, the patch for vacuum is still heavily WIP. I am not much happy about that either. >> Regarding those patches, I am pretty happy how things turn out for >> TRUNCATE and REINDEX, way less for VACUUM, so getting 0001 and 0003 >> committed first makes the most sense to me as their logic is rather >> straight-forward (well way of speaking ;p ). > > About 0001, I'm not sure what the "session-level" means but it > looks fine and works as expected. 0003 looks fine overall. What I mean here is checking the interactions with other backends, truncate_check_session is the best name I could come up with. truncate_check_activity was another. I am not attached to a single name, if you have an ideaof course please feel free. > I think I heard that gender-free wording is appreciated but it > might be only about documentation. Sure. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
"Bossart, Nathan"
Date:
On 7/26/18, 11:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Fri, Jul 27, 2018 at 12:37:02PM +0900, Kyotaro HORIGUCHI wrote: >> I intended the same as Bossart said. It is not reasonable that >> VACUUM and VACUUM FULL behaves differently on rejection for the >> same reason. The objective of the first (or early) check is >> rejecting (non-superuser's) unprivileged VACUUM FULL. Any >> possible modifications on a syscache tuple before taking a lock >> doesn't seem to harm in the point of view. Anyway the lock >> acquired in expand_vacuum_rel is not held until vacuum_rel. > > Thinking about it harder, it is possible to pass down a status pointer > that the callback of RangeVarGetRelidExtended could fill if we pass it > down using the argument list. This would need special handling for the > case where expand_vacuum_rel() is NIL but that would be workable. If > vacuum_check_rel() does not need to work with elevel >= ERROR, then the > set of log messages remains the same. Perhaps I am misunderstanding the goal of the patch. The original issue reported above is for a lock pile-up that blocks connection authentication. Specifically, VACUUM FULL blocks waiting for an AccessExclusiveLock on pg_authid in vacuum_rel(), even if the user does not have the right permissions. IIUC 0002 is adding a callback to the OID lookup logic in expand_vacuum_rel() that aims to prevent users from taking an AccessShareLock when they don't have permissions to VACUUM the relation. However, ISTM that this AccessShareLock is not the issue. If we allowed all users to get the AccessShareLock but we ensured we called vacuum_skip_rel() prior to waiting for the AccessExclusiveLock, I think we are good. Of course, there's a chance that a concurrent change would cause the role to lose permissions in between expand_vacuum_rel() and vacuum_rel(), but that seems like a risk we are taking regardless. I think I'm essentially suggesting what you have in 0002 but without the new RangeVarGetRelidExtended() callback. I've attached a modified version of 0002 that seems to fix the originally reported issue. (I haven't looked into any extra handling needed for ANALYZE or partitioned tables.) Running the same checks for all VACUUMs would keep things simple and provide a more uniform user experience. > The docs mentioned that shared catalogs are processed, so I did not > bother, but visibly your comment is that we could be more precise about > the ownership in this case? An attempt is attached. Sorry, I should have been clearer. But yes, your update is what I was thinking. Nathan
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Fri, Jul 27, 2018 at 02:40:42PM +0000, Bossart, Nathan wrote: > On 7/26/18, 11:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > I think I'm essentially suggesting what you have in 0002 but without > the new RangeVarGetRelidExtended() callback. I've attached a modified > version of 0002 that seems to fix the originally reported issue. (I > haven't looked into any extra handling needed for ANALYZE or > partitioned tables.) Running the same checks for all VACUUMs would > keep things simple and provide a more uniform user experience. Okay, let me check that. Your patch has at least an error in get_all_vacuum_rels() where toast relations cannot be skipped. >> The docs mentioned that shared catalogs are processed, so I did not >> bother, but visibly your comment is that we could be more precise about >> the ownership in this case? An attempt is attached. > > Sorry, I should have been clearer. But yes, your update is what I was > thinking. No problem. If there are no objections, I am going to fix the REINDEX issue first and back-patch. Its patch is the least invasive of the set. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
"Bossart, Nathan"
Date:
On 7/27/18, 7:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > No problem. If there are no objections, I am going to fix the REINDEX > issue first and back-patch. Its patch is the least invasive of the > set. This seems like a reasonable place to start. I took a closer look at 0003. + /* + * We allow the user to reindex a table if he is superuser, the table + * owner, or the database owner (but in the latter case, only if it's not + * a shared relation). + */ + if (!(pg_class_ownercheck(relid, GetUserId()) || + (pg_database_ownercheck(MyDatabaseId, GetUserId()) && + !classtuple->relisshared))) + continue; This is added to ReindexMultipleTables(), which is used for REINDEX SCHEMA, REINDEX SYSTEM, and REINDEX DATABASE. Presently, REINDEX SCHEMA requires being the owner of the schema, and REINDEX SYSTEM and REINDEX DATABASE require being the owner of the database. So, if a role is an owner of a database or the pg_catalog schema, they are able to reindex shared catalogs like pg_authid. This patch changes things so that only superusers or the owner of the table can reindex shared relations. This works as expected for REINDEX SYSTEM and REINDEX DATABASE, but I think there is a problem for REINDEX SCHEMA. If the user is not the owner of the database but is the owner of the schema, they will not be able to use REINDEX SCHEMA to reindex any tables that they do not own. To fix this, we will likely need to use pg_namespace_ownercheck() instead of pg_database_ownercheck() for the REINDEX_OBJECT_SCHEMA case. I also noticed that this patch causes shared relations to be skipped silently. Perhaps we should emit a WARNING or DEBUG message when this happens, at least for REINDEXOPT_VERBOSE. Reindexing a single index or table requires being the owner of that index or table. Reindexing a database requires being the owner of the database (note that the owner can therefore rebuild indexes of - tables owned by other users). Of course, superusers can always + tables owned by other users). Reindexing a shared catalog requires + being the owner of that shared catalog. Of course, superusers can always reindex anything. I noticed that there is no mention that the owner of a schema can do REINDEX SCHEMA, which seems important to note. Also, the proposed wording might seem slightly ambiguous for the REINDEX DATABASE case. It might be clearer to say something like the following: Reindexing a single index or table requires being the owner of that index of table. REINDEX DATABASE and REINDEX SYSTEM require being the owner of the database, and REINDEX SCHEMA requires being the owner of the schema (note that the user can therefore rebuild indexes of tables owned by other users). Reindexing a shared catalog requires being the owner of the shared catalog, even if the user is the owner of the specified database or schema. Of course, superusers can always reindex anything. Nathan
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Sun, Jul 29, 2018 at 04:11:38PM +0000, Bossart, Nathan wrote: > On 7/27/18, 7:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > No problem. If there are no objections, I am going to fix the REINDEX > > issue first and back-patch. Its patch is the least invasive of the > > set. > > This seems like a reasonable place to start. I took a closer look at > 0003. Thanks for looking at it! > This is added to ReindexMultipleTables(), which is used for REINDEX > SCHEMA, REINDEX SYSTEM, and REINDEX DATABASE. Presently, REINDEX > SCHEMA requires being the owner of the schema, and REINDEX SYSTEM and > REINDEX DATABASE require being the owner of the database. So, if a > role is an owner of a database or the pg_catalog schema, they are able > to reindex shared catalogs like pg_authid. Yeah, I was testing that yesterday night and bumped on this case when trying do a REINDEX SCHEMA pg_class. The point is that you can simplify the check and remove pg_database_ownercheck as there is already an ACL check on the database/system/schema at the top of the routine, so you are already sure that pg_database_ownercheck() or pg_namespace_ownercheck would return true. This shaves a few cycles as well for each relation checked. > I also noticed that this patch causes shared relations to be skipped > silently. Perhaps we should emit a WARNING or DEBUG message when this > happens, at least for REINDEXOPT_VERBOSE. That's intentional. I thought about that as well, but I am hesitant to do so as we don't bother mentioning the other relations skipped. REINDEX VERBOSE also shows up what are the tables processed, so it is easy to guess what are the tables skipped, still more painful. And the documentation changes added cover the gap. > I noticed that there is no mention that the owner of a schema can do > REINDEX SCHEMA, which seems important to note. Also, the proposed > wording might seem slightly ambiguous for the REINDEX DATABASE case. > It might be clearer to say something like the following: > > Reindexing a single index or table requires being the owner of > that index of table. REINDEX DATABASE and REINDEX SYSTEM > require being the owner of the database, and REINDEX SCHEMA > requires being the owner of the schema (note that the user can > therefore rebuild indexes of tables owned by other users). > Reindexing a shared catalog requires being the owner of the > shared catalog, even if the user is the owner of the specified > database or schema. Of course, superusers can always reindex > anything. +1, I have included your suggestions. The patch attached applies easily down to 9.5 where REINDEX SCHEMA was added. For 9.4 and 9.3, there is no schema case, still the new check is similar. The commit message is slightly changed so as there is no mention of REINDEX SCHEMA. 9.3 needs a slight change compared to 9.4 as well. So attached are patches for 9.5~master, 9.4 and 9.3 with commit messages. Does that look fine to folks of this thread? -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Kyotaro HORIGUCHI
Date:
At Mon, 30 Jul 2018 09:34:22 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180730003422.GA2878@paquier.xyz> > On Sun, Jul 29, 2018 at 04:11:38PM +0000, Bossart, Nathan wrote: > > On 7/27/18, 7:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > This is added to ReindexMultipleTables(), which is used for REINDEX > > SCHEMA, REINDEX SYSTEM, and REINDEX DATABASE. Presently, REINDEX > > SCHEMA requires being the owner of the schema, and REINDEX SYSTEM and > > REINDEX DATABASE require being the owner of the database. So, if a > > role is an owner of a database or the pg_catalog schema, they are able > > to reindex shared catalogs like pg_authid. > > Yeah, I was testing that yesterday night and bumped on this case when > trying do a REINDEX SCHEMA pg_class. The point is that you can simplify > the check and remove pg_database_ownercheck as there is already an ACL > check on the database/system/schema at the top of the routine, so you > are already sure that pg_database_ownercheck() or > pg_namespace_ownercheck would return true. This shaves a few cycles as > well for each relation checked. There's a case where the database owner differs from catalogs' owner. ALTER DATABASE OWNER TO can make such configuration. In this case, the previous code allows REINDEX of a shared catalog for the database owner who is not the catalog's owner. Superuser : alice Database owner : joe Catalog owner : andy Schema owner : joe "REINDEX SCHERMA <the schema>" by joe allows shared catalog to be reindexed, even he is *not* the owner of the catalog. The revised patch behaves differently to this case. "REINDEX SCHERMA <the schema>" by joe *skips* shared catalog since he is not the owner of the catalog. # Uh? Alice doesn't involved anywhere.. I feel that just being a database owner doesn't justify to cause this problem innocently. Catalog owner is also doubious but we can carefully configure the ownerships to avoid the problem since only superuser can change it. So I vote +1 for the revised patch. > > I also noticed that this patch causes shared relations to be skipped > > silently. Perhaps we should emit a WARNING or DEBUG message when this > > happens, at least for REINDEXOPT_VERBOSE. > > That's intentional. I thought about that as well, but I am hesitant to > do so as we don't bother mentioning the other relations skipped. > REINDEX VERBOSE also shows up what are the tables processed, so it is > easy to guess what are the tables skipped, still more painful. And the > documentation changes added cover the gap. Even if it is written in the "Notes" section, doesn't the following need some fix? It is the same for the DATBASE item. | Parameters ... | SYSTEM | Recreate all indexes on system catalogs within the current | database. *Indexes on shared system catalogs are included*. | Indexes on user tables are not processed. This form | of REINDEX cannot be executed inside a transaction block. > > I noticed that there is no mention that the owner of a schema can do > > REINDEX SCHEMA, which seems important to note. Also, the proposed > > wording might seem slightly ambiguous for the REINDEX DATABASE case. > > It might be clearer to say something like the following: > > > > Reindexing a single index or table requires being the owner of > > that index of table. REINDEX DATABASE and REINDEX SYSTEM > > require being the owner of the database, and REINDEX SCHEMA > > requires being the owner of the schema (note that the user can > > therefore rebuild indexes of tables owned by other users). > > Reindexing a shared catalog requires being the owner of the > > shared catalog, even if the user is the owner of the specified > > database or schema. Of course, superusers can always reindex > > anything. > > +1, I have included your suggestions. The patch attached applies easily > down to 9.5 where REINDEX SCHEMA was added. For 9.4 and 9.3, there is > no schema case, still the new check is similar. The commit message is > slightly changed so as there is no mention of REINDEX SCHEMA. 9.3 needs > a slight change compared to 9.4 as well. > > So attached are patches for 9.5~master, 9.4 and 9.3 with commit > messages. Does that look fine to folks of this thread? This apparently changes the documented behavior and the problem seems to be a result of a rather malicious/intentional combination of operatoins (especially named vacuum on a shared catalog). I vote -0.5 to backpatch unless we categorize this as a security issue. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Mon, Jul 30, 2018 at 05:53:54PM +0900, Kyotaro HORIGUCHI wrote: > I feel that just being a database owner doesn't justify to cause > this problem innocently. Catalog owner is also doubious but we > can carefully configure the ownerships to avoid the problem since > only superuser can change it. So I vote +1 for the revised patch. Thanks for the review. Yes that sucks that being just a database or a schema owner allows such a user to take an exclusive lock on shared catalogs. Please note that depending on the order of the relations, authentication may or may not be blocked depending on what kind of locks the second session takes. > | Parameters > ... > | SYSTEM > | Recreate all indexes on system catalogs within the current > | database. *Indexes on shared system catalogs are included*. > | Indexes on user tables are not processed. This form > | of REINDEX cannot be executed inside a transaction block. This looks correct to me, shared catalogs are included, and the "notes" section clealy mentions that being an owner of the shared catalog is required. > This apparently changes the documented behavior and the problem > seems to be a result of a rather malicious/intentional > combination of operatoins (especially named vacuum on a shared > catalog). I vote -0.5 to backpatch unless we categorize this as a > security issue. Ask that to any vendors doing shared hosting of Postgres :) A backpatch looks like the correct course of events to me. Anybody here is free to express his/her concerns of course. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
"Bossart, Nathan"
Date:
On 7/29/18, 7:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Yeah, I was testing that yesterday night and bumped on this case when > trying do a REINDEX SCHEMA pg_class. The point is that you can simplify > the check and remove pg_database_ownercheck as there is already an ACL > check on the database/system/schema at the top of the routine, so you > are already sure that pg_database_ownercheck() or > pg_namespace_ownercheck would return true. This shaves a few cycles as > well for each relation checked. Makes sense. >> I also noticed that this patch causes shared relations to be skipped >> silently. Perhaps we should emit a WARNING or DEBUG message when this >> happens, at least for REINDEXOPT_VERBOSE. > > That's intentional. I thought about that as well, but I am hesitant to > do so as we don't bother mentioning the other relations skipped. > REINDEX VERBOSE also shows up what are the tables processed, so it is > easy to guess what are the tables skipped, still more painful. And the > documentation changes added cover the gap. Okay, that seems reasonable to me, too. > +1, I have included your suggestions. The patch attached applies easily > down to 9.5 where REINDEX SCHEMA was added. For 9.4 and 9.3, there is > no schema case, still the new check is similar. The commit message is > slightly changed so as there is no mention of REINDEX SCHEMA. 9.3 needs > a slight change compared to 9.4 as well. For 9.3 and 9.4, it might be nice to add the "even if the user is the owner of the specified database" part, too. > So attached are patches for 9.5~master, 9.4 and 9.3 with commit > messages. Does that look fine to folks of this thread? Looks good to me. Since REINDEX can be used to block calls to load_critical_index() from new connections, back-patching seems appropriate. Nathan
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
Robert Haas
Date:
On Mon, Jul 30, 2018 at 6:21 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Jul 30, 2018 at 05:53:54PM +0900, Kyotaro HORIGUCHI wrote: >> I feel that just being a database owner doesn't justify to cause >> this problem innocently. Catalog owner is also doubious but we >> can carefully configure the ownerships to avoid the problem since >> only superuser can change it. So I vote +1 for the revised patch. > > Thanks for the review. Yes that sucks that being just a database or a > schema owner allows such a user to take an exclusive lock on shared > catalogs. Please note that depending on the order of the relations, > authentication may or may not be blocked depending on what kind of locks > the second session takes. Just as a general statement, I don't think we should, as part of a patch for the issue discussed on this thread, make any changes AT ALL to who has permission to perform which operations. We *certainly* should not back-patch such changes, but we really also should not make them on master unless they are discussed on a separate thread with a clear subject line and agreed by a clear consensus. This patch should only be about detecting cases where we lack permission earlier, before we try to acquire a lock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Tue, Jul 31, 2018 at 10:34:57AM -0400, Robert Haas wrote: > Just as a general statement, I don't think we should, as part of a > patch for the issue discussed on this thread, make any changes AT ALL > to who has permission to perform which operations. We *certainly* > should not back-patch such changes, but we really also should not make > them on master unless they are discussed on a separate thread with a > clear subject line and agreed by a clear consensus. Well, perhaps spawning one thread for each command would make the most sense then? I am feeling a bit of confusion here. There have been three cases discussed up to now: 1) TRUNCATE, which results in a patch that has no behavioral changes. 2) VACUUM [FULL], where we are also heading toward a patch that has no behavioral change per the last arguments exchanged, where we make sure that permission checks are done before acquiring any locks. 3) REINDEX, where a database or a schema owner is able to take an exclusive lock on a shared catalog with limited permissions. That sucks as this block calls to load_critical_index, but I would be ready to buy to not back-patch such a change if the consensus reached is to not skip shared catalogs if a database/schema owner has no ownership on the shared catalog directly. > This patch should only be about detecting cases where we lack > permission earlier, before we try to acquire a lock. Yeah, the REINDEX case is the only one among the three where the set of relations worked on changes. Please note that ownership checks happen in this case for indexes, tables, database and schemas before taking a lock on them. Databases/systems and schemas just check for ownership of respectively the database and the schema. I'll be happy to create one thread per patch if that helps. Thinking about it this would attract more attention to each individual problem reported. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack
From
ahsan hadi
Date:
Hi Michael, Can you rebase the reindex-priv-93.patch, it is getting some hunk failures. patching file doc/src/sgml/ref/reindex.sgml Hunk #1 succeeded at 227 (offset 13 lines). patching file src/backend/commands/indexcmds.c Hunk #1 FAILED at 1873. 1 out of 1 hunk FAILED -- saving rejects to file src/backend/commands/indexcmds.c.rej Thanks.
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
On Wed, Aug 01, 2018 at 10:55:11AM +0000, Ahsan Hadi wrote: > Can you rebase the reindex-priv-93.patch, it is getting some hunk failures. Patch reindex-priv-93.patch is for REL9_3_STABLE, where it applies cleanly for me. reindex-priv-94.patch is for REL9_4_STABLE and reindex-priv-95-master.patch is for 9.5~master. -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
Hi Nathan, On Fri, Jul 27, 2018 at 02:40:42PM +0000, Bossart, Nathan wrote: > I think I'm essentially suggesting what you have in 0002 but without > the new RangeVarGetRelidExtended() callback. I've attached a modified > version of 0002 that seems to fix the originally reported issue. (I > haven't looked into any extra handling needed for ANALYZE or > partitioned tables.) Running the same checks for all VACUUMs would > keep things simple and provide a more uniform user experience. I have been looking at this patch for VACUUM (perhaps we ought to spawn a new thread as the cases of REINDEX and TRUNCATE have been addressed), and I can see a problem with this approach. If multiple transactions are used with a list of relations vacuum'ed then simply moving around the ACL checks would cause a new problem: if the relation vacuum'ed disappears when processing it with vacuum_rel() after the list of relations is built then we would get an ERROR instead of a skip because of pg_class_ownercheck() which is not fail-safe. Wouldn't it be enough to check just the ACLs in expand_vacuum_rel() and get_all_vacuum_rels()? The first is rather similar to what happens for TRUNCATE, and the second is similar to what has been fixed for VACUUM. We should also remove the relkind checks out of vacuum_skip_rel() as those checks are not completely consistent for all those code paths... -- Michael
Attachment
Re: BUG #15182: Canceling authentication due to timeout aka Denialof Service Attack
From
Michael Paquier
Date:
Hi all, Here is a summary of what has happened since this thread has been created. Three problems reported on this thread have been solved and resulted in different commits for early lock lookups: - VACUUM FULL, patched on 12~: https://www.postgresql.org/message-id/20180812222142.GA6097@paquier.xyz Commit a556549: Improve VACUUM and ANALYZE by avoiding early lock queue - TRUNCATE, patched on 12~: https://www.postgresql.org/message-id/20180806165816.GA19883@paquier.xyz Commit f841ceb: Improve TRUNCATE by avoiding early lock queue - REINDEX, patched on 11~: https://www.postgresql.org/message-id/20180805211059.GA2185@paquier.xyz Commit 661dd23: Restrict access to reindex of shared catalogs for non-privileged users Please note that I have been very conservative with the different fixes as v11 is getting very close to release. The patch for REINDEX is a behavior change which will not get further down anyway. It would still be nice to get a second lookup at the code and look if there are other suspicious calls of relation_open or such which could allow non-privileged users to pile up locks and cause more DOS problems. Thanks, -- Michael