BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15182: Canceling authentication due to timeout aka Denial ofService Attack |
Date | |
Msg-id | 152512087100.19803.12733865831237526317@wrigleys.postgresql.org Whole thread Raw |
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15182 Logged by: Lloyd Albin Email address: lalbin@scharp.org PostgreSQL version: 10.3 Operating system: OpenSUSE Description: Over the last several weeks our developers caused a Denial of Service Attack against ourselves by accident. When looking at the log files, I noticed that we had authentication timeouts during these time periods. In researching the problem I found this is due to locks being held on shared system catalog items, aka system catalog items that are shared between all databases on the same cluster/server. This can be caused by beginning a long running transaction that queries pg_stat_activity, pg_roles, pg_database, etc and then another connection that runs either a REINDEX DATABASE, REINDEX SYSTEM, or VACUUM FULL. This issue is of particular importance to database resellers who use the same cluster/server for multiple clients, as two clients can cause this issue to happen inadvertently or a single client can either cause it to happen maliciously or inadvertently. Note: The large cloud providers give each of their clients their own cluster/server so this will not affect across cloud clients but can affect an individual client. The problem is that traditional hosting companies will have all clients from one or more web servers share the same PostgreSQL cluster/server. This means that one or two clients could inadvertently stop all the other clients from being able to connect to their databases until the first client does either a COMMIT or ROLLBACK of their transaction which they could hold open for hours, which is what happened to us internally. In Connection 1 we need to BEGIN a transaction and then query a shared system item; pg_authid, pg_database, etc; or a view that depends on a shared system item; pg_stat_activity, pg_roles, etc. Our developers were accessing pg_roles. Connection 1 (Any database, Any User) BEGIN; SELECT * FROM pg_stat_activity; Connection 2 (Any database will do as long as you are the database owner) REINDEX DATABASE postgres; Connection 3 (Any Database, Any User) psql -h sqltest-alt -d sandbox All future Connection 3's will hang for however long the transaction in Connection 1 runs. In our case this was hours and denied everybody else the ability to log into the server until Connection 1 was committed. psql will just hang for hours, even overnight in my testing, but our apps would get the "Canceling authentication due to timeout" after 1 minute. Connection 2 can also do any of these commands to also cause the same issue: REINDEX SYSTEM postgres; VACUUM FULL pg_authid; vacuumdb -f -h sqltest-alt -d lloyd -U lalbin Even worse is that the VACUUM FULL pg_authid; can be started by an unprivileged user and it will wait for the AccessShareLock by connection 1 to be released before returning the error that you don't have permission to perform this action, so even an unprivileged user can cause this to happen. The privilege check needs to happen before the waiting for the AccessExclusiveLock happens. This bug report has been simplified and shorted drastically. To read the full information about this issue please see my blog post: http://lloyd.thealbins.com/Canceling%20authentication%20due%20to%20timeout Lloyd Albin Database Administrator Statistical Center for HIV/AIDS Research and Prevention (SCHARP) Fred Hutchinson Cancer Research Center
pgsql-bugs by date: