Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain - Mailing list pgsql-bugs

From Giorgio Saviane
Subject Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain
Date
Msg-id CAHs6c0cs5PtQffn9Nt_sW99ncGBjvKJ01wy+1anUn_4J2cysVA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain  (Francisco Olarte <folarte@peoplecall.com>)
Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Let me clarify.
The sequence of events that led our databases to the wraparound issue is the following:
  1. The application creates temporary tables that should be by default cleaned at the end of the session. As a matter of fact the pooler doesn't actually close the session, neither - apparently - issues a DISCARD ALL, creating leaks among sessions.
  2. Since sessions are kept alive with stale temporary tables the autovacuum process cannot do maintenance on transaction ids.
  3. Since we have no evidence on what is going on, the database reaches the 1 million stop limit to avoid transaction wraparound.
  4. The application gets then out of order. We stop the Postgres server gracefully and we enter single user mode to run the vacuum
  5. Any attempt to vacuum the database ends up with the usual "WARNING: database "mydb" must be vacuumed within XXX transactions" message, revealing that the vacuum could not fix up the transaction ids.
  6. By running this SQL (still in single user mode)
      SELECT nspname || '.' || relname AS objname
      FROM pg_class JOIN pg_namespace
        ON pg_namespace.oid = pg_class.relnamespace
      WHERE relpersistence = 't' AND relkind = 'r'
    We notice the presence of orphaned tables. Sessions are gone at this point, so they are for sure orphaned.
  7. If we manually drop the orphaned tables and run vacuum again the warning message disappears and the database is restored to normal functioning.
No matter this situation is caused by an application misbehavior or a missed temporary table cleanup, I'm asking if there is a reasonable way to prevent this situation improving the alerting with:
  • When vacuum cannot repeatedly perform a transaction fix up in regular multi user mode.
  • When vacuum cannot perform a transaction fix up in single user mode because there are orphaned tables in the way.
If we had these alerts we would not have lost a database.

Kind regards

Girogio Saviane

Il giorno sab 12 set 2020 alle ore 12:59 Francisco Olarte <folarte@peoplecall.com> ha scritto:
On Sat, Sep 12, 2020 at 10:59 AM Giorgio Saviane <gsaviane@gmail.com> wrote:
....
> The question is: can you implement some warning thrown by vacuum/autovacuum in case orphaned temporary tables are in the way?
> The connection pooler we are using is the one provided by Tomcat 7. I'm gonna check if it supports DISCARD ALL on close.

From what I've read the thing is, if the pooler does not work
correctly ( by issuing discard all or otherwise taking care of
temporary tables ), there are few things as the tables are not
orphaned on the server side.

What the server sees is someone connects, creates the temporary
tables, issues a lot of other commands but does not  disconnect, so
the tables must be kept alive in case the the session wants to use
them again. DISCARD ALL will make them orphans.

Francisco Olarte.

pgsql-bugs by date:

Previous
From: Francisco Olarte
Date:
Subject: Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain
Next
From: Francisco Olarte
Date:
Subject: Re: BUG #16614: Stale temporary objects makes vacuum ineffective when 1 million transactions remain