Thread: Autovacuum of pg_database

Autovacuum of pg_database

From
Ondřej Světlík
Date:
Hello again,

we have another strange problem with autovacuum. The process goes well
until it reaches the table pg_database (in some databases, not all),
then it loops over it forever  (until I run vacuum full analyze on that
table, simple vacuum doesn't help):

2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
"somedatabase.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 87 hits, 0 misses, 0 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-05 17:13:33 CEST  34819LOG:  automatic vacuum of table
"somedatabase.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 87 hits, 0 misses, 0 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-05 17:13:33 CEST  34820LOG:  automatic vacuum of table
"somedatabase.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 87 hits, 0 misses, 0 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-05 17:13:34 CEST  34821LOG:  automatic vacuum of table
"somedatabase.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 87 hits, 0 misses, 0 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

Not sure if it's a bug or I'm doing something wrong.

Thanks in advance,

Ondřej


Re: Autovacuum of pg_database

From
Ondřej Světlík
Date:
Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
> Hello again,
>
> we have another strange problem with autovacuum. The process goes well
> until it reaches the table pg_database (in some databases, not all),
> then it loops over it forever  (until I run vacuum full analyze on that
> table, simple vacuum doesn't help):
>
> 2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
>      pages: 0 removed, 233 remain
>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>      buffer usage: 87 hits, 0 misses, 0 dirtied
>      avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>      system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> 2016-05-05 17:13:33 CEST  34819LOG:  automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
>      pages: 0 removed, 233 remain
>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>      buffer usage: 87 hits, 0 misses, 0 dirtied
>      avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>      system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> 2016-05-05 17:13:33 CEST  34820LOG:  automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
>      pages: 0 removed, 233 remain
>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>      buffer usage: 87 hits, 0 misses, 0 dirtied
>      avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>      system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> 2016-05-05 17:13:34 CEST  34821LOG:  automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
>      pages: 0 removed, 233 remain
>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>      buffer usage: 87 hits, 0 misses, 0 dirtied
>      avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>      system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
>
> Not sure if it's a bug or I'm doing something wrong.
>
> Thanks in advance,
>
> Ondřej


Another part of log shows, that the process tries pg_database twice,
than it does one other table and than back to pg_database for the rest
of our days:

2016-05-05 17:16:56 CEST  36675LOG:  automatic vacuum of table
"someotherdb.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 36 hits, 50 misses, 2 dirtied
    avg read rate: 3.277 MB/s, avg write rate: 0.131 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.11 sec
2016-05-05 17:16:56 CEST  36676LOG:  automatic vacuum of table
"someotherdb.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 68 hits, 18 misses, 0 dirtied
    avg read rate: 2.563 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.05 sec
2016-05-05 17:16:56 CEST  36674LOG:  automatic vacuum of table
"someotherdb.pg_toast.pg_toast_2618": index scans: 1
    pages: 58 removed, 50 remain
    tuples: 141 removed, 219 remain, 0 are dead but not yet removable
    buffer usage: 151 hits, 192 misses, 73 dirtied
    avg read rate: 2.728 MB/s, avg write rate: 1.037 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.54 sec
2016-05-05 17:16:56 CEST  36677LOG:  automatic vacuum of table
"someotherdb.pg_catalog.pg_database": index scans: 0
    pages: 0 removed, 233 remain
    tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
    buffer usage: 86 hits, 0 misses, 0 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec

Ondřej


Re: Autovacuum of pg_database

From
Ondřej Světlík
Date:
Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
> Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
>> Hello again,
>>
>> we have another strange problem with autovacuum. The process goes well
>> until it reaches the table pg_database (in some databases, not all),
>> then it loops over it forever  (until I run vacuum full analyze on that
>> table, simple vacuum doesn't help):
>>
>> 2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
>> "somedatabase.pg_catalog.pg_database": index scans: 0
>>      pages: 0 removed, 233 remain
>>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>>      buffer usage: 87 hits, 0 misses, 0 dirtied

And now the same goes for pg_authid. Sorry for spamming, I hope this is
the last one.

Ondřej



Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Ondřej Světlík wrote:
> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
> >Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
> >>Hello again,
> >>
> >>we have another strange problem with autovacuum. The process goes well
> >>until it reaches the table pg_database (in some databases, not all),
> >>then it loops over it forever  (until I run vacuum full analyze on that
> >>table, simple vacuum doesn't help):
> >>
> >>2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
> >>"somedatabase.pg_catalog.pg_database": index scans: 0
> >>     pages: 0 removed, 233 remain
> >>     tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> >>     buffer usage: 87 hits, 0 misses, 0 dirtied
>
> And now the same goes for pg_authid. Sorry for spamming, I hope this is the
> last one.

These are all shared catalogs.  There are others, so you may still see
more.  We got another report for pg_database
https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
so I suppose there really is a bug.  I don't know what's going on there.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
"Vicky Soni - Quipment India"
Date:
Hello,

                   I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and
again.

In 9.3 this was not true. Any recommendations?


                                                                                             

Thanks & Regards,
Vicky Soni
Database Administrator



-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Alvaro Herrera
Sent: 05 May 2016 21:14
To: Ondřej Světlík <osvetlik@flexibee.eu>
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Autovacuum of pg_database

Ondřej Světlík wrote:
> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
> >Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
> >>Hello again,
> >>
> >>we have another strange problem with autovacuum. The process goes 
> >>well until it reaches the table pg_database (in some databases, not 
> >>all), then it loops over it forever  (until I run vacuum full 
> >>analyze on that table, simple vacuum doesn't help):
> >>
> >>2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
> >>"somedatabase.pg_catalog.pg_database": index scans: 0
> >>     pages: 0 removed, 233 remain
> >>     tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> >>     buffer usage: 87 hits, 0 misses, 0 dirtied
> 
> And now the same goes for pg_authid. Sorry for spamming, I hope this 
> is the last one.

These are all shared catalogs.  There are others, so you may still see more.  We got another report for pg_database
https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
so I suppose there really is a bug.  I don't know what's going on there.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Vicky Soni  - Quipment India wrote:
> Hello,
>
>                    I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and
again.
>
> In 9.3 this was not true. Any recommendations?

Maybe there is a multixact wraparound problem.  What 9.3 were you
running previously, and what 9.4 did you upgrade to?  Can you share the
output of pg_controldata?  Please paste the output of this query
  select datname, datminmxid from pg_database

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Ondřej Světlík
Date:
Dne 5.5.2016 v 17:51 Alvaro Herrera napsal(a):
> Vicky Soni  - Quipment India wrote:
>> Hello,
>>
>>                     I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again
andagain. 
>>
>> In 9.3 this was not true. Any recommendations?
>
> Maybe there is a multixact wraparound problem.  What 9.3 were you
> running previously, and what 9.4 did you upgrade to?  Can you share the
> output of pg_controldata?  Please paste the output of this query
>    select datname, datminmxid from pg_database
>

That sound's reasonable, our autovacuum problems started with processes
labeled as 'to prevent wraparound' (on pg_shdepend). We use 9.4.

Ondřej


Re: Autovacuum of pg_database

From
"Vicky Soni - Quipment India"
Date:
Output of given query. Not sharing db names except temp database.
'datminmxid'
1
 1
 1
template0 16696
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1



Thanks & Regards,
Vicky Soni
Database Administrator



-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
Sent: 05 May 2016 21:21
To: Vicky Soni - Quipment India <vicky.soni@quipment.nl>
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Autovacuum of pg_database

Vicky Soni  - Quipment India wrote:
> Hello,
>
>                    I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and
again.
>
> In 9.3 this was not true. Any recommendations?

Maybe there is a multixact wraparound problem.  What 9.3 were you running previously, and what 9.4 did you upgrade to?
Canyou share the output of pg_controldata?  Please paste the output of this query 
  select datname, datminmxid from pg_database

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Ondřej Světlík
Date:
Dne 5.5.2016 v 17:43 Alvaro Herrera napsal(a):
> Ondřej Světlík wrote:
>> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
>>> Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
>>>> Hello again,
>>>>
>>>> we have another strange problem with autovacuum. The process goes well
>>>> until it reaches the table pg_database (in some databases, not all),
>>>> then it loops over it forever  (until I run vacuum full analyze on that
>>>> table, simple vacuum doesn't help):
>>>>
>>>> 2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
>>>> "somedatabase.pg_catalog.pg_database": index scans: 0
>>>>      pages: 0 removed, 233 remain
>>>>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>>>>      buffer usage: 87 hits, 0 misses, 0 dirtied
>>
>> And now the same goes for pg_authid. Sorry for spamming, I hope this is the
>> last one.
>
> These are all shared catalogs.  There are others, so you may still see
> more.  We got another report for pg_database
> https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> so I suppose there really is a bug.  I don't know what's going on there.
>

Let me know if I can help trace the cause.

With regards

Ondřej


Re: Autovacuum of pg_database

From
"Vicky Soni - Quipment India"
Date:
Hello,

                  In my development setup I promoted slave, and now autovacuum keep running again and again.

Specially on pg_statistics table of one of my OLTP using many temp tables in functions.

Please help.

                                                                                             

Thanks & Regards,
Vicky Soni
Database Administrator


-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Ondrej Svetlík
Sent: 05 May 2016 21:28
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Autovacuum of pg_database

Dne 5.5.2016 v 17:43 Alvaro Herrera napsal(a):
> Ondřej Světlík wrote:
>> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
>>> Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
>>>> Hello again,
>>>>
>>>> we have another strange problem with autovacuum. The process goes 
>>>> well until it reaches the table pg_database (in some databases, not 
>>>> all), then it loops over it forever  (until I run vacuum full 
>>>> analyze on that table, simple vacuum doesn't help):
>>>>
>>>> 2016-05-05 17:13:33 CEST  34818LOG:  automatic vacuum of table
>>>> "somedatabase.pg_catalog.pg_database": index scans: 0
>>>>      pages: 0 removed, 233 remain
>>>>      tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>>>>      buffer usage: 87 hits, 0 misses, 0 dirtied
>>
>> And now the same goes for pg_authid. Sorry for spamming, I hope this 
>> is the last one.
>
> These are all shared catalogs.  There are others, so you may still see 
> more.  We got another report for pg_database 
> https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F0487
> 8A2A%40amazon.com so I suppose there really is a bug.  I don't know 
> what's going on there.
>

Let me know if I can help trace the cause.

With regards

Ondřej


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Autovacuum of pg_database

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> These are all shared catalogs.  There are others, so you may still see
> more.  We got another report for pg_database
> https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> so I suppose there really is a bug.  I don't know what's going on there.

I think it's pretty obvious: autovacuum.c's rule for detecting whether
some other worker is already processing table X is wrong when X is a
shared table.  I propose the attached patch.

            regards, tom lane

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index e2859df..4bc5d42 100644
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
*************** typedef struct autovac_table
*** 203,215 ****
   * wi_links        entry into free list or running list
   * wi_dboid        OID of the database this worker is supposed to work on
   * wi_tableoid    OID of the table currently being vacuumed, if any
   * wi_proc        pointer to PGPROC of the running worker, NULL if not started
   * wi_launchtime Time at which this worker was launched
   * wi_cost_*    Vacuum cost-based delay parameters current in this worker
   *
!  * All fields are protected by AutovacuumLock, except for wi_tableoid which is
!  * protected by AutovacuumScheduleLock (which is read-only for everyone except
!  * that worker itself).
   *-------------
   */
  typedef struct WorkerInfoData
--- 203,216 ----
   * wi_links        entry into free list or running list
   * wi_dboid        OID of the database this worker is supposed to work on
   * wi_tableoid    OID of the table currently being vacuumed, if any
+  * wi_tableshared    true if the table currently being vacuumed is a shared rel
   * wi_proc        pointer to PGPROC of the running worker, NULL if not started
   * wi_launchtime Time at which this worker was launched
   * wi_cost_*    Vacuum cost-based delay parameters current in this worker
   *
!  * All fields are protected by AutovacuumLock, except for wi_tableoid and
!  * wi_tableshared which are protected by AutovacuumScheduleLock (and are
!  * read-only for everyone except that worker itself).
   *-------------
   */
  typedef struct WorkerInfoData
*************** typedef struct WorkerInfoData
*** 217,222 ****
--- 218,224 ----
      dlist_node    wi_links;
      Oid            wi_dboid;
      Oid            wi_tableoid;
+     bool        wi_tableshared;
      PGPROC       *wi_proc;
      TimestampTz wi_launchtime;
      bool        wi_dobalance;
*************** autovac_balance_cost(void)
*** 1791,1798 ****
          }

          if (worker->wi_proc != NULL)
!             elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, dobalance=%s cost_limit=%d, cost_limit_base=%d,
cost_delay=%d)",
                   worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
                   worker->wi_dobalance ? "yes" : "no",
                   worker->wi_cost_limit, worker->wi_cost_limit_base,
                   worker->wi_cost_delay);
--- 1793,1801 ----
          }

          if (worker->wi_proc != NULL)
!             elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u%s, dobalance=%s cost_limit=%d,
cost_limit_base=%d,cost_delay=%d)", 
                   worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
+                  worker->wi_tableshared ? " (shared)" : "",
                   worker->wi_dobalance ? "yes" : "no",
                   worker->wi_cost_limit, worker->wi_cost_limit_base,
                   worker->wi_cost_delay);
*************** do_autovacuum(void)
*** 1885,1893 ****
      HeapScanDesc relScan;
      Form_pg_database dbForm;
      List       *table_oids = NIL;
      HASHCTL        ctl;
      HTAB       *table_toast_map;
!     ListCell   *volatile cell;
      PgStat_StatDBEntry *shared;
      PgStat_StatDBEntry *dbentry;
      BufferAccessStrategy bstrategy;
--- 1888,1898 ----
      HeapScanDesc relScan;
      Form_pg_database dbForm;
      List       *table_oids = NIL;
+     List       *table_shares = NIL;
      HASHCTL        ctl;
      HTAB       *table_toast_map;
!     ListCell   *volatile lco;
!     ListCell   *volatile lcs;
      PgStat_StatDBEntry *shared;
      PgStat_StatDBEntry *dbentry;
      BufferAccessStrategy bstrategy;
*************** do_autovacuum(void)
*** 2004,2009 ****
--- 2009,2015 ----
          PgStat_StatTabEntry *tabentry;
          AutoVacOpts *relopts;
          Oid            relid;
+         bool        relisshared;
          bool        dovacuum;
          bool        doanalyze;
          bool        wraparound;
*************** do_autovacuum(void)
*** 2013,2022 ****
              continue;

          relid = HeapTupleGetOid(tuple);

          /* Fetch reloptions and the pgstat entry for this table */
          relopts = extract_autovac_opts(tuple, pg_class_desc);
!         tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
                                               shared, dbentry);

          /* Check if it needs vacuum or analyze */
--- 2019,2029 ----
              continue;

          relid = HeapTupleGetOid(tuple);
+         relisshared = classForm->relisshared;

          /* Fetch reloptions and the pgstat entry for this table */
          relopts = extract_autovac_opts(tuple, pg_class_desc);
!         tabentry = get_pgstat_tabentry_relid(relid, relisshared,
                                               shared, dbentry);

          /* Check if it needs vacuum or analyze */
*************** do_autovacuum(void)
*** 2069,2077 ****
          }
          else
          {
!             /* relations that need work are added to table_oids */
              if (dovacuum || doanalyze)
                  table_oids = lappend_oid(table_oids, relid);

              /*
               * Remember the association for the second pass.  Note: we must do
--- 2076,2087 ----
          }
          else
          {
!             /* relations that need work are added to table_oids/table_shares */
              if (dovacuum || doanalyze)
+             {
                  table_oids = lappend_oid(table_oids, relid);
+                 table_shares = lappend_int(table_shares, relisshared);
+             }

              /*
               * Remember the association for the second pass.  Note: we must do
*************** do_autovacuum(void)
*** 2117,2122 ****
--- 2127,2133 ----
          Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
          PgStat_StatTabEntry *tabentry;
          Oid            relid;
+         bool        relisshared;
          AutoVacOpts *relopts = NULL;
          bool        dovacuum;
          bool        doanalyze;
*************** do_autovacuum(void)
*** 2129,2134 ****
--- 2140,2146 ----
              continue;

          relid = HeapTupleGetOid(tuple);
+         relisshared = classForm->relisshared;

          /*
           * fetch reloptions -- if this toast table does not have them, try the
*************** do_autovacuum(void)
*** 2146,2152 ****
          }

          /* Fetch the pgstat entry for this table */
!         tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
                                               shared, dbentry);

          relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
--- 2158,2164 ----
          }

          /* Fetch the pgstat entry for this table */
!         tabentry = get_pgstat_tabentry_relid(relid, relisshared,
                                               shared, dbentry);

          relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
*************** do_autovacuum(void)
*** 2155,2161 ****
--- 2167,2176 ----

          /* ignore analyze for toast tables */
          if (dovacuum)
+         {
              table_oids = lappend_oid(table_oids, relid);
+             table_shares = lappend_int(table_shares, relisshared);
+         }
      }

      heap_endscan(relScan);
*************** do_autovacuum(void)
*** 2181,2189 ****
      /*
       * Perform operations on collected tables.
       */
!     foreach(cell, table_oids)
      {
!         Oid            relid = lfirst_oid(cell);
          autovac_table *tab;
          bool        skipit;
          int            stdVacuumCostDelay;
--- 2196,2205 ----
      /*
       * Perform operations on collected tables.
       */
!     forboth(lco, table_oids, lcs, table_shares)
      {
!         Oid            relid = lfirst_oid(lco);
!         bool        relisshared = (bool) lfirst_int(lcs);
          autovac_table *tab;
          bool        skipit;
          int            stdVacuumCostDelay;
*************** do_autovacuum(void)
*** 2229,2243 ****
              if (worker == MyWorkerInfo)
                  continue;

!             /* ignore workers in other databases */
!             if (worker->wi_dboid != MyDatabaseId)
                  continue;

!             if (worker->wi_tableoid == relid)
!             {
!                 skipit = true;
!                 break;
!             }
          }
          LWLockRelease(AutovacuumLock);
          if (skipit)
--- 2245,2261 ----
              if (worker == MyWorkerInfo)
                  continue;

!             /* not a match if tableoid or shared-rel flag don't match */
!             if (worker->wi_tableoid != relid ||
!                 worker->wi_tableshared != relisshared)
                  continue;

!             /* for unshared table, not a match unless same database */
!             if (!relisshared && worker->wi_dboid != MyDatabaseId)
!                 continue;
!
!             skipit = true;
!             break;
          }
          LWLockRelease(AutovacuumLock);
          if (skipit)
*************** do_autovacuum(void)
*** 2271,2276 ****
--- 2289,2295 ----
           * the lock so that other workers don't vacuum it concurrently.
           */
          MyWorkerInfo->wi_tableoid = relid;
+         MyWorkerInfo->wi_tableshared = relisshared;
          LWLockRelease(AutovacuumScheduleLock);

          /*

Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > These are all shared catalogs.  There are others, so you may still see
> > more.  We got another report for pg_database
> > https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> > so I suppose there really is a bug.  I don't know what's going on there.
>
> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> some other worker is already processing table X is wrong when X is a
> shared table.  I propose the attached patch.

Hmm, I have pretty much the same patch, except I added the flag to
struct autovac_table and have it populated by table_recheck_autovac.
Haven't tested this yet, which is why I hadn't posted it.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Autovacuum of pg_database

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> I think it's pretty obvious: autovacuum.c's rule for detecting whether
>> some other worker is already processing table X is wrong when X is a
>> shared table.  I propose the attached patch.

> Hmm, I have pretty much the same patch, except I added the flag to
> struct autovac_table and have it populated by table_recheck_autovac.
> Haven't tested this yet, which is why I hadn't posted it.

Actually, I realized after looking at it that the new wi_tableshared
field is unnecessary in my patch.  The only important part is that
knowledge of relisshared be available when we're looking for conflicting
workers, and that is entirely local in do_autovacuum().  I'd started the
patch by adding wi_tableshared, on the expectation that it would be
necessary, but it ain't ...

            regards, tom lane


Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> >> some other worker is already processing table X is wrong when X is a
> >> shared table.  I propose the attached patch.
>
> > Hmm, I have pretty much the same patch, except I added the flag to
> > struct autovac_table and have it populated by table_recheck_autovac.
> > Haven't tested this yet, which is why I hadn't posted it.
>
> Actually, I realized after looking at it that the new wi_tableshared
> field is unnecessary in my patch.  The only important part is that
> knowledge of relisshared be available when we're looking for conflicting
> workers, and that is entirely local in do_autovacuum().  I'd started the
> patch by adding wi_tableshared, on the expectation that it would be
> necessary, but it ain't ...

OK, if I understand you correctly then that is what my patch does -- the
"sharedrel" flag in my patch is only inside do_autovacuum.  Do you
already have a test rig for this?  If not, I can set one up so that I
can push the patch hopefully later today.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Greg Spiegelberg
Date:
All,

We were bit a couple months ago by a very similar issue where autovacuum ignored pg_type.  A manual vacuum fixed it but since that table is abused by every client using libpq which is darn near everything I suspect it went bad in a hurry.

Question is, will this patch be backported to 9.3?

Thanks,
-Greg

On Fri, May 6, 2016 at 10:19 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> >> some other worker is already processing table X is wrong when X is a
> >> shared table.  I propose the attached patch.
>
> > Hmm, I have pretty much the same patch, except I added the flag to
> > struct autovac_table and have it populated by table_recheck_autovac.
> > Haven't tested this yet, which is why I hadn't posted it.
>
> Actually, I realized after looking at it that the new wi_tableshared
> field is unnecessary in my patch.  The only important part is that
> knowledge of relisshared be available when we're looking for conflicting
> workers, and that is entirely local in do_autovacuum().  I'd started the
> patch by adding wi_tableshared, on the expectation that it would be
> necessary, but it ain't ...

OK, if I understand you correctly then that is what my patch does -- the
"sharedrel" flag in my patch is only inside do_autovacuum.  Do you
already have a test rig for this?  If not, I can set one up so that I
can push the patch hopefully later today.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Hi Greg,

Greg Spiegelberg wrote:

> We were bit a couple months ago by a very similar issue where autovacuum
> ignored pg_type.  A manual vacuum fixed it but since that table is abused
> by every client using libpq which is darn near everything I suspect it went
> bad in a hurry.

Hmm.  The current report is about shared catalogs (pg_shdepend and
pg_database were reported as problematic) which pg_type is not, so I
doubt that this bugfix will have any impact in a problem vacuuming
pg_type.

I'm interested in seeing a more detailed report from you about the
pg_type vacuuming failure.

> Question is, will this patch be backported to 9.3?

Yes, in my opinion we would backpatch it, back to 9.1 even.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Actually, I realized after looking at it that the new wi_tableshared
>> field is unnecessary in my patch.  The only important part is that
>> knowledge of relisshared be available when we're looking for conflicting
>> workers, and that is entirely local in do_autovacuum().  I'd started the
>> patch by adding wi_tableshared, on the expectation that it would be
>> necessary, but it ain't ...

> OK, if I understand you correctly then that is what my patch does -- the
> "sharedrel" flag in my patch is only inside do_autovacuum.  Do you
> already have a test rig for this?

I didn't attempt to replicate the problem report, if that's what you
mean; I just tested it as far as running the regression tests.

            regards, tom lane


Re: Autovacuum of pg_database

From
Greg Spiegelberg
Date:
On Fri, May 6, 2016 at 10:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hi Greg,

Greg Spiegelberg wrote:

> We were bit a couple months ago by a very similar issue where autovacuum
> ignored pg_type.  A manual vacuum fixed it but since that table is abused
> by every client using libpq which is darn near everything I suspect it went
> bad in a hurry.

Hmm.  The current report is about shared catalogs (pg_shdepend and
pg_database were reported as problematic) which pg_type is not, so I
doubt that this bugfix will have any impact in a problem vacuuming
pg_type.

I'm interested in seeing a more detailed report from you about the
pg_type vacuuming failure.


Not to steer the conversation elsewhere but the short of it is the system in question is a very active 9.3.5, well aware of more current versions, and in the early morning on Feb 8th before normal increase in activity and in the span of a few minutes queries went from normal sub-second responses to 30 sec to 5 minutes.  Took hours to isolate because we never suspected pg_type until logging was turned up.  We had to suspend operations and suspend pgbouncer to execute a manual VACUUM FULL.  EXPLAIN ANALYZE before reported hundreds of thousands of rows before the vacuum and <2,000 after.  All queries since have performed well since.

Stepping into the wayback machine, before VACUUM FULL:
Seq Scan on pg_type (cost=0.00..155188.24 rows=4788024 width=68) (actual time=0.007..40951.243 rows=1889 loops=1)
Total runtime: 40952.907 ms

Attempted a simple VACUUM VERBOSE:
vacuum verbose pg_type;
INFO: vacuuming "pg_catalog.pg_type"
INFO: scanned index "pg_type_oid_index" to remove 860 row versions
DETAIL: CPU 0.98s/3.63u sec elapsed 96.52 sec.
INFO: scanned index "pg_type_typname_nsp_index" to remove 860 row versions
DETAIL: CPU 1.89s/7.14u sec elapsed 247.72 sec.
INFO: "pg_type": removed 860 row versions in 36 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO: index "pg_type_oid_index" now contains 2014 row versions in 25824
pages
DETAIL: 715 index row versions were removed.
25717 index pages have been deleted, 25714 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_type_typname_nsp_index" now contains 2014 row versions in
35903 pages
DETAIL: 860 index row versions were removed.
35560 index pages have been deleted, 35488 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_type": found 163 removable, 1909 nonremovable row versions in
72 out of 107308 pages
DETAIL: 36 dead row versions cannot be removed yet.
There were 3056 unused item pointers.
0 pages are entirely empty.
CPU 2.88s/10.79u sec elapsed 344.29 sec.
INFO: "pg_type": stopping truncate due to conflicting lock request
VACUUM

The VACUUM FULL:
vacuum full verbose pg_type;
INFO: vacuuming "pg_catalog.pg_type"
INFO: "pg_type": found 42 removable, 1895 nonremovable row versions in 107308 pages
DETAIL: 4 dead row versions cannot be removed yet.
CPU 0.10s/0.11u sec elapsed 0.22 sec.
VACUUM
Time: 957.159 ms

EXPLAIN ANALYZE then and now:
 Seq Scan on pg_type  (cost=0.00..680.17 rows=1917 width=754) (actual time=0.004..10.467 rows=1917 loops=1)
 Total runtime: 10.999 ms


Autovacuum settings:

select name,setting from pg_settings where name ~ 'autov';
                name                 |  setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 60
 autovacuum_vacuum_cost_delay        | 20
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 log_autovacuum_min_duration         | -1
(12 rows)


I know, not much to go on.

 
> Question is, will this patch be backported to 9.3?

Yes, in my opinion we would backpatch it, back to 9.1 even.


Good news.

Thanks,
-Greg
 

Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> Actually, I realized after looking at it that the new wi_tableshared
> >> field is unnecessary in my patch.  The only important part is that
> >> knowledge of relisshared be available when we're looking for conflicting
> >> workers, and that is entirely local in do_autovacuum().  I'd started the
> >> patch by adding wi_tableshared, on the expectation that it would be
> >> necessary, but it ain't ...
>
> > OK, if I understand you correctly then that is what my patch does -- the
> > "sharedrel" flag in my patch is only inside do_autovacuum.  Do you
> > already have a test rig for this?
>
> I didn't attempt to replicate the problem report, if that's what you
> mean; I just tested it as far as running the regression tests.

OK.  I'll do some more targeted testing later before pushing.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > These are all shared catalogs.  There are others, so you may still see
> > more.  We got another report for pg_database
> > https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> > so I suppose there really is a bug.  I don't know what's going on there.
>
> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> some other worker is already processing table X is wrong when X is a
> shared table.  I propose the attached patch.

Now that I actually tried this, it turned out that this problem is not
so simple.  vacuum.c already has logic to use conditional acquire of the
table-level lock, and if not available it skips the table:

LOG:  skipping vacuum of "pg_shdepend" --- lock not available

so an autovacuum worker is never "stuck" behind another worker trying to
vacuum the table.  This code is already in 9.2.  I suppose the only way
for multiple workers to get stuck is the relatively new logic in
lazy_truncate_heap that retries multiple times when AEL is not
available.  I haven't tried to replicate this yet.

In other words, the patches proposed here would not fix the actual
problem.  Back to the drawing board, it seems.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Now that I actually tried this, it turned out that this problem is not
> so simple.  vacuum.c already has logic to use conditional acquire of the
> table-level lock, and if not available it skips the table:

> LOG:  skipping vacuum of "pg_shdepend" --- lock not available

> so an autovacuum worker is never "stuck" behind another worker trying to
> vacuum the table.

Hmm ... but then, how do we have the observed symptom of several workers
concurrently trying to process the same shared catalog?  Seems like all
but one should fall out at this point.

            regards, tom lane


Re: Autovacuum of pg_database

From
Tom Lane
Date:
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Now that I actually tried this, it turned out that this problem is not
>> so simple.  vacuum.c already has logic to use conditional acquire of the
>> table-level lock, and if not available it skips the table:
>> LOG:  skipping vacuum of "pg_shdepend" --- lock not available
>> so an autovacuum worker is never "stuck" behind another worker trying to
>> vacuum the table.

> Hmm ... but then, how do we have the observed symptom of several workers
> concurrently trying to process the same shared catalog?  Seems like all
> but one should fall out at this point.

Oh, see table_recheck_autovac:

        tab->at_vacoptions = VACOPT_SKIPTOAST |
            (dovacuum ? VACOPT_VACUUM : 0) |
            (doanalyze ? VACOPT_ANALYZE : 0) |
            (!wraparound ? VACOPT_NOWAIT : 0);
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'll bet they're all trying to do anti-wraparound vacuums.  This would
explain why the problem hasn't been observed often enough to have been
fixed long since.

            regards, tom lane


Re: Autovacuum of pg_database

From
Bob Lunney
Date:
I had the same problem with 9.1 and vacuuming several databases simultaneously.  We just lived with it.  Other tables
inthe databases were far larger than pg_sharedepend, so it wasn't the most significant issue we faced. 

Bob Lunney


> On May 9, 2016, at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Now that I actually tried this, it turned out that this problem is not
>>> so simple.  vacuum.c already has logic to use conditional acquire of the
>>> table-level lock, and if not available it skips the table:
>>> LOG:  skipping vacuum of "pg_shdepend" --- lock not available
>>> so an autovacuum worker is never "stuck" behind another worker trying to
>>> vacuum the table.
>
>> Hmm ... but then, how do we have the observed symptom of several workers
>> concurrently trying to process the same shared catalog?  Seems like all
>> but one should fall out at this point.
>
> Oh, see table_recheck_autovac:
>
>        tab->at_vacoptions = VACOPT_SKIPTOAST |
>            (dovacuum ? VACOPT_VACUUM : 0) |
>            (doanalyze ? VACOPT_ANALYZE : 0) |
>            (!wraparound ? VACOPT_NOWAIT : 0);
>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I'll bet they're all trying to do anti-wraparound vacuums.  This would
> explain why the problem hasn't been observed often enough to have been
> fixed long since.
>
>             regards, tom lane
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin



Re: Autovacuum of pg_database

From
Alvaro Herrera
Date:
Tom Lane wrote:

> I'll bet they're all trying to do anti-wraparound vacuums.  This would
> explain why the problem hasn't been observed often enough to have been
> fixed long since.

Ah, that should have been obvious, shouldn't it.  I'm able to reproduce
the problem by moving relfrozenxid backwards for a shared catalog, and I
can confirm that it works correctly with the patch proposed, so I have
pushed it to all branches.  (I only tested the problem scenario in 9.1
and master, since it's too much manual work).

Thanks for the report!

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovacuum of pg_database

From
Ondřej Světlík
Date:
>
> I'll bet they're all trying to do anti-wraparound vacuums.  This would
> explain why the problem hasn't been observed often enough to have been
> fixed long since.
>
>             regards, tom lane
>

You are right, sorry I didn't mention it sooner.

With regards

Ondřej