Thread: Autovacuum of pg_database
Hello again,

we have another strange problem with autovacuum. The process goes well
until it reaches the table pg_database (in some databases, not all),
then it loops over it forever (until I run vacuum full analyze on that
table, simple vacuum doesn't help):

2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table "somedatabase.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 87 hits, 0 misses, 0 dirtied
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-05 17:13:33 CEST 34819LOG: automatic vacuum of table "somedatabase.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 87 hits, 0 misses, 0 dirtied
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-05 17:13:33 CEST 34820LOG: automatic vacuum of table "somedatabase.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 87 hits, 0 misses, 0 dirtied
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2016-05-05 17:13:34 CEST 34821LOG: automatic vacuum of table "somedatabase.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 87 hits, 0 misses, 0 dirtied
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

Not sure if it's a bug or I'm doing something wrong.

Thanks in advance,

Ondřej
Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
> Hello again,
>
> we have another strange problem with autovacuum. The process goes well
> until it reaches the table pg_database (in some databases, not all),
> then it loops over it forever (until I run vacuum full analyze on that
> table, simple vacuum doesn't help):
>
> 2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
> pages: 0 removed, 233 remain
> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> buffer usage: 87 hits, 0 misses, 0 dirtied
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> 2016-05-05 17:13:33 CEST 34819LOG: automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
> pages: 0 removed, 233 remain
> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> buffer usage: 87 hits, 0 misses, 0 dirtied
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> 2016-05-05 17:13:33 CEST 34820LOG: automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
> pages: 0 removed, 233 remain
> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> buffer usage: 87 hits, 0 misses, 0 dirtied
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> 2016-05-05 17:13:34 CEST 34821LOG: automatic vacuum of table
> "somedatabase.pg_catalog.pg_database": index scans: 0
> pages: 0 removed, 233 remain
> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> buffer usage: 87 hits, 0 misses, 0 dirtied
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
>
> Not sure if it's a bug or I'm doing something wrong.
>
> Thanks in advance,
>
> Ondřej

Another part of the log shows that the process tries pg_database twice,
then does one other table, and then goes back to pg_database for the
rest of our days:

2016-05-05 17:16:56 CEST 36675LOG: automatic vacuum of table "someotherdb.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 36 hits, 50 misses, 2 dirtied
        avg read rate: 3.277 MB/s, avg write rate: 0.131 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.11 sec
2016-05-05 17:16:56 CEST 36676LOG: automatic vacuum of table "someotherdb.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 68 hits, 18 misses, 0 dirtied
        avg read rate: 2.563 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.05 sec
2016-05-05 17:16:56 CEST 36674LOG: automatic vacuum of table "someotherdb.pg_toast.pg_toast_2618": index scans: 1
        pages: 58 removed, 50 remain
        tuples: 141 removed, 219 remain, 0 are dead but not yet removable
        buffer usage: 151 hits, 192 misses, 73 dirtied
        avg read rate: 2.728 MB/s, avg write rate: 1.037 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.54 sec
2016-05-05 17:16:56 CEST 36677LOG: automatic vacuum of table "someotherdb.pg_catalog.pg_database": index scans: 0
        pages: 0 removed, 233 remain
        tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
        buffer usage: 86 hits, 0 misses, 0 dirtied
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec

Ondřej
Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
> Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
>> Hello again,
>>
>> we have another strange problem with autovacuum. The process goes well
>> until it reaches the table pg_database (in some databases, not all),
>> then it loops over it forever (until I run vacuum full analyze on that
>> table, simple vacuum doesn't help):
>>
>> 2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table
>> "somedatabase.pg_catalog.pg_database": index scans: 0
>> pages: 0 removed, 233 remain
>> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>> buffer usage: 87 hits, 0 misses, 0 dirtied

And now the same goes for pg_authid. Sorry for spamming, I hope this is
the last one.

Ondřej
Ondřej Světlík wrote:
> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
> >Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
> >>Hello again,
> >>
> >>we have another strange problem with autovacuum. The process goes well
> >>until it reaches the table pg_database (in some databases, not all),
> >>then it loops over it forever (until I run vacuum full analyze on that
> >>table, simple vacuum doesn't help):
> >>
> >>2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table
> >>"somedatabase.pg_catalog.pg_database": index scans: 0
> >> pages: 0 removed, 233 remain
> >> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
> >> buffer usage: 87 hits, 0 misses, 0 dirtied
>
> And now the same goes for pg_authid. Sorry for spamming, I hope this is the
> last one.

These are all shared catalogs. There are others, so you may still see
more. We got another report for pg_database
https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
so I suppose there really is a bug. I don't know what's going on there.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
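For reference, the full set of shared relations in a cluster can be listed with something like the query below; any of them could in principle show the same behaviour (a rough sketch, run in any database):

-- shared catalogs plus their shared indexes and toast tables
SELECT oid::regclass AS relation, relkind
FROM pg_class
WHERE relisshared
ORDER BY relkind, relname;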
Hello, I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and again. In 9.3 this was not true. Any recommendations? Thanks & Regards, Vicky Soni Database Administrator -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Alvaro Herrera Sent: 05 May 2016 21:14 To: Ondřej Světlík <osvetlik@flexibee.eu> Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] Autovacuum of pg_database Ondřej Světlík wrote: > Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a): > >Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a): > >>Hello again, > >> > >>we have another strange problem with autovacuum. The process goes > >>well until it reaches the table pg_database (in some databases, not > >>all), then it loops over it forever (until I run vacuum full > >>analyze on that table, simple vacuum doesn't help): > >> > >>2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table > >>"somedatabase.pg_catalog.pg_database": index scans: 0 > >> pages: 0 removed, 233 remain > >> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable > >> buffer usage: 87 hits, 0 misses, 0 dirtied > > And now the same goes for pg_authid. Sorry for spamming, I hope this > is the last one. These are all shared catalogs. There are others, so you may still see more. We got another report for pg_database https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com so I suppose there really is a bug. I don't know what's going on there. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
Vicky Soni - Quipment India wrote:
> Hello,
>
> I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and again.
>
> In 9.3 this was not true. Any recommendations?

Maybe there is a multixact wraparound problem. What 9.3 were you
running previously, and what 9.4 did you upgrade to? Can you share the
output of pg_controldata? Please paste the output of this query

select datname, datminmxid from pg_database

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
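A rough complement to that query, covering the plain transaction-ID side as well (large age() values point at wraparound pressure):

-- per-database freeze horizons
SELECT datname,
       datfrozenxid, age(datfrozenxid) AS xid_age,
       datminmxid
FROM pg_database
ORDER BY age(datfrozenxid) DESC;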
Dne 5.5.2016 v 17:51 Alvaro Herrera napsal(a):
> Vicky Soni - Quipment India wrote:
>> Hello,
>>
>> I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and again.
>>
>> In 9.3 this was not true. Any recommendations?
>
> Maybe there is a multixact wraparound problem. What 9.3 were you
> running previously, and what 9.4 did you upgrade to? Can you share the
> output of pg_controldata? Please paste the output of this query
> select datname, datminmxid from pg_database
>

That sounds reasonable; our autovacuum problems started with processes
labeled as 'to prevent wraparound' (on pg_shdepend). We use 9.4.

Ondřej
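A rough way to check whether the shared catalogs themselves have crossed autovacuum_freeze_max_age, which is what forces those 'to prevent wraparound' vacuums, is a query along these lines (relminmxid exists from 9.3 on):

-- xid age of the shared catalogs; compare against autovacuum_freeze_max_age
SELECT oid::regclass AS relation,
       age(relfrozenxid) AS xid_age,
       relminmxid
FROM pg_class
WHERE relisshared AND relkind = 'r'
ORDER BY age(relfrozenxid) DESC;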
Output of given query. Not sharing db names except temp database. 'datminmxid' 1 1 1 template0 16696 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Thanks & Regards, Vicky Soni Database Administrator -----Original Message----- From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com] Sent: 05 May 2016 21:21 To: Vicky Soni - Quipment India <vicky.soni@quipment.nl> Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] Autovacuum of pg_database Vicky Soni - Quipment India wrote: > Hello, > > I have observed that after upgrading from 9.3 to 9.4 on windows, autovacuum keep running again and again. > > In 9.3 this was not true. Any recommendations? Maybe there is a multixact wraparound problem. What 9.3 were you running previously, and what 9.4 did you upgrade to? Canyou share the output of pg_controldata? Please paste the output of this query select datname, datminmxid from pg_database -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Dne 5.5.2016 v 17:43 Alvaro Herrera napsal(a):
> Ondřej Světlík wrote:
>> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a):
>>> Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a):
>>>> Hello again,
>>>>
>>>> we have another strange problem with autovacuum. The process goes well
>>>> until it reaches the table pg_database (in some databases, not all),
>>>> then it loops over it forever (until I run vacuum full analyze on that
>>>> table, simple vacuum doesn't help):
>>>>
>>>> 2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table
>>>> "somedatabase.pg_catalog.pg_database": index scans: 0
>>>> pages: 0 removed, 233 remain
>>>> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable
>>>> buffer usage: 87 hits, 0 misses, 0 dirtied
>>
>> And now the same goes for pg_authid. Sorry for spamming, I hope this is the
>> last one.
>
> These are all shared catalogs. There are others, so you may still see
> more. We got another report for pg_database
> https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> so I suppose there really is a bug. I don't know what's going on there.
>

Let me know if I can help trace the cause.

With regards

Ondřej
Hello, In my development setup I promoted slave, and now autovacuum keep running again and again. Specially on pg_statistics table of one of my OLTP using many temp tables in functions. Please help. Thanks & Regards, Vicky Soni Database Administrator -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Ondrej Svetlík Sent: 05 May 2016 21:28 To: pgsql-admin@postgresql.org Subject: Re: [ADMIN] Autovacuum of pg_database Dne 5.5.2016 v 17:43 Alvaro Herrera napsal(a): > Ondřej Světlík wrote: >> Dne 5.5.2016 v 17:19 Ondřej Světlík napsal(a): >>> Dne 5.5.2016 v 17:16 Ondřej Světlík napsal(a): >>>> Hello again, >>>> >>>> we have another strange problem with autovacuum. The process goes >>>> well until it reaches the table pg_database (in some databases, not >>>> all), then it loops over it forever (until I run vacuum full >>>> analyze on that table, simple vacuum doesn't help): >>>> >>>> 2016-05-05 17:13:33 CEST 34818LOG: automatic vacuum of table >>>> "somedatabase.pg_catalog.pg_database": index scans: 0 >>>> pages: 0 removed, 233 remain >>>> tuples: 0 removed, 5817 remain, 0 are dead but not yet removable >>>> buffer usage: 87 hits, 0 misses, 0 dirtied >> >> And now the same goes for pg_authid. Sorry for spamming, I hope this >> is the last one. > > These are all shared catalogs. There are others, so you may still see > more. We got another report for pg_database > https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F0487 > 8A2A%40amazon.com so I suppose there really is a bug. I don't know > what's going on there. > Let me know if I can help trace the cause. With regards Ondřej -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
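Heavy temp-table churn does inflate per-database catalogs such as pg_statistic, pg_attribute and pg_class, so frequent autovacuum there is expected; their growth can be tracked with a query roughly like this (a sketch, not a fix):

-- size of the catalogs most affected by temp-table churn
SELECT c.oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.relpages, c.reltuples
FROM pg_class c
WHERE c.oid IN ('pg_statistic'::regclass, 'pg_attribute'::regclass,
                'pg_class'::regclass, 'pg_type'::regclass)
ORDER BY pg_total_relation_size(c.oid) DESC;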
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > These are all shared catalogs. There are others, so you may still see > more. We got another report for pg_database > https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com > so I suppose there really is a bug. I don't know what's going on there. I think it's pretty obvious: autovacuum.c's rule for detecting whether some other worker is already processing table X is wrong when X is a shared table. I propose the attached patch. regards, tom lane diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index e2859df..4bc5d42 100644 *** a/src/backend/postmaster/autovacuum.c --- b/src/backend/postmaster/autovacuum.c *************** typedef struct autovac_table *** 203,215 **** * wi_links entry into free list or running list * wi_dboid OID of the database this worker is supposed to work on * wi_tableoid OID of the table currently being vacuumed, if any * wi_proc pointer to PGPROC of the running worker, NULL if not started * wi_launchtime Time at which this worker was launched * wi_cost_* Vacuum cost-based delay parameters current in this worker * ! * All fields are protected by AutovacuumLock, except for wi_tableoid which is ! * protected by AutovacuumScheduleLock (which is read-only for everyone except ! * that worker itself). *------------- */ typedef struct WorkerInfoData --- 203,216 ---- * wi_links entry into free list or running list * wi_dboid OID of the database this worker is supposed to work on * wi_tableoid OID of the table currently being vacuumed, if any + * wi_tableshared true if the table currently being vacuumed is a shared rel * wi_proc pointer to PGPROC of the running worker, NULL if not started * wi_launchtime Time at which this worker was launched * wi_cost_* Vacuum cost-based delay parameters current in this worker * ! * All fields are protected by AutovacuumLock, except for wi_tableoid and ! * wi_tableshared which are protected by AutovacuumScheduleLock (and are ! * read-only for everyone except that worker itself). *------------- */ typedef struct WorkerInfoData *************** typedef struct WorkerInfoData *** 217,222 **** --- 218,224 ---- dlist_node wi_links; Oid wi_dboid; Oid wi_tableoid; + bool wi_tableshared; PGPROC *wi_proc; TimestampTz wi_launchtime; bool wi_dobalance; *************** autovac_balance_cost(void) *** 1791,1798 **** } if (worker->wi_proc != NULL) ! elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, dobalance=%s cost_limit=%d, cost_limit_base=%d, cost_delay=%d)", worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid, worker->wi_dobalance ? "yes" : "no", worker->wi_cost_limit, worker->wi_cost_limit_base, worker->wi_cost_delay); --- 1793,1801 ---- } if (worker->wi_proc != NULL) ! elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u%s, dobalance=%s cost_limit=%d, cost_limit_base=%d,cost_delay=%d)", worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid, + worker->wi_tableshared ? " (shared)" : "", worker->wi_dobalance ? "yes" : "no", worker->wi_cost_limit, worker->wi_cost_limit_base, worker->wi_cost_delay); *************** do_autovacuum(void) *** 1885,1893 **** HeapScanDesc relScan; Form_pg_database dbForm; List *table_oids = NIL; HASHCTL ctl; HTAB *table_toast_map; ! 
ListCell *volatile cell; PgStat_StatDBEntry *shared; PgStat_StatDBEntry *dbentry; BufferAccessStrategy bstrategy; --- 1888,1898 ---- HeapScanDesc relScan; Form_pg_database dbForm; List *table_oids = NIL; + List *table_shares = NIL; HASHCTL ctl; HTAB *table_toast_map; ! ListCell *volatile lco; ! ListCell *volatile lcs; PgStat_StatDBEntry *shared; PgStat_StatDBEntry *dbentry; BufferAccessStrategy bstrategy; *************** do_autovacuum(void) *** 2004,2009 **** --- 2009,2015 ---- PgStat_StatTabEntry *tabentry; AutoVacOpts *relopts; Oid relid; + bool relisshared; bool dovacuum; bool doanalyze; bool wraparound; *************** do_autovacuum(void) *** 2013,2022 **** continue; relid = HeapTupleGetOid(tuple); /* Fetch reloptions and the pgstat entry for this table */ relopts = extract_autovac_opts(tuple, pg_class_desc); ! tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared, shared, dbentry); /* Check if it needs vacuum or analyze */ --- 2019,2029 ---- continue; relid = HeapTupleGetOid(tuple); + relisshared = classForm->relisshared; /* Fetch reloptions and the pgstat entry for this table */ relopts = extract_autovac_opts(tuple, pg_class_desc); ! tabentry = get_pgstat_tabentry_relid(relid, relisshared, shared, dbentry); /* Check if it needs vacuum or analyze */ *************** do_autovacuum(void) *** 2069,2077 **** } else { ! /* relations that need work are added to table_oids */ if (dovacuum || doanalyze) table_oids = lappend_oid(table_oids, relid); /* * Remember the association for the second pass. Note: we must do --- 2076,2087 ---- } else { ! /* relations that need work are added to table_oids/table_shares */ if (dovacuum || doanalyze) + { table_oids = lappend_oid(table_oids, relid); + table_shares = lappend_int(table_shares, relisshared); + } /* * Remember the association for the second pass. Note: we must do *************** do_autovacuum(void) *** 2117,2122 **** --- 2127,2133 ---- Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple); PgStat_StatTabEntry *tabentry; Oid relid; + bool relisshared; AutoVacOpts *relopts = NULL; bool dovacuum; bool doanalyze; *************** do_autovacuum(void) *** 2129,2134 **** --- 2140,2146 ---- continue; relid = HeapTupleGetOid(tuple); + relisshared = classForm->relisshared; /* * fetch reloptions -- if this toast table does not have them, try the *************** do_autovacuum(void) *** 2146,2152 **** } /* Fetch the pgstat entry for this table */ ! tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared, shared, dbentry); relation_needs_vacanalyze(relid, relopts, classForm, tabentry, --- 2158,2164 ---- } /* Fetch the pgstat entry for this table */ ! tabentry = get_pgstat_tabentry_relid(relid, relisshared, shared, dbentry); relation_needs_vacanalyze(relid, relopts, classForm, tabentry, *************** do_autovacuum(void) *** 2155,2161 **** --- 2167,2176 ---- /* ignore analyze for toast tables */ if (dovacuum) + { table_oids = lappend_oid(table_oids, relid); + table_shares = lappend_int(table_shares, relisshared); + } } heap_endscan(relScan); *************** do_autovacuum(void) *** 2181,2189 **** /* * Perform operations on collected tables. */ ! foreach(cell, table_oids) { ! Oid relid = lfirst_oid(cell); autovac_table *tab; bool skipit; int stdVacuumCostDelay; --- 2196,2205 ---- /* * Perform operations on collected tables. */ ! forboth(lco, table_oids, lcs, table_shares) { ! Oid relid = lfirst_oid(lco); ! 
bool relisshared = (bool) lfirst_int(lcs); autovac_table *tab; bool skipit; int stdVacuumCostDelay; *************** do_autovacuum(void) *** 2229,2243 **** if (worker == MyWorkerInfo) continue; ! /* ignore workers in other databases */ ! if (worker->wi_dboid != MyDatabaseId) continue; ! if (worker->wi_tableoid == relid) ! { ! skipit = true; ! break; ! } } LWLockRelease(AutovacuumLock); if (skipit) --- 2245,2261 ---- if (worker == MyWorkerInfo) continue; ! /* not a match if tableoid or shared-rel flag don't match */ ! if (worker->wi_tableoid != relid || ! worker->wi_tableshared != relisshared) continue; ! /* for unshared table, not a match unless same database */ ! if (!relisshared && worker->wi_dboid != MyDatabaseId) ! continue; ! ! skipit = true; ! break; } LWLockRelease(AutovacuumLock); if (skipit) *************** do_autovacuum(void) *** 2271,2276 **** --- 2289,2295 ---- * the lock so that other workers don't vacuum it concurrently. */ MyWorkerInfo->wi_tableoid = relid; + MyWorkerInfo->wi_tableshared = relisshared; LWLockRelease(AutovacuumScheduleLock); /*
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > These are all shared catalogs. There are others, so you may still see
> > more. We got another report for pg_database
> > https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> > so I suppose there really is a bug. I don't know what's going on there.
>
> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> some other worker is already processing table X is wrong when X is a
> shared table. I propose the attached patch.

Hmm, I have pretty much the same patch, except I added the flag to
struct autovac_table and have it populated by table_recheck_autovac.
Haven't tested this yet, which is why I hadn't posted it.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> I think it's pretty obvious: autovacuum.c's rule for detecting whether
>> some other worker is already processing table X is wrong when X is a
>> shared table. I propose the attached patch.

> Hmm, I have pretty much the same patch, except I added the flag to
> struct autovac_table and have it populated by table_recheck_autovac.
> Haven't tested this yet, which is why I hadn't posted it.

Actually, I realized after looking at it that the new wi_tableshared
field is unnecessary in my patch. The only important part is that
knowledge of relisshared be available when we're looking for conflicting
workers, and that is entirely local in do_autovacuum(). I'd started the
patch by adding wi_tableshared, on the expectation that it would be
necessary, but it ain't ...

            regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> >> some other worker is already processing table X is wrong when X is a
> >> shared table. I propose the attached patch.
>
> > Hmm, I have pretty much the same patch, except I added the flag to
> > struct autovac_table and have it populated by table_recheck_autovac.
> > Haven't tested this yet, which is why I hadn't posted it.
>
> Actually, I realized after looking at it that the new wi_tableshared
> field is unnecessary in my patch. The only important part is that
> knowledge of relisshared be available when we're looking for conflicting
> workers, and that is entirely local in do_autovacuum(). I'd started the
> patch by adding wi_tableshared, on the expectation that it would be
> necessary, but it ain't ...

OK, if I understand you correctly then that is what my patch does -- the
"sharedrel" flag in my patch is only inside do_autovacuum. Do you
already have a test rig for this? If not, I can set one up so that I
can push the patch hopefully later today.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
All,
We were bit a couple months ago by a very similar issue where autovacuum ignored pg_type. A manual vacuum fixed it but since that table is abused by every client using libpq which is darn near everything I suspect it went bad in a hurry.
Question is, will this patch be backported to 9.3?
Thanks,
-Greg
On Fri, May 6, 2016 at 10:19 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> >> some other worker is already processing table X is wrong when X is a
> >> shared table. I propose the attached patch.
>
> > Hmm, I have pretty much the same patch, except I added the flag to
> > struct autovac_table and have it populated by table_recheck_autovac.
> > Haven't tested this yet, which is why I hadn't posted it.
>
> Actually, I realized after looking at it that the new wi_tableshared
> field is unnecessary in my patch. The only important part is that
> knowledge of relisshared be available when we're looking for conflicting
> workers, and that is entirely local in do_autovacuum(). I'd started the
> patch by adding wi_tableshared, on the expectation that it would be
> necessary, but it ain't ...
OK, if I understand you correctly then that is what my patch does -- the
"sharedrel" flag in my patch is only inside do_autovacuum. Do you
already have a test rig for this? If not, I can set one up so that I
can push the patch hopefully later today.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Greg,

Greg Spiegelberg wrote:
> We were bit a couple months ago by a very similar issue where autovacuum
> ignored pg_type. A manual vacuum fixed it but since that table is abused
> by every client using libpq which is darn near everything I suspect it went
> bad in a hurry.

Hmm. The current report is about shared catalogs (pg_shdepend and
pg_database were reported as problematic) which pg_type is not, so I
doubt that this bugfix will have any impact in a problem vacuuming
pg_type.

I'm interested in seeing a more detailed report from you about the
pg_type vacuuming failure.

> Question is, will this patch be backported to 9.3?

Yes, in my opinion we would backpatch it, back to 9.1 even.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Actually, I realized after looking at it that the new wi_tableshared
>> field is unnecessary in my patch. The only important part is that
>> knowledge of relisshared be available when we're looking for conflicting
>> workers, and that is entirely local in do_autovacuum(). I'd started the
>> patch by adding wi_tableshared, on the expectation that it would be
>> necessary, but it ain't ...

> OK, if I understand you correctly then that is what my patch does -- the
> "sharedrel" flag in my patch is only inside do_autovacuum. Do you
> already have a test rig for this?

I didn't attempt to replicate the problem report, if that's what you
mean; I just tested it as far as running the regression tests.

            regards, tom lane
On Fri, May 6, 2016 at 10:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hi Greg,
Greg Spiegelberg wrote:
> We were bit a couple months ago by a very similar issue where autovacuum
> ignored pg_type. A manual vacuum fixed it but since that table is abused
> by every client using libpq which is darn near everything I suspect it went
> bad in a hurry.
Hmm. The current report is about shared catalogs (pg_shdepend and
pg_database were reported as problematic) which pg_type is not, so I
doubt that this bugfix will have any impact in a problem vacuuming
pg_type.
I'm interested in seeing a more detailed report from you about the
pg_type vacuuming failure.
Not to steer the conversation elsewhere, but the short of it is that the system in question is a very active 9.3.5 (we're well aware of more current versions). In the early morning on Feb 8th, before the normal increase in activity, queries went from their usual sub-second responses to 30 seconds to 5 minutes within the span of a few minutes. It took hours to isolate because we never suspected pg_type until logging was turned up. We had to suspend operations and pgbouncer to execute a manual VACUUM FULL. EXPLAIN ANALYZE reported hundreds of thousands of rows before the vacuum and <2,000 after. All queries have performed well since.
Stepping into the wayback machine, before VACUUM FULL:
Seq Scan on pg_type (cost=0.00..155188.24 rows=4788024 width=68) (actual time=0.007..40951.243 rows=1889 loops=1)
Total runtime: 40952.907 ms
Attempted a simple VACUUM VERBOSE:
vacuum verbose pg_type;
INFO: vacuuming "pg_catalog.pg_type"
INFO: scanned index "pg_type_oid_index" to remove 860 row versions
DETAIL: CPU 0.98s/3.63u sec elapsed 96.52 sec.
INFO: scanned index "pg_type_typname_nsp_index" to remove 860 row versions
DETAIL: CPU 1.89s/7.14u sec elapsed 247.72 sec.
INFO: "pg_type": removed 860 row versions in 36 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO: index "pg_type_oid_index" now contains 2014 row versions in 25824
pages
DETAIL: 715 index row versions were removed.
25717 index pages have been deleted, 25714 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_type_typname_nsp_index" now contains 2014 row versions in
35903 pages
DETAIL: 860 index row versions were removed.
35560 index pages have been deleted, 35488 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_type": found 163 removable, 1909 nonremovable row versions in
72 out of 107308 pages
DETAIL: 36 dead row versions cannot be removed yet.
There were 3056 unused item pointers.
0 pages are entirely empty.
CPU 2.88s/10.79u sec elapsed 344.29 sec.
INFO: "pg_type": stopping truncate due to conflicting lock request
VACUUM
INFO: vacuuming "pg_catalog.pg_type"
INFO: scanned index "pg_type_oid_index" to remove 860 row versions
DETAIL: CPU 0.98s/3.63u sec elapsed 96.52 sec.
INFO: scanned index "pg_type_typname_nsp_index" to remove 860 row versions
DETAIL: CPU 1.89s/7.14u sec elapsed 247.72 sec.
INFO: "pg_type": removed 860 row versions in 36 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO: index "pg_type_oid_index" now contains 2014 row versions in 25824
pages
DETAIL: 715 index row versions were removed.
25717 index pages have been deleted, 25714 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_type_typname_nsp_index" now contains 2014 row versions in
35903 pages
DETAIL: 860 index row versions were removed.
35560 index pages have been deleted, 35488 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_type": found 163 removable, 1909 nonremovable row versions in
72 out of 107308 pages
DETAIL: 36 dead row versions cannot be removed yet.
There were 3056 unused item pointers.
0 pages are entirely empty.
CPU 2.88s/10.79u sec elapsed 344.29 sec.
INFO: "pg_type": stopping truncate due to conflicting lock request
VACUUM
The VACUUM FULL:
vacuum full verbose pg_type;
INFO: vacuuming "pg_catalog.pg_type"
INFO: "pg_type": found 42 removable, 1895 nonremovable row versions in 107308 pages
DETAIL: 4 dead row versions cannot be removed yet.
CPU 0.10s/0.11u sec elapsed 0.22 sec.
VACUUM
Time: 957.159 ms
INFO: vacuuming "pg_catalog.pg_type"
INFO: "pg_type": found 42 removable, 1895 nonremovable row versions in 107308 pages
DETAIL: 4 dead row versions cannot be removed yet.
CPU 0.10s/0.11u sec elapsed 0.22 sec.
VACUUM
Time: 957.159 ms
EXPLAIN ANALYZE then and now:
Seq Scan on pg_type (cost=0.00..680.17 rows=1917 width=754) (actual time=0.004..10.467 rows=1917 loops=1)
Total runtime: 10.999 ms
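For what it's worth, catalog bloat and the last (auto)vacuum times can be watched ahead of time without turning up logging; a rough sketch:

-- largest pg_catalog tables plus dead-tuple counts and last (auto)vacuum times
SELECT c.oid::regclass AS relation,
       c.relpages,
       s.n_dead_tup,
       s.last_vacuum,
       s.last_autovacuum
FROM pg_class c
JOIN pg_stat_sys_tables s ON s.relid = c.oid
WHERE c.relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog')
ORDER BY c.relpages DESC
LIMIT 10;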
Autovacuum settings:
select name,setting from pg_settings where name ~ 'autov';
name | setting
-------------------------------------+-----------
autovacuum | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_multixact_freeze_max_age | 400000000
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.2
autovacuum_vacuum_threshold | 50
log_autovacuum_min_duration | -1
(12 rows)
I know, not much to go on.
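A side note on the settings above: with log_autovacuum_min_duration at -1 autovacuum activity is never logged, so a runaway catalog vacuum stays invisible. Setting it to 0 records every run (a sketch; ALTER SYSTEM exists from 9.4 on, on 9.3 put the same setting in postgresql.conf and reload):

-- log every autovacuum action so a looping catalog vacuum shows up immediately
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();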
> Question is, will this patch be backported to 9.3?
Yes, in my opinion we would backpatch it, back to 9.1 even.
Good news.
Thanks,
-Greg
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> Actually, I realized after looking at it that the new wi_tableshared
> >> field is unnecessary in my patch. The only important part is that
> >> knowledge of relisshared be available when we're looking for conflicting
> >> workers, and that is entirely local in do_autovacuum(). I'd started the
> >> patch by adding wi_tableshared, on the expectation that it would be
> >> necessary, but it ain't ...
>
> > OK, if I understand you correctly then that is what my patch does -- the
> > "sharedrel" flag in my patch is only inside do_autovacuum. Do you
> > already have a test rig for this?
>
> I didn't attempt to replicate the problem report, if that's what you
> mean; I just tested it as far as running the regression tests.

OK. I'll do some more targeted testing later before pushing.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > These are all shared catalogs. There are others, so you may still see
> > more. We got another report for pg_database
> > https://www.postgresql.org/message-id/A9D40BB7-CFD6-46AF-A0A1-249F04878A2A%40amazon.com
> > so I suppose there really is a bug. I don't know what's going on there.
>
> I think it's pretty obvious: autovacuum.c's rule for detecting whether
> some other worker is already processing table X is wrong when X is a
> shared table. I propose the attached patch.

Now that I actually tried this, it turned out that this problem is not
so simple. vacuum.c already has logic to use conditional acquire of the
table-level lock, and if not available it skips the table:

LOG: skipping vacuum of "pg_shdepend" --- lock not available

so an autovacuum worker is never "stuck" behind another worker trying to
vacuum the table. This code is already in 9.2. I suppose the only way
for multiple workers to get stuck is the relatively new logic in
lazy_truncate_heap that retries multiple times when AEL is not
available. I haven't tried to replicate this yet.

In other words, the patches proposed here would not fix the actual
problem. Back to the drawing board, it seems.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
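Whether workers really are waiting on one another for a shared catalog can be checked in pg_locks; roughly:

-- lock traffic on shared relations; ungranted entries mean someone is waiting
SELECT l.pid, l.relation::regclass AS relation, l.mode, l.granted
FROM pg_locks l
WHERE l.relation IN (SELECT oid FROM pg_class WHERE relisshared)
ORDER BY l.relation, l.granted DESC;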
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Now that I actually tried this, it turned out that this problem is not
> so simple. vacuum.c already has logic to use conditional acquire of the
> table-level lock, and if not available it skips the table:
> LOG: skipping vacuum of "pg_shdepend" --- lock not available
> so an autovacuum worker is never "stuck" behind another worker trying to
> vacuum the table.

Hmm ... but then, how do we have the observed symptom of several workers
concurrently trying to process the same shared catalog? Seems like all
but one should fall out at this point.

            regards, tom lane
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Now that I actually tried this, it turned out that this problem is not
>> so simple. vacuum.c already has logic to use conditional acquire of the
>> table-level lock, and if not available it skips the table:
>> LOG: skipping vacuum of "pg_shdepend" --- lock not available
>> so an autovacuum worker is never "stuck" behind another worker trying to
>> vacuum the table.

> Hmm ... but then, how do we have the observed symptom of several workers
> concurrently trying to process the same shared catalog? Seems like all
> but one should fall out at this point.

Oh, see table_recheck_autovac:

    tab->at_vacoptions = VACOPT_SKIPTOAST |
        (dovacuum ? VACOPT_VACUUM : 0) |
        (doanalyze ? VACOPT_ANALYZE : 0) |
        (!wraparound ? VACOPT_NOWAIT : 0);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'll bet they're all trying to do anti-wraparound vacuums. This would
explain why the problem hasn't been observed often enough to have been
fixed long since.

            regards, tom lane
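That is easy to check from pg_stat_activity, since forced vacuums are labelled '(to prevent wraparound)' in the query column; something along these lines:

-- autovacuum workers currently running, oldest first
SELECT pid, datname, xact_start, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
ORDER BY xact_start;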
I had the same problem with 9.1 and vacuuming several databases
simultaneously. We just lived with it. Other tables in the databases
were far larger than pg_shdepend, so it wasn't the most significant
issue we faced.

Bob Lunney

> On May 9, 2016, at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Now that I actually tried this, it turned out that this problem is not
>>> so simple. vacuum.c already has logic to use conditional acquire of the
>>> table-level lock, and if not available it skips the table:
>>> LOG: skipping vacuum of "pg_shdepend" --- lock not available
>>> so an autovacuum worker is never "stuck" behind another worker trying to
>>> vacuum the table.
>
>> Hmm ... but then, how do we have the observed symptom of several workers
>> concurrently trying to process the same shared catalog? Seems like all
>> but one should fall out at this point.
>
> Oh, see table_recheck_autovac:
>
>    tab->at_vacoptions = VACOPT_SKIPTOAST |
>        (dovacuum ? VACOPT_VACUUM : 0) |
>        (doanalyze ? VACOPT_ANALYZE : 0) |
>        (!wraparound ? VACOPT_NOWAIT : 0);
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I'll bet they're all trying to do anti-wraparound vacuums. This would
> explain why the problem hasn't been observed often enough to have been
> fixed long since.
>
>             regards, tom lane
Tom Lane wrote:
> I'll bet they're all trying to do anti-wraparound vacuums. This would
> explain why the problem hasn't been observed often enough to have been
> fixed long since.

Ah, that should have been obvious, shouldn't it. I'm able to reproduce
the problem by moving relfrozenxid backwards for a shared catalog, and I
can confirm that it works correctly with the patch proposed, so I have
pushed it to all branches. (I only tested the problem scenario in 9.1
and master, since it's too much manual work).

Thanks for the report!

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
> I'll bet they're all trying to do anti-wraparound vacuums. This would
> explain why the problem hasn't been observed often enough to have been
> fixed long since.
>
>             regards, tom lane
>

You are right, sorry I didn't mention it sooner.

With regards

Ondřej