Thread: automatic vacuum on pg_statistic pg_toast area blocks all queries in hot standby

Hi,

We have a master/hot standy databases set up with streaming replication. The
pair are running:

PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit
Linux 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64
x86_64 x86_64 GNU/Linux

We just find the following situation:

on the master db server:
(1) at 16:58:22 automatic vacuum of table ""pg_toast.pg_toast_2619"": index
scans: 1

almost at the same time, queries at hot standby are blocked, all waiting for
a share lock on pg_toast_2619, for example:

process 26822 still waiting for AccessShareLock on relation 2840 of database
16414 after 1000.089 ms
process 26820 still waiting for AccessShareLock on relation 2840 of database
16414 after 1000.085 ms
process 26819 still waiting for AccessShareLock on relation 2840 of database
16414 after 1000.079 ms
and etc...


relation 2840 is actually pg_toast_2619 of pg_statistic:

DB=# select relname from pg_class where reltoastrelid=2840;
   relname
--------------
 pg_statistic
(1 row)


Any idea why?

Thanks a lot,

Pius




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/automatic-vacuum-on-pg-statistic-pg-toast-area-blocks-all-queries-in-hot-standby-tp5807143.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Hello.

What happens on the master:
1. A typical request puts a AccessShareLock on the table.
2. Autovacuum puts cleans dead entries from the table. This is possible
because AccessShareLock does not conflict with ExclusiveLock.
http://www.postgresql.org/docs/9.1/static/explicit-locking.html
3. Autovacuum finish successful.
4. Unlock AccessShareLock.

What happens on the slave:
1. From WAL puts a AccessShareLock on table.
2. From WAL recovery Autovacuum result, but there is a conflict. WAL don't
recovery, waiting time seconds from parameter max_standby_streaming_delay
before canceling queries.
3. Queries waiting unlock AccessShareLock. Unlock does not occur because of
paragraph 2.
4. Queries canceling by statement_timeout or conflict recovery.
5. WAL recovery and unlock AccessShareLock.

If enable hot_standby_feedback then master also waiting slave time seconds
from max_standby_streaming_delay parameter.

Solution:
1. Disable hot_standby_feedback.
2. select txid_current() into res1
3. Waiting seconds.
4. select txid_current() into res2
5. Change postgresql.conf on master. vacuum_defer_cleanup_age = res2 - res1
+ reserve.

Good luck.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/automatic-vacuum-on-pg-statistic-pg-toast-area-blocks-all-queries-in-hot-standby-tp5807143p5820168.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Hi,

Thanks for your feedback. Someone finally replies to this thread. I am not
quite clear of your solution, so I need more info from you.

First of all, we do not enable hot_standby_feedback on SLAVE DB.

postgres=# show hot_standby_feedback;
 hot_standby_feedback
----------------------
 off
(1 row)

Secondly, where do you run "select txid_current() into res1" and "select
txid_current() into res2" ?

Even if you set MASTER vacuum_defer_cleanup_age to certain value, autovacuum
still can kick in anytime and may conflict query on the target table. So
setting vacuum_defer_cleanup_age to a value can really reduce the chance of
such situation?

Thanks a lot,

Pius



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/automatic-vacuum-on-pg-statistic-pg-toast-area-blocks-all-queries-in-hot-standby-tp5807143p5820171.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
> Secondly, where do you run "select txid_current() into res1" and "select
txid_current() into res2" ?
It does not matter where. Important to count the number of transactions that
takes place during max_standby_streaming_delay. Example res2-res1 = 25000,
my value vacuum_defer_cleanup_age = 40000

> Even if you set MASTER vacuum_defer_cleanup_age to certain value,
> autovacuum still can kick in anytime and may conflict query on the target
> table. So setting vacuum_defer_cleanup_age to a value can really reduce
> the chance of such situation?
Due to this parameter, at the time when Autovacuum clean the dead rows they
will be dead on the slave, and thus no queries will not use them. Therefore,
the conflict will not be.

> Moreover, in my post, the vacuum is on pg_statistic toast area, why did it
> block queries on SLAVE?
Autovacuum serves all tables, including a pg_catalog. This problem occurs on
any table. For example pg_class or my_table. Autovacuum remove rows in
moment AccessShareLock, this is row and unlock AccessShare receive to slave
with WAL. WAL not recorded on slave due to a conflict recovery. At the same
time queries are waiting for a lock on the slave.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/automatic-vacuum-on-pg-statistic-pg-toast-area-blocks-all-queries-in-hot-standby-tp5807143p5820183.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Good point, thanks a lot for your help.

Pius



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/automatic-vacuum-on-pg-statistic-pg-toast-area-blocks-all-queries-in-hot-standby-tp5807143p5820192.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.