vacuum analyze hanging - Mailing list pgsql-hackers
From:    Kevin Grittner
Subject: vacuum analyze hanging
Date:
Msg-id:  s33441f6.067@gwmta.wicourts.gov
List:    pgsql-hackers
We have what may be a bug in beta2. We have two databases running beta2, one on Linux and one on Windows. The hardware is identical, the configuration files are identical, and they are being fed identical streams of data modifications (primarily inserts, very few deletes). We've been running this setup for a few days, and performance has matched very closely -- until now.

We are running the new autovacuum with default parameters. We have run an explicit vacuum analyze verbose on the full database (as the database owner, not the superuser) twice. These went fine, and didn't show any numbers that led us to believe we needed to adjust the defaults for autovacuum.

About an hour ago we started an explicit vacuum analyze (without verbose this time). The Windows box finished in about one minute, while the Linux box -- I was going to say it was still running, but it completed as I was typing this -- ran for over an hour. While it was "stuck", the message about skipping pg_database had not yet appeared (the other five messages like it had appeared). The pg_database message appeared very close to the time of completion. Normal processing continued while the vacuums ran, in all cases.
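For anyone wanting to take the same kind of snapshot, the lock state below can also be pulled by backend pid rather than transaction id. This is only a sketch using the pg_locks columns visible in the output that follows (pid 31798 is the VACUUM backend seen in ps):

```sql
-- Sketch: list the locks held by a single backend, here the
-- VACUUM backend (pid 31798, taken from the ps listing below).
SELECT locktype, relation, transaction, mode, granted
FROM pg_locks
WHERE pid = 31798;
```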
This remained static during the "stuck" state:

postgres=# select * from pg_locks where transaction = 8536365;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |           mode           | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+--------------------------+---------
 relation      |    30793 |     2666 |      |       |               |         |       |          |     8536365 | 31798 | RowExclusiveLock         | t
 relation      |    30793 |     2666 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 relation      |    30793 |     2664 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 relation      |    30793 |     2665 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 transactionid |          |          |      |       |       8536365 |         |       |          |     8536365 | 31798 | ExclusiveLock            | t
 relation      |    30793 |     2606 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 relation      |    30793 |     2667 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
(7 rows)

The only other locks were on other connections and were very short-lived. Time accumulated on the VACUUM process also remained constant (at "3:04"), while the other connections slowly increased (from "0:41") while I watched:

linpost:/opt/ccap/dtr # ps aux|grep ^postgres
postgres 31603  0.0  0.0 170968   4952 ?  S 12:37  0:00 /usr/local/pgsql/bin/postmaster -D /var/pgsql/data
postgres 31605  0.0  1.9 171196 165672 ?  S 12:37  0:03 postgres: writer process
postgres 31606  0.2  0.0   7240   2788 ?  S 12:37  0:39 postgres: stats buffer process
postgres 31607  0.2  0.0   6500   2020 ?  S 12:37  0:35 postgres: stats collector process
postgres 31614  0.5  1.6 171992 135064 ?  S 12:39  1:23 postgres: dtr dtr 127.0.0.1(33384) idle
postgres 31615  0.5  1.6 172008 133152 ?  S 12:39  1:23 postgres: dtr dtr 127.0.0.1(33386) idle
postgres 31616  0.5  1.6 172008 133264 ?  S 12:39  1:22 postgres: dtr dtr 127.0.0.1(33388) idle
postgres 31617  0.5  1.6 172008 132964 ?  S 12:39  1:23 postgres: dtr dtr 127.0.0.1(33390) idle
postgres 31618  0.5  1.6 172008 133168 ?  S 12:39  1:23 postgres: dtr dtr 127.0.0.1(33392) idle
postgres 31619  0.5  1.6 171992 133952 ?  S 12:39  1:24 postgres: dtr dtr 127.0.0.1(33394) idle
postgres 31798  1.5  2.0 189036 167752 ?  S 13:42  3:04 postgres: dtr dtr 165.219.88.77(2313) VACUUM

Upon completion of the VACUUM, that 3:04 went to 3:21, without any other commands being issued on the connection.

Here is what is not commented out in the postgresql.conf file:

listen_addresses = '*'          # what IP interface(s) to listen on;
max_connections = 100           # note: increasing max_connections costs
shared_buffers = 20000          # min 16 or max_connections*2, 8KB each
work_mem = 10240                # min 64, size in KB
wal_buffers = 20                # min 4, 8KB each
effective_cache_size = 393216   # typically 8KB each
random_page_cost = 2            # units are one sequential page fetch
stats_start_collector = on
stats_row_level = on
autovacuum = true               # enable autovacuum subprocess?
lc_messages = 'C'               # locale for system error message
lc_monetary = 'C'               # locale for monetary formatting
lc_numeric = 'C'                # locale for number formatting
lc_time = 'C'                   # locale for time formatting
sql_inheritance = off

Since it didn't interfere with our processing, and it did eventually complete, we're not looking for any assistance. We just thought you might be interested. If there's anything else I can give you that might be helpful, just let me know.

-Kevin
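For context, the sizes implied by the page-count settings in that postgresql.conf (8KB pages, as the inline comments note) work out to roughly:

```
shared_buffers       =  20000 pages x 8KB ~= 156 MB
wal_buffers          =     20 pages x 8KB  = 160 KB
effective_cache_size = 393216 pages x 8KB  =   3 GB
work_mem             =  10240 KB           =  10 MB
```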