Thread: [GENERAL] Autovacuum stuck for hours, blocking queries
I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue.

The process blocking the query is:
postgres 21985 11304 98 Feb13 ?        1-14:20:52 postgres: autovacuum worker process   <db_name>
which is running the query
autovacuum: VACUUM public.<table_name>

The query being blocked is:
ALTER TABLE <table_name> ALTER COLUMN <column_name> DROP DEFAULT
(But I have seen this previously with other queries being blocked. I used the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which queries were blocked)

Notes:
- This database table is used for about 6 million row writes per day, all of which are then deleted at the end of the day.
- We have a reasonable number of systems running identical databases and near identical load profiles, and this problem has only been seen on two of those systems. It occurs intermittently but reliably (once every few days). It persists until postgres is restarted (usually accomplished by a reboot) or the autovacuum thread is forcibly killed, although it often happens again a short time after the autovacuum worker process starts.
- I don't have data on whether this vacuum ever finishes - it certainly takes longer than a few hours.
- Our application kicks off a manual vacuum against this table each night which doesn't hit this problem, as far as we're aware.
- I have sometimes seen this behaviour when there was database corruption. In that case I see logs like the following. But it also occurs when there are no logs indicating database corruption (which is the purpose of this email):
  LOG: no left sibling (concurrent deletion?) in "<index_name>"
  ERROR: right sibling's left-link doesn't match: block 41 links to 127 instead of expected 79 in index "<index_name>"

Things that I have tried:
- Printing out a backtrace a few times after connecting to that process with gdb (see below). This indicates that the vacuum process isn't stuck, but I don't know what else it says.
- Reading the FAQ and release notes of other 9.3.x versions to see whether this looks like a known issue. I couldn't see anything obvious apart from an issue about 0000 files (which didn't apply to my system - there was SQL in the release note to check).
- Forcibly killing the autovacuum thread (obviously not a good solution to the problem!). This causes the ALTER TABLE query to complete correctly.

Questions:
- Is this expected? I was under the impression that vacuum should never take any blocking locks for any significant period of time.
- Should I report this as a bug?
- Should I change some config options to avoid this? (I know turning off autovacuum would avoid it, but that's not good practice.) I've put the vacuum settings for my database below.
- Should I avoid certain queries while autovacuum is happening?
- Are there more diags that I should gather to help diagnose this issue?

Thanks!

Tim

--System details--
Postgres 9.3.4 running on a Linux x64 VMWare virtual machine. 24 CPUs of 2.7 GHz. 48GB RAM.
It is also running another application (so the database is not the only thing on the server).

---Backtrace (from gdb)---
#0  0x00000000007328c3 in hash_search_with_hash_value ()
#1  0x00000000006369ba in BufTableLookup ()
#2  0x0000000000639192 in ?? ()
#3  0x0000000000639ade in ReadBufferExtended ()
#4  0x00000000004906d9 in _bt_getbuf ()
#5  0x00000000004910e1 in _bt_pagedel ()
#6  0x00000000004924d1 in ?? ()
#7  0x00000000004926da in ?? ()
#8  0x00000000004928fa in btbulkdelete ()
#9  0x000000000072cdbd in FunctionCall4Coll ()
#10 0x000000000048cf58 in index_bulk_delete ()
#11 0x000000000057bb85 in ?? ()
#12 0x000000000057cfe9 in lazy_vacuum_rel ()
#13 0x000000000057af56 in ?? ()
#14 0x000000000057b28c in vacuum ()
#15 0x000000000060c8fa in ?? ()
#16 0x000000000060cd96 in ?? ()
#17 0x000000000060ce66 in StartAutoVacWorker ()
#18 0x0000000000617602 in ?? ()
#19 <signal handler called>
#20 0x00007f9210c9d393 in __select_nocancel () from /lib64/libc.so.6
#21 0x0000000000618add in PostmasterMain ()
#22 0x00000000005b58d0 in main ()

#0  0x0000000000491107 in _bt_pagedel ()
#1  0x00000000004924d1 in ?? ()
#2  0x00000000004926da in ?? ()
#3  0x00000000004928fa in btbulkdelete ()
...

#0  0x000000000047a1ef in hash_any ()
#1  0x00000000007336e9 in tag_hash ()
#2  0x000000000063916c in ?? ()
#3  0x0000000000639ade in ReadBufferExtended ()
#4  0x00000000004906d9 in _bt_getbuf ()
#5  0x00000000004910e1 in _bt_pagedel ()
#6  0x00000000004924d1 in ?? ()
#7  0x00000000004926da in ?? ()
#8  0x00000000004928fa in btbulkdelete ()
...

---Vacuum settings in postgresql.conf---
# - Cost-Based Vacuum Delay -
vacuum_cost_delay = 10                  # 0-100 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
vacuum_cost_limit = 200                 # 1-10000 credits
#vacuum_defer_cleanup_age = 0           # number of xacts by which cleanup is delayed

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------
autovacuum = on                         # Enable autovacuum subprocess?  'on'
                                        # requires track_counts to also be on.
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                        # their durations, > 0 logs only
                                        # actions running at least this number
                                        # of milliseconds.
#autovacuum_max_workers = 3             # max number of autovacuum subprocesses
                                        # (change requires restart)
#autovacuum_naptime = 1min              # time between autovacuum runs
#autovacuum_vacuum_threshold = 50       # min number of row updates before vacuum
#autovacuum_analyze_threshold = 50      # min number of row updates before analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                        # (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000  # maximum Multixact age
                                        # before forced vacuum
                                        # (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
                                        # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
On 02/15/2017 09:30 AM, Tim Bellis wrote:
> I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue.
>
> The process blocking the query is:
> postgres 21985 11304 98 Feb13 ?        1-14:20:52 postgres: autovacuum worker process   <db_name>
> which is running the query
> autovacuum: VACUUM public.<table_name>
>
> The query being blocked is:
> ALTER TABLE <table_name> ALTER COLUMN <column_name> DROP DEFAULT
> (But I have seen this previously with other queries being blocked. I used the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which queries were blocked)

Other ALTER TABLE queries? If so I believe this might apply:

https://www.postgresql.org/docs/9.5/static/explicit-locking.html

SHARE UPDATE EXCLUSIVE

Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent schema changes and VACUUM runs.

Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, and ALTER TABLE VALIDATE and other ALTER TABLE variants (for full details see ALTER TABLE).

-- 
Adrian Klaver
adrian.klaver@aklaver.com
On Wed, Feb 15, 2017 at 10:30 AM, Tim Bellis <Tim.Bellis@metaswitch.com> wrote:
> I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue.
>
> The process blocking the query is:
> postgres 21985 11304 98 Feb13 ?        1-14:20:52 postgres: autovacuum worker process   <db_name>
> which is running the query
> autovacuum: VACUUM public.<table_name>
>
> The query being blocked is:
> ALTER TABLE <table_name> ALTER COLUMN <column_name> DROP DEFAULT
> (But I have seen this previously with other queries being blocked. I used the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which queries were blocked)

Yup, there's a priority inversion between DDL, DML and maintenance (vacuum). Vacuum runs slowly in the background. Normal update/delete/insert work fine because of the type of lock vacuum holds. Then a DDL gets in line. It has to wait on the vacuum, and the vacuum is set to run super slow. And everybody waits. On vacuum.

Basically it's bad practice to alter tables that are big and being worked on, because one way or another you're going to pay a price. I've used partitions for logging and auditing that autocreate and drop and vacuum, but they never get DDL done on them while they're getting updated, and vice versa.

There are also ways of making the table less likely / not likely to get vacuumed automatically. If you're willing to schedule DDL and vacuum on your own you can then mix the two in relative safety.
On Wed, Feb 15, 2017 at 3:26 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Feb 15, 2017 at 10:30 AM, Tim Bellis <Tim.Bellis@metaswitch.com> wrote:
>> I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue.
>>
>> The process blocking the query is:
>> postgres 21985 11304 98 Feb13 ?        1-14:20:52 postgres: autovacuum worker process   <db_name>
>> which is running the query
>> autovacuum: VACUUM public.<table_name>
>>
>> The query being blocked is:
>> ALTER TABLE <table_name> ALTER COLUMN <column_name> DROP DEFAULT
>> (But I have seen this previously with other queries being blocked. I used the SQL in https://wiki.postgresql.org/wiki/Lock_Monitoring to determine which queries were blocked)
>
> There are also ways of making the table less likely / not likely /
> will not get vacuum automatically. If you're willing to schedule ddl
> and vacuum on your own you can then mix the two in relative safety.

Followup: https://www.postgresql.org/docs/9.3/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
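The per-table storage parameter being pointed at could look like this (a sketch; `your_table` is a placeholder, and note that disabling autovacuum for one table does not stop for-wraparound autovacuums):

```sql
-- Sketch: stop autovacuum from choosing this one table, so VACUUM can be
-- scheduled manually at a quiet time. Anti-wraparound autovacuum will
-- still run on this table regardless of this setting.
ALTER TABLE your_table SET (autovacuum_enabled = false);

-- ...then run maintenance yourself on your own schedule:
VACUUM ANALYZE your_table;
```

The same parameter can be set back with `ALTER TABLE your_table RESET (autovacuum_enabled);`.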
Scott Marlowe wrote:
> Then a ddl gets in line. It has to wait on the vacuum, and the vacuum,
> set to run super slow. And everybody waits. On vacuum.

Note that this is normally not seen, because autovacuum cancels itself when somebody is blocked behind it -- until the table reaches the freeze_max_age limit, and then the autovacuum is a for-wraparound one that is no longer terminated, and then everybody has to wait on it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
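How close a table is to that freeze_max_age limit can be checked from the system catalogs -- a sketch (the columns are standard catalog columns; the LIMIT is arbitrary):

```sql
-- Sketch: compare each table's transaction-ID age against the threshold
-- at which a for-wraparound autovacuum is forced (default 200 million).
SELECT relname,
       age(relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```

A table whose xid_age is at or above freeze_max_age is due a for-wraparound vacuum of the kind described above.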
Thank you all - that's really useful :-)

The other query that gets blocked behind the vacuum is the below (truncated).

This query is generated by jdbc in this method:
org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getIndexInfo(AbstractJdbc2DatabaseMetaData.java:4023)

Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum?

Thank you all again,

Tim

SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, ct.relname AS TABLE_NAME,
  NOT i.indisunique AS NON_UNIQUE, NULL AS INDEX_QUALIFIER, ci.relname AS INDEX_NAME,
  CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS TYPE,
  (i.keys).n AS ORDINAL_POSITION,
  pg_catalog.pg_get_indexdef(ci.oid, (i.keys).n, false) AS COLUMN_NAME,
  CASE am.amcanorder WHEN true THEN CASE i.indoption[(i.keys).n - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS ASC_OR_DESC,
  ci.reltuples AS CARDINALITY, ci.relpages AS PAGES,
  pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS FILTER_CONDITION
FROM pg_catalog.pg_class ct
  JOIN pg_catalog.pg_namespace n ON (ct.relnamespace = n.oid)
  JOIN (SELECT i.indexrelid, i.indrelid, i.indoption, i.indisunique, i.indisclustered, i.indpred, i.indexprs,
        information_schema._pg_expandarray(i.indkey) AS keys
        FROM pg_catalog.pg_index i) i ON (ct.oid = i.ind
On 02/16/2017 08:45 AM, Tim Bellis wrote:
> Thank you all - that's really useful :-)
>
> The other query that gets blocked behind the vacuum is the below (truncated).
>
> This query is generated by jdbc in this method:
> org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getIndexInfo(AbstractJdbc2DatabaseMetaData.java:4023)

Is JDBC doing anything else before issuing this?

> Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum?

Table 13.2 here:

https://www.postgresql.org/docs/9.5/static/explicit-locking.html

shows the conflicts with SHARE UPDATE EXCLUSIVE (vacuum).

pg_locks:

https://www.postgresql.org/docs/9.5/static/view-pg-locks.html

shows locks being held. So next time it happens I would take a look and see if you can work backwards from there.

You could directly access the index information using:

https://www.postgresql.org/docs/9.6/static/catalog-pg-index.html
https://www.postgresql.org/docs/9.6/static/catalog-pg-class.html

> Thank you all again,
>
> Tim
>
> SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, ct.relname AS TABLE_NAME,
>   NOT i.indisunique AS NON_UNIQUE, NULL AS INDEX_QUALIFIER, ci.relname AS INDEX_NAME,
>   CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS TYPE,
>   (i.keys).n AS ORDINAL_POSITION,
>   pg_catalog.pg_get_indexdef(ci.oid, (i.keys).n, false) AS COLUMN_NAME,
>   CASE am.amcanorder WHEN true THEN CASE i.indoption[(i.keys).n - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS ASC_OR_DESC,
>   ci.reltuples AS CARDINALITY, ci.relpages AS PAGES,
>   pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS FILTER_CONDITION
> FROM pg_catalog.pg_class ct
>   JOIN pg_catalog.pg_namespace n ON (ct.relnamespace = n.oid)
>   JOIN (SELECT i.indexrelid, i.indrelid, i.indoption, i.indisunique, i.indisclustered, i.indpred, i.indexprs,
>         information_schema._pg_expandarray(i.indkey) AS keys
>         FROM pg_catalog.pg_index i) i ON (ct.oid = i.ind

This query is cut off so cannot say whether it is the issue or not.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
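The direct catalog access being suggested might look something like this (a sketch; `your_table` is a placeholder). Since it reads only pg_index and pg_class, it takes no lock on the user table itself and so should not queue behind an ALTER TABLE the way a query touching the table would:

```sql
-- Sketch: fetch index information straight from the system catalogs
-- instead of via JDBC's getIndexInfo() metadata query.
SELECT c.relname  AS table_name,
       i.relname  AS index_name,
       x.indisunique,
       pg_get_indexdef(x.indexrelid) AS index_definition
FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
WHERE c.relname = 'your_table';
```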
Tim Bellis <Tim.Bellis@metaswitch.com> writes:
> Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum?

It's not the vacuum that's blocking your read-only queries. It's the ALTER TABLE, which needs an exclusive lock in order to alter the table's schema. The ALTER is queued waiting for the vacuum to finish, and lesser lock requests queue up behind it. We could let the non-exclusive lock requests go ahead of the ALTER, but that would create a severe risk of the ALTER *never* getting to run.

I'd kill the ALTER and figure on trying again after the vacuum is done.

Also you might want to look into how you got into a situation where you have an anti-wraparound vacuum that's taking so long to run. You didn't do something silly like disable autovacuum did you?

			regards, tom lane
Tom Lane wrote:
> Also you might want to look into how you got into a situation where
> you have an anti-wraparound vacuum that's taking so long to run.

If there are ALTERs running all the time, regular (non-anti-wraparound) vacuums would be canceled and never get a chance to run. Eventually, autovacuum decides it's had enough and doesn't cancel anymore, so everyone else gets stuck behind.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Tim,

On 2017-02-15 18:30, Tim Bellis wrote:
> I have a postgres 9.3.4 database table which (intermittently but reliably)
> gets into a state where queries get blocked indefinitely
> [..]
> Notes:
> - This database table is used for about 6 million row writes per day,
> all of which are then deleted at the end of the day.

If you are dumping the contents of the table anyways, why not use TRUNCATE instead of DELETE? It unlinks and recreates the table data files, requiring nearly zero IO and analyzing.
Or even drop, and recreate the table with the correct new structure so you do not even need to ALTER TABLE?

I'm a bit skeptical of these suggestions since very competent people have already answered your post and did not come up with this... ;-)

The only drawback I'm aware of is that TRUNCATE will immediately free disk space on the OS level, so the table's space will not be "blocked". But probably the VACUUM you are currently performing will also eventually release the unused disk space, so this may or may not match the current behaviour.

Best regards,

	-hannes
The DELETE operation only deletes rows from the previous day. It's possible that there have been rows added that day which ought not to be deleted, so TRUNCATE wouldn't work. But that was a helpful suggestion - thanks!

Tim

-----Original Message-----
From: Hannes Erven [mailto:hannes@erven.at]
Sent: 17 February 2017 11:47
To: pgsql-general@postgresql.org
Cc: Tim Bellis <Tim.Bellis@metaswitch.com>
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries

Hi Tim,

On 2017-02-15 18:30, Tim Bellis wrote:
> I have a postgres 9.3.4 database table which (intermittently but
> reliably) gets into a state where queries get blocked indefinitely
> [..]
> Notes:
> - This database table is used for about 6 million row writes per
> day, all of which are then deleted at the end of the day.

If you are dumping the contents of the table anyways, why not use TRUNCATE instead of DELETE? It unlinks and recreates the table data files, requiring nearly zero IO and analyzing.
Or even drop, and recreate the table with the correct new structure so you do not even need to ALTER TABLE?

I'm a bit skeptical of these suggestions since very competent people have already answered your post and did not come up with this... ;-) The only drawback I'm aware of is that TRUNCATE will immediately free disk space on the OS level, so the table's space will not be "blocked". But probably the VACUUM you are currently performing will also eventually release the unused disk space, so this may or may not match the current behaviour.

Best regards,

	-hannes
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: 17 February 2017 02:59
To: Tim Bellis <Tim.Bellis@metaswitch.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries
On Wed, Feb 15, 2017 at 9:30 AM, Tim Bellis <Tim.Bellis@metaswitch.com> wrote:
I have a postgres 9.3.4 database table which (intermittently but reliably) gets into a state where queries get blocked indefinitely (at least for many hours) behind an automatic vacuum. I was under the impression that vacuum should never take any blocking locks for any significant period of time, and so would like help resolving the issue.
The process blocking the query is:
postgres 21985 11304 98 Feb13 ? 1-14:20:52 postgres: autovacuum worker process <db_name>
which is running the query
autovacuum: VACUUM public.<table_name>
Are you sure it doesn't really say:
autovacuum: VACUUM public.<table_name> (to prevent wraparound)
[Tim Bellis] It doesn’t. I was using the query from https://wiki.postgresql.org/wiki/Lock_Monitoring and looking at the ‘current_statement_in_blocking_process’ column. Is there a different query I should be using?
If it doesn't include the "to prevent wraparound", then it should sacrifice itself as soon as it realizes it is blocking something else. If it is not doing that, something is wrong.
If it does say "(to prevent wraparound)", then see all the other comments on this thread.
Notes:
- This database table is used for about 6 million row writes per day, all of which are then deleted at the end of the day.
How many transactions do those 6 million writes comprise?
[Tim Bellis] I’m not sure - I think it’s 6 million transactions, but there might be some batching going on that I’m unaware of. What would this affect? (I can dig in if necessary)
(I might have been slightly wrong in characterising the exact behaviour; the table might be cleared every hour rather than every day, but there are still 6 million writes per day)
- Our application kicks off a manual vacuum against this table each night which doesn't hit this problem, as far as we're aware.
If it were having problems, would you be aware of it? Do you see in the log files the completion of the vacuum? Or look in pg_stat_user_tables to see when last_vacuum was. If it runs every night and succeeds, it is hard to see why wraparound would ever kick in. Unless you are hitting 150,000,000 transactions in a day.
[Tim Bellis] I shall investigate this.
Cheers,
Jeff
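The pg_stat_user_tables check Jeff suggests could look like this (a sketch; `<table_name>` is the thread's placeholder, and `last_vacuum`/`last_autovacuum` are the statistics collector's per-table vacuum timestamps):

```sql
-- Sketch: confirm the nightly manual vacuum is actually completing by
-- checking when the table was last vacuumed, manually and by autovacuum.
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = '<table_name>';
```

If last_vacuum never advances past the night the problem started, the nightly vacuum is not succeeding as assumed.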
Hi Tim,

On 2017-02-17 17:02, Tim Bellis wrote:
> The DELETE operation only deletes rows from the
> previous day. It's possible that there have been rows
> added that day which ought not to be deleted, so
> TRUNCATE wouldn't work.

OK, then I'll try two other suggestions:

- use table partitioning ( https://www.postgresql.org/docs/9.3/static/ddl-partitioning.html )

- if the number of rows you need to keep is small, you could try something like this:

LOCK TABLE yourtable <a suitable LOCKMODE>;
CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE <keep>;
TRUNCATE yourtable;
INSERT INTO yourtable SELECT * from keep;
COMMIT;

Best regards,

	-hannes

> -----Original Message-----
> From: Hannes Erven [mailto:hannes@erven.at]
> Sent: 17 February 2017 11:47
> To: pgsql-general@postgresql.org
> Cc: Tim Bellis <Tim.Bellis@metaswitch.com>
> Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries
>
> Hi Tim,
>
> On 2017-02-15 18:30, Tim Bellis wrote:
>> I have a postgres 9.3.4 database table which (intermittently but
>> reliably) gets into a state where queries get blocked indefinitely
>> [..]
>> Notes:
>> - This database table is used for about 6 million row writes per
>> day, all of which are then deleted at the end of the day.
>
> If you are dumping the contents of the table anyways, why not use TRUNCATE instead of DELETE? It unlinks and recreates the table data files, requiring nearly zero IO and analyzing.
> Or even drop, and recreate the table with the correct new structure so you do not even need to ALTER TABLE?
>
> I'm a bit skeptical of these suggestions since very competent people have already answered your post and did not come up with this... ;-) The only drawback I'm aware of is that TRUNCATE will immediately free disk space on the OS level, so the table's space will not be "blocked". But probably the VACUUM you are currently performing will also eventually release the unused disk space, so this may or may not match the current behaviour.
>
> Best regards,
>
> -hannes
> LOCK TABLE yourtable <a suitable LOCKMODE>;
> CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE <keep>;
> TRUNCATE yourtable;
> INSERT INTO yourtable SELECT * from keep;
> COMMIT;
===
The above snippet assumes TRUNCATE in PG can be in a transaction. In other words, while TRUNCATE by itself is atomic, it can't be rolled back. So in the above case, if "INSERT INTO yourtable SELECT * from keep;" fails and we rollback, will it roll back yourtable?
On Fri, Feb 17, 2017 at 1:38 PM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote:
> LOCK TABLE yourtable <a suitable LOCKMODE>;
> CREATE TEMPORARY TABLE keep AS SELECT * FROM yourtable WHERE <keep>;
> TRUNCATE yourtable;
> INSERT INTO yourtable SELECT * from keep;
> COMMIT;
> ===
> the above snippet assumes truncate in PG can be in a transaction. In other words, while truncate by itself
> is atomic, it can't be rolled back. So in the above case, if "INSERT INTO yourtable SELECT * from keep;" and
> we rollback, will it rollback yourtable.

Yes it can. Truncate has been rollbackable for a while now.

begin;
create table ...
insert into table ...
truncate old table
... something goes wrong ...
rollback;

Unless I misunderstand your meaning.
On Sat, Feb 18, 2017 at 1:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Yes it can. Truncate has been rollbackable for a while now.

Per the docs:
https://www.postgresql.org/docs/9.6/static/sql-truncate.html
"TRUNCATE is transaction-safe with respect to the data in the tables: the truncation will be safely rolled back if the surrounding transaction does not commit."
In short, yes, a transaction doing a TRUNCATE can be rolled back.

-- 
Michael
On 02/17/2017 11:54 PM, Michael Paquier wrote:
> On Sat, Feb 18, 2017 at 1:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Yes it can. Truncate has been rollbackable for a while now.
>
> Per the docs:
> https://www.postgresql.org/docs/9.6/static/sql-truncate.html
> "TRUNCATE is transaction-safe with respect to the data in the tables:
> the truncation will be safely rolled back if the surrounding
> transaction does not commit."
> In short yes a transaction doing a truncate can be rollbacked.

I think the part that confuses people into thinking it can not be rolled back is this:

"TRUNCATE is not MVCC-safe. After truncation, the table will appear empty to concurrent transactions, if they are using a snapshot taken before the truncation occurred. See Section 13.5 for more details."

-- 
Adrian Klaver
adrian.klaver@aklaver.com
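Concretely, the rollback behaviour being discussed looks like this (a sketch, reusing the `yourtable` placeholder from the snippet above):

```sql
BEGIN;
TRUNCATE yourtable;   -- table appears empty within this transaction
-- ... a later statement in the transaction fails, or we decide to abort ...
ROLLBACK;             -- the truncation is undone along with everything else
-- yourtable now contains all of its original rows again
```

The MVCC caveat Adrian quotes still applies: a concurrent transaction reading with an older snapshot may see the table as empty while the truncating transaction is in progress.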
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 16 February 2017 22:40
To: Tim Bellis <Tim.Bellis@metaswitch.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@postgresql.org; Alvaro Herrera <alvherre@2ndquadrant.com>; Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries

> Tim Bellis <Tim.Bellis@metaswitch.com> writes:
>> Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum?
>
> It's not the vacuum that's blocking your read-only queries. It's the ALTER TABLE, which needs an exclusive lock in order to alter the table's schema. The ALTER is queued waiting for the vacuum to finish, and lesser lock requests queue up behind it. We could let the non-exclusive lock requests go ahead of the ALTER, but that would create a severe risk of the ALTER *never* getting to run.

The lock monitoring query (https://wiki.postgresql.org/wiki/Lock_Monitoring) said that the blocking_pid and the current_statement_in_blocking_process for the queries reading the index data was the autovacuum, not the ALTER. Am I reading the output wrong? Does it not correctly represent the chain of locks?

> I'd kill the ALTER and figure on trying again after the vacuum is done.
>
> Also you might want to look into how you got into a situation where you have an anti-wraparound vacuum that's taking so long to run.
> You didn't do something silly like disable autovacuum did you?

No, autovacuum is on (and this is an autovacuum which is in progress). But I will look at why I'm getting a blocking autovacuum.

> regards, tom lane
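A raw look at the lock queue can disambiguate who is waiting on whom -- a sketch (`your_table` is a placeholder; rows with granted = false are queued, which would show the ALTER sitting between the vacuum's granted lock and the waiting reads):

```sql
-- Sketch: inspect every lock requested on the table; the ungranted rows
-- are the wait queue behind whoever currently holds a conflicting lock.
SELECT l.pid, l.mode, l.granted, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'your_table'::regclass
ORDER BY l.granted DESC;
```

The wiki query reports only a blocked/blocking pairing, so it can name the autovacuum (the lock holder) rather than the queued ALTER that the reads are actually stuck behind.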
On Thursday, February 16, 2017, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tim Bellis <Tim.Bellis@metaswitch.com> writes:
>> Even though this is a read only query, is it also expected to be blocked behind the vacuum? Is there a way of getting indexes for a table which won't be blocked behind a vacuum?
>
> It's not the vacuum that's blocking your read-only queries. It's the
> ALTER TABLE, which needs an exclusive lock in order to alter the table's
> schema. The ALTER is queued waiting for the vacuum to finish, and lesser
> lock requests queue up behind it. We could let the non-exclusive lock
> requests go ahead of the ALTER, but that would create a severe risk of the
> ALTER *never* getting to run.
>
> I'd kill the ALTER and figure on trying again after the vacuum is done.

I've been drilled by this and similar lock stacking issues enough times to make me near 100% sure deferring the ALTER would be the better choice.
On Wed, 22 Feb 2017 13:19:11 -0800 Jeff Janes <jeff.janes@gmail.com> wrote: > On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > > > > On Thursday, February 16, 2017, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > >> Tim Bellis <Tim.Bellis@metaswitch.com> writes: > >> > Even though this is a read only query, is it also expected to be > >> blocked behind the vacuum? Is there a way of getting indexes for a table > >> which won't be blocked behind a vacuum? > >> > >> It's not the vacuum that's blocking your read-only queries. It's the > >> ALTER TABLE, which needs an exclusive lock in order to alter the table's > >> schema. The ALTER is queued waiting for the vacuum to finish, and lesser > >> lock requests queue up behind it. We could let the non-exclusive lock > >> requests go ahead of the ALTER, but that would create a severe risk of the > >> ALTER *never* getting to run. > >> > >> I'd kill the ALTER and figure on trying again after the vacuum is done. > >> > >> > > I've been drilled by this and similar lock stacking issues enough times to > > make me near 100% sure deferring the ALTER would be the better choice > > > > > This seems like a rather one-sided observation. How could you know how > often the unimplemented behavior also would have "drilled" you, since it is > unimplemented? > > There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other > requestors jump the queue if they were compatible with the held lock. If > that is implemented, then you would just manually lock the table deferably > before invoking the ALTER TABLE command, if that is the behavior you wanted > (but it wouldn't work for things that can't be run in transactions) This seems redundant to me. We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade script that uses LOCK to explicitly lock tables that it's going to ALTER, then busy-waits if the lock is not immediately grantable. 
The fact that so many ORMs and similar tools don't take advantage of that
functionality is rather depressing.

In my experience, I've also seen heavily loaded systems that this wouldn't
work on, essentially because there is _always_ _some_ lock on every table.
This is a case where experienced developers are required to take some extra
time to coordinate their upgrades to work around the high load. But the
proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because the
ALTER would be deferred indefinitely.

Personally, I feel the existing behavior is preferable. Software teams need
to take the time to understand the locking implications of their actions or
they'll have nothing but trouble anyway. As I've seen time and again:
writing an application that handles low load and low concurrency is fairly
trivial, but scaling that app up to high load and/or high concurrency
generally sorts out the truly brilliant developers from the merely average.

--
Bill Moran <wmoran@potentialtech.com>
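The busy-wait upgrade script described above could be sketched like this (table and column names are placeholders; the DO block keeps the LOCK and the ALTER in one transaction):

```sql
-- Retry LOCK ... NOWAIT until the lock is immediately grantable, so the
-- ALTER never sits in the queue blocking readers behind it.
DO $$
BEGIN
  LOOP
    BEGIN
      LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE NOWAIT;
      EXIT;  -- lock acquired; held until the DO block's transaction ends
    EXCEPTION WHEN lock_not_available THEN
      PERFORM pg_sleep(0.5);  -- back off briefly, then retry
    END;
  END LOOP;

  ALTER TABLE my_table ALTER COLUMN my_column DROP DEFAULT;
END
$$;
```

The point of NOWAIT is that a failed attempt never enters the lock queue, so concurrent readers are never stuck behind a pending ACCESS EXCLUSIVE request while the script waits.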
On Wed, Feb 22, 2017 at 3:19 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Thursday, February 16, 2017, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Tim Bellis <Tim.Bellis@metaswitch.com> writes:
>>> > Even though this is a read only query, is it also expected to be
>>> > blocked behind the vacuum? Is there a way of getting indexes for a table
>>> > which won't be blocked behind a vacuum?
>>>
>>> It's not the vacuum that's blocking your read-only queries. It's the
>>> ALTER TABLE, which needs an exclusive lock in order to alter the table's
>>> schema. The ALTER is queued waiting for the vacuum to finish, and lesser
>>> lock requests queue up behind it. We could let the non-exclusive lock
>>> requests go ahead of the ALTER, but that would create a severe risk of
>>> the ALTER *never* getting to run.
>>>
>>> I'd kill the ALTER and figure on trying again after the vacuum is done.
>>
>> I've been drilled by this and similar lock stacking issues enough times to
>> make me near 100% sure deferring the ALTER would be the better choice
>
> This seems like a rather one-sided observation. How could you know how
> often the unimplemented behavior also would have "drilled" you, since it is
> unimplemented?

Well, that I can't really say, but at least in my case ALTER TABLE in the
face of concurrent application activity can wait, whereas locking tables
against reads for an indefinite period will rapidly destabilize the system.
An example of this usage is replacing partitions on a warehouse table.
About half of my P1s over the last 12 months relate to a locking problem of
some kind, so I end up with workarounds such as issuing "LOCK ... NOWAIT"
in a sleep loop :( or application restructuring, especially minimizing use
of TRUNCATE. I do think instrumentation around locking behaviors would be
helpful.
Allowing waiters to (optionally) leapfrog the queue when their lock request
is compatible with the held lock would be wonderful, as would being able to
specify maximum wait timeouts inside a transaction. FWIW, I'm not sure this
behavior makes sense attached to LOCK; I'd rather see it attached generally
to SET TRANSACTION -- my 0.02$ (talk is cheap, etc).

merlin
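For the second wish, 9.3 did add a per-statement cap: the lock_timeout setting, which aborts any statement that waits longer than the given interval for a lock. Applied to the original problem (names are placeholders), something like:

```sql
-- Bound how long the ALTER may sit in the lock queue. If the timeout
-- fires, the ALTER errors out and leaves the queue, so readers stacked
-- behind it are released instead of waiting for hours.
BEGIN;
SET LOCAL lock_timeout = '2s';
ALTER TABLE my_table ALTER COLUMN my_column DROP DEFAULT;
COMMIT;
```

SET LOCAL scopes the timeout to this transaction, so the rest of the session keeps its default behavior; the script can then retry the ALTER later rather than blocking everything behind a long autovacuum.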