Postgres Query Plan Live Lock - Mailing list pgsql-performance

From Pweaver (Paul Weaver)
Subject Postgres Query Plan Live Lock
Date
Msg-id CAFTGa=mvkpL_mOi4NT4LbB3E1QnqVfQ+QQfgSJBfZ1WHujnvpA@mail.gmail.com
Whole thread Raw
Responses Re: Postgres Query Plan Live Lock
Postgres Query Plan Live Lock
Re: Postgres Query Plan Live Lock
List pgsql-performance
We have been running into a (live lock?) issue on our production Postgres instance causing queries referencing a particular table to become extremely slow and our application to lock up.

This tends to occur on a particular table that gets a lot of queries against it after a large number of deletes. When this happens, the following symptoms occur when queries referencing that table are run (even it we stop the deleting):

SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete
EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete the explain query, the query plan looks reasonable
EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete the explain analyze query, query plan looks reasonable, timing stats says query took sub millisecond time to complete

SELECT * FROM another_table LIMIT 10; -- takes sub millisecond time
EXPLAIN * FROM another_table LIMIT 10; -- takes sub millisecond time, query plan looks reasonable

This behavior only stops and the queries go back to taking sub millisecond time if we take the application issuing the SELECTs offline and wait for the active queries to finish (or terminate them).

There is not a particularly large load on the database machine at the time, neither are there a particularly large number of wal logs being written (although there is a burst of wal log writes immediately after the queue is cleared).

table_name stats:
~ 400,000,000 rows
We are deleting 10,000,000s of rows in 100,000 row increments over a few days time prior/during this slowdown.
Simultaneously a web app is querying this table continuously.

table_name has 4 btree indexes on it (one of which is set to CLUSTER) and one foreign key constraint.

The obvious workaround is to not delete so much data on the table on our production database, but we would like to figure out why Postgres is live locking this table. Do you have any ideas why this is happening and how to prevent it while still doing mass deletes on the table?

-------------------------------------------------------------------------

System information:

Postgres Version - PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
OS - Ubuntu 12.04 LTS

Autovacuum is on.

--------------------------------------------------------------------------

SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default');
             name             |             current_setting              |        source        
------------------------------+------------------------------------------+----------------------
 application_name             | psql                                     | client
 archive_command              | /bin/true                                | configuration file
 archive_mode                 | on                                       | configuration file
 bytea_output                 | escape                                   | configuration file
 checkpoint_completion_target | 0.9                                      | configuration file
 checkpoint_segments          | 24                                       | configuration file
 client_encoding              | UTF8                                     | session
 DateStyle                    | ISO, MDY                                 | configuration file
 default_text_search_config   | pg_catalog.english                       | configuration file
 effective_cache_size         | 54GB                                     | configuration file
 effective_io_concurrency     | 2                                        | configuration file
 listen_addresses             | *                                        | configuration file
 log_checkpoints              | on                                       | configuration file
 log_connections              | on                                       | configuration file
 log_disconnections           | on                                       | configuration file
 log_hostname                 | on                                       | configuration file
 log_line_prefix              | %t                                       | configuration file
 logging_collector            | on                                       | configuration file
 maintenance_work_mem         | 256MB                                    | configuration file
 max_connections              | 600                                      | configuration file
 max_stack_depth              | 2MB                                      | environment variable
 max_wal_senders              | 3                                        | configuration file
 random_page_cost             | 1.75                                     | configuration file
 server_encoding              | UTF8                                     | override
 shared_buffers               | 12GB                                     | configuration file
 synchronous_commit           | off                                      | configuration file
 tcp_keepalives_idle          | 180                                      | configuration file
 track_activity_query_size    | 8192                                     | configuration file
 transaction_deferrable       | off                                      | override
 transaction_isolation        | read committed                           | override
 transaction_read_only        | off                                      | override
 vacuum_freeze_min_age        | 20000000                                 | configuration file
 vacuum_freeze_table_age      | 800000000                                | configuration file
 wal_buffers                  | 16MB                                     | override
 wal_keep_segments            | 16384                                    | configuration file
 wal_level                    | hot_standby                              | configuration file
 wal_writer_delay             | 330ms                                    | configuration file
 work_mem                     | 512MB                                    | configuration file


--
Thank You,

pgsql-performance by date:

Previous
From: Artur Zając CFI
Date:
Subject: Planner estimates and VACUUM/autovacuum
Next
From: Claudio Freire
Date:
Subject: Re: Re: Increased memory utilization by pgsql backend after upgrade from 9.1.3 to 9.2.6