Thread: Big data INSERT optimization - ExclusiveLock on extension of the table
Hi,
the problem I'm dealing with is long-held locks during extensions of a table:

process xxx still waiting for ExclusiveLock on extension of relation xxx of database xxx after 3000.158 ms

My application is write-intensive; in one round I need to insert about 1M rows. The general scheme of the process looks as follows:

1. rename table t01 to t02
2. insert into t02 1M rows in chunks of about 100k
3. from t01 (previously loaded table) insert data through a stored procedure to b01 - this happens in parallel in over a dozen sessions
4. truncate t01

Some data:
PostgreSQL version 9.5

commit_delay                 | 0         | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
checkpoint_completion_target | 0.9       | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
maintenance_work_mem         | 2GB       | Sets the maximum memory to be used for maintenance operations.
shared_buffers               | 2GB       |
wal_block_size               | 8192      | Shows the block size in the write ahead log.
wal_buffers                  | 16MB      | Sets the number of disk-page buffers in shared memory for WAL.
wal_compression              | off       | Compresses full-page writes written in WAL file.
wal_keep_segments            | 0         | Sets the number of WAL files held for standby servers.
wal_level                    | minimal   | Sets the level of information written to the WAL.
wal_log_hints                | off       | Writes full pages to WAL when first modified after a checkpoint, even for non-critical modifications.
wal_receiver_status_interval | 10s       | Sets the maximum interval between WAL receiver status reports to the primary.
wal_receiver_timeout         | 1min      | Sets the maximum wait time to receive data from the primary.
wal_retrieve_retry_interval  | 5s        | Sets the time to wait before retrying to retrieve WAL after a failed attempt.
wal_segment_size             | 16MB      | Shows the number of pages per write ahead log segment.
wal_sender_timeout           | 1min      | Sets the maximum time to wait for WAL replication.
wal_sync_method              | fdatasync | Selects the method used for forcing WAL updates to disk.
wal_writer_delay             | 200ms     | WAL writer sleep time between WAL flushes.
work_mem                     | 32MB      | Sets the maximum memory to be used for query workspaces.

Checkpoints occur every ~30 sec.

Following the advice from this mailing list, shared_buffers was changed from 12GB to 2GB, but nothing has changed.

I'm not sure whether my bottleneck is the I/O subsystem or whether there is anything else I can do to make it faster. What I came up with is (but I'm not sure if any of this makes sense):

* change settings for bgwriter/wal?
* make sure huge pages are in use by changing the huge_pages parameter to on
* replace truncate with DROP/CREATE commands?
* turn off fsync for loading?
* increase the commit_delay value?
* move temporary tables to a different tablespace

Your advice or suggestions will be much appreciated.
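In case it makes some of the bullets above more concrete, this is roughly what they would look like as settings - only an untested sketch, not a recommendation; the tablespace name is made up, and fsync = off is only acceptable for data that can be reloaded from scratch:

-- Untested sketch of some of the bulleted ideas as ALTER SYSTEM settings
-- (run as superuser; writes postgresql.auto.conf).
ALTER SYSTEM SET huge_pages = 'on';              -- needs a server restart; 'try' is the safer variant
ALTER SYSTEM SET commit_delay = 10000;           -- microseconds; only helps when many sessions commit concurrently
ALTER SYSTEM SET temp_tablespaces = 'fast_temp'; -- 'fast_temp' is a hypothetical tablespace on faster storage
-- ALTER SYSTEM SET fsync = off;                 -- DANGEROUS: a crash can corrupt the whole cluster
SELECT pg_reload_conf();                         -- reload for the settings that don't need a restart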
On 8/17/16 6:45 AM, pinker wrote:
> 1. rename table t01 to t02

OK...

> 2. insert into t02 1M rows in chunks of about 100k

Why not just insert into t01??

> 3. from t01 (previously loaded table) insert data through stored procedure

But you renamed t01 so it no longer exists???

> to b01 - this happens in parallel in over a dozen sessions

b01?

> 4. truncate t01

Huh??

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>> 1. rename table t01 to t02
> OK...
>> 2. insert into t02 1M rows in chunks of about 100k
> Why not just insert into t01??

Because of CPU utilization; it speeds up when the load is divided.

>> 3. from t01 (previously loaded table) insert data through stored procedure
> But you renamed t01 so it no longer exists???
>> to b01 - this happens in parallel in over a dozen sessions
> b01?

That's another table - a permanent one.

>> 4. truncate t01
> Huh??

The data were inserted into permanent storage, so the temporary table can be truncated and reused.

Ok, maybe the process is not so important; let's say the table is loaded, then the data are fetched and reloaded into another table through a stored procedure (with its logic), then the table is truncated and the process starts again. The most important part is that the ExclusiveLocks are held for ~1-5 s.
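For what it's worth, the moments when sessions are stuck on that lock can be watched live with a query against pg_locks - just a monitoring sketch (on 9.5 there is no wait_event column in pg_stat_activity yet, so this is the simplest view of it):

-- Sessions currently waiting on a relation extension lock
SELECT l.pid,
       l.relation::regclass AS rel,
       a.query,
       now() - a.query_start AS waiting_query_runtime
FROM   pg_locks l
JOIN   pg_stat_activity a ON a.pid = l.pid
WHERE  l.locktype = 'extend'
  AND  NOT l.granted;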
On 8/18/16 5:26 PM, pinker wrote:
>>> 1. rename table t01 to t02
>> OK...
>>> 2. insert into t02 1M rows in chunks of about 100k
>> Why not just insert into t01??
>
> Because of CPU utilization; it speeds up when the load is divided.

That still doesn't explain why you renamed t01 to t02.

>>> 3. from t01 (previously loaded table) insert data through stored procedure
>> But you renamed t01 so it no longer exists???
>>> to b01 - this happens in parallel in over a dozen sessions
>> b01?
>
> That's another table - a permanent one.
>
>>> 4. truncate t01
>> Huh??
>
> The data were inserted into permanent storage, so the temporary table can be
> truncated and reused.

Except t01 doesn't exist anymore...

> Ok, maybe the process is not so important; let's say the table is loaded,
> then the data are fetched and reloaded into another table through a stored
> procedure (with its logic), then the table is truncated and the process
> starts again. The most important part is that the ExclusiveLocks are held
> for ~1-5 s.

The process is important though, because AFAIK the only things that block the extension lock are another process extending the relation, vacuum, or something trying to record information about free space when an FSM page doesn't exist.

Is there something else doing inserts into the table at the same time? Is something doing a bunch of updates or deletes on pages that are newly inserted?

BTW, there were improvements made to relation extension in 9.6, so if you have some way to test this on 9.6 it would be useful to know whether it's still a problem or not.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
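One cheap way to answer those questions is to snapshot the per-table statistics counters before and after a single load round; growth in updates/deletes, or a vacuum timestamp inside the window, points at the extra activity being asked about. A sketch only - the table names are the ones used earlier in the thread:

-- Run before and after one load round and compare the counters
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd,
       last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname IN ('t01', 't02', 'b01');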
Re: Big data INSERT optimization - ExclusiveLock on extension of the table
From: Merlin Moncure
On Wed, Aug 17, 2016 at 6:45 AM, pinker <pinker@onet.eu> wrote:
> Hi,
> the problem I'm dealing with is long-held locks during extensions of a table:
>
> process xxx still waiting for ExclusiveLock on extension of relation xxx of
> database xxx after 3000.158 ms
>
> My application is write-intensive; in one round I need to insert about 1M
> rows. The general scheme of the process looks as follows:
> 1. rename table t01 to t02
> 2. insert into t02 1M rows in chunks of about 100k
> 3. from t01 (previously loaded table) insert data through a stored procedure
> to b01 - this happens in parallel in over a dozen sessions
> 4. truncate t01
>
> [configuration listing snipped - see the original post above]
>
> Checkpoints occur every ~30 sec.
>
> Following the advice from this mailing list, shared_buffers was changed from
> 12GB to 2GB, but nothing has changed.
>
> I'm not sure whether my bottleneck is the I/O subsystem or whether there is
> anything else I can do to make it faster. What I came up with is (but I'm
> not sure if any of this makes sense):
> * change settings for bgwriter/wal?
> * make sure huge pages are in use by changing the huge_pages parameter to on
> * replace truncate with DROP/CREATE commands?
> * turn off fsync for loading?
> * increase the commit_delay value?
> * move temporary tables to a different tablespace
>
> Your advice or suggestions will be much appreciated.

Here's how I do it:

CREATE TABLE t_new (LIKE t INCLUDING ALL);
<insert from n threads to t_new>
BEGIN;
DROP TABLE t;
ALTER TABLE t_new RENAME TO t;
<recreate views etc as needed>
COMMIT;

If moving multiple tables in a single transaction I do a looped lock probe with NOWAIT to avoid deadlocks. Postgres deadlock resolution behavior is such that longer-running processes seem to get killed first; in these scenarios it seems to almost always kill the one you *don't* want killed :-).
This strategy will even work in complicated scenarios, for example partitioned tables; you can build up the partition on the side and swap the new one in over the old one in a transaction.

The above is all about avoiding locks. If your problem is i/o bound, here are some general strategies to improve insert performance:

*) UNLOGGED tables (beware: no data on spurious restart)
*) synchronous_commit = false
*) ensure shared_buffers is high enough (too low and you get checkpoints)

merlin
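The looped NOWAIT probe Merlin mentions could look roughly like this - a sketch only, using the t/t_new names from his example; it has to run inside the same transaction as the subsequent DROP/RENAME so the acquired locks are kept:

-- Inside the swap transaction, before DROP TABLE t / ALTER TABLE t_new RENAME TO t:
DO $$
DECLARE
    got_locks boolean := false;
BEGIN
    WHILE NOT got_locks LOOP
        BEGIN
            -- Grab both tables at once; if either is busy, fail immediately
            -- instead of queueing behind other lockers (and possibly deadlocking).
            LOCK TABLE t, t_new IN ACCESS EXCLUSIVE MODE NOWAIT;
            got_locks := true;
        EXCEPTION WHEN lock_not_available THEN
            PERFORM pg_sleep(0.1);  -- back off briefly, then retry
        END;
    END LOOP;
END
$$;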