Thread: Big data INSERT optimization - ExclusiveLock on extension of the table
Hi,
the problem I'm dealing with is long-held locks during extensions of a table:

process xxx still waiting for ExclusiveLock on extension of relation xxx of database xxx after 3000.158 ms

My application is write-intensive; in one round I need to insert about 1M rows. The general scheme of the process looks as follows:

1. rename table t01 to t02
2. insert into t02 1M rows in chunks of about 100k
3. from t01 (previously loaded table) insert data through a stored procedure to b01 - this happens in parallel in over a dozen sessions
4. truncate t01

Some data:
PostgreSQL version 9.5

commit_delay                 | 0         | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
checkpoint_completion_target | 0.9       | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
maintenance_work_mem         | 2GB       | Sets the maximum memory to be used for maintenance operations.
shared_buffers               | 2GB       |
wal_block_size               | 8192      | Shows the block size in the write ahead log.
wal_buffers                  | 16MB      | Sets the number of disk-page buffers in shared memory for WAL.
wal_compression              | off       | Compresses full-page writes written in WAL file.
wal_keep_segments            | 0         | Sets the number of WAL files held for standby servers.
wal_level                    | minimal   | Sets the level of information written to the WAL.
wal_log_hints                | off       | Writes full pages to WAL when first modified after a checkpoint, even for non-critical modifications.
wal_receiver_status_interval | 10s       | Sets the maximum interval between WAL receiver status reports to the primary.
wal_receiver_timeout         | 1min      | Sets the maximum wait time to receive data from the primary.
wal_retrieve_retry_interval  | 5s        | Sets the time to wait before retrying to retrieve WAL after a failed attempt.
wal_segment_size             | 16MB      | Shows the number of pages per write ahead log segment.
wal_sender_timeout           | 1min      | Sets the maximum time to wait for WAL replication.
wal_sync_method              | fdatasync | Selects the method used for forcing WAL updates to disk.
wal_writer_delay             | 200ms     | WAL writer sleep time between WAL flushes.
work_mem                     | 32MB      | Sets the maximum memory to be used for query workspaces.

Checkpoints occur every ~30 sec.

Following the advice from this mailing list, shared_buffers was changed from 12GB to 2GB, but nothing has changed.

I'm not sure whether my bottleneck is the I/O subsystem or whether there is anything else I can do to make it faster. What I came up with is (but I'm not sure if any of this makes sense):

* change settings for bgwriter/wal?
* make sure huge pages are in use by changing the huge_pages parameter to on
* replace truncate with DROP/CREATE commands?
* turn off fsync for loading?
* increase the commit_delay value?
* move temporary tables to a different tablespace

Your advice or suggestions will be much appreciated.
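In case it makes some of the bullets above more concrete, this is roughly what they would look like as settings - only an untested sketch, not a recommendation; the tablespace name is made up, and fsync = off is only acceptable for data that can be reloaded from scratch:

-- Untested sketch of some of the bulleted ideas as ALTER SYSTEM settings
-- (run as superuser; writes postgresql.auto.conf).
ALTER SYSTEM SET huge_pages = 'on';              -- needs a server restart; 'try' is the safer variant
ALTER SYSTEM SET commit_delay = 10000;           -- microseconds; only helps when many sessions commit concurrently
ALTER SYSTEM SET temp_tablespaces = 'fast_temp'; -- 'fast_temp' is a hypothetical tablespace on faster storage
-- ALTER SYSTEM SET fsync = off;                 -- DANGEROUS: a crash can corrupt the whole cluster
SELECT pg_reload_conf();                         -- reload for the settings that don't need a restart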
On 8/17/16 6:45 AM, pinker wrote:
> 1. rename table t01 to t02

OK...

> 2. insert into t02 1M rows in chunks of about 100k

Why not just insert into t01??

> 3. from t01 (previously loaded table) insert data through stored procedure

But you renamed t01 so it no longer exists???

> to b01 - this happens in parallel in over a dozen sessions

b01?

> 4. truncate t01

Huh??

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>> 1. rename table t01 to t02
> OK...
>> 2. insert into t02 1M rows in chunks of about 100k
> Why not just insert into t01??

Because of CPU utilization; it speeds up when the load is divided.

>> 3. from t01 (previously loaded table) insert data through stored procedure
> But you renamed t01 so it no longer exists???
>> to b01 - this happens in parallel in over a dozen sessions
> b01?

That's another table - a permanent one.

>> 4. truncate t01
> Huh??

The data were inserted into permanent storage, so the temporary table can be truncated and reused.

Ok, maybe the process is not so important; let's say the table is loaded, then the data are fetched and reloaded into another table through a stored procedure (with its logic), then the table is truncated and the process starts again. The most important part is that the ExclusiveLocks are held for ~1-5 s.
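For what it's worth, the moments when sessions are stuck on that lock can be watched live with a query against pg_locks - just a monitoring sketch (on 9.5 there is no wait_event column in pg_stat_activity yet, so this is the simplest view of it):

-- Sessions currently waiting on a relation extension lock
SELECT l.pid,
       l.relation::regclass AS rel,
       a.query,
       now() - a.query_start AS waiting_query_runtime
FROM   pg_locks l
JOIN   pg_stat_activity a ON a.pid = l.pid
WHERE  l.locktype = 'extend'
  AND  NOT l.granted;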
On 8/18/16 5:26 PM, pinker wrote:
>>> 1. rename table t01 to t02
>> OK...
>>> 2. insert into t02 1M rows in chunks of about 100k
>> Why not just insert into t01??
>
> Because of CPU utilization; it speeds up when the load is divided.

That still doesn't explain why you renamed t01 to t02.

>>> 3. from t01 (previously loaded table) insert data through stored procedure
>> But you renamed t01 so it no longer exists???
>>> to b01 - this happens in parallel in over a dozen sessions
>> b01?
>
> That's another table - a permanent one.
>
>>> 4. truncate t01
>> Huh??
>
> The data were inserted into permanent storage, so the temporary table can be
> truncated and reused.

Except t01 doesn't exist anymore...

> Ok, maybe the process is not so important; let's say the table is loaded,
> then the data are fetched and reloaded into another table through a stored
> procedure (with its logic), then the table is truncated and the process
> starts again. The most important part is that the ExclusiveLocks are held
> for ~1-5 s.

The process is important though, because AFAIK the only things that block the extension lock are another process extending the relation, vacuum, or something trying to record information about free space when an FSM page doesn't exist.

Is there something else doing inserts into the table at the same time? Is something doing a bunch of updates or deletes on pages that are newly inserted?

BTW, there were improvements made to relation extension in 9.6, so if you have some way to test this on 9.6 it would be useful to know whether it's still a problem or not.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
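One cheap way to answer those questions is to snapshot the per-table statistics counters before and after a single load round; growth in updates/deletes, or a vacuum timestamp inside the window, points at the extra activity being asked about. A sketch only - the table names are the ones used earlier in the thread:

-- Run before and after one load round and compare the counters
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd,
       last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname IN ('t01', 't02', 'b01');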
Re: Big data INSERT optimization - ExclusiveLock on extension of the table
From: Merlin Moncure
On Wed, Aug 17, 2016 at 6:45 AM, pinker <pinker@onet.eu> wrote:
> Hi,
> the problem I'm dealing with is long-held locks during extensions of a table:
>
> process xxx still waiting for ExclusiveLock on extension of relation xxx of
> database xxx after 3000.158 ms
>
> My application is write-intensive; in one round I need to insert about 1M
> rows. The general scheme of the process looks as follows:
> 1. rename table t01 to t02
> 2. insert into t02 1M rows in chunks of about 100k
> 3. from t01 (previously loaded table) insert data through a stored procedure
> to b01 - this happens in parallel in over a dozen sessions
> 4. truncate t01
>
> [configuration listing snipped - see the original post above]
>
> Checkpoints occur every ~30 sec.
>
> Following the advice from this mailing list, shared_buffers was changed from
> 12GB to 2GB, but nothing has changed.
>
> I'm not sure whether my bottleneck is the I/O subsystem or whether there is
> anything else I can do to make it faster. What I came up with is (but I'm
> not sure if any of this makes sense):
> * change settings for bgwriter/wal?
> * make sure huge pages are in use by changing the huge_pages parameter to on
> * replace truncate with DROP/CREATE commands?
> * turn off fsync for loading?
> * increase the commit_delay value?
> * move temporary tables to a different tablespace
>
> Your advice or suggestions will be much appreciated.

Here's how I do it:

CREATE TABLE t_new (LIKE t INCLUDING ALL);
<insert from n threads to t_new>
BEGIN;
DROP TABLE t;
ALTER TABLE t_new RENAME TO t;
<recreate views etc as needed>
COMMIT;

If moving multiple tables in a single transaction I do a looped lock probe with NOWAIT to avoid deadlocks. Postgres deadlock resolution behavior is such that longer-running processes seem to get killed first; in these scenarios it seems to almost always kill the one you *don't* want killed :-).
This strategy will even work in complicated scenarios, for example partitioned tables; you can build up the partition on the side and swap the new one in over the old one in a transaction.

The above is all about avoiding locks. If your problem is i/o bound, here are some general strategies to improve insert performance:

*) UNLOGGED tables (beware: no data on spurious restart)
*) synchronous_commit = false
*) ensure shared_buffers is high enough (too low and you get checkpoints)

merlin
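The looped NOWAIT probe Merlin mentions could look roughly like this - a sketch only, using the t/t_new names from his example; it has to run inside the same transaction as the subsequent DROP/RENAME so the acquired locks are kept:

-- Inside the swap transaction, before DROP TABLE t / ALTER TABLE t_new RENAME TO t:
DO $$
DECLARE
    got_locks boolean := false;
BEGIN
    WHILE NOT got_locks LOOP
        BEGIN
            -- Grab both tables at once; if either is busy, fail immediately
            -- instead of queueing behind other lockers (and possibly deadlocking).
            LOCK TABLE t, t_new IN ACCESS EXCLUSIVE MODE NOWAIT;
            got_locks := true;
        EXCEPTION WHEN lock_not_available THEN
            PERFORM pg_sleep(0.1);  -- back off briefly, then retry
        END;
    END LOOP;
END
$$;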