Thread: lock contention on parallel COPY ?

lock contention on parallel COPY ?

From
Stefan Kaltenbrunner
Date:
I'm currently playing with Andrews parallel restore patch and it seems 
that pg is far from taking advantage of the hardware I have for testing 
(Dual Quad Core Xeon attached to Netapp with 68 spindles). with a 
concurrency of 4 - I see iowait hovering ~1% CPU load at 20% peak and 
around 150000 contextswitches/s. The load seems to progress at around 
the same rate as a single backend restore does on the same box. a 
profile during the load (a fair amount of tables sized ~10-30M rows 
each) looks fairly similiar to:


samples  %        symbol name
1933314  21.8884  LWLockAcquire
1677808  18.9957  XLogInsert
848227    9.6034  LWLockRelease
414179    4.6892  DoCopy
332633    3.7660  CopyReadLine
266580    3.0181  UnpinBuffer
221693    2.5099  heap_formtuple
176939    2.0033  .plt
171842    1.9455  PinBuffer
160470    1.8168  GetNewObjectId
154095    1.7446  heap_insert
151813    1.7188  s_lock
117849    1.3343  LockBuffer
109530    1.2401  hash_search_with_hash_value
102169    1.1567  PageAddItem
91151     1.0320  pg_verify_mbstr_len
82538     0.9345  CopyGetData


using --truncate-before-load seems to help a bit but it still seems to 
only barely utilizing the available resources.


Stefan


Re: lock contention on parallel COPY ?

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> samples  %        symbol name
> 1933314  21.8884  LWLockAcquire
> 1677808  18.9957  XLogInsert
> 848227    9.6034  LWLockRelease
> 414179    4.6892  DoCopy
> 332633    3.7660  CopyReadLine
> 266580    3.0181  UnpinBuffer
> 221693    2.5099  heap_formtuple

I suppose Andrew didn't yet put in the hack to avoid WAL logging
during the COPY commands.  The LWLockAcquires are presumably
blocking on WALInsertLock, given that XLogInsert is also right
up there ...
        regards, tom lane


Re: lock contention on parallel COPY ?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>   
>> samples  %        symbol name
>> 1933314  21.8884  LWLockAcquire
>> 1677808  18.9957  XLogInsert
>> 848227    9.6034  LWLockRelease
>> 414179    4.6892  DoCopy
>> 332633    3.7660  CopyReadLine
>> 266580    3.0181  UnpinBuffer
>> 221693    2.5099  heap_formtuple
>>     
>
> I suppose Andrew didn't yet put in the hack to avoid WAL logging
> during the COPY commands.  The LWLockAcquires are presumably
> blocking on WALInsertLock, given that XLogInsert is also right
> up there ...
>
>             
>   


Yes I did. That's what the --truncate-before-load switch does (or should 
do).

cheers

andrew


Re: lock contention on parallel COPY ?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I suppose Andrew didn't yet put in the hack to avoid WAL logging

> Yes I did. That's what the --truncate-before-load switch does (or should 
> do).

Well, it doesn't seem to be having any effect.  Maybe Stefan is testing
a configuration with xlog archiving enabled?
        regards, tom lane


Re: lock contention on parallel COPY ?

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> I suppose Andrew didn't yet put in the hack to avoid WAL logging
> 
>> Yes I did. That's what the --truncate-before-load switch does (or should 
>> do).
> 
> Well, it doesn't seem to be having any effect.  Maybe Stefan is testing
> a configuration with xlog archiving enabled?


heh no log archiving - I actually said that I'm now playing with 
--truncate-before-load which seems to cause a noticeable performance (as 
in IO generated) increase but I still see >130000 context switches/s and 
a profile that looks like:


samples  %        symbol name
55526    16.5614  LWLockAcquire
29721     8.8647  DoCopy
26581     7.9281  CopyReadLine
25105     7.4879  LWLockRelease
15743     4.6956  PinBuffer
14725     4.3919  heap_formtuple
9748      2.9075  GetNewObjectId
8059      2.4037  pg_verify_mbstr_len
6825      2.0356  hash_search_with_hash_value
6386      1.9047  s_lock
5645      1.6837  heap_insert
5631      1.6795  PageAddItem
4723      1.4087  pg_atoi


Stefan


Re: lock contention on parallel COPY ?

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> heh no log archiving - I actually said that I'm now playing with 
> --truncate-before-load which seems to cause a noticeable performance (as 
> in IO generated) increase but I still see >130000 context switches/s and 
> a profile that looks like:

> samples  %        symbol name
> 55526    16.5614  LWLockAcquire
> 29721     8.8647  DoCopy
> 26581     7.9281  CopyReadLine
> 25105     7.4879  LWLockRelease
> 15743     4.6956  PinBuffer
> 14725     4.3919  heap_formtuple

Still a lot of contention for something, then.  You might try turning on
LWLOCK_STATS (this only requires recompiling storage/lmgr/lwlock.c) to
get some evidence about what.
        regards, tom lane


Re: lock contention on parallel COPY ?

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> heh no log archiving - I actually said that I'm now playing with 
>> --truncate-before-load which seems to cause a noticeable performance (as 
>> in IO generated) increase but I still see >130000 context switches/s and 
>> a profile that looks like:
> 
>> samples  %        symbol name
>> 55526    16.5614  LWLockAcquire
>> 29721     8.8647  DoCopy
>> 26581     7.9281  CopyReadLine
>> 25105     7.4879  LWLockRelease
>> 15743     4.6956  PinBuffer
>> 14725     4.3919  heap_formtuple
> 
> Still a lot of contention for something, then.  You might try turning on
> LWLOCK_STATS (this only requires recompiling storage/lmgr/lwlock.c) to
> get some evidence about what.

that one generates a huge amount of logs - output for ~60s into the load 
is available here:

http://www.kaltenbrunner.cc/files/lwstats.txt (21MB!)


Stefan


Re: lock contention on parallel COPY ?

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> that one generates a huge amount of logs - output for ~60s into the load 
> is available here:
> http://www.kaltenbrunner.cc/files/lwstats.txt (21MB!)

Huh ... essentially all the contention is for OidGenLock.  I take it
you're loading into a table that has OIDs?
        regards, tom lane


Re: lock contention on parallel COPY ?

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> that one generates a huge amount of logs - output for ~60s into the load 
>> is available here:
>> http://www.kaltenbrunner.cc/files/lwstats.txt (21MB!)
> 
> Huh ... essentially all the contention is for OidGenLock.  I take it
> you're loading into a table that has OIDs?

hmm that particular database has its ancient roots on 7.3 (or 7.4) and 
indeed it seems that some tables in the dump are preceeded by a SET 
default_with_oids = true;

Will go and elimited those and try again.


Stefan


Re: lock contention on parallel COPY ?

From
Simon Riggs
Date:
On Fri, 2008-09-26 at 12:38 -0400, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> > heh no log archiving - I actually said that I'm now playing with 
> > --truncate-before-load which seems to cause a noticeable performance (as 
> > in IO generated) increase but I still see >130000 context switches/s and 
> > a profile that looks like:
> 
> > samples  %        symbol name
> > 55526    16.5614  LWLockAcquire
> > 29721     8.8647  DoCopy
> > 26581     7.9281  CopyReadLine
> > 25105     7.4879  LWLockRelease
> > 15743     4.6956  PinBuffer
> > 14725     4.3919  heap_formtuple
> 
> Still a lot of contention for something, then.  You might try turning on
> LWLOCK_STATS (this only requires recompiling storage/lmgr/lwlock.c) to
> get some evidence about what.

Probably loading a table with a generated PK or loading data in
ascending sequence, so its contending heavily for the rightmost edge of
the index.

We need to load data a block at a time and buffer the inserts into the
index also, so we don't need to lock/unlock per row.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: lock contention on parallel COPY ?

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> samples  %        symbol name
>> 55526    16.5614  LWLockAcquire
>> 29721     8.8647  DoCopy
>> 26581     7.9281  CopyReadLine
>> 25105     7.4879  LWLockRelease
>> 15743     4.6956  PinBuffer
>> 14725     4.3919  heap_formtuple

> Probably loading a table with a generated PK or loading data in
> ascending sequence, so its contending heavily for the rightmost edge of
> the index.

No, given that DoCopy and CopyReadLine are right up there, I think we're
still looking at the COPY phase, not index building.

The profile will probably change completely once index building
starts...
        regards, tom lane


Re: lock contention on parallel COPY ?

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>> samples  %        symbol name
>>> 55526    16.5614  LWLockAcquire
>>> 29721     8.8647  DoCopy
>>> 26581     7.9281  CopyReadLine
>>> 25105     7.4879  LWLockRelease
>>> 15743     4.6956  PinBuffer
>>> 14725     4.3919  heap_formtuple
> 
>> Probably loading a table with a generated PK or loading data in
>> ascending sequence, so its contending heavily for the rightmost edge of
>> the index.
> 
> No, given that DoCopy and CopyReadLine are right up there, I think we're
> still looking at the COPY phase, not index building.
> 
> The profile will probably change completely once index building
> starts...

yeah this profile is only showing the COPY phase ...

Stefan


Re: lock contention on parallel COPY ?

From
Simon Riggs
Date:
On Fri, 2008-09-26 at 14:00 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> >> samples  %        symbol name
> >> 55526    16.5614  LWLockAcquire
> >> 29721     8.8647  DoCopy
> >> 26581     7.9281  CopyReadLine
> >> 25105     7.4879  LWLockRelease
> >> 15743     4.6956  PinBuffer
> >> 14725     4.3919  heap_formtuple
> 
> > Probably loading a table with a generated PK or loading data in
> > ascending sequence, so its contending heavily for the rightmost edge of
> > the index.
> 
> No, given that DoCopy and CopyReadLine are right up there, I think we're
> still looking at the COPY phase, not index building.
> 
> The profile will probably change completely once index building
> starts...

Sorry, was assuming we were loading with indexes on, which is wrong.

Agree the profile looks odd.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: lock contention on parallel COPY ?

From
Simon Riggs
Date:
On Fri, 2008-09-26 at 20:07 +0200, Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> >> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> >>> samples  %        symbol name
> >>> 55526    16.5614  LWLockAcquire
> >>> 29721     8.8647  DoCopy
> >>> 26581     7.9281  CopyReadLine
> >>> 25105     7.4879  LWLockRelease
> >>> 15743     4.6956  PinBuffer
> >>> 14725     4.3919  heap_formtuple
> >
> >> Probably loading a table with a generated PK or loading data in
> >> ascending sequence, so its contending heavily for the rightmost edge of
> >> the index.
> >
> > No, given that DoCopy and CopyReadLine are right up there, I think we're
> > still looking at the COPY phase, not index building.
> >
> > The profile will probably change completely once index building
> > starts...
>
> yeah this profile is only showing the COPY phase ...

Try using this Postgres core patch. It's a updated version of my
fast_copy.v4.patch from Patches: Bulk Insert tuning 20 Mar 2008

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment