Re: COPY locking - Mailing list pgsql-general

From John Coers
Subject Re: COPY locking
Date
Msg-id 3AFAA933.B01F403A@intrinsity.com
Whole thread Raw
In response to COPY locking  (John Coers <coers@intrinsity.com>)
Responses Re: COPY locking
List pgsql-general
> No, they all share the same disk buffer in shared memory for whatever is
> currently the last page of the file.  Adding a tuple into that page
> requires grabbing a short-term lock on that page (since obviously only
> one process can do it at a time, else they'd be trying to insert into
> the same physical spot).  That lock is released as soon as the tuple is
> physically stored in the buffer, but with enough processes running COPY
> into the same table, you'll get contention for the lock.
Ah so!  So after each tuple is written to shared memory there is a giant
scramble for the lock.  Explains all the semops and timer sets and sleeps
in the truss output that I've been seeing.


> I suspect the performance issue you're seeing is not so much the need
> for a short-term lock (it's awful hard to see how to do without one,
> anyway) as it is that our current spinlock implementation is pretty bad
> in the face of heavy contention.  On most platforms the backoff when
> you can't get the lock on the first try is a ten-millisecond sleep,
> which is huge overkill for locks that are only held for periods of
> microseconds.  There was discussion in pghackers a few months ago about
> reimplementing spinlocks in some more modern way (eg, using Posix
> semaphores if available) but no progress has been made yet.

I saw that thread while looking for answers to my problem.  Put me in as a
"Yay" vote.  My use of this system is different from most I think.  I need to cram
huge amounts of data in from multiple clients, then I'll pull it all out and
process it later although there will be an occasional query to monitor progress.
I don't need fancy queries.


> > Are there any suggested techniques or tweaks I can make to avoid this
> > interference?
>
> Do you really need to run multiple COPYs in parallel, or would
> serializing them be just as good?  You could serialize them without
> locking out readers by doing
>
>         BEGIN;
>         LOCK TABLE foo IN EXCLUSIVE MODE;
>         COPY foo FROM ...
>         END;
>
> "EXCLUSIVE" mode isn't quite as exclusive as a plain LOCK TABLE; it
> still allows other readers.  See "Table-level locks" in the User's
> Guide.
>

They don't have to be parallel in the strictest sense.  As clients get
finished with jobs they will try to connect and upload data.  If serializing
the process makes it faster, then that's what I'll try.  Counterintuitive though
it may be.  Of course getting my best performance with N=2 was counterintuitive too...



--
John Coers            Intrinsity, Inc.
coers@intrinsity.com  Austin, Texas

pgsql-general by date:

Previous
From: Justin Clift
Date:
Subject: Re: ER diagrams
Next
From: Tom Lane
Date:
Subject: Re: GRANT