Re: COPY locking - Mailing list pgsql-general
From: John Coers
Subject: Re: COPY locking
Msg-id: 3AFAA933.B01F403A@intrinsity.com
In response to: COPY locking (John Coers <coers@intrinsity.com>)
Responses: Re: COPY locking
List: pgsql-general
> No, they all share the same disk buffer in shared memory for whatever is
> currently the last page of the file. Adding a tuple into that page
> requires grabbing a short-term lock on that page (since obviously only
> one process can do it at a time, else they'd be trying to insert into
> the same physical spot). That lock is released as soon as the tuple is
> physically stored in the buffer, but with enough processes running COPY
> into the same table, you'll get contention for the lock.

Ah so! So after each tuple is written to shared memory there is a giant scramble for the lock. That explains all the semops, timer sets, and sleeps in the truss output I've been seeing.

> I suspect the performance issue you're seeing is not so much the need
> for a short-term lock (it's awful hard to see how to do without one,
> anyway) as it is that our current spinlock implementation is pretty bad
> in the face of heavy contention. On most platforms the backoff when
> you can't get the lock on the first try is a ten-millisecond sleep,
> which is huge overkill for locks that are only held for periods of
> microseconds. There was discussion in pghackers a few months ago about
> reimplementing spinlocks in some more modern way (eg, using Posix
> semaphores if available) but no progress has been made yet.

I saw that thread while looking for answers to my problem. Put me in as a "Yay" vote. My use of this system is different from most, I think: I need to cram huge amounts of data in from multiple clients, then pull it all out and process it later, although there will be an occasional query to monitor progress. I don't need fancy queries.

> > Are there any suggested techniques or tweaks I can make to avoid this
> > interference?
>
> Do you really need to run multiple COPYs in parallel, or would
> serializing them be just as good? You could serialize them without
> locking out readers by doing
>
> BEGIN;
> LOCK TABLE foo IN EXCLUSIVE MODE;
> COPY foo FROM ...
> END;
>
> "EXCLUSIVE" mode isn't quite as exclusive as a plain LOCK TABLE; it
> still allows other readers. See "Table-level locks" in the User's
> Guide.

They don't have to be parallel in the strictest sense. As clients get finished with jobs, they will try to connect and upload data. If serializing the process makes it faster, then that's what I'll try, counterintuitive though it may be. Of course, getting my best performance with N=2 was counterintuitive too...

--
John Coers            Intrinsity, Inc.
coers@intrinsity.com  Austin, Texas
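For a client that scripts its uploads, the serialized-COPY recipe quoted above can be generated by a small helper; a minimal sketch, where the helper name and the data-file path are illustrative (the table name `foo` and the statement order come straight from the quoted example):

```python
def serialized_copy_script(table, source):
    """Build the SQL that serializes a COPY against other loaders.

    EXCLUSIVE mode blocks concurrent writers for the duration of the
    transaction but, per the quoted reply, still allows plain readers.
    """
    return (
        "BEGIN;\n"
        f"LOCK TABLE {table} IN EXCLUSIVE MODE;\n"
        f"COPY {table} FROM '{source}';\n"
        "END;\n"
    )

# The path here is a placeholder; feed the result to psql or a driver.
print(serialized_copy_script("foo", "/tmp/foo.dat"), end="")
```

Because the LOCK TABLE is taken before the COPY inside one transaction, each client's load runs to completion before the next one's begins, which is exactly the serialization being suggested.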