Re: Using multi-row technique with COPY - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Using multi-row technique with COPY |
Date | |
Msg-id | 1133296503.2906.394.camel@localhost.localdomain Whole thread Raw |
In response to | Re: Using multi-row technique with COPY (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: Using multi-row technique with COPY
|
List | pgsql-hackers |
On Tue, 2005-11-29 at 14:17 -0500, Bruce Momjian wrote: > Simon Riggs wrote: > > As a further enhancement, I would also return to the NOLOGGING option > > for COPY. Previously we had said that COPY LOCK was the way to go - > > taking a full table lock to prevent concurrent inserts to a block from a > > COPY that didn't write WAL and another backend which wanted to write WAL > > about that block. With the above suggested all-inserts-at-once > > optimization, it would no longer be a requirement to lock the table. > > That means we can continue to take advantage of the ability to run > > multiple COPY loads into the same table. Avoiding writing WAL will > > further reduce CPU by about 15% and I/O by about 50%. > > > > I would also suggest that pgdump be changed to use the NOLOGGING option > > by default, with an option to work as previously. > > For those who have been around, they know I dislike having options that > 95% of our users desire not be the default behavior. I like that approach and it makes for a great product. > I think the COPY > NOLOGGING idea falls in that category. I would like to explore if there > is a way to have COPY automatically do no logging where possible by > default. see last, below > First, I think NOLOGGING is probably the wrong keyword. I am thinking > SHARE/EXCLUSIVE is best because they are already keywords, and they > explain the effect of the flag on other applications, rather than the > LOGGING capability, which is invisible to applications. > > I am thinking we would have COPY WITH [ [ EXCLUSIVE | SHARE ] [ LOCK ]] ... > EXCLUSIVE lock would be NOLOGGING, SHARE would do logging because other > applications could insert into the table at the same time (and do > UPDATES/DELETES of the inserted rows). I wasn't trying to suggest new wording for COPY LOCK, or whatever it was called previously. My suggestion was to alter the underlying mechanism so that it would not actually need to take a lock at all. So I was trying to find a name that matched that thought.... so your comments above don't really apply.... but I'm more than happy to have others suggest names and will go with the flow. > One idea for default behavior would be to use EXCLUSIVE when the table > is zero size. I think that would do pg_dump and most of the user cases, > and of course users could override the default by using a keyword. We > could emit a NOTICE if an an exclusive lock is used without an EXCLUSIVE > keyword. One problem I see is that there is no way to insure zero size > without a lock that blocks other writers. Is that reliable? I think it would be possible to say: if when a COPY starts the table has zero blocks, then we do a NOLOGGING (change_the_name) mode operation. That gives a problem for recovery. If you lose the server and recover with PITR, that table would come back empty. I'd want to make very sure that I knew whether it was going to be there, or not, and I want it to be my choice. So, I agree with the thought from a automatic performance perspective, but not from a recovery perspective. (There is some irony in being the person to both advocate PITR and to try so hard not to log...) I think of PITR as the default mode of operation now; almost all Oracle, DB2, SQLServer etc users make consistent use of archivelogmode/archive recovery. Best Regards, Simon Riggs
pgsql-hackers by date: