Re: Using multi-row technique with COPY - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Using multi-row technique with COPY
Date
Msg-id 200511292351.jATNprm20699@candle.pha.pa.us
Whole thread Raw
In response to Re: Using multi-row technique with COPY  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Using multi-row technique with COPY  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Please let me back up and ask a more simplistic question.  I understand
the idea of allowing COPY to insert rows with less locking, but I am
wondering about the NOLOGGING idea.  On commit, we must guarantee that
all the rows are in the table, so what advantage is there to a NOLOGGING
option?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Tue, 2005-11-29 at 14:17 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > As a further enhancement, I would also return to the NOLOGGING option
> > > for COPY. Previously we had said that COPY LOCK was the way to go -
> > > taking a full table lock to prevent concurrent inserts to a block from a
> > > COPY that didn't write WAL and another backend which wanted to write WAL
> > > about that block. With the above suggested all-inserts-at-once
> > > optimization, it would no longer be a requirement to lock the table.
> > > That means we can continue to take advantage of the ability to run
> > > multiple COPY loads into the same table. Avoiding writing WAL will
> > > further reduce CPU by about 15% and I/O by about 50%. 
> > > 
> > > I would also suggest that pgdump be changed to use the NOLOGGING option
> > > by default, with an option to work as previously.
> > 
> > For those who have been around, they know I dislike having options that
> > 95% of our users desire not be the default behavior.  
> 
> I like that approach and it makes for a great product.
> 
> > I think the COPY
> > NOLOGGING idea falls in that category.  I would like to explore if there
> > is a way to have COPY automatically do no logging where possible by
> > default.
> 
> see last, below
> 
> > First, I think NOLOGGING is probably the wrong keyword.  I am thinking
> > SHARE/EXCLUSIVE is best because they are already keywords, and they
> > explain the effect of the flag on other applications, rather than the
> > LOGGING capability, which is invisible to applications.
> > 
> > I am thinking we would have COPY WITH [ [ EXCLUSIVE | SHARE ] [ LOCK ]] ...
> > EXCLUSIVE lock would be NOLOGGING, SHARE would do logging because other
> > applications could insert into the table at the same time (and do
> > UPDATES/DELETES of the inserted rows).
> 
> I wasn't trying to suggest new wording for COPY LOCK, or whatever it was
> called previously. My suggestion was to alter the underlying mechanism
> so that it would not actually need to take a lock at all. So I was
> trying to find a name that matched that thought.... so your comments
> above don't really apply.... but I'm more than happy to have others
> suggest names and will go with the flow.
> 
> > One idea for default behavior would be to use EXCLUSIVE when the table
> > is zero size.  I think that would do pg_dump and most of the user cases,
> > and of course users could override the default by using a keyword.  We
> > could emit a NOTICE if an an exclusive lock is used without an EXCLUSIVE
> > keyword.  One problem I see is that there is no way to insure zero size
> > without a lock that blocks other writers.  Is that reliable?
> 
> I think it would be possible to say: if when a COPY starts the table has
> zero blocks, then we do a NOLOGGING (change_the_name) mode operation.
> 
> That gives a problem for recovery. If you lose the server and recover
> with PITR, that table would come back empty. I'd want to make very sure
> that I knew whether it was going to be there, or not, and I want it to
> be my choice. So, I agree with the thought from a automatic performance
> perspective, but not from a recovery perspective.
> 
> (There is some irony in being the person to both advocate PITR and to
> try so hard not to log...)
> 
> I think of PITR as the default mode of operation now; almost all Oracle,
> DB2, SQLServer etc users make consistent use of archivelogmode/archive
> recovery.
> 
> Best Regards, Simon Riggs
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: slow IN() clause for many cases
Next
From: "Luke Lonergan"
Date:
Subject: Re: ice-broker scan thread