Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?) - Mailing list pgsql-performance

From Dave Held
Subject Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)
Date
Msg-id 49E94D0CFCD4DB43AFBA928DDD20C8F9026184A1@asg002.asg.local
Whole thread Raw
In response to COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)  (Mischa <mischa.Sandberg@telus.net>)
List pgsql-performance
> -----Original Message-----
> From: Mischa [mailto:mischa.Sandberg@telus.net]
> Sent: Wednesday, April 06, 2005 1:47 PM
> To: pgsql-performance@postgresql.org
> Subject: COPY Hacks (WAS: RE: [PERFORM] Postgresql vs
> SQLserver for this
> application ?)
>
> [...]
> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
>
> To my (happy) surprise, I ended up with exactly that number of
> rows in the target table.
>
> Is this a bug? Is this fundamental to the protocol?

Just guessing without looking at the code, I assume that the
server doesn't care if you send your data in lines, words, or
massive blocks.  It simply looks for the newline terminator to
determine end-of-block.  The reason putline works nicely is
probably that it terminates your rows with a newline character.
But as you noticed, you can do that yourself.  I would say that
it's intrinsic to the way I/O is typically done.  It may very
well be that the function that implements COPY never sees when
exactly you make a function call from Perl, but only sees a
buffer getting filled up with data that it needs to parse.  From
that perspective, it's easy to see why you simply need to
properly terminate your rows to get the expected behavior.
Consider COPYing from a file...odds are it doesn't read data
from the file exactly 1 row at a time, but rather some block-
size multiple at a time.  The only way COPY could work correctly
is if it ignored the size of data sent to it and only parsed on
\n boundaries.

> Since it hasn't been documented (but then, "endcopy" isn't
> documented), I've been shy of investing in perf testing such
> mass copy calls. But, if it DOES work, it should be reducing
> the number of network roundtrips.
> [...]

Feel free to use your technique.  I would be *extremely*
surprised if there were a reason it shouldn't work.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

pgsql-performance by date:

Previous
From: William Yu
Date:
Subject: Re: How to improve db performance with $7K?
Next
From: Tom Lane
Date:
Subject: Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)