Andrew Dunstan <andrew@dunslane.net> writes:
> The advantage of having it in COPY is that it can be done serverside direct
> from the file system. For massive bulk loads that might be a plus, although I
> don't know what the protocol+socket overhead is.
Actually, even if you use client-side COPY it's *still* more efficient than any
more general client-side alternative.
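For reference, the client-side COPY path with libpq looks roughly like this (an
untested sketch; the table name and row format are made up):

/* Sketch of client-side COPY: one query starts the COPY, then rows are
 * pushed with PQputCopyData without waiting for per-row acknowledgements. */
#include <stdio.h>
#include <libpq-fe.h>

static int copy_rows(PGconn *conn)
{
    PGresult *res = PQexec(conn, "COPY tab FROM STDIN");

    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    for (int i = 0; i < 100000; i++)
    {
        char line[64];
        int  len = snprintf(line, sizeof(line), "%d\tvalue %d\n", i, i);

        if (PQputCopyData(conn, line, len) != 1)   /* buffered, not round-tripped */
            return -1;
    }

    if (PQputCopyEnd(conn, NULL) != 1)
        return -1;

    res = PQgetResult(conn);       /* one final status for the whole load */
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    PQclear(res);
    return ok;
}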
As Tom pointed out to me a while back, neither the protocol nor libpq allows
having multiple queries in flight simultaneously. That makes it impossible to
stream large quantities of data to the server efficiently: each record costs a
full round trip plus context-switch overhead.
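To make that concrete, about the best a generic client can do today is a loop
like the following (again just a sketch, using the same made-up table); every
PQexecPrepared call blocks for the server's reply before the next row can go out:

/* The per-row alternative, for contrast: a prepared INSERT executed once per
 * record, so each row pays a full network round trip. Same includes as above. */
static void insert_rows(PGconn *conn)
{
    PGresult *res = PQprepare(conn, "ins",
                              "INSERT INTO tab VALUES ($1, $2)", 2, NULL);
    PQclear(res);

    for (int i = 0; i < 100000; i++)
    {
        char        id[16], val[32];
        const char *params[2] = { id, val };

        snprintf(id, sizeof(id), "%d", i);
        snprintf(val, sizeof(val), "value %d", i);

        res = PQexecPrepared(conn, "ins", 2, params, NULL, NULL, 0);
        PQclear(res);              /* blocks: one round trip per row */
    }
}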
In an ideal world the client would be able to queue up enough records to fill
the socket buffers and let the kernel settle into a more batch-oriented pattern
of context switches, where the server processes large numbers of records before
switching back to the client. Ideally this would apply to any kind of query
execution.
But as a kind of shortcut towards this for bulk loading, I'm curious whether it
would be possible to adopt a batch execution mode in which a statement is
prepared and then its parameters are streamed to the server in a COPY-like mode.
It would have to be a format that allows embedded newlines, though; there's just
no point in an interface that can't handle arbitrary data.
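Purely as illustration (none of these batch functions exist in libpq), what I'm
imagining would look something like this from the client side:

/* Hypothetical interface only: PQbeginBatch, PQputBatchRow and PQendBatch do
 * not exist; this just shows the shape of what I mean. Values carry explicit
 * lengths, so embedded newlines (or any bytes at all) are not a problem. */
static void batch_rows(PGconn *conn)
{
    PQclear(PQprepare(conn, "ins", "INSERT INTO tab VALUES ($1, $2)", 2, NULL));

    PQbeginBatch(conn, "ins");           /* hypothetical: enter a COPY-like mode */
    for (int i = 0; i < 100000; i++)
    {
        char        id[16], val[64];
        const char *values[2] = { id, val };
        int         lengths[2];

        lengths[0] = snprintf(id, sizeof(id), "%d", i);
        lengths[1] = snprintf(val, sizeof(val), "line one\nline two %d", i);

        PQputBatchRow(conn, 2, values, lengths);  /* hypothetical: buffered, no round trip */
    }
    PQendBatch(conn);                    /* hypothetical: one result for the whole batch */
}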
Personally I find the current CSV support inadequate. It seems pointless to
support CSV if it can't load data exported from Excel, which seems like the
main use case. But I always thought bulk loading should be handled by an
external application anyway. The problem is that there isn't any interface
suitable for such an application to use.
--
greg