Andrew Dunstan <andrew@dunslane.net> writes:
> The advantage of having it in COPY is that it can be done serverside direct
> from the file system. For massive bulk loads that might be a plus, although I
> don't know what the protocol+socket overhead is.
Actually, even if you use client-side COPY it's *still* more efficient than any
more general client-side alternative.
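For reference, the client-side COPY path with libpq looks roughly like this (an
untested sketch; the table name and row format are made up):

/* Sketch of client-side COPY: one query starts the COPY, then rows are
 * pushed with PQputCopyData without waiting for per-row acknowledgements. */
#include <stdio.h>
#include <libpq-fe.h>

static int copy_rows(PGconn *conn)
{
    PGresult *res = PQexec(conn, "COPY tab FROM STDIN");

    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    for (int i = 0; i < 100000; i++)
    {
        char line[64];
        int  len = snprintf(line, sizeof(line), "%d\tvalue %d\n", i, i);

        if (PQputCopyData(conn, line, len) != 1)   /* buffered, not round-tripped */
            return -1;
    }

    if (PQputCopyEnd(conn, NULL) != 1)
        return -1;

    res = PQgetResult(conn);       /* one final status for the whole load */
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    PQclear(res);
    return ok;
}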
As Tom pointed out to me a while back, neither the protocol nor libpq allows
having multiple queries in flight simultaneously. That makes it impossible to
stream large quantities of data to the server efficiently: each record costs a
full round trip plus context-switch overhead.
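To make that concrete, about the best a generic client can do today is a loop
like the following (again just a sketch, using the same made-up table); every
PQexecPrepared call blocks for the server's reply before the next row can go out:

/* The per-row alternative, for contrast: a prepared INSERT executed once per
 * record, so each row pays a full network round trip. Same includes as above. */
static void insert_rows(PGconn *conn)
{
    PGresult *res = PQprepare(conn, "ins",
                              "INSERT INTO tab VALUES ($1, $2)", 2, NULL);
    PQclear(res);

    for (int i = 0; i < 100000; i++)
    {
        char        id[16], val[32];
        const char *params[2] = { id, val };

        snprintf(id, sizeof(id), "%d", i);
        snprintf(val, sizeof(val), "value %d", i);

        res = PQexecPrepared(conn, "ins", 2, params, NULL, NULL, 0);
        PQclear(res);              /* blocks: one round trip per row */
    }
}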
In an ideal world the client would be able to queue up enough records to fill
the socket buffers and let the kernel settle into a more batch-oriented pattern
of context switches, where the server processes large numbers of records before
switching back to the client. Ideally this would apply to any kind of query
execution.
But as a kind of shortcut towards this for bulk loading, I'm curious whether it
would be possible to adopt a batch execution mode in which a statement is
prepared and then its parameters are streamed to the server in a COPY-like mode.
It would have to be a format that allows embedded newlines, though; there's just
no point in an interface that can't handle arbitrary data.
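Purely as illustration (none of these batch functions exist in libpq), what I'm
imagining would look something like this from the client side:

/* Hypothetical interface only: PQbeginBatch, PQputBatchRow and PQendBatch do
 * not exist; this just shows the shape of what I mean. Values carry explicit
 * lengths, so embedded newlines (or any bytes at all) are not a problem. */
static void batch_rows(PGconn *conn)
{
    PQclear(PQprepare(conn, "ins", "INSERT INTO tab VALUES ($1, $2)", 2, NULL));

    PQbeginBatch(conn, "ins");           /* hypothetical: enter a COPY-like mode */
    for (int i = 0; i < 100000; i++)
    {
        char        id[16], val[64];
        const char *values[2] = { id, val };
        int         lengths[2];

        lengths[0] = snprintf(id, sizeof(id), "%d", i);
        lengths[1] = snprintf(val, sizeof(val), "line one\nline two %d", i);

        PQputBatchRow(conn, 2, values, lengths);  /* hypothetical: buffered, no round trip */
    }
    PQendBatch(conn);                    /* hypothetical: one result for the whole batch */
}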
Personally I find the current CSV support inadequate. It seems pointless to
support CSV if it can't load data exported from Excel, which seems like the
main use case. But I always thought bulk loading should be handled by an
external application anyway. The problem is that there isn't any interface
suitable for such an application to use.
--
greg