Re: raw output from copy - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: raw output from copy
Date	April 8, 2016 19:14:07
Msg-id	CAFj8pRBSkimS4HhByHZ0WExEJLq_ep0fzdiWaoUPoHvSLDeX5g@mail.gmail.com
In response to	Re: raw output from copy (Andrew Dunstan <andrew@dunslane.net>)
List	pgsql-hackers

2016-04-08 20:54 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:

On 04/08/2016 02:13 PM, Robert Haas wrote:
On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Here is a cleaned-up, finished version of the previous implementation of the
RAW_TEXT/RAW_BINARY formats for the COPY statement.

RAW_TEXT means unescaped data, but in the correct encoding: input/output is
done by the type's input/output functions. RAW_BINARY means content
produced/received by the type's send/receive functions.

Both directions (input/output) now work well.

Some examples of expected usage:

copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
encoding 'latin2');

create table avatars(id serial, picture bytea);
\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
select lastval();

create table doc(id serial, txt text);
\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
select lastval();
As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6. I'm marking it
Returned with Feedback.

I should add that I've been thinking about this some more, and that I now agree that something should be done to support this at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion rather than having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likely to have this memory issue.

At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records, and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailored for loading or sending large singleton pieces of data.

Now there is a little more time to think about it. But it is hard to design something simpler than the COPY syntax that supports both directions.

My implementation has the same limit as COPY BINARY, so it is no worse. It should be good enough for varlena types, whose values cannot be larger than 1GB. It is not designed as a LOB replacement.
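
For comparison with COPY BINARY, here is a rough Python sketch of how a single non-NULL field is framed in the documented COPY BINARY file layout (signature, flags, header extension length, per-row field count, 32-bit field length, trailer). The function names are hypothetical and are not part of the patch. The `raw_binary_single_field` helper only illustrates the assumption that the proposed RAW_BINARY format would carry the send function's output with no framing around it.

```python
import struct

# Fixed 11-byte signature that opens every COPY BINARY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def copy_binary_single_field(payload: bytes) -> bytes:
    """Frame one non-NULL field as a one-row, one-column COPY BINARY stream."""
    # Header: signature, 32-bit flags field, 32-bit header extension length.
    header = PGCOPY_SIGNATURE + struct.pack("!ii", 0, 0)
    # Row: 16-bit field count, then a 32-bit signed length before the data.
    row = struct.pack("!h", 1) + struct.pack("!i", len(payload)) + payload
    # Trailer: a 16-bit word containing -1.
    trailer = struct.pack("!h", -1)
    return header + row + trailer


def raw_binary_single_field(payload: bytes) -> bytes:
    """Hypothetical: assumed shape of the proposed RAW_BINARY output,
    i.e. the send function's bytes with no header, length word, or trailer."""
    return payload


if __name__ == "__main__":
    data = b"\x89PNG"  # stand-in for the binary form of a bytea value
    framed = copy_binary_single_field(data)
    raw = raw_binary_single_field(data)
    print(len(framed) - len(raw), "bytes of framing around the payload")
```

In both cases the payload is still one field value, so the per-value size limit is the same; only the framing around it differs.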

Regards

Pavel

cheers

andrew
