Thread: raw output from copy

raw output from copy

From

Pavel Stehule

Date:

09 April 2015, 18:49:22

This thread was finished without real work. I have a real use case - export XML doc in non utf8 encoding.

http://www.postgresql.org/message-id/16174.1319228878@sss.pgh.pa.us

I propose to implement new format option "RAW" like Tom proposed.

It requires only one row, one column result - and result is just raw binary data without size.

Objections? Ideas?

Regards

Pavel

Re: raw output from copy

From

Pavel Stehule

Date:

10 April 2015, 21:27:16

I wrote a prototype of this patch, and it works well

postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format 'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format 'raw') ;
<?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>Time: 1.000 ms

Regards

Pavel

2015-04-09 20:48 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hi

This thread was finished without real work. I have a real use case - export XML doc in non utf8 encoding.

http://www.postgresql.org/message-id/16174.1319228878@sss.pgh.pa.us

I propose to implement new format option "RAW" like Tom proposed.

It requires only one row, one column result - and result is just raw binary data without size.

Objections? Ideas?

Regards

Pavel

Attachment

copy-raw.patch

Re: raw output from copy

From

Peter Eisentraut

Date:

15 April 2015, 20:06:49

On 4/10/15 5:26 PM, Pavel Stehule wrote:
> Hi
> 
> I wrote a prototype of this patch, and it works well
> 
> postgres=# set client_encoding to 'latin2';
> SET
> Time: 1.488 ms
> postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml
> (format 'raw')
> COPY 1
> Time: 1.108 ms
> postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format
> 'raw') ;
> <?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký
> kůň</xx>Time: 1.000 ms

I think you can get the same thing using regular psql output and just
turning off all field and record separators and tuple headers and so on.

Re: raw output from copy

From

Pavel Stehule

Date:

15 April 2015, 20:11:52

<p dir="ltr">It would be nice, but it is not true. You can get correct non utf8 xml with encoding specification only
whenbinary mode is used. Psql doesn't support binary mode.<p dir="ltr">Regards<p dir="ltr">Pavel<div
class="gmail_quote">Dne15. 4. 2015 22:06 napsal uživatel "Peter Eisentraut" <<a
href="mailto:peter_e@gmx.net">peter_e@gmx.net</a>>:<brtype="attribution" /><blockquote class="gmail_quote"
style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 4/10/15 5:26 PM, Pavel Stehule wrote:<br />
>Hi<br /> ><br /> > I wrote a prototype of this patch, and it works well<br /> ><br /> > postgres=# set
client_encodingto 'latin2';<br /> > SET<br /> > Time: 1.488 ms<br /> > postgres=# \copy (select
xmlelement(namexx, d) from d) to ~/d.xml<br /> > (format 'raw')<br /> > COPY 1<br /> > Time: 1.108 ms<br />
>postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format<br /> > 'raw') ;<br /> > <?xml
version="1.0"encoding="LATIN2"?><xx>příliš žluťoučký<br /> > kůň</xx>Time: 1.000 ms<br /><br /> I
thinkyou can get the same thing using regular psql output and just<br /> turning off all field and record separators
andtuple headers and so on.<br /><br /></blockquote></div>

Re: raw output from copy

From

Pavel Golub

Date:

01 July 2015, 06:42:13

Hello Pavel.

I looked through the patch. Sources are OK. However I didn't find any docs and test cases. Would you please provide me with short description on this feature and why it is important. Because I didn't manage to find the old Andrew Dunstan's post either.

On Sat, Apr 11, 2015 at 12:26 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

I wrote a prototype of this patch, and it works well

postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format 'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format 'raw') ;
<?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>Time: 1.000 ms

Regards

Pavel

2015-04-09 20:48 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

This thread was finished without real work. I have a real use case - export XML doc in non utf8 encoding.

http://www.postgresql.org/message-id/16174.1319228878@sss.pgh.pa.us

I propose to implement new format option "RAW" like Tom proposed.

It requires only one row, one column result - and result is just raw binary data without size.

Objections? Ideas?

Regards

Pavel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Nullus est in vitae sensus ipsa vera est sensus.

Re: raw output from copy

From

Simon Riggs

Date:

02 July 2015, 10:55:31

On 1 July 2015 at 07:42, Pavel Golub <pavel@microolap.com> wrote:

I looked through the patch. Sources are OK. However I didn't find any docs and test cases. Would you please provide me with short description on this feature and why it is important. Because I didn't manage to find the old Andrew Dunstan's post either.

Feature sounds OK, so lets do it.

Pavel S, please submit a polished patch. Coding guidelines, tests, docs etc. Set back to Waiting On Author.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: raw output from copy

From

Pavel Stehule

Date:

02 July 2015, 11:15:45

I'll do it today evening

Pavel

2015-07-02 12:55 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

On 1 July 2015 at 07:42, Pavel Golub <pavel@microolap.com> wrote:

I looked through the patch. Sources are OK. However I didn't find any docs and test cases. Would you please provide me with short description on this feature and why it is important. Because I didn't manage to find the old Andrew Dunstan's post either.

Feature sounds OK, so lets do it.

Pavel S, please submit a polished patch. Coding guidelines, tests, docs etc. Set back to Waiting On Author.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: raw output from copy

From

Andrew Dunstan

Date:

02 July 2015, 13:02:24

On 07/02/2015 07:14 AM, Pavel Stehule wrote:
> Hi
>
> I'll do it today evening
>
>

Pavel,

Please don't top-post on the PostgreSQL lists. You've been around here 
long enough to know that bottom posting is our custom.

I posted a patch for this in 2013 at 
<http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> but 
it can apply to a SELECT, and doesn't need COPY. Nobody seemed very 
interested, so I dropped it. Apparently people now want something along 
these lines, which is good.

cheers

andrew

Re: raw output from copy

From

Andrew Dunstan

Date:

02 July 2015, 13:08:17

On 07/02/2015 09:02 AM, Andrew Dunstan wrote:
>
> On 07/02/2015 07:14 AM, Pavel Stehule wrote:
>> Hi
>>
>> I'll do it today evening
>>
>>
>
> Pavel,
>
> Please don't top-post on the PostgreSQL lists. You've been around here 
> long enough to know that bottom posting is our custom.
>
> I posted a patch for this in 2013 at 
> <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> 
> but it can apply to a SELECT, and doesn't need COPY. Nobody seemed 
> very interested, so I dropped it. Apparently people now want something 
> along these lines, which is good.


For reference, here's the Wayback Machine's version of the original blog 
post referred to: 

<http://web.archive.org/web/20110916023912/http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html>

cheers

andrew

Re: raw output from copy

From

Simon Riggs

Date:

02 July 2015, 13:43:47

On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net> wrote:

Please don't top-post on the PostgreSQL lists. You've been around here long enough to know that bottom posting is our custom.

I posted a patch for this in 2013 at <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> but it can apply to a SELECT, and doesn't need COPY. Nobody seemed very interested, so I dropped it. Apparently people now want something along these lines, which is good.

It's a shame that both solutions are restricted to either COPY or psql.

Both of those are working on suggestions from Tom, so there is no history of preference there.

Can we have both please, gentlemen?

If we implemented Andrew's solution, how would we request it in a COPY statement? Seems like we would want the RAW format keyword anyway.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: raw output from copy

From

Pavel Stehule

Date:

02 July 2015, 14:02:37

2015-07-02 15:43 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net> wrote:

Please don't top-post on the PostgreSQL lists. You've been around here long enough to know that bottom posting is our custom.

I posted a patch for this in 2013 at <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> but it can apply to a SELECT, and doesn't need COPY. Nobody seemed very interested, so I dropped it. Apparently people now want something along these lines, which is good.

It's a shame that both solutions are restricted to either COPY or psql.

Both of those are working on suggestions from Tom, so there is no history of preference there.

Can we have both please, gentlemen?

If we implemented Andrew's solution, how would we request it in a COPY statement? Seems like we would want the RAW format keyword anyway.

I prefer a COPY like solution - it can be used on both sides (server, client), and it can be used little bit simply for psql -c "XXX" pattern.

Regards

Pavel

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: raw output from copy

From

Andrew Dunstan

Date:

02 July 2015, 14:03:11

On 07/02/2015 09:43 AM, Simon Riggs wrote:
> On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net 
> <mailto:andrew@dunslane.net>> wrote:
>
>
>     Please don't top-post on the PostgreSQL lists. You've been around
>     here long enough to know that bottom posting is our custom.
>
>     I posted a patch for this in 2013 at
>     <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net>
>     but it can apply to a SELECT, and doesn't need COPY. Nobody seemed
>     very interested, so I dropped it. Apparently people now want
>     something along these lines, which is good.
>
>
> It's a shame that both solutions are restricted to either COPY or psql.
>
> Both of those are working on suggestions from Tom, so there is no 
> history of preference there.
>
> Can we have both please, gentlemen?
>
> If we implemented Andrew's solution, how would we request it in a COPY 
> statement? Seems like we would want the RAW format keyword anyway.
>
>

What's the use case? My original motivation was that I had a function 
that returned a bytea (it was a PDF in fact) that I wanted to be able to 
write to a file. Of course, this is easy enough to do with a client 
library like perl's DBD::Pg, but it seems sad to have to resort to that 
for something so simple.

My original suggestion 
(<http://www.postgresql.org/message-id/4EA1B83B.2050605@pgexperts.com>) 
was to invent a \bcopy command.

I don't have a problem in building in a RAW mode for copy, but we'll 
still need to teach psql how to deal with it.

Another case where it could be useful is JSON - so we can avoid having 
to play tricks like 
<http://adpgtech.blogspot.com/2014/09/importing-json-data.html>. Similar 
considerations probably apply to XML, and the tricks are less guaranteed 
to work.

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

02 July 2015, 14:13:04

2015-07-02 16:02 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:

On 07/02/2015 09:43 AM, Simon Riggs wrote:
On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net <mailto:andrew@dunslane.net>> wrote:

Please don't top-post on the PostgreSQL lists. You've been around
here long enough to know that bottom posting is our custom.

I posted a patch for this in 2013 at
<http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net>
but it can apply to a SELECT, and doesn't need COPY. Nobody seemed
very interested, so I dropped it. Apparently people now want
something along these lines, which is good.

It's a shame that both solutions are restricted to either COPY or psql.

Both of those are working on suggestions from Tom, so there is no history of preference there.

Can we have both please, gentlemen?

If we implemented Andrew's solution, how would we request it in a COPY statement? Seems like we would want the RAW format keyword anyway.

What's the use case? My original motivation was that I had a function that returned a bytea (it was a PDF in fact) that I wanted to be able to write to a file. Of course, this is easy enough to do with a client library like perl's DBD::Pg, but it seems sad to have to resort to that for something so simple.

My original suggestion (<http://www.postgresql.org/message-id/4EA1B83B.2050605@pgexperts.com>) was to invent a \bcopy command.

I don't have a problem in building in a RAW mode for copy, but we'll still need to teach psql how to deal with it.

It can be used from psql without any problems.

Another case where it could be useful is JSON - so we can avoid having to play tricks like <http://adpgtech.blogspot.com/2014/09/importing-json-data.html>. Similar considerations probably apply to XML, and the tricks are less guaranteed to work.

cheers

andrew

Re: raw output from copy

From

Simon Riggs

Date:

02 July 2015, 14:29:09

On 2 July 2015 at 15:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

It can be used from psql without any problems.

It can, but your patch does not yet do that, while Andrew's does.

We want a solution that works from psql and other clients. Hopefully the same-ish solution.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: raw output from copy

From

Andrew Dunstan

Date:

02 July 2015, 14:41:27

On 07/02/2015 10:07 AM, Pavel Stehule wrote:
>
>
> 2015-07-02 16:02 GMT+02:00 Andrew Dunstan <andrew@dunslane.net 
> <mailto:andrew@dunslane.net>>:
>
>
>     On 07/02/2015 09:43 AM, Simon Riggs wrote:
>
>         On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net
>         <mailto:andrew@dunslane.net> <mailto:andrew@dunslane.net
>         <mailto:andrew@dunslane.net>>> wrote:
>
>
>             Please don't top-post on the PostgreSQL lists. You've been
>         around
>             here long enough to know that bottom posting is our custom.
>
>             I posted a patch for this in 2013 at
>            
>         <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net>
>             but it can apply to a SELECT, and doesn't need COPY.
>         Nobody seemed
>             very interested, so I dropped it. Apparently people now want
>             something along these lines, which is good.
>
>
>         It's a shame that both solutions are restricted to either COPY
>         or psql.
>
>         Both of those are working on suggestions from Tom, so there is
>         no history of preference there.
>
>         Can we have both please, gentlemen?
>
>         If we implemented Andrew's solution, how would we request it
>         in a COPY statement? Seems like we would want the RAW format
>         keyword anyway.
>
>
>
>
>     What's the use case? My original motivation was that I had a
>     function that returned a bytea (it was a PDF in fact) that I
>     wanted to be able to write to a file. Of course, this is easy
>     enough to do with a client library like perl's DBD::Pg, but it
>     seems sad to have to resort to that for something so simple.
>
>     My original suggestion
>     (<http://www.postgresql.org/message-id/4EA1B83B.2050605@pgexperts.com>)
>     was to invent a \bcopy command.
>
>     I don't have a problem in building in a RAW mode for copy, but
>     we'll still need to teach psql how to deal with it.
>
>
> It can be used from psql without any problems.


In fact your patch will not work with psql's \copy nor to stdout at all, 
unless I'm misreading it:
   -        if (cstate->binary)   +        if (cstate->binary || cstate->raw)                 ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),                 errmsg("COPY BINARY is not supported to stdout or
from  stdin")));
 


So it looks like you're only supporting this where the server is writing 
to a file. That's horribly narrow, and certainly doesn't meet my 
original need.

Does the COPY line protocol even support binary data? If not, we're dead 
in the water here from the psql POV. Because my patch doesn't use the 
COPY protocol it doesn't have this problem.

Perhaps we should do both, although I'm not sure I understand the use 
case for the COPY solution.

cheers

andrew

Re: raw output from copy

From

Tom Lane

Date:

02 July 2015, 15:02:37

Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?

The protocol, per se, just transmits a byte stream.  There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done.  One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break.

IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode.  But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page.  So
there may be value in being explicit about "raw" mode in these messages.

A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could.  So I lean towards the latter really.
        regards, tom lane

Re: raw output from copy

From

Andrew Dunstan

Date:

02 July 2015, 15:41:56

On 07/02/2015 11:02 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Does the COPY line protocol even support binary data?
> The protocol, per se, just transmits a byte stream.  There is a field
> in the CopyInResponse/CopyOutResponse messages that indicates whether
> a text or binary copy is being done.  One thing we'd have to consider
> is whether "raw" mode is sufficiently different from binary to justify
> an additional value for this field, and if so whether that constitutes
> a protocol break.
>
> IIRC, psql wouldn't really care; it just transfers the byte stream to or
> from the target file, regardless of text or binary mode.  But there might
> be other client libraries that are smarter and expect "binary" mode to
> mean the binary file format specified in the COPY reference page.  So
> there may be value in being explicit about "raw" mode in these messages.
>
> A key point in all this is that people who need "raw" transfer probably
> need it in both directions, a point that your SELECT proposal cannot
> satisfy, but hacking COPY could.  So I lean towards the latter really.
>
>             


OK, let's do that. I await the result with interest.

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

05 July 2015, 22:56:47

2015-07-02 17:02 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?

The protocol, per se, just transmits a byte stream. There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done. One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break. sql/plpgsql_check_passive-9.6.sql

IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode. But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page. So
there may be value in being explicit about "raw" mode in these messages.

The safe way is create new mode and propagate it on client. It should to not break any current applications, because no one uses COPY RAW.

A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could. So I lean towards the latter really.

yes, it has sense. I am not sure, if I'll have time to implement it in this step, but I'll look on it.

regards

Pavel

regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

06 July 2015, 21:35:06

here is a version with both direction support.

postgres=# copy foo from '/tmp/1.jpg' (format raw);
COPY 1
Time: 93.021 ms
postgres=# \dt+ foo
List of relations
┌────────┬──────┬───────┬───────┬────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Size │ Description │
╞════════╪══════╪═══════╪═══════╪════════╪═════════════╡
│ public │ foo │ table │ pavel │ 256 kB │ │
└────────┴──────┴───────┴───────┴────────┴─────────────┘
(1 row)

postgres=# \copy foo to '~/3.jpg' (format raw)
COPY 1
Time: 2.401 ms

Regards

Pavel

2015-07-02 17:02 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?

The protocol, per se, just transmits a byte stream. There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done. One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break.

IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode. But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page. So
there may be value in being explicit about "raw" mode in these messages.

A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could. So I lean towards the latter really.

regards, tom lane

Attachment

copy-raw-format-20150706-01.patch

Re: raw output from copy

From

Pavel Stehule

Date:

07 July 2015, 06:33:39

previous patch was broken, and buggy

Here is new version with fixed upload and more tests

The interesting is so I should not to modify interface or client - so it should to work with any current driver with protocol support >= 3.

Regards

Pavel

2015-07-06 23:34 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hi

here is a version with both direction support.

postgres=# copy foo from '/tmp/1.jpg' (format raw);
COPY 1
Time: 93.021 ms
postgres=# \dt+ foo
List of relations
┌────────┬──────┬───────┬───────┬────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Size │ Description │
╞════════╪══════╪═══════╪═══════╪════════╪═════════════╡
│ public │ foo │ table │ pavel │ 256 kB │ │
└────────┴──────┴───────┴───────┴────────┴─────────────┘
(1 row)

postgres=# \copy foo to '~/3.jpg' (format raw)
COPY 1
Time: 2.401 ms

Regards

Pavel

2015-07-02 17:02 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?

The protocol, per se, just transmits a byte stream. There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done. One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break.

IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode. But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page. So
there may be value in being explicit about "raw" mode in these messages.

A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could. So I lean towards the latter really.

regards, tom lane

Attachment

copy-raw-format-20150707-02.patch

Re: raw output from copy

From

"Dickson S. Guedes"

Date:

23 July 2015, 20:05:59

2015-07-07 3:32 GMT-03:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
> Hi
>
> previous patch was broken, and buggy
>
> Here is new version with fixed upload and more tests
>
> The interesting is so I should not to modify interface or client - so it should to work with any current driver with
protocolsupport >= 3.
 

Hi Pavel,

Here are some thoughts:

1) from docs: "only row data in network byte order are exported or imported."

Should it be "only raw data"?

2) from docs "Because this format doesn't support any delimiter, only
one value  can be exported or imported. NULL values are not allowed."

That "only one value can be exported or imported" is a little sad for
someone with a table with more than one column that accepts bytea. The
implemented feature doesn't covers the use-case where a table 'image'
has columns: id integer, image bytea, thumbnail bytea, and I want to
import binary data in that. We could put here the cases where we have
NOT NULL columns. Since these are expected and the error messages
complain about that couldn't them be covered in docs more explicitly?

3) from code: "bool row_processed; /* true, when first row was processed */"

Maybe rename the variable to something like `first_row_processed` and
rip off the comment?

4) from code:

if (cstate->raw)   format = 2;
else if (cstate->binary)   format = 1;
else   format = 0;

Maybe create a constant for code readability?


If by one side this feature does not covers a more generalized case,
by other is a nice start, IMHO.

-- 
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

Re: raw output from copy

From

Pavel Stehule

Date:

25 July 2015, 16:42:05

2015-07-23 22:05 GMT+02:00 Dickson S. Guedes <listas@guedesoft.net>:

2015-07-07 3:32 GMT-03:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
> Hi
>
> previous patch was broken, and buggy
>
> Here is new version with fixed upload and more tests
>
> The interesting is so I should not to modify interface or client - so it should to work with any current driver with protocol support >= 3.

Hi Pavel,

Here are some thoughts:

1) from docs: "only row data in network byte order are exported or imported."

Should it be "only raw data"?

I don't understand well - it use a PostgreSQL buildin "send" functions - and result of these functions is defined as "in network byte order"

2) from docs "Because this format doesn't support any delimiter, only
one value can be exported or imported. NULL values are not allowed."

That "only one value can be exported or imported" is a little sad for
someone with a table with more than one column that accepts bytea. The
implemented feature doesn't covers the use-case where a table 'image'
has columns: id integer, image bytea, thumbnail bytea, and I want to
import binary data in that. We could put here the cases where we have
NOT NULL columns. Since these are expected and the error messages
complain about that couldn't them be covered in docs more explicitly?

This mode should not to replace current COPY binary mode. RAW binary output for multiple fields is terrible complex task - you can use a fix length, you can use some special separator etc. I remember a terrible complex bulkload on Oracle or MSSQL - and I would to design it differently. I prefer to have a COPY statement simple as possible - If you need import/export all fields in record - then you can:

1. you can use a new LO api (for import) - load binary files as LO, INSERT and drop used LO

2. call more COPY statements, and join exported files with operation system tools (for export),

3. you can write specialized application that will support a COPY API and export, import data in your preferred format.

The same complexity is with input, and I would not to write generic binary files parser.

3) from code: "bool row_processed; /* true, when first row was processed */"

in this mode is only one row - so first_row_processed sounds little bit strange.

Maybe rename the variable to something like `first_row_processed` and
rip off the comment?

4) from code:

if (cstate->raw)
format = 2;
else if (cstate->binary)
format = 1;
else
format = 0;

Maybe create a constant for code readability?

good idea

If by one side this feature does not covers a more generalized case,
by other is a nice start, IMHO.

It is exactly what I don't would - the complexity of usage can go up to sky with generic binary format file processing.

Regards

Pavel

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br

Re: raw output from copy

From

Craig Ringer

Date:

27 July 2015, 03:55:42

On 7 July 2015 at 14:32, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> previous patch was broken, and buggy
>
> Here is new version with fixed upload and more tests

I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.

It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.

I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.

The main point is that this is a real world thing. People want to do
it, try to do it, and have problems doing it. So it's a solution a
real issue.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: raw output from copy

From

Heikki Linnakangas

Date:

27 July 2015, 08:41:48

On 07/27/2015 06:55 AM, Craig Ringer wrote:
> On 7 July 2015 at 14:32, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Hi
>>
>> previous patch was broken, and buggy
>>
>> Here is new version with fixed upload and more tests
>
> I routinely see people trying to use COPY ... FORMAT binary to export
> a single binary field (like an image, for example) and getting
> confused by the header PostgreSQL adds. Or using text-format COPY and
> struggling with the hex escaping. It's clearly something people have
> trouble with.
>
> It doesn't help that while lo_import and lo_export can read paths
> outside the datadir (and refuse to read from within it),
> pg_read_binary_file is superuser only and disallows absolute paths.
> There's no corresponding pg_write_binary_file. So users who want to
> import and export a single binary field tend to try to use COPY. We
> have functionality for large objects that has no equivalent for
> 'bytea'.
>
> I don't love the use of COPY for this, but it gets us support for
> arbitrary clients pretty easily. Otherwise it'd be server-side only
> via local filesystem access, or require special psql-specific
> functionality like we have for lo_import etc.

COPY seems like a strange interface for this. I can see the point that 
the syntax is almost there already, for both input and output. But even 
that's not quite there yet, we'd need the new RAW format. And as an 
input method, COPY is a bit awkward, because you cannot easily pass the 
file to a function, for example. I think this should be implemented in 
psql, along the lines of Andrew's original \bcopy patch.

There are a couple of related psql-features here actually, that would be 
useful on their own. The first is being able to send the query result to 
a file, for a single query only. You can currently do:

\o /tmp/foo
SELECT ...;
\o

But more often than not, when I try to do that, I forget to do the last 
\o, and run another query, and the output still goes to the file. So 
it'd be nice to have a \o option that only affects the next query. 
Something like:

\O /tmp/foo
SELECT ...;

The second feature needed is to write the output without any headers, 
row delimiters and such. Just the datum. And the third feature is to 
write it in binary. Perhaps something like:

\O /tmp/foo binary
SELECT blob FROM foo WHERE id = 10;

What about input? This is a whole new feature, but it would be nice to 
be able to pass the file contents as a query parameter. Something like:

\P /tmp/foo binary
INSERT INTO foo VALUES (?);

- Heikki

Re: raw output from copy

From

Pavel Stehule

Date:

27 July 2015, 11:29:40

2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:

On 07/27/2015 06:55 AM, Craig Ringer wrote:
On 7 July 2015 at 14:32, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

previous patch was broken, and buggy

Here is new version with fixed upload and more tests

I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.

It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.

I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.

COPY seems like a strange interface for this. I can see the point that the syntax is almost there already, for both input and output. But even that's not quite there yet, we'd need the new RAW format. And as an input method, COPY is a bit awkward, because you cannot easily pass the file to a function, for example. I think this should be implemented in psql, along the lines of Andrew's original \bcopy patch.

There are a couple of related psql-features here actually, that would be useful on their own. The first is being able to send the query result to a file, for a single query only. You can currently do:

\o /tmp/foo
SELECT ...;
\o

But more often than not, when I try to do that, I forget to do the last \o, and run another query, and the output still goes to the file. So it'd be nice to have a \o option that only affects the next query. Something like:

\O /tmp/foo
SELECT ...;

The second feature needed is to write the output without any headers, row delimiters and such. Just the datum. And the third feature is to write it in binary. Perhaps something like:

\O /tmp/foo binary
SELECT blob FROM foo WHERE id = 10;

What about input? This is a whole new feature, but it would be nice to be able to pass the file contents as a query parameter. Something like:

\P /tmp/foo binary
INSERT INTO foo VALUES (?);

The example of input is strong reason, why don't do it via inserts. Only parsing some special "?" symbol needs lot of new code.

In this case, I don't see any advantage of psql based solution. COPY is standard interface for input/output from/to files, and it should be used there.

Regards

Pavel

- Heikki

Re: raw output from copy

From

Heikki Linnakangas

Date:

05 August 2015, 20:59:44

On 07/27/2015 02:28 PM, Pavel Stehule wrote:
> 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>
>> What about input? This is a whole new feature, but it would be nice to be
>> able to pass the file contents as a query parameter. Something like:
>>
>> \P /tmp/foo binary
>> INSERT INTO foo VALUES (?);
>
> The example of input is strong reason, why don't do it via inserts. Only
> parsing some special "?" symbol needs lot of new code.

Sorry, I meant $1 in place of the ?. No special parsing needed, psql can 
send the query to the server as is, with the parameters that are given 
by this new mechanism.

> In this case, I don't see any advantage of  psql based solution. COPY is
> standard interface for input/output from/to files, and it should be used
> there.

I'm not too happy with the COPY approach, although I won't object is one 
of the other committers feel more comfortable with it. However, we don't 
seem to be making progress here, so I'm going to mark this as Returned 
with Feedback. I don't feel good about that either, because I don't 
actually have any great suggestions on how to move this forward. Which 
is a pity because this is a genuine problem for users.

- Heikki

Re: raw output from copy

From

Andrew Dunstan

Date:

05 August 2015, 22:09:16

On 08/05/2015 04:59 PM, Heikki Linnakangas wrote:
> On 07/27/2015 02:28 PM, Pavel Stehule wrote:
>> 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>>
>>> What about input? This is a whole new feature, but it would be nice 
>>> to be
>>> able to pass the file contents as a query parameter. Something like:
>>>
>>> \P /tmp/foo binary
>>> INSERT INTO foo VALUES (?);
>>
>> The example of input is strong reason, why don't do it via inserts. Only
>> parsing some special "?" symbol needs lot of new code.
>
> Sorry, I meant $1 in place of the ?. No special parsing needed, psql 
> can send the query to the server as is, with the parameters that are 
> given by this new mechanism.
>
>> In this case, I don't see any advantage of  psql based solution. COPY is
>> standard interface for input/output from/to files, and it should be used
>> there.
>
> I'm not too happy with the COPY approach, although I won't object is 
> one of the other committers feel more comfortable with it. However, we 
> don't seem to be making progress here, so I'm going to mark this as 
> Returned with Feedback. I don't feel good about that either, because I 
> don't actually have any great suggestions on how to move this forward. 
> Which is a pity because this is a genuine problem for users.
>
>

This is really only a psql problem, IMNSHO. Inserting and extracting 
binary data is pretty trivial for most users of client libraries (e.g. 
it's a couple of lines of code in a DBD::Pg program), but it's hard in psql.

I do agree that the COPY approach feels more than a little klunky.

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

06 August 2015, 08:37:39

<p dir="ltr">Hi,<p dir="ltr">Psql based implementation needs new infrastructure (more than few lines)<p
dir="ltr">Missing:<pdir="ltr">* binary mode support<br /> * parametrized query support,<p dir="ltr">I am not against,
butboth points I proposed, and both was rejected.<p dir="ltr">So why dont use current infrastructure? Raw copy is
trivialpatch.<div class="gmail_quote">Dne 6.8.2015 0:09 napsal uživatel "Andrew Dunstan" <<a
href="mailto:andrew@dunslane.net">andrew@dunslane.net</a>>:<brtype="attribution" /><blockquote class="gmail_quote"
style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br /> On 08/05/2015 04:59 PM, Heikki Linnakangas
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On
07/27/201502:28 PM, Pavel Stehule wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px
#cccsolid;padding-left:1ex"> 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <<a href="mailto:hlinnaka@iki.fi"
target="_blank">hlinnaka@iki.fi</a>>:<br/><br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"> What about input? This is a whole new feature, but it would be nice
tobe<br /> able to pass the file contents as a query parameter. Something like:<br /><br /> \P /tmp/foo binary<br />
INSERTINTO foo VALUES (?);<br /></blockquote><br /> The example of input is strong reason, why don't do it via inserts.
Only<br/> parsing some special "?" symbol needs lot of new code.<br /></blockquote><br /> Sorry, I meant $1 in place of
the?. No special parsing needed, psql can send the query to the server as is, with the parameters that are given by
thisnew mechanism.<br /><br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">In this case, I don't see any advantage of  psql based solution. COPY is<br /> standard
interfacefor input/output from/to files, and it should be used<br /> there.<br /></blockquote><br /> I'm not too happy
withthe COPY approach, although I won't object is one of the other committers feel more comfortable with it. However,
wedon't seem to be making progress here, so I'm going to mark this as Returned with Feedback. I don't feel good about
thateither, because I don't actually have any great suggestions on how to move this forward. Which is a pity because
thisis a genuine problem for users.<br /><br /><br /></blockquote><br /> This is really only a psql problem, IMNSHO.
Insertingand extracting binary data is pretty trivial for most users of client libraries (e.g. it's a couple of lines
ofcode in a DBD::Pg program), but it's hard in psql.<br /><br /> I do agree that the COPY approach feels more than a
littleklunky.<br /><br /> cheers<br /><br /> andrew<br /><br /><br /></blockquote></div>

Re: raw output from copy

From

Pavel Stehule

Date:

27 February 2016, 07:27:05

2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,
I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.

I was asked by Daniel Verite about reopening this patch in opened commitfest.

I am sending rebased patch

Regards

Pavel

Attachment

copy-raw-format-20160227-03.patch

Re: raw output from copy

From

Corey Huinker

Date:

04 March 2016, 02:13:41

On Sat, Feb 27, 2016 at 2:26 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,
I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.

I was asked by Daniel Verite about reopening this patch in opened commitfest.

I am sending rebased patch

Regards

Pavel

Since this patch does something I need for my own work, I've signed up as a reviewer.

From a design standpoint, I feel that COPY is the preferred means of dealing with data from sources too transient to justify setting up a foreign data wrapper, and too simple to justify writing application code. So, for me, RAW is the right solution, or at least *a* right solution.

My first pass of reading the code changes and the regression tests is complete, and I found the changes to be clear and fairly straightforward. This shouldn't surprise anyone, as the previous reviewers had only minor quibbles with the code. So far, so good.

The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.

Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.

I hope to find time for those things in the next few days.

Re: raw output from copy

From

Pavel Stehule

Date:

04 March 2016, 07:53:42

2016-03-04 3:13 GMT+01:00 Corey Huinker <corey.huinker@gmail.com>:

On Sat, Feb 27, 2016 at 2:26 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,
I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.

I was asked by Daniel Verite about reopening this patch in opened commitfest.

I am sending rebased patch

Regards

Pavel

Since this patch does something I need for my own work, I've signed up as a reviewer.

From a design standpoint, I feel that COPY is the preferred means of dealing with data from sources too transient to justify setting up a foreign data wrapper, and too simple to justify writing application code. So, for me, RAW is the right solution, or at least *a* right solution.

my opinion is same - there all necessary infrastructure is ready and when we work with IO, then we use COPY natively. I hope so main use case (export bytea) is solved, but there are a possibility to enhance this command by COPY options - what is, I am thinking, a advantage of this way.

My first pass of reading the code changes and the regression tests is complete, and I found the changes to be clear and fairly straightforward. This shouldn't surprise anyone, as the previous reviewers had only minor quibbles with the code. So far, so good.

The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.

Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.

I hope to find time for those things in the next few days.

Thank you very much

Regards

Pavel

Re: raw output from copy

From

Ildar Musin

Date:

04 March 2016, 09:51:36

Hi Pavel

27/02/16 10:26, Pavel Stehule пишет:

Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,
I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.

I was asked by Daniel Verite about reopening this patch in opened commitfest.

I am sending rebased patch

Regards

Pavel

I am new to reviewing, here is what I got. Patch have been applied nicely to the HEAD. I tried to upload and export files in psql, it works as expected. All regression tests are passed without problems as well. Code looks good for me. There is a little confusion for me in this line of documentation:

"use any metadata - only row data in network byte order are exported"

Did you mean "only raw data in network byte order is exported"?

And there are two entries for this patch on commitfest page: in "miscellaneous" and "sql" sections. Probably it's better to remove one of them to avoid confusion.

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: raw output from copy

From

"Daniel Verite"

Date:

04 March 2016, 14:54:52

    Corey Huinker wrote:

> So, for me, RAW is the right solution, or at least *a* right solution.

Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.

But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.

Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.
Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.

[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
ng

[2] http://www.postgresql.org/message-id/4C8272C4.1000008@arcor.de

[3] http://stackoverflow.com/questions/6730729

[4] http://www.postgresql.org/message-id/56C66565.50107@consistentstate.com

[5] http://www.postgresql.org/message-id/14620.1456851036@sss.pgh.pa.us


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: raw output from copy

From

Pavel Stehule

Date:

04 March 2016, 17:06:46

2016-03-04 15:54 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:

Corey Huinker wrote:

> So, for me, RAW is the right solution, or at least *a* right solution.

Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.

But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.

Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.

Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.

exactly - I would to decrease dependency on PostgreSQL internals. Working with clean content is simple and possible with any environment without unclean operations.

Regards

Pavel

[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
ng

[2] http://www.postgresql.org/message-id/4C8272C4.1000008@arcor.de

[3] http://stackoverflow.com/questions/6730729

[4] http://www.postgresql.org/message-id/56C66565.50107@consistentstate.com

[5] http://www.postgresql.org/message-id/14620.1456851036@sss.pgh.pa.us

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: raw output from copy

From

Pavel Stehule

Date:

08 March 2016, 17:02:20

2016-03-04 18:06 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

2016-03-04 15:54 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Corey Huinker wrote:

> So, for me, RAW is the right solution, or at least *a* right solution.

Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.

But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.

Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.
Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.

exactly - I would to decrease dependency on PostgreSQL internals. Working with clean content is simple and possible with any environment without unclean operations.

COPY RAW can be used for import. I am not sure if this use case was tested.

cat image.jpg | psql -c "CREATE TEMP TABLE auxbuf(image bytea); COPY auxbuf(image) FROM stdin RAW; ..." postgres

Regards

Pavel

Regards

Pavel

[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
ng

[2] http://www.postgresql.org/message-id/4C8272C4.1000008@arcor.de

[3] http://stackoverflow.com/questions/6730729

[4] http://www.postgresql.org/message-id/56C66565.50107@consistentstate.com

[5] http://www.postgresql.org/message-id/14620.1456851036@sss.pgh.pa.us

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: raw output from copy

From

Corey Huinker

Date:

09 March 2016, 17:41:43

The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.

One thing I tried to test RAW was to load an existing json file.

My own personal test was to load an existing .json file into a 1x1 bytea table, which worked. From there I was able to

select encode(col_name,'escape')::text::jsonb from test_table

and the json was correctly converted.

A similar test copying binary failed.

A write up of the test looks like this:

\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
COPY 1
create temporary table raw_byte (b bytea);
CREATE TABLE
create temporary table raw_text (t text);
CREATE TABLE
\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
psql:/home/ubuntu/raw_test.sql:9: ERROR: relation "raw_jsonb" does not exist
\copy raw_byte from '/tmp/raw_test.blob' (format raw);
COPY 1
select encode(b,'escape')::text::json from raw_byte;
encode
----------------
{"foo": "bar"}
(1 row)

\copy raw_text from '/tmp/raw_test.blob' (format raw);
COPY 1
select t::jsonb from raw_text;
t
----------------
{"foo": "bar"}
(1 row)

create temporary table binary_byte (b bytea);
CREATE TABLE
create temporary table binary_text (t text);
CREATE TABLE
\copy binary_byte from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:22: ERROR: COPY file signature not recognized
select encode(b,'escape')::jsonb from binary_byte;
encode
--------
(0 rows)

\copy binary_text from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:26: ERROR: COPY file signature not recognized
select t::jsonb from binary_text;
t
---
(0 rows)

So, if we want to add a regression test to demonstrate to posterity why we need RAW for cases that BINARY can't handle, I offer the attached file.

Does anyone else see value in adding that to the regression tests?

Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.

I don't know why I thought this, but when I looked at the patch, I assumed that the ( binary || raw ) tests were part of a large if/elseif/else waterfall. They are not. They stand alone. There are no edge cases to find.

Review complete and passed. I can re-review if we want to add the additional test.

Attachment

raw_test.sql

Re: raw output from copy

From

Pavel Stehule

Date:

12 March 2016, 06:25:40

2016-03-09 18:41 GMT+01:00 Corey Huinker <corey.huinker@gmail.com>:

The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.

One thing I tried to test RAW was to load an existing json file.

My own personal test was to load an existing .json file into a 1x1 bytea table, which worked. From there I was able to
select encode(col_name,'escape')::text::jsonb from test_table
and the json was correctly converted.

A similar test copying binary failed.

A write up of the test looks like this:

\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
COPY 1
create temporary table raw_byte (b bytea);
CREATE TABLE
create temporary table raw_text (t text);
CREATE TABLE
\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
psql:/home/ubuntu/raw_test.sql:9: ERROR: relation "raw_jsonb" does not exist
\copy raw_byte from '/tmp/raw_test.blob' (format raw);
COPY 1
select encode(b,'escape')::text::json from raw_byte;
encode
----------------
{"foo": "bar"}
(1 row)

\copy raw_text from '/tmp/raw_test.blob' (format raw);
COPY 1
select t::jsonb from raw_text;
t
----------------
{"foo": "bar"}
(1 row)

create temporary table binary_byte (b bytea);
CREATE TABLE
create temporary table binary_text (t text);
CREATE TABLE
\copy binary_byte from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:22: ERROR: COPY file signature not recognized
select encode(b,'escape')::jsonb from binary_byte;
encode
--------
(0 rows)

\copy binary_text from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:26: ERROR: COPY file signature not recognized
select t::jsonb from binary_text;
t
---
(0 rows)

So, if we want to add a regression test to demonstrate to posterity why we need RAW for cases that BINARY can't handle, I offer the attached file.

I don't think so regress tests should to do this demonstration. It is clean, so COPY BINARY should to fail every time, and then there is not any benefit from it in regress tests. There are lot of discussion in this thread, and we don't need to inject more "garbage" to regress tests.

Does anyone else see value in adding that to the regression tests?

Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.

I don't know why I thought this, but when I looked at the patch, I assumed that the ( binary || raw ) tests were part of a large if/elseif/else waterfall. They are not. They stand alone. There are no edge cases to find.

This is organized to files by necessity to work with external files. The regress tests for COPY RAW has about 100 lines - so why need special files and infrastructure. COPY RAW, COPY BINARY tests well shares infrastructure.

Review complete and passed. I can re-review if we want to add the additional test.

Great, thank you very much. I hope so this feature really useful. It allow to simple export/import XML doc in different encodings, JSONs and can be enhanced future via options. The nice feature (but not for this release) can be additional cast info for import -- like "COPY table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because there are the options, there are big space for other enhancing.

Regards

Pavel

Re: raw output from copy

From

David Steele

Date:

18 March 2016, 15:32:42

On 3/12/16 1:24 AM, Pavel Stehule wrote:

> Great, thank you very much. I hope so this feature really useful. It
> allow to simple export/import XML doc in different encodings, JSONs and
> can be enhanced future via options. The nice feature  (but not for this
> release) can be additional cast info for import -- like "COPY
> table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because
> there are the options, there are big space for other enhancing.

Andres Karlsson pointed out that this patch has two CF entries:

https://commitfest.postgresql.org/9/223/
https://commitfest.postgresql.org/9/547/

I closed the one that was in the "needs review" (547) state and kept the 
one that is "ready for committer" (223).

-- 
-David
david@pgmasters.net

Re: raw output from copy

From

Pavel Stehule

Date:

18 March 2016, 17:19:36

2016-03-18 16:32 GMT+01:00 David Steele <david@pgmasters.net>:

On 3/12/16 1:24 AM, Pavel Stehule wrote:

Great, thank you very much. I hope so this feature really useful. It
allow to simple export/import XML doc in different encodings, JSONs and
can be enhanced future via options. The nice feature (but not for this
release) can be additional cast info for import -- like "COPY
table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because
there are the options, there are big space for other enhancing.

Andres Karlsson pointed out that this patch has two CF entries:

https://commitfest.postgresql.org/9/223/
https://commitfest.postgresql.org/9/547/

I closed the one that was in the "needs review" (547) state and kept the one that is "ready for committer" (223).

sure, thank you

Pavel

--
-David
david@pgmasters.net

Re: raw output from copy

From

Tom Lane

Date:

28 March 2016, 22:27:32

Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch.  I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break.  Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break.  (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems.  Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format?  Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case.  I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries.  On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.
        regards, tom lane

Re: raw output from copy

From

Andrew Dunstan

Date:

29 March 2016, 03:12:21


On 03/28/2016 06:26 PM, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> [ copy-raw-format-20160227-03.patch ]
> I looked at this patch.  I'm having a hard time accepting that it has
> a use-case large enough to justify it, and here's the reason: it's
> a protocol break.  Conveniently omitting to update protocol.sgml
> doesn't make it not a protocol break.  (libpq.sgml also contains
> assorted statements that are falsified by this patch.)
>
> You could argue that it's the user's own fault if he tries to use
> COPY RAW with client-side code that hasn't been updated to support it.
> Maybe that's okay, but I wonder if we're opening ourselves up to
> problems.  Maybe even security-grade problems.
>
> In terms of specific code that hasn't been updated, ecpg is broken
> by this patch, and I'm not very sure what libpq's PQbinaryTuples()
> ought to do but probably something other than what it does today.
>
> There's also a definitional question of what we think PQfformat() ought
> to do; should it return "2" for the per-field format?  Or maybe the
> per-field format is still "1", since it's after all the same binary data
> format as for COPY BINARY, and only the overall copy format reported by
> PQbinaryTuples() should change to "2".
>
> BTW, I'm not really sure why the patch is trying to enforce single
> row and column for the COPY OUT case.  I thought the idea for that
> was that we'd just shove out the data without any delimiters, and
> if it's more than one datum it's the user's problem whether he can
> identify the boundaries.  On the input side we would have to insist
> on one column since we're not going to attempt to identify boundaries
> (and one row would fall out of the fact that we slurp the entire input
> and treat it as one datum).
>
> Anyway this is certainly not committable as-is, so I'm setting it back
> to Waiting on Author.  But the fact that both libpq and ecpg would need
> updates makes me question whether we can safely pretend that this isn't
> a protocol break.
>
>             


In that case I humbly submit that there is a case for reviving the psql 
patch I posted that kicked off this whole thing and lets you export a 
piece of binary data from psql quite easily. It should certainly not 
involve any protocol break.

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

29 March 2016, 03:19:03

2016-03-29 5:12 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:

On 03/28/2016 06:26 PM, Tom Lane wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
[ copy-raw-format-20160227-03.patch ]
I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

In that case I humbly submit that there is a case for reviving the psql patch I posted that kicked off this whole thing and lets you export a piece of binary data from psql quite easily. It should certainly not involve any protocol break.

The psql only solution can work only for output. Doesn't help with input.

Regards

Pavel

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

29 March 2016, 03:36:46

2016-03-29 0:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)

The reply on this question depends how we would to be strict. This doesn't change the format in types stream, but it creates new enum value. Correctly written should to raise exception when is processing unknown enum value.

I'll do tests against old libpq.

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

I'll recheck it

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

It should not be problem. I though about it. The COPY statements is extensible with options. We can support more fields, more rows if it will be required with additional options. But now, it looks like premature optimization.

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

I'll do test against some clients.

Regards

Pavel

regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

29 March 2016, 03:56:55

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

In that case I humbly submit that there is a case for reviving the psql patch I posted that kicked off this whole thing and lets you export a piece of binary data from psql quite easily. It should certainly not involve any protocol break.

The psql only solution can work only for output. Doesn't help with input.

In this case, I am thinking so the features of COPY statement is perfect for this feature. The way from a content to the file is direct. In psql you have to off - tuple separator, record separator, you have to set output file. You can get same effect, but with more work. In previous version it was relatively hard to use it from command line - now, with multi command -c is much simpler, but still the COPY is the ideal.

I agree, so output formats of psql is nice feature. And should be pretty nice, if we support more common formats - like csv, simple xml, simple json. I believe so sometime the redundancy is acceptable, if the cost is not too high.

sorry for offtopic - I would to see some output format on client side, but the format possibilities are on server side. So there are natural idea - define server side output format. psql output format just can wrap it.

Regards

Pavel

Regards

Pavel

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

29 March 2016, 15:44:53

2016-03-29 0:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.

I tested COPY RAW on old psql clients - and it is working without any problem - so when the client uses same logic as psql, then it should to work. Sure, there can be differently implemented clients, but the COPY client side is usually simple - store stream to output.

Maybe I am blind, but I don't see any new security risks. The risk can be only on client side - and if client is not able work with new value, then it can fails. But any attacker can use fake data stream, and can enforce this error too. So if there are some security risks on special designed clients, then this risks is existing now.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

Theoretically the change there is allowed - "Format code zero indicates textual data representation, while format code one indicates binary representation. (Other codes are reserved for future definition.) - PQfformat". But - the format of COPY RAW is binary - this format is cleaner binary format than is used by COPY BINARY (where is a header + BINARY). I am thinking so PQbinaryTuples should to return 1 (without change), and PQfformat should to return 2. If some older client uses deprecated function PQbinaryTuples(), then 1 is safe value. PQfformat() is documented differently and if there will be different than expected value, then the client should to raise a error. So using 2 is safe there. The value 2 is adequate to actual content

Packet: t=1459265078.596466, session=213070643360702
PGSQL: type=Query, F -> B
QUERY query=copy foo(x) to stdout (format raw);

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyOutResponse, B -> F
COPY OUT RESPONSE copy format=1, num_fields=1, fields_formats=2

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyData, B -> F
COPY DATA len=20

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyDone, B -> F
COPY DONE

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CommandComplete, B -> F
COMMAND COMPLETE command='COPY 1'

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=ReadyForQuery, B -> F
READY FOR QUERY type=<IDLE>

What do you think ?

p.s. These values are returned now

PQfformat(*results, 0)) returns 2 already, PQbinaryTuples() returns 1.

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

I executed all tests in libpq and ecpg without any problems. Can you, please, help me with repeating a ecpg issues?

Regards

Pavel

regards, tom lane

Re: raw output from copy

From

Tom Lane

Date:

29 March 2016, 16:19:35

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

> Maybe I am blind, but I don't see any new security risks. The risk can be
> only on client side - and if client is not able work with new value, then
> it can fails.

Well, the point is that low-level code might get used to process the data
stream for commands it doesn't have any control over.  Maybe there's no
realistic security risk there, or maybe there is; I'm not sure.

> I am thinking so PQbinaryTuples should to return 1 (without change), and
> PQfformat should to return 2.

Well, that seems pretty backwards to me.  The format of the individual
fields is still what it is under COPY BINARY; you would not use a
different per-field transformation.  You do need to know about the
overall format of the copy data stream being different, and defining
PQbinaryTuples as still returning 1 means there's no clean way to
understand overall copy format vs. per-field format.

There's a case to be made that we should invent a new function named
along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
some more.  That function is currently deprecated and I'm not very happy
with un-deprecating it only to use it in a confusing way.

To be more concrete about this: I think it's actually rather broken
that this patch ties RAW to binary format of the field contents.
Why would it not be exactly as useful to have delimiter-less COPY
of textual data, for use when there's just one datum and/or you're
confident in picking the data apart for yourself?  But as things stand
it'd be too confusing for an application to try to figure out what's
happening in such a case.

So I think we should either invent RAW_TEXT and RAW_BINARY formats
(not just RAW) or make RAW be an orthogonal copy option.  And we need
to improve libpq's behavior enough so that applications can sanely
figure out what's happening.

> I executed all tests in libpq and ecpg without any problems. Can you,
> please, help me with repeating a ecpg issues?

Of course the ecpg tests pass; you didn't extend them to see what would
happen if someone tries COPY RAW with ecpg.   Likewise, we have no tests
exercising a client's use of libpq with more intelligence than psql has
got.  But that doesn't mean it's acceptable to write this patch with no
thought for such clients.

I am fairly sure that there actually are third-party client libraries
that have more intelligence about COPY than psql, but I do not remember
any specifics unfortunately.
        regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

29 March 2016, 17:29:01

2016-03-29 18:19 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

> Maybe I am blind, but I don't see any new security risks. The risk can be
> only on client side - and if client is not able work with new value, then
> it can fails.

Well, the point is that low-level code might get used to process the data
stream for commands it doesn't have any control over. Maybe there's no
realistic security risk there, or maybe there is; I'm not sure.

> I am thinking so PQbinaryTuples should to return 1 (without change), and
> PQfformat should to return 2.

Well, that seems pretty backwards to me. The format of the individual
fields is still what it is under COPY BINARY; you would not use a
different per-field transformation. You do need to know about the
overall format of the copy data stream being different, and defining
PQbinaryTuples as still returning 1 means there's no clean way to
understand overall copy format vs. per-field format.

There's a case to be made that we should invent a new function named
along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
some more. That function is currently deprecated and I'm not very happy
with un-deprecating it only to use it in a confusing way.

I see a introduction of PQcopyFormat() as best idea. So for PQbinaryTuples() and PQfformat() these new changes are transparent - and PQcopyFormat can returns info about used method.

To be more concrete about this: I think it's actually rather broken
that this patch ties RAW to binary format of the field contents.
Why would it not be exactly as useful to have delimiter-less COPY
of textual data, for use when there's just one datum and/or you're
confident in picking the data apart for yourself? But as things stand
it'd be too confusing for an application to try to figure out what's
happening in such a case.

So I think we should either invent RAW_TEXT and RAW_BINARY formats
(not just RAW) or make RAW be an orthogonal copy option. And we need
to improve libpq's behavior enough so that applications can sanely
figure out what's happening.

I had a use case that required binary mode. Higher granularity has sense.

This opening new question - RAW_TEXT will use text output function. But if I will pass this value as text value, then a behave of current clients will be same as usual COPY. So I need to use binary protocol. And then the behave of PQbinaryTuples() and PQfformat() is the question? Although text value can be passed in binary mode too (with format [length, data...]).

> I executed all tests in libpq and ecpg without any problems. Can you,
> please, help me with repeating a ecpg issues?

Of course the ecpg tests pass; you didn't extend them to see what would
happen if someone tries COPY RAW with ecpg. Likewise, we have no tests
exercising a client's use of libpq with more intelligence than psql has
got. But that doesn't mean it's acceptable to write this patch with no
thought for such clients.

if we don't change PQbinaryTuples() and PQfformat(), then COPY RAW should be transparent for any client. Server sending data in binary format - what is generic.

I am fairly sure that there actually are third-party client libraries
that have more intelligence about COPY than psql, but I do not remember
any specifics unfortunately.

The COPY RAW should not to break any existing application. This is new feature - and old application, old client use COPY RAW newer. I see as important the conformity of used mode (text/binary) and PQbinaryTuples() and PQfformat().

I am writing few lines as summary:

1. invention RAW_TEXT and RAW_BINARY

2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1

3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but the client should to check PQcopyFormat() to not print "\n" on the end

3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but used output function, not necessary client modification
4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for RAW_BINARY

5. create tests for ecpg

Is it ok?

What do you prefer 3.a, or 3.b?

Regards

Pavel

regards, tom lane

Re: raw output from copy

From

Tom Lane

Date:

29 March 2016, 18:59:44

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am writing few lines as summary:

> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should to check PQcopyFormat() to not print "\n" on the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg

3.b certainly seems completely wrong.  PQfformat==1 would imply binary
data.

I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields.  So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation).  You'd need to also look at PQfformat to distinguish
raw text from raw binary.  But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.
        regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

29 March 2016, 19:41:48

2016-03-29 20:59 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am writing few lines as summary:

> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should to check PQcopyFormat() to not print "\n" on the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg

3.b certainly seems completely wrong. PQfformat==1 would imply binary
data.

I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields. So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation). You'd need to also look at PQfformat to distinguish
raw text from raw binary. But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.

I have a less courage than you :). The original design worked with almost clients without changes on client side. New design has lot of combinations, that are unknown for old clients. It can be better, because the client authors will do update faster.

If PQfformat will returns 0 = text, 1 = traditional binary, 2 = raw text, 3 = raw binary - like you propose, then PQcopyFormat is useless. I see all information just from PQfformat.

Regards

Pavel

regards, tom lane

Re: raw output from copy

From

Andrew Dunstan

Date:

29 March 2016, 19:53:00

On 03/28/2016 11:18 PM, Pavel Stehule wrote:
>
>
>
>         Anyway this is certainly not committable as-is, so I'm setting
>         it back
>         to Waiting on Author.  But the fact that both libpq and ecpg
>         would need
>         updates makes me question whether we can safely pretend that
>         this isn't
>         a protocol break.
>
>
>
>
>     In that case I humbly submit that there is a case for reviving the
>     psql patch I posted that kicked off this whole thing and lets you
>     export a piece of binary data from psql quite easily. It should
>     certainly not involve any protocol break.
>
>
> The psql only solution can work only for output. Doesn't help with input.
>
>

The I would suggest we try to invent something for psql which does help 
with it. I just don't see this as an SQL problem. Pretty much any driver 
library will have no difficulty in handling binary input and output. 
It's only psql that has an issue, ISTM, and therefore I believe that's 
where the fix should go. What else is going to use this? As an SQL 
change this seems like a solution in search of a problem. If someone can 
make a good case that this is going to be of general use I'll happily go 
along, but I haven't seen one so far.

cheers

andrdew

Re: raw output from copy

From

Tom Lane

Date:

29 March 2016, 20:33:40

Andrew Dunstan <andrew@dunslane.net> writes:
> The I would suggest we try to invent something for psql which does help 
> with it. I just don't see this as an SQL problem.

There's certainly a lot to be said for that approach.  I'm still not
convinced that we can make COPY do this without creating compatibility
issues, regardless of the details; and it doesn't seem like a big
enough problem to be worth taking any risks of that sort.
        regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

31 March 2016, 06:13:13

2016-03-29 20:59 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am writing few lines as summary:

> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should to check PQcopyFormat() to not print "\n" on the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg

3.b certainly seems completely wrong. PQfformat==1 would imply binary
data.

I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields. So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation). You'd need to also look at PQfformat to distinguish
raw text from raw binary. But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.

I wrote concept of raw_text, raw_binary modes.

I am trying to implement text data passing like text format - but for RAW_TEXT it is not practical. Text passing is designed for one line data, for multiline data enforces escaping, what we don't would for RAW mode. I have to skip escaping, and the code is not nice.

So I propose different schema - RAW_TEXT uses text values (uses input/output functions), enforce encoding from/to client codes and for passing to client mode is used binary mode - then I don't need to read the content with line by line. PQbinaryTuples() returns 1 for RAW_TEXT and RAW_BINARY - in these cases data are passed as one binary value. PQfformat returns 2 for RAW_TEXT and 3 for RAW_BINARY.

Any objections to this design?

Regards

Pavel

regards, tom lane

Attachment

copy-raw-format-20160331-04.patch

Re: raw output from copy

From

Craig Ringer

Date:

31 March 2016, 06:34:58

On 30 March 2016 at 00:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

PgJDBC definitely isn't.

Any changes really need to be tested against PgJDBC's CopyManager.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: raw output from copy

From

Pavel Stehule

Date:

31 March 2016, 06:40:46

2016-03-31 8:34 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 30 March 2016 at 00:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

PgJDBC definitely isn't.

Any changes really need to be tested against PgJDBC's CopyManager.

this patch doesn't break any old application. Accepting new feature depends on binary method detection. PQbinaryTuples based clients should to support COPY RAW* without problems, PQfformat() should to report unknown format.

Regards

Pavel

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: raw output from copy

From

Craig Ringer

Date:

31 March 2016, 07:49:04

On 31 March 2016 at 14:40, Pavel Stehule <pavel.stehule@gmail.com> wrote:

this patch doesn't break any old application. Accepting new feature depends on binary method detection. PQbinaryTuples based clients should to support COPY RAW* without problems, PQfformat() should to report unknown format.

PgJDBC does not use libpq.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: raw output from copy

From

Pavel Stehule

Date:

31 March 2016, 07:55:18

2016-03-31 9:48 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 31 March 2016 at 14:40, Pavel Stehule <pavel.stehule@gmail.com> wrote:

this patch doesn't break any old application. Accepting new feature depends on binary method detection. PQbinaryTuples based clients should to support COPY RAW* without problems, PQfformat() should to report unknown format.

PgJDBC does not use libpq.

so it can be interesting test

Pavel

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: raw output from copy

From

"Daniel Verite"

Date:

01 April 2016, 15:42:58

    Andrew Dunstan wrote:

> If someone can make a good case that this is going to be of
> general use I'll happily go along, but I haven't seen one so far.

About COPY FROM with a raw format, for instance just yesterday
there was this user question on stackoverflow:
http://stackoverflow.com/questions/36317237

which essentially is: how to import contents from a file without any
particular interpretation of any character?

With the patch discussed in this thread, a user can do
\copy table(textcol) from /path/to/file (format raw)
or the equivalent COPY.
If it's a binary column, that works just the same.

Without this, it's not obvious at all how this result can be
achieved without resorting to external preprocessing,
and assuming the availability of such preprocessing tools
in the environment. Notwithstanding the fact that the
solution proposed on SO (doubling backslashes with sed)
doesn't even work if the file contains tabs, as they would be
interpreted as field separators, even if the copy target has only
one column. You can change the delimiter with COPY but AFAIK
you can't tell that there is none.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: raw output from copy

From

Andrew Dunstan

Date:

02 April 2016, 17:53:02


On 04/01/2016 11:42 AM, Daniel Verite wrote:
>     Andrew Dunstan wrote:
>
>> If someone can make a good case that this is going to be of
>> general use I'll happily go along, but I haven't seen one so far.
> About COPY FROM with a raw format, for instance just yesterday
> there was this user question on stackoverflow:
> http://stackoverflow.com/questions/36317237
>
> which essentially is: how to import contents from a file without any
> particular interpretation of any character?\


There is so much wrong with this it's hard to know where to start.

Inserting the whole contents of a text file unchanged is insanely easy 
in psql.
    \set file `cat /path/to/file`    insert into mytable(contents) values(:'file');

What is more everyone on SO missed the fact that CSV mode gives you very 
considerable control over the quote, delimiter and null settings.

See for example 
<http://adpgtech.blogspot.com/2014/09/importing-json-data.html> which 
has this example for handling files consisting of 1 json document per line:
copy the_table(jsonfield)from '/path/to/jsondata'csv quote e'\x01' delimiter e'\x02';

psql's \copy will work just the same way

(I noticed with amusement this week that CitusData is using pretty much 
exactly this in one of their examples.)

>
> With the patch discussed in this thread, a user can do
> \copy table(textcol) from /path/to/file (format raw)
> or the equivalent COPY.
> If it's a binary column, that works just the same.


It would be fairly simple to invent a binary mechanism that did the 
equivalent of the above insert. All without any change to SQL or the 
backend at all.


>
> Without this, it's not obvious at all how this result can be
> achieved without resorting to external preprocessing,
> and assuming the availability of such preprocessing tools
> in the environment. Notwithstanding the fact that the
> solution proposed on SO (doubling backslashes with sed)
> doesn't even work if the file contains tabs, as they would be
> interpreted as field separators, even if the copy target has only
> one column. You can change the delimiter with COPY but AFAIK
> you can't tell that there is none.


There is arguably a good case for allowing a null delimiter. But that SO 
page is just a terrible piece of misinformation, as far too often 
happens in my experience.

And I am still waiting for a non-psql use case. But I don't expect to 
see one, precisely because most clients have no difficulty at all in 
handling binary data.

cheers

andrew

Re: raw output from copy

From

"Daniel Verite"

Date:

04 April 2016, 16:38:45

    Andrew Dunstan wrote:

> Inserting the whole contents of a text file unchanged is insanely easy
> in psql.
>
>      \set file `cat /path/to/file`
>      insert into mytable(contents) values(:'file');

That's assuming psql but the asker of that question never mentioned
using psql. The COPY invocation could be inside a function. Even if
that particular user would be fine with a psql-only option, the next
one might not. Or they might want to import a binary file, and
as you mention, currently there's no equivalent of the :'var'
feature for binary.

But there's another aspect to this that's worth of consideration,
and that this forum question illustrates.
One reason of adding the format to COPY is that it's where users
are looking for it. It's the canonical way of importing contents
from files so that's where it makes more sense.
From the POV of being user friendly and consistent, restricting what
COPY can do because psql could do it completely differently
if the user was psql-savvy enough to know it, what sense does it
make?

> And I am still waiting for a non-psql use case. But I don't expect to
> see one, precisely because most clients have no difficulty at all in
> handling binary data.

You mean small or medium-size binary data. The 512MB-1GB range is
impossible to handle if requested in text format, which is what drivers
tend to use. Even pg_dump fails on these contents.
Maybe it was unimportant when bytea was added ~15 years ago,
but the size of data that people actually put into bytea columns is
growing, following Moore's law like the rest.

Even in the lower size range, considering the amount of memory allocated
and the time spent to convert to hex, sending twice the number
of bytes on the wire, just to do the reverse conversion in the client
as soon as all data is obtained, it works but it's pointless
and inefficient.

Code that uses PQexecParams() binary "resultFormat", or the
binary format of copy doesn't have that problem,  but most
client-side drivers don't do that.

And maybe they just can't realistically, because  getting result
format in binary is exposed as an all-or-nothing choice in libpq.

I mean if client code does SELECT * FROM table or even COPY of the
same, and what comes back is bytea and e.g. timestamps and floats and
custom types, the client-side driver may wish to have the bytea field in
binary format for efficiency and the rest in text format for
usability, but that's not possible with PQexecParams(), or other
libpq functions.

The point of mixing binary and text is outside the scope of a RAW
format for COPY, as obviously it wouldn't help with that in any way,
but on the argument that the status quo is fine because clients
have no difficulty, that's just not true. Clients cope with what they have,
but what they have is far from being complete or optimal.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: raw output from copy

From

Tom Lane

Date:

04 April 2016, 16:56:11

"Daniel Verite" <daniel@manitou-mail.org> writes:
> One reason of adding the format to COPY is that it's where users
> are looking for it. It's the canonical way of importing contents
> from files so that's where it makes more sense.

I'm not sure I buy that argument, because it could be used to justify
adding absolutely any ETL functionality to COPY.  And we don't want
to go down that path; the design intention for COPY is that it be as
simple and fast as possible.

>> And I am still waiting for a non-psql use case. But I don't expect to 
>> see one, precisely because most clients have no difficulty at all in 
>> handling binary data.

> You mean small or medium-size binary data. The 512MB-1GB range is
> impossible to handle if requested in text format, which is what drivers
> tend to use. Even pg_dump fails on these contents.

... which is COPY.  I do not see that RAW mode is going to help much
here: it's not going to be noticeably better than COPY BINARY in terms
of maximum field width.

>> Code that uses PQexecParams() binary "resultFormat", or the
>> binary format of copy doesn't have that problem,  but most
>> client-side drivers don't do that.

> And maybe they just can't realistically, because  getting result
> format in binary is exposed as an all-or-nothing choice in libpq.

That's simply wrong.  Read the documentation for PQexecParams and
friends: you can specify text or binary per-column.  It's COPY that
has the only-one-column-format restriction, and RAW certainly isn't
going to make that better.

I'm not quite as convinced as Andrew that RAW mode is unnecessary,
but I don't find these arguments for it to be very compelling.

The real issue to my mind is that it doesn't seem like we can shoehorn
a sanely-defined version of RAW into the existing protocol spec without
creating compatibility hazards.  So we can either wait for the mythical
protocol v4 (but even a protocol update wouldn't fix the application-level
hazards) or we can treat it as a problem to be solved client-side.
        regards, tom lane

Re: raw output from copy

From

"David G. Johnston"

Date:

04 April 2016, 17:28:46

On Fri, Apr 1, 2016 at 8:42 AM, Daniel Verite <daniel@manitou-mail.org> wrote:

Andrew Dunstan wrote:

> If someone can make a good case that this is going to be of
> general use I'll happily go along, but I haven't seen one so far.

About COPY FROM with a raw format, for instance just yesterday
there was this user question on stackoverflow:
http://stackoverflow.com/questions/36317237

which essentially is: how to import contents from a file without any
particular interpretation of any character?

With the patch discussed in this thread, a user can do
\copy table(textcol) from /path/to/file (format raw)

What is needed to solve this specific use-case is a way to specify "QUOTE NONE" instead of the default for whatever format is being hijacked:

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE E'<unprintable character that should never appear in the data>');

becomes

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE NONE);

Or maybe: "WITH (FORMAT single_column)"

Though maybe that doesn't extend well to unencoded binary data...which seems like it can be considered a separate problem from reliably importing an entire file into a single row+column in a table.

David J.

Re: raw output from copy

From

"Daniel Verite"

Date:

04 April 2016, 18:07:07

Tom Lane wrote:

> >> Code that uses PQexecParams() binary "resultFormat", or the
> >> binary format of copy doesn't have that problem,  but most
> >> client-side drivers don't do that.
>
> > And maybe they just can't realistically, because  getting result
> > format in binary is exposed as an all-or-nothing choice in libpq.
>
> That's simply wrong.  Read the documentation for PQexecParams and
> friends: you can specify text or binary per-column.  It's COPY that
> has the only-one-column-format restriction, and RAW certainly isn't
> going to make that better.

About PQexecParams, I disagree, the parameters formats can be
specified independantly, but the not the results, which are either all
binary or all text.

Quoting the doc at
http://www.postgresql.org/docs/9.5/static/libpq-exec.html
<quote>
PGresult *PQexecParams(PGconn *conn,           const char *command,           int nParams,           const Oid
*paramTypes,          const char * const *paramValues,           const int *paramLengths,           const int
*paramFormats,          int resultFormat); 
[...]

resultFormat:   Specify zero to obtain results in text format, or one to obtain results
in binary format. (There is not currently a provision to obtain different
result columns in different formats, although that is possible in the
underlying protocol.)
</quote>

For the client-side drivers that I've looked at, like these used in php
or perl, they just never use resultFormat=1.
I assume that they consider that having all values
in binary is unworkable for them, which is reasonable.
Maybe if they had a per-column choice, they wouldn't
use it anyway, but at least it would be theirs to decide

All this is only tangentially related to COPY RAW.
It's just that COPY RAW can be seen as an efficient alternative to
the single-column returning [SELECT bytea_column FROM...]
The drivers currently request this in text mode even though
it makes no sense in this particular case, and it gets measurably
annoying if the contents are big.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: raw output from copy

From

Pavel Stehule

Date:

05 April 2016, 08:46:34

here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY formats for COPY statements.

The RAW with text formats means unescaped data, but with correct encoding - input/output is realised with input/output function. RAW binary means content produced/received by sending/received functions.

Now both directions (input/output) working well

Some examples of expected usage:

copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary, encoding 'latin2');

create table avatars(id serial, picture bytea);

\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);

select lastval();

create table doc(id serial, txt text);

\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');

select lastval();

Regards

Pavel

Attachment

copy-raw_text_binary-01.patch

Re: raw output from copy

From

Robert Haas

Date:

08 April 2016, 18:13:45

On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
> formats for COPY statements.
>
> The RAW with text formats means unescaped data, but with correct encoding -
> input/output is realised with input/output function. RAW binary means
> content produced/received by sending/received functions.
>
> Now both directions (input/output) working well
>
> Some examples of expected usage:
>
> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
> encoding 'latin2');
>
> create table avatars(id serial, picture bytea);
> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
> select lastval();
>
> create table doc(id serial, txt text);
> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
> select lastval();

As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6.  I'm marking it
Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: raw output from copy

From

Pavel Stehule

Date:

08 April 2016, 18:24:03

2016-04-08 20:13 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:

On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
> formats for COPY statements.
>
> The RAW with text formats means unescaped data, but with correct encoding -
> input/output is realised with input/output function. RAW binary means
> content produced/received by sending/received functions.
>
> Now both directions (input/output) working well
>
> Some examples of expected usage:
>
> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
> encoding 'latin2');
>
> create table avatars(id serial, picture bytea);
> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
> select lastval();
>
> create table doc(id serial, txt text);
> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
> select lastval();

As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6. I'm marking it
Returned with Feedback.

ok, I'll try to complete this patch

Regards

Pavel

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: raw output from copy

From

Andrew Dunstan

Date:

08 April 2016, 18:54:58

On 04/08/2016 02:13 PM, Robert Haas wrote:
> On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
>> formats for COPY statements.
>>
>> The RAW with text formats means unescaped data, but with correct encoding -
>> input/output is realised with input/output function. RAW binary means
>> content produced/received by sending/received functions.
>>
>> Now both directions (input/output) working well
>>
>> Some examples of expected usage:
>>
>> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
>> encoding 'latin2');
>>
>> create table avatars(id serial, picture bytea);
>> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
>> select lastval();
>>
>> create table doc(id serial, txt text);
>> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
>> select lastval();
> As much as I know you and some other people would like it to be
> otherwise, this patch clearly does not have a sufficient degree of
> consensus to justify committing it to PostgreSQL 9.6.  I'm marking it
> Returned with Feedback.
>

I should add that I've been thinking about this some more, and that I 
now agree that something should be done to support this at the SQL 
level, mainly so that clients can manage very large pieces of data in a 
stream-oriented fashion rather than having to marshall the data in 
memory to load/unload via INSERT/SELECT. Anything that is client-side 
only is likely to have this memory issue.

At the same time I'm still not entirely convinced that COPY is a good 
vehicle for this. It's designed for bulk records, and already quite 
complex. Maybe we need something new that uses the COPY protocol but is 
more specifically tailored for loading or sending large singleton pieces 
of data.

cheers

andrew

Re: raw output from copy

From

Pavel Stehule

Date:

08 April 2016, 19:14:07

2016-04-08 20:54 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:

On 04/08/2016 02:13 PM, Robert Haas wrote:
On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
formats for COPY statements.

The RAW with text formats means unescaped data, but with correct encoding -
input/output is realised with input/output function. RAW binary means
content produced/received by sending/received functions.

Now both directions (input/output) working well

Some examples of expected usage:

copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
encoding 'latin2');

create table avatars(id serial, picture bytea);
\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
select lastval();

create table doc(id serial, txt text);
\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
select lastval();
As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6. I'm marking it
Returned with Feedback.

I should add that I've been thinking about this some more, and that I now agree that something should be done to support this at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion rather than having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likely to have this memory issue.

At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records, and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailored for loading or sending large singleton pieces of data.

Now it is little bit more time to think more about. But It is hard to design some more simpler than is COPY syntax. What will support both directions.

My implementation has same limit like COPY BINARY - it isn't worse. It should be good enough for VARLENA types that should not be higher than 1GB. It is not designed for LOB replacement.

Regards

Pavel

cheers

andrew

Re: raw output from copy

From

Ants Aasma

Date:

12 April 2016, 10:22:40

On 8 Apr 2016 9:14 pm, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
> 2016-04-08 20:54 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:
>> I should add that I've been thinking about this some more, and that I now agree that something should be done to
supportthis at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion
ratherthan having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is
likelyto have this memory issue. 
>>
>> At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk
records,and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically
tailoredfor loading or sending large singleton pieces of data. 
>
>
> Now it is little bit more time to think more about. But It is hard to design some more simpler than is COPY syntax.
Whatwill support both directions. 

Sorry for arriving late and adding to the bikeshedding. Maybe the
answer is to make COPY pluggable. It seems to me that it would be
relatively straightforward to add an extension mechanism for copy
output and input plugins that could support any format expressible as
a binary stream. Raw output would then be an almost trivial plugin.
Others could implement JSON, protocol buffers, Redis bulk load, BSON,
ASN.1 or whatever else serialisation format du jour. It will still
have the same backwards compatibility issues as adding the raw output,
but the payoff is greater.

Regards,
Ants Aasma

Re: raw output from copy

From

Pavel Stehule

Date:

12 April 2016, 20:20:42

2016-04-12 12:22 GMT+02:00 Ants Aasma <ants.aasma@eesti.ee>:

On 8 Apr 2016 9:14 pm, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
> 2016-04-08 20:54 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:
>> I should add that I've been thinking about this some more, and that I now agree that something should be done to support this at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion rather than having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likely to have this memory issue.
>>
>> At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records, and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailored for loading or sending large singleton pieces of data.
>
>
> Now it is little bit more time to think more about. But It is hard to design some more simpler than is COPY syntax. What will support both directions.

Sorry for arriving late and adding to the bikeshedding. Maybe the
answer is to make COPY pluggable. It seems to me that it would be
relatively straightforward to add an extension mechanism for copy
output and input plugins that could support any format expressible as
a binary stream. Raw output would then be an almost trivial plugin.
Others could implement JSON, protocol buffers, Redis bulk load, BSON,
ASN.1 or whatever else serialisation format du jour. It will still
have the same backwards compatibility issues as adding the raw output,
but the payoff is greater.

I had a idea about additional options of COPY RAW statements. One can be CAST function. These CAST functions can be used to any for any format.

COPY has two parts - client, and server side. Currently we cannot to expand libpq, and we cannot to expand psql. So we have to send data to client in target format and all transformations should be done on server side. Personally, I strongly prefer to write Linux server side extensions against MSWin client side extensions. The client (psql) is able to use a pipe - so any client side transformation can be done outer psql.

Regards

Pavel

Regards,
Ants Aasma

Re: raw output from copy

From

Tom Lane

Date:

12 April 2016, 20:49:20

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I had a idea about additional options of COPY RAW statements. One can be
> CAST function. These CAST functions can be used to any for any format.

Uh, what?  CAST() is not about external representations of values, and
overloading it for that purpose doesn't seem like a particularly good
idea: you'd have to figure out what the conversions meant inside SQL as
well as externally.  Also, maybe I missed something, but a different
representation of individual data values within a COPY wasn't what we
were after here.
        regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

13 April 2016, 03:39:46

2016-04-12 22:48 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I had a idea about additional options of COPY RAW statements. One can be
> CAST function. These CAST functions can be used to any for any format.

Uh, what? CAST() is not about external representations of values, and
overloading it for that purpose doesn't seem like a particularly good
idea: you'd have to figure out what the conversions meant inside SQL as
well as externally. Also, maybe I missed something, but a different
representation of individual data values within a COPY wasn't what we
were after here.

I didn't think about this idea to deep - so there can be more than one problem. More - I though about it before you designed RAW_TEXT mode - that can coverage this use case too.

Originally I had only RAW mode, what can be difficult for JSONB, so my solution was

COPY target(jsonbcol) FROM jsondata OPTIONS(RAW, CAST(json_to_jsonb)).

Now this idea is obsolete, because anybody can do

COPY target(jsonbcol) FROM jsondata OPTIONS(RAW_TEXT)

What is much more simple.

Using explicit casts in COPY statement was motivated by possible requirement do some manipulations with data before their storing to table. It is idea, and probably wrong idea.

I don't want to increase complexity of COPY statement too much. My goal is enhance COPY to import single objects simply. And if you need some more complex, then you can write some simple application where can be used classic COPY or COPY RAW again (because it doesn't require escaping).

Regards

Pavel

regards, tom lane

Re: raw output from copy

From

Pavel Stehule

Date:

16 July 2016, 08:55:55

2016-04-05 10:45 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hi

here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY formats for COPY statements.

The RAW with text formats means unescaped data, but with correct encoding - input/output is realised with input/output function. RAW binary means content produced/received by sending/received functions.

Now both directions (input/output) working well

Some examples of expected usage:

copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary, encoding 'latin2');

create table avatars(id serial, picture bytea);
\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
select lastval();

create table doc(id serial, txt text);
\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
select lastval();

Regards

Pavel

I am sending fresh version of COPY RAW patch.

There is new regress client test requested by Tom.

Note: I though about another solution based on binary parameters and binary result support in psql. Somelike:

INSERT INTO foo(a) VALUES($1)

\gpush filename

SELECT a FROM foo

\gpop filename

but, it is less intuitive, and doesn't work with stdin/stdout - so it is significant week against COPY based solution for scripting from shell. More \g***** solution is still possible if will be requested in future.

Regards

Pavel

[pavel@nemesis ~]$ cat avatar.gif | psql -Xq -At -c "copy xx(b) from stdin (format raw_text)" -c "select lastval()" postgres
313

Attachment

copy-raw-2016-07-16.patch

Re: raw output from copy

From

Michael Paquier

Date:

03 October 2016, 04:31:46

On Sat, Jul 16, 2016 at 5:55 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I am sending fresh version of COPY RAW patch.

Moved to next CF per this status.

+++ b/src/interfaces/libpq/test/copy-raw-regress.pl
@@ -0,0 +1,48 @@
+#!/usr/bin/perl -w
+
+use strict;
I don't understand why this is shaped this way, I mean the perl part
if we have the TAP infra in place. MSVC is not testing it as well.
-- 
Michael

Re: raw output from copy

From

Haribabu Kommi

Date:

22 November 2016, 11:48:35

Hi,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.

But you haven't shared your review yet. Please share your review about

the patch. This will help us in smoother operation of commitfest.

Please Ignore if you already shared your review.

Regards,

Hari Babu

Fujitsu Australia

Re: raw output from copy

From

Haribabu Kommi

Date:

05 December 2016, 05:17:12

On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Hi,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.
But you haven't shared your review yet. Please share your review about
the patch. This will help us in smoother operation of commitfest.

Please Ignore if you already shared your review.

Patch is not applying properly to HEAD.

Moved to next CF with "waiting on author" status.

Regards,

Hari Babu

Fujitsu Australia

Re: raw output from copy

From

Kohei KaiGai

Date:

05 December 2016, 13:19:38

Sorry for my late response.

I've briefly checked a series of discussion in the past.
I understood the target/purpose of this patch is provision of a fast interface
to import/export a particular cell of a relation, by skip of text<->binary
transformation. Its typical use case are XML and JSON data types. Right?

If so, how about the idea to use fast-path invocation protocol to call functions
to import/export these document types?
It allows to accept binary form of the data stream, with minimum overheads.

It seems to me extend of COPY statement for this optimization is a bit overkill
solution. Do we find out an alternative solution that we can build on
the existing
infrastructure?

Best regards,

2016-12-05 14:16 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:
>
>
> On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>>  Hi,
>>
>> This is a gentle reminder.
>>
>> you assigned as reviewer to the current patch in the 11-2016 commitfest.
>> But you haven't shared your review yet. Please share your review about
>> the patch. This will help us in smoother operation of commitfest.
>>
>> Please Ignore if you already shared your review.
>
>
> Patch is not applying properly to HEAD.
> Moved to next CF with "waiting on author" status.
>
>
> Regards,
> Hari Babu
> Fujitsu Australia

-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>

Re: raw output from copy

From

Pavel Stehule

Date:

05 December 2016, 13:46:26

2016-12-05 14:19 GMT+01:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Sorry for my late response.

I've briefly checked a series of discussion in the past.
I understood the target/purpose of this patch is provision of a fast interface
to import/export a particular cell of a relation, by skip of text<->binary
transformation. Its typical use case are XML and JSON data types. Right?

There are more goals:

1. user friendly import of text or binary data - import text data (with psql) from file is possible - but you have to load a content to psql variable. For binary data you should to use workaround based on LO and transformation from LO to bytea.

2. user friendly export text or binary data - now, the binary data can be exported only via transformation to LO. The XML has very interesting features when is passing from/to client binary. This feature is impossible in psql now.

If so, how about the idea to use fast-path invocation protocol to call functions
to import/export these document types?
It allows to accept binary form of the data stream, with minimum overheads.

Sorry, I don't see a sense - for Fast API I have to write a application - and then I can write a application with support of binary passing.

It seems to me extend of COPY statement for this optimization is a bit overkill
solution. Do we find out an alternative solution that we can build on
the existing
infrastructure?

The advantage and sense of COPY RAW was reusing existing interface. The question was: How I can export/import binary data simply from psql console?

Regards

Pavel

Best regards,

2016-12-05 14:16 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:
>
>
> On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> Hi,
>>
>> This is a gentle reminder.
>>
>> you assigned as reviewer to the current patch in the 11-2016 commitfest.
>> But you haven't shared your review yet. Please share your review about
>> the patch. This will help us in smoother operation of commitfest.
>>
>> Please Ignore if you already shared your review.
>
>
> Patch is not applying properly to HEAD.
> Moved to next CF with "waiting on author" status.
>
>
> Regards,
> Hari Babu
> Fujitsu Australia

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Re: raw output from copy

From

Kohei KaiGai

Date:

06 December 2016, 00:50:22

2016-12-05 22:45 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
> There are more goals:
>
> 1. user friendly import of text or binary data - import text data (with
> psql) from file is possible - but you have to load a content to psql
> variable. For binary data you should to use workaround based on LO and
> transformation from LO to bytea.
>
> 2. user friendly export text or binary data - now, the binary data can be
> exported only via transformation to LO. The XML has very interesting
> features when is passing from/to client binary. This feature is impossible
> in psql now.
> :
<snip> :
>> It seems to me extend of COPY statement for this optimization is a bit
>> overkill
>> solution. Do we find out an alternative solution that we can build on
>> the existing
>> infrastructure?
>
> The advantage and sense of COPY RAW was reusing existing interface. The
> question was: How I can export/import binary data simply from psql console?
>
OK, I could get your point.

Likeky, we can implement the feature without COPY statement enhancement
by adding a special purpose function and \xxx command on psql.

Let's assume the two commands below on psql:

\blob_import <table_name> <column_name> (STDIN|<filename>)
\blob_export <query> (STDOUT|<filename>)

On \blob_import, the psql command reads the binary contents from either
stdin or file, than call a special purpose function that takes three
arguments; table name, column name and a binary data chunk.
PQexecParams() of libpq allows to deliver the data chunk with keeping
binary data format, then the special purpose function will be able to
lookup the destination table/column and construct a tuple that contains
the supplied data chunk. (I think xxxx_recv handler shall be used for
data validation, but not an element of this feature.)


On \blob_export, the psql command also set up a simple query as follows: SELECT blob_export((<user's supplied query))
For example, \blob_export SELECT binary_data FROM my_table WHERE id = 10   /tmp/aaa
shall be transformed to SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10))

This function is declared as: blob_export(anyelement) RETURNS bytea
So, as long as the user supplied query returns exactly one column and
one row, it can transform the argument to the binary stream, then psql
command receive it and dump somewhere; stdout or file.

How about your thought?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>

Re: raw output from copy

From

Pavel Stehule

Date:

06 December 2016, 08:00:00

2016-12-06 1:50 GMT+01:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2016-12-05 22:45 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
> There are more goals:
>
> 1. user friendly import of text or binary data - import text data (with
> psql) from file is possible - but you have to load a content to psql
> variable. For binary data you should to use workaround based on LO and
> transformation from LO to bytea.
>
> 2. user friendly export text or binary data - now, the binary data can be
> exported only via transformation to LO. The XML has very interesting
> features when is passing from/to client binary. This feature is impossible
> in psql now.
>
:
<snip>
:
>> It seems to me extend of COPY statement for this optimization is a bit
>> overkill
>> solution. Do we find out an alternative solution that we can build on
>> the existing
>> infrastructure?
>
> The advantage and sense of COPY RAW was reusing existing interface. The
> question was: How I can export/import binary data simply from psql console?
>
OK, I could get your point.

Likeky, we can implement the feature without COPY statement enhancement
by adding a special purpose function and \xxx command on psql.

Let's assume the two commands below on psql:

\blob_import <table_name> <column_name> (STDIN|<filename>)
\blob_export <query> (STDOUT|<filename>)

On \blob_import, the psql command reads the binary contents from either
stdin or file, than call a special purpose function that takes three
arguments; table name, column name and a binary data chunk.
PQexecParams() of libpq allows to deliver the data chunk with keeping
binary data format, then the special purpose function will be able to
lookup the destination table/column and construct a tuple that contains
the supplied data chunk. (I think xxxx_recv handler shall be used for
data validation, but not an element of this feature.)

On \blob_export, the psql command also set up a simple query as follows:
SELECT blob_export((<user's supplied query))
For example,
\blob_export SELECT binary_data FROM my_table WHERE id = 10 /tmp/aaa
shall be transformed to
SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10))

This is reason why I prefer a COPY statement - because it does all necessary things natural. But if there is a disagreement against COPY RAW it can be implemented as psql commands.

export should be similar like \g, \gset feature

SELECT xmldoc FROM xxxx

\gbinary_store xxxx.xxx

import is maybe better solved by proposed file references in queries

Regards

Pavel

This function is declared as:
blob_export(anyelement) RETURNS bytea
So, as long as the user supplied query returns exactly one column and
one row, it can transform the argument to the binary stream, then psql
command receive it and dump somewhere; stdout or file.

How about your thought?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Re: raw output from copy

From

Kohei KaiGai

Date:

06 December 2016, 14:30:27

2016-12-06 16:59 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
>
> 2016-12-06 1:50 GMT+01:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>>
>> 2016-12-05 22:45 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>:
>> >
>> > There are more goals:
>> >
>> > 1. user friendly import of text or binary data - import text data (with
>> > psql) from file is possible - but you have to load a content to psql
>> > variable. For binary data you should to use workaround based on LO and
>> > transformation from LO to bytea.
>> >
>> > 2. user friendly export text or binary data - now, the binary data can
>> > be
>> > exported only via transformation to LO. The XML has very interesting
>> > features when is passing from/to client binary. This feature is
>> > impossible
>> > in psql now.
>> >
>>   :
>> <snip>
>>   :
>> >> It seems to me extend of COPY statement for this optimization is a bit
>> >> overkill
>> >> solution. Do we find out an alternative solution that we can build on
>> >> the existing
>> >> infrastructure?
>> >
>> > The advantage and sense of COPY RAW was reusing existing interface. The
>> > question was: How I can export/import binary data simply from psql
>> > console?
>> >
>> OK, I could get your point.
>>
>> Likeky, we can implement the feature without COPY statement enhancement
>> by adding a special purpose function and \xxx command on psql.
>>
>> Let's assume the two commands below on psql:
>>
>> \blob_import <table_name> <column_name> (STDIN|<filename>)
>> \blob_export <query> (STDOUT|<filename>)
>>
>> On \blob_import, the psql command reads the binary contents from either
>> stdin or file, than call a special purpose function that takes three
>> arguments; table name, column name and a binary data chunk.
>> PQexecParams() of libpq allows to deliver the data chunk with keeping
>> binary data format, then the special purpose function will be able to
>> lookup the destination table/column and construct a tuple that contains
>> the supplied data chunk. (I think xxxx_recv handler shall be used for
>> data validation, but not an element of this feature.)
>>
>>
>> On \blob_export, the psql command also set up a simple query as follows:
>>   SELECT blob_export((<user's supplied query))
>> For example,
>>   \blob_export SELECT binary_data FROM my_table WHERE id = 10   /tmp/aaa
>> shall be transformed to
>>   SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10))
>
>
> This is reason why I prefer a COPY statement - because it does all necessary
> things natural.  But if there is a disagreement against COPY RAW it can be
> implemented as psql commands.
>
Yes, both of approach will be able to implement what you want to do.
I agree it is valuable if psql can import/export a particular item with
simple shell-script description, however, here is no consensus how
to implement it.

If psql supports the special \xxx command, it is equivalently convenient
from the standpoint of users, with no enhancement of the statement.

I hope committers comment on the approach we will take on.

Thanks,

> export should be similar like \g, \gset feature
>
> so
>
> SELECT xmldoc FROM xxxx
> \gbinary_store xxxx.xxx
>
> import is maybe better solved by proposed file references in queries
>
> Regards
>
> Pavel
>
>
>>
>>
>> This function is declared as:
>>   blob_export(anyelement) RETURNS bytea
>> So, as long as the user supplied query returns exactly one column and
>> one row, it can transform the argument to the binary stream, then psql
>> command receive it and dump somewhere; stdout or file.
>>
>> How about your thought?
>>
>> Thanks,
>> --
>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>
>



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>