Thread: raw output from copy
http://www.postgresql.org/message-id/16174.1319228878@sss.pgh.pa.us
postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format 'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format 'raw') ;
<?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>Time: 1.000 ms
PavelRegardsObjections? Ideas?It requires only one row, one column result - and result is just raw binary data without size.I propose to implement new format option "RAW" like Tom proposed.HiThis thread was finished without real work. I have a real use case - export XML doc in non utf8 encoding.
http://www.postgresql.org/message-id/16174.1319228878@sss.pgh.pa.us
Attachment
On 4/10/15 5:26 PM, Pavel Stehule wrote: > Hi > > I wrote a prototype of this patch, and it works well > > postgres=# set client_encoding to 'latin2'; > SET > Time: 1.488 ms > postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml > (format 'raw') > COPY 1 > Time: 1.108 ms > postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format > 'raw') ; > <?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký > kůň</xx>Time: 1.000 ms I think you can get the same thing using regular psql output and just turning off all field and record separators and tuple headers and so on.
<p dir="ltr">It would be nice, but it is not true. You can get correct non utf8 xml with encoding specification only whenbinary mode is used. Psql doesn't support binary mode.<p dir="ltr">Regards<p dir="ltr">Pavel<div class="gmail_quote">Dne15. 4. 2015 22:06 napsal uživatel "Peter Eisentraut" <<a href="mailto:peter_e@gmx.net">peter_e@gmx.net</a>>:<brtype="attribution" /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 4/10/15 5:26 PM, Pavel Stehule wrote:<br /> >Hi<br /> ><br /> > I wrote a prototype of this patch, and it works well<br /> ><br /> > postgres=# set client_encodingto 'latin2';<br /> > SET<br /> > Time: 1.488 ms<br /> > postgres=# \copy (select xmlelement(namexx, d) from d) to ~/d.xml<br /> > (format 'raw')<br /> > COPY 1<br /> > Time: 1.108 ms<br /> >postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format<br /> > 'raw') ;<br /> > <?xml version="1.0"encoding="LATIN2"?><xx>příliš žluťoučký<br /> > kůň</xx>Time: 1.000 ms<br /><br /> I thinkyou can get the same thing using regular psql output and just<br /> turning off all field and record separators andtuple headers and so on.<br /><br /></blockquote></div>
I looked through the patch. Sources are OK. However I didn't find any docs and test cases. Would you please provide me with short description on this feature and why it is important. Because I didn't manage to find the old Andrew Dunstan's post either.
PavelRegardsHiI wrote a prototype of this patch, and it works well
postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format 'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format 'raw') ;
<?xml version="1.0" encoding="LATIN2"?><xx>příliš žluťoučký kůň</xx>Time: 1.000 ms2015-04-09 20:48 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:PavelRegardsObjections? Ideas?It requires only one row, one column result - and result is just raw binary data without size.I propose to implement new format option "RAW" like Tom proposed.HiThis thread was finished without real work. I have a real use case - export XML doc in non utf8 encoding.
http://www.postgresql.org/message-id/16174.1319228878@sss.pgh.pa.us
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
I looked through the patch. Sources are OK. However I didn't find any docs and test cases. Would you please provide me with short description on this feature and why it is important. Because I didn't manage to find the old Andrew Dunstan's post either.
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1 July 2015 at 07:42, Pavel Golub <pavel@microolap.com> wrote:I looked through the patch. Sources are OK. However I didn't find any docs and test cases. Would you please provide me with short description on this feature and why it is important. Because I didn't manage to find the old Andrew Dunstan's post either.Feature sounds OK, so lets do it.Pavel S, please submit a polished patch. Coding guidelines, tests, docs etc. Set back to Waiting On Author.--Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/02/2015 07:14 AM, Pavel Stehule wrote: > Hi > > I'll do it today evening > > Pavel, Please don't top-post on the PostgreSQL lists. You've been around here long enough to know that bottom posting is our custom. I posted a patch for this in 2013 at <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> but it can apply to a SELECT, and doesn't need COPY. Nobody seemed very interested, so I dropped it. Apparently people now want something along these lines, which is good. cheers andrew
On 07/02/2015 09:02 AM, Andrew Dunstan wrote: > > On 07/02/2015 07:14 AM, Pavel Stehule wrote: >> Hi >> >> I'll do it today evening >> >> > > Pavel, > > Please don't top-post on the PostgreSQL lists. You've been around here > long enough to know that bottom posting is our custom. > > I posted a patch for this in 2013 at > <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> > but it can apply to a SELECT, and doesn't need COPY. Nobody seemed > very interested, so I dropped it. Apparently people now want something > along these lines, which is good. For reference, here's the Wayback Machine's version of the original blog post referred to: <http://web.archive.org/web/20110916023912/http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html> cheers andrew
Please don't top-post on the PostgreSQL lists. You've been around here long enough to know that bottom posting is our custom.
I posted a patch for this in 2013 at <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> but it can apply to a SELECT, and doesn't need COPY. Nobody seemed very interested, so I dropped it. Apparently people now want something along these lines, which is good.
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net> wrote:
Please don't top-post on the PostgreSQL lists. You've been around here long enough to know that bottom posting is our custom.
I posted a patch for this in 2013 at <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> but it can apply to a SELECT, and doesn't need COPY. Nobody seemed very interested, so I dropped it. Apparently people now want something along these lines, which is good.It's a shame that both solutions are restricted to either COPY or psql.Both of those are working on suggestions from Tom, so there is no history of preference there.Can we have both please, gentlemen?If we implemented Andrew's solution, how would we request it in a COPY statement? Seems like we would want the RAW format keyword anyway.
--Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/02/2015 09:43 AM, Simon Riggs wrote: > On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net > <mailto:andrew@dunslane.net>> wrote: > > > Please don't top-post on the PostgreSQL lists. You've been around > here long enough to know that bottom posting is our custom. > > I posted a patch for this in 2013 at > <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> > but it can apply to a SELECT, and doesn't need COPY. Nobody seemed > very interested, so I dropped it. Apparently people now want > something along these lines, which is good. > > > It's a shame that both solutions are restricted to either COPY or psql. > > Both of those are working on suggestions from Tom, so there is no > history of preference there. > > Can we have both please, gentlemen? > > If we implemented Andrew's solution, how would we request it in a COPY > statement? Seems like we would want the RAW format keyword anyway. > > What's the use case? My original motivation was that I had a function that returned a bytea (it was a PDF in fact) that I wanted to be able to write to a file. Of course, this is easy enough to do with a client library like perl's DBD::Pg, but it seems sad to have to resort to that for something so simple. My original suggestion (<http://www.postgresql.org/message-id/4EA1B83B.2050605@pgexperts.com>) was to invent a \bcopy command. I don't have a problem in building in a RAW mode for copy, but we'll still need to teach psql how to deal with it. Another case where it could be useful is JSON - so we can avoid having to play tricks like <http://adpgtech.blogspot.com/2014/09/importing-json-data.html>. Similar considerations probably apply to XML, and the tricks are less guaranteed to work. cheers andrew
On 07/02/2015 09:43 AM, Simon Riggs wrote:On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net <mailto:andrew@dunslane.net>> wrote:
Please don't top-post on the PostgreSQL lists. You've been around
here long enough to know that bottom posting is our custom.
I posted a patch for this in 2013 at
<http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net>
but it can apply to a SELECT, and doesn't need COPY. Nobody seemed
very interested, so I dropped it. Apparently people now want
something along these lines, which is good.
It's a shame that both solutions are restricted to either COPY or psql.
Both of those are working on suggestions from Tom, so there is no history of preference there.
Can we have both please, gentlemen?
If we implemented Andrew's solution, how would we request it in a COPY statement? Seems like we would want the RAW format keyword anyway.
What's the use case? My original motivation was that I had a function that returned a bytea (it was a PDF in fact) that I wanted to be able to write to a file. Of course, this is easy enough to do with a client library like perl's DBD::Pg, but it seems sad to have to resort to that for something so simple.
My original suggestion (<http://www.postgresql.org/message-id/4EA1B83B.2050605@pgexperts.com>) was to invent a \bcopy command.
I don't have a problem in building in a RAW mode for copy, but we'll still need to teach psql how to deal with it.
Another case where it could be useful is JSON - so we can avoid having to play tricks like <http://adpgtech.blogspot.com/2014/09/importing-json-data.html>. Similar considerations probably apply to XML, and the tricks are less guaranteed to work.
cheers
andrew
It can be used from psql without any problems.
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/02/2015 10:07 AM, Pavel Stehule wrote: > > > 2015-07-02 16:02 GMT+02:00 Andrew Dunstan <andrew@dunslane.net > <mailto:andrew@dunslane.net>>: > > > On 07/02/2015 09:43 AM, Simon Riggs wrote: > > On 2 July 2015 at 14:02, Andrew Dunstan <andrew@dunslane.net > <mailto:andrew@dunslane.net> <mailto:andrew@dunslane.net > <mailto:andrew@dunslane.net>>> wrote: > > > Please don't top-post on the PostgreSQL lists. You've been > around > here long enough to know that bottom posting is our custom. > > I posted a patch for this in 2013 at > > <http://www.postgresql.org/message-id/50F2FA92.9040000@dunslane.net> > but it can apply to a SELECT, and doesn't need COPY. > Nobody seemed > very interested, so I dropped it. Apparently people now want > something along these lines, which is good. > > > It's a shame that both solutions are restricted to either COPY > or psql. > > Both of those are working on suggestions from Tom, so there is > no history of preference there. > > Can we have both please, gentlemen? > > If we implemented Andrew's solution, how would we request it > in a COPY statement? Seems like we would want the RAW format > keyword anyway. > > > > > What's the use case? My original motivation was that I had a > function that returned a bytea (it was a PDF in fact) that I > wanted to be able to write to a file. Of course, this is easy > enough to do with a client library like perl's DBD::Pg, but it > seems sad to have to resort to that for something so simple. > > My original suggestion > (<http://www.postgresql.org/message-id/4EA1B83B.2050605@pgexperts.com>) > was to invent a \bcopy command. > > I don't have a problem in building in a RAW mode for copy, but > we'll still need to teach psql how to deal with it. > > > It can be used from psql without any problems. In fact your patch will not work with psql's \copy nor to stdout at all, unless I'm misreading it: - if (cstate->binary) + if (cstate->binary || cstate->raw) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("COPY BINARY is not supported to stdout or from stdin"))); So it looks like you're only supporting this where the server is writing to a file. That's horribly narrow, and certainly doesn't meet my original need. Does the COPY line protocol even support binary data? If not, we're dead in the water here from the psql POV. Because my patch doesn't use the COPY protocol it doesn't have this problem. Perhaps we should do both, although I'm not sure I understand the use case for the COPY solution. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Does the COPY line protocol even support binary data? The protocol, per se, just transmits a byte stream. There is a field in the CopyInResponse/CopyOutResponse messages that indicates whether a text or binary copy is being done. One thing we'd have to consider is whether "raw" mode is sufficiently different from binary to justify an additional value for this field, and if so whether that constitutes a protocol break. IIRC, psql wouldn't really care; it just transfers the byte stream to or from the target file, regardless of text or binary mode. But there might be other client libraries that are smarter and expect "binary" mode to mean the binary file format specified in the COPY reference page. So there may be value in being explicit about "raw" mode in these messages. A key point in all this is that people who need "raw" transfer probably need it in both directions, a point that your SELECT proposal cannot satisfy, but hacking COPY could. So I lean towards the latter really. regards, tom lane
On 07/02/2015 11:02 AM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Does the COPY line protocol even support binary data? > The protocol, per se, just transmits a byte stream. There is a field > in the CopyInResponse/CopyOutResponse messages that indicates whether > a text or binary copy is being done. One thing we'd have to consider > is whether "raw" mode is sufficiently different from binary to justify > an additional value for this field, and if so whether that constitutes > a protocol break. > > IIRC, psql wouldn't really care; it just transfers the byte stream to or > from the target file, regardless of text or binary mode. But there might > be other client libraries that are smarter and expect "binary" mode to > mean the binary file format specified in the COPY reference page. So > there may be value in being explicit about "raw" mode in these messages. > > A key point in all this is that people who need "raw" transfer probably > need it in both directions, a point that your SELECT proposal cannot > satisfy, but hacking COPY could. So I lean towards the latter really. > > OK, let's do that. I await the result with interest. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?
The protocol, per se, just transmits a byte stream. There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done. One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break. sql/plpgsql_check_passive-9.6.sql
IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode. But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page. So
there may be value in being explicit about "raw" mode in these messages.
A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could. So I lean towards the latter really.
regards, tom lane
postgres=# copy foo from '/tmp/1.jpg' (format raw);
COPY 1
Time: 93.021 ms
postgres=# \dt+ foo
List of relations
┌────────┬──────┬───────┬───────┬────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Size │ Description │
╞════════╪══════╪═══════╪═══════╪════════╪═════════════╡
│ public │ foo │ table │ pavel │ 256 kB │ │
└────────┴──────┴───────┴───────┴────────┴─────────────┘
(1 row)
postgres=# \copy foo to '~/3.jpg' (format raw)
COPY 1
Time: 2.401 ms
Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?
The protocol, per se, just transmits a byte stream. There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done. One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break.
IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode. But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page. So
there may be value in being explicit about "raw" mode in these messages.
A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could. So I lean towards the latter really.
regards, tom lane
Attachment
PavelRegardsHihere is a version with both direction support.
postgres=# copy foo from '/tmp/1.jpg' (format raw);
COPY 1
Time: 93.021 ms
postgres=# \dt+ foo
List of relations
┌────────┬──────┬───────┬───────┬────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Size │ Description │
╞════════╪══════╪═══════╪═══════╪════════╪═════════════╡
│ public │ foo │ table │ pavel │ 256 kB │ │
└────────┴──────┴───────┴───────┴────────┴─────────────┘
(1 row)
postgres=# \copy foo to '~/3.jpg' (format raw)
COPY 1
Time: 2.401 ms2015-07-02 17:02 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:Andrew Dunstan <andrew@dunslane.net> writes:
> Does the COPY line protocol even support binary data?
The protocol, per se, just transmits a byte stream. There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done. One thing we'd have to consider
is whether "raw" mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break.
IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode. But there might
be other client libraries that are smarter and expect "binary" mode to
mean the binary file format specified in the COPY reference page. So
there may be value in being explicit about "raw" mode in these messages.
A key point in all this is that people who need "raw" transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could. So I lean towards the latter really.
regards, tom lane
Attachment
2015-07-07 3:32 GMT-03:00 Pavel Stehule <pavel.stehule@gmail.com>: > > Hi > > previous patch was broken, and buggy > > Here is new version with fixed upload and more tests > > The interesting is so I should not to modify interface or client - so it should to work with any current driver with protocolsupport >= 3. Hi Pavel, Here are some thoughts: 1) from docs: "only row data in network byte order are exported or imported." Should it be "only raw data"? 2) from docs "Because this format doesn't support any delimiter, only one value can be exported or imported. NULL values are not allowed." That "only one value can be exported or imported" is a little sad for someone with a table with more than one column that accepts bytea. The implemented feature doesn't covers the use-case where a table 'image' has columns: id integer, image bytea, thumbnail bytea, and I want to import binary data in that. We could put here the cases where we have NOT NULL columns. Since these are expected and the error messages complain about that couldn't them be covered in docs more explicitly? 3) from code: "bool row_processed; /* true, when first row was processed */" Maybe rename the variable to something like `first_row_processed` and rip off the comment? 4) from code: if (cstate->raw) format = 2; else if (cstate->binary) format = 1; else format = 0; Maybe create a constant for code readability? If by one side this feature does not covers a more generalized case, by other is a nice start, IMHO. -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://github.com/guedes - http://guedesoft.net http://www.postgresql.org.br
2015-07-07 3:32 GMT-03:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
> Hi
>
> previous patch was broken, and buggy
>
> Here is new version with fixed upload and more tests
>
> The interesting is so I should not to modify interface or client - so it should to work with any current driver with protocol support >= 3.
Hi Pavel,
Here are some thoughts:
1) from docs: "only row data in network byte order are exported or imported."
Should it be "only raw data"?
2) from docs "Because this format doesn't support any delimiter, only
one value can be exported or imported. NULL values are not allowed."
That "only one value can be exported or imported" is a little sad for
someone with a table with more than one column that accepts bytea. The
implemented feature doesn't covers the use-case where a table 'image'
has columns: id integer, image bytea, thumbnail bytea, and I want to
import binary data in that. We could put here the cases where we have
NOT NULL columns. Since these are expected and the error messages
complain about that couldn't them be covered in docs more explicitly?
3) from code: "bool row_processed; /* true, when first row was processed */"
Maybe rename the variable to something like `first_row_processed` and
rip off the comment?
4) from code:
if (cstate->raw)
format = 2;
else if (cstate->binary)
format = 1;
else
format = 0;
Maybe create a constant for code readability?
If by one side this feature does not covers a more generalized case,
by other is a nice start, IMHO.
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br
On 7 July 2015 at 14:32, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > previous patch was broken, and buggy > > Here is new version with fixed upload and more tests I routinely see people trying to use COPY ... FORMAT binary to export a single binary field (like an image, for example) and getting confused by the header PostgreSQL adds. Or using text-format COPY and struggling with the hex escaping. It's clearly something people have trouble with. It doesn't help that while lo_import and lo_export can read paths outside the datadir (and refuse to read from within it), pg_read_binary_file is superuser only and disallows absolute paths. There's no corresponding pg_write_binary_file. So users who want to import and export a single binary field tend to try to use COPY. We have functionality for large objects that has no equivalent for 'bytea'. I don't love the use of COPY for this, but it gets us support for arbitrary clients pretty easily. Otherwise it'd be server-side only via local filesystem access, or require special psql-specific functionality like we have for lo_import etc. The main point is that this is a real world thing. People want to do it, try to do it, and have problems doing it. So it's a solution a real issue. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 07/27/2015 06:55 AM, Craig Ringer wrote: > On 7 July 2015 at 14:32, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Hi >> >> previous patch was broken, and buggy >> >> Here is new version with fixed upload and more tests > > I routinely see people trying to use COPY ... FORMAT binary to export > a single binary field (like an image, for example) and getting > confused by the header PostgreSQL adds. Or using text-format COPY and > struggling with the hex escaping. It's clearly something people have > trouble with. > > It doesn't help that while lo_import and lo_export can read paths > outside the datadir (and refuse to read from within it), > pg_read_binary_file is superuser only and disallows absolute paths. > There's no corresponding pg_write_binary_file. So users who want to > import and export a single binary field tend to try to use COPY. We > have functionality for large objects that has no equivalent for > 'bytea'. > > I don't love the use of COPY for this, but it gets us support for > arbitrary clients pretty easily. Otherwise it'd be server-side only > via local filesystem access, or require special psql-specific > functionality like we have for lo_import etc. COPY seems like a strange interface for this. I can see the point that the syntax is almost there already, for both input and output. But even that's not quite there yet, we'd need the new RAW format. And as an input method, COPY is a bit awkward, because you cannot easily pass the file to a function, for example. I think this should be implemented in psql, along the lines of Andrew's original \bcopy patch. There are a couple of related psql-features here actually, that would be useful on their own. The first is being able to send the query result to a file, for a single query only. You can currently do: \o /tmp/foo SELECT ...; \o But more often than not, when I try to do that, I forget to do the last \o, and run another query, and the output still goes to the file. So it'd be nice to have a \o option that only affects the next query. Something like: \O /tmp/foo SELECT ...; The second feature needed is to write the output without any headers, row delimiters and such. Just the datum. And the third feature is to write it in binary. Perhaps something like: \O /tmp/foo binary SELECT blob FROM foo WHERE id = 10; What about input? This is a whole new feature, but it would be nice to be able to pass the file contents as a query parameter. Something like: \P /tmp/foo binary INSERT INTO foo VALUES (?); - Heikki
On 07/27/2015 06:55 AM, Craig Ringer wrote:On 7 July 2015 at 14:32, Pavel Stehule <pavel.stehule@gmail.com> wrote:Hi
previous patch was broken, and buggy
Here is new version with fixed upload and more tests
I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.
It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.
I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.
COPY seems like a strange interface for this. I can see the point that the syntax is almost there already, for both input and output. But even that's not quite there yet, we'd need the new RAW format. And as an input method, COPY is a bit awkward, because you cannot easily pass the file to a function, for example. I think this should be implemented in psql, along the lines of Andrew's original \bcopy patch.
There are a couple of related psql-features here actually, that would be useful on their own. The first is being able to send the query result to a file, for a single query only. You can currently do:
\o /tmp/foo
SELECT ...;
\o
But more often than not, when I try to do that, I forget to do the last \o, and run another query, and the output still goes to the file. So it'd be nice to have a \o option that only affects the next query. Something like:
\O /tmp/foo
SELECT ...;
The second feature needed is to write the output without any headers, row delimiters and such. Just the datum. And the third feature is to write it in binary. Perhaps something like:
\O /tmp/foo binary
SELECT blob FROM foo WHERE id = 10;
What about input? This is a whole new feature, but it would be nice to be able to pass the file contents as a query parameter. Something like:
\P /tmp/foo binary
INSERT INTO foo VALUES (?);
- Heikki
On 07/27/2015 02:28 PM, Pavel Stehule wrote: > 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>: > >> What about input? This is a whole new feature, but it would be nice to be >> able to pass the file contents as a query parameter. Something like: >> >> \P /tmp/foo binary >> INSERT INTO foo VALUES (?); > > The example of input is strong reason, why don't do it via inserts. Only > parsing some special "?" symbol needs lot of new code. Sorry, I meant $1 in place of the ?. No special parsing needed, psql can send the query to the server as is, with the parameters that are given by this new mechanism. > In this case, I don't see any advantage of psql based solution. COPY is > standard interface for input/output from/to files, and it should be used > there. I'm not too happy with the COPY approach, although I won't object is one of the other committers feel more comfortable with it. However, we don't seem to be making progress here, so I'm going to mark this as Returned with Feedback. I don't feel good about that either, because I don't actually have any great suggestions on how to move this forward. Which is a pity because this is a genuine problem for users. - Heikki
On 08/05/2015 04:59 PM, Heikki Linnakangas wrote: > On 07/27/2015 02:28 PM, Pavel Stehule wrote: >> 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>: >> >>> What about input? This is a whole new feature, but it would be nice >>> to be >>> able to pass the file contents as a query parameter. Something like: >>> >>> \P /tmp/foo binary >>> INSERT INTO foo VALUES (?); >> >> The example of input is strong reason, why don't do it via inserts. Only >> parsing some special "?" symbol needs lot of new code. > > Sorry, I meant $1 in place of the ?. No special parsing needed, psql > can send the query to the server as is, with the parameters that are > given by this new mechanism. > >> In this case, I don't see any advantage of psql based solution. COPY is >> standard interface for input/output from/to files, and it should be used >> there. > > I'm not too happy with the COPY approach, although I won't object is > one of the other committers feel more comfortable with it. However, we > don't seem to be making progress here, so I'm going to mark this as > Returned with Feedback. I don't feel good about that either, because I > don't actually have any great suggestions on how to move this forward. > Which is a pity because this is a genuine problem for users. > > This is really only a psql problem, IMNSHO. Inserting and extracting binary data is pretty trivial for most users of client libraries (e.g. it's a couple of lines of code in a DBD::Pg program), but it's hard in psql. I do agree that the COPY approach feels more than a little klunky. cheers andrew
<p dir="ltr">Hi,<p dir="ltr">Psql based implementation needs new infrastructure (more than few lines)<p dir="ltr">Missing:<pdir="ltr">* binary mode support<br /> * parametrized query support,<p dir="ltr">I am not against, butboth points I proposed, and both was rejected.<p dir="ltr">So why dont use current infrastructure? Raw copy is trivialpatch.<div class="gmail_quote">Dne 6.8.2015 0:09 napsal uživatel "Andrew Dunstan" <<a href="mailto:andrew@dunslane.net">andrew@dunslane.net</a>>:<brtype="attribution" /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br /> On 08/05/2015 04:59 PM, Heikki Linnakangas wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On 07/27/201502:28 PM, Pavel Stehule wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #cccsolid;padding-left:1ex"> 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <<a href="mailto:hlinnaka@iki.fi" target="_blank">hlinnaka@iki.fi</a>>:<br/><br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex"> What about input? This is a whole new feature, but it would be nice tobe<br /> able to pass the file contents as a query parameter. Something like:<br /><br /> \P /tmp/foo binary<br /> INSERTINTO foo VALUES (?);<br /></blockquote><br /> The example of input is strong reason, why don't do it via inserts. Only<br/> parsing some special "?" symbol needs lot of new code.<br /></blockquote><br /> Sorry, I meant $1 in place of the?. No special parsing needed, psql can send the query to the server as is, with the parameters that are given by thisnew mechanism.<br /><br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In this case, I don't see any advantage of psql based solution. COPY is<br /> standard interfacefor input/output from/to files, and it should be used<br /> there.<br /></blockquote><br /> I'm not too happy withthe COPY approach, although I won't object is one of the other committers feel more comfortable with it. However, wedon't seem to be making progress here, so I'm going to mark this as Returned with Feedback. I don't feel good about thateither, because I don't actually have any great suggestions on how to move this forward. Which is a pity because thisis a genuine problem for users.<br /><br /><br /></blockquote><br /> This is really only a psql problem, IMNSHO. Insertingand extracting binary data is pretty trivial for most users of client libraries (e.g. it's a couple of lines ofcode in a DBD::Pg program), but it's hard in psql.<br /><br /> I do agree that the COPY approach feels more than a littleklunky.<br /><br /> cheers<br /><br /> andrew<br /><br /><br /></blockquote></div>
Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.
I am sending rebased patch
Attachment
Hi2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.
I was asked by Daniel Verite about reopening this patch in opened commitfest.
I am sending rebased patchRegardsPavel
On Sat, Feb 27, 2016 at 2:26 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:Hi2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.
I was asked by Daniel Verite about reopening this patch in opened commitfest.
I am sending rebased patchRegardsPavelSince this patch does something I need for my own work, I've signed up as a reviewer.From a design standpoint, I feel that COPY is the preferred means of dealing with data from sources too transient to justify setting up a foreign data wrapper, and too simple to justify writing application code. So, for me, RAW is the right solution, or at least *a* right solution.
My first pass of reading the code changes and the regression tests is complete, and I found the changes to be clear and fairly straightforward. This shouldn't surprise anyone, as the previous reviewers had only minor quibbles with the code. So far, so good.The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.I hope to find time for those things in the next few days.
I am new to reviewing, here is what I got. Patch have been applied nicely to the HEAD. I tried to upload and export files in psql, it works as expected. All regression tests are passed without problems as well. Code looks good for me. There is a little confusion for me in this line of documentation:Hi2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:Hi,
Psql based implementation needs new infrastructure (more than few lines)
Missing:
* binary mode support
* parametrized query support,I am not against, but both points I proposed, and both was rejected.
So why dont use current infrastructure? Raw copy is trivial patch.
I was asked by Daniel Verite about reopening this patch in opened commitfest.
I am sending rebased patchRegardsPavel
"use any metadata - only row data in network byte order are exported"
Did you mean "only raw data in network byte order is exported"?
And there are two entries for this patch on commitfest page: in "miscellaneous" and "sql" sections. Probably it's better to remove one of them to avoid confusion.
-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Corey Huinker wrote: > So, for me, RAW is the right solution, or at least *a* right solution. Questions on how to extract from a bytea column come up on a regular basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers are to encode the contents in text and decode them in an additional step, or use COPY BINARY and filter out the headers. But none of this is as straightforward and efficient as the proposed COPY RAW. Also the conversion to text can't be used at all on very large contents (>512MB), as mentioned in another recent thread [5] (this is the same reason why pg_dump can't dump such rows), but COPY RAW doesn't have this limitation. Technically COPY BINARY should be sufficient, but it seems that people dislike having to deal with its headers. Also it's not supported by any of the drivers of popular script languages that otherwise provide COPY in text format (DBD::Pg, php, psycopg2...) Maybe the RAW format would have a better chance to get support there, because of its simplicity. [1] http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji ng [2] http://www.postgresql.org/message-id/4C8272C4.1000008@arcor.de [3] http://stackoverflow.com/questions/6730729 [4] http://www.postgresql.org/message-id/56C66565.50107@consistentstate.com [5] http://www.postgresql.org/message-id/14620.1456851036@sss.pgh.pa.us Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Corey Huinker wrote:
> So, for me, RAW is the right solution, or at least *a* right solution.
Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.
But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.
Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.
Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.
[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
ng
[2] http://www.postgresql.org/message-id/4C8272C4.1000008@arcor.de
[3] http://stackoverflow.com/questions/6730729
[4] http://www.postgresql.org/message-id/56C66565.50107@consistentstate.com
[5] http://www.postgresql.org/message-id/14620.1456851036@sss.pgh.pa.us
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
2016-03-04 15:54 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:Corey Huinker wrote:
> So, for me, RAW is the right solution, or at least *a* right solution.
Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.
But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.
Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.exactly - I would to decrease dependency on PostgreSQL internals. Working with clean content is simple and possible with any environment without unclean operations.
RegardsPavel
[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
ng
[2] http://www.postgresql.org/message-id/4C8272C4.1000008@arcor.de
[3] http://stackoverflow.com/questions/6730729
[4] http://www.postgresql.org/message-id/56C66565.50107@consistentstate.com
[5] http://www.postgresql.org/message-id/14620.1456851036@sss.pgh.pa.us
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.
\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);COPY 1create temporary table raw_byte (b bytea);CREATE TABLEcreate temporary table raw_text (t text);CREATE TABLE\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);psql:/home/ubuntu/raw_test.sql:9: ERROR: relation "raw_jsonb" does not exist\copy raw_byte from '/tmp/raw_test.blob' (format raw);COPY 1select encode(b,'escape')::text::json from raw_byte;encode----------------{"foo": "bar"}(1 row)\copy raw_text from '/tmp/raw_test.blob' (format raw);COPY 1select t::jsonb from raw_text;t----------------{"foo": "bar"}(1 row)create temporary table binary_byte (b bytea);CREATE TABLEcreate temporary table binary_text (t text);CREATE TABLE\copy binary_byte from '/tmp/raw_test.blob' (format binary);psql:/home/ubuntu/raw_test.sql:22: ERROR: COPY file signature not recognizedselect encode(b,'escape')::jsonb from binary_byte;encode--------(0 rows)\copy binary_text from '/tmp/raw_test.blob' (format binary);psql:/home/ubuntu/raw_test.sql:26: ERROR: COPY file signature not recognizedselect t::jsonb from binary_text;t---(0 rows)
Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.
Attachment
The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.One thing I tried to test RAW was to load an existing json file.My own personal test was to load an existing .json file into a 1x1 bytea table, which worked. From there I was able toselect encode(col_name,'escape')::text::jsonb from test_tableand the json was correctly converted.A similar test copying binary failed.A write up of the test looks like this:\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);COPY 1create temporary table raw_byte (b bytea);CREATE TABLEcreate temporary table raw_text (t text);CREATE TABLE\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);psql:/home/ubuntu/raw_test.sql:9: ERROR: relation "raw_jsonb" does not exist\copy raw_byte from '/tmp/raw_test.blob' (format raw);COPY 1select encode(b,'escape')::text::json from raw_byte;encode----------------{"foo": "bar"}(1 row)\copy raw_text from '/tmp/raw_test.blob' (format raw);COPY 1select t::jsonb from raw_text;t----------------{"foo": "bar"}(1 row)create temporary table binary_byte (b bytea);CREATE TABLEcreate temporary table binary_text (t text);CREATE TABLE\copy binary_byte from '/tmp/raw_test.blob' (format binary);psql:/home/ubuntu/raw_test.sql:22: ERROR: COPY file signature not recognizedselect encode(b,'escape')::jsonb from binary_byte;encode--------(0 rows)\copy binary_text from '/tmp/raw_test.blob' (format binary);psql:/home/ubuntu/raw_test.sql:26: ERROR: COPY file signature not recognizedselect t::jsonb from binary_text;t---(0 rows)So, if we want to add a regression test to demonstrate to posterity why we need RAW for cases that BINARY can't handle, I offer the attached file.
Does anyone else see value in adding that to the regression tests?
Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.I don't know why I thought this, but when I looked at the patch, I assumed that the ( binary || raw ) tests were part of a large if/elseif/else waterfall. They are not. They stand alone. There are no edge cases to find.
Review complete and passed. I can re-review if we want to add the additional test.
On 3/12/16 1:24 AM, Pavel Stehule wrote: > Great, thank you very much. I hope so this feature really useful. It > allow to simple export/import XML doc in different encodings, JSONs and > can be enhanced future via options. The nice feature (but not for this > release) can be additional cast info for import -- like "COPY > table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because > there are the options, there are big space for other enhancing. Andres Karlsson pointed out that this patch has two CF entries: https://commitfest.postgresql.org/9/223/ https://commitfest.postgresql.org/9/547/ I closed the one that was in the "needs review" (547) state and kept the one that is "ready for committer" (223). -- -David david@pgmasters.net
On 3/12/16 1:24 AM, Pavel Stehule wrote:Great, thank you very much. I hope so this feature really useful. It
allow to simple export/import XML doc in different encodings, JSONs and
can be enhanced future via options. The nice feature (but not for this
release) can be additional cast info for import -- like "COPY
table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because
there are the options, there are big space for other enhancing.
Andres Karlsson pointed out that this patch has two CF entries:
https://commitfest.postgresql.org/9/223/
https://commitfest.postgresql.org/9/547/
I closed the one that was in the "needs review" (547) state and kept the one that is "ready for committer" (223).
--
-David
david@pgmasters.net
Pavel Stehule <pavel.stehule@gmail.com> writes: > [ copy-raw-format-20160227-03.patch ] I looked at this patch. I'm having a hard time accepting that it has a use-case large enough to justify it, and here's the reason: it's a protocol break. Conveniently omitting to update protocol.sgml doesn't make it not a protocol break. (libpq.sgml also contains assorted statements that are falsified by this patch.) You could argue that it's the user's own fault if he tries to use COPY RAW with client-side code that hasn't been updated to support it. Maybe that's okay, but I wonder if we're opening ourselves up to problems. Maybe even security-grade problems. In terms of specific code that hasn't been updated, ecpg is broken by this patch, and I'm not very sure what libpq's PQbinaryTuples() ought to do but probably something other than what it does today. There's also a definitional question of what we think PQfformat() ought to do; should it return "2" for the per-field format? Or maybe the per-field format is still "1", since it's after all the same binary data format as for COPY BINARY, and only the overall copy format reported by PQbinaryTuples() should change to "2". BTW, I'm not really sure why the patch is trying to enforce single row and column for the COPY OUT case. I thought the idea for that was that we'd just shove out the data without any delimiters, and if it's more than one datum it's the user's problem whether he can identify the boundaries. On the input side we would have to insist on one column since we're not going to attempt to identify boundaries (and one row would fall out of the fact that we slurp the entire input and treat it as one datum). Anyway this is certainly not committable as-is, so I'm setting it back to Waiting on Author. But the fact that both libpq and ecpg would need updates makes me question whether we can safely pretend that this isn't a protocol break. regards, tom lane
On 03/28/2016 06:26 PM, Tom Lane wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> [ copy-raw-format-20160227-03.patch ] > I looked at this patch. I'm having a hard time accepting that it has > a use-case large enough to justify it, and here's the reason: it's > a protocol break. Conveniently omitting to update protocol.sgml > doesn't make it not a protocol break. (libpq.sgml also contains > assorted statements that are falsified by this patch.) > > You could argue that it's the user's own fault if he tries to use > COPY RAW with client-side code that hasn't been updated to support it. > Maybe that's okay, but I wonder if we're opening ourselves up to > problems. Maybe even security-grade problems. > > In terms of specific code that hasn't been updated, ecpg is broken > by this patch, and I'm not very sure what libpq's PQbinaryTuples() > ought to do but probably something other than what it does today. > > There's also a definitional question of what we think PQfformat() ought > to do; should it return "2" for the per-field format? Or maybe the > per-field format is still "1", since it's after all the same binary data > format as for COPY BINARY, and only the overall copy format reported by > PQbinaryTuples() should change to "2". > > BTW, I'm not really sure why the patch is trying to enforce single > row and column for the COPY OUT case. I thought the idea for that > was that we'd just shove out the data without any delimiters, and > if it's more than one datum it's the user's problem whether he can > identify the boundaries. On the input side we would have to insist > on one column since we're not going to attempt to identify boundaries > (and one row would fall out of the fact that we slurp the entire input > and treat it as one datum). > > Anyway this is certainly not committable as-is, so I'm setting it back > to Waiting on Author. But the fact that both libpq and ecpg would need > updates makes me question whether we can safely pretend that this isn't > a protocol break. > > In that case I humbly submit that there is a case for reviving the psql patch I posted that kicked off this whole thing and lets you export a piece of binary data from psql quite easily. It should certainly not involve any protocol break. cheers andrew
In that case I humbly submit that there is a case for reviving the psql patch I posted that kicked off this whole thing and lets you export a piece of binary data from psql quite easily. It should certainly not involve any protocol break.
On 03/28/2016 06:26 PM, Tom Lane wrote:Pavel Stehule <pavel.stehule@gmail.com> writes:[ copy-raw-format-20160227-03.patch ]I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)
You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.
In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.
There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".
BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).
Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.
cheers
andrew
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]
I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)
You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.
In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.
There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".
BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).
Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.
regards, tom lane
In that case I humbly submit that there is a case for reviving the psql patch I posted that kicked off this whole thing and lets you export a piece of binary data from psql quite easily. It should certainly not involve any protocol break.
Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.
The psql only solution can work only for output. Doesn't help with input.
RegardsPavel
cheers
andrew
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]
I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)
You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.
In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.
There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".
Packet: t=1459265078.596466, session=213070643360702
PGSQL: type=Query, F -> B
QUERY query=copy foo(x) to stdout (format raw);
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyOutResponse, B -> F
COPY OUT RESPONSE copy format=1, num_fields=1, fields_formats=2
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyData, B -> F
COPY DATA len=20
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyDone, B -> F
COPY DONE
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CommandComplete, B -> F
COMMAND COMPLETE command='COPY 1'
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=ReadyForQuery, B -> F
READY FOR QUERY type=<IDLE>
PQfformat(*results, 0)) returns 2 already, PQbinaryTuples() returns 1.
BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).
Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > I tested COPY RAW on old psql clients - and it is working without any > problem - so when the client uses same logic as psql, then it should to > work. Sure, there can be differently implemented clients, but the COPY > client side is usually simple - store stream to output. My point is precisely that I doubt all clients are that stupid about COPY. > Maybe I am blind, but I don't see any new security risks. The risk can be > only on client side - and if client is not able work with new value, then > it can fails. Well, the point is that low-level code might get used to process the data stream for commands it doesn't have any control over. Maybe there's no realistic security risk there, or maybe there is; I'm not sure. > I am thinking so PQbinaryTuples should to return 1 (without change), and > PQfformat should to return 2. Well, that seems pretty backwards to me. The format of the individual fields is still what it is under COPY BINARY; you would not use a different per-field transformation. You do need to know about the overall format of the copy data stream being different, and defining PQbinaryTuples as still returning 1 means there's no clean way to understand overall copy format vs. per-field format. There's a case to be made that we should invent a new function named along the lines of PQcopyFormat() rather than overloading PQbinaryTuples() some more. That function is currently deprecated and I'm not very happy with un-deprecating it only to use it in a confusing way. To be more concrete about this: I think it's actually rather broken that this patch ties RAW to binary format of the field contents. Why would it not be exactly as useful to have delimiter-less COPY of textual data, for use when there's just one datum and/or you're confident in picking the data apart for yourself? But as things stand it'd be too confusing for an application to try to figure out what's happening in such a case. So I think we should either invent RAW_TEXT and RAW_BINARY formats (not just RAW) or make RAW be an orthogonal copy option. And we need to improve libpq's behavior enough so that applications can sanely figure out what's happening. > I executed all tests in libpq and ecpg without any problems. Can you, > please, help me with repeating a ecpg issues? Of course the ecpg tests pass; you didn't extend them to see what would happen if someone tries COPY RAW with ecpg. Likewise, we have no tests exercising a client's use of libpq with more intelligence than psql has got. But that doesn't mean it's acceptable to write this patch with no thought for such clients. I am fairly sure that there actually are third-party client libraries that have more intelligence about COPY than psql, but I do not remember any specifics unfortunately. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.
My point is precisely that I doubt all clients are that stupid about COPY.
> Maybe I am blind, but I don't see any new security risks. The risk can be
> only on client side - and if client is not able work with new value, then
> it can fails.
Well, the point is that low-level code might get used to process the data
stream for commands it doesn't have any control over. Maybe there's no
realistic security risk there, or maybe there is; I'm not sure.
> I am thinking so PQbinaryTuples should to return 1 (without change), and
> PQfformat should to return 2.
Well, that seems pretty backwards to me. The format of the individual
fields is still what it is under COPY BINARY; you would not use a
different per-field transformation. You do need to know about the
overall format of the copy data stream being different, and defining
PQbinaryTuples as still returning 1 means there's no clean way to
understand overall copy format vs. per-field format.
There's a case to be made that we should invent a new function named
along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
some more. That function is currently deprecated and I'm not very happy
with un-deprecating it only to use it in a confusing way.
To be more concrete about this: I think it's actually rather broken
that this patch ties RAW to binary format of the field contents.
Why would it not be exactly as useful to have delimiter-less COPY
of textual data, for use when there's just one datum and/or you're
confident in picking the data apart for yourself? But as things stand
it'd be too confusing for an application to try to figure out what's
happening in such a case.
So I think we should either invent RAW_TEXT and RAW_BINARY formats
(not just RAW) or make RAW be an orthogonal copy option. And we need
to improve libpq's behavior enough so that applications can sanely
figure out what's happening.
> I executed all tests in libpq and ecpg without any problems. Can you,
> please, help me with repeating a ecpg issues?
Of course the ecpg tests pass; you didn't extend them to see what would
happen if someone tries COPY RAW with ecpg. Likewise, we have no tests
exercising a client's use of libpq with more intelligence than psql has
got. But that doesn't mean it's acceptable to write this patch with no
thought for such clients.
I am fairly sure that there actually are third-party client libraries
that have more intelligence about COPY than psql, but I do not remember
any specifics unfortunately.
4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for RAW_BINARY
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > I am writing few lines as summary: > 1. invention RAW_TEXT and RAW_BINARY > 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1 > 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but > the client should to check PQcopyFormat() to not print "\n" on the end > 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but > used output function, not necessary client modification > 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for > RAW_BINARY > 5. create tests for ecpg 3.b certainly seems completely wrong. PQfformat==1 would imply binary data. I suggest that PQcopyFormat should be understood as defining the format of the copy data encapsulation, not the individual fields. So it would go like 0 = traditional text format, 1 = traditional binary format, 2 = raw (no encapsulation). You'd need to also look at PQfformat to distinguish raw text from raw binary. But if we do it as you suggest above, we've locked ourselves into only ever having two field format codes, which is something the existing design is specifically intended to allow expansion in. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am writing few lines as summary:
> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should to check PQcopyFormat() to not print "\n" on the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg
3.b certainly seems completely wrong. PQfformat==1 would imply binary
data.
I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields. So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation). You'd need to also look at PQfformat to distinguish
raw text from raw binary. But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.
regards, tom lane
On 03/28/2016 11:18 PM, Pavel Stehule wrote: > > > > Anyway this is certainly not committable as-is, so I'm setting > it back > to Waiting on Author. But the fact that both libpq and ecpg > would need > updates makes me question whether we can safely pretend that > this isn't > a protocol break. > > > > > In that case I humbly submit that there is a case for reviving the > psql patch I posted that kicked off this whole thing and lets you > export a piece of binary data from psql quite easily. It should > certainly not involve any protocol break. > > > The psql only solution can work only for output. Doesn't help with input. > > The I would suggest we try to invent something for psql which does help with it. I just don't see this as an SQL problem. Pretty much any driver library will have no difficulty in handling binary input and output. It's only psql that has an issue, ISTM, and therefore I believe that's where the fix should go. What else is going to use this? As an SQL change this seems like a solution in search of a problem. If someone can make a good case that this is going to be of general use I'll happily go along, but I haven't seen one so far. cheers andrdew
Andrew Dunstan <andrew@dunslane.net> writes: > The I would suggest we try to invent something for psql which does help > with it. I just don't see this as an SQL problem. There's certainly a lot to be said for that approach. I'm still not convinced that we can make COPY do this without creating compatibility issues, regardless of the details; and it doesn't seem like a big enough problem to be worth taking any risks of that sort. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am writing few lines as summary:
> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should to check PQcopyFormat() to not print "\n" on the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg
3.b certainly seems completely wrong. PQfformat==1 would imply binary
data.
I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields. So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation). You'd need to also look at PQfformat to distinguish
raw text from raw binary. But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.
regards, tom lane
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.
My point is precisely that I doubt all clients are that stupid about COPY.
On 30 March 2016 at 00:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.
My point is precisely that I doubt all clients are that stupid about COPY.PgJDBC definitely isn't.Any changes really need to be tested against PgJDBC's CopyManager.
--
this patch doesn't break any old application. Accepting new feature depends on binary method detection. PQbinaryTuples based clients should to support COPY RAW* without problems, PQfformat() should to report unknown format.
On 31 March 2016 at 14:40, Pavel Stehule <pavel.stehule@gmail.com> wrote:this patch doesn't break any old application. Accepting new feature depends on binary method detection. PQbinaryTuples based clients should to support COPY RAW* without problems, PQfformat() should to report unknown format.PgJDBC does not use libpq.
--
Andrew Dunstan wrote: > If someone can make a good case that this is going to be of > general use I'll happily go along, but I haven't seen one so far. About COPY FROM with a raw format, for instance just yesterday there was this user question on stackoverflow: http://stackoverflow.com/questions/36317237 which essentially is: how to import contents from a file without any particular interpretation of any character? With the patch discussed in this thread, a user can do \copy table(textcol) from /path/to/file (format raw) or the equivalent COPY. If it's a binary column, that works just the same. Without this, it's not obvious at all how this result can be achieved without resorting to external preprocessing, and assuming the availability of such preprocessing tools in the environment. Notwithstanding the fact that the solution proposed on SO (doubling backslashes with sed) doesn't even work if the file contains tabs, as they would be interpreted as field separators, even if the copy target has only one column. You can change the delimiter with COPY but AFAIK you can't tell that there is none. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On 04/01/2016 11:42 AM, Daniel Verite wrote: > Andrew Dunstan wrote: > >> If someone can make a good case that this is going to be of >> general use I'll happily go along, but I haven't seen one so far. > About COPY FROM with a raw format, for instance just yesterday > there was this user question on stackoverflow: > http://stackoverflow.com/questions/36317237 > > which essentially is: how to import contents from a file without any > particular interpretation of any character?\ There is so much wrong with this it's hard to know where to start. Inserting the whole contents of a text file unchanged is insanely easy in psql. \set file `cat /path/to/file` insert into mytable(contents) values(:'file'); What is more everyone on SO missed the fact that CSV mode gives you very considerable control over the quote, delimiter and null settings. See for example <http://adpgtech.blogspot.com/2014/09/importing-json-data.html> which has this example for handling files consisting of 1 json document per line: copy the_table(jsonfield)from '/path/to/jsondata'csv quote e'\x01' delimiter e'\x02'; psql's \copy will work just the same way (I noticed with amusement this week that CitusData is using pretty much exactly this in one of their examples.) > > With the patch discussed in this thread, a user can do > \copy table(textcol) from /path/to/file (format raw) > or the equivalent COPY. > If it's a binary column, that works just the same. It would be fairly simple to invent a binary mechanism that did the equivalent of the above insert. All without any change to SQL or the backend at all. > > Without this, it's not obvious at all how this result can be > achieved without resorting to external preprocessing, > and assuming the availability of such preprocessing tools > in the environment. Notwithstanding the fact that the > solution proposed on SO (doubling backslashes with sed) > doesn't even work if the file contains tabs, as they would be > interpreted as field separators, even if the copy target has only > one column. You can change the delimiter with COPY but AFAIK > you can't tell that there is none. There is arguably a good case for allowing a null delimiter. But that SO page is just a terrible piece of misinformation, as far too often happens in my experience. And I am still waiting for a non-psql use case. But I don't expect to see one, precisely because most clients have no difficulty at all in handling binary data. cheers andrew
Andrew Dunstan wrote: > Inserting the whole contents of a text file unchanged is insanely easy > in psql. > > \set file `cat /path/to/file` > insert into mytable(contents) values(:'file'); That's assuming psql but the asker of that question never mentioned using psql. The COPY invocation could be inside a function. Even if that particular user would be fine with a psql-only option, the next one might not. Or they might want to import a binary file, and as you mention, currently there's no equivalent of the :'var' feature for binary. But there's another aspect to this that's worth of consideration, and that this forum question illustrates. One reason of adding the format to COPY is that it's where users are looking for it. It's the canonical way of importing contents from files so that's where it makes more sense. From the POV of being user friendly and consistent, restricting what COPY can do because psql could do it completely differently if the user was psql-savvy enough to know it, what sense does it make? > And I am still waiting for a non-psql use case. But I don't expect to > see one, precisely because most clients have no difficulty at all in > handling binary data. You mean small or medium-size binary data. The 512MB-1GB range is impossible to handle if requested in text format, which is what drivers tend to use. Even pg_dump fails on these contents. Maybe it was unimportant when bytea was added ~15 years ago, but the size of data that people actually put into bytea columns is growing, following Moore's law like the rest. Even in the lower size range, considering the amount of memory allocated and the time spent to convert to hex, sending twice the number of bytes on the wire, just to do the reverse conversion in the client as soon as all data is obtained, it works but it's pointless and inefficient. Code that uses PQexecParams() binary "resultFormat", or the binary format of copy doesn't have that problem, but most client-side drivers don't do that. And maybe they just can't realistically, because getting result format in binary is exposed as an all-or-nothing choice in libpq. I mean if client code does SELECT * FROM table or even COPY of the same, and what comes back is bytea and e.g. timestamps and floats and custom types, the client-side driver may wish to have the bytea field in binary format for efficiency and the rest in text format for usability, but that's not possible with PQexecParams(), or other libpq functions. The point of mixing binary and text is outside the scope of a RAW format for COPY, as obviously it wouldn't help with that in any way, but on the argument that the status quo is fine because clients have no difficulty, that's just not true. Clients cope with what they have, but what they have is far from being complete or optimal. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
"Daniel Verite" <daniel@manitou-mail.org> writes: > One reason of adding the format to COPY is that it's where users > are looking for it. It's the canonical way of importing contents > from files so that's where it makes more sense. I'm not sure I buy that argument, because it could be used to justify adding absolutely any ETL functionality to COPY. And we don't want to go down that path; the design intention for COPY is that it be as simple and fast as possible. >> And I am still waiting for a non-psql use case. But I don't expect to >> see one, precisely because most clients have no difficulty at all in >> handling binary data. > You mean small or medium-size binary data. The 512MB-1GB range is > impossible to handle if requested in text format, which is what drivers > tend to use. Even pg_dump fails on these contents. ... which is COPY. I do not see that RAW mode is going to help much here: it's not going to be noticeably better than COPY BINARY in terms of maximum field width. >> Code that uses PQexecParams() binary "resultFormat", or the >> binary format of copy doesn't have that problem, but most >> client-side drivers don't do that. > And maybe they just can't realistically, because getting result > format in binary is exposed as an all-or-nothing choice in libpq. That's simply wrong. Read the documentation for PQexecParams and friends: you can specify text or binary per-column. It's COPY that has the only-one-column-format restriction, and RAW certainly isn't going to make that better. I'm not quite as convinced as Andrew that RAW mode is unnecessary, but I don't find these arguments for it to be very compelling. The real issue to my mind is that it doesn't seem like we can shoehorn a sanely-defined version of RAW into the existing protocol spec without creating compatibility hazards. So we can either wait for the mythical protocol v4 (but even a protocol update wouldn't fix the application-level hazards) or we can treat it as a problem to be solved client-side. regards, tom lane
Andrew Dunstan wrote:
> If someone can make a good case that this is going to be of
> general use I'll happily go along, but I haven't seen one so far.
About COPY FROM with a raw format, for instance just yesterday
there was this user question on stackoverflow:
http://stackoverflow.com/questions/36317237
which essentially is: how to import contents from a file without any
particular interpretation of any character?
With the patch discussed in this thread, a user can do
\copy table(textcol) from /path/to/file (format raw)
Tom Lane wrote: > >> Code that uses PQexecParams() binary "resultFormat", or the > >> binary format of copy doesn't have that problem, but most > >> client-side drivers don't do that. > > > And maybe they just can't realistically, because getting result > > format in binary is exposed as an all-or-nothing choice in libpq. > > That's simply wrong. Read the documentation for PQexecParams and > friends: you can specify text or binary per-column. It's COPY that > has the only-one-column-format restriction, and RAW certainly isn't > going to make that better. About PQexecParams, I disagree, the parameters formats can be specified independantly, but the not the results, which are either all binary or all text. Quoting the doc at http://www.postgresql.org/docs/9.5/static/libpq-exec.html <quote> PGresult *PQexecParams(PGconn *conn, const char *command, int nParams, const Oid *paramTypes, const char * const *paramValues, const int *paramLengths, const int *paramFormats, int resultFormat); [...] resultFormat: Specify zero to obtain results in text format, or one to obtain results in binary format. (There is not currently a provision to obtain different result columns in different formats, although that is possible in the underlying protocol.) </quote> For the client-side drivers that I've looked at, like these used in php or perl, they just never use resultFormat=1. I assume that they consider that having all values in binary is unworkable for them, which is reasonable. Maybe if they had a per-column choice, they wouldn't use it anyway, but at least it would be theirs to decide All this is only tangentially related to COPY RAW. It's just that COPY RAW can be seen as an efficient alternative to the single-column returning [SELECT bytea_column FROM...] The drivers currently request this in text mode even though it makes no sense in this particular case, and it gets measurably annoying if the contents are big. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY > formats for COPY statements. > > The RAW with text formats means unescaped data, but with correct encoding - > input/output is realised with input/output function. RAW binary means > content produced/received by sending/received functions. > > Now both directions (input/output) working well > > Some examples of expected usage: > > copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary, > encoding 'latin2'); > > create table avatars(id serial, picture bytea); > \copy avatars(picture) from ~/images/foo.jpg (format raw_binary); > select lastval(); > > create table doc(id serial, txt text); > \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2'); > select lastval(); As much as I know you and some other people would like it to be otherwise, this patch clearly does not have a sufficient degree of consensus to justify committing it to PostgreSQL 9.6. I'm marking it Returned with Feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
> formats for COPY statements.
>
> The RAW with text formats means unescaped data, but with correct encoding -
> input/output is realised with input/output function. RAW binary means
> content produced/received by sending/received functions.
>
> Now both directions (input/output) working well
>
> Some examples of expected usage:
>
> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
> encoding 'latin2');
>
> create table avatars(id serial, picture bytea);
> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
> select lastval();
>
> create table doc(id serial, txt text);
> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
> select lastval();
As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6. I'm marking it
Returned with Feedback.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 04/08/2016 02:13 PM, Robert Haas wrote: > On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY >> formats for COPY statements. >> >> The RAW with text formats means unescaped data, but with correct encoding - >> input/output is realised with input/output function. RAW binary means >> content produced/received by sending/received functions. >> >> Now both directions (input/output) working well >> >> Some examples of expected usage: >> >> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary, >> encoding 'latin2'); >> >> create table avatars(id serial, picture bytea); >> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary); >> select lastval(); >> >> create table doc(id serial, txt text); >> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2'); >> select lastval(); > As much as I know you and some other people would like it to be > otherwise, this patch clearly does not have a sufficient degree of > consensus to justify committing it to PostgreSQL 9.6. I'm marking it > Returned with Feedback. > I should add that I've been thinking about this some more, and that I now agree that something should be done to support this at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion rather than having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likely to have this memory issue. At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records, and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailored for loading or sending large singleton pieces of data. cheers andrew
On 04/08/2016 02:13 PM, Robert Haas wrote:On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARYAs much as I know you and some other people would like it to be
formats for COPY statements.
The RAW with text formats means unescaped data, but with correct encoding -
input/output is realised with input/output function. RAW binary means
content produced/received by sending/received functions.
Now both directions (input/output) working well
Some examples of expected usage:
copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
encoding 'latin2');
create table avatars(id serial, picture bytea);
\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
select lastval();
create table doc(id serial, txt text);
\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
select lastval();
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6. I'm marking it
Returned with Feedback.
I should add that I've been thinking about this some more, and that I now agree that something should be done to support this at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion rather than having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likely to have this memory issue.
At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records, and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailored for loading or sending large singleton pieces of data.
cheers
andrew
On 8 Apr 2016 9:14 pm, "Pavel Stehule" <pavel.stehule@gmail.com> wrote: > 2016-04-08 20:54 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>: >> I should add that I've been thinking about this some more, and that I now agree that something should be done to supportthis at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion ratherthan having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likelyto have this memory issue. >> >> At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records,and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailoredfor loading or sending large singleton pieces of data. > > > Now it is little bit more time to think more about. But It is hard to design some more simpler than is COPY syntax. Whatwill support both directions. Sorry for arriving late and adding to the bikeshedding. Maybe the answer is to make COPY pluggable. It seems to me that it would be relatively straightforward to add an extension mechanism for copy output and input plugins that could support any format expressible as a binary stream. Raw output would then be an almost trivial plugin. Others could implement JSON, protocol buffers, Redis bulk load, BSON, ASN.1 or whatever else serialisation format du jour. It will still have the same backwards compatibility issues as adding the raw output, but the payoff is greater. Regards, Ants Aasma
On 8 Apr 2016 9:14 pm, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
> 2016-04-08 20:54 GMT+02:00 Andrew Dunstan <andrew@dunslane.net>:
>> I should add that I've been thinking about this some more, and that I now agree that something should be done to support this at the SQL level, mainly so that clients can manage very large pieces of data in a stream-oriented fashion rather than having to marshall the data in memory to load/unload via INSERT/SELECT. Anything that is client-side only is likely to have this memory issue.
>>
>> At the same time I'm still not entirely convinced that COPY is a good vehicle for this. It's designed for bulk records, and already quite complex. Maybe we need something new that uses the COPY protocol but is more specifically tailored for loading or sending large singleton pieces of data.
>
>
> Now it is little bit more time to think more about. But It is hard to design some more simpler than is COPY syntax. What will support both directions.
Sorry for arriving late and adding to the bikeshedding. Maybe the
answer is to make COPY pluggable. It seems to me that it would be
relatively straightforward to add an extension mechanism for copy
output and input plugins that could support any format expressible as
a binary stream. Raw output would then be an almost trivial plugin.
Others could implement JSON, protocol buffers, Redis bulk load, BSON,
ASN.1 or whatever else serialisation format du jour. It will still
have the same backwards compatibility issues as adding the raw output,
but the payoff is greater.
Regards,
Ants Aasma
Pavel Stehule <pavel.stehule@gmail.com> writes: > I had a idea about additional options of COPY RAW statements. One can be > CAST function. These CAST functions can be used to any for any format. Uh, what? CAST() is not about external representations of values, and overloading it for that purpose doesn't seem like a particularly good idea: you'd have to figure out what the conversions meant inside SQL as well as externally. Also, maybe I missed something, but a different representation of individual data values within a COPY wasn't what we were after here. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I had a idea about additional options of COPY RAW statements. One can be
> CAST function. These CAST functions can be used to any for any format.
Uh, what? CAST() is not about external representations of values, and
overloading it for that purpose doesn't seem like a particularly good
idea: you'd have to figure out what the conversions meant inside SQL as
well as externally. Also, maybe I missed something, but a different
representation of individual data values within a COPY wasn't what we
were after here.
regards, tom lane
create table avatars(id serial, picture bytea);copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary, encoding 'latin2');Some examples of expected usage:Now both directions (input/output) working wellThe RAW with text formats means unescaped data, but with correct encoding - input/output is realised with input/output function. RAW binary means content produced/received by sending/received functions.Hihere is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY formats for COPY statements.\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);select lastval();create table doc(id serial, txt text);\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');select lastval();RegardsPavel
[pavel@nemesis ~]$ cat avatar.gif | psql -Xq -At -c "copy xx(b) from stdin (format raw_text)" -c "select lastval()" postgres
313
Attachment
On Sat, Jul 16, 2016 at 5:55 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I am sending fresh version of COPY RAW patch. Moved to next CF per this status. +++ b/src/interfaces/libpq/test/copy-raw-regress.pl @@ -0,0 +1,48 @@ +#!/usr/bin/perl -w + +use strict; I don't understand why this is shaped this way, I mean the perl part if we have the TAP infra in place. MSVC is not testing it as well. -- Michael
Hi,This is a gentle reminder.you assigned as reviewer to the current patch in the 11-2016 commitfest.But you haven't shared your review yet. Please share your review aboutthe patch. This will help us in smoother operation of commitfest.Please Ignore if you already shared your review.
Patch is not applying properly to HEAD.
Sorry for my late response. I've briefly checked a series of discussion in the past. I understood the target/purpose of this patch is provision of a fast interface to import/export a particular cell of a relation, by skip of text<->binary transformation. Its typical use case are XML and JSON data types. Right? If so, how about the idea to use fast-path invocation protocol to call functions to import/export these document types? It allows to accept binary form of the data stream, with minimum overheads. It seems to me extend of COPY statement for this optimization is a bit overkill solution. Do we find out an alternative solution that we can build on the existing infrastructure? Best regards, 2016-12-05 14:16 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>: > > > On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi <kommi.haribabu@gmail.com> > wrote: >> >> Hi, >> >> This is a gentle reminder. >> >> you assigned as reviewer to the current patch in the 11-2016 commitfest. >> But you haven't shared your review yet. Please share your review about >> the patch. This will help us in smoother operation of commitfest. >> >> Please Ignore if you already shared your review. > > > Patch is not applying properly to HEAD. > Moved to next CF with "waiting on author" status. > > > Regards, > Hari Babu > Fujitsu Australia -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Sorry for my late response.
I've briefly checked a series of discussion in the past.
I understood the target/purpose of this patch is provision of a fast interface
to import/export a particular cell of a relation, by skip of text<->binary
transformation. Its typical use case are XML and JSON data types. Right?
If so, how about the idea to use fast-path invocation protocol to call functions
to import/export these document types?
It allows to accept binary form of the data stream, with minimum overheads.
It seems to me extend of COPY statement for this optimization is a bit overkill
solution. Do we find out an alternative solution that we can build on
the existing
infrastructure?
Best regards,--
2016-12-05 14:16 GMT+09:00 Haribabu Kommi <kommi.haribabu@gmail.com>:
>
>
> On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> Hi,
>>
>> This is a gentle reminder.
>>
>> you assigned as reviewer to the current patch in the 11-2016 commitfest.
>> But you haven't shared your review yet. Please share your review about
>> the patch. This will help us in smoother operation of commitfest.
>>
>> Please Ignore if you already shared your review.
>
>
> Patch is not applying properly to HEAD.
> Moved to next CF with "waiting on author" status.
>
>
> Regards,
> Hari Babu
> Fujitsu Australia
KaiGai Kohei <kaigai@kaigai.gr.jp>
2016-12-05 22:45 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>: > > There are more goals: > > 1. user friendly import of text or binary data - import text data (with > psql) from file is possible - but you have to load a content to psql > variable. For binary data you should to use workaround based on LO and > transformation from LO to bytea. > > 2. user friendly export text or binary data - now, the binary data can be > exported only via transformation to LO. The XML has very interesting > features when is passing from/to client binary. This feature is impossible > in psql now. > : <snip> : >> It seems to me extend of COPY statement for this optimization is a bit >> overkill >> solution. Do we find out an alternative solution that we can build on >> the existing >> infrastructure? > > The advantage and sense of COPY RAW was reusing existing interface. The > question was: How I can export/import binary data simply from psql console? > OK, I could get your point. Likeky, we can implement the feature without COPY statement enhancement by adding a special purpose function and \xxx command on psql. Let's assume the two commands below on psql: \blob_import <table_name> <column_name> (STDIN|<filename>) \blob_export <query> (STDOUT|<filename>) On \blob_import, the psql command reads the binary contents from either stdin or file, than call a special purpose function that takes three arguments; table name, column name and a binary data chunk. PQexecParams() of libpq allows to deliver the data chunk with keeping binary data format, then the special purpose function will be able to lookup the destination table/column and construct a tuple that contains the supplied data chunk. (I think xxxx_recv handler shall be used for data validation, but not an element of this feature.) On \blob_export, the psql command also set up a simple query as follows: SELECT blob_export((<user's supplied query)) For example, \blob_export SELECT binary_data FROM my_table WHERE id = 10 /tmp/aaa shall be transformed to SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10)) This function is declared as: blob_export(anyelement) RETURNS bytea So, as long as the user supplied query returns exactly one column and one row, it can transform the argument to the binary stream, then psql command receive it and dump somewhere; stdout or file. How about your thought? Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
2016-12-05 22:45 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>:
>
> There are more goals:
>
> 1. user friendly import of text or binary data - import text data (with
> psql) from file is possible - but you have to load a content to psql
> variable. For binary data you should to use workaround based on LO and
> transformation from LO to bytea.
>
> 2. user friendly export text or binary data - now, the binary data can be
> exported only via transformation to LO. The XML has very interesting
> features when is passing from/to client binary. This feature is impossible
> in psql now.
>
:
<snip>
:
>> It seems to me extend of COPY statement for this optimization is a bit
>> overkill
>> solution. Do we find out an alternative solution that we can build on
>> the existing
>> infrastructure?
>
> The advantage and sense of COPY RAW was reusing existing interface. The
> question was: How I can export/import binary data simply from psql console?
>
OK, I could get your point.
Likeky, we can implement the feature without COPY statement enhancement
by adding a special purpose function and \xxx command on psql.
Let's assume the two commands below on psql:
\blob_import <table_name> <column_name> (STDIN|<filename>)
\blob_export <query> (STDOUT|<filename>)
On \blob_import, the psql command reads the binary contents from either
stdin or file, than call a special purpose function that takes three
arguments; table name, column name and a binary data chunk.
PQexecParams() of libpq allows to deliver the data chunk with keeping
binary data format, then the special purpose function will be able to
lookup the destination table/column and construct a tuple that contains
the supplied data chunk. (I think xxxx_recv handler shall be used for
data validation, but not an element of this feature.)
On \blob_export, the psql command also set up a simple query as follows:
SELECT blob_export((<user's supplied query))
For example,
\blob_export SELECT binary_data FROM my_table WHERE id = 10 /tmp/aaa
shall be transformed to
SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10))
This function is declared as:
blob_export(anyelement) RETURNS bytea
So, as long as the user supplied query returns exactly one column and
one row, it can transform the argument to the binary stream, then psql
command receive it and dump somewhere; stdout or file.
How about your thought?
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
2016-12-06 16:59 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>: > > > 2016-12-06 1:50 GMT+01:00 Kohei KaiGai <kaigai@kaigai.gr.jp>: >> >> 2016-12-05 22:45 GMT+09:00 Pavel Stehule <pavel.stehule@gmail.com>: >> > >> > There are more goals: >> > >> > 1. user friendly import of text or binary data - import text data (with >> > psql) from file is possible - but you have to load a content to psql >> > variable. For binary data you should to use workaround based on LO and >> > transformation from LO to bytea. >> > >> > 2. user friendly export text or binary data - now, the binary data can >> > be >> > exported only via transformation to LO. The XML has very interesting >> > features when is passing from/to client binary. This feature is >> > impossible >> > in psql now. >> > >> : >> <snip> >> : >> >> It seems to me extend of COPY statement for this optimization is a bit >> >> overkill >> >> solution. Do we find out an alternative solution that we can build on >> >> the existing >> >> infrastructure? >> > >> > The advantage and sense of COPY RAW was reusing existing interface. The >> > question was: How I can export/import binary data simply from psql >> > console? >> > >> OK, I could get your point. >> >> Likeky, we can implement the feature without COPY statement enhancement >> by adding a special purpose function and \xxx command on psql. >> >> Let's assume the two commands below on psql: >> >> \blob_import <table_name> <column_name> (STDIN|<filename>) >> \blob_export <query> (STDOUT|<filename>) >> >> On \blob_import, the psql command reads the binary contents from either >> stdin or file, than call a special purpose function that takes three >> arguments; table name, column name and a binary data chunk. >> PQexecParams() of libpq allows to deliver the data chunk with keeping >> binary data format, then the special purpose function will be able to >> lookup the destination table/column and construct a tuple that contains >> the supplied data chunk. (I think xxxx_recv handler shall be used for >> data validation, but not an element of this feature.) >> >> >> On \blob_export, the psql command also set up a simple query as follows: >> SELECT blob_export((<user's supplied query)) >> For example, >> \blob_export SELECT binary_data FROM my_table WHERE id = 10 /tmp/aaa >> shall be transformed to >> SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10)) > > > This is reason why I prefer a COPY statement - because it does all necessary > things natural. But if there is a disagreement against COPY RAW it can be > implemented as psql commands. > Yes, both of approach will be able to implement what you want to do. I agree it is valuable if psql can import/export a particular item with simple shell-script description, however, here is no consensus how to implement it. If psql supports the special \xxx command, it is equivalently convenient from the standpoint of users, with no enhancement of the statement. I hope committers comment on the approach we will take on. Thanks, > export should be similar like \g, \gset feature > > so > > SELECT xmldoc FROM xxxx > \gbinary_store xxxx.xxx > > import is maybe better solved by proposed file references in queries > > Regards > > Pavel > > >> >> >> This function is declared as: >> blob_export(anyelement) RETURNS bytea >> So, as long as the user supplied query returns exactly one column and >> one row, it can transform the argument to the binary stream, then psql >> command receive it and dump somewhere; stdout or file. >> >> How about your thought? >> >> Thanks, >> -- >> KaiGai Kohei <kaigai@kaigai.gr.jp> > > -- KaiGai Kohei <kaigai@kaigai.gr.jp>