Re: raw output from copy - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: raw output from copy |
Date | |
Msg-id | CAFj8pRCkK7VYdadRaN6utNcAWLMQEnZtSQFbzGUqwmWTd1H54g@mail.gmail.com Whole thread Raw |
In response to | Re: raw output from copy (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: raw output from copy
|
List | pgsql-hackers |
Hi
2016-03-29 0:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]
I looked at this patch. I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break. Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break. (libpq.sgml also contains
assorted statements that are falsified by this patch.)
You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems. Maybe even security-grade problems.
I tested COPY RAW on old psql clients - and it is working without any problem - so when the client uses same logic as psql, then it should to work. Sure, there can be differently implemented clients, but the COPY client side is usually simple - store stream to output.
Maybe I am blind, but I don't see any new security risks. The risk can be only on client side - and if client is not able work with new value, then it can fails. But any attacker can use fake data stream, and can enforce this error too. So if there are some security risks on special designed clients, then this risks is existing now.
In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.
There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format? Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".
Theoretically the change there is allowed - "Format code zero indicates textual data representation, while format code one indicates binary representation. (Other codes are reserved for future definition.) - PQfformat". But - the format of COPY RAW is binary - this format is cleaner binary format than is used by COPY BINARY (where is a header + BINARY). I am thinking so PQbinaryTuples should to return 1 (without change), and PQfformat should to return 2. If some older client uses deprecated function PQbinaryTuples(), then 1 is safe value. PQfformat() is documented differently and if there will be different than expected value, then the client should to raise a error. So using 2 is safe there. The value 2 is adequate to actual content
Packet: t=1459265078.596466, session=213070643360702
PGSQL: type=Query, F -> B
QUERY query=copy foo(x) to stdout (format raw);
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyOutResponse, B -> F
COPY OUT RESPONSE copy format=1, num_fields=1, fields_formats=2
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyData, B -> F
COPY DATA len=20
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyDone, B -> F
COPY DONE
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CommandComplete, B -> F
COMMAND COMPLETE command='COPY 1'
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=ReadyForQuery, B -> F
READY FOR QUERY type=<IDLE>
Packet: t=1459265078.596466, session=213070643360702
PGSQL: type=Query, F -> B
QUERY query=copy foo(x) to stdout (format raw);
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyOutResponse, B -> F
COPY OUT RESPONSE copy format=1, num_fields=1, fields_formats=2
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyData, B -> F
COPY DATA len=20
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyDone, B -> F
COPY DONE
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CommandComplete, B -> F
COMMAND COMPLETE command='COPY 1'
Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=ReadyForQuery, B -> F
READY FOR QUERY type=<IDLE>
What do you think ?
p.s. These values are returned now
PQfformat(*results, 0)) returns 2 already, PQbinaryTuples() returns 1.
PQfformat(*results, 0)) returns 2 already, PQbinaryTuples() returns 1.
BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case. I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries. On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).
Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author. But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.
I executed all tests in libpq and ecpg without any problems. Can you, please, help me with repeating a ecpg issues?
Regards
Pavel
regards, tom lane
pgsql-hackers by date: