Re: raw output from copy - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: raw output from copy
Date
Msg-id CAFj8pRBipcyx5MaqFX1aBR_ruvo0JiHdbd+GxL=98DGvuyFx5g@mail.gmail.com
Whole thread Raw
In response to Re: raw output from copy  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: raw output from copy
List pgsql-hackers
Hi

2016-03-29 18:19 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

> Maybe I am blind, but I don't see any new security risks. The risk can be
> only on client side - and if client is not able work with new value, then
> it can fails.

Well, the point is that low-level code might get used to process the data
stream for commands it doesn't have any control over.  Maybe there's no
realistic security risk there, or maybe there is; I'm not sure.

> I am thinking so PQbinaryTuples should to return 1 (without change), and
> PQfformat should to return 2.

Well, that seems pretty backwards to me.  The format of the individual
fields is still what it is under COPY BINARY; you would not use a
different per-field transformation.  You do need to know about the
overall format of the copy data stream being different, and defining
PQbinaryTuples as still returning 1 means there's no clean way to
understand overall copy format vs. per-field format.

There's a case to be made that we should invent a new function named
along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
some more.  That function is currently deprecated and I'm not very happy
with un-deprecating it only to use it in a confusing way.

I see a introduction of PQcopyFormat() as best idea. So for PQbinaryTuples() and PQfformat() these new changes are transparent - and PQcopyFormat can returns info about used method.


To be more concrete about this: I think it's actually rather broken
that this patch ties RAW to binary format of the field contents.
Why would it not be exactly as useful to have delimiter-less COPY
of textual data, for use when there's just one datum and/or you're
confident in picking the data apart for yourself?  But as things stand
it'd be too confusing for an application to try to figure out what's
happening in such a case.

So I think we should either invent RAW_TEXT and RAW_BINARY formats
(not just RAW) or make RAW be an orthogonal copy option.  And we need
to improve libpq's behavior enough so that applications can sanely
figure out what's happening.

I had a use case that required binary mode. Higher granularity has sense.

This opening new question - RAW_TEXT will use text output function. But if I will pass this value as text value, then a behave of current clients will be same as usual COPY. So I need to use binary protocol. And then the behave of PQbinaryTuples() and PQfformat() is the question? Although text value can be passed in binary mode too (with format [length, data...]).
 

> I executed all tests in libpq and ecpg without any problems. Can you,
> please, help me with repeating a ecpg issues?

Of course the ecpg tests pass; you didn't extend them to see what would
happen if someone tries COPY RAW with ecpg.   Likewise, we have no tests
exercising a client's use of libpq with more intelligence than psql has
got.  But that doesn't mean it's acceptable to write this patch with no
thought for such clients.

if we don't change PQbinaryTuples() and PQfformat(), then COPY RAW should be transparent for any client. Server sending data in binary format - what is generic.
 

I am fairly sure that there actually are third-party client libraries
that have more intelligence about COPY than psql, but I do not remember
any specifics unfortunately.

The COPY RAW should not to break any existing application. This is new feature - and old application, old client use COPY RAW newer. I see as important the conformity of used mode (text/binary) and PQbinaryTuples() and PQfformat().

I am writing few lines as summary:

1. invention RAW_TEXT and RAW_BINARY
2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but the client should to check PQcopyFormat() to not print "\n" on the end
3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but used output function, not necessary client modification
4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for RAW_BINARY
5. create tests for ecpg

Is it ok?

What do you prefer 3.a, or 3.b?

Regards

Pavel

                        regards, tom lane

pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: Sequence Access Method WIP
Next
From: Teodor Sigaev
Date:
Subject: Re: WIP: Access method extendability