Re: raw output from copy - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: raw output from copy
Date
Msg-id CAFj8pRCDNZend88podm+K3fa7GF7uqdhw2h+G_4ffn7nG19vaQ@mail.gmail.com
Whole thread Raw
In response to Re: raw output from copy  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi

2016-03-29 0:26 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch.  I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break.  Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break.  (libpq.sgml also contains
assorted statements that are falsified by this patch.)

The reply on this question depends how we would to be strict. This doesn't change the format in types stream, but it creates new enum value. Correctly written should to raise exception when is processing unknown enum value.

I'll do tests against old libpq.
 

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems.  Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format?  Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".


I'll recheck it
 
BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case.  I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries.  On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

It should not be problem. I though about it. The COPY statements is extensible with options. We can support more fields, more rows if it will be required with additional options. But now, it looks like premature optimization.
 

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

I'll do test against some clients.

Regards

Pavel
 

                        regards, tom lane

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.
Next
From: Pavel Stehule
Date:
Subject: Re: raw output from copy