Re: raw output from copy - Mailing list pgsql-hackers

From Tom Lane
Subject Re: raw output from copy
Date
Msg-id 12799.1459788946@sss.pgh.pa.us
Whole thread Raw
In response to Re: raw output from copy  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: raw output from copy  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers
"Daniel Verite" <daniel@manitou-mail.org> writes:
> One reason of adding the format to COPY is that it's where users
> are looking for it. It's the canonical way of importing contents
> from files so that's where it makes more sense.

I'm not sure I buy that argument, because it could be used to justify
adding absolutely any ETL functionality to COPY.  And we don't want
to go down that path; the design intention for COPY is that it be as
simple and fast as possible.

>> And I am still waiting for a non-psql use case. But I don't expect to 
>> see one, precisely because most clients have no difficulty at all in 
>> handling binary data.

> You mean small or medium-size binary data. The 512MB-1GB range is
> impossible to handle if requested in text format, which is what drivers
> tend to use. Even pg_dump fails on these contents.

... which is COPY.  I do not see that RAW mode is going to help much
here: it's not going to be noticeably better than COPY BINARY in terms
of maximum field width.

>> Code that uses PQexecParams() binary "resultFormat", or the
>> binary format of copy doesn't have that problem,  but most
>> client-side drivers don't do that.

> And maybe they just can't realistically, because  getting result
> format in binary is exposed as an all-or-nothing choice in libpq.

That's simply wrong.  Read the documentation for PQexecParams and
friends: you can specify text or binary per-column.  It's COPY that
has the only-one-column-format restriction, and RAW certainly isn't
going to make that better.


I'm not quite as convinced as Andrew that RAW mode is unnecessary,
but I don't find these arguments for it to be very compelling.

The real issue to my mind is that it doesn't seem like we can shoehorn
a sanely-defined version of RAW into the existing protocol spec without
creating compatibility hazards.  So we can either wait for the mythical
protocol v4 (but even a protocol update wouldn't fix the application-level
hazards) or we can treat it as a problem to be solved client-side.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions
Next
From: "David G. Johnston"
Date:
Subject: Re: raw output from copy