Re: raw output from copy - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: raw output from copy
Date
Msg-id 570006F4.9090309@dunslane.net
Whole thread Raw
In response to Re: raw output from copy  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: raw output from copy  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers

On 04/01/2016 11:42 AM, Daniel Verite wrote:
>     Andrew Dunstan wrote:
>
>> If someone can make a good case that this is going to be of
>> general use I'll happily go along, but I haven't seen one so far.
> About COPY FROM with a raw format, for instance just yesterday
> there was this user question on stackoverflow:
> http://stackoverflow.com/questions/36317237
>
> which essentially is: how to import contents from a file without any
> particular interpretation of any character?\


There is so much wrong with this it's hard to know where to start.

Inserting the whole contents of a text file unchanged is insanely easy 
in psql.
    \set file `cat /path/to/file`    insert into mytable(contents) values(:'file');

What is more everyone on SO missed the fact that CSV mode gives you very 
considerable control over the quote, delimiter and null settings.

See for example 
<http://adpgtech.blogspot.com/2014/09/importing-json-data.html> which 
has this example for handling files consisting of 1 json document per line:
copy the_table(jsonfield)from '/path/to/jsondata'csv quote e'\x01' delimiter e'\x02';

psql's \copy will work just the same way

(I noticed with amusement this week that CitusData is using pretty much 
exactly this in one of their examples.)

>
> With the patch discussed in this thread, a user can do
> \copy table(textcol) from /path/to/file (format raw)
> or the equivalent COPY.
> If it's a binary column, that works just the same.


It would be fairly simple to invent a binary mechanism that did the 
equivalent of the above insert. All without any change to SQL or the 
backend at all.


>
> Without this, it's not obvious at all how this result can be
> achieved without resorting to external preprocessing,
> and assuming the availability of such preprocessing tools
> in the environment. Notwithstanding the fact that the
> solution proposed on SO (doubling backslashes with sed)
> doesn't even work if the file contains tabs, as they would be
> interpreted as field separators, even if the copy target has only
> one column. You can change the delimiter with COPY but AFAIK
> you can't tell that there is none.


There is arguably a good case for allowing a null delimiter. But that SO 
page is just a terrible piece of misinformation, as far too often 
happens in my experience.

And I am still waiting for a non-psql use case. But I don't expect to 
see one, precisely because most clients have no difficulty at all in 
handling binary data.

cheers

andrew









pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Transactional enum additions - was Re: Alter or rename enum value
Next
From: Tom Lane
Date:
Subject: Re: Batch update of indexes