Re: raw output from copy - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: raw output from copy
Msg-id CAFj8pRCUa8QMKmqbfVmsE5zDyH9rdSVL0i=Hku9+nRUVqsYK8w@mail.gmail.com
In response to Re: raw output from copy  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: raw output from copy  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Hi

2016-03-29 20:59 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am writing a few lines as a summary:

> 1. invent RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should check PQcopyFormat() so that it does not print "\n" at the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> the output function is used; no client modification is necessary
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg

3.b certainly seems completely wrong.  PQfformat==1 would imply binary
data.

I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields.  So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation).  You'd need to also look at PQfformat to distinguish
raw text from raw binary.  But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.
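
The two-level scheme suggested above can be sketched as follows. This is only an illustration: PQcopyFormat() is the function proposed in this thread, not an existing libpq call, and the enum and the helper describe_copy() are hypothetical names. The encapsulation code and the field format code are decoded independently, so new field format codes could be added later without touching the encapsulation codes.

```c
#include <string.h>

/* Hypothetical encapsulation codes, as a proposed PQcopyFormat() might return. */
enum copy_encapsulation {
    COPY_ENCAP_TEXT = 0,    /* traditional text format */
    COPY_ENCAP_BINARY = 1,  /* traditional binary format */
    COPY_ENCAP_RAW = 2      /* raw, no encapsulation */
};

/*
 * Combine the encapsulation code with the per-field format code
 * (as PQfformat() reports it: 0 = text, 1 = binary).
 * Only the raw encapsulation needs the field format to be consulted.
 */
const char *describe_copy(int copy_format, int field_format)
{
    switch (copy_format) {
    case COPY_ENCAP_TEXT:
        return "text";
    case COPY_ENCAP_BINARY:
        return "binary";
    case COPY_ENCAP_RAW:
        if (field_format == 0)
            return "raw text";
        if (field_format == 1)
            return "raw binary";
        return "raw, unknown field format";
    default:
        return "unknown encapsulation";
    }
}
```

A client would call describe_copy(2, 0) for raw text and describe_copy(2, 1) for raw binary; a future field format code would fall into the "unknown field format" branch until the client learns about it.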


I wrote a first concept of the raw_text and raw_binary modes.

I tried to implement RAW_TEXT data passing the same way as the text format, but it is not practical. Text passing is designed for single-line data; for multiline data it enforces escaping, which we don't want in RAW mode. I would have to skip the escaping, and the code is not nice.
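
To illustrate the escaping problem: in the COPY text format a newline inside a value is sent as the two characters backslash and "n", so each row stays on one line. The function below is a simplified sketch of that behavior, not the server's code; copy_text_escape() is a hypothetical name, and it handles only backslash, newline, carriage return and tab.

```c
#include <stddef.h>

/*
 * Simplified sketch of COPY text-format escaping.
 * Writes the escaped, NUL-terminated form of "in" into "out".
 * Returns the number of bytes written (without the NUL),
 * or (size_t) -1 if the output buffer is too small.
 */
size_t copy_text_escape(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return (size_t) -1;

    for (; *in != '\0'; in++) {
        char esc = 0;

        switch (*in) {
        case '\\': esc = '\\'; break;
        case '\n': esc = 'n'; break;
        case '\r': esc = 'r'; break;
        case '\t': esc = 't'; break;
        }

        /* Keep room for this character (1 or 2 bytes) plus the final NUL. */
        if (n + (esc ? 2 : 1) >= outsz)
            return (size_t) -1;

        if (esc) {
            out[n++] = '\\';
            out[n++] = esc;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}
```

A three-byte value such as "a", newline, "b" becomes four bytes on the wire. For a RAW mode the client wants the original three bytes back unchanged, which is why the escaping would have to be bypassed.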

So I propose a different scheme: RAW_TEXT uses text values (via the input/output functions) and enforces encoding conversion from/to the client encoding, but the data is passed to the client in binary mode, so I don't need to read the content line by line. PQbinaryTuples() returns 1 for both RAW_TEXT and RAW_BINARY; in these cases the data is passed as one binary value. PQfformat() returns 2 for RAW_TEXT and 3 for RAW_BINARY.
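
A minimal sketch of how a client could interpret the field format codes under this proposal. The codes 2 and 3 are only proposed here and are not existing PQfformat() values; the enum, the struct and classify_field_format() are hypothetical names used for illustration.

```c
/* Field format codes: 0 and 1 exist today, 2 and 3 are the proposal. */
enum field_format {
    FMT_TEXT = 0,
    FMT_BINARY = 1,
    FMT_RAW_TEXT = 2,    /* proposed */
    FMT_RAW_BINARY = 3   /* proposed */
};

struct copy_handling {
    int known;             /* 1 if the format code is recognized */
    int single_value;      /* 1 if the data arrives as one unframed value */
    int encoding_applied;  /* 1 if the value is converted to the client encoding */
};

/* Map a field format code to the way the client should treat the data. */
struct copy_handling classify_field_format(int fformat)
{
    struct copy_handling h = {0, 0, 0};

    switch (fformat) {
    case FMT_TEXT:
        h.known = 1;
        h.encoding_applied = 1;
        break;
    case FMT_BINARY:
        h.known = 1;
        break;
    case FMT_RAW_TEXT:
        /* text value from the output function, passed as one value */
        h.known = 1;
        h.single_value = 1;
        h.encoding_applied = 1;
        break;
    case FMT_RAW_BINARY:
        h.known = 1;
        h.single_value = 1;
        break;
    default:
        break;
    }
    return h;
}
```

With this mapping a client does not have to scan for line ends in either raw mode; it only needs the format code to know whether the bytes are text in the client encoding or opaque binary.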

Any objections to this design?

Regards

Pavel


 
                        regards, tom lane
