Home > mailing lists

Re: raw output from copy - Mailing list pgsql-hackers

From	Daniel Verite
Subject	Re: raw output from copy
Date	April 4, 2016 16:38:45
Msg-id	4509626f-b87f-4e5e-816f-eee1137615ba@mm Whole thread
In response to	Re: raw output from copy (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: raw output from copy
List	pgsql-hackers

Tree view

    Andrew Dunstan wrote:

> Inserting the whole contents of a text file unchanged is insanely easy
> in psql.
>
>      \set file `cat /path/to/file`
>      insert into mytable(contents) values(:'file');

That's assuming psql but the asker of that question never mentioned
using psql. The COPY invocation could be inside a function. Even if
that particular user would be fine with a psql-only option, the next
one might not. Or they might want to import a binary file, and
as you mention, currently there's no equivalent of the :'var'
feature for binary.

But there's another aspect to this that's worth of consideration,
and that this forum question illustrates.
One reason of adding the format to COPY is that it's where users
are looking for it. It's the canonical way of importing contents
from files so that's where it makes more sense.
From the POV of being user friendly and consistent, restricting what
COPY can do because psql could do it completely differently
if the user was psql-savvy enough to know it, what sense does it
make?

> And I am still waiting for a non-psql use case. But I don't expect to
> see one, precisely because most clients have no difficulty at all in
> handling binary data.

You mean small or medium-size binary data. The 512MB-1GB range is
impossible to handle if requested in text format, which is what drivers
tend to use. Even pg_dump fails on these contents.
Maybe it was unimportant when bytea was added ~15 years ago,
but the size of data that people actually put into bytea columns is
growing, following Moore's law like the rest.

Even in the lower size range, considering the amount of memory allocated
and the time spent to convert to hex, sending twice the number
of bytes on the wire, just to do the reverse conversion in the client
as soon as all data is obtained, it works but it's pointless
and inefficient.

Code that uses PQexecParams() binary "resultFormat", or the
binary format of copy doesn't have that problem,  but most
client-side drivers don't do that.

And maybe they just can't realistically, because  getting result
format in binary is exposed as an all-or-nothing choice in libpq.

I mean if client code does SELECT * FROM table or even COPY of the
same, and what comes back is bytea and e.g. timestamps and floats and
custom types, the client-side driver may wish to have the bytea field in
binary format for efficiency and the rest in text format for
usability, but that's not possible with PQexecParams(), or other
libpq functions.

The point of mixing binary and text is outside the scope of a RAW
format for COPY, as obviously it wouldn't help with that in any way,
but on the argument that the status quo is fine because clients
have no difficulty, that's just not true. Clients cope with what they have,
but what they have is far from being complete or optimal.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

pgsql-hackers by date:

From: Tom Lane
Date: 04 April 2016, 16:36:12
Subject: Re: [PROPOSAL] Client Log Output Filtering

From: Robert Haas
Date: 04 April 2016, 16:45:36
Subject: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions

Re: raw output from copy - Mailing list pgsql-hackers

Previous

Next