Re: raw output from copy - Mailing list pgsql-hackers
From | Daniel Verite |
---|---|
Subject | Re: raw output from copy |
Date | |
Msg-id | 4509626f-b87f-4e5e-816f-eee1137615ba@mm Whole thread Raw |
In response to | Re: raw output from copy (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: raw output from copy
|
List | pgsql-hackers |
Andrew Dunstan wrote: > Inserting the whole contents of a text file unchanged is insanely easy > in psql. > > \set file `cat /path/to/file` > insert into mytable(contents) values(:'file'); That's assuming psql but the asker of that question never mentioned using psql. The COPY invocation could be inside a function. Even if that particular user would be fine with a psql-only option, the next one might not. Or they might want to import a binary file, and as you mention, currently there's no equivalent of the :'var' feature for binary. But there's another aspect to this that's worth of consideration, and that this forum question illustrates. One reason of adding the format to COPY is that it's where users are looking for it. It's the canonical way of importing contents from files so that's where it makes more sense. From the POV of being user friendly and consistent, restricting what COPY can do because psql could do it completely differently if the user was psql-savvy enough to know it, what sense does it make? > And I am still waiting for a non-psql use case. But I don't expect to > see one, precisely because most clients have no difficulty at all in > handling binary data. You mean small or medium-size binary data. The 512MB-1GB range is impossible to handle if requested in text format, which is what drivers tend to use. Even pg_dump fails on these contents. Maybe it was unimportant when bytea was added ~15 years ago, but the size of data that people actually put into bytea columns is growing, following Moore's law like the rest. Even in the lower size range, considering the amount of memory allocated and the time spent to convert to hex, sending twice the number of bytes on the wire, just to do the reverse conversion in the client as soon as all data is obtained, it works but it's pointless and inefficient. Code that uses PQexecParams() binary "resultFormat", or the binary format of copy doesn't have that problem, but most client-side drivers don't do that. And maybe they just can't realistically, because getting result format in binary is exposed as an all-or-nothing choice in libpq. I mean if client code does SELECT * FROM table or even COPY of the same, and what comes back is bytea and e.g. timestamps and floats and custom types, the client-side driver may wish to have the bytea field in binary format for efficiency and the rest in text format for usability, but that's not possible with PQexecParams(), or other libpq functions. The point of mixing binary and text is outside the scope of a RAW format for COPY, as obviously it wouldn't help with that in any way, but on the argument that the status quo is fine because clients have no difficulty, that's just not true. Clients cope with what they have, but what they have is far from being complete or optimal. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
pgsql-hackers by date: