Thread: Add encoding support to COPY
Today on IRC, someone was wondering what the preferred method of exporting data in a specific encoding via COPY was. They reply was wrapping the COPY command in "set client_encoding='foo';", which made me wonder how hard it would be to add an additional WITH parameter to the actual COPY statement to specify the encoding, a la: [ [ WITH ] [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] 'delimiter' ] [ ENCODING [ AS ] 'charset'] [ NULL [ AS ] 'null string' ] [ CSV [ HEADER ] [ QUOTE [ AS ] 'quote' ] [ ESCAPE [ AS ] 'escape' ] [ FORCE QUOTE column [, ...] ] Any objections? It seems like a cleaner solution client side than issuing multiple calls to set the client_encoding. If there are no objections, I can attempt to prepare a patch for the next commitfest. David Blewett
David Blewett <david@dawninglight.net> writes: > Today on IRC, someone was wondering what the preferred method of > exporting data in a specific encoding via COPY was. They reply was > wrapping the COPY command in "set client_encoding='foo';", which made > me wonder how hard it would be to add an additional WITH parameter to > the actual COPY statement to specify the encoding, a la: What is the point? You'd generally have client_encoding set correctly for your usage anyway, and if you did not, the data could confuse your client-side code terribly. Offering an option to let the backend send data in the "wrong" encoding does NOT seem like a good idea to me. regards, tom lane
On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > What is the point? You'd generally have client_encoding set correctly > for your usage anyway, and if you did not, the data could confuse your > client-side code terribly. Offering an option to let the backend send > data in the "wrong" encoding does NOT seem like a good idea to me. The use case was that the client connection was using one encoding, but needed to output the file in a different encoding. So they would have to do the "set client_encoding" dance each time they wanted to export the file. I don't see how it's "wrong", especially considering there is already a method to do this, albeit cumbersome. I consider it simply syntactic sugar over existing functionality. David Blewett
David Blewett <david@dawninglight.net> writes: > On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> What is the point? �You'd generally have client_encoding set correctly >> for your usage anyway, and if you did not, the data could confuse your >> client-side code terribly. �Offering an option to let the backend send >> data in the "wrong" encoding does NOT seem like a good idea to me. > The use case was that the client connection was using one encoding, > but needed to output the file in a different encoding. So they would > have to do the "set client_encoding" dance each time they wanted to > export the file. Well, it might make sense to allow an ENCODING option attached to a COPY with a file source/destination. I remain of the opinion that overriding client_encoding on a transfer to/from the client is a bad idea. regards, tom lane
David Blewett wrote: > Today on IRC, someone was wondering what the preferred method of > exporting data in a specific encoding via COPY was. They reply was > wrapping the COPY command in "set client_encoding='foo';", which made > me wonder how hard it would be to add an additional WITH parameter to > the actual COPY statement to specify the encoding, a la: > [ [ WITH ] > [ BINARY ] > [ OIDS ] > [ DELIMITER [ AS ] 'delimiter' ] > [ ENCODING [ AS ] 'charset' ] > [ NULL [ AS ] 'null string' ] > [ CSV [ HEADER ] > [ QUOTE [ AS ] 'quote' ] > [ ESCAPE [ AS ] 'escape' ] > [ FORCE QUOTE column [, ...] ] > > Any objections? It seems like a cleaner solution client side than > issuing multiple calls to set the client_encoding. If there are no > objections, I can attempt to prepare a patch for the next commitfest. > > David Blewett > I think that I was the one who wondered about that. Our use case is related to moving data between different servers which have different encodings. Ofcourse the encoding should be an option only when COPY involves files. -- Nagy Karoly Gabriel Expert Software Group SRL (o__ 417495 Sanmartin nr. 205 //\' Bihor, Romania V_/_ Tel./Fax: +4 0259 317 142, +4 0259 317 143
Attachment
Apologies to Tom for the duplicate... On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Well, it might make sense to allow an ENCODING option attached to a COPY > with a file source/destination. I remain of the opinion that overriding > client_encoding on a transfer to/from the client is a bad idea. I really don't see how it is any different from manually flipping the client_encoding before/after the transfer. We could of course put a warning sign in the docs, but it seems to me it's more error prone for clients to set the client_encoding manually rather than include an option for a single command. What happens if an exception is thrown during the COPY process and the client doesn't handle things correctly? The rest of their session could be in an unexpected encoding, whereas with this method we know to return to the original client_encoding before doing anything else. By including the encoding option, their explicitly saying how they want to handle the data. I could see a use case for remote client code to do a COPY to STDOUT, that is actually being redirected to a file. If the consensus is for local file-based operations only, however, I can structure the patch that way. David
--On 15. Juli 2009 19:59:56 +0300 Nagy Karoly Gabriel <nagy.karoly@expert-erp.net> wrote: > I think that I was the one who wondered about that. Our use case is > related to moving data between different servers which have different > encodings. Ofcourse the encoding should be an option only when COPY > involves files. I find this rather confusing: can't you just tell via client_encoding the correct encoding your file contains during restore? -- Thanks Bernd
David Blewett wrote: > On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > > Well, it might make sense to allow an ENCODING option attached to a COPY > > with a file source/destination. I remain of the opinion that overriding > > client_encoding on a transfer to/from the client is a bad idea. > I could see a use case for remote client code to do a COPY to STDOUT, > that is actually being redirected to a file. If the consensus is for > local file-based operations only, however, I can structure the patch > that way. Yeah, the problem is that reading to/from files is only allowed to superusers ... (I'm not sure how this affects \copy in psql; probably something you should investigate) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
David Blewett <david@dawninglight.net> writes: > On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Well, it might make sense to allow an ENCODING option attached to a COPY >> with a file source/destination. �I remain of the opinion that overriding >> client_encoding on a transfer to/from the client is a bad idea. > I really don't see how it is any different from manually flipping the > client_encoding before/after the transfer. The difference is that the client-side code gets told that the encoding changed if you do the latter. regards, tom lane
On Wed, Jul 15, 2009 at 4:07 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > David Blewett <david@dawninglight.net> writes: >> On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Well, it might make sense to allow an ENCODING option attached to a COPY >>> with a file source/destination. I remain of the opinion that overriding >>> client_encoding on a transfer to/from the client is a bad idea. > >> I really don't see how it is any different from manually flipping the >> client_encoding before/after the transfer. > > The difference is that the client-side code gets told that the encoding > changed if you do the latter. Do you mean at the protocol level? All I was planning on having the patch do is the equivalent of the set client_encoding dance. Wouldn't that be sufficent to notify the client of the encoding change? David Blewett
Nagy Karoly Gabriel wrote: > David Blewett wrote: >> Today on IRC, someone was wondering what the preferred method of >> exporting data in a specific encoding via COPY was. They reply was >> wrapping the COPY command in "set client_encoding='foo';", which made >> me wonder how hard it would be to add an additional WITH parameter to >> the actual COPY statement to specify the encoding, a la: >> > I think that I was the one who wondered about that. Our use case is > related to moving data between different servers which have different > encodings. Ofcourse the encoding should be an option only when COPY > involves files. > Well, that is the case that there seems to be consensus about, and it's also the case that can't be done via client encoding. We tend to have a bias against providing lots of ways to do the same thing, so let's go with this case (i.e. do it for cases other than STDIN/STDOUT). cheers andrew