Re: libpq should have functions for escaping data for use in COPY FROM - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: libpq should have functions for escaping data for use in COPY FROM |
Date | |
Msg-id | CA+TgmoYQ8tksGPfyxx=dyrTO72xKLsgvuGodc99sxCLBXV0Gng@mail.gmail.com Whole thread Raw |
In response to | libpq should have functions for escaping data for use in COPY FROM (Joey Adams <joeyadams3.14159@gmail.com>) |
Responses |
Re: libpq should have functions for escaping data for use in COPY FROM
|
List | pgsql-hackers |
On Fri, Mar 9, 2012 at 9:16 PM, Joey Adams <joeyadams3.14159@gmail.com> wrote: > libpq has functions for escaping values in SQL commands > (PQescapeStringConn, PQescapeByteaConn, and the new PQescapeLiteral), > and it supports parameterizing queries with PQexecParams. But it does > not (to my knowledge) have functions for escaping values for COPY > FROM. > > COPY FROM is useful for inserting rows in bulk (though I wonder if > constructing massive INSERT statements and using PQexecParams is just > as efficient). It is also useful for generating .sql files which can > be run on a database elsewhere. > > I think libpq should include functions for escaping with COPY FROM. I'm a little bit confused about what you're getting at here, because COPY has a huge pile of options - not just CSV or text, but also things like QUOTE and DELIMITER. It's not like there is ONE way to escape things for COPY. I guess we could include code that escapes things in the manner that an optionless COPY expects, or we could include in the API all the same options that COPY supports, but the former sounds narrow and the latter complex. > Before spending a bunch of time on this, I'd like some input. A few questions: > > * Should we have corresponding functions for parsing COPY TO data, or > is PQexecParams sufficient? > > * Should we support CSV escaping? Can the CSV format safely encode > all characters (in particular, newlines)? The fine manual page for COPY discusses how to encode CSV data in considerable detail. > * Should we deal with encodings here, or just escape everything that > isn't printable ASCII like the code I wrote does? I think your code will fall over badly if fed, say, UTF-8 characters with code points greater than 0x7F. I doubt very much that we would accept anything into libpq that doesn't handle all the encodings we support, and that covers a lot of territory. There are some restrictions on the set of server-side encodings - we only allow those that have certain "safe" properties - but IIUC client encodings are much less restricted and a lot of wacky stuff is possible. Even if you can come up with code that handles all cases correctly, it'll probably perform much less well in simple cases than the quick hack you linked to here. Considering all the above, this seems like it might be a solution in search of a problem. It's not actually that hard to write code to do proper escaping for a *given* encoding and a *given* set of COPY options, but trying to write something general sounds like a job and a half. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: