Thread: allow CSV quote in NULL
Greetings, Please find attached a minor patch to remove the constraints that a user can't include the delimiter or quote characters in a 'NULL AS' string when importing CSV files. This allows a user to explicitly request that NULL conversion happen on fields which are quoted. As the quote character is being allowed to be in the 'NULL AS' string now, there's no reason to exclude the delimiter character from being seen in that string as well, though unless quoted using the CSV quote character it won't ever be matched. An example of the usage: sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv with csv header quote as '"' null as '""' This is no contrived example, it's an issue I ran into earlier today when I got a file which had (for reasons unknown to me and not easily changed upstream): "1","V","WASHDCABC12","","120033"... Both of the ending columns shown are integer fields, the "" here being used to indicate a NULL value. Without the patch, an ERROR occurs: sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv with csv header quote as '"' ERROR: invalid input syntax for integer: "" And there's no way to get it to import with COPY CSV mode. The patch adds this ability without affecting existing usage or changing the syntax. Even with the patch an ERROR occurs with the default treatment of CSV files: sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv with csv header quote as '"' ERROR: invalid input syntax for integer: "" Which would be expected. If the file is modified to remove the ""s for NULL columns, it imports just fine with the syntax above. It'd be really nice to have this included. Thanks! Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > Please find attached a minor patch to remove the constraints that a > user can't include the delimiter or quote characters in a 'NULL AS' > string when importing CSV files. This can't really be sane can it? regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Please find attached a minor patch to remove the constraints that a > > user can't include the delimiter or quote characters in a 'NULL AS' > > string when importing CSV files. > > This can't really be sane can it? hm? It's a problem I've run into a number of times and I finally got fed up with it enough to create a patch for it. It solves my problem and I don't believe breaks or adversely affects anything else. It also has to be explicitly asked for by the user, and when the user asks for it, what's done is intuitively what would be expected. I don't see the logic in forbidding the user from being able to do this... My specific case involved an integer field which was quoted in the CSV file (which isn't generally a problem, it's only an issue if it ends up being a an empty string). I could easily see it being used for text fields as well though, if, for example, they want empty strings, even when quoted, to import as NULLs. Of course, the default remains that it'd be imported as an empty string, having it be a NULL has to be explicitly requested. Thanks, Stephen
Attachment
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Stephen Frost <sfrost@snowman.net> writes: >> Please find attached a minor patch to remove the constraints that a >> user can't include the delimiter or quote characters in a 'NULL AS' >> string when importing CSV files. > > This can't really be sane can it? Including unquoted delimiter characters seems kind of insane. But including the quote characters or quoted delimiter characters seems reasonable. The alternative would be interpreting NULL strings after dequoting but that would leave no way to include the NULL string literally. This solution means there's no way to include it (if it needs quoting) but only when you specify it this way. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
* Gregory Stark (stark@enterprisedb.com) wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > > > Stephen Frost <sfrost@snowman.net> writes: > >> Please find attached a minor patch to remove the constraints that a > >> user can't include the delimiter or quote characters in a 'NULL AS' > >> string when importing CSV files. > > > > This can't really be sane can it? > > Including unquoted delimiter characters seems kind of insane. But including > the quote characters or quoted delimiter characters seems reasonable. Right. We could write something to check if the user provides an unquoted delimiter character but I'm not really sure it's worth it, to be perfectly honest. If it'll make people happier with the patch I can do it though. > The alternative would be interpreting NULL strings after dequoting but that > would leave no way to include the NULL string literally. This solution means > there's no way to include it (if it needs quoting) but only when you specify > it this way. Yeah, interpreting NULLs after dequoting means you've lost the information about if it's quoted or not, or you have to add some funky syntax to say "if it's quoted, do it differently...", which is no good, imv. What the patch does basically is say "give us the exact string that shows up between the unquoted delimiters that you want to be treated as a NULL." This removes the complexity of the question about quoting, unquoting, whatever, and makes it a very clear-cut, straight-forward solution with no impact on existing users, imv. Thanks, Stephen
Attachment
Stephen Frost wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > Stephen Frost <sfrost@snowman.net> writes: > > > Please find attached a minor patch to remove the constraints that a > > > user can't include the delimiter or quote characters in a 'NULL AS' > > > string when importing CSV files. > > > > This can't really be sane can it? > > hm? It's a problem I've run into a number of times and I finally got > fed up with it enough to create a patch for it. I've seen people in the same situation before, so this makes sense from that perspective. If there's a vote, the patch gets a +1 from me. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Stephen Frost wrote: > Greetings, > > Please find attached a minor patch to remove the constraints that a > user can't include the delimiter or quote characters in a 'NULL AS' > string when importing CSV files. > > This allows a user to explicitly request that NULL conversion happen > on fields which are quoted. As the quote character is being allowed > to be in the 'NULL AS' string now, there's no reason to exclude the > delimiter character from being seen in that string as well, though > unless quoted using the CSV quote character it won't ever be matched. > > An example of the usage: > > sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv > with csv header quote as '"' null as '""' > > This is no contrived example, it's an issue I ran into earlier today > when I got a file which had (for reasons unknown to me and not easily > changed upstream): > > "1","V","WASHDCABC12","","120033"... > > Both of the ending columns shown are integer fields, the "" here being > used to indicate a NULL value. > > Without the patch, an ERROR occurs: > > sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv > with csv header quote as '"' > ERROR: invalid input syntax for integer: "" > > And there's no way to get it to import with COPY CSV mode. The > patch adds this ability without affecting existing usage or changing > the syntax. Even with the patch an ERROR occurs with the default > treatment of CSV files: > > sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv > with csv header quote as '"' > ERROR: invalid input syntax for integer: "" > > Which would be expected. If the file is modified to remove the ""s > for NULL columns, it imports just fine with the syntax above. > > It'd be really nice to have this included. > > Thanks! > > Stephen [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +