Thread: Timestamp input + copy
Hi Postgressers! I really like Postgres. Thanks for all your work on it. I just have a problem with the way it's handling my flat file's timestamp columns. I have a flat file with a column with dates formatted like this: 2004-04-15 18:04:26 PM It's a bit strange, I know, but I didn't create the file. My idea of Postgres's proper behavior would be to load this date as a military time (and ignore the "PM"). MS SQL Server behaves in this way. Postgres, however, won't even load the file: bonusticket=copy bonusticket2004Q2 from '/home/kevin/bonusticket/data3/uberfile/uberfile1.txt' null as ''; # ERROR: date/time field value out of range: "2004-04-15 18:04:26 PM" CONTEXT: COPY bonusticket2004q2, line 17, column submit_date: "2004-04-15 18:04:26 PM" I presume that Postgres is complaining about the fact that I have an 18 in the hour slot of a supposedly PM time. What can I do about this? Can I possibly specify a time format (similar to the 'YYYY-MM-DD HH24:MI:SS' I might pass to to_timestamp) at load time? Thanks for any help you can provide, Kevin
Kevin Bartz wrote: > I have a flat file with a column with dates formatted like this: > > 2004-04-15 18:04:26 PM > > It's a bit strange, I know, but I didn't create the file. My idea of > Postgres's proper behavior would be to load this date as a military > time (and ignore the "PM"). MS SQL Server behaves in this way. I couldn't disagree more that it would be correct behavior to ignore the PM and treat it as 24-hour time. It's one of the most important features of a database that when you give is bad data, it responds with an error message rather than trying to guess at what you mean. Why are you using a database, if not to ensure that you can trust your data; and how can you trust data that comes from an ambiguous source? I'm sorry to hear that SQL Server accepts this input without complaint. It's caused me to lose confidence in that product. > What can I do about > this? Can I possibly specify a time format (similar to the > 'YYYY-MM-DD HH24:MI:SS' I might pass to to_timestamp) at load time? If you know that the time is in 24-hour form and want to ignore the AM or PM specifier, then you can certainly run it by a processor written in pretty much any programming language that will fix it. In UNIX sed, it looks like this (all on one line): cat data.txt | sed 's([0-9]{2,4}\-[0-9]{1,2}\-[0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}:[0-9]{1,2}) ((AM|PM))?/\1/g' > data.txt.fixed -- www.designacourse.com The Easiest Way to Train Anyone... Anywhere. Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Chris! Thanks so much for your reply. Now that I think about it, I guess you're right. I would rather know about what's going on in my data set than simply be oblivious to it, even though SQL Server happened to guess correctly in this case. Come to think of it, it wasn't until I tried to use Postgres that I even knew my dates were formatted like that! Anyway, I retract my prior statement. Regarding your suggestion: is there any way I can make it go a little faster? I used sed extensively for dates back in my MySQL days (MySQL is VERY finicky about its datetime input), but I have upwards of 10 GB of data like this, with the ill-formatted example I gave you on most of the lines. At the rate it's going now, sed's going to need more than an hour to strip away all the PMs and AMs. Is there any way I can tell Postgres about the format to expect for date strings, in the same manner I would tell the function to_timestamp (i.e., 'YYYY-MM-DD HH24:MI:SS')? Alternatively, I could load the bad column as a string and then tell Postgres to recast it (using the "using" clause) with to_timestamp. The syntax listed in the documentation, however, alter table bonusticket alter submit_date type timestamp using to_timestamp(substring(submit_date, 1, 19), 'YYYY-MM-DD HH24:MI:SS'); throws a syntax error, it explains, at the word "type." I have the latest version, 7.4.3 on a 64-bit SuSE box. Any suggestions? Kevin -----Original Message----- From: Chris Smith [mailto:cdsmith@twu.net] Sent: Wednesday, July 28, 2004 3:51 PM To: Kevin Bartz Subject: Re: [GENERAL] Timestamp input + copy Kevin Bartz wrote: > I have a flat file with a column with dates formatted like this: > > 2004-04-15 18:04:26 PM > > It's a bit strange, I know, but I didn't create the file. My idea of > Postgres's proper behavior would be to load this date as a military > time (and ignore the "PM"). MS SQL Server behaves in this way. I couldn't disagree more that it would be correct behavior to ignore the PM and treat it as 24-hour time. It's one of the most important features of a database that when you give is bad data, it responds with an error message rather than trying to guess at what you mean. Why are you using a database, if not to ensure that you can trust your data; and how can you trust data that comes from an ambiguous source? I'm sorry to hear that SQL Server accepts this input without complaint. It's caused me to lose confidence in that product. > What can I do about > this? Can I possibly specify a time format (similar to the > 'YYYY-MM-DD HH24:MI:SS' I might pass to to_timestamp) at load time? If you know that the time is in 24-hour form and want to ignore the AM or PM specifier, then you can certainly run it by a processor written in pretty much any programming language that will fix it. In UNIX sed, it looks like this (all on one line): cat data.txt | sed '/s/([0-9]{2,4}\-[0-9]{1,2}\-[0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}:[0-9]{1,2}) ((AM|PM))?/\1/g' > data.txt.fixed -- www.designacourse.com The Easiest Way to Train Anyone... Anywhere. Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
On Wednesday 28 July 2004 3:11 pm, Kevin Bartz wrote: ... > I have a flat file with a column with dates formatted like this: > > 2004-04-15 18:04:26 PM > > It's a bit strange, I know, but I didn't create the file. My idea > of Postgres's proper behavior would be to load this date as a > military time (and ignore the "PM"). MS SQL Server behaves in this > way. Postgres, however, won't even load the file: Edit the file to eliminate the PM. I don't know how large the file is but fixing the date should be pretty easy with vi, sed, perl, etc. (choose the one that works for you). What seems like PostgreSQL being annoying and pedantic is really PostgreSQL protecting your a** - the importance of data integrity and all that. Some databases try to be "helpful" (or are just sloppy) so if you try to insert a number bigger than that allowed by the field it just truncates it to the largest number that will fit. I hope nobody uses that database for financial data. What should a database do if confronted with '2004-04-15 18:04:26 AM'? I would much prefer the system to throw an error and let me evaluate and fix it than silently "help" me by loading corrupt data. Cheers and welcome to PostgreSQL, Steve
"Kevin Bartz" <kbartz@loyaltymatrix.com> writes: > Alternatively, I could load the bad column as a string and then tell > Postgres to recast it (using the "using" clause) with to_timestamp. The > syntax listed in the documentation, however, > alter table bonusticket > alter submit_date type timestamp using to_timestamp(substring(submit_date, > 1, 19), 'YYYY-MM-DD HH24:MI:SS'); > throws a syntax error, it explains, at the word "type." I have the latest > version, 7.4.3 on a 64-bit SuSE box. Any suggestions? You apparently are reading development-tip documentation rather than 7.4 documentation; that flavor of ALTER does not exist in 7.4. I think the only real solution in 7.4 is to load the data into a temp table in which this column is declared as "text", and then transfer to the final table using INSERT/SELECT (or possibly CREATE TABLE AS) with the appropriate conversion expression. I suspect though that the sed-based solution is going to be quicker than any of these. regards, tom lane
Thanks, guys. sed took a couple hours, but it's all done now, and the data are loaded. I still like Postgres! Kevin -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Wednesday, July 28, 2004 5:39 PM To: Kevin Bartz Cc: 'Chris Smith'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Timestamp input + copy "Kevin Bartz" <kbartz@loyaltymatrix.com> writes: > Alternatively, I could load the bad column as a string and then tell > Postgres to recast it (using the "using" clause) with to_timestamp. The > syntax listed in the documentation, however, > alter table bonusticket > alter submit_date type timestamp using to_timestamp(substring(submit_date, > 1, 19), 'YYYY-MM-DD HH24:MI:SS'); > throws a syntax error, it explains, at the word "type." I have the latest > version, 7.4.3 on a 64-bit SuSE box. Any suggestions? You apparently are reading development-tip documentation rather than 7.4 documentation; that flavor of ALTER does not exist in 7.4. I think the only real solution in 7.4 is to load the data into a temp table in which this column is declared as "text", and then transfer to the final table using INSERT/SELECT (or possibly CREATE TABLE AS) with the appropriate conversion expression. I suspect though that the sed-based solution is going to be quicker than any of these. regards, tom lane
On Wed, 2004-07-28 at 19:25, Kevin Bartz wrote: > Thanks, guys. sed took a couple hours, but it's all done now, and the data > are loaded. I still like Postgres! ya know, there's an easier way (imnsho): just import it into a text field, then update the real field with update table tablename set realfield = substring(tempfield,1,20) Not sure it'd be any faster, but hey, it's pretty simple to do.
Thanks for your reply, Scott. A variation on that suggestion does work very well for me. Kevin -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Scott Marlowe Sent: Thursday, July 29, 2004 12:25 AM To: Kevin Bartz Cc: 'Tom Lane'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Timestamp input + copy On Wed, 2004-07-28 at 19:25, Kevin Bartz wrote: > Thanks, guys. sed took a couple hours, but it's all done now, and the data > are loaded. I still like Postgres! ya know, there's an easier way (imnsho): just import it into a text field, then update the real field with update table tablename set realfield = substring(tempfield,1,20) Not sure it'd be any faster, but hey, it's pretty simple to do. ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster