Thread: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Brendan Jurd"
Date:
Hey folks, Tom pointed out in the parent thread [1] that the error message for bogus AM/PM markers in to_timestamp is pretty lame: ERROR: invalid AM/PM string I agree, and once I started thinking about this, I came up with other gripes concerning the treatment of 12-hour time in to_timestamp, the use of the meridiem markers (AM/PM) in particular. Currently, Postgres accepts four separate flavours for specifying meridiem markers, given by uppercase/lowercase and with/without periods: * am/pm* AM/PM* a.m./p.m.* A.M./P.M. What I find surprising about the implementation is that you must specify the correct formatting pattern for the particular flavour of meridiem marker in your input string, or you get the aforementioned lame error message. Consider: postgres=# select to_timestamp('11:47 pm 27 Sep 2008', 'HH:MI PM DD Mon YYYY'); ERROR: invalid AM/PM string Here it seems to me that Postgres is being unnecessarily pedantic. Yes, the case of the marker differs from the case of the formatting keyword, but is that really grounds for an ERROR? The user's intention is perfectly unambiguous, so why not just accept the input? I would go so far as to say that we should accept any of the 8 valid meridiem markers, regardless of which flavour is indicated by the formatting keyword. Day and month names already work this way. We don't throw an error if a user specifies a mixed-case month name like "Sep" but uses the uppercase formatting keyword "MON". I suspect that the current behaviour isn't so much a deliberate choice to be draconian as it is a consequence of the way the code was developed; as the inverse of to_char(). One way to tidy this up would be to re-implement the meridiem markers using the seq_search functions, i.e., make it work like the day and month names. This would make it easy to accept any flavour of marker, and the error messages thrown for bogus input would then be the same as those for bogus day and month names. Note that all of the above applies equally to the era markers B.C. and A.D. Comments? Cheers, BJ [1] http://archives.postgresql.org/message-id/8148.1222359751@sss.pgh.pa.us
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Tom Lane
Date:
"Brendan Jurd" <direvus@gmail.com> writes: > One way to tidy this up would be to re-implement the meridiem markers > using the seq_search functions, i.e., make it work like the day and > month names. This would make it easy to accept any flavour of marker, > and the error messages thrown for bogus input would then be the same > as those for bogus day and month names. > Note that all of the above applies equally to the era markers B.C. and A.D. +1 ... making these work like month names makes sense to me. regards, tom lane
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Alex Hunsaker"
Date:
On Fri, Sep 26, 2008 at 11:25 AM, Brendan Jurd <direvus@gmail.com> wrote: > One way to tidy this up would be to re-implement the meridiem markers > using the seq_search functions, i.e., make it work like the day and > month names. This would make it easy to accept any flavour of marker, > and the error messages thrown for bogus input would then be the same > as those for bogus day and month names. Yeah if we seq_search then it should be a pretty easy conversion. so +1 However that still leaves the original complaint around (at least IMHO): select to_timestamp('AN', 'AM'); ERROR: invalid AM/PM string select to_timestamp('11:47 PM 27 Sep a2008', 'HH:MI PM DD Mon YYYY'); ERROR: invalid value for "YYYY" in source string Now arguably most to_timestamp calls are going to be short so i can easily look for the YYYY in my string and see what I did wrong (and DETAIL: provides Value must be an integer in the second case)... So really maybe thats good enough I dunno... Personally I find the output of my dumb patch to be better than both above: select to_timestamp('AN', 'AM'); ERROR: invalid AM/PM string for 'AN' And we could improve that by only showing node->key->len chars. And make that work for both AM/PM and the others YYYY etc Then again maybe its not worth it?
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Tom Lane
Date:
"Alex Hunsaker" <badalex@gmail.com> writes: > However that still leaves the original complaint around (at least IMHO): > select to_timestamp('AN', 'AM'); > ERROR: invalid AM/PM string > select to_timestamp('11:47 PM 27 Sep a2008', 'HH:MI PM DD Mon YYYY'); > ERROR: invalid value for "YYYY" in source string Yeah, it would be a lot better if it said ERROR: invalid value for "YYYY": "a2008" The DETAIL is good too, but it's no substitute for showing the exact substring that the code is unhappy about. regards, tom lane
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Brendan Jurd"
Date:
On Sat, Sep 27, 2008 at 4:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Alex Hunsaker" <badalex@gmail.com> writes: >> However that still leaves the original complaint around (at least IMHO): > >> select to_timestamp('AN', 'AM'); >> ERROR: invalid AM/PM string > >> select to_timestamp('11:47 PM 27 Sep a2008', 'HH:MI PM DD Mon YYYY'); >> ERROR: invalid value for "YYYY" in source string > > Yeah, it would be a lot better if it said > > ERROR: invalid value for "YYYY": "a2008" > > The DETAIL is good too, but it's no substitute for showing the exact > substring that the code is unhappy about. > I agree. And with my proposed changes to the meridiem/era marker code, it will be much easier to improve the error messages in a consistent way. I'll work on a patch for the Nov commitfest. Cheers, BJ
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Bruce Momjian
Date:
Brendan Jurd wrote: > On Sat, Sep 27, 2008 at 4:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Alex Hunsaker" <badalex@gmail.com> writes: > >> However that still leaves the original complaint around (at least IMHO): > > > >> select to_timestamp('AN', 'AM'); > >> ERROR: invalid AM/PM string > > > >> select to_timestamp('11:47 PM 27 Sep a2008', 'HH:MI PM DD Mon YYYY'); > >> ERROR: invalid value for "YYYY" in source string > > > > Yeah, it would be a lot better if it said > > > > ERROR: invalid value for "YYYY": "a2008" > > > > The DETAIL is good too, but it's no substitute for showing the exact > > substring that the code is unhappy about. > > > > I agree. And with my proposed changes to the meridiem/era marker > code, it will be much easier to improve the error messages in a > consistent way. > > I'll work on a patch for the Nov commitfest. Brendan, did you ever complete this patch? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Brendan Jurd"
Date:
On Thu, Jan 8, 2009 at 2:10 PM, Bruce Momjian <bruce@momjian.us> wrote: > Brendan, did you ever complete this patch? > I didn't, no. I still intend on doing work in this area, but obviously it will have to be in the 8.5 cycle. Cheers, BJ
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Bruce Momjian
Date:
Brendan Jurd wrote: > On Thu, Jan 8, 2009 at 2:10 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Brendan, did you ever complete this patch? > > > > I didn't, no. I still intend on doing work in this area, but > obviously it will have to be in the 8.5 cycle. OK, what is the TODO? It is more than AM/PM, right? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Brendan Jurd"
Date:
On Thu, Jan 8, 2009 at 2:20 PM, Bruce Momjian <bruce@momjian.us> wrote: > OK, what is the TODO? It is more than AM/PM, right? > I see it happening in two stages. Stage 1 is updating the AM/PM parse code to use the seq_search technique, which may involve some minor refactoring around seq_search itself. This will get us the relaxed validation rules I was talking about before (i.e., "am" is an acceptable spelling for "AM" and vice versa). Stage 2 is improving the error message reporting for all the bits that use seq_search. So I would probably make two TODOs, descriptions could be like: * Relax validation for AM/PM markers in to_timestamp() by using seq_search * Make to_timestamp() error messages more specific, and display the actual value which caused the error. Cheers, BJ
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Brendan Jurd"
Date:
On Sat, Sep 27, 2008 at 4:25 AM, Brendan Jurd <direvus@gmail.com> wrote: > Currently, Postgres accepts four separate flavours for specifying > meridiem markers, given by uppercase/lowercase and with/without > periods: > > * am/pm > * AM/PM > * a.m./p.m. > * A.M./P.M. > > I would go so far as to say that we should accept any of the 8 valid > meridiem markers, regardless of which flavour is indicated by the > formatting keyword. > > Day and month names already work this way. We don't throw an error if > a user specifies a mixed-case month name like "Sep" but uses the > uppercase formatting keyword "MON". I've been thinking further about this lately, and whilst the month and day name tokens aren't fussy about *case*, they do make a distinction about *length*. So, while MON will match "Sep", "SEP" and "sep" just fine, it will have issues with "September" (it will match the first three characters as "Sep" and then leave the remaining characters "tember" to bork up the next token). Likewise, MONTH will not match "Sep", it needs the full month name. I think, for to_timestamp(), it's important that the user have a solid idea of how many characters each formatting token wants to consume. With the am/pm and bc/ad markers, we've got two possibilities for length; without periods (2 characters) and with periods (4 characters). Having the 2-character token match against a 4-character string might cause more confusion than convenience. It may make more sense to keep the different lengths separate, so that a 2-character token will match any of "am", "pm", "AM", "PM", and a 4-character token will match any of "a.m.", "p.m.", "A.M.", "P.M.". Comments? Cheers, BJ
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
"Brendan Jurd"
Date:
On Sat, Sep 27, 2008 at 4:25 AM, Brendan Jurd <direvus@gmail.com> wrote: > One way to tidy this up would be to re-implement the meridiem markers > using the seq_search functions, i.e., make it work like the day and > month names. This would make it easy to accept any flavour of marker, > and the error messages thrown for bogus input would then be the same > as those for bogus day and month names. > >From the better-late-than-never department, comes a patch to improve the handling of AM/PM and AD/BC markers in to_timestamp(), and up the ante on error reporting for various kinds of invalid input. The only difference as far as parsing goes is that we no longer worry about the case of AM/PM and AD/BC. As long as you use the right variation for with/without periods, it will parse. Internally, the code now uses seq_search to parse the markers, so they work in much the same way as month and day names do. I improved the error messages for when a seq_search fails, and when the code fails to parse an integer. Some examples of the new error messages: postgres=# select to_timestamp('Tue 20 Foo 2009 11:39 PM', 'Dy DD Mon YYYY HH:MI PM'); ERROR: invalid value "Foo" for "Mon" DETAIL: The given value did not match any of the allowed values for this field. postgres=# select to_timestamp('Tue 20 Jan 2009 11:39 pn', 'Dy DD Mon YYYY HH:MI PM'); ERROR: invalid value "PN" for "PM" DETAIL: The given value did not match any of the allowed values for this field. postgres=# select to_timestamp('Tue 20 Jan 2009 23:39', 'Dy DD Mon YYYY HH:MI'); ERROR: hour "23" is invalid for the 12-hour clock HINT: Use the 24-hour clock, or give an hour between 1 and 12. postgres=# select to_timestamp('Tue 20 Jan 2009 xx:39 pm', 'Dy DD Mon YYYY HH:MI PM'); ERROR: invalid value "xx" for "HH" DETAIL: Value must be an integer. This resolves TODO item "Improve to_timestamp() handling of AM/PM, and error messages". I've added the patch to the 2009 commitfest. Cheers, BJ doc/src/sgml/func.sgml | 24 +-- src/backend/utils/adt/formatting.c | 235 ++++++++++++++++----------------- src/test/regress/expected/horology.out | 10 - src/test/regress/sql/horology.sql | 4 4 files changed, 137 insertions(+), 136 deletions(-)
Attachment
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Alex Hunsaker
Date:
On Mon, Jan 19, 2009 at 15:26, Brendan Jurd <direvus@gmail.com> wrote: > From the better-late-than-never department, comes a patch to improve > the handling of AM/PM and AD/BC markers in to_timestamp(), and up the > ante on error reporting for various kinds of invalid input. Ok cool I tried this out and gave it a look over. Looks good to me, only a few nits below: -the various (only moved not added) strncpy(...) copy[len] = '\0'; just use strlcpy? -it seemed we were going to do the error messages the other way around "invalid value for "YYYY" in "a009" instead of "invalid value "a009" for "YYYY" but I think I like the latter (what you did in your patch) better anyways... so never mind
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Brendan Jurd
Date:
On Mon, Jan 26, 2009 at 8:32 AM, Alex Hunsaker <badalex@gmail.com> wrote: > > -the various (only moved not added) > strncpy(...) > copy[len] = '\0'; > > just use strlcpy? > Thanks for having a look Alex. strlcpy would be easier, but I thought there might be portability concerns re its non-entirely-standard status. A quick grep through the backend code shows that strlcpy and strncpy are both in use, with neither having a clear majority. I used strncpy because it is more prevalent within src/backend/utils/adt. Happy to use strlcpy if it is acceptable though. Anybody else have an opinion on this? Cheers, BJ
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes: > A quick grep through the backend code shows that strlcpy and strncpy > are both in use, with neither having a clear majority. I used strncpy > because it is more prevalent within src/backend/utils/adt. strncpy is generally deprecated; any remaining uses you find of it are probably only there for lack of round tuits. Use strlcpy in new code, unless there's a pretty strong argument that strncpy is actually clearer for a given usage. (No, it's not a portability issue, because we have our own copy for platforms without it; see src/port/.) regards, tom lane
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Brendan Jurd
Date:
On Tue, Jan 27, 2009 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Brendan Jurd <direvus@gmail.com> writes: >> A quick grep through the backend code shows that strlcpy and strncpy >> are both in use, with neither having a clear majority. I used strncpy >> because it is more prevalent within src/backend/utils/adt. > > strncpy is generally deprecated; any remaining uses you find of it > are probably only there for lack of round tuits. Use strlcpy in new > code, unless there's a pretty strong argument that strncpy is actually > clearer for a given usage. > Thanks for clearing that up Tom. I will update the patch to use strlcpy shortly. Cheers, BJ
Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Brendan Jurd
Date:
On Tue, Jan 27, 2009 at 3:25 PM, Brendan Jurd <direvus@gmail.com> wrote: > On Tue, Jan 27, 2009 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> strncpy is generally deprecated; any remaining uses you find of it >> are probably only there for lack of round tuits. Use strlcpy in new >> code, unless there's a pretty strong argument that strncpy is actually >> clearer for a given usage. >> > > Thanks for clearing that up Tom. I will update the patch to use > strlcpy shortly. > Okay, I've attached a new version of the patch with strlcpy used in from_char_set_int_len() and from_char_seq_search(). I've also included an incremental patch which shows the differences between version 0 and version 1. Cheers, BJ
Attachment
Re: Re: Meridiem markers (was: [BUGS] Incorrect "invalid AM/PM string" error from to_timestamp)
From
Bruce Momjian
Date:
Brendan Jurd wrote: > On Tue, Jan 27, 2009 at 3:25 PM, Brendan Jurd <direvus@gmail.com> wrote: > > On Tue, Jan 27, 2009 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> > >> strncpy is generally deprecated; any remaining uses you find of it > >> are probably only there for lack of round tuits. Use strlcpy in new > >> code, unless there's a pretty strong argument that strncpy is actually > >> clearer for a given usage. > >> > > > > Thanks for clearing that up Tom. I will update the patch to use > > strlcpy shortly. > > > > Okay, I've attached a new version of the patch with strlcpy used in > from_char_set_int_len() and from_char_seq_search(). I've also > included an incremental patch which shows the differences between > version 0 and version 1. I consider this a bug fix so I have applied it to CVS HEAD, even though the patch arrived after the commit fest started. Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +