Thread: trailing junk in numeric literals
I was surprised to find that this doesn't error: => select 100a; a ----- 100 I suspect this and similar cases used to error before aliases without AS were introduced. But now this seems possibly problematic. Should we try to handle this better?
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > I was surprised to find that this doesn't error: > => select 100a; > a > ----- > 100 > I suspect this and similar cases used to error before aliases without AS > were introduced. But now this seems possibly problematic. Should we > try to handle this better? Meh. I think you'd get more brickbats than kudos if you start insisting on a space there. I'm too lazy to try to decipher the SQL spec right now, but ISTR that it insists on whitespace between a numeric literal and an identifier. So strictly speaking this SQL code is nonstandard anyway. But our lexer has always been forgiving about not requiring space if it's not logically necessary to separate tokens. I doubt trying to change that would improve matters. regards, tom lane
On 2020-12-28 21:54, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> I was surprised to find that this doesn't error: >> => select 100a; >> a >> ----- >> 100 > >> I suspect this and similar cases used to error before aliases without AS >> were introduced. But now this seems possibly problematic. Should we >> try to handle this better? > > Meh. I think you'd get more brickbats than kudos if you start insisting > on a space there. > > I'm too lazy to try to decipher the SQL spec right now, but ISTR that > it insists on whitespace between a numeric literal and an identifier. Yeah, non-delimiter tokens are supposed to be separated by delimiter tokens. > So strictly speaking this SQL code is nonstandard anyway. But our > lexer has always been forgiving about not requiring space if it's > not logically necessary to separate tokens. I doubt trying to > change that would improve matters. Well, the idea is to diagnose potential typos better. But if there is no interest, then that's fine.
Hello Peter, My 0.02€: >> So strictly speaking this SQL code is nonstandard anyway. But our >> lexer has always been forgiving about not requiring space if it's >> not logically necessary to separate tokens. I doubt trying to >> change that would improve matters. > > Well, the idea is to diagnose potential typos better. But if there is no > interest, then that's fine. ISTM that silently accepting bogus syntax hides bugs rather than helps users. I'm personaly all for fixing these, especially when I'm said user. My latest catch was: SELECT TIMESTAMP '2020-12-29Z06:16:18'; # 2020-12-29 00:00:00 But: SELECT TIMESTAMPTZ '2020-12-29Z06:16:18'; # 2020-12-29 07:16:18+01 SELECT TIMESTAMP '2020-12-29T06:16:18'; # 2020-12-29 06:16:18 I happen to type a O which is close to 0 for which the shift key is also needed on the French keyboard. This makes the unhelpful: SELECT 12O; # 12 as O I think that the policy should be to help user by detecting mistyped entries, not trying to interpret them out of the norm and expectations. -- Fabien.
On 12/29/20 10:18 AM, Peter Eisentraut wrote: > On 2020-12-28 21:54, Tom Lane wrote: >> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >>> I was surprised to find that this doesn't error: >>> => select 100a; >>> a >>> ----- >>> 100 >> >>> I suspect this and similar cases used to error before aliases without AS >>> were introduced. But now this seems possibly problematic. Should we >>> try to handle this better? >> >> Meh. I think you'd get more brickbats than kudos if you start insisting >> on a space there. >> >> I'm too lazy to try to decipher the SQL spec right now, but ISTR that >> it insists on whitespace between a numeric literal and an identifier. > > Yeah, non-delimiter tokens are supposed to be separated by delimiter > tokens. > >> So strictly speaking this SQL code is nonstandard anyway. But our >> lexer has always been forgiving about not requiring space if it's >> not logically necessary to separate tokens. I doubt trying to >> change that would improve matters. > > Well, the idea is to diagnose potential typos better. But if there is > no interest, then that's fine. I am in favor of such a change so that we can also accept 1_000_000 which currently parses as "1 AS _000_000" (which also isn't compliant because identifiers cannot start with an underscore, but I don't want to take it that far). It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110 without most of it being interpreted as an alias. -- Vik Fearing
On 1/16/21 2:02 PM, Vik Fearing wrote: > I am in favor of such a change so that we can also accept 1_000_000 > which currently parses as "1 AS _000_000" (which also isn't compliant > because identifiers cannot start with an underscore, but I don't want to > take it that far). > > It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110 > without most of it being interpreted as an alias. That would be a nice feature. Is it part of the SQL standard? Andreas
On 1/16/21 4:32 PM, Andreas Karlsson wrote: > On 1/16/21 2:02 PM, Vik Fearing wrote: >> I am in favor of such a change so that we can also accept 1_000_000 >> which currently parses as "1 AS _000_000" (which also isn't compliant >> because identifiers cannot start with an underscore, but I don't want to >> take it that far). >> >> It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110 >> without most of it being interpreted as an alias. > > That would be a nice feature. Is it part of the SQL standard? Yes, all of that is in the standard. -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes: > On 1/16/21 4:32 PM, Andreas Karlsson wrote: >> On 1/16/21 2:02 PM, Vik Fearing wrote: >>> I am in favor of such a change so that we can also accept 1_000_000 >>> which currently parses as "1 AS _000_000" (which also isn't compliant >>> because identifiers cannot start with an underscore, but I don't want to >>> take it that far). >>> It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110 >>> without most of it being interpreted as an alias. >> That would be a nice feature. Is it part of the SQL standard? > Yes, all of that is in the standard. Really? Please cite chapter and verse. AFAICS in SQL:2011 5.3 <literal>, a numeric literal can't contain any extraneous characters, just sign, digits, optional decimal point, and optional exponent. Hex and octal literals are certainly not there either. regards, tom lane
On 1/16/21 6:10 PM, Tom Lane wrote: > Vik Fearing <vik@postgresfriends.org> writes: >> On 1/16/21 4:32 PM, Andreas Karlsson wrote: >>> On 1/16/21 2:02 PM, Vik Fearing wrote: >>>> I am in favor of such a change so that we can also accept 1_000_000 >>>> which currently parses as "1 AS _000_000" (which also isn't compliant >>>> because identifiers cannot start with an underscore, but I don't want to >>>> take it that far). >>>> It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110 >>>> without most of it being interpreted as an alias. > >>> That would be a nice feature. Is it part of the SQL standard? > >> Yes, all of that is in the standard. > > Really? Please cite chapter and verse. AFAICS in SQL:2011 5.3 <literal>, > a numeric literal can't contain any extraneous characters, just sign, > digits, optional decimal point, and optional exponent. Hex and octal > literals are certainly not there either. With respect, you are looking at a 10-year-old document and I am not. 5.3 <literal> has since been modified. -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes: > With respect, you are looking at a 10-year-old document and I am not. > 5.3 <literal> has since been modified. Is a newer version of the spec available online? regards, tom lane