Thread: BUG #17853: COLLATE does not work with numeric column references in ORDER BY

BUG #17853: COLLATE does not work with numeric column references in ORDER BY

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17853
Logged by:          Branko Radovanovic
Email address:      branko.radovanovic.zg@gmail.com
PostgreSQL version: 13.4
Operating system:   Debian
Description:

The following SQL:

values ('a')
order by 1 collate "C";

...returns an error: SQL Error [42804]: ERROR: collations are not supported
by type integer

In the above query, "1" is not an integer but a column reference, so it
should be treated as well-formed and work the same as with the actual column
label:

values ('a')
order by column1 collate "C";

Best regards,
Branko


On 3/18/23 23:44, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      17853
> Logged by:          Branko Radovanovic
> Email address:      branko.radovanovic.zg@gmail.com
> PostgreSQL version: 13.4
> Operating system:   Debian
> Description:
> 
> The following SQL:
> 
> values ('a')
> order by 1 collate "C";
> 
> ...returns an error: SQL Error [42804]: ERROR: collations are not supported
> by type integer
> 
> In the above query, "1" is not an integer but a column reference, so it
> should be treated as well-formed and work the same as with the actual column
> label:

There is an argument that this should work.  There is also an argument 
that using numerical column references is not (or rather, is no longer) 
Standard SQL and should not be used.
-- 
Vik Fearing




Re: BUG #17853: COLLATE does not work with numeric column references in ORDER BY

From
"David G. Johnston"
Date:
On Sat, Mar 18, 2023 at 5:56 PM Vik Fearing <vik@postgresfriends.org> wrote:
On 3/18/23 23:44, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17853
> Logged by:          Branko Radovanovic
> Email address:      branko.radovanovic.zg@gmail.com
> PostgreSQL version: 13.4
> Operating system:   Debian
> Description:
>
> The following SQL:
>
> values ('a')
> order by 1 collate "C";
>
> ...returns an error: SQL Error [42804]: ERROR: collations are not supported
> by type integer
>
> In the above query, "1" is not an integer but a column reference, so it
> should be treated as well-formed and work the same as with the actual column
> label:

There is an argument that this should work.  There is also an argument
that using numerical column references is not (or rather, is no longer)
Standard SQL and should not be used.


So we also have a documentation bug for failing to accurately indicate that our treatment here is non-standard.  Given the existing notes that refer to both SQL-92 and SQL:1999 explicitly and make no mention of this I'd have to assume such a material difference falls into the scope of things we document.

David J.

Re: BUG #17853: COLLATE does not work with numeric column references in ORDER BY

From
"David G. Johnston"
Date:
On Sat, Mar 18, 2023 at 6:11 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Mar 18, 2023 at 5:56 PM Vik Fearing <vik@postgresfriends.org> wrote:
On 3/18/23 23:44, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17853
> Logged by:          Branko Radovanovic
> Email address:      branko.radovanovic.zg@gmail.com
> PostgreSQL version: 13.4
> Operating system:   Debian
> Description:
>
> The following SQL:
>
> values ('a')
> order by 1 collate "C";
>
> ...returns an error: SQL Error [42804]: ERROR: collations are not supported
> by type integer
>
> In the above query, "1" is not an integer but a column reference, so it
> should be treated as well-formed and work the same as with the actual column
> label:

There is an argument that this should work.  There is also an argument
that using numerical column references is not (or rather, is no longer)
Standard SQL and should not be used.


So we also have a documentation bug for failing to accurately indicate that our treatment here is non-standard.  Given the existing notes that refer to both SQL-92 and SQL:1999 explicitly and make no mention of this I'd have to assume such a material difference falls into the scope of things we document.


I'll correct myself - the original bug report is indeed not actually a bug.  We do not document nor promise the expected behavior.  Anything that is more than a simple integer number or a column name is by its nature an "...arbitrary expression formed from input-column values."  In particular, the syntax doesn't allow for any place to attach COLLATE to the number like it does for ASC etc... Though here the number of input columns referenced is zero, you just have a constant, thus pointless, order by expression.  Which is documented as being allowed.

I wouldn't completely discount the possibility of adding a note about this in the documentation, but it hasn't ever come up, the section is long enough as-is, and the error is consistently reported for the case.

David J.

Re: BUG #17853: COLLATE does not work with numeric column references in ORDER BY

From
Branko Radovanovic
Date:
Indeed, my initial instinct was to report this as a documentation bug. COLLATE is a good place to de-support numeric references because ORDER BY 1 COLLATE "C" is then unambiguously an error, rather than a silent fail.

However, this (fairly dumb) SQL will still work:

values ('a', 'b'), ('c', 'd')
order by 1 desc, column2 collate "C"

...so in the grand scheme of things it doesn't really help. Sometimes it's actually easier to implement something than explain (in the doc) how or why it doesn't work (as expected or at all). On the other hand, the benefit of doing either may - granted - be close to zero in this particular case.

B.

On Sun, Mar 19, 2023 at 2:28 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Mar 18, 2023 at 6:11 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Mar 18, 2023 at 5:56 PM Vik Fearing <vik@postgresfriends.org> wrote:
On 3/18/23 23:44, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17853
> Logged by:          Branko Radovanovic
> Email address:      branko.radovanovic.zg@gmail.com
> PostgreSQL version: 13.4
> Operating system:   Debian
> Description:
>
> The following SQL:
>
> values ('a')
> order by 1 collate "C";
>
> ...returns an error: SQL Error [42804]: ERROR: collations are not supported
> by type integer
>
> In the above query, "1" is not an integer but a column reference, so it
> should be treated as well-formed and work the same as with the actual column
> label:

There is an argument that this should work.  There is also an argument
that using numerical column references is not (or rather, is no longer)
Standard SQL and should not be used.


So we also have a documentation bug for failing to accurately indicate that our treatment here is non-standard.  Given the existing notes that refer to both SQL-92 and SQL:1999 explicitly and make no mention of this I'd have to assume such a material difference falls into the scope of things we document.


I'll correct myself - the original bug report is indeed not actually a bug.  We do not document nor promise the expected behavior.  Anything that is more than a simple integer number or a column name is by its nature an "...arbitrary expression formed from input-column values."  In particular, the syntax doesn't allow for any place to attach COLLATE to the number like it does for ASC etc... Though here the number of input columns referenced is zero, you just have a constant, thus pointless, order by expression.  Which is documented as being allowed.

I wouldn't completely discount the possibility of adding a note about this in the documentation, but it hasn't ever come up, the section is long enough as-is, and the error is consistently reported for the case.

David J.

Branko Radovanovic <branko.radovanovic.zg@gmail.com> writes:
> COLLATE is a good place to de-support numeric references because ORDER BY 1
> COLLATE "C" is then unambiguously an error, rather than a silent fail.

Yup.

> ...so in the grand scheme of things it doesn't really help. Sometimes it's
> actually easier to implement something than explain (in the doc) how or why
> it doesn't work (as expected or at all).

It's not that any specific choice of semantics would be hard to
implement.  It's that the syntax is fundamentally ambiguous, so if
you get too aggressive about saying "we'll resolve ambiguous cases
like this" then you risk queries silently doing something other than
what the user expected, giving rise to a different set of bug reports.

The back story here is that SQL92 said "the argument of ORDER BY is
an output column name or number", but SQL99 reversed course and said
"the argument of ORDER BY is an expression over the input columns".
So "ORDER BY 1" means two completely different things depending on
which spec version you read.  We didn't (and still don't) want to give
up compatibility with SQL92's way, for backwards-compatibility reasons
and because it's such a handy shortcut in many cases.  But we use the
SQL92 interpretation only when the clause satisfies SQL92 exactly.
"COLLATE" wasn't in SQL92; so if we did what you claim we should do
we would be applying SQL92 semantics to a query that isn't even legal
SQL92 syntax, while failing to comply with SQL99 for a query that *is*
legal SQL99.

I like throwing an error better than either of those choices, so
I'm perfectly satisfied with the code's behavior as-is.  What seems
open for debate is whether the documentation needs to address this
case specifically.  I think it's already sufficiently implied by [1],
but perhaps it isn't.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/queries-order.html