Thread: cast to domain with default collation issue.

cast to domain with default collation issue.

From
jian he
Date:

CREATE DOMAIN testdomain AS text;

 --asume the default collation is as per show LC_COLLATE;

– on my pc, it is C.UTF-8.

--So the testdomain will be collation "C.UTF-8"

------------

=> \d collate_test1

            Table "test.collate_test1"

 Column |  Type   | Collation | Nullable | Default

--------+---------+-----------+----------+---------

 a      | integer |           |          |

 b      | text    | en-x-icu  | not null |

=> \d collate_test2

            Table "test.collate_test2"

 Column |  Type   | Collation | Nullable | Default

--------+---------+-----------+----------+---------

 a      | integer |           |          |

 b      | text    | sv-x-icu  |          |

=> \d collate_test3

            Table "test.collate_test3"

 Column |  Type   | Collation | Nullable | Default

--------+---------+-----------+----------+---------

 a      | integer |           |          | b      | text    | C         |          |

-----------------------------------------------

My guess is that the following should be the same. Since the same content in the end will be cast to the same collation. However the following output contradicts with my understanding. 


SELECT a, b::testdomain FROM collate_test1 ORDER BY 2;

/*

 +---+-----+

| a |  b  |

+---+-----+

| 1 | abc |

| 4 | ABC |

| 2 | äbc |

| 3 | bbc |

+---+-----+

 */

SELECT a, b::testdomain FROM collate_test2 ORDER BY 2;

/*

 +---+-----+

| a |  b  |

+---+-----+

| 1 | abc |

| 4 | ABC |

| 3 | bbc |

| 2 | äbc |

+---+-----+

 */

SELECT a, b::testdomain FROM collate_test3 ORDER BY 2;

/*

 +---+-----+

| a |  b  |

+---+-----+

| 4 | ABC |

| 1 | abc |

| 3 | bbc |

| 2 | äbc |

+---+-----+

 */



--
 I recommend David Deutsch's <<The Beginning of Infinity>>

  Jian


Re: cast to domain with default collation issue.

From
"David G. Johnston"
Date:
On Monday, May 23, 2022, jian he <jian.universality@gmail.com> wrote:

CREATE DOMAIN testdomain AS text;

 --asume the default collation is as per show LC_COLLATE;

– on my pc, it is C.UTF-8.

--So the testdomain will be collation "C.UTF-8"

------------

=> \d collate_test1

            Table "test.collate_test1"

 Column |  Type   | Collation | Nullable | Default

--------+---------+-----------+----------+---------

 a      | integer |           |          |

 b      | text    | en-x-icu  | not null |

-----------------------------------------------

My guess is that the following should be the same.



My reading of the docs say this is consistent with outcome #2.


 David J.

Re: cast to domain with default collation issue.

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Monday, May 23, 2022, jian he <jian.universality@gmail.com> wrote:
>> CREATE DOMAIN testdomain AS text;
>> --asume the default collation is as per show LC_COLLATE;
>> – on my pc, it is C.UTF-8.
>> --So the testdomain will be collation "C.UTF-8"

> My reading of the docs say this is consistent with outcome #2.
> https://www.postgresql.org/docs/current/collation.html

Yeah.  The comments in parse_collate.c are clear that this behavior is
intentional:

        case T_CoerceToDomain:
            {
                /*
                 * If the domain declaration included a non-default COLLATE
                 * spec, then use that collation as the output collation of
                 * the coercion.  Otherwise allow the input collation to
                 * bubble up.  (The input should be of the domain's base type,
                 * therefore we don't need to worry about it not being
                 * collatable when the domain is.)
                 */

Perhaps this should be documented more clearly, but it's not obviously
wrong.  If the domain declaration doesn't include an explicit COLLATE
then casting to the domain doesn't create an explicit collation
requirement.  (That is, the domain *doesn't* have a specific
collation attached to it, any more than type text does.)

            regards, tom lane



Re: cast to domain with default collation issue.

From
Tom Lane
Date:
I wrote:
> Perhaps this should be documented more clearly, but it's not obviously
> wrong.  If the domain declaration doesn't include an explicit COLLATE
> then casting to the domain doesn't create an explicit collation
> requirement.  (That is, the domain *doesn't* have a specific
> collation attached to it, any more than type text does.)

Perhaps we could improve matters like this?

diff --git a/doc/src/sgml/ref/create_domain.sgml b/doc/src/sgml/ref/create_domain.sgml
index 81a8924926..e4b856d630 100644
--- a/doc/src/sgml/ref/create_domain.sgml
+++ b/doc/src/sgml/ref/create_domain.sgml
@@ -94,7 +94,8 @@ CREATE DOMAIN <replaceable class="parameter">name</replaceable> [ AS ] <replacea
       <listitem>
        <para>
         An optional collation for the domain.  If no collation is
-        specified, the underlying data type's default collation is used.
+        specified, the domain has the same collation behavior as its
+        underlying data type.
         The underlying type must be collatable if <literal>COLLATE</literal>
         is specified.
        </para>


            regards, tom lane



Re: cast to domain with default collation issue.

From
jian he
Date:

Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.

I think the above quote part can be used to explain the  following examples.
CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES", c text );
SELECT a < 'foo' FROM test1; 
SELECT c < 'foo' FROM test1;
But the non-default seems not that correct for me. Like a column if it does not mention anything, then the default value is null. So
create table test111( a tex
t) The default collation for column a is the same as the output of show lc_collate.
so there is no non-default? 


 

On Tue, May 24, 2022 at 10:43 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, May 23, 2022, jian he <jian.universality@gmail.com> wrote:

CREATE DOMAIN testdomain AS text;

 --asume the default collation is as per show LC_COLLATE;

– on my pc, it is C.UTF-8.

--So the testdomain will be collation "C.UTF-8"

------------

=> \d collate_test1

            Table "test.collate_test1"

 Column |  Type   | Collation | Nullable | Default

--------+---------+-----------+----------+---------

 a      | integer |           |          |

 b      | text    | en-x-icu  | not null |

-----------------------------------------------

My guess is that the following should be the same.



My reading of the docs say this is consistent with outcome #2.


 David J.



--
 I recommend David Deutsch's <<The Beginning of Infinity>>

  Jian


Re: cast to domain with default collation issue.

From
"David G. Johnston"
Date:
Please don’t top-post.

On Tuesday, May 24, 2022, jian he <jian.universality@gmail.com> wrote:

Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.

I think the above quote part can be used to explain the  following examples.
CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES", c text );
SELECT a < 'foo' FROM test1; 
SELECT c < 'foo' FROM test1;
But the non-default seems not that correct for me. Like a column if it does not mention anything, then the default value is null. So
create table test111( a tex
t) The default collation for column a is the same as the output of show lc_collate.
so there is no non-default? 

I’m not following the point you are trying to make.  table111.a contributes the default collation for any expression needing a collation implicitly resolved.

David J.
 

Re: cast to domain with default collation issue.

From
jian he
Date:

postgresql 15 manual parts:

Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.

For example, consider this table definition:

CREATE TABLE test1 (    a text COLLATE "de_DE",    b text COLLATE "es_ES",    ...
);

Then in

SELECT a < 'foo' FROM test1;

the < comparison is performed according to de_DE rules, because the expression combines an implicitly derived collation with the default collation. 

 query:  SELECT a < 'foo' FROM test1;
is an example of {{If any non-default collation is present,  that is the result of the collation combination. }}

So it should be something like {{ if any side of expression don't have implicit derived collation is present, that is the result of the collation combination}

I personally feel wording non-default may not be that correct. Because if the column is text then it automatically at least has default collation.

see manual quote about default collation:
The collation of an expression can be the default collation, which means the locale settings defined for the database. It is also possible for an expression's collation to be indeterminate. In such cases, ordering operations and other operations that need to know the collation will fail.






On Wed, May 25, 2022 at 12:08 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
Please don’t top-post.

On Tuesday, May 24, 2022, jian he <jian.universality@gmail.com> wrote:

Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.

I think the above quote part can be used to explain the  following examples.
CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES", c text );
SELECT a < 'foo' FROM test1; 
SELECT c < 'foo' FROM test1;
But the non-default seems not that correct for me. Like a column if it does not mention anything, then the default value is null. So
create table test111( a tex
t) The default collation for column a is the same as the output of show lc_collate.
so there is no non-default? 

I’m not following the point you are trying to make.  table111.a contributes the default collation for any expression needing a collation implicitly resolved.

David J.
 


--
 I recommend David Deutsch's <<The Beginning of Infinity>>

  Jian


Re: cast to domain with default collation issue.

From
"David G. Johnston"
Date:


On Wednesday, May 25, 2022, jian he <jian.universality@gmail.com> wrote:

I personally feel wording non-default may not be that correct. Because if the column is text then it automatically at least has default collation. 

Non-default means “a value that is not the default value”.

David J.
 

Re: cast to domain with default collation issue.

From
"David G. Johnston"
Date:
On Tue, May 24, 2022 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Perhaps this should be documented more clearly, but it's not obviously
> wrong.  If the domain declaration doesn't include an explicit COLLATE
> then casting to the domain doesn't create an explicit collation
> requirement.  (That is, the domain *doesn't* have a specific
> collation attached to it, any more than type text does.)

Perhaps we could improve matters like this?

diff --git a/doc/src/sgml/ref/create_domain.sgml b/doc/src/sgml/ref/create_domain.sgml
index 81a8924926..e4b856d630 100644
--- a/doc/src/sgml/ref/create_domain.sgml
+++ b/doc/src/sgml/ref/create_domain.sgml
@@ -94,7 +94,8 @@ CREATE DOMAIN <replaceable class="parameter">name</replaceable> [ AS ] <replacea
       <listitem>
        <para>
         An optional collation for the domain.  If no collation is
-        specified, the underlying data type's default collation is used.
+        specified, the domain has the same collation behavior as its
+        underlying data type.
         The underlying type must be collatable if <literal>COLLATE</literal>
         is specified.
        </para>


+1

The lack of any explicitness pushes evaluation down to the base type - which is a behavioral thing as opposed to some kind of attribute it possesses.

David J.

Re: cast to domain with default collation issue.

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, May 24, 2022 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Perhaps we could improve matters like this?

>> -        specified, the underlying data type's default collation is used.
>> +        specified, the domain has the same collation behavior as its
>> +        underlying data type.

> +1

Pushed then, thanks for thinking about it.

            regards, tom lane