Re: Proposal - Collation at database level - Mailing list pgsql-hackers

From Zdenek Kotala
Subject Re: Proposal - Collation at database level
Date
Msg-id 48418F95.2000007@sun.com
Whole thread Raw
In response to Re: Proposal - Collation at database level  (Radek Strnad <radek.strnad@gmail.com>)
List pgsql-hackers
Radek Strnad napsal(a):
> Zdenek Kotala wrote:
>> Radek Strnad napsal(a):
>>
>> <snip>
>>
>>>
>>> I'm thinking of dividing the problem into two parts - in beginning
>>> pg_collation will contain two functions. One will have hard-coded rules
>>> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
>>> UCS_BASIC). It will compare each string character bitwise and guarantee
>>> that the implementation will meet the SQL standard implemented in
>>> PostgreSQL.
>>> Second one will allow the user to use installed system locales. The set
>>> of these collations will obviously vary between systems. Catalogs will
>>> contain encoding and collation for calling the system locale function.
>>> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
>>> etc. if they will be availible.
>>>
>>> We will also need to change the way how strings are compared. Regarding
>>> the set database collation the right function will be used.
>>> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675 
>>>
>>>
>>> This design will make possible switch to ICU or any other implementation
>>> quite simple and will not cause any major rewriting of what I'm coding
>>> right now.
>>
>>
>> Collation function is main point here. How you mentioned one will be 
>> only wrapper about strcmp and second one about strcoll. (maybe you 
>> need four - char/wchar) Which function will be used it is defined in 
>> pg_collation catalog by CREATE COLLATION command. But you need specify 
>> name of locale for system locales. It means you need attribute for 
>> storing locale name.
>>
> You're right. I've extended pg_collation for system locale columns. In 
> the first stage we actually don't need any other catalogs such as 
> encoding, etc. and we can build this functionality only on following 
> pg_collation catalog. Used collation function (system or built-in) will 
> be decided  on existing collation name.
> 
> CATALOG(pg_collations, ###)
> {
>    NameData    colname;        /* collation name */
>    Oid        colschema;        /* collation schema */
>    NameData   colcharset;    /*  character set specification */
>    Oid         colexistingcollation; /* existing collation */
>    bool        colpadattribute;    /* pad attribute */
>    bool        colcasesensitive;    /* case sensitive */
>    bool        colaccent;        /* accent sensitive */
>    NameData    colsyslccollate;    /* lc_collate */
>    NameData    colsyslcctype; /* lc_ctype */
>    regproc        colfunc;        /* used collation function */
> } FormData_pg_collations;
> 
> 
>>> FormData_pg_collations;
>> It would be good to send list of new and modified SQL commands (like 
>> CREATE COLLATION) for wide discussion.
>>
> CREATE COLLATION <collation name> FOR <character set specification> FROM 
> <existing collation name> [ <pad characteristic> ] [ <case sensitive> ] 
> [ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]
> 
> <pad characteristic> := NO PAD | PAD SPACE
> <case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
> <accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE
> 
> Since you can specify order by in select clause there's no need for 
> adding ascending and descending type  of collation. They will allways be 
> ascending.
> 
> DROP COLLATION <collation name>
> 
> CREATE DATABASE ... [ COLLATE <collation name> ] ...
> 
> ALTER DATABASE ... [ COLLATE <collation name> ] ...
> 
>

I think catalog is good. Maybe attributes names colsyslccollate and 
colsyslcctype should be more generic because they could be shared with ICU.

But collation function should be specified in CREATE COLLATION command.
Maybe CREATE COLLATION .... [STRCOL <fn name>]

    Zdenek




pgsql-hackers by date:

Previous
From: "Pavel Stehule"
Date:
Subject: Re: Packages in oracle Style
Next
From: "David E. Wheeler"
Date:
Subject: Re: Overhauling GUCS