Re: Proposal - Collation at database level - Mailing list pgsql-hackers

From Radek Strnad
Subject Re: Proposal - Collation at database level
Date
Msg-id 483FFBDE.9040504@gmail.com
Whole thread Raw
In response to Re: Proposal - Collation at database level  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses Re: Proposal - Collation at database level  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List pgsql-hackers
Zdenek Kotala wrote:
> Radek Strnad napsal(a):
>
> <snip>
>
>>
>> I'm thinking of dividing the problem into two parts - in beginning
>> pg_collation will contain two functions. One will have hard-coded rules
>> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
>> UCS_BASIC). It will compare each string character bitwise and guarantee
>> that the implementation will meet the SQL standard implemented in
>> PostgreSQL.
>> Second one will allow the user to use installed system locales. The set
>> of these collations will obviously vary between systems. Catalogs will
>> contain encoding and collation for calling the system locale function.
>> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
>> etc. if they will be availible.
>>
>> We will also need to change the way how strings are compared. Regarding
>> the set database collation the right function will be used.
>> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675 
>>
>>
>> This design will make possible switch to ICU or any other implementation
>> quite simple and will not cause any major rewriting of what I'm coding
>> right now.
>
>
> Collation function is main point here. How you mentioned one will be 
> only wrapper about strcmp and second one about strcoll. (maybe you 
> need four - char/wchar) Which function will be used it is defined in 
> pg_collation catalog by CREATE COLLATION command. But you need specify 
> name of locale for system locales. It means you need attribute for 
> storing locale name.
>
You're right. I've extended pg_collation for system locale columns. In 
the first stage we actually don't need any other catalogs such as 
encoding, etc. and we can build this functionality only on following 
pg_collation catalog. Used collation function (system or built-in) will 
be decided  on existing collation name.

CATALOG(pg_collations, ###)
{   NameData    colname;        /* collation name */   Oid        colschema;        /* collation schema */   NameData
colcharset;   /*  character set specification */   Oid         colexistingcollation; /* existing collation */   bool
   colpadattribute;    /* pad attribute */   bool        colcasesensitive;    /* case sensitive */   bool
colaccent;       /* accent sensitive */   NameData    colsyslccollate;    /* lc_collate */   NameData    colsyslcctype;
/*lc_ctype */   regproc        colfunc;        /* used collation function */
 
} FormData_pg_collations;


>> FormData_pg_collations;
> It would be good to send list of new and modified SQL commands (like 
> CREATE COLLATION) for wide discussion.
>
CREATE COLLATION <collation name> FOR <character set specification> FROM 
<existing collation name> [ <pad characteristic> ] [ <case sensitive> ] 
[ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]

<pad characteristic> := NO PAD | PAD SPACE
<case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
<accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE

Since you can specify order by in select clause there's no need for 
adding ascending and descending type  of collation. They will allways be 
ascending.

DROP COLLATION <collation name>

CREATE DATABASE ... [ COLLATE <collation name> ] ...

ALTER DATABASE ... [ COLLATE <collation name> ] ...
   Any thoughts?
         Radek




pgsql-hackers by date:

Previous
From: "Gurjeet Singh"
Date:
Subject: Re: Core team statement on replication in PostgreSQL
Next
From: "Pavan Deolasee"
Date:
Subject: Re: Avoiding second heap scan in VACUUM