Per-column collation, proof of concept - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Per-column collation, proof of concept
Msg-id 1279045531.32647.14.camel@vanquo.pezone.net
List pgsql-hackers

Here is a proof of concept for per-column collation support.

Here is how it works: When creating a table, an optional COLLATE clause
can specify a collation name, which is stored (by OID) in pg_attribute.
This becomes part of the type information and is propagated through the
expression parse analysis, like typmod.  When an operator or function
call is parsed (transformed), the collations of its arguments are
unified according to a set of rules (similar to type analysis, but
different in detail).  The collations of the function/operator
arguments come either from Var nodes, which in turn got them from
pg_attribute, or from other function and operator calls, or you can
override them with explicit COLLATE clauses (not yet implemented, but
they will work a bit like RelabelType).  At the end, each function or
operator call gets one collation to use.
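
To make the unification step a bit more concrete, here is a minimal
sketch in C.  The message does not spell out the actual rules, so this
assumes the simplest conceivable one: arguments without a collation are
ignored, and all remaining arguments must agree.  The function name
unify_collations() and the stand-in Oid typedef are hypothetical and
are not taken from the patch.

```c
/* Stand-ins for the PostgreSQL definitions, so the sketch compiles alone. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * Hypothetical helper: derive the single collation for one function or
 * operator call from the collations of its arguments.
 *
 * Assumed rule (NOT the patch's actual rule set): arguments that carry
 * no collation (InvalidOid) are skipped; all others must be identical.
 *
 * Returns 1 and stores the chosen collation in *result on success,
 * or returns 0 if two arguments carry different collations.
 */
static int
unify_collations(const Oid *argcolls, int nargs, Oid *result)
{
    Oid chosen = InvalidOid;

    for (int i = 0; i < nargs; i++)
    {
        if (argcolls[i] == InvalidOid)
            continue;               /* argument has no collation */
        if (chosen == InvalidOid)
            chosen = argcolls[i];   /* first collation seen wins so far */
        else if (chosen != argcolls[i])
            return 0;               /* conflicting collations */
    }

    *result = chosen;
    return 1;
}
```

An explicit COLLATE clause would presumably take precedence over this
kind of implicit derivation, but that part is not modeled here.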

The function call itself can then look up the collation using the
fcinfo->flinfo->fn_expr field.  (This works for operator calls but not
for sort operations; that part needs more thought.)

In this implementation, a collation is defined as an lc_collate string
and an lc_ctype string.  Functions that are interested in that
information, such as comparison operators or the upper and lower
functions, take the collation OID that is passed in, look up the
locale string, and use the xlocale.h interface (newlocale(),
strcoll_l()) to compute the result.

(Note that the xlocale stuff is only 10 or so lines in this patch.  It
should be feasible to allow other appropriate locale libraries to be
used.)
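
For illustration, the newlocale()/strcoll_l() part can be sketched
roughly as follows.  This is not the code from the patch: the helper
name compare_with_collation() is hypothetical, the locale string is
passed in directly instead of being looked up from a collation OID, and
a platform with the POSIX.1-2008 locale functions is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>    /* newlocale() and strcoll_l() live here on macOS */
#endif

/*
 * Hypothetical helper: compare two strings under the given lc_collate
 * locale name without touching the process-wide locale.
 *
 * Returns 0 and stores the strcoll_l() result (<0, 0, >0) in *result,
 * or returns -1 if the locale could not be loaded.
 */
static int
compare_with_collation(const char *lc_collate,
                       const char *a, const char *b, int *result)
{
    /* Build a locale object that only sets the collation category. */
    locale_t loc = newlocale(LC_COLLATE_MASK, lc_collate, (locale_t) 0);

    if (loc == (locale_t) 0)
        return -1;

    *result = strcoll_l(a, b, loc);

    /* Freed right away here; a real implementation would want to cache. */
    freelocale(loc);
    return 0;
}
```

With the "C" locale this behaves like a plain byte-wise comparison;
results for other locale names depend on which locales are installed on
the system.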

Loose ends:

- Support function calls (currently only operator calls) (easy)

- Implementation of sort clauses

- Indexing support/integration

- Domain support (should be straightforward)

- Make all expression node types deal with collation information
  appropriately

- Explicit COLLATE clause on expressions

- Caching and not leaking memory of locale lookups

- I have typcollatable to mark which types can accept collation
  information, but perhaps there should also be proicareaboutcollation
  to skip collation resolution when none of the functions in the
  expression tree care.
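
Regarding the caching item above, one possible shape is a small table
that maps collation OIDs to locale_t handles, so that newlocale() is
called only once per collation.  This is only a sketch under
assumptions: the names lookup_collation_locale() and locale_cache, the
fixed-size array, and the stand-in Oid typedef are all hypothetical; a
real implementation would more likely use a hash table and tie the
entries to a suitable memory context.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

typedef unsigned int Oid;       /* stand-in for the PostgreSQL typedef */

#define LOCALE_CACHE_SIZE 16    /* arbitrary size for the sketch */

typedef struct
{
    Oid      collid;            /* collation OID used as the key */
    locale_t loc;               /* handle obtained from newlocale() */
} LocaleCacheEntry;

static LocaleCacheEntry locale_cache[LOCALE_CACHE_SIZE];
static int locale_cache_used = 0;

/*
 * Hypothetical helper: return the cached locale_t for a collation,
 * creating it on first use.  The cache owns every handle it returns,
 * so callers must not call freelocale() on the result.
 *
 * Returns (locale_t) 0 if the locale cannot be loaded or the cache is
 * full, so that no handle is ever created without an owner.
 */
static locale_t
lookup_collation_locale(Oid collid, const char *lc_collate)
{
    locale_t loc;

    for (int i = 0; i < locale_cache_used; i++)
    {
        if (locale_cache[i].collid == collid)
            return locale_cache[i].loc;     /* cache hit */
    }

    if (locale_cache_used >= LOCALE_CACHE_SIZE)
        return (locale_t) 0;                /* no room; see note above */

    loc = newlocale(LC_COLLATE_MASK, lc_collate, (locale_t) 0);
    if (loc == (locale_t) 0)
        return (locale_t) 0;                /* locale not available */

    locale_cache[locale_cache_used].collid = collid;
    locale_cache[locale_cache_used].loc = loc;
    locale_cache_used++;
    return loc;
}
```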

You can start by reading the collate.sql regression test file to see
what it can do.  Btw., the regression tests only work with "make check
MULTIBYTE=UTF8".  And the patch (probably) only works with glibc for
now.


