Re: How to pass around collation information - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: How to pass around collation information
Date	May 28, 2010 14:22:30
Msg-id	4BFFFBBD.9000609@enterprisedb.com
In response to	How to pass around collation information (Peter Eisentraut <peter_e@gmx.net>)
Responses	Re: How to pass around collation information
List	pgsql-hackers

On 28/05/10 19:27, Peter Eisentraut wrote:
> I have been thinking about this collation support business a bit.
> Ignoring for the moment where we would get the actual collation routines
> from, I wonder how we are going to pass this information around in the
> system.  Someone declares a collation on a column in a table, and
> somehow this information needs to arrive in bttextcmp() and friends.

Yes. Comparison operators need it, as do functions like isalpha().

> Also, functions that take in a string and return one (e.g., substring),
> need to take in this information and return it back out.  How should
> this work?

Hmm, I don't see what substring would need collation for. And it 
certainly shouldn't be returning it. Collation is a property of the 
comparison operators (and isalpha etc.), and the planner needs to deduce 
the right collation for each such operation in the query. That involves 
looking at the tables and columns involved, as well as per-user 
information and any explicit COLLATE clauses in the query, but all that 
happens at plan-time.
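
To make the plan-time deduction concrete, here is a minimal sketch in Python (used only for brevity; nothing here is PostgreSQL code). The function name, the exception, and above all the precedence order (explicit COLLATE clause, then column collations, then a per-user default) are assumptions for illustration; the actual rules were still open at this point in the discussion.

```python
from typing import Optional, Sequence


class CollationConflict(Exception):
    """Raised when the inputs imply different collations and nothing breaks the tie."""


def resolve_collation(explicit: Optional[str],
                      column_collations: Sequence[Optional[str]],
                      user_default: str) -> str:
    """Pick the collation for one comparison while planning the query.

    Assumed precedence (hypothetical): explicit COLLATE clause first,
    then the collation declared on the columns involved, then the
    per-user default.
    """
    # An explicit COLLATE clause in the query overrides everything else.
    if explicit is not None:
        return explicit

    # Collect the collations declared on the columns involved, if any.
    declared = {c for c in column_collations if c is not None}
    if len(declared) == 1:
        return next(iter(declared))
    if len(declared) > 1:
        # Assumption: treat disagreeing columns as an error instead of guessing.
        raise CollationConflict("conflicting collations: %s" % sorted(declared))

    # No column says anything, so fall back to the per-user setting.
    return user_default
```

The result would be attached to the comparison in the plan, so nothing has to travel with the data values at run time.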

> Option 1, make it part of the datum.  That way it will pass through the
> system just fine, but it would waste a lot of storage and break just
> about everything that operates on string types now, as well as
> pg_upgrade.  So that's probably out.

It's also fundamentally wrong: collation is a property of the 
operation, not of the datum.

> Option 2, invent some new mechanism that accompanies a datum or a type
> wherever it goes.  Kind of like typmod, but not really.  Then the
> collation information would presumably be made available to functions
> through the fmgr interface.  The binary representation of data values
> stays the same.

Something like that. I'm thinking that bttextcmp() and friends will 
simply take an extra argument indicating the collation, and we'll teach 
the operator / operator class infrastructure about that too.
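
As a rough illustration of a comparison function that takes the collation as an extra argument, here is a sketch in Python rather than C. The names text_cmp and COLLATIONS and the two toy collations ("C" for byte order, "ci" for case-insensitive) are hypothetical; this is not the bttextcmp() signature.

```python
from typing import Callable, Dict

# Hypothetical collation table: collation name -> function producing a sort key.
COLLATIONS: Dict[str, Callable[[str], object]] = {
    "C": lambda s: s.encode("utf-8"),   # plain byte order
    "ci": lambda s: s.casefold(),       # case-insensitive, for illustration only
}


def text_cmp(a: str, b: str, collation: str) -> int:
    """Three-way comparison: negative, zero, or positive, like a btree support function.

    The strings themselves carry no collation; the caller supplies it.
    """
    key = COLLATIONS[collation]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)
```

The same two values compare differently depending on the argument: text_cmp("B", "a", "C") is negative, while text_cmp("B", "a", "ci") is positive. The stored representation of the strings is identical in both calls.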

One way to approach this is to realize that it's already possible to use 
multiple collations in a database. You just have to define separate < = > 
operators and operator classes for every collation, and change all your 
queries to use the right operator for the desired collation everywhere 
you use < = > (including ORDER BYs, with the USING <operator> syntax). The 
behavior is exactly what we want; it's just completely impractical, so we 
need something that does the same in a less cumbersome way.
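
A small Python analogy of that workaround, with one comparison function per collation and every sort naming the one it wants (much like ORDER BY ... USING). The function names and the two toy collations are made up for illustration.

```python
from functools import cmp_to_key


def cmp_bytes(a: str, b: str) -> int:
    """Comparison 'operator' for a byte-order collation."""
    ka, kb = a.encode("utf-8"), b.encode("utf-8")
    return (ka > kb) - (ka < kb)


def cmp_casefold(a: str, b: str) -> int:
    """Comparison 'operator' for a case-insensitive collation."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


names = ["banana", "Cherry", "apple"]

# Every call site has to pick the right operator explicitly.
by_bytes = sorted(names, key=cmp_to_key(cmp_bytes))
by_casefold = sorted(names, key=cmp_to_key(cmp_casefold))
```

Here by_bytes is ["Cherry", "apple", "banana"] and by_casefold is ["apple", "banana", "Cherry"]. The results are right, but each new collation means another set of functions and another round of edits to every query.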

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
