Re: once again, sorting with Unicode - Mailing list pgsql-sql

From Troy
Subject Re: once again, sorting with Unicode
Date
Msg-id 200302201045.h1KAjbGu018120@tksoft.com
In response to once again, sorting with Unicode  (JBJ <postgre@totw.org>)
List pgsql-sql
There are various examples in the example source code section
of the postgres distribution, where you can find code you can
use to write exactly the kind of function you need. I can't
immediately include source code from us, but I can include
the gist of how the code works.

The basic idea is to convert the input data to byte values
which are in the right order. Whether the input is UTF-8,
UTF-16, or some other Unicode encoding, you have to know which
one it is, so you can convert the data to a meaningful byte
stream which can be compared just like an array of numbers.
I.e. strip the bytes which only carry encoding information and
convert each character to its one-byte value. E.g. in UTF-8 an
ASCII character is one byte and any other ISO8859_1 character
is two bytes (other characters can take up to six bytes). You
need to convert each character to a one-byte value so that
comparisons at byte level will work. For pure Unicode (two
bytes per character) holding only ISO8859_1 data, you just
have to skip every other byte.
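
A minimal sketch of the conversion described above, in plain C and
independent of the postgres headers. It is not the code we use; the
function names utf8_to_latin1_key and ucs2be_to_latin1_key are made up
for illustration. It assumes the data only contains characters in the
ISO8859_1 range (anything else is reported as an error), and the
two-byte variant assumes the high byte comes first.

```c
#include <stddef.h>

/*
 * Convert UTF-8 input to one byte per character (ISO8859_1 values).
 * Returns the number of bytes written to out, or -1 if the input is
 * malformed or contains a character above U+00FF.
 * out must have room for at least len bytes; the output is never
 * longer than the input.
 */
long utf8_to_latin1_key(const unsigned char *in, size_t len,
                        unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII: one byte, copy unchanged */
            out[n++] = c;
            i += 1;
        } else if (c == 0xC2 || c == 0xC3) {
            /* U+0080..U+00FF: lead byte plus one continuation byte */
            if (i + 1 >= len || (in[i + 1] & 0xC0) != 0x80)
                return -1;
            out[n++] = (unsigned char)(((c & 0x03) << 6) |
                                       (in[i + 1] & 0x3F));
            i += 2;
        } else {
            /* longer sequence or stray byte: outside ISO8859_1 */
            return -1;
        }
    }
    return (long)n;
}

/*
 * Same idea for two-byte Unicode, assuming big-endian byte order:
 * keep the low byte of each pair and skip the (zero) high byte.
 * Returns the number of bytes written to out, or -1 on odd length
 * or on a character above U+00FF.
 */
long ucs2be_to_latin1_key(const unsigned char *in, size_t len,
                          unsigned char *out)
{
    size_t i, n = 0;

    if (len % 2 != 0)
        return -1;
    for (i = 0; i < len; i += 2) {
        if (in[i] != 0x00)
            return -1;
        out[n++] = in[i + 1];
    }
    return (long)n;
}
```

For example, the UTF-8 bytes 0x61 0xC3 0xA9 ("a" followed by e-acute)
come out as the two bytes 0x61 0xE9, which can then be compared at
byte level.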



1. Source code :

... various includes.


PG_FUNCTION_INFO_V1(sample_encoding_func);


Datum sample_encoding_func(PG_FUNCTION_ARGS) {
  text * str;
  text * result;
  size_t len;

  if (PG_ARGISNULL(0))
      PG_RETURN_NULL();

  str = PG_GETARG_TEXT_P(0);
  len = VARSIZE(str) - VARHDRSZ;

  /* ...  do your conversion thing, allocate memory for the
     result and return the value, doing error checking as you
     go. */
}



Add the function to your db:

DROP FUNCTION sample_encoding_func (text);
CREATE FUNCTION sample_encoding_func (text) RETURNS text
  AS 'sample_encoding_func.so'
  LANGUAGE 'C' WITH (iscachable,isstrict);

You can create an index with:

create index dummyindex on usertable using btree (sample_encoding_func(username) text_ops);


Troy


> 
> At 20:16 19.2.2003, Troy K wrote:
> >You can generate indexes for your custom functions, though,
> >which will speed things up. This is what I've done, successfully.
> 
> Sounds useful, do you have a demo of such a function?
> 
> I can if all else fails sort the data using PHP but am not too fond of it 
> when I have over 2000 rows or more as will be the case in other tables.
> 
> Thanks all for the answers.
> 
> 
> 


