* Oleg Bartunov <oleg@sai.msu.su> [20070420 11:32]:
> >If I understand it correctly such a dictionary would require to write
> >a custom C component - is that correct? Or could I get away with
> >writing a plpgsql function that does the above and hooking that
> >somehow into the tsearch2 config?
>
> You need to write C-function, see example in
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-intdict-xmp.html
Thanks.
My colleague who speaks more C than me came up with the code below
which works fine for us. Will the memory allocated for lexeme be freed
by the caller?
Til
/*
 * Dictionary for partials of a word, i.e. foo => {f,fo,foo}
 *
 * Based on the tsearch2/gendict/config.sh generator
 *
 * Author: Sean Treadway
 *
 * This code is released under the terms of the PostgreSQL License.
 */
#include "postgres.h"
#include "dict.h"
#include "common.h"
#include "subinclude.h"
#include "ts_locale.h"
#define is_utf8_continuation(c) ((unsigned char)(c) >= 0x80 && (unsigned char)(c) <= 0xBF)
PG_FUNCTION_INFO_V1(dlexize_partial);
Datum dlexize_partial(PG_FUNCTION_ARGS);
Datum
dlexize_partial(PG_FUNCTION_ARGS)
{
    char       *in = (char *) PG_GETARG_POINTER(1);
    char       *utxt = pnstrdup(in, PG_GETARG_INT32(2));   /* palloc */
    char       *txt = lowerstr(utxt);                      /* palloc */
    int         txt_len = strlen(txt);
    int         results = 0;
    int         i;

    /* one slot per byte plus the terminator; may overallocate, that's ok.
     * palloc0 also zeroes the remaining TSLexeme fields (nvariant, flags),
     * which plain palloc would leave uninitialized. */
    TSLexeme   *res = palloc0(sizeof(TSLexeme) * (txt_len + 1));

    for (i = 1; i <= txt_len; i++)
    {
        /* emit a prefix only where byte i starts a new character or is the
         * terminating NUL, so a multibyte UTF8 character is never split */
        if (!is_utf8_continuation(txt[i]))
            res[results++].lexeme = pnstrdup(txt, i);
    }
    res[results].lexeme = NULL;

    pfree(utxt);
    pfree(txt);

    /* Receiver must free res memory and res[].lexeme */
    PG_RETURN_POINTER(res);
}
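
For anyone who wants to check the prefix logic outside the backend, here is a minimal standalone sketch of the same loop. It is only an illustration: utf8_prefixes and free_prefixes are hypothetical helper names, not part of tsearch2, and it uses malloc/free where the dictionary uses palloc/pfree. It also skips the lowercasing step, so it assumes the input is already lowercased, valid UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* True for UTF-8 continuation bytes (bit pattern 10xxxxxx, 0x80..0xBF). */
int is_utf8_continuation(char c)
{
    return ((unsigned char) c & 0xC0) == 0x80;
}

/* Hypothetical helper: frees an array returned by utf8_prefixes(). */
void free_prefixes(char **p)
{
    size_t i;

    if (p == NULL)
        return;
    for (i = 0; p[i] != NULL; i++)
        free(p[i]);
    free(p);
}

/*
 * Hypothetical helper: returns a NULL-terminated, malloc'ed array holding
 * one malloc'ed prefix of word per UTF-8 character, shortest first.
 * Returns NULL on allocation failure.
 */
char **utf8_prefixes(const char *word)
{
    size_t len, i, n = 0;
    char **out;

    assert(word != NULL);
    len = strlen(word);

    /* one slot per byte plus the terminator; may overallocate */
    out = calloc(len + 1, sizeof(char *));
    if (out == NULL)
        return NULL;

    for (i = 1; i <= len; i++)
    {
        /* A prefix of i bytes is complete when byte i starts a new
         * character or is the terminating NUL. */
        if (!is_utf8_continuation(word[i]))
        {
            out[n] = malloc(i + 1);
            if (out[n] == NULL)
            {
                free_prefixes(out);
                return NULL;
            }
            memcpy(out[n], word, i);
            out[n][i] = '\0';
            n++;
        }
    }
    return out;     /* out[n] is already NULL thanks to calloc */
}
```

Calling utf8_prefixes("foo") yields the three strings f, fo and foo followed by NULL. For a word that starts with a two-byte character, the first prefix is that whole character rather than its first byte alone.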