Re: tsearch2 dictionary that indexes substrings? - Mailing list pgsql-general

From Tilmann Singer
Subject Re: tsearch2 dictionary that indexes substrings?
Msg-id 20070423171659.GB27485@tils.net
In response to Re: tsearch2 dictionary that indexes substrings?  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: tsearch2 dictionary that indexes substrings?
List pgsql-general
* Oleg Bartunov <oleg@sai.msu.su> [20070420 11:32]:
> >If I understand it correctly such a dictionary would require to write
> >a custom C component - is that correct? Or could I get away with
> >writing a plpgsql function that does the above and hooking that
> >somehow into the tsearch2 config?
>
> You need to write C-function, see example in
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-intdict-xmp.html

Thanks.

My colleague who speaks more C than me came up with the code below
which works fine for us. Will the memory allocated for lexeme be freed
by the caller?



Til





/*
 * Dictionary for prefixes of a word, i.e. foo => {f,fo,foo}
 *
 * Based on the tsearch2/gendict/config.sh generator
 *
 * Author: Sean Treadway
 *
 * This code is released under the terms of the PostgreSQL License.
 */
#include "postgres.h"

#include "dict.h"
#include "common.h"

#include "subinclude.h"
#include "ts_locale.h"

#define is_utf8_continuation(c) ((unsigned char)(c) >= 0x80 && (unsigned char)(c) <= 0xBF)

PG_FUNCTION_INFO_V1(dlexize_partial);
Datum dlexize_partial(PG_FUNCTION_ARGS);
Datum
dlexize_partial(PG_FUNCTION_ARGS) {
  char*  in = (char*)PG_GETARG_POINTER(1);

  char*  utxt = pnstrdup(in, PG_GETARG_INT32(2)); /* palloc */
  char*  txt = lowerstr(utxt);                    /* palloc */
  int    txt_len = strlen(txt);

  int    results = 0;
  int    i = 0;

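  /* one slot per byte plus the terminator; may overallocate, that's ok.
   * palloc0 zeroes the nvariant and flags fields that are never set below. */
  TSLexeme   *res = palloc0(sizeof(TSLexeme)*(txt_len+1));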

  for (i = 1; i <= txt_len; i++) {
    /* cut only at a character boundary: txt[i] must not be a UTF-8
     * continuation byte (the NUL at i == txt_len yields the full word) */
    if (!is_utf8_continuation(txt[i])) {
      res[results++].lexeme = pnstrdup(txt, i);
    }
  }

  res[results].lexeme=NULL;

  pfree(utxt);
  pfree(txt);

  /* Receiver must free res memory and res[].lexeme */
  PG_RETURN_POINTER(res);
}
