Re: Latest on CITEXT 2.0 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: Latest on CITEXT 2.0
Date
Msg-id 25892F1D-AB99-4CBB-AA04-A5A4267ED196@kineticode.com
Whole thread Raw
In response to Re: Latest on CITEXT 2.0  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Latest on CITEXT 2.0  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Latest on CITEXT 2.0  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Thanks a million for your answers, Martijn. I just have some more
stupid questions, if you could bear with me.

On Jun 26, 2008, at 03:28, Martijn van Oosterhout wrote:

> When creating an index, your comparison functions are going ot be
> called O(N log N) times. If they leak into a context that isn't
> regularly freed you may have a problem. I'd suggest loking at how the
> text comparisons do it. PG_FREE_IF_COPY() is probably a good idea
> because the incoming tuples may be detoasted.

Okay, I see that text_cmp in varlena doesn't use PG_FREE_IF_COPY(),
and neither do text_smaller nor text_larger (which just dispatch to
text_cmp anyway).

The operator functions *do* use PG_FREE_IF_COPY(). So I'm guessing
it's these functions you're talking about. However, my implementation
just looks like this:

Datum citext_ne (PG_FUNCTION_ARGS) {    // Fast path for different-length inputs. Okay for canonical
equivalence?    if (VARSIZE(PG_GETARG_TEXT_P(0)) != VARSIZE(PG_GETARG_TEXT_P(1)))        PG_RETURN_BOOL( 1 );
PG_RETURN_BOOL(citextcmp( PG_ARGS ) != 0 ); 
}

I don't *thinkI any variables are copied there. citextcmp() is just
this:

int citextcmp (PG_FUNCTION_ARGS) {    // XXX These are all just references to existing structures, right?    text *
left = PG_GETARG_TEXT_P(0);    text * right = PG_GETARG_TEXT_P(1);    return varstr_cmp(        cilower( left ),
VARSIZE_ANY_EXHDR(left),       cilower( right ),        VARSIZE_ANY_EXHDR(right)    ); 
}

Again, no copying. cilower() does copy:
    int    index, len;    char * result;
    index  = 0;    len    = VARSIZE(arg) - VARHDRSZ;    result = (char *) palloc( strlen( str ) + 1 );
    for (index = 0; index <= len; index++) {        result[index] = tolower((unsigned char) str[index] );    }    //
XXXI don't need to pfree result if I'm returning it, right?    return result; 

But the copied value is returned. Hrm…it should probably be pfreed
somewhere, yes?

So I'm wondering if I should change citextcmp to pfree values?
Something like this:
    text * left  = PG_GETARG_TEXT_P(0);    text * right = PG_GETARG_TEXT_P(1);    char * lcstr = cilower( left  );
char* rcstr = cilower( right ); 
    int result = varstr_cmp(        cilower( left ),        VARSIZE_ANY_EXHDR(left),        cilower( right ),
VARSIZE_ANY_EXHDR(right)   ); 
    pfree( lcstr );    pfree( rcstr );    return result;

This is the only function that calls cilower(). And I *think* it's the
only place where values are copied or memory is allocated needing to
be freed. Does that sound right to you?

On a side note, I've implemented this pretty differently from how the
text functions are implemented in varlena.c, just to try to keep
things succinct. But I'm wondering now if I shouldn't switch back to
the style used by varlena.c, if only to keep the style the same, and
thus perhaps to increase the chances that citext would be a welcome
contrib addition. Thoughts?

Many thanks again. You're a great help to this C n00b.

Best,

David

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: get_relation_stats_hook()
Next
From: Chris Browne
Date:
Subject: Re: Planner creating ineffective plans on LEFT OUTER joins