Thread: Custom type, operators and operator class not sorting/indexing correctly

Custom type, operators and operator class not sorting/indexing correctly

From
Roger Leigh
Date:
Dear all,

I've created a new domain (debversion) derived from TEXT, which
includes its own operators (< <= = >= > and <>), and also its
own operator class for BTREE indices.

The operators function correctly when I test them by themselves,
e.g. SELECT x < y;
However, if I create a table with a column of this type, ORDER BY
does not result in correct ordering.  I have to explicitly add
'USING <' to the query, and even this fails to work if I haven't
defined the operator class:

# SELECT * FROM testv ORDER BY version ASC;
     version
------------------
 1.0.3-3
 3.0.7+1-1
 3.0.7+1-2
 3.0.7+1-2~lenny2
(4 rows)

# SELECT * FROM testv ORDER BY version USING <;
     version
------------------
 1.0.3-3
 3.0.7+1-1
 3.0.7+1-2~lenny2
 3.0.7+1-2
(4 rows)

The latter shows the correct ordering.  The former appears to be
using the lexical ordering of the TEXT type.  Adding an index
does not affect the ordering, even if I explictly make it use my
operator class (it's also set as the default).

The SQL code to create the type and demonstrate the problem follows
at the end of this mail.  It requires the PL/Perl and PL/pgSQL
languages to be available.  It shows example queries to demonstrate
the ordering issue above.

I thought that I had correctly defined the type, functions, operators
and operator class in order for everything to function correctly, but
I must be missing some final piece of the puzzle or some PostgreSQL
subtlety I'm not aware of (this is my first attempt at defining
operators, and I am also a newcomer to using procedural languages).

Could anyone suggest what I've done wrong here?


Many thanks,
Roger Leigh

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment
Roger Leigh <rleigh@codelibre.net> writes:
> I've created a new domain (debversion) derived from TEXT, which
> includes its own operators (< <= = >= > and <>), and also its
> own operator class for BTREE indices.

You can't realistically attach such things to a domain; try making
a separate type, perhaps with an implicit cast to text to allow
use of text operators for other purposes.

Elein was going to look into tweaking the coercion rules so that
functions on domains had a non-negligible chance of being useful,
but we've not heard much about that project lately.

            regards, tom lane

Re: Custom type, operators and operator class not sorting/indexing correctly

From
Roger Leigh
Date:
On Wed, Jan 21, 2009 at 02:03:03AM -0500, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > I've created a new domain (debversion) derived from TEXT, which
> > includes its own operators (< <= = >= > and <>), and also its
> > own operator class for BTREE indices.
>
> You can't realistically attach such things to a domain; try making
> a separate type, perhaps with an implicit cast to text to allow
> use of text operators for other purposes.

Ah, thanks for the clarification.  So I need to use CREATE TYPE
rather than CREATE DOMAIN.  Because I'm essentially just storing
a text string with different operators, can I derive a type from
TEXT (perhaps by reusing the same input, output, receive and send
functions as TEXT?)  I saw the textsend and textreceive functions,
which I assume are the appropriate functions for send and receive?
Are there any for input and output which I may reuse?


Many thanks,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Re: Custom type, operators and operator class not sorting/indexing correctly

From
Martijn van Oosterhout
Date:
On Wed, Jan 21, 2009 at 10:48:09AM +0000, Roger Leigh wrote:
> Ah, thanks for the clarification.  So I need to use CREATE TYPE
> rather than CREATE DOMAIN.  Because I'm essentially just storing
> a text string with different operators, can I derive a type from
> TEXT (perhaps by reusing the same input, output, receive and send
> functions as TEXT?)  I saw the textsend and textreceive functions,
> which I assume are the appropriate functions for send and receive?
> Are there any for input and output which I may reuse?

Yes, you can copy all the existing attributes. Use:

select * from pg_type where typname='text';

To get the relevent info (upcoming 8.4 will have CREATE TYPE xxx
(LIKE=text) ). Lookup the citext project for an example.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: Custom type, operators and operator class not sorting/indexing correctly

From
Roger Leigh
Date:
On Wed, Jan 21, 2009 at 08:52:19PM +0100, Martijn van Oosterhout wrote:
> On Wed, Jan 21, 2009 at 10:48:09AM +0000, Roger Leigh wrote:
> > Ah, thanks for the clarification.  So I need to use CREATE TYPE
> > rather than CREATE DOMAIN.  Because I'm essentially just storing
> > a text string with different operators, can I derive a type from
> > TEXT (perhaps by reusing the same input, output, receive and send
> > functions as TEXT?)  I saw the textsend and textreceive functions,
> > which I assume are the appropriate functions for send and receive?
> > Are there any for input and output which I may reuse?
>
> Yes, you can copy all the existing attributes. Use:
>
> select * from pg_type where typname='text';
>
> To get the relevent info (upcoming 8.4 will have CREATE TYPE xxx
> (LIKE=text) ). Lookup the citext project for an example.

Many thanks for this suggestion; it has been incredibly useful.
Using a pre-existing C implementation, plus looking at citext for
how to write the operators correctly, I now have a working pure
C (C++ implementation).


http://git.debian.org/?p=users/rleigh/sbuild.git;a=tree;f=db;h=0a6d3348744c59b78322ec6a4855876875613239;hb=81fd39259953853632a7d0e2198cfc745d270fe3

I just have a few points I'd like to clarify.  In


http://git.debian.org/?p=users/rleigh/sbuild.git;a=blob;f=db/debversion.cc;h=980786ecf7a3b7fb5769d04b0952af300723c3b9;hb=81fd39259953853632a7d0e2198cfc745d270fe3

debversioncmp (lines 49-73), I'm duplicating text* to char* by hand.
Is the text_to_cstring available internally also accessible by
external modules (I didn't see it in any headers)?

After every PG_GETARG_TEXT_PP, I've called PG_FREE_IF_COPY before
returning.  However, I saw in citext (the behaviour of which I
duplicated in debversion_smaller and debversion_larger (lines 221-246))
that you *don't* use PG_FREE_IF_COPY before returning one of the two
values.  Is this not a potential memory leak?  Don't you need to
duplicate your chosen value, then free both of the temporary values?

I also noticed that I couldn't include some postgres headers in
my code (e.g. <access/builtins.h>) due to included headers
<nodes/parsenodes.h> and <nodes/primnodes.h> using invalid C++ syntax
due to use of "using" and "typename" which are reserved C++ keywords.
The headers are also not wrapped with extern "C" blocks for direct
inclusion in C++ sources, though this isn't as big a deal--you can
just include them as I did within such a block in your own code.
The former problem can't be worked around due to it being invalid
code (this can be rectified by renaming to not use these specific
keyword names).


Many thanks for your help,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Custom type, operators and operator class not sorting/indexing correctly

From
Martijn van Oosterhout
Date:
On Sun, Jan 25, 2009 at 03:52:02PM +0000, Roger Leigh wrote:
> Many thanks for this suggestion; it has been incredibly useful.

No problem.

> I just have a few points I'd like to clarify.  In
>
>
http://git.debian.org/?p=users/rleigh/sbuild.git;a=blob;f=db/debversion.cc;h=980786ecf7a3b7fb5769d04b0952af300723c3b9;hb=81fd39259953853632a7d0e2198cfc745d270fe3
>
> debversioncmp (lines 49-73), I'm duplicating text* to char* by hand.
> Is the text_to_cstring available internally also accessible by
> external modules (I didn't see it in any headers)?

A function like that exists, the "proper" way to do it is (untested):

DatumGetCString( DirectFunctionCall1(textout, TextGetDatum(foo)  )

It's used in various places, in the unreleased 8.4 there will be
official functions like cstring_to_text and text_to_cstring. Many other
modules already declare stuff like this.

> After every PG_GETARG_TEXT_PP, I've called PG_FREE_IF_COPY before
> returning.  However, I saw in citext (the behaviour of which I
> duplicated in debversion_smaller and debversion_larger (lines 221-246))
> that you *don't* use PG_FREE_IF_COPY before returning one of the two
> values.  Is this not a potential memory leak?  Don't you need to
> duplicate your chosen value, then free both of the temporary values?

Memory leaks are not an issue generally, everything you allocate gets
freed at the end of the statement, if not earlier. Normally you don't
manually free at all, however code that might be used by indexes is
somewhat of an exception, since they might be called often in a tight
loop. smaller/larger are not index functions, but the cmp function is.

Though your strings here are probably short enough you won't notice
either way.

> I also noticed that I couldn't include some postgres headers in
> my code (e.g. <access/builtins.h>) due to included headers
> <nodes/parsenodes.h> and <nodes/primnodes.h> using invalid C++ syntax

C++ incompatability has been noted before and patches were posted. I
don't remember right now what happened to them, check the archives. The
usual workarond is to split the C++ specific stuff into a seperate
file, but that's kind of ugly I agree.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: Custom type, operators and operator class not sorting/indexing correctly

From
Roger Leigh
Date:
On Sun, Jan 25, 2009 at 05:08:54PM +0100, Martijn van Oosterhout wrote:
> On Sun, Jan 25, 2009 at 03:52:02PM +0000, Roger Leigh wrote:
> >
> > I'm duplicating text* to char* by hand.
> > Is the text_to_cstring available internally also accessible by
> > external modules (I didn't see it in any headers)?
>
> A function like that exists, the "proper" way to do it is (untested):
>
> DatumGetCString( DirectFunctionCall1(textout, TextGetDatum(foo)  )
>
> It's used in various places, in the unreleased 8.4 there will be
> official functions like cstring_to_text and text_to_cstring. Many other
> modules already declare stuff like this.

Great, it looks like I'll be able to make things much simpler with 8.4,
though I will need to be backward compatible with 8.3 for some time.

> > After every PG_GETARG_TEXT_PP, I've called PG_FREE_IF_COPY before
> > returning.  However, I saw in citext (the behaviour of which I
> > duplicated in debversion_smaller and debversion_larger (lines 221-246))
> > that you *don't* use PG_FREE_IF_COPY before returning one of the two
> > values.  Is this not a potential memory leak?  Don't you need to
> > duplicate your chosen value, then free both of the temporary values?
>
> Memory leaks are not an issue generally, everything you allocate gets
> freed at the end of the statement, if not earlier. Normally you don't
> manually free at all, however code that might be used by indexes is
> somewhat of an exception, since they might be called often in a tight
> loop. smaller/larger are not index functions, but the cmp function is.
>
> Though your strings here are probably short enough you won't notice
> either way.

Thanks for the clarification, I wasn't aware everything was automatically
freed.  The stricter requirements for index operators does make sense.

> > I also noticed that I couldn't include some postgres headers in
> > my code (e.g. <access/builtins.h>) due to included headers
> > <nodes/parsenodes.h> and <nodes/primnodes.h> using invalid C++ syntax
>
> C++ incompatability has been noted before and patches were posted. I
> don't remember right now what happened to them, check the archives.

Ah, thanks.  I won't file a bug if it's already a known issue.


Thanks again,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment