Re: pgsql 'prefix' error - Mailing list pgsql-general

From Dimitri Fontaine
Subject Re: pgsql 'prefix' error
Date
Msg-id 87tywid19x.fsf@hi-media-techno.com
Whole thread Raw
In response to pgsql 'prefix' error  (Bino Oetomo <bino@indoakses-online.com>)
Responses Re: pgsql 'prefix' error  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-general
Hi,

Bino Oetomo <bino@indoakses-online.com> writes:
> ERROR: duplicate key value violates unique constraint "myrecords_pkey"
> CONTEXT: COPY myrecords, line 2: "12"

I think I should add the following code comment to the documentation, if
not already done:

/*
 * We invent a prefix_range ordering for convenience, but that's
 * dangerous. Use the BTree opclass at your own risk.
 *
 * On the other hand, when your routing table does contain pretty static
 * data and you test it carefully or know it will fit into the ordering
 * simplification, you're good to go.
 *
 * Baring bug, the constraint is to have non-overlapping data.
 */

You have to remember that '12'::prefix_range could be spelled as the
regular expression '12.*'. So that '1'::prefix_range contains '12'.

The BTree opclass is not made to resist to overlapping data. Maybe in
this case though we could say that 12 contains less elements than 1 so
it's less than 1. Here's a test to redefine the pr_cmp() operator in
term of that, as a patch against current CVS (which is 1.0.0).

Can you test with this version and maybe better data set?

Note that as said earlier the indexing you need to speed up queries is
the GiST one, but it could be you want the PK constraint noneless.

prefix=# select prefix_range_cmp('1', '12');
                                             
 prefix_range_cmp
------------------
                1  -- it is 0 without the patch.
(1 row)

This means '1'::prefix_range > '12'::prefix_range and you're now able to
create your PRIMARY KEY on the example data. It's still not very useful
for the general case, but could be argued as better...

Of course changing that will discard any btree containing a prefix_range
column, so that's going to be 1.1.0 if workable.

Regards,
--
dim

PS: no worry about the operators themselves, they are defined atop cmp:

static inline
bool pr_lt(prefix_range *a, prefix_range *b, bool eqval) {
  int cmp = pr_cmp(a, b);
  return eqval ? cmp <= 0 : cmp < 0;
}

static inline
bool pr_gt(prefix_range *a, prefix_range *b, bool eqval) {
  int cmp = pr_cmp(a, b);
  return eqval ? cmp >= 0 : cmp > 0;
}

? README.html
? TESTS.html
? prefix.sql
? prefixes.check.pl
? todo
? varlena.gavin.snippet.c
? debian/files
? debian/postgresql-8.3-prefix
? debian/postgresql-8.3-prefix.debhelper.log
? debian/postgresql-8.3-prefix.substvars
? debian/postgresql-8.4-prefix
? debian/postgresql-8.4-prefix.debhelper.log
? debian/postgresql-8.4-prefix.substvars
Index: prefix.c
===================================================================
RCS file: /cvsroot/prefix/prefix/prefix.c,v
retrieving revision 1.54
diff -p -u -r1.54 prefix.c
--- prefix.c    6 Oct 2009 09:55:32 -0000    1.54
+++ prefix.c    25 Nov 2009 16:10:47 -0000
@@ -520,7 +520,11 @@ int pr_cmp(prefix_range *a, prefix_range
   cmp = memcmp(p, q, mlen);

   if( cmp == 0 )
-    return (a->first == b->first) ? (a->last - b->last) : (a->first - b->first);
+    /*
+     * we are comparing e.g. '1' and '12' (the shorter contains the
+     * smaller), so let's pretend '12' < '1' as it contains less elements.
+     */
+    return (alen == mlen) ? 1 : -1;

   return cmp;
 }

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] Re: How duplicate data produce when a UNIQUE index exite ?
Next
From: Greg Smith
Date:
Subject: Re: [BUGS] Re: How duplicate data produce when a UNIQUE index exite ?