Thread: Dublicates pairs in a table.

Dublicates pairs in a table.

From
"Ries van Twisk"
Date:
Dear guys/girls,

I have a small question which I could not clearly find in the postgreSQL
manual.

if I create this table and index
CRAEATE TABLE test (id    SERIAL,c1    VARCHAR(32),c2    VARCHAR(32),c3    VARCHAR(32)
);

CREATE UNIQUE INDEX test_idx ON test(id, c1,c2);

what I try to archive here is that I don't want duplicate pais in my table:
example

INSET INTO test (c1,c2) VALUES('a', 'a');    -- Allowed
INSET INTO test (c1,c2) VALUES('b', 'b');    -- Allowed
INSET INTO test (c1,c2) VALUES('b', 'c');    -- Allowed
INSET INTO test (c1,c2) VALUES('a', 'a');    -- Not allowed since we already
have a duplicate ('a', 'a') pair
INSET INTO test (c1,c2) VALUES('b', 'c');    -- Not allowed since we already
have a duplicate ('b', 'c') pair

etc. etc. I think you get the idea...

What I want to know is that if this is smart do do, or is there a other
better way to make sure I don't insert duplicate pairs in my database.
I'm not sure if a stored procedure is better in my case since I don't really
need the index on columns c1 or c2.
I'm looking for performance technical answers here thus speed vs memory
comparisons. Ofcourse a pointer to a URL will also do..

Best Regards,
Ries van Twisk









Re: Dublicates pairs in a table.

From
Richard Huxton
Date:
On Monday 16 Sep 2002 3:51 pm, Ries van Twisk wrote:
> Dear guys/girls,
>
> I have a small question which I could not clearly find in the postgreSQL
> manual.
>
> if I create this table and index
> CRAEATE TABLE test (
>     id    SERIAL,
>     c1    VARCHAR(32),
>     c2    VARCHAR(32),
>     c3    VARCHAR(32)
> );
>
> CREATE UNIQUE INDEX test_idx ON test(id, c1,c2);

Close, try

CREATE UNIQUE INDEX test_idx ON test (c1,c2)

> what I try to archive here is that I don't want duplicate pais in my table:
> example
>
> INSET INTO test (c1,c2) VALUES('a', 'a');    -- Not allowed since we already
> have a duplicate ('a', 'a') pair

> What I want to know is that if this is smart do do, or is there a other
> better way to make sure I don't insert duplicate pairs in my database.
> I'm not sure if a stored procedure is better in my case since I don't
> really need the index on columns c1 or c2.

Unless test is a very small table with lots of inserts/deletions I'd just use
the index, otherwise you'll have to scan the table and check for another copy
anyway.

HTH

- Richard Huxton


Re: Dublicates pairs in a table.

From
"Ries van Twisk"
Date:
Richard,

do you suggest using a stored procedure to handle this? I do expect that the
table will be large (for me large is a around 10000-20000 records, the table
as more columns but I only need the restriction on c1 & c2) but I don’t
expect lots of inserts and deletes. Data entry will be done using a regular
user but of course a update needs to be done within a acceptable time so the
user does not get annoyed.

best regards,
Ries van Twisk


> -----Oorspronkelijk bericht-----
> Van: Richard Huxton [mailto:dev@archonet.com]
> Verzonden: maandag 16 september 2002 17:08
> Aan: ries@jongert.nl; pgsql-sql@postgresql.org
> Onderwerp: Re: [SQL] Dublicates pairs in a table.
>
>
> On Monday 16 Sep 2002 3:51 pm, Ries van Twisk wrote:
> > Dear guys/girls,
> >
> > I have a small question which I could not clearly find in
> the postgreSQL
> > manual.
> >
> > if I create this table and index
> > CRAEATE TABLE test (
> >     id    SERIAL,
> >     c1    VARCHAR(32),
> >     c2    VARCHAR(32),
> >     c3    VARCHAR(32)
> > );
> >
> > CREATE UNIQUE INDEX test_idx ON test(id, c1,c2);
>
> Close, try
>
> CREATE UNIQUE INDEX test_idx ON test (c1,c2)
>
> > what I try to archive here is that I don't want duplicate
> pais in my table:
> > example
> >
> > INSET INTO test (c1,c2) VALUES('a', 'a');    -- Not allowed
> since we already
> > have a duplicate ('a', 'a') pair
>
> > What I want to know is that if this is smart do do, or is
> there a other
> > better way to make sure I don't insert duplicate pairs in
> my database.
> > I'm not sure if a stored procedure is better in my case
> since I don't
> > really need the index on columns c1 or c2.
>
> Unless test is a very small table with lots of
> inserts/deletions I'd just use
> the index, otherwise you'll have to scan the table and check
> for another copy
> anyway.
>
> HTH
>
> - Richard Huxton
>



Re: Dublicates pairs in a table.

From
Richard Huxton
Date:
On Tuesday 17 Sep 2002 7:36 am, Ries van Twisk wrote:
> Richard,
>
> do you suggest using a stored procedure to handle this? I do expect that
> the table will be large (for me large is a around 10000-20000 records, the
> table as more columns but I only need the restriction on c1 & c2) but I
> don’t expect lots of inserts and deletes. Data entry will be done using a
> regular user but of course a update needs to be done within a acceptable
> time so the user does not get annoyed.

I'd go with the UNIQUE INDEX. You're unlikely to be able to write code any
faster than the built-in indexing code. It's also simpler and easier for
someone else to see what's going on.

The only time the index will slow you down is on an insert/delete when the
index needs to be updated as well as the table. However, without the index
you'll need to scan the table to check for duplicates. For thousands of
records that can only be slower.

To put your mind at ease, I'd put together some test data and try it - nothing
like seeing some real evidence.

HTH

- Richard Huxton


Re: Dublicates pairs in a table.

From
Kevin Houle
Date:
Ries van Twisk wrote:

> I have a small question which I could not clearly find in the postgreSQL
> manual.
> 
> if I create this table and index
> CRAEATE TABLE test (
>     id    SERIAL,
>     c1    VARCHAR(32),
>     c2    VARCHAR(32),
>     c3    VARCHAR(32)
> );
> 
> CREATE UNIQUE INDEX test_idx ON test(id, c1,c2);
> 
> what I try to archive here is that I don't want duplicate pais in my table:
> example
> 
> INSET INTO test (c1,c2) VALUES('a', 'a');    -- Allowed
> INSET INTO test (c1,c2) VALUES('b', 'b');    -- Allowed
> INSET INTO test (c1,c2) VALUES('b', 'c');    -- Allowed
> INSET INTO test (c1,c2) VALUES('a', 'a');    -- Not allowed since we already
> have a duplicate ('a', 'a') pair
> INSET INTO test (c1,c2) VALUES('b', 'c');    -- Not allowed since we already
> have a duplicate ('b', 'c') pair
> 
> etc. etc. I think you get the idea...

I have the same issue with a table that currently holds well
over 600,000 rows. The case you left out is this:
 INSERT INTO test (c1,c2) VALUES('a','c'); INSERT INTO test (c1,c2) VALUES('c','a');

I want that to fail, but I haven't been able to get it to fail
using unique indexes. I presume ordering is significant. Instead,
I am doing a SELECT prior to insert to insure the pair doesn't
already exist. If you've been able to get order-independent
pairs restricted to being unique using indexes, I'd like to know
about it. :-)

Kevin



Re: Dublicates pairs in a table.

From
Richard Huxton
Date:
On Wednesday 25 Sep 2002 2:10 am, Kevin Houle wrote:
> I have the same issue with a table that currently holds well
> over 600,000 rows. The case you left out is this:
>
>   INSERT INTO test (c1,c2) VALUES('a','c');
>   INSERT INTO test (c1,c2) VALUES('c','a');
>
> I want that to fail, but I haven't been able to get it to fail
> using unique indexes. I presume ordering is significant. Instead,
> I am doing a SELECT prior to insert to insure the pair doesn't
> already exist. If you've been able to get order-independent
> pairs restricted to being unique using indexes, I'd like to know
> about it. :-)

Functional indexes sir - define a function that puts the columns into a sorted
order.

richardh=> CREATE TABLE foo (a text, b text);
CREATE
richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) );
ERROR:  DefineIndex: index function must be marked iscachable
richardh=> \i ordfn.txt
DROP
CREATE
richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) );
CREATE
richardh=> insert into foo values ('aa','bb');
INSERT 332596 1
richardh=> insert into foo values ('aa','cc');
INSERT 332597 1
richardh=> insert into foo values ('bb','aa');
ERROR:  Cannot insert a duplicate key into unique index foo_both_uniq
richardh=> insert into foo values ('aa','bb');
ERROR:  Cannot insert a duplicate key into unique index foo_both_uniq

Function defined as:
CREATE FUNCTION ord_fn (text,text) RETURNS text AS '
SELECT (CASE   WHEN $1 < $2   THEN $1 || $2   ELSE $2 || $1   END) as t;
' LANGUAGE SQL WITH (iscachable);

--  Richard Huxton


Re: Dublicates pairs in a table.

From
Jean-Luc Lachance
Date:
What's wrong with 
CREATE UNIQUE INDEX foo_both_uniq ON foo(a,b);
???


Richard Huxton wrote:
> 
> On Wednesday 25 Sep 2002 2:10 am, Kevin Houle wrote:
> > I have the same issue with a table that currently holds well
> > over 600,000 rows. The case you left out is this:
> >
> >   INSERT INTO test (c1,c2) VALUES('a','c');
> >   INSERT INTO test (c1,c2) VALUES('c','a');
> >
> > I want that to fail, but I haven't been able to get it to fail
> > using unique indexes. I presume ordering is significant. Instead,
> > I am doing a SELECT prior to insert to insure the pair doesn't
> > already exist. If you've been able to get order-independent
> > pairs restricted to being unique using indexes, I'd like to know
> > about it. :-)
> 
> Functional indexes sir - define a function that puts the columns into a sorted
> order.
> 
> richardh=> CREATE TABLE foo (a text, b text);
> CREATE
> richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) );
> ERROR:  DefineIndex: index function must be marked iscachable
> richardh=> \i ordfn.txt
> DROP
> CREATE
> richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) );
> CREATE
> richardh=> insert into foo values ('aa','bb');
> INSERT 332596 1
> richardh=> insert into foo values ('aa','cc');
> INSERT 332597 1
> richardh=> insert into foo values ('bb','aa');
> ERROR:  Cannot insert a duplicate key into unique index foo_both_uniq
> richardh=> insert into foo values ('aa','bb');
> ERROR:  Cannot insert a duplicate key into unique index foo_both_uniq
> 
> Function defined as:
> CREATE FUNCTION ord_fn (text,text) RETURNS text AS '
> SELECT (CASE
>     WHEN $1 < $2
>     THEN $1 || $2
>     ELSE $2 || $1
>     END) as t;
> ' LANGUAGE SQL WITH (iscachable);
> 
> --
>   Richard Huxton
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


Re: Dublicates pairs in a table.

From
Stephan Szabo
Date:
On Fri, 27 Sep 2002, Richard Huxton wrote:

> On Wednesday 25 Sep 2002 2:10 am, Kevin Houle wrote:
> > I have the same issue with a table that currently holds well
> > over 600,000 rows. The case you left out is this:
> >
> >   INSERT INTO test (c1,c2) VALUES('a','c');
> >   INSERT INTO test (c1,c2) VALUES('c','a');
> >
> > I want that to fail, but I haven't been able to get it to fail
> > using unique indexes. I presume ordering is significant. Instead,
> > I am doing a SELECT prior to insert to insure the pair doesn't
> > already exist. If you've been able to get order-independent
> > pairs restricted to being unique using indexes, I'd like to know
> > about it. :-)
>
> Function defined as:
> CREATE FUNCTION ord_fn (text,text) RETURNS text AS '
> SELECT (CASE
>     WHEN $1 < $2
>     THEN $1 || $2
>     ELSE $2 || $1
>     END) as t;
> ' LANGUAGE SQL WITH (iscachable);

Note, that for a final system, you'll may want to also add a
delimiter that doesn't show up in $1 or $2 if ('a','ab') and
('aa','b') aren't supposed to cause an error.



Re: Dublicates pairs in a table.

From
Richard Huxton
Date:
On Friday 27 Sep 2002 5:17 pm, Jean-Luc Lachance wrote:
> What's wrong with
> CREATE UNIQUE INDEX foo_both_uniq ON foo(a,b);
> ???

Because he specifically wanted values of ('a','b') and ('b','a') to be treated
as equivalent (see quote).

> > >   INSERT INTO test (c1,c2) VALUES('a','c');
> > >   INSERT INTO test (c1,c2) VALUES('c','a');

Note Stephen Szabo's observation that I'd missed the obvious need for some
separator so ('a','ab') is different from ('aa','b') - Doh!

- Richard Huxton




Re: Dublicates pairs in a table.

From
Jean-Luc Lachance
Date:
Oh, sorry I missed that. 

Still if C1 and C2 are interchangable, a rule could force C1 <= C2 and
swap them if necessary.


Richard Huxton wrote:
> 
> On Friday 27 Sep 2002 5:17 pm, Jean-Luc Lachance wrote:
> > What's wrong with
> > CREATE UNIQUE INDEX foo_both_uniq ON foo(a,b);
> > ???
> 
> Because he specifically wanted values of ('a','b') and ('b','a') to be treated
> as equivalent (see quote).
> 
> > > >   INSERT INTO test (c1,c2) VALUES('a','c');
> > > >   INSERT INTO test (c1,c2) VALUES('c','a');
> 
> Note Stephen Szabo's observation that I'd missed the obvious need for some
> separator so ('a','ab') is different from ('aa','b') - Doh!
> 
> - Richard Huxton