Thread: Dublicates pairs in a table.
Dear guys/girls, I have a small question which I could not clearly find in the postgreSQL manual. if I create this table and index CRAEATE TABLE test (id SERIAL,c1 VARCHAR(32),c2 VARCHAR(32),c3 VARCHAR(32) ); CREATE UNIQUE INDEX test_idx ON test(id, c1,c2); what I try to archive here is that I don't want duplicate pais in my table: example INSET INTO test (c1,c2) VALUES('a', 'a'); -- Allowed INSET INTO test (c1,c2) VALUES('b', 'b'); -- Allowed INSET INTO test (c1,c2) VALUES('b', 'c'); -- Allowed INSET INTO test (c1,c2) VALUES('a', 'a'); -- Not allowed since we already have a duplicate ('a', 'a') pair INSET INTO test (c1,c2) VALUES('b', 'c'); -- Not allowed since we already have a duplicate ('b', 'c') pair etc. etc. I think you get the idea... What I want to know is that if this is smart do do, or is there a other better way to make sure I don't insert duplicate pairs in my database. I'm not sure if a stored procedure is better in my case since I don't really need the index on columns c1 or c2. I'm looking for performance technical answers here thus speed vs memory comparisons. Ofcourse a pointer to a URL will also do.. Best Regards, Ries van Twisk
On Monday 16 Sep 2002 3:51 pm, Ries van Twisk wrote: > Dear guys/girls, > > I have a small question which I could not clearly find in the postgreSQL > manual. > > if I create this table and index > CRAEATE TABLE test ( > id SERIAL, > c1 VARCHAR(32), > c2 VARCHAR(32), > c3 VARCHAR(32) > ); > > CREATE UNIQUE INDEX test_idx ON test(id, c1,c2); Close, try CREATE UNIQUE INDEX test_idx ON test (c1,c2) > what I try to archive here is that I don't want duplicate pais in my table: > example > > INSET INTO test (c1,c2) VALUES('a', 'a'); -- Not allowed since we already > have a duplicate ('a', 'a') pair > What I want to know is that if this is smart do do, or is there a other > better way to make sure I don't insert duplicate pairs in my database. > I'm not sure if a stored procedure is better in my case since I don't > really need the index on columns c1 or c2. Unless test is a very small table with lots of inserts/deletions I'd just use the index, otherwise you'll have to scan the table and check for another copy anyway. HTH - Richard Huxton
Richard, do you suggest using a stored procedure to handle this? I do expect that the table will be large (for me large is a around 10000-20000 records, the table as more columns but I only need the restriction on c1 & c2) but I dont expect lots of inserts and deletes. Data entry will be done using a regular user but of course a update needs to be done within a acceptable time so the user does not get annoyed. best regards, Ries van Twisk > -----Oorspronkelijk bericht----- > Van: Richard Huxton [mailto:dev@archonet.com] > Verzonden: maandag 16 september 2002 17:08 > Aan: ries@jongert.nl; pgsql-sql@postgresql.org > Onderwerp: Re: [SQL] Dublicates pairs in a table. > > > On Monday 16 Sep 2002 3:51 pm, Ries van Twisk wrote: > > Dear guys/girls, > > > > I have a small question which I could not clearly find in > the postgreSQL > > manual. > > > > if I create this table and index > > CRAEATE TABLE test ( > > id SERIAL, > > c1 VARCHAR(32), > > c2 VARCHAR(32), > > c3 VARCHAR(32) > > ); > > > > CREATE UNIQUE INDEX test_idx ON test(id, c1,c2); > > Close, try > > CREATE UNIQUE INDEX test_idx ON test (c1,c2) > > > what I try to archive here is that I don't want duplicate > pais in my table: > > example > > > > INSET INTO test (c1,c2) VALUES('a', 'a'); -- Not allowed > since we already > > have a duplicate ('a', 'a') pair > > > What I want to know is that if this is smart do do, or is > there a other > > better way to make sure I don't insert duplicate pairs in > my database. > > I'm not sure if a stored procedure is better in my case > since I don't > > really need the index on columns c1 or c2. > > Unless test is a very small table with lots of > inserts/deletions I'd just use > the index, otherwise you'll have to scan the table and check > for another copy > anyway. > > HTH > > - Richard Huxton >
On Tuesday 17 Sep 2002 7:36 am, Ries van Twisk wrote: > Richard, > > do you suggest using a stored procedure to handle this? I do expect that > the table will be large (for me large is a around 10000-20000 records, the > table as more columns but I only need the restriction on c1 & c2) but I > dont expect lots of inserts and deletes. Data entry will be done using a > regular user but of course a update needs to be done within a acceptable > time so the user does not get annoyed. I'd go with the UNIQUE INDEX. You're unlikely to be able to write code any faster than the built-in indexing code. It's also simpler and easier for someone else to see what's going on. The only time the index will slow you down is on an insert/delete when the index needs to be updated as well as the table. However, without the index you'll need to scan the table to check for duplicates. For thousands of records that can only be slower. To put your mind at ease, I'd put together some test data and try it - nothing like seeing some real evidence. HTH - Richard Huxton
Ries van Twisk wrote: > I have a small question which I could not clearly find in the postgreSQL > manual. > > if I create this table and index > CRAEATE TABLE test ( > id SERIAL, > c1 VARCHAR(32), > c2 VARCHAR(32), > c3 VARCHAR(32) > ); > > CREATE UNIQUE INDEX test_idx ON test(id, c1,c2); > > what I try to archive here is that I don't want duplicate pais in my table: > example > > INSET INTO test (c1,c2) VALUES('a', 'a'); -- Allowed > INSET INTO test (c1,c2) VALUES('b', 'b'); -- Allowed > INSET INTO test (c1,c2) VALUES('b', 'c'); -- Allowed > INSET INTO test (c1,c2) VALUES('a', 'a'); -- Not allowed since we already > have a duplicate ('a', 'a') pair > INSET INTO test (c1,c2) VALUES('b', 'c'); -- Not allowed since we already > have a duplicate ('b', 'c') pair > > etc. etc. I think you get the idea... I have the same issue with a table that currently holds well over 600,000 rows. The case you left out is this: INSERT INTO test (c1,c2) VALUES('a','c'); INSERT INTO test (c1,c2) VALUES('c','a'); I want that to fail, but I haven't been able to get it to fail using unique indexes. I presume ordering is significant. Instead, I am doing a SELECT prior to insert to insure the pair doesn't already exist. If you've been able to get order-independent pairs restricted to being unique using indexes, I'd like to know about it. :-) Kevin
On Wednesday 25 Sep 2002 2:10 am, Kevin Houle wrote: > I have the same issue with a table that currently holds well > over 600,000 rows. The case you left out is this: > > INSERT INTO test (c1,c2) VALUES('a','c'); > INSERT INTO test (c1,c2) VALUES('c','a'); > > I want that to fail, but I haven't been able to get it to fail > using unique indexes. I presume ordering is significant. Instead, > I am doing a SELECT prior to insert to insure the pair doesn't > already exist. If you've been able to get order-independent > pairs restricted to being unique using indexes, I'd like to know > about it. :-) Functional indexes sir - define a function that puts the columns into a sorted order. richardh=> CREATE TABLE foo (a text, b text); CREATE richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) ); ERROR: DefineIndex: index function must be marked iscachable richardh=> \i ordfn.txt DROP CREATE richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) ); CREATE richardh=> insert into foo values ('aa','bb'); INSERT 332596 1 richardh=> insert into foo values ('aa','cc'); INSERT 332597 1 richardh=> insert into foo values ('bb','aa'); ERROR: Cannot insert a duplicate key into unique index foo_both_uniq richardh=> insert into foo values ('aa','bb'); ERROR: Cannot insert a duplicate key into unique index foo_both_uniq Function defined as: CREATE FUNCTION ord_fn (text,text) RETURNS text AS ' SELECT (CASE WHEN $1 < $2 THEN $1 || $2 ELSE $2 || $1 END) as t; ' LANGUAGE SQL WITH (iscachable); -- Richard Huxton
What's wrong with CREATE UNIQUE INDEX foo_both_uniq ON foo(a,b); ??? Richard Huxton wrote: > > On Wednesday 25 Sep 2002 2:10 am, Kevin Houle wrote: > > I have the same issue with a table that currently holds well > > over 600,000 rows. The case you left out is this: > > > > INSERT INTO test (c1,c2) VALUES('a','c'); > > INSERT INTO test (c1,c2) VALUES('c','a'); > > > > I want that to fail, but I haven't been able to get it to fail > > using unique indexes. I presume ordering is significant. Instead, > > I am doing a SELECT prior to insert to insure the pair doesn't > > already exist. If you've been able to get order-independent > > pairs restricted to being unique using indexes, I'd like to know > > about it. :-) > > Functional indexes sir - define a function that puts the columns into a sorted > order. > > richardh=> CREATE TABLE foo (a text, b text); > CREATE > richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) ); > ERROR: DefineIndex: index function must be marked iscachable > richardh=> \i ordfn.txt > DROP > CREATE > richardh=> CREATE UNIQUE INDEX foo_both_uniq ON foo ( ord_fn(a,b) ); > CREATE > richardh=> insert into foo values ('aa','bb'); > INSERT 332596 1 > richardh=> insert into foo values ('aa','cc'); > INSERT 332597 1 > richardh=> insert into foo values ('bb','aa'); > ERROR: Cannot insert a duplicate key into unique index foo_both_uniq > richardh=> insert into foo values ('aa','bb'); > ERROR: Cannot insert a duplicate key into unique index foo_both_uniq > > Function defined as: > CREATE FUNCTION ord_fn (text,text) RETURNS text AS ' > SELECT (CASE > WHEN $1 < $2 > THEN $1 || $2 > ELSE $2 || $1 > END) as t; > ' LANGUAGE SQL WITH (iscachable); > > -- > Richard Huxton > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Fri, 27 Sep 2002, Richard Huxton wrote: > On Wednesday 25 Sep 2002 2:10 am, Kevin Houle wrote: > > I have the same issue with a table that currently holds well > > over 600,000 rows. The case you left out is this: > > > > INSERT INTO test (c1,c2) VALUES('a','c'); > > INSERT INTO test (c1,c2) VALUES('c','a'); > > > > I want that to fail, but I haven't been able to get it to fail > > using unique indexes. I presume ordering is significant. Instead, > > I am doing a SELECT prior to insert to insure the pair doesn't > > already exist. If you've been able to get order-independent > > pairs restricted to being unique using indexes, I'd like to know > > about it. :-) > > Function defined as: > CREATE FUNCTION ord_fn (text,text) RETURNS text AS ' > SELECT (CASE > WHEN $1 < $2 > THEN $1 || $2 > ELSE $2 || $1 > END) as t; > ' LANGUAGE SQL WITH (iscachable); Note, that for a final system, you'll may want to also add a delimiter that doesn't show up in $1 or $2 if ('a','ab') and ('aa','b') aren't supposed to cause an error.
On Friday 27 Sep 2002 5:17 pm, Jean-Luc Lachance wrote: > What's wrong with > CREATE UNIQUE INDEX foo_both_uniq ON foo(a,b); > ??? Because he specifically wanted values of ('a','b') and ('b','a') to be treated as equivalent (see quote). > > > INSERT INTO test (c1,c2) VALUES('a','c'); > > > INSERT INTO test (c1,c2) VALUES('c','a'); Note Stephen Szabo's observation that I'd missed the obvious need for some separator so ('a','ab') is different from ('aa','b') - Doh! - Richard Huxton
Oh, sorry I missed that. Still if C1 and C2 are interchangable, a rule could force C1 <= C2 and swap them if necessary. Richard Huxton wrote: > > On Friday 27 Sep 2002 5:17 pm, Jean-Luc Lachance wrote: > > What's wrong with > > CREATE UNIQUE INDEX foo_both_uniq ON foo(a,b); > > ??? > > Because he specifically wanted values of ('a','b') and ('b','a') to be treated > as equivalent (see quote). > > > > > INSERT INTO test (c1,c2) VALUES('a','c'); > > > > INSERT INTO test (c1,c2) VALUES('c','a'); > > Note Stephen Szabo's observation that I'd missed the obvious need for some > separator so ('a','ab') is different from ('aa','b') - Doh! > > - Richard Huxton