Thread: Scrub one large table against another
I currently have a db supporting what is for the most part an OLAP data warehousing application. One table (good data) has roughly 120 million rows, divided into roughly 40 different relational groups (logically by foreign key). Every time I add data to this table, I need to afterwards scrub that group against known "bad data" which is housed in a second table that has roughly 21 million rows. The 120 million row good data table is called "email_record" The 21 million row bad data table is called "suppress" There are separate btree indexes on 'email_record_id', 'email_list_id' and 'email' on both tables. Each time I scrub data I pull out anywhere from 1 to 5 million rows from the good table (depending on the size of the group i'm scrubbing) and compare them against the 21 million rows in the 'suppress' table. So far I've done this using a temporary staging table that stores only the email_record_id for each row from the relevant group of the good table. I use a plsql function that does roughly the following (i've included only sql syntax and inserted the constant '9' where i would normally use a variable): The characters: email_record_id int8, email varchar(255), email_list_id int8 ------------------------------------------------------------- CREATE TEMP TABLE temp_list_suppress(email_record_id int8); INSERT INTO temp_list_suppress SELECT email_record_id from ONLY email_record er WHERE email_list_id = 9 AND email IN (select email from suppress); CREATE INDEX unique_id_index on temp_list_suppress ( email_record_id ); INSERT INTO er_banned SELECT * from ONLY email_record er WHERE EXISTS (SELECT 1 from temp_list_suppress ts where er.email_record_id = ts.email_record_id)'; DELETE FROM ONLY email_record WHERE email_list_id = 9 AND email_record_id IN (SELECT email_record_id from temp_list_suppress); TRUNCATE TABLE temp_list_suppress; DROP TABLE temp_list_suppress; -------------------------------------------------------------- The performance is dreadful, is there a more efficient way to do this? Would I be better off just grabbing * initially from the good table instead of just the id to avoid more sequential searches later? Here are my configs: Debian Postgres 8.1.4 dual zeon ram: 4 gigs raid 5 # - Memory - shared_buffers = 3000 work_mem = 92768 maintenance_work_mem = 128384 autovacuum is turned off, and the db is annalyzed and vacuumed regularly. Regards, Brendan
Brendan Curran <brendan.curran@gmail.com> writes:
> CREATE TEMP TABLE temp_list_suppress(email_record_id int8);

> INSERT INTO temp_list_suppress
> 	SELECT email_record_id from ONLY email_record er
> 	WHERE email_list_id = 9 AND email IN
> 	(select email from suppress);

> CREATE INDEX unique_id_index on temp_list_suppress ( email_record_id );

> INSERT INTO er_banned
> 	SELECT * from ONLY email_record er WHERE EXISTS
> 	(SELECT 1 from temp_list_suppress ts where er.email_record_id = ts.email_record_id);

> DELETE FROM ONLY email_record WHERE email_list_id = 9 AND email_record_id IN
> 	(SELECT email_record_id from temp_list_suppress);

> TRUNCATE TABLE temp_list_suppress;

> DROP TABLE temp_list_suppress;

> The performance is dreadful, is there a more efficient way to do this?

Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
If you don't even know which part is slow, it's hard to improve.

It would probably help to do an "ANALYZE temp_list_suppress" right after
populating the temp table.  As you have it, the second insert and delete
are being planned with nothing more than a row count (obtained during
CREATE INDEX) and no stats about distribution of the table contents.

Also, I'd be inclined to try replacing the EXISTS with an IN test;
in recent PG versions the planner is generally smarter about IN.
(Is there a reason why you are doing the INSERT one way and the
DELETE the other?)

BTW, that TRUNCATE right before the DROP seems quite useless,
although it's not the main source of your problem.

			regards, tom lane
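(A minimal sketch of the flow Tom describes, reusing the table names from the original
post; this is untested against the poster's schema and only illustrates where the ANALYZE
and the EXISTS-to-IN change would go:)

-------------------------------------------------------------
CREATE TEMP TABLE temp_list_suppress(email_record_id int8);

INSERT INTO temp_list_suppress
	SELECT email_record_id FROM ONLY email_record er
	WHERE email_list_id = 9 AND email IN (SELECT email FROM suppress);

CREATE INDEX unique_id_index ON temp_list_suppress (email_record_id);

-- give the planner real row counts and distribution stats
-- before the next INSERT and the DELETE are planned
ANALYZE temp_list_suppress;

-- IN instead of EXISTS, matching the style of the DELETE below
INSERT INTO er_banned
	SELECT * FROM ONLY email_record er
	WHERE er.email_record_id IN (SELECT email_record_id FROM temp_list_suppress);

DELETE FROM ONLY email_record
	WHERE email_list_id = 9
	AND email_record_id IN (SELECT email_record_id FROM temp_list_suppress);

DROP TABLE temp_list_suppress;   -- the TRUNCATE beforehand is unnecessary
-------------------------------------------------------------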
Tom Lane wrote:
> Brendan Curran <brendan.curran@gmail.com> writes:
>> CREATE TEMP TABLE temp_list_suppress(email_record_id int8);
>
>> INSERT INTO temp_list_suppress
>> 	SELECT email_record_id from ONLY email_record er
>> 	WHERE email_list_id = 9 AND email IN
>> 	(select email from suppress);
>
>> CREATE INDEX unique_id_index on temp_list_suppress ( email_record_id );
>
>> INSERT INTO er_banned
>> 	SELECT * from ONLY email_record er WHERE EXISTS
>> 	(SELECT 1 from temp_list_suppress ts where er.email_record_id = ts.email_record_id);
>
>> DELETE FROM ONLY email_record WHERE email_list_id = 9 AND email_record_id IN
>> 	(SELECT email_record_id from temp_list_suppress);
>
>> TRUNCATE TABLE temp_list_suppress;
>> DROP TABLE temp_list_suppress;
>
>> The performance is dreadful, is there a more efficient way to do this?
>
> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
> If you don't even know which part is slow, it's hard to improve.

FIRST INSERT (just the SELECT is explained):

Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=32)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25)
        ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25)
              Sort Key: suppress.email
              ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0)
                    Index Cond: (email_list_id = 13)

SECOND INSERT (using EXISTS):

Seq Scan on email_record er  (cost=0.00..381554175.29 rows=62254164 width=1863)
  Filter: (subplan)
  SubPlan
    ->  Index Scan using er_primeq_pk on er_primeq eq  (cost=0.00..3.03 rows=1 width=0)
          Index Cond: ($0 = email_record_id)

SECOND INSERT (using IN):

Nested Loop  (cost=26545.94..2627497.28 rows=27134 width=1863)
  ->  HashAggregate  (cost=26545.94..33879.49 rows=733355 width=8)
        ->  Seq Scan on er_primeq  (cost=0.00..24712.55 rows=733355 width=8)
  ->  Index Scan using email_record_pkey on email_record er  (cost=0.00..3.52 rows=1 width=1863)
        Index Cond: (er.email_record_id = "outer".email_record_id)
        Filter: (email_list_id = 13)

DELETE:

Nested Loop  (cost=26545.94..2627497.28 rows=50846 width=6)
  ->  HashAggregate  (cost=26545.94..33879.49 rows=733355 width=8)
        ->  Seq Scan on er_primeq  (cost=0.00..24712.55 rows=733355 width=8)
  ->  Index Scan using email_record_pkey on email_record  (cost=0.00..3.52 rows=1 width=14)
        Index Cond: (email_record.email_record_id = "outer".email_record_id)
        Filter: (email_list_id = 9)

To get this explain data I used a sample "temp_suppress" table that contained about 700k
rows and was indexed but not analyzed...

> It would probably help to do an "ANALYZE temp_list_suppress" right after
> populating the temp table.  As you have it, the second insert and delete
> are being planned with nothing more than a row count (obtained during
> CREATE INDEX) and no stats about distribution of the table contents.
>
> Also, I'd be inclined to try replacing the EXISTS with an IN test;
> in recent PG versions the planner is generally smarter about IN.
> (Is there a reason why you are doing the INSERT one way and the
> DELETE the other?)
>
> BTW, that TRUNCATE right before the DROP seems quite useless,
> although it's not the main source of your problem.
>
> 			regards, tom lane
Brendan Curran <brendan.curran@gmail.com> writes:
> Tom Lane wrote:
>> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?

> FIRST INSERT (Just the select is explained):

EXPLAIN ANALYZE, please, not just EXPLAIN.

			regards, tom lane
Tom Lane wrote:
> Brendan Curran <brendan.curran@gmail.com> writes:
>> Tom Lane wrote:
>>> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
>
>> FIRST INSERT (Just the select is explained):
>
> EXPLAIN ANALYZE, please, not just EXPLAIN.
>
> 			regards, tom lane

Sorry, here is the EXPLAIN ANALYZE output of that first SELECT:

EXPLAIN ANALYZE SELECT email_record_id from ONLY email_record er
WHERE email_list_id = 13 AND email IN
(select email from suppress);

Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
        ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
              Sort Key: suppress.email
              ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
                    Index Cond: (email_list_id = 13)
Total runtime: 2122693.864 ms

So much time is being spent in the Unique and Sort leaves... I would think that it
wouldn't need to do the unique portion, since there is no DISTINCT clause...
On Tue, Oct 10, 2006 at 05:46:18PM -0600, Brendan Curran wrote:
> Tom Lane wrote:
> > Brendan Curran <brendan.curran@gmail.com> writes:
> > > Tom Lane wrote:
> > > > Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
> >
> > > FIRST INSERT (Just the select is explained):
> >
> > EXPLAIN ANALYZE, please, not just EXPLAIN.
> >
> > 			regards, tom lane
>
> Sorry, here is the EXPLAIN ANALYZE output of that first SELECT
>
> EXPLAIN ANALYZE SELECT email_record_id from ONLY email_record er
> WHERE email_list_id = 13 AND email IN
> (select email from suppress);
>
> Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
>   Hash Cond: (("outer".email)::text = ("inner".email)::text)
>   ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
>         ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
>               Sort Key: suppress.email
>               ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
>   ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
>         ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
>               Recheck Cond: (email_list_id = 13)
>               ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
>                     Index Cond: (email_list_id = 13)
> Total runtime: 2122693.864 ms
>
> So much time is being spent in the Unique and Sort leaves... I would
> think that it wouldn't need to do the unique portion, since there is no
> DISTINCT clause...

I think that's coming about because of the IN. Try a simple join instead...

SELECT email_record_id
	FROM ONLY email_record er
	JOIN suppress s USING (email)
	WHERE er.email_list_id = 13;
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Brendan Curran <brendan.curran@gmail.com> writes:
> So much time is being spent in the Unique and Sort leaves... I would
> think that it wouldn't need to do the unique portion, since there is no
> DISTINCT clause...

There's nothing in that query suggesting that suppress.email is unique.
If you know that it is, try using a plain join instead of an IN.

			regards, tom lane
Tom Lane wrote:
> Brendan Curran <brendan.curran@gmail.com> writes:
>> So much time is being spent in the Unique and Sort leaves... I would
>> think that it wouldn't need to do the unique portion, since there is no
>> DISTINCT clause...
>
> There's nothing in that query suggesting that suppress.email is unique.
> If you know that it is, try using a plain join instead of an IN.
>
> 			regards, tom lane

Interestingly, and thank you to Tom and Jim, the explicit JOIN improved performance
tremendously (RESULTS BELOW). I converted the entire query to use explicit joins instead
of IN and EXISTS and discovered acceptable performance. I think the next place to go from
here is RAID 1/RAID 10 and possibly partitioning my large table (welcome to DDL insanity,
right?).

I have to add that I'm a little surprised the documentation is so generous to IN and
EXISTS. Is there something amiss in my configuration that prevents them from performing
correctly? If not, I can't imagine a time when IN or EXISTS would be more performant than
an explicit JOIN...

Additionally, I manually scrub for duplicates at the group level in the email_record
table to keep my records unique. I would like to use a unique constraint, but have found
that batching in JDBC is impossible due to irrecoverable errors even when using BEFORE
INSERT triggers to just return NULL if a record exists already. Has anyone got an elegant
solution for the 'add only if not exists already' problem similar to MSSQL's MERGE
command?

Just one more thing... I have found that maintaining a btree index on a varchar(255)
value is extremely expensive on insert/update/delete. It is unfortunately necessary for
me to maintain this index for queries and reports, so I am transitioning to using an
unindexed staging table to import data into before merging it with the larger table. All
the docs and posts recommend is to drop the index, import your data, and then create the
index again. This is untenable on a daily / bi-weekly basis. Is there a more elegant
solution to this indexing problem?

Thank you for all of your help!

EXPLAIN ANALYZE result comparison...

1. EXPLAIN ANALYZE SELECT email_record_id from ONLY email_record er
   WHERE email_list_id = 13 AND email IN
   (select email from suppress);

Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
        ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
              Sort Key: suppress.email
              ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
                    Index Cond: (email_list_id = 13)
Total runtime: 2,122,693.864 ms

--------------------------------------------------------
2. EXPLAIN ANALYZE SELECT email_record_id
   FROM ONLY email_record er
   JOIN suppress s USING (email)
   WHERE er.email_list_id = 13;

Hash Join  (cost=3945127.49..5000543.11 rows=800912 width=8) (actual time=808874.088..808874.088 rows=0 loops=1)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Seq Scan on suppress s  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=661.518..216933.399 rows=21646261 loops=1)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=494294.932..494294.932 rows=3245336 loops=1)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=242198.226..485942.542 rows=3245336 loops=1)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=241769.786..241769.786 rows=5172979 loops=1)
                    Index Cond: (email_list_id = 13)
Total runtime: 808,884.387 ms
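(For reference, a sketch of what the whole scrub might look like once the IN/EXISTS
subqueries are replaced with explicit joins, as the poster says he did. It reuses the
table names from the thread, assumes suppress.email has no duplicates that would multiply
rows, and is untested against this schema; the index name is made up:)

-------------------------------------------------------------
CREATE TEMP TABLE temp_list_suppress AS
	SELECT er.email_record_id
	FROM ONLY email_record er
	JOIN suppress s USING (email)
	WHERE er.email_list_id = 13;

CREATE INDEX temp_list_suppress_idx ON temp_list_suppress (email_record_id);
ANALYZE temp_list_suppress;

INSERT INTO er_banned
	SELECT er.*
	FROM ONLY email_record er
	JOIN temp_list_suppress ts USING (email_record_id);

-- DELETE ... USING (available in 8.1) avoids the IN subquery
DELETE FROM ONLY email_record
	USING temp_list_suppress ts
	WHERE email_record.email_record_id = ts.email_record_id
	AND email_record.email_list_id = 13;

DROP TABLE temp_list_suppress;
-------------------------------------------------------------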
>
> What prevents you from using an aggregate function?
>

I guess I could actually obtain the results in an aggregate function and use those to
maintain a summary table. There is a web view that requires 'as accurate as possible'
numbers to be queried per group (all 40 groups are displayed on the same page) and so
constant aggregates over the entire table would be a nightmare.

> Probably not 2x, but better performance than now. You probably don't
> want RAID 1; depending on your setup, many list members swear by RAID 10.
> Of course, your setup will depend on how much money you have to burn.
> That said, RAID 1 testing will allow you to determine the upper bounds
> of your hardware. Some folks say they get better performance with WAL
> off the main RAID, some keep it on. Only testing will allow you to
> determine what is optimal.

I will have to try moving WAL off those RAID spindles, I have seen the posts regarding
this.

> In the meantime, you need to identify the
> bottleneck of your operation. You should collect vmstat and iostat
> statistics for your present setup. Good luck!

I have to confess that I am a bit of a novice with vmstat. Below is a sample of my vmstat
output while running two scrubbing queries simultaneously:

machine:/dir# vmstat -S M 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1      4    117     15   2962    0    0   100    25   96   107  2  0 86 11
 0  3      4    117     15   2962    0    0  4884  1860  415   841 18  1 52 29
 1  1      4    115     15   2964    0    0  2246  1222  462   394  8  0 51 41
 0  2      4    114     14   2967    0    0  3932  2238  485   613 12  0 62 25
 1  1      4    115     13   2966    0    0  3004  1684  507   609  8  0 60 31
 0  3      4    116     13   2965    0    0  4688  4000  531   613 15  1 52 33
 1  1      4    117     13   2964    0    0  2890   268  433   441  9  1 58 32
 0  1      4    114     13   2968    0    0  2802  4708  650   501  8  1 64 28
 0  2      4    114     13   2968    0    0  4850  1696  490   574 15  1 57 27
 0  2      4    116     13   2966    0    0  4300  3062  540   520 13  1 61 26
 0  2      4    115     13   2966    0    0  3292  3608  549   455 10  1 65 24
 0  3      4    115     13   2966    0    0  4856  2098  505   564 15  1 59 26
 0  3      4    115     13   2966    0    0  1608  2314  447   413  4  0 63 33
 0  3      4    116     13   2966    0    0  6206  1664  442   649 18  1 52 29
 1  1      4    115     13   2966    0    0  1886  1262  464   412  5  0 60 35
 0  3      4    118     13   2964    0    0  2510  4138  571   493  7  1 64 28
 1  1      4    117     13   2964    0    0  1632    56  325   373  5  0 53 42
 0  3      4    116     13   2965    0    0  5358  3510  504   649 14  1 59 26
 1  1      4    118     13   2964    0    0  2814   920  447   403  8  0 63 29

I know that wa is the time spent waiting on IO, but I lack a benchmark to determine just
what I should expect from my hardware (three 146GB U320 SCSI 10k drives in RAID 5 on a
Dell PERC4ei PE2850 controller). Those drives are dedicated completely to a /data mount
that contains only /data/postgresql/8.1/main. I have another two drives in RAID 1 for
everything else (OS, apps, etc.).

Can you give me any pointers based on that vmstat output?

Regards and Thanks,
Brendan
On Wed, Oct 11, 2006 at 10:53:41AM -0600, Brendan Curran wrote:
> Interestingly, and thank you to Tom and Jim, the explicit JOIN improved
> performance tremendously (RESULTS BELOW). I converted the entire query
> to use explicit joins instead of IN and EXISTS and discovered acceptable
> performance. I think the next place to go from here is RAID 1/RAID 10 and
> possibly partitioning my large table (welcome to DDL insanity, right?).

Remember that partitioning is not a magic bullet: it only helps in cases where you need
to keep a lot of data, but normally only access a small portion of it.

WAL on RAID 5 without a really good controller will probably kill you. Data being there
isn't too much better. You'll probably be better off with either one RAID 10 or two
RAID 1s.

> I have to add that I'm a little surprised the documentation is so
> generous to IN and EXISTS. Is there something amiss in my configuration
> that prevents them from performing correctly? If not, I can't imagine a
> time when IN or EXISTS would be more performant than an explicit JOIN...

Well, IN != EXISTS != JOIN. EXISTS just stops as soon as it finds a record. For some
cases, it's equivalent to IN, but not all. IN has to de-duplicate its list in some
fashion. For small IN lists, you can do this with an OR, but at some point you need to
switch to an actual unique (actually, I suspect the difference in PostgreSQL just depends
on whether you passed values into IN or a subquery). A join, on the other hand, doesn't
worry about duplicates at all. There may be some brains in the planner that realize if a
subquery will return a unique set (ie: you're querying on a primary key).

> Additionally, I manually scrub for duplicates at the group level in the
> email_record table to keep my records unique. I would like to use a
> unique constraint, but have found that batching in JDBC is impossible
> due to irrecoverable errors even when using BEFORE INSERT triggers to
> just return NULL if a record exists already. Has anyone got an elegant
> solution for the 'add only if not exists already' problem similar to
> MSSQL's MERGE command?

Your best bet (until we have something akin to MERGE, hopefully in 8.3) is to load the
data into a TEMP table and de-dupe it from there. Depending on what you're doing you
might want to delete it, or update an ID column in the temp table. Note that assumes that
only one process is loading data at any time; if that's not the case you have to get
trickier.

> Just one more thing... I have found that maintaining a btree index on a
> varchar(255) value is extremely expensive on insert/update/delete. It is
> unfortunately necessary for me to maintain this index for queries and
> reports so I am transitioning to using an unindexed staging table to
> import data into before merging it with the larger table. All the docs
> and posts recommend is to drop the index, import your data, and then
> create the index again. This is untenable on a daily / bi-weekly basis.
> Is there a more elegant solution to this indexing problem?

You might be happier with tsearch than a regular index.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
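(A minimal sketch of the staging-table approach Jim describes, assuming a single loader
process and the column names from the original post; the "staging" table name and the
COPY path are hypothetical placeholders, and this is untested against the poster's
schema:)

-------------------------------------------------------------
-- Load the incoming batch into an unindexed temp table first
-- (via COPY from the server, \copy from psql, or a JDBC batch)
CREATE TEMP TABLE staging (email_record_id int8, email varchar(255), email_list_id int8);
COPY staging FROM '/path/to/batch.dat';   -- hypothetical path

-- De-dupe within the batch and against the main table, then insert in one pass.
-- DISTINCT ON picks an arbitrary row per (email, email_list_id) pair.
-- Without a unique constraint this is only safe if one loader runs at a time.
INSERT INTO email_record (email_record_id, email, email_list_id)
	SELECT DISTINCT ON (s.email, s.email_list_id)
	       s.email_record_id, s.email, s.email_list_id
	FROM staging s
	WHERE NOT EXISTS (
		SELECT 1 FROM email_record er
		WHERE er.email = s.email
		AND er.email_list_id = s.email_list_id);

DROP TABLE staging;
-------------------------------------------------------------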
Hi, Brendan,

Brendan Curran wrote:
>> What prevents you from using an aggregate function?
>
> I guess I could actually obtain the results in an aggregate function and
> use those to maintain a summary table. There is a web view that requires
> 'as accurate as possible' numbers to be queried per group (all 40 groups
> are displayed on the same page) and so constant aggregates over the
> entire table would be a nightmare.

That sounds just like a case for GROUP BY and a materialized view.

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
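(PostgreSQL 8.1 has no built-in materialized views, but the idea can be hand-rolled as a
summary table refreshed by a GROUP BY. A rough sketch, assuming the web view only needs a
per-group row count; the summary table name is made up:)

-------------------------------------------------------------
CREATE TABLE email_record_summary (
	email_list_id int8 PRIMARY KEY,
	record_count  int8 NOT NULL
);

-- Refresh after each load/scrub, e.g. from the same function that does the scrub
BEGIN;
DELETE FROM email_record_summary;
INSERT INTO email_record_summary (email_list_id, record_count)
	SELECT email_list_id, count(*)
	FROM ONLY email_record
	GROUP BY email_list_id;
COMMIT;
-------------------------------------------------------------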
> Well, IN != EXISTS != JOIN. EXISTS just stops as soon as it finds a
> record. For some cases, it's equivalent to IN, but not all. IN has to
> de-duplicate its list in some fashion. For small IN lists, you can do
> this with an OR, but at some point you need to switch to an actual
> unique (actually, I suspect the difference in PostgreSQL just depends
> on whether you passed values into IN or a subquery). A join, on the
> other hand, doesn't worry about duplicates at all. There may be some
> brains in the planner that realize if a subquery will return a unique
> set (ie: you're querying on a primary key).

I agree, and it makes sense now that I consider it that IN would force the planner to
implement some form of unique check - possibly leveraging a PK or unique index if one is
already available. Maybe I'll tack up a note to the online documentation letting people
know, so that it's a little more explicitly clear that when you choose IN on data that
isn't explicitly unique (to the planner, i.e. post-analyze) you get the baggage of a
forced unique whether you need it or not. Or perhaps someone that knows the internals of
the planner a little better than me should put some info up regarding that?

>> Just one more thing... I have found that maintaining a btree index on a
>> varchar(255) value is extremely expensive on insert/update/delete. It is
>> unfortunately necessary for me to maintain this index for queries and
>> reports so I am transitioning to using an unindexed staging table to
>> import data into before merging it with the larger table. All the docs
>> and posts recommend is to drop the index, import your data, and then
>> create the index again. This is untenable on a daily / bi-weekly basis.
>> Is there a more elegant solution to this indexing problem?
>
> You might be happier with tsearch than a regular index.

Thanks, I'll look into using tsearch2 as a possibility. From what I've seen so far it
would add quite a bit of complexity (necessary updates after inserts, proprietary query
syntax that might require a large amount of specialization from client apps) but in the
end the overhead may be less than that of maintaining the btree.

Thanks and Regards,
B
Brendan Curran <brendan.curran@gmail.com> writes:
> I'll tack up a note to the online documentation letting people know so
> that it's a little more explicitly clear that when you choose IN on
> data that isn't explicitly unique (to the planner i.e. post-analyze)
> you get the baggage of a forced unique whether you need it or not. Or
> perhaps someone that knows the internals of the planner a little
> better than me should put some info up regarding that?

You get a forced unique step, period --- the planner doesn't try to shortcut on the basis
of noticing a relevant unique constraint. We have some plan techniques that might look
like they are not checking uniqueness (eg, an "IN Join") but they really are.

This is an example of what I was talking about just a minute ago, about not wanting to
rely on constraints that could go away while the plan is still potentially usable. It's
certainly something that we should look at adding as soon as the plan-invalidation
infrastructure is there to make it safe to do.

			regards, tom lane