Re: Non Matching Records in Two Tables - Mailing list pgsql-sql
From | Ken Hill |
---|---|
Subject | Re: Non Matching Records in Two Tables |
Date | |
Msg-id | 1139950507.3083.64.camel@localhost.localdomain Whole thread Raw |
In response to | Re: Non Matching Records in Two Tables (Markus Schaber <schabi@logix-tt.com>) |
Responses |
Re: Non Matching Records in Two Tables
Re: Non Matching Records in Two Tables |
List | pgsql-sql |
On Wed, 2006-02-08 at 22:31 +0100, Markus Schaber wrote: <blockquote type="CITE"><pre> <font color="#000000">Hi, Ken,</font> <font color="#000000">Ken Hill schrieb:</font> <font color="#000000">> I need some help with a bit of SQL. I have two tables. I want to find</font> <font color="#000000">> records in one table that don't match records in another table based on</font> <font color="#000000">> a common column in the two tables. Both tables have a column named</font> <font color="#000000">> 'key100'. I was trying something like:</font> <font color="#000000">> </font> <font color="#000000">> SELECT count(*)</font> <font color="#000000">> FROM table1, table2</font> <font color="#000000">> WHERE (table1.key100 != table2.key100);</font> <font color="#000000">> </font> <font color="#000000">> But the query is very slow and I finally just cancel it. Any help is</font> <font color="#000000">> very much appreciated.</font> <font color="#000000">Do you have indices on the key100 columns? Is autovacuum running, or do</font> <font color="#000000">you do analyze manually?</font> <font color="#000000">Can you send us the output from "EXPLAIN ANALYZE [your query]"?</font> <font color="#000000">Btw, I don't think this query will do what you wanted, it basically</font> <font color="#000000">creates a cross product, that means if your tables look like:</font> <font color="#000000">schabitest=# select * from table1;</font> <font color="#000000"> key100 | valuea | valueb</font> <font color="#000000">--------+--------+--------</font> <font color="#000000"> 1 | foo | bar</font> <font color="#000000"> 2 | blah | blubb</font> <font color="#000000"> 3 | manga | mungo</font> <font color="#000000">schabitest=# select * from table2;</font> <font color="#000000"> key100 | valuec | valued</font> <font color="#000000">--------+--------+--------</font> <font color="#000000"> 1 | monday | euro</font> <font color="#000000"> 2 | sunday | dollar</font> <font color="#000000"> 4 | friday | pounds</font> <font color="#000000">Then your query will produce something like:</font> <font color="#000000">schabitest=# select * from table1, table2 WHERE (table1.key100 !=</font> <font color="#000000">table2.key100);</font> <font color="#000000"> key100 | valuea | valueb | key100 | valuec | valued</font> <font color="#000000">--------+--------+--------+--------+--------+--------</font> <font color="#000000"> 1 | foo | bar | 2 | sunday | dollar</font> <font color="#000000"> 1 | foo | bar | 4 | friday | pounds</font> <font color="#000000"> 2 | blah | blubb | 1 | monday | euro</font> <font color="#000000"> 2 | blah | blubb | 4 | friday | pounds</font> <font color="#000000"> 3 | manga | mungo | 1 | monday | euro</font> <font color="#000000"> 3 | manga | mungo | 2 | sunday | dollar</font> <font color="#000000"> 3 | manga | mungo | 4 | friday | pounds</font> <font color="#000000">I suggest you would like to have all records from table1 that don't have</font> <font color="#000000">a corresponding record in table2:</font> <font color="#000000">schabitest=# select * from table1 where table1.key100 not in (select</font> <font color="#000000">key100 from table2);</font> <font color="#000000"> key100 | valuea | valueb</font> <font color="#000000">--------+--------+--------</font> <font color="#000000"> 3 | manga | mungo</font> <font color="#000000">HTH,</font> <font color="#000000">Markus</font> </pre></blockquote> Here is my query SQL:<br /><br /> SELECT key100 FROM ncccr10<br /> WHERE ncccr10.key100 NOT IN (SELECTkey100 FROM ncccr9);<br /><br /> It is is running after 30 minutes. Here is the query plan:<br /><br /> QUERY PLAN<br /> -------------------------------------------------------------------------<br/> Seq Scan on ncccr10 (cost=0.00..20417160510.08rows=305782 width=104)<br /> Filter: (NOT (subplan))<br /> SubPlan<br /> -> SeqScan on ncccr9 (cost=0.00..65533.71 rows=494471 width=104)<br /> (4 rows)<br /><br /> Any ideas why it is so slow?