Re: Non Matching Records in Two Tables - Mailing list pgsql-sql

From Ken Hill
Subject Re: Non Matching Records in Two Tables
Date
Msg-id 1139950507.3083.64.camel@localhost.localdomain
Whole thread Raw
In response to Re: Non Matching Records in Two Tables  (Markus Schaber <schabi@logix-tt.com>)
Responses Re: Non Matching Records in Two Tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Non Matching Records in Two Tables  (chester c young <chestercyoung@yahoo.com>)
List pgsql-sql
On Wed, 2006-02-08 at 22:31 +0100, Markus Schaber wrote: <blockquote type="CITE"><pre>
<font color="#000000">Hi, Ken,</font>

<font color="#000000">Ken Hill schrieb:</font>
<font color="#000000">> I need some help with a bit of SQL. I have two tables. I want to find</font>
<font color="#000000">> records in one table that don't match records in another table based on</font>
<font color="#000000">> a common column in the two tables. Both tables have a column named</font>
<font color="#000000">> 'key100'. I was trying something like:</font>
<font color="#000000">> </font>
<font color="#000000">> SELECT count(*)</font>
<font color="#000000">> FROM table1, table2</font>
<font color="#000000">> WHERE (table1.key100 != table2.key100);</font>
<font color="#000000">> </font>
<font color="#000000">> But the query is very slow and I finally just cancel it. Any help is</font>
<font color="#000000">> very much appreciated.</font>

<font color="#000000">Do you have indices on the key100 columns? Is autovacuum running, or do</font>
<font color="#000000">you do analyze manually?</font>

<font color="#000000">Can you send us the output from "EXPLAIN ANALYZE [your query]"?</font>

<font color="#000000">Btw, I don't think this query will do what you wanted, it basically</font>
<font color="#000000">creates a cross product, that means if your tables look like:</font>

<font color="#000000">schabitest=# select * from table1;</font>
<font color="#000000"> key100 | valuea | valueb</font>
<font color="#000000">--------+--------+--------</font>
<font color="#000000">      1 | foo    | bar</font>
<font color="#000000">      2 | blah   | blubb</font>
<font color="#000000">      3 | manga  | mungo</font>

<font color="#000000">schabitest=# select * from table2;</font>
<font color="#000000"> key100 | valuec | valued</font>
<font color="#000000">--------+--------+--------</font>
<font color="#000000">      1 | monday | euro</font>
<font color="#000000">      2 | sunday | dollar</font>
<font color="#000000">      4 | friday | pounds</font>

<font color="#000000">Then your query will produce something like:</font>
<font color="#000000">schabitest=# select * from table1, table2 WHERE (table1.key100 !=</font>
<font color="#000000">table2.key100);</font>
<font color="#000000"> key100 | valuea | valueb | key100 | valuec | valued</font>
<font color="#000000">--------+--------+--------+--------+--------+--------</font>
<font color="#000000">      1 | foo    | bar    |      2 | sunday | dollar</font>
<font color="#000000">      1 | foo    | bar    |      4 | friday | pounds</font>
<font color="#000000">      2 | blah   | blubb  |      1 | monday | euro</font>
<font color="#000000">      2 | blah   | blubb  |      4 | friday | pounds</font>
<font color="#000000">      3 | manga  | mungo  |      1 | monday | euro</font>
<font color="#000000">      3 | manga  | mungo  |      2 | sunday | dollar</font>
<font color="#000000">      3 | manga  | mungo  |      4 | friday | pounds</font>

<font color="#000000">I suggest you would like to have all records from table1 that don't have</font>
<font color="#000000">a corresponding record in table2:</font>

<font color="#000000">schabitest=# select * from table1 where table1.key100 not in (select</font>
<font color="#000000">key100 from table2);</font>
<font color="#000000"> key100 | valuea | valueb</font>
<font color="#000000">--------+--------+--------</font>
<font color="#000000">      3 | manga  | mungo</font>

<font color="#000000">HTH,</font>
<font color="#000000">Markus</font>

</pre></blockquote> Here is my query SQL:<br /><br /> SELECT key100 FROM ncccr10<br /> WHERE ncccr10.key100 NOT IN
(SELECTkey100 FROM ncccr9);<br /><br /> It is is running after 30 minutes. Here is the query plan:<br /><br />
                              QUERY PLAN<br />
-------------------------------------------------------------------------<br/> Seq Scan on ncccr10 
(cost=0.00..20417160510.08rows=305782 width=104)<br />    Filter: (NOT (subplan))<br />    SubPlan<br />      -> 
SeqScan on ncccr9  (cost=0.00..65533.71 rows=494471 width=104)<br /> (4 rows)<br /><br /> Any ideas why it is so slow?  

pgsql-sql by date:

Previous
From: Markus Schaber
Date:
Subject: Re: Slow update SQL
Next
From: Tom Lane
Date:
Subject: Re: Non Matching Records in Two Tables