Thread: RE: [SQL] comparing 2 tables. . .

RE: [SQL] comparing 2 tables. . .

From

Daniel Péder

Date:

24 September 1999, 09:52:07

try this, it should select what in tab1 is not in tab2 ( compared by pkeys )

select pkey.tab1 where pkey.tab1 not in ( pkey.tab2 );

--

dan peder
dpeder@infoset.cz
http://shop.culture.cz

-----Original Message-----
From:    JT Kirkpatrick [SMTP:jt-kirkpatrick@mpsllc.com]

can anyone help please??

we have two tables in our database that *should* have the exact same number 
of records, with identical values in their corresponding primary keys (pk 
same in both tables, int4).  for this example, table t1 with primary key 
pk, and table t2 with primary key pk.  after about 3 months of use and a 
crash last night i see that the two tables do NOT have the same number of 
records -- one is missing some that it should have.  is there a way i can 
compare the two tables and select ONLY the records that do NOT exist in the 
2nd table??  if i could identify the records (and the pk) then i could 
manually insert them.  each table has over 50,000 records so it is not 
feasable to review each record to find them manually (which i tried last 
night. . .).

we are struggling this morning -- any help you could provide quickly would 
be greatly appreciated!

jt kirkpatrick / mps

************

RE: [SQL] comparing 2 tables. . .

From

Stuart Rison

Date:

24 September 1999, 10:30:14

On Fri, 24 Sep 1999, [iso-8859-2] Daniel P�der wrote:

> try this, it should select what in tab1 is not in tab2 ( compared by pkeys )
> 
> select pkey.tab1 where pkey.tab1 not in ( pkey.tab2 );

erm... unless the 'pkey' is an operator of some kind in PG 6.5.x that I'm 
completely unaware of, I don't think that will work.

Try:

SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT
tab2.pkey FROM <your_other_table> tab2.pkey);

or else (faster):

SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM
<your_other_table> tab2 WHERE tab1.pkey=tab2.pkey);

Finally, if you have version 6.5.x you could use MINUS:

SELECT tab1.pkey FROM <your_table> tab1
MINUS
SELECT tab2.pkey FROM <your_other_table> tab2;

Can't vouch for the last one as I only have PG 6.4.2.

HTH,

S. 


> 
> -----Original Message-----
> From:    JT Kirkpatrick [SMTP:jt-kirkpatrick@mpsllc.com]
> 
> can anyone help please??
> 
> we have two tables in our database that *should* have the exact same number 
> of records, with identical values in their corresponding primary keys (pk 
> same in both tables, int4).  for this example, table t1 with primary key 
> pk, and table t2 with primary key pk.  after about 3 months of use and a 
> crash last night i see that the two tables do NOT have the same number of 
> records -- one is missing some that it should have.  is there a way i can 
> compare the two tables and select ONLY the records that do NOT exist in the 
> 2nd table??  if i could identify the records (and the pk) then i could 
> manually insert them.  each table has over 50,000 records so it is not 
> feasable to review each record to find them manually (which i tried last 
> night. . .).
> 
> we are struggling this morning -- any help you could provide quickly would 
> be greatly appreciated!
> 
> jt kirkpatrick / mps

Stuart C. G. Rison
Department of Biochemistry and Molecular Biology
6th floor, Darwin Building, University College London (UCL)
Gower Street, London, WC1E 6BT, United Kingdom
Tel. 0207 504 2303, Fax. 0207 380 7033
e-mail: rison@biochem.ucl.ac.uk

Re: [SQL] comparing 2 tables. . .

From

"Albert REINER"

Date:

24 September 1999, 16:52:31

On Fri, Sep 24, 1999 at 03:32:51PM +0100, Stuart Rison wrote:
...
> Try:
> 
> SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT
> tab2.pkey FROM <your_other_table> tab2.pkey);
> 
> or else (faster):
> 
> SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM
> <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey);

Maybe this is trivial, but...

Is there a simple way to see that/why the last query is faster?

Albert.

-- 

--------------------------------------------------------------------------- Post an / Mail to / Skribu al: Albert
Reiner<areiner@tph.tuwien.ac.at>
 
---------------------------------------------------------------------------

Re: [SQL] comparing 2 tables. . .

From

Engard Ferenc

Date:

25 September 1999, 09:59:28

On Fri, 24 Sep 1999, Albert REINER wrote:

>On Fri, Sep 24, 1999 at 03:32:51PM +0100, Stuart Rison wrote:
>...
>> Try:
>> 
>> SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT
>> tab2.pkey FROM <your_other_table> tab2.pkey);
>> 
>> or else (faster):
>> 
>> SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM
>> <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey);
>
>Maybe this is trivial, but...
>
>Is there a simple way to see that/why the last query is faster?

Yes, the simple way is to look at the thread 'subselect performance'
a few days ago... :-))  In essence, I don't know... ;)  Something
about that the subselect's result always recomputed for each row in
the main select or what, if I got the idea. (And in the 2nd case the
subselect has only one row, provided that the primary keys are
unique.)

Circum

PS: Anyway, as I can figure it out, this is that 'EXISTS' workaround
that you mentioned in that thread, right?
__  @
/  \    _   _                                           Engárd Ferenc
l    | ( \ /  | | (\/)                      mailto:s-fery@kkt.sote.hu
\__/ | |   \_ \_/ I  I                    http://pons.sote.hu/~s-fery

Re: [SQL] comparing 2 tables. . .

From

Stuart Rison

Date:

27 September 1999, 06:55:11

Well, I guess officially you're supposed to be able to decode an EXPLAIN
of each statement but I'm not quite that savvy (yet).

With regards to this specific examples, I seem to remember a number of
previous postings that say that (NOT) EXISTS type statements are much
faster that (NOT) IN (SELECT...) type statements (because that kind of
sub-select is currently sub-optimally implemented in PG).

regards,

Stuart.

On Fri, 24 Sep 1999, Albert REINER wrote:

> On Fri, Sep 24, 1999 at 03:32:51PM +0100, Stuart Rison wrote:
> ...
> > Try:
> > 
> > SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT
> > tab2.pkey FROM <your_other_table> tab2.pkey);
> > 
> > or else (faster):
> > 
> > SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM
> > <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey);
> 
> Maybe this is trivial, but...
> 
> Is there a simple way to see that/why the last query is faster?
> 
> Albert.
> 
> -- 
> 
> ---------------------------------------------------------------------------
>   Post an / Mail to / Skribu al: Albert Reiner <areiner@tph.tuwien.ac.at>
> ---------------------------------------------------------------------------
> 
> ************
> 

Stuart C. G. Rison
Department of Biochemistry and Molecular Biology
6th floor, Darwin Building, University College London (UCL)
Gower Street, London, WC1E 6BT, United Kingdom
Tel. 0207 504 2303, Fax. 0207 380 7033
e-mail: rison@biochem.ucl.ac.uk