Thread: RE: [SQL] comparing 2 tables. . .
try this, it should select what in tab1 is not in tab2 ( compared by pkeys ) select pkey.tab1 where pkey.tab1 not in ( pkey.tab2 ); -- dan peder dpeder@infoset.cz http://shop.culture.cz -----Original Message----- From: JT Kirkpatrick [SMTP:jt-kirkpatrick@mpsllc.com] can anyone help please?? we have two tables in our database that *should* have the exact same number of records, with identical values in their corresponding primary keys (pk same in both tables, int4). for this example, table t1 with primary key pk, and table t2 with primary key pk. after about 3 months of use and a crash last night i see that the two tables do NOT have the same number of records -- one is missing some that it should have. is there a way i can compare the two tables and select ONLY the records that do NOT exist in the 2nd table?? if i could identify the records (and the pk) then i could manually insert them. each table has over 50,000 records so it is not feasable to review each record to find them manually (which i tried last night. . .). we are struggling this morning -- any help you could provide quickly would be greatly appreciated! jt kirkpatrick / mps ************
On Fri, 24 Sep 1999, [iso-8859-2] Daniel P�der wrote: > try this, it should select what in tab1 is not in tab2 ( compared by pkeys ) > > select pkey.tab1 where pkey.tab1 not in ( pkey.tab2 ); erm... unless the 'pkey' is an operator of some kind in PG 6.5.x that I'm completely unaware of, I don't think that will work. Try: SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT tab2.pkey FROM <your_other_table> tab2.pkey); or else (faster): SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey); Finally, if you have version 6.5.x you could use MINUS: SELECT tab1.pkey FROM <your_table> tab1 MINUS SELECT tab2.pkey FROM <your_other_table> tab2; Can't vouch for the last one as I only have PG 6.4.2. HTH, S. > > -----Original Message----- > From: JT Kirkpatrick [SMTP:jt-kirkpatrick@mpsllc.com] > > can anyone help please?? > > we have two tables in our database that *should* have the exact same number > of records, with identical values in their corresponding primary keys (pk > same in both tables, int4). for this example, table t1 with primary key > pk, and table t2 with primary key pk. after about 3 months of use and a > crash last night i see that the two tables do NOT have the same number of > records -- one is missing some that it should have. is there a way i can > compare the two tables and select ONLY the records that do NOT exist in the > 2nd table?? if i could identify the records (and the pk) then i could > manually insert them. each table has over 50,000 records so it is not > feasable to review each record to find them manually (which i tried last > night. . .). > > we are struggling this morning -- any help you could provide quickly would > be greatly appreciated! > > jt kirkpatrick / mps Stuart C. G. Rison Department of Biochemistry and Molecular Biology 6th floor, Darwin Building, University College London (UCL) Gower Street, London, WC1E 6BT, United Kingdom Tel. 0207 504 2303, Fax. 0207 380 7033 e-mail: rison@biochem.ucl.ac.uk
On Fri, Sep 24, 1999 at 03:32:51PM +0100, Stuart Rison wrote: ... > Try: > > SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT > tab2.pkey FROM <your_other_table> tab2.pkey); > > or else (faster): > > SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM > <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey); Maybe this is trivial, but... Is there a simple way to see that/why the last query is faster? Albert. -- --------------------------------------------------------------------------- Post an / Mail to / Skribu al: Albert Reiner<areiner@tph.tuwien.ac.at> ---------------------------------------------------------------------------
On Fri, 24 Sep 1999, Albert REINER wrote: >On Fri, Sep 24, 1999 at 03:32:51PM +0100, Stuart Rison wrote: >... >> Try: >> >> SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT >> tab2.pkey FROM <your_other_table> tab2.pkey); >> >> or else (faster): >> >> SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM >> <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey); > >Maybe this is trivial, but... > >Is there a simple way to see that/why the last query is faster? Yes, the simple way is to look at the thread 'subselect performance' a few days ago... :-)) In essence, I don't know... ;) Something about that the subselect's result always recomputed for each row in the main select or what, if I got the idea. (And in the 2nd case the subselect has only one row, provided that the primary keys are unique.) Circum PS: Anyway, as I can figure it out, this is that 'EXISTS' workaround that you mentioned in that thread, right? __ @ / \ _ _ Engárd Ferenc l | ( \ / | | (\/) mailto:s-fery@kkt.sote.hu \__/ | | \_ \_/ I I http://pons.sote.hu/~s-fery
Well, I guess officially you're supposed to be able to decode an EXPLAIN of each statement but I'm not quite that savvy (yet). With regards to this specific examples, I seem to remember a number of previous postings that say that (NOT) EXISTS type statements are much faster that (NOT) IN (SELECT...) type statements (because that kind of sub-select is currently sub-optimally implemented in PG). regards, Stuart. On Fri, 24 Sep 1999, Albert REINER wrote: > On Fri, Sep 24, 1999 at 03:32:51PM +0100, Stuart Rison wrote: > ... > > Try: > > > > SELECT tab1.pkey FROM <your_table> tab1 WHERE tab1.pkey NOT IN (SELECT > > tab2.pkey FROM <your_other_table> tab2.pkey); > > > > or else (faster): > > > > SELECT tab1.pkey FROM <your_table> tab1 WHERE NOT EXISTS (SELECT 1 FROM > > <your_other_table> tab2 WHERE tab1.pkey=tab2.pkey); > > Maybe this is trivial, but... > > Is there a simple way to see that/why the last query is faster? > > Albert. > > -- > > --------------------------------------------------------------------------- > Post an / Mail to / Skribu al: Albert Reiner <areiner@tph.tuwien.ac.at> > --------------------------------------------------------------------------- > > ************ > Stuart C. G. Rison Department of Biochemistry and Molecular Biology 6th floor, Darwin Building, University College London (UCL) Gower Street, London, WC1E 6BT, United Kingdom Tel. 0207 504 2303, Fax. 0207 380 7033 e-mail: rison@biochem.ucl.ac.uk