Thread: How to find entries missing in 2nd table?
Hi,I realize I probably lost my marbles but I've been having a god awful time with a single query: control:........controller_id pk; datapack: controller_id fk;............ I need to get all entries from the table control that are not listed in datapack. Thanks, Alex
> Hi, > I realize I probably lost my marbles but I've been having a god > awful time with a single query: > > control: > .... > .... > controller_id pk; > > > datapack: > > controller_id fk; > .... > .... > .... > > I need to get all entries from the table control that are not listed in > datapack. SELECT C.CONTROLLER_ID FROM CONTROL AS C LEFT JOIN DATAPACK AS D ON ( C.CONTROLLER_ID = D.CONTROLLER_ID) WHERE D.CONTROLLER_ID IS NULL; Regards, Richard Broersma Jr.
On Tue, 2006-07-11 at 09:19, alex-lists-pgsql@yuriev.com wrote: > Hi, > I realize I probably lost my marbles but I've been having a god > awful time with a single query: > > control: > .... > .... > controller_id pk; > > > datapack: > > controller_id fk; > .... > .... > .... > > I need to get all entries from the table control that are not listed in > datapack. OK, this is a pretty common problem. Easy solution is left join / null. select t1.id from table1 t1 left join table table2 t2 on (t1.id=t2.id) where t2.id is null since a left join gives us all rows from the left table, and nulls where the right table doesn't match up, and we're asking for all the rows where t2.id is null, we'll get all the rows in t1 that don't have a match in t2. cool, huh?
Richard Broersma Jr написа: >> Hi, >> I realize I probably lost my marbles but I've been having a god >> awful time with a single query: >> >> control: >> .... >> .... >> controller_id pk; >> >> >> datapack: >> >> controller_id fk; >> .... >> .... >> .... >> >> I need to get all entries from the table control that are not listed in >> datapack. > > SELECT C.CONTROLLER_ID > > FROM CONTROL AS C > LEFT JOIN DATAPACK AS D ON ( C.CONTROLLER_ID = D.CONTROLLER_ID) > > WHERE D.CONTROLLER_ID IS NULL; > Or (SELECT controller_id FROM control) EXCEPT (SELECT controller_id FROM datapack) ? -- Milen A. Radev
At 10:19 AM 7/11/06, alex-lists-pgsql@yuriev.com wrote: >control: > .... > controller_id pk; > >datapack: > > controller_id fk; > >I need to get all entries from the table control that are not listed in >datapack. select controller.controller_id from controller left join datapack on controller.controller_id = datapack.controller_id where datapack.controller_id is null;
--- Richard Broersma Jr <rabroersma@yahoo.com> wrote: > > >> I need to get all entries from the table control that are not listed in > > >> datapack. > > > > > > SELECT C.CONTROLLER_ID > > > > > > FROM CONTROL AS C > > > LEFT JOIN DATAPACK AS D ON ( C.CONTROLLER_ID = D.CONTROLLER_ID) > > > > > > WHERE D.CONTROLLER_ID IS NULL; > > > > > > > > > Or > > (SELECT controller_id FROM control) > > EXCEPT > > Good point! But don't forget to include the list. :-) > > Regards, > > Richard Broersma Jr. > > (SELECT controller_id FROM datapack) > > ? > > > > -- > > Milen A. Radev > > > >
Hi, what about SELECT controller_id FROM control WHERE controller_id NOT IN (SELECT DISTINCT controller_id FROM datapack); ? Regards Peter > -----Ursprüngliche Nachricht----- > Von: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org] Im Auftrag von > Richard Broersma Jr > Gesendet: Dienstag, 11. Juli 2006 19:04 > An: SQL Postgresql List > Betreff: Re: [SQL] How to find entries missing in 2nd table? > > > > --- Richard Broersma Jr <rabroersma@yahoo.com> wrote: > > > > >> I need to get all entries from the table control that are not > > > >> listed in datapack. > > > > > > > > SELECT C.CONTROLLER_ID > > > > > > > > FROM CONTROL AS C > > > > LEFT JOIN DATAPACK AS D ON ( C.CONTROLLER_ID = > D.CONTROLLER_ID) > > > > > > > > WHERE D.CONTROLLER_ID IS NULL; > > > > > > > > > > > > > Or > > > (SELECT controller_id FROM control) > > > EXCEPT > > > > Good point! But don't forget to include the list. :-) > > > > Regards, > > > > Richard Broersma Jr. > > > (SELECT controller_id FROM datapack) ? > > > > > > -- > > > Milen A. Radev > > > > > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
On Wed, 2006-07-12 at 03:06, Exner, Peter wrote: > Hi, > > what about > > SELECT controller_id FROM control > WHERE controller_id NOT IN > (SELECT DISTINCT controller_id FROM datapack); That one works too, but it's generally not as fast as the left join / is null query on large tables. Give the two a try on a test set and you should see the speed difference.
> > SELECT controller_id FROM control > > WHERE controller_id NOT IN > > (SELECT DISTINCT controller_id FROM datapack); > The DISTINCT is not necessary. I have heard with Oracle that DISTINCT is a > huge performance problem. Is that true on PostgreSQL also? From my experience, it does not preform as well as the standard group by clause. I noticed a ~20% increase in query run times. Regards, Richard Broersma Jr.
On 7/13/06, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
So in that case this would be better:
SELECT controller_id FROM control
WHERE controller_id NOT IN
(SELECT controller_id FROM datapack);
or
SELECT controller_id FROM control
WHERE controller_id NOT IN
(SELECT controller_id FROM datapack GROUP BY controller_id);
Guess you need to do some explain plans to see which would be best.
Good luck!
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================
> > SELECT controller_id FROM control
> > WHERE controller_id NOT IN
> > (SELECT DISTINCT controller_id FROM datapack);
> The DISTINCT is not necessary. I have heard with Oracle that DISTINCT is a
> huge performance problem. Is that true on PostgreSQL also?
From my experience, it does not preform as well as the standard group by clause. I noticed a ~20%
increase in query run times.
So in that case this would be better:
SELECT controller_id FROM control
WHERE controller_id NOT IN
(SELECT controller_id FROM datapack);
or
SELECT controller_id FROM control
WHERE controller_id NOT IN
(SELECT controller_id FROM datapack GROUP BY controller_id);
Guess you need to do some explain plans to see which would be best.
Good luck!
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================
On 7/12/06, Exner, Peter <Exner@his.de> wrote:
The DISTINCT is not necessary. I have heard with Oracle that DISTINCT is a huge performance problem. Is that true on PostgreSQL also?
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================
Hi,
what about
SELECT controller_id FROM control
WHERE controller_id NOT IN
(SELECT DISTINCT controller_id FROM datapack);
The DISTINCT is not necessary. I have heard with Oracle that DISTINCT is a huge performance problem. Is that true on PostgreSQL also?
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================
> > > > SELECT controller_id FROM control > > > > WHERE controller_id NOT IN > > > > (SELECT DISTINCT controller_id FROM datapack); > > > The DISTINCT is not necessary. I have heard with Oracle that DISTINCT > > is a > > > huge performance problem. Is that true on PostgreSQL also? > > > > From my experience, it does not preform as well as the standard group by > > clause. I noticed a ~20% > > increase in query run times. > > > > So in that case this would be better: > > SELECT controller_id FROM control > WHERE controller_id NOT IN > (SELECT controller_id FROM datapack); > > or > > SELECT controller_id FROM control > WHERE controller_id NOT IN > (SELECT controller_id FROM datapack GROUP BY controller_id); Well in this case, the group by or distinct is simple not needed for the query to preform correctly. The additional group by clause in the second query could cause it to preform additional processing which "may" cause it to preform slower. Regards, Richard Broersma Jr.
<div class="Section1"><p class="MsoNormal"><span class="EmailStyle18"><font color="navy" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial">Ihave received good performance with the following:</span></font></span><pclass="MsoNormal"><span class="EmailStyle18"><font color="navy" face="Arial" size="2"><spanstyle="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial"> </span></font></span><p class="MsoNormal"><spanclass="EmailStyle18"><font color="navy" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial">selectc.controller</span></font></span><p class="MsoNormal"><spanclass="EmailStyle18"><font color="navy" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial"><spanstyle="mso-spacerun: yes"> </span>from controlc</span></font></span><p class="MsoNormal"><span class="EmailStyle18"><font color="navy" face="Arial" size="2"><spanstyle="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial"><span style="mso-spacerun: yes"> </span>left outer join datapack d on d.controller_id = c.controller_id</span></font></span><p class="MsoNormal"><spanclass="EmailStyle18"><font color="navy" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial"><spanstyle="mso-spacerun: yes"> </span>where d.controller_idis null</span></font></span><p class="MsoNormal"><span class="EmailStyle18"><font color="navy" face="Arial"size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial"> </span></font></span><pclass="MsoNormal"><span class="EmailStyle18"><fontcolor="navy" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial">NateTeller</span></font></span><p class="MsoNormal"><spanclass="EmailStyle18"><font color="navy" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family:Arial"> </span></font></span><pclass="MsoNormal" style="margin-left:.5in"><fontcolor="black" face="Tahoma" size="2"><span style="font-size:10.0pt;font-family:Tahoma;color:black">-----OriginalMessage-----<br /><b><span style="font-weight:bold">From:</span></b>pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]<b><span style="font-weight:bold">OnBehalf Of </span></b>Aaron Bono<br /><b><span style="font-weight:bold">Sent:</span></b> Thursday,July 13, 2006 11:46 AM<br /><b><span style="font-weight:bold">To:</span></b> Exner, Peter<br /><b><span style="font-weight:bold">Cc:</span></b>Richard Broersma Jr; SQL Postgresql List<br /><b><span style="font-weight:bold">Subject:</span></b>Re: [SQL] How to find entries missing in 2nd table?</span></font><p class="MsoNormal"style="margin-left:.5in"><font face="Times New Roman" size="3"><span style="font-size:12.0pt"> </span></font><pclass="MsoNormal" style="margin-left:.5in"><font color="black" face="Times NewRoman" size="3"><span style="font-size:12.0pt;color:black">On 7/12/06, <b><span style="font-weight:bold">Exner, Peter</span></b><<a href="mailto:Exner@his.de">Exner@his.de</a>> wrote:</span></font><font color="black"><span style="color:black;mso-color-alt:windowtext"></span></font><divstyle="border:none;border-left:solid #CCCCCC .75pt;padding:0in0in 0in 6.0pt"><p class="MsoNormal" style="margin-left:40.8pt;border:none;mso-border-left-alt: solid #CCCCCC .75pt;padding:0in;mso-padding-alt:0in 0in 0in 6.0pt"><font color="black" face="Times New Roman" size="3"><spanstyle="font-size:12.0pt; color:black">Hi,<br /><br /> what about<br /><br /> SELECT controller_id FROM control<br /> WHERE controller_id NOT IN<br/> (SELECT DISTINCT controller_id FROM datapack);</span></font><font color="black"><span style="color:black;mso-color-alt:windowtext"></span></font></div><pclass="MsoNormal" style="margin-left:.5in"><font color="black"face="Times New Roman" size="3"><span style="font-size:12.0pt;color:black"><br /> The DISTINCT is not necessary. I have heard with Oracle that DISTINCT is a huge performance problem. Is that true on PostgreSQL also? <br /><br/> ==================================================================<br /> Aaron Bono<br /> Aranya Software Technologies,Inc.<br /> <a href="http://www.aranya.com">http://www.aranya.com</a><br /> ==================================================================</span></font><font color="black"><span style="color:black;mso-color-alt:windowtext"></span></font></div>