Thread: Proper Join and check
If I have a one to many relation set on sym in each table is like... say this: CREATE TABLE "tableone" ( "sym" varchar(5) NOT NULL, PRIMARY KEY ("sym") ); and CREATE TABLE "tablemany" ( "id" int4 NOT NULL DEFAULT nextval('res_id_seq'::text), "sym" varchar(5) REFERENCES tableone ON UPDATE CASCADE, PRIMARY KEY ("id") ); Now these tables hold a lot of data so I want to do this correctly. How do I find the records in tableone that don't have any corresponding records in tablemany? In my application this shouldn't happen accept during the initial inserting in tableone which briefly is followed by inserting in tablemany.
Billy G. Allie wrote: ----------------->>>> The query you are looking for is: select a.sym from tableone a where a.sym not in (select b.sym from tablemany b where b.sym = a.sym); This query will run MUCH faster if you create a secondary index for tablemany (in fact, don't even try it without the index for any significant number or rows): create index tablemany_sym on tablemany(sym); A better solution though, would be to use triggers to update tablemany when tableone is updated, or to wrap both updates in a single transaction so that the tableone transaction can be rolled back if tablemany is not updated. Not knowing what you are attempting to do and how the updates occur prevents me from going into any more detail than this. ----------------->>>> Oh yeah it helped... Thanks the query took like 4 seconds. I'm not sure I totally understand the b.sym = a.sym though. I was trying to not bore anyone with a in depth description. But since you asked... There is company data in each record of the tableone, and daily inputs of accounting data in tablemany. Now when a company gets added to table one it's because it's YTD sales went above a certain point. So a record gets inserted into tableone, and then the accounting data gets inserted into tablemany sequentially in that order. When any company drops below that certain point it gets flagged and if it stays there for a certain time, it gets dropped from the tableone and tablemany. The only constraint I had set was the "REFERENCES tableone ON UPDATE CASCADE" which adds some continuity, but not much. Hadn't thought about triggers or really anything. Application was working fine until I noticed that there was a sym in tableone that didn't exist in tablemany. Does that describe it good? Again thanks for the index and select query.
Patrick Nelson wrote: > Billy G. Allie wrote: > ----------------->>>> > The query you are looking for is: > > select a.sym from tableone a > where a.sym not in (select b.sym from tablemany b > where b.sym = a.sym); > > This query will run MUCH faster if you create a secondary index for > tablemany (in fact, don't even try it without the index for any significant > number or rows): [. . .] > Oh yeah it helped... Thanks the query took like 4 seconds. I'm not sure I > totally understand the b.sym = a.sym though. With the b.sym = a.sym, the query optimizer will use an index scan of tablemany. Without it, a sequential scan will be used (very VERY S L O W for any signifi- cant number of rows). Without the b.sym = a.sym, the result of the subselect will be all the rows in tablemany which will have to be scanned to see if the test (not in) succeeds. With the b.sym = a.sym, the result of the subselect will be empty or the value of b.sym (as determined by an index lookup, which is why the index was needed). I hope this clarifies things somewhat. ___________________________________________________________________________ ____ | Billy G. Allie | Domain....: Bill.Allie@mug.org | /| | 7436 Hartwell | MSN.......: B_G_Allie@email.msn.com |-/-|----- | Dearborn, MI 48126| |/ |LLIE | (313) 582-1540 |
Patrick Nelson wrote: > If I have a one to many relation set on sym in each table is like... say > this: > > CREATE TABLE "tableone" ( > "sym" varchar(5) NOT NULL, > PRIMARY KEY ("sym") ); > > and > > CREATE TABLE "tablemany" ( > "id" int4 NOT NULL DEFAULT nextval('res_id_seq'::text), > "sym" varchar(5) REFERENCES tableone ON UPDATE CASCADE, > PRIMARY KEY ("id") ); > > Now these tables hold a lot of data so I want to do this correctly. How do > I find the records in tableone that don't have any corresponding records in > tablemany? In my application this shouldn't happen accept during the > initial inserting in tableone which briefly is followed by inserting in > tablemany. The query you are looking for is: select a.sym from tableone a where a.sym not in (select b.sym from tablemany b where b.sym = a.sym); This query will run MUCH faster if you create a secondary index for tablemany (in fact, don't even try it without the indexfor any significant number or rows): create index tablemany_sym on tablemany(sym); A better solution though, would be to use triggers to update tablemany when tableone is updated, or to wrap both updatesin a single transaction so that the tableone transaction can be rolled back if tablemany is not updated. Not knowing what you are attempting to do and how the updates occur prevents me from going into any more detail than this. I hope this helps. -- ____ | Billy G. Allie | Domain....: Bill.Allie@mug.org | /| | 7436 Hartwell | MSN.......: B_G_Allie@email.msn.com |-/-|----- | Dearborn, MI 48126| |/ |LLIE | (313) 582-1540 |