Thread: Hack around lack of CORRESPONDING BY in EXCEPT?
I'm not sure if this a performance question or a sql question really, but since my primarily peeve here is performance, here goes:
I'm trying to write a query which takes the output of a join and shows me only what the items that are in the main join but not in the subselect of just one of the tables in the join, using EXCEPT.
This is a little complicated, so please bear with me.
I have two tables: an event table that logs random events as they come in, and a tracking table that keeps a state of events it cares about. In this particular case I'm trying to obtain a list of tracking pkeys for related event data that do not correspond to a certain (other) set of event data.
Ideally, here is what I want:
SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM events WHERE event.type = 10)
The problem I have of course is that I get an error regarding trying to use different columns for the two queries in EXCEPT. I'm sure someone will point this out, but the following suggestion will not work:
SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk AND event.type = 10)
That won't work for two reasons... first, there are no matching entries in the tracking table pointing to events where event.type = 10, meaning this query would always return an empty set. And even if there were, I don't want to do the join twice if its not necessary, as the events table is liable to be very large.
The official solution to this I believe would be to just use CORRESPONDING BY, but that's not supported by PG (why exactly, oh why!)
Suggestions, anyone? Thanks in advance,
Lucas.
On Wed, 7 May 2003, Lucas Adamski wrote: > I'm not sure if this a performance question or a sql question really, but > since my primarily peeve here is performance, here goes: > > I'm trying to write a query which takes the output of a join and shows me > only what the items that are in the main join but not in the subselect of > just one of the tables in the join, using EXCEPT. > > This is a little complicated, so please bear with me. > > I have two tables: an event table that logs random events as they come in, > and a tracking table that keeps a state of events it cares about. In this > particular case I'm trying to obtain a list of tracking pkeys for related > event data that do not correspond to a certain (other) set of event data. > > Ideally, here is what I want: > > SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE > tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM > events WHERE event.type = 10) Maybe something like (if I'm right in assuming that you want any event whose data1 and data2 match an event having type 10): select tracking.pk, e.data1, e.data2 from tracking, ((select data1,data2 from events) except (select data1,data2 from events where event.type=10)) e where tracking.event_fk=e.pk; > The official solution to this I believe would be to just use CORRESPONDING > BY, but that's not supported by PG (why exactly, oh why!) Because it's not entry level SQL92 and noone's implemented it yet. :)
On Wed, 7 May 2003, Lucas Adamski wrote: Of course my last suggestion won't work since you need to get the event.pk field out. The actual subquery would need to be more complicated and probably involve an IN or EXISTS. :(
Not sure if I'm reading your question correctly, but is this what you want? SELECT t.pk,e.data1,e.data2 FROM tracking t left outer join events e on t.event_fk = e.pk WHERE e.type <> 10 OR SELECT t.pk,e.data1,e.data2 FROM tracking t inner join events e on t.event_fk = e.pk WHERE e.type <> 10 "Lucas Adamski" <ladamski@manageww.com> To: "Postgresql Performance Mailing list (E-mail)" Sent by: <pgsql-performance@postgresql.org> pgsql-performance-owner@post cc: gresql.org Subject: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT? 05/07/2003 12:11 PM I'm not sure if this a performance question or a sql question really, but since my primarily peeve here is performance, here goes: I'm trying to write a query which takes the output of a join and shows me only what the items that are in the main join but not in the subselect of just one of the tables in the join, using EXCEPT. This is a little complicated, so please bear with me. I have two tables: an event table that logs random events as they come in, and a tracking table that keeps a state of events it cares about. In this particular case I'm trying to obtain a list of tracking pkeys for related event data that do not correspond to a certain (other) set of event data. Ideally, here is what I want: SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM events WHERE event.type = 10) The problem I have of course is that I get an error regarding trying to use different columns for the two queries in EXCEPT. I'm sure someone will point this out, but the following suggestion will not work: SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk AND event.type = 10) That won't work for two reasons... first, there are no matching entries in the tracking table pointing to events where event.type = 10, meaning this query would always return an empty set. And even if there were, I don't want to do the join twice if its not necessary, as the events table is liable to be very large. The official solution to this I believe would be to just use CORRESPONDING BY, but that's not supported by PG (why exactly, oh why!) Suggestions, anyone? Thanks in advance, Lucas.
On Wed, 7 May 2003 12:11:46 -0700, "Lucas Adamski" <ladamski@manageww.com> wrote: >I have two tables: an event table that logs random events as they come in, >and a tracking table that keeps a state of events it cares about. In this >particular case I'm trying to obtain a list of tracking pkeys for related >event data that do not correspond to a certain (other) set of event data. > >Ideally, here is what I want: > >SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE >tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM >events WHERE event.type = 10) Lucas, try this untested query: SELECT tr.pk, ev.data1, ev.data2 FROM tracking tr INNER JOIN events ev ON tr.event_fk = ev.pk WHERE ev.type != 10; (Should also work with AND instead of WHERE.) >SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE >tracking.event_fk = event.pk EXCEPT (SELECT >tracking.pk,events.data1,events.data2 FROM tracking,events WHERE >tracking.event_fk = event.pk AND event.type = 10) > >That won't work for two reasons... first, there are no matching entries in >the tracking table pointing to events where event.type = 10, meaning this >query would always return an empty set. I don't understand this. If there are no entries with event.type 10, then the subselect returns an empty result set, and <anything> EXCEPT <empty> should give the original result? Servus Manfred
Stephan, Yup, unfortunately you are correct... I'd need to get the event.pk's out of there somewhere to join with the tracking.event_fk. I can't put the event.pk in the subselects as they don't match, and I would get an empty set back. select tracking.pk, e.data1, e.data2 from tracking, ((select data1,data2 from events) except (select data1,data2 from events where event.type=10)) e where tracking.event_fk=e.pk; I wrote it originally as: SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM events WHERE event.type = 10) because each of these subqueries restricts the dataset greatly before doing the join. I've simplified the actual problem (as the real code has a bunch of extraneous stuff that makes it even more obtuse), but essentially, the tracking table maintains a record of the last record type that was entered. The type is incremented for each batch of events that is loaded. In this case, I'm assuming that the latest batch is type=10 (or 5000, or 100000), and the tracking table references a small subset of previous events (possibly of types 1-9 in this example). This particular query is supposed to return all tracking.pk's that are present in the previous batches (types) but not in the latest batch (10). I didn't mean to make it quite so obtuse, sorry. :) So in this case I'm getting all of the relevant data for the new entries, subtracting those from the old entries that are referred to by the tracking system, and returning those outdated tracking.pk's. Lucas. -----Original Message----- From: Stephan Szabo [mailto:sszabo@megazone23.bigpanda.com] Sent: Wednesday, May 07, 2003 12:43 PM To: Lucas Adamski Cc: Postgresql Performance Mailing list (E-mail) Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT? On Wed, 7 May 2003, Lucas Adamski wrote: Of course my last suggestion won't work since you need to get the event.pk field out. The actual subquery would need to be more complicated and probably involve an IN or EXISTS. :(
Patrick, I don't think that wouldn't quite work unfortunately, as I'm actually trying to filter them out based upon the values in data1 and data2. I'm using the data in set 2 (data1,data2 from events where type=10) to remove rows from set 1 (join between events and tracking table) where set1.data1=set2.data1 and set1.data2=set2.data2, and returning the tracking id's for any rows left in set 1 (that were not in set 2). I probably gave a better explaination in my response to Stephan. In the case below, I would simply get all events where type<>10 from the join, regardless of whether they matched the data1 and data2 for all type=10. Thanks, Lucas. -----Original Message----- From: Patrick Hatcher [mailto:PHatcher@macys.com] Sent: Wednesday, May 07, 2003 12:40 PM To: ladamski@manageww.com Cc: Postgresql Performance Mailing list (E-mail); pgsql-performance-owner@postgresql.org Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT? Not sure if I'm reading your question correctly, but is this what you want? SELECT t.pk,e.data1,e.data2 FROM tracking t left outer join events e on t.event_fk = e.pk WHERE e.type <> 10 OR SELECT t.pk,e.data1,e.data2 FROM tracking t inner join events e on t.event_fk = e.pk WHERE e.type <> 10 "Lucas Adamski" <ladamski@manageww.com> To: "Postgresql Performance Mailing list (E-mail)" Sent by: <pgsql-performance@postgresql.org> pgsql-performance-owner@post cc: gresql.org Subject: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT? 05/07/2003 12:11 PM I'm not sure if this a performance question or a sql question really, but since my primarily peeve here is performance, here goes: I'm trying to write a query which takes the output of a join and shows me only what the items that are in the main join but not in the subselect of just one of the tables in the join, using EXCEPT. This is a little complicated, so please bear with me. I have two tables: an event table that logs random events as they come in, and a tracking table that keeps a state of events it cares about. In this particular case I'm trying to obtain a list of tracking pkeys for related event data that do not correspond to a certain (other) set of event data. Ideally, here is what I want: SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM events WHERE event.type = 10) The problem I have of course is that I get an error regarding trying to use different columns for the two queries in EXCEPT. I'm sure someone will point this out, but the following suggestion will not work: SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk AND event.type = 10) That won't work for two reasons... first, there are no matching entries in the tracking table pointing to events where event.type = 10, meaning this query would always return an empty set. And even if there were, I don't want to do the join twice if its not necessary, as the events table is liable to be very large. The official solution to this I believe would be to just use CORRESPONDING BY, but that's not supported by PG (why exactly, oh why!) Suggestions, anyone? Thanks in advance, Lucas.
Manfred, I think what you propose is similar to what Patrick proposed, let me see if I can explain below: > -----Original Message----- > From: Manfred Koizar [mailto:mkoi-pg@aon.at] > Sent: Wednesday, May 07, 2003 1:23 PM > To: Lucas Adamski > Cc: Postgresql Performance Mailing list (E-mail) > Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT? > <snip> > Lucas, try this untested query: > > SELECT tr.pk, ev.data1, ev.data2 > FROM tracking tr INNER JOIN events ev > ON tr.event_fk = ev.pk > WHERE ev.type != 10; > > (Should also work with AND instead of WHERE.) The problem is that it simply removes all events where type != 10, versus subtracting all events from subselect of type 10 where data1 and data2 match those in the main join. The goal of the query is to remove all events that match (matching being defined as both data1 and data2 matching) that are present in events of type 10 and events that are referenced by the tracking table, then return those tracking.pk's for entries that are left over. Its not required that I join tracking and events in the primary select before doing the EXCEPT join, but it should make it a bit more efficient. > > >SELECT tracking.pk,events.data1,events.data2 FROM > tracking,events WHERE > >tracking.event_fk = event.pk EXCEPT (SELECT > >tracking.pk,events.data1,events.data2 FROM tracking,events WHERE > >tracking.event_fk = event.pk AND event.type = 10) > > > >That won't work for two reasons... first, there are no > matching entries in > >the tracking table pointing to events where event.type = 10, > meaning this > >query would always return an empty set. > > I don't understand this. If there are no entries with event.type 10, > then the subselect returns an empty result set, and <anything> EXCEPT > <empty> should give the original result? Its not that there are no entires with event.type=10, its that there may not be any tracking entires for events of type 10, and if I join them before doing the EXCEPT I will lose them. That's why I have to do the EXCEPT subselect without joining it to the table. Thanks, Lucas. > > Servus > Manfred > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
On Wed, 7 May 2003, Lucas Adamski wrote: > I wrote it originally as: > > SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE > tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM > events WHERE event.type = 10) > > because each of these subqueries restricts the dataset greatly before doing > the join. I've simplified the actual problem (as the real code has a bunch > of extraneous stuff that makes it even more obtuse), but essentially, the > tracking table maintains a record of the last record type that was entered. > The type is incremented for each batch of events that is loaded. In this > case, I'm assuming that the latest batch is type=10 (or 5000, or 100000), > and the tracking table references a small subset of previous events > (possibly of types 1-9 in this example). This particular query is supposed > to return all tracking.pk's that are present in the previous batches (types) > but not in the latest batch (10). I didn't mean to make it quite so obtuse, > sorry. :) Maybe something like nominally like (quickly done so possibly wrong again): select tracking.pk, events.data1, events.data2 from tracking,events where not exists (select * from events e where e.type=10 and e.data1=events.data1 and e.data2=events.data2) and tracking.event_fk=event.pk Get all tracking/event combinations, not including those where the data1/2 matches that of an event with type 10. That might give dups if there are multiple events rows with that pk for different types (but not 10).
On Wed, 7 May 2003 15:49:06 -0700, "Lucas Adamski" <ladamski@manageww.com> wrote: >The problem is that it simply removes all events where type != 10, versus >subtracting all events from subselect of type 10 where data1 and data2 match >those in the main join. Yes, I realized it when I read Stephan's comment immediately after I had sent my mail. Should have read your requirements more thoroughly. Sorry for the noise ... Servus Manfred
Stephan, Bingo! That worked perfectly, thank you! I was considering something like that, but couldn't figure out the syntax offhand to join two events tables in that fashion. Didn't realize you could alias a table as well! Thanks again, Lucas. > -----Original Message----- > From: Stephan Szabo [mailto:sszabo@megazone23.bigpanda.com] > Sent: Wednesday, May 07, 2003 3:59 PM > To: Lucas Adamski > Cc: Postgresql Performance Mailing list (E-mail) > Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT? > > > > On Wed, 7 May 2003, Lucas Adamski wrote: > > > I wrote it originally as: > > > > SELECT tracking.pk,events.data1,events.data2 FROM > tracking,events WHERE > > tracking.event_fk = event.pk EXCEPT (SELECT > events.data1,events.data2 FROM > > events WHERE event.type = 10) > > > > because each of these subqueries restricts the dataset > greatly before doing > > the join. I've simplified the actual problem (as the real > code has a bunch > > of extraneous stuff that makes it even more obtuse), but > essentially, the > > tracking table maintains a record of the last record type > that was entered. > > The type is incremented for each batch of events that is > loaded. In this > > case, I'm assuming that the latest batch is type=10 (or > 5000, or 100000), > > and the tracking table references a small subset of previous events > > (possibly of types 1-9 in this example). This particular > query is supposed > > to return all tracking.pk's that are present in the > previous batches (types) > > but not in the latest batch (10). I didn't mean to make it > quite so obtuse, > > sorry. :) > > Maybe something like nominally like (quickly done so possibly wrong > again): > > select tracking.pk, events.data1, events.data2 from > tracking,events where not exists (select * from events e where > e.type=10 and e.data1=events.data1 and e.data2=events.data2) > and tracking.event_fk=event.pk > > Get all tracking/event combinations, not including those > where the data1/2 > matches that of an event with type 10. > > That might give dups if there are multiple events rows with > that pk for > different types (but not 10). > >