Thread: Hack around lack of CORRESPONDING BY in EXCEPT?

Hack around lack of CORRESPONDING BY in EXCEPT?

From
"Lucas Adamski"
Date:
I'm not sure if this a performance question or a sql question really, but since my primarily peeve here is performance, here goes:
 
I'm trying to write a query which takes the output of a join and shows me only what the items that are in the main join but not in the subselect of just one of the tables in the join, using EXCEPT.
 
This is a little complicated, so please bear with me. 
 
I have two tables: an event table that logs random events as they come in, and a tracking table that keeps a state of events it cares about.  In this particular case I'm trying to obtain a list of tracking pkeys for related event data that do not correspond to a certain (other) set of event data.
 
Ideally, here is what I want:
 
SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM events WHERE event.type = 10)
 
The problem I have of course is that I get an error regarding trying to use different columns for the two queries in EXCEPT.  I'm sure someone will point this out, but the following suggestion will not work:
 
SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk EXCEPT (SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE tracking.event_fk = event.pk AND event.type = 10)
 
That won't work for two reasons... first, there are no matching entries in the tracking table pointing to events where event.type = 10, meaning this query would always return an empty set.  And even if there were, I don't want to do the join twice if its not necessary, as the events table is liable to be very large.
 
The official solution to this I believe would be to just use CORRESPONDING BY, but that's not supported by PG (why exactly, oh why!)
 
Suggestions, anyone?  Thanks in advance,
  Lucas.
 
 

Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
Stephan Szabo
Date:
On Wed, 7 May 2003, Lucas Adamski wrote:

> I'm not sure if this a performance question or a sql question really, but
> since my primarily peeve here is performance, here goes:
>
> I'm trying to write a query which takes the output of a join and shows me
> only what the items that are in the main join but not in the subselect of
> just one of the tables in the join, using EXCEPT.
>
> This is a little complicated, so please bear with me.
>
> I have two tables: an event table that logs random events as they come in,
> and a tracking table that keeps a state of events it cares about.  In this
> particular case I'm trying to obtain a list of tracking pkeys for related
> event data that do not correspond to a certain (other) set of event data.
>
> Ideally, here is what I want:
>
> SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
> tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM
> events WHERE event.type = 10)

Maybe something like (if I'm right in assuming that you want any event
whose data1 and data2 match an event having type 10):

select tracking.pk, e.data1, e.data2 from
 tracking,
 ((select data1,data2 from events) except (select data1,data2 from events
  where event.type=10)) e
where tracking.event_fk=e.pk;


> The official solution to this I believe would be to just use CORRESPONDING
> BY, but that's not supported by PG (why exactly, oh why!)

Because it's not entry level SQL92 and noone's implemented it yet. :)


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
Stephan Szabo
Date:
On Wed, 7 May 2003, Lucas Adamski wrote:

Of course my last suggestion won't work since you need to get the event.pk
field out.  The actual subquery would need to be more complicated and
probably involve an IN or EXISTS. :(


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
"Patrick Hatcher"
Date:
Not sure if I'm reading your question correctly, but is this what you want?

SELECT t.pk,e.data1,e.data2
FROM tracking t left outer join events e on  t.event_fk = e.pk
WHERE e.type <> 10

OR

SELECT t.pk,e.data1,e.data2
FROM tracking t  inner join events e on  t.event_fk = e.pk
WHERE e.type <> 10





          
                    "Lucas Adamski"
          
                    <ladamski@manageww.com>            To:     "Postgresql Performance Mailing list (E-mail)"
          
                    Sent by:                            <pgsql-performance@postgresql.org>
          
                    pgsql-performance-owner@post       cc:
          
                    gresql.org                         Subject:     [PERFORM] Hack around lack of CORRESPONDING BY in
EXCEPT?     

          

          
                    05/07/2003 12:11 PM
          

          




I'm not sure if this a performance question or a sql question really, but
since my primarily peeve here is performance, here goes:

I'm trying to write a query which takes the output of a join and shows me
only what the items that are in the main join but not in the subselect of
just one of the tables in the join, using EXCEPT.

This is a little complicated, so please bear with me.

I have two tables: an event table that logs random events as they come in,
and a tracking table that keeps a state of events it cares about.  In this
particular case I'm trying to obtain a list of tracking pkeys for related
event data that do not correspond to a certain (other) set of event data.

Ideally, here is what I want:

SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM
events WHERE event.type = 10)

The problem I have of course is that I get an error regarding trying to use
different columns for the two queries in EXCEPT.  I'm sure someone will
point this out, but the following suggestion will not work:

SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk EXCEPT (SELECT
tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk AND event.type = 10)

That won't work for two reasons... first, there are no matching entries in
the tracking table pointing to events where event.type = 10, meaning this
query would always return an empty set.  And even if there were, I don't
want to do the join twice if its not necessary, as the events table is
liable to be very large.

The official solution to this I believe would be to just use CORRESPONDING
BY, but that's not supported by PG (why exactly, oh why!)

Suggestions, anyone?  Thanks in advance,
  Lucas.


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
Manfred Koizar
Date:
On Wed, 7 May 2003 12:11:46 -0700, "Lucas Adamski"
<ladamski@manageww.com> wrote:
>I have two tables: an event table that logs random events as they come in,
>and a tracking table that keeps a state of events it cares about.  In this
>particular case I'm trying to obtain a list of tracking pkeys for related
>event data that do not correspond to a certain (other) set of event data.
>
>Ideally, here is what I want:
>
>SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
>tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM
>events WHERE event.type = 10)

Lucas, try this untested query:

    SELECT tr.pk, ev.data1, ev.data2
      FROM tracking tr INNER JOIN events ev
        ON tr.event_fk = ev.pk
     WHERE ev.type != 10;

(Should also work with AND instead of WHERE.)

>SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
>tracking.event_fk = event.pk EXCEPT (SELECT
>tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
>tracking.event_fk = event.pk AND event.type = 10)
>
>That won't work for two reasons... first, there are no matching entries in
>the tracking table pointing to events where event.type = 10, meaning this
>query would always return an empty set.

I don't understand this.  If there are no entries with event.type 10,
then the subselect returns an empty result set, and <anything> EXCEPT
<empty> should give the original result?

Servus
 Manfred


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
"Lucas Adamski"
Date:
Stephan,

Yup, unfortunately you are correct... I'd need to get the event.pk's out of
there somewhere to join with the tracking.event_fk.  I can't put the
event.pk in the subselects as they don't match, and I would get an empty set
back.

select tracking.pk, e.data1, e.data2 from
 tracking,
 ((select data1,data2 from events) except (select data1,data2 from events
  where event.type=10)) e
where tracking.event_fk=e.pk;

I wrote it originally as:

SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM
events WHERE event.type = 10)

because each of these subqueries restricts the dataset greatly before doing
the join.  I've simplified the actual problem (as the real code has a bunch
of extraneous stuff that makes it even more obtuse), but essentially, the
tracking table maintains a record of the last record type that was entered.
The type is incremented for each batch of events that is loaded.  In this
case, I'm assuming that the latest batch is type=10 (or 5000, or 100000),
and the tracking table references a small subset  of previous events
(possibly of types 1-9 in this example).  This particular query is supposed
to return all tracking.pk's that are present in the previous batches (types)
but not in the latest batch (10).  I didn't mean to make it quite so obtuse,
sorry. :)

So in this case I'm getting all of the relevant data for the new entries,
subtracting those from the old entries that are referred to by the tracking
system, and returning those outdated tracking.pk's.
  Lucas.

-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone23.bigpanda.com]
Sent: Wednesday, May 07, 2003 12:43 PM
To: Lucas Adamski
Cc: Postgresql Performance Mailing list (E-mail)
Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT?



On Wed, 7 May 2003, Lucas Adamski wrote:

Of course my last suggestion won't work since you need to get the event.pk
field out.  The actual subquery would need to be more complicated and
probably involve an IN or EXISTS. :(


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
"Lucas Adamski"
Date:
Patrick,

I don't think that wouldn't quite work unfortunately, as I'm actually trying
to filter them out based upon the values in data1 and data2.  I'm using the
data in set 2 (data1,data2 from events where type=10) to remove rows from
set 1 (join between events and tracking table) where set1.data1=set2.data1
and set1.data2=set2.data2, and returning the tracking id's for any rows left
in set 1 (that were not in set 2).  I probably gave a better explaination in
my response to Stephan.  In the case below, I would simply get all events
where type<>10 from the join, regardless of whether they matched the data1
and data2 for all type=10.  Thanks,
  Lucas.

-----Original Message-----
From: Patrick Hatcher [mailto:PHatcher@macys.com]
Sent: Wednesday, May 07, 2003 12:40 PM
To: ladamski@manageww.com
Cc: Postgresql Performance Mailing list (E-mail);
pgsql-performance-owner@postgresql.org
Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT?







Not sure if I'm reading your question correctly, but is this what you want?

SELECT t.pk,e.data1,e.data2
FROM tracking t left outer join events e on  t.event_fk = e.pk
WHERE e.type <> 10

OR

SELECT t.pk,e.data1,e.data2
FROM tracking t  inner join events e on  t.event_fk = e.pk
WHERE e.type <> 10





                    "Lucas Adamski"
                    <ladamski@manageww.com>            To:     "Postgresql
Performance Mailing list (E-mail)"
                    Sent by:
<pgsql-performance@postgresql.org>
                    pgsql-performance-owner@post       cc:
                    gresql.org                         Subject:
[PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT?


                    05/07/2003 12:11 PM





I'm not sure if this a performance question or a sql question really, but
since my primarily peeve here is performance, here goes:

I'm trying to write a query which takes the output of a join and shows me
only what the items that are in the main join but not in the subselect of
just one of the tables in the join, using EXCEPT.

This is a little complicated, so please bear with me.

I have two tables: an event table that logs random events as they come in,
and a tracking table that keeps a state of events it cares about.  In this
particular case I'm trying to obtain a list of tracking pkeys for related
event data that do not correspond to a certain (other) set of event data.

Ideally, here is what I want:

SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM
events WHERE event.type = 10)

The problem I have of course is that I get an error regarding trying to use
different columns for the two queries in EXCEPT.  I'm sure someone will
point this out, but the following suggestion will not work:

SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk EXCEPT (SELECT
tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
tracking.event_fk = event.pk AND event.type = 10)

That won't work for two reasons... first, there are no matching entries in
the tracking table pointing to events where event.type = 10, meaning this
query would always return an empty set.  And even if there were, I don't
want to do the join twice if its not necessary, as the events table is
liable to be very large.

The official solution to this I believe would be to just use CORRESPONDING
BY, but that's not supported by PG (why exactly, oh why!)

Suggestions, anyone?  Thanks in advance,
  Lucas.


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
"Lucas Adamski"
Date:
Manfred,

I think what you propose is similar to what Patrick proposed, let me see if
I can explain below:

> -----Original Message-----
> From: Manfred Koizar [mailto:mkoi-pg@aon.at]
> Sent: Wednesday, May 07, 2003 1:23 PM
> To: Lucas Adamski
> Cc: Postgresql Performance Mailing list (E-mail)
> Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT?
>

<snip>

> Lucas, try this untested query:
>
>     SELECT tr.pk, ev.data1, ev.data2
>       FROM tracking tr INNER JOIN events ev
>         ON tr.event_fk = ev.pk
>      WHERE ev.type != 10;
>
> (Should also work with AND instead of WHERE.)

The problem is that it simply removes all events where type != 10, versus
subtracting all events from subselect of type 10 where data1 and data2 match
those in the main join.  The goal of the query is to remove all events that
match (matching being defined as both data1 and data2 matching) that are
present in events of type 10 and events that are referenced by the tracking
table, then return those tracking.pk's for entries that are left over.

Its not required that I join tracking and events in the primary select
before doing the EXCEPT join, but it should make it a bit more efficient.

>
> >SELECT tracking.pk,events.data1,events.data2 FROM
> tracking,events WHERE
> >tracking.event_fk = event.pk EXCEPT (SELECT
> >tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
> >tracking.event_fk = event.pk AND event.type = 10)
> >
> >That won't work for two reasons... first, there are no
> matching entries in
> >the tracking table pointing to events where event.type = 10,
> meaning this
> >query would always return an empty set.
>
> I don't understand this.  If there are no entries with event.type 10,
> then the subselect returns an empty result set, and <anything> EXCEPT
> <empty> should give the original result?

Its not that there are no entires with event.type=10, its that there may not
be any tracking entires for events of type 10, and if I join them before
doing the EXCEPT I will lose them.  That's why I have to do the EXCEPT
subselect without joining it to the table.  Thanks,
  Lucas.

>
> Servus
>  Manfred
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
Stephan Szabo
Date:
On Wed, 7 May 2003, Lucas Adamski wrote:

> I wrote it originally as:
>
> SELECT tracking.pk,events.data1,events.data2 FROM tracking,events WHERE
> tracking.event_fk = event.pk EXCEPT (SELECT events.data1,events.data2 FROM
> events WHERE event.type = 10)
>
> because each of these subqueries restricts the dataset greatly before doing
> the join.  I've simplified the actual problem (as the real code has a bunch
> of extraneous stuff that makes it even more obtuse), but essentially, the
> tracking table maintains a record of the last record type that was entered.
> The type is incremented for each batch of events that is loaded.  In this
> case, I'm assuming that the latest batch is type=10 (or 5000, or 100000),
> and the tracking table references a small subset  of previous events
> (possibly of types 1-9 in this example).  This particular query is supposed
> to return all tracking.pk's that are present in the previous batches (types)
> but not in the latest batch (10).  I didn't mean to make it quite so obtuse,
> sorry. :)

Maybe something like nominally like (quickly done so possibly wrong
again):

 select tracking.pk, events.data1, events.data2 from
  tracking,events where not exists (select * from events e where
  e.type=10 and e.data1=events.data1 and e.data2=events.data2)
  and tracking.event_fk=event.pk

Get all tracking/event combinations, not including those where the data1/2
matches that of an event with type 10.

That might give dups if there are multiple events rows with that pk for
different types (but not 10).


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
Manfred Koizar
Date:
On Wed, 7 May 2003 15:49:06 -0700, "Lucas Adamski"
<ladamski@manageww.com> wrote:
>The problem is that it simply removes all events where type != 10, versus
>subtracting all events from subselect of type 10 where data1 and data2 match
>those in the main join.

Yes, I realized it when I read Stephan's comment immediately after I
had sent my mail.  Should have read your requirements more thoroughly.
Sorry for the noise ...

Servus
 Manfred


Re: Hack around lack of CORRESPONDING BY in EXCEPT?

From
"Lucas Adamski"
Date:
Stephan,

Bingo!  That worked perfectly, thank you!  I was considering something like
that, but couldn't figure out the syntax offhand to join two events tables
in that fashion.  Didn't realize you could alias a table as well!  Thanks
again,
  Lucas.

> -----Original Message-----
> From: Stephan Szabo [mailto:sszabo@megazone23.bigpanda.com]
> Sent: Wednesday, May 07, 2003 3:59 PM
> To: Lucas Adamski
> Cc: Postgresql Performance Mailing list (E-mail)
> Subject: Re: [PERFORM] Hack around lack of CORRESPONDING BY in EXCEPT?
>
>
>
> On Wed, 7 May 2003, Lucas Adamski wrote:
>
> > I wrote it originally as:
> >
> > SELECT tracking.pk,events.data1,events.data2 FROM
> tracking,events WHERE
> > tracking.event_fk = event.pk EXCEPT (SELECT
> events.data1,events.data2 FROM
> > events WHERE event.type = 10)
> >
> > because each of these subqueries restricts the dataset
> greatly before doing
> > the join.  I've simplified the actual problem (as the real
> code has a bunch
> > of extraneous stuff that makes it even more obtuse), but
> essentially, the
> > tracking table maintains a record of the last record type
> that was entered.
> > The type is incremented for each batch of events that is
> loaded.  In this
> > case, I'm assuming that the latest batch is type=10 (or
> 5000, or 100000),
> > and the tracking table references a small subset  of previous events
> > (possibly of types 1-9 in this example).  This particular
> query is supposed
> > to return all tracking.pk's that are present in the
> previous batches (types)
> > but not in the latest batch (10).  I didn't mean to make it
> quite so obtuse,
> > sorry. :)
>
> Maybe something like nominally like (quickly done so possibly wrong
> again):
>
>  select tracking.pk, events.data1, events.data2 from
>   tracking,events where not exists (select * from events e where
>   e.type=10 and e.data1=events.data1 and e.data2=events.data2)
>   and tracking.event_fk=event.pk
>
> Get all tracking/event combinations, not including those
> where the data1/2
> matches that of an event with type 10.
>
> That might give dups if there are multiple events rows with
> that pk for
> different types (but not 10).
>
>