Re: Finding uniques across a big join - Mailing list pgsql-general

From John D. Burger
Subject Re: Finding uniques across a big join
Date
Msg-id 254e417bc5cb3bca0441a79c8cc5c587@mitre.org
Whole thread Raw
In response to Re: Finding uniques across a big join  (Scott Marlowe <smarlowe@g2switchworks.com>)
Responses Re: Finding uniques across a big join
Re: Finding uniques across a big join
List pgsql-general
Scott Marlowe wrote:

> OK, let's assume that the basic part of it, before the group by, has
> been put into a view, so we can then do:
>
> select pkey1, field2, field3, field4 from view;
>
> And we know that pkey1 is unique, but we want the records where pkey1
> is
> the only thing different between them, right?

Hmm, I'm explaining this really badly :).  I should have defined a view
like you suggest to help simplify it.  What I want is the pkeys (and
the field values) where no other pkey has that triple of field values.
That's why my earlier query does a group by the fields and then having
count(*) = 1.  Also, FWIW, pkey1 is unique in its original table, but
not in the view, since some of the other tables are one-to-many.

> select
>     v1.pkey1,
>     v1.field2,
>     v1.field3,
>     v1.field4,
>     v2.pkey1,
>     v2.field2,
>     v2.field3,
>     v2.field4,
> from
>     view v1
> join
>     view v2
> on (
>     v1.field2=v2.field2 and
>     v1.field3=v2.field3 and
>     v1.field3=v2.field3 and
>     v1.pkey1<>v2.pkey
> )
>
> How does that work?

Won't this be a massive cross product of all pkey pairs that have the
same field values?

Here's what I'm currently using, in terms of your very helpful view:

select v1.pkey1, v1.field2, v1.field3, v1.field4
   from view as v1
   join
   (select v2.field1, v2.field2, v2.field3
    from view as v2
    group by v2.field2,  v2.field3, v2.field4
    having count(*) = 1)
   using (field2, field3, field4);

This is the one that takes eight hours. :(  Another way to express what
I want is this:

select v1.pkey1, v1.field2, v1.field3, v1.field4
   from view as v1
   where not exists
     (select true from view as v2
      where v1.field2 = v2.field2
        and v1.field3 = v2.field3
        and v1.field4 = v2.field4
        and v1.pkey1 <> v2.pkey1);

That looks like a horrible nested loop, but I suppose I should try it
to make sure it is indeed slower then the previous query.

- John Burger
   MITRE


pgsql-general by date:

Previous
From: Jerry Sievers
Date:
Subject: Re: How to check options PostgreSQL was started with
Next
From: Tyler MacDonald
Date:
Subject: Re: undefined behaviour for sub-transactions?