Home > mailing lists

Re: Finding uniques across a big join - Mailing list pgsql-general

From	Scott Marlowe
Subject	Re: Finding uniques across a big join
Date	November 30, 2005 17:18:15
Msg-id	1133385478.16010.13.camel@state.g2switchworks.com Whole thread Raw
In response to	Finding uniques across a big join ("John D. Burger" <john@mitre.org>)
Responses	Re: Finding uniques across a big join
List	pgsql-general

Tree view

On Tue, 2005-11-29 at 20:58, John D. Burger wrote:
> I could use some help with the following:
>
> I have a database of geographic entities with attributes spread across
> several tables.  I need to determine which entities are unique with
> respect to some of those attributes.  I'm using the following query:
>
> select p2.gazPlaceID from
>    (select p1.featureType, n1.placeNameID, c1.containerID
>      from gazPlaces as p1
>        join gazNamings as n1 using (gazPlaceID)
>        join gazContainers as c1 using (gazPlaceID)
>      group by p1.featureType, n1.placeNameID, c1.containerID
>      having count(*) = 1) as uniqs,
>    gazPlaces as p2
>      join gazNamings as n2 using (gazPlaceID)
>      join gazContainers as c2 using (gazPlaceID)
>    where uniqs.featureType = p2.featureType
>    and uniqs.placeNameID = n2.placeNameID
>    and uniqs.containerID = c2.containerID;
>
> The basic idea is to compute featureType-placeNameID-containerID
> combinations with a three-way join, determine which of those have a
> count of 1, and then join that back to the same three-way join to get
> the gazPlaceIDs corresponding to the unique combos (whew!).
>
> gazPlaces has about 6M entries, gazNamings and gazContainers each about
> 10M.  All of the fields above are integers, and I have indexes on
> everything relevant, but the query still takes about eight hours.  My
> question is not (necessarily) how to improve the efficiency of this
> query, but whether anyone can think of a faster way to compute the
> uniques.  Again, the goal is to find entries in gazPlaces that are the
> only ones with their particular combination of feature type, name and
> container.

OK, let's assume that the basic part of it, before the group by, has
been put into a view, so we can then do:

select pkey1, field2, field3, field4 from view;

And we know that pkey1 is unique, but we want the records where pkey1 is
the only thing different between them, right?

select
    v1.pkey1,
    v1.field2,
    v1.field3,
    v1.field4,
    v2.pkey1,
    v2.field2,
    v2.field3,
    v2.field4,
from
    view v1
join
    view v2
on (
    v1.field2=v2.field2 and
    v1.field3=v2.field3 and
    v1.field3=v2.field3 and
    v1.pkey1<>v2.pkey
)

How does that work?

pgsql-general by date:

From: Scott Marlowe
Date: 30 November 2005, 16:19:45
Subject: Re: How to change database owner in 7.4

From: Andrew Sullivan
Date: 30 November 2005, 17:19:35
Subject: Re: undefined behaviour for sub-transactions?

Re: Finding uniques across a big join - Mailing list pgsql-general

Previous

Next