Re: Finding uniques across a big join - Mailing list pgsql-general

From Jim C. Nasby
Subject Re: Finding uniques across a big join
Date
Msg-id 20051130181301.GG13642@nasby.net
Whole thread Raw
In response to Finding uniques across a big join  ("John D. Burger" <john@mitre.org>)
Responses Re: Finding uniques across a big join
List pgsql-general
It will probably be a win to come up with a list of potential records
from each table, instead of after doing the 3-way join. so something
like:

(SELECT gazPlaceID FROM gazPlaces GROUP BY featureType HAVING count(*)=1)
JOIN
(SELECT ...)

If you post the output of explain (or explain analyze is even better)
then people could probably make better suggestions.

On Tue, Nov 29, 2005 at 09:58:49PM -0500, John D. Burger wrote:
> I could use some help with the following:
>
> I have a database of geographic entities with attributes spread across
> several tables.  I need to determine which entities are unique with
> respect to some of those attributes.  I'm using the following query:
>
> select p2.gazPlaceID from
>   (select p1.featureType, n1.placeNameID, c1.containerID
>     from gazPlaces as p1
>       join gazNamings as n1 using (gazPlaceID)
>       join gazContainers as c1 using (gazPlaceID)
>     group by p1.featureType, n1.placeNameID, c1.containerID
>     having count(*) = 1) as uniqs,
>   gazPlaces as p2
>     join gazNamings as n2 using (gazPlaceID)
>     join gazContainers as c2 using (gazPlaceID)
>   where uniqs.featureType = p2.featureType
>   and uniqs.placeNameID = n2.placeNameID
>   and uniqs.containerID = c2.containerID;
>
> The basic idea is to compute featureType-placeNameID-containerID
> combinations with a three-way join, determine which of those have a
> count of 1, and then join that back to the same three-way join to get
> the gazPlaceIDs corresponding to the unique combos (whew!).
>
> gazPlaces has about 6M entries, gazNamings and gazContainers each about
> 10M.  All of the fields above are integers, and I have indexes on
> everything relevant, but the query still takes about eight hours.  My
> question is not (necessarily) how to improve the efficiency of this
> query, but whether anyone can think of a faster way to compute the
> uniques.  Again, the goal is to find entries in gazPlaces that are the
> only ones with their particular combination of feature type, name and
> container.
>
> Any help is appreciated!
>
> - John Burger
>   MITRE
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

pgsql-general by date:

Previous
From: frank church
Date:
Subject: How to change database owner in 7.4
Next
From: Chris Browne
Date:
Subject: Re: Limits