Re: Query Join Performance - Mailing list pgsql-sql

From Aaron Bono
Subject Re: Query Join Performance
Date
Msg-id bf05e51c0704251339u2883211eh4cdedcbec63c2be9@mail.gmail.com
Whole thread Raw
In response to Re: Query Join Performance  (Richard Huxton <dev@archonet.com>)
Responses Re: Query Join Performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
On 4/25/07, Richard Huxton <dev@archonet.com> wrote:
Aaron Bono wrote:
> Performance tanks with this query - it takes over 120 seconds (that is
> where
> I set the timeout).

> BTW, on our Linux box the full query we run (which adds 3 more tables on
> the
> whole operation along with more filtering on the zip table) finishes in
> under 10 seconds.  Problem is our development is on Windows and this is a
> real pain for developers to test.

So what's different between the systems. Obvious things to look at:
1. EXPLAIN ANALYSE outputs to show the plans (these presumably are
different, but in what details, and why?)
2. Configuration (particularly memory/cost settings).
3. Hardware.
4. Locale/encoding - these can affect index usage and sorting.

--
   Richard Huxton
   Archonet Ltd

Now I am having the same problem on the Linux box so I doubt it is the platform.

The biggest problem I notice is when I add a join from a child table (zip_city) to a parent table (zip).  I have filtered the child table down to about 650 records but when I add the join to the parent which has over 800,000 records, performance tanks.  I was able to benchmark two queries last night on my Windows machine:

-- This runs in just over 2 seconds

    select
        nearby_zip_city.zip_id,
        gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, nearby_zip_city.longitude::numeric, nearby_zip_city.latitude::numeric) AS distance
    from zip_city
    inner join zip on (
        zip.zip_id = zip_city.zip_id
    )
    inner join zip_city as nearby_zip_city on (
        abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
        AND abs(zip_city.latitude - nearby_zip_city.latitude) <= (50.0 / 60.0)
    )
    where zip.zip_cd = '66105'

-- This takes over 48 seconds and I just added a join from the zip_city child to the zip parent table

    select
        nearby_zip.zip_cd,
        gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, nearby_zip_city.longitude::numeric, nearby_zip_city.latitude::numeric) AS distance
    from zip_city
    inner join zip on (
        zip.zip_id = zip_city.zip_id
    )
    inner join zip_city as nearby_zip_city on (
        abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
        AND abs(zip_city.latitude - nearby_zip_city.latitude) <= ( 50.0 / 60.0)
    )
    -->>> The next 3 lines are the main difference <<<--
    inner join zip as nearby_zip on (
        nearby_zip_city.zip_id = nearby_zip.zip_id
    )
    -->>> End of difference <<<--
    where zip.zip_cd = '66105'

-- Explain plan for faster/first query:
Nested Loop  (cost=45779.82..147990502.45 rows=451678770 width=40) (actual time=5404.943..20151.684 rows=653 loops=1)
  Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
  ->  Seq Scan on zip_city nearby_zip_city  (cost=0.00..25514.19 rows=901719 width=24) (actual time= 0.053..2311.547 rows=901719 loops=1)
  ->  Materialize  (cost=45779.82..45824.90 rows=4508 width=16) (actual time=0.003..0.009 rows=1 loops=901719)
        ->  Hash Join  (cost=8944.55..45775.31 rows=4508 width=16) (actual time= 444.657..4490.901 rows=1 loops=1)
              Hash Cond: ("outer".zip_id = "inner".zip_id)
              ->  Seq Scan on zip_city  (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.004..2334.548 rows=901719 loops=1)
              ->  Hash  (cost=8933.90..8933.90 rows=4261 width=8) (actual time=0.296..0.296 rows=1 loops=1)
                    ->  Bitmap Heap Scan on zip  (cost=32.91..8933.90 rows=4261 width=8) (actual time= 0.272..0.275 rows=1 loops=1)
                          Recheck Cond: ((zip_cd)::text = '66105'::text)
                          ->  Bitmap Index Scan on zip_zip_cd_key  (cost=0.00..32.91 rows=4261 width=0) (actual time= 0.250..0.250 rows=1 loops=1)
                                Index Cond: ((zip_cd)::text = '66105'::text)

-- Explain plan for shower/second query:
Nested Loop  (cost=75372.31..148056286.32 rows=451678770 width=43) (actual time= 62688.188..69916.943 rows=653 loops=1)
  Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
  ->  Hash Join  (cost=29592.49..91298.06 rows=901719 width=27) (actual time=17905.224..52279.151 rows=901719 loops=1)
        Hash Cond: ("outer".zip_id = "inner".zip_id)
        ->  Seq Scan on zip_city nearby_zip_city  (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.044..2888.993 rows=901719 loops=1)
        ->  Hash  (cost=21634.79..21634.79 rows=852279 width=19) (actual time= 13925.502..13925.502 rows=852279 loops=1)
              ->  Seq Scan on zip nearby_zip  (cost=0.00..21634.79 rows=852279 width=19) (actual time=0.042..2535.742 rows=852279 loops=1)
  ->  Materialize  (cost=45779.82..45824.90 rows=4508 width=16) (actual time=0.002..0.009 rows=1 loops=901719)
        ->  Hash Join  (cost=8944.55..45775.31 rows=4508 width=16) (actual time=421.374..4453.224 rows=1 loops=1)
              Hash Cond: ("outer".zip_id = "inner".zip_id)
              ->  Seq Scan on zip_city  (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.028..2333.941 rows=901719 loops=1)
              ->  Hash  (cost=8933.90..8933.90 rows=4261 width=8) (actual time= 0.604..0.604 rows=1 loops=1)
                    ->  Bitmap Heap Scan on zip  (cost=32.91..8933.90 rows=4261 width=8) (actual time=0.588..0.591 rows=1 loops=1)
                          Recheck Cond: ((zip_cd)::text = '66105'::text)
                          ->  Bitmap Index Scan on zip_zip_cd_key  (cost=0.00..32.91 rows=4261 width=0) (actual time=0.559..0.559 rows=1 loops=1)
                                Index Cond: ((zip_cd)::text = '66105'::text)

It is the join that is killing the query but I am at a loss of the best approach to fix it.  I have some work arounds in mind by flattening out the tables but I would rather not have to do that.

Thanks!
Aaron

--
==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

pgsql-sql by date:

Previous
From: Richard Albright
Date:
Subject: Re: sql wrapped plpgsql set returning function
Next
From: Tom Lane
Date:
Subject: Re: Query Join Performance