Re: Simple Join - Mailing list pgsql-performance

From Kevin Brown
Subject Re: Simple Join
Date
Msg-id 200512141752.45866.blargity@gmail.com
In response to Re: Simple Join  (Mark Kirkwood <markir@paradise.net.nz>)
Responses Re: Simple Join
List pgsql-performance
On Wednesday 14 December 2005 17:30, Mark Kirkwood wrote:
> You scan 600000 rows from to_ship to get about 25000 - so some way to
> cut this down would help.

Yup.  I'm open to anything, too, as this is the only part of the system that
really cares.  So maintaining a denormalized copy of the column, or whatever
else, would be fine.  We're doing far more reads than writes.

> Try out an explicit INNER JOIN which includes the filter info for paid
> and suspended_sub in the join condition (you may need indexes on each of
> id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap
> scan):

I only had two explicit indexes.  One was on to_ship.ordered_product_id and
the other was on ordered_products.paid.  ordered_products.id is a primary
key.  This is on your query with an index added on suspended_sub:
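
For concreteness, the index setup described above amounts to something like
this (the index names other than paid_index don't appear in the thread and
are assumed for illustration):

```sql
-- Existing indexes; only paid_index shows up by name in the plan below.
CREATE INDEX to_ship_opid_index ON to_ship (ordered_product_id);
CREATE INDEX paid_index ON ordered_products (paid);
-- ordered_products.id is the primary key, so it's indexed implicitly.

-- Newly added for this test:
CREATE INDEX suspended_sub_index ON ordered_products (suspended_sub);
```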

dli=# explain analyze SELECT ordered_products.product_id
dli-# FROM to_ship INNER JOIN ordered_products
dli-# ON (to_ship.ordered_product_id = ordered_products.id
dli(#      AND ordered_products.paid = TRUE      AND
dli(# ordered_products.suspended_sub = FALSE);
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=5126.19..31528.40 rows=20591 width=8) (actual time=4554.190..23519.618 rows=14367 loops=1)
   Hash Cond: ("outer".ordered_product_id = "inner".id)
   ->  Seq Scan on to_ship  (cost=0.00..11529.12 rows=611612 width=8) (actual time=11.254..15192.042 rows=611612 loops=1)
   ->  Hash  (cost=4954.79..4954.79 rows=21759 width=16) (actual time=4494.900..4494.900 rows=18042 loops=1)
         ->  Index Scan using paid_index on ordered_products  (cost=0.00..4954.79 rows=21759 width=16) (actual time=72.431..4414.697 rows=18042 loops=1)
               Index Cond: (paid = true)
               Filter: (paid AND (NOT suspended_sub))
 Total runtime: 23532.785 ms
(8 rows)

So what's the best way to get this info out of the db without killing
performance?  The list of values is only about 30 tuples long out of this
query, so I was figuring I could add a trigger on insert to to_ship to place
the value into another table if it didn't already exist.  I'd rather the
writing be slow than the reading.
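
A rough sketch of that trigger idea, assuming a summary table and a
product_id column on ordered_products (the table and trigger names here are
made up, not from the thread):

```sql
-- Hypothetical summary table holding the ~30 distinct values.
CREATE TABLE shipped_products (product_id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION record_shipped_product() RETURNS trigger AS $$
DECLARE
    pid integer;
BEGIN
    -- Look up the product for the row being shipped, applying the same
    -- paid / not-suspended filter as the big query.
    SELECT product_id INTO pid
      FROM ordered_products
     WHERE id = NEW.ordered_product_id
       AND paid AND NOT suspended_sub;

    IF FOUND AND NOT EXISTS
        (SELECT 1 FROM shipped_products WHERE product_id = pid) THEN
        INSERT INTO shipped_products (product_id) VALUES (pid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER to_ship_record_product
    AFTER INSERT ON to_ship
    FOR EACH ROW EXECUTE PROCEDURE record_shipped_product();
```

With something like that in place, the read side collapses to a plain
SELECT product_id FROM shipped_products, at the cost of one extra lookup per
insert -- which fits a workload with far more reads than writes.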
