Re: Common Sub-expression removal - Mailing list pgsql-hackers

From Sam Mason
Subject Re: Common Sub-expression removal
Date
Msg-id 20080219150547.GC1653@frubble.xen.chris-lamb.co.uk
Whole thread Raw
List pgsql-hackers
Hi,

One thing that occurred to me when reading the "Ad Hoc Indexes"
thread was that PG doesn't seem to do much with tidying up common
sub-expressions (I'm not sure why I remembered about it as it's not
particularly related, strange).  Anyway, as an example imagine I have a
large table that I want to do a self join on:
 SELECT m1.source_ls_id, m1.movement_date, m2.movement_date FROM bcms.source_movements m1, bcms.source_movements m2
WHEREm1.source_ls_id = m2.source_ls_id   AND m1.movement_date < m2.movement_date;
 

I get a plan that sorts the movements table twice, giving the correct
answer but taking a while to actually get it.
 Merge Join  (cost=58981120.56..138431232.17 rows=1498156785 width=12)   Merge Cond: (m1.source_ls_id =
m2.source_ls_id)  Join Filter: (m1.movement_date < m2.movement_date)   ->  Sort  (cost=29490560.28..29889000.48
rows=159376080width=8)         Sort Key: m1.source_ls_id         ->  Seq Scan on source_movements m1
   (cost=0.00..2874586.80 rows=159376080 width=8)   ->  Sort  (cost=29490560.28..29889000.48 rows=159376080 width=8)
    Sort Key: m2.source_ls_id         ->  Seq Scan on source_movements m2                        (cost=0.00..2874586.80
rows=159376080width=8)
 

The time I actually tend to notice it more is when the join is between
two expensive views; this test case is nice and easy to reason about
though.

 Sam


pgsql-hackers by date:

Previous
From: "Roberts, Jon"
Date:
Subject: Re: Permanent settings
Next
From: Alvaro Herrera
Date:
Subject: Re: Timezone view