Re: Avoiding cartesian product - Mailing list pgsql-performance

From Virag Saksena
Subject Re: Avoiding cartesian product
Date
Msg-id 002c01c635e8$cfc225d0$3100000a@demo1
Whole thread Raw
In response to Avoiding cartesian product  ("Virag Saksena" <virag@auptyma.com>)
List pgsql-performance

Szűcs,
    Thanks for your suggestion, I guess there is more than one way to attack the problem.
 
I ended up using a trick with limit to get the next row ...
 
select (b.gc_minor- a.gc_minor), (b.gc_major- a.gc_major)
from jam_trace_sys a join jam_trace_sys b on
(b.seq_no = (select c.seq_no from jam_trace_sys c
 where c.trace_id = a.trace_id and c.seq_no > a.seq_no
 order by c.trace_id, c.seq_no limit 1)
 and b.trace_id = a.trace_id),
jam_tracesnap s1, jam_tracesnap s2
where s1.trace_id = a.trace_id
and s1.seq_no = a.seq_no
and s2.trace_id = b.trace_id
and s2.seq_no = b.seq_no
and a.trace_id = 22
order by a.seq_no;
 
This gave me a nice clean execution plan (there are some extra sources listed, but that is a bug in postgresql) ...
 
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=11.24..11.25 rows=1 width=20) (actual time=0.040..0.040 rows=0 loops=1)
 Sort Key: a.seq_no
 ->Nested Loop  (cost=0.00..11.23 rows=1 width=20) (actual time=0.028..0.028 rows=0 loops=1)
    Join Filter: ("inner".seq_no = "outer".seq_no)
    ->Nested Loop  (cost=0.00..9.20 rows=1 width=32) (actual time=0.024..0.024 rows=0 loops=1)
       Join Filter: ("inner".seq_no = "outer".seq_no)
       ->Nested Loop  (cost=0.00..7.17 rows=1 width=32) (actual time=0.020..0.020 rows=0 loops=1)
          Join Filter: ("inner".seq_no = (subplan))
          ->Index Scan using jam_trace_sys_n1 on jam_trace_sys a  (cost=0.00..3.41 rows=1 width=16) (actual time=0.016..0.016 rows=0 loops=1)
             Index Cond: (trace_id = 22)
          ->Index Scan using jam_trace_sys_n1 on jam_trace_sys b  (cost=0.00..3.41 rows=1 width=16) (never executed)
             Index Cond: (22 = trace_id)
          SubPlan
          ->Limit  (cost=0.00..0.33 rows=1 width=8) (never executed)
            ->Index Scan using jam_trace_sys_n1 on jam_trace_sys c  (cost=0.00..6.36 rows=19 width=8) (never executed)
               Index Cond: ((trace_id = $0) AND (seq_no > $1))
       ->Index Scan using jam_tracesnap_n1 on jam_tracesnap s1  (cost=0.00..2.01 rows=1 width=8) (never executed)
          Index Cond: (22 = trace_id)
    ->Index Scan using jam_tracesnap_n1 on jam_tracesnap s2  (cost=0.00..2.01 rows=1 width=8) (never executed)
       Index Cond: (22 = trace_id)
Regards,
 
Virag

----- Original Message -----
From: "Szűcs Gábor" <surrano@gmail.com>
Sent: Monday, January 09, 2006 7:59 AM
Subject: Re: Avoiding cartesian product

> Dear Virag,
>
> AFAIK aggregates aren't indexed in postgres (at least not before 8.1, which
> indexes min and max, iirc).
>
> Also, I don't think you need to exactly determine the trace_id. Try this one
> (OTOH; might be wrong):
>
> select DISTINCT ON (a.trace_id, a.seq_no) -- See below
>    b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
> from jam_trace_sys a, jam_trace_sys b
> where a.trace_id = 22
>    and b.trace_id = a.trace_id
>    and b.seq_no > a.seq_no -- Simply ">" is enough
> order by a.trace_id, a.seq_no, b.seq_no; -- DISTINCT, see below
>
> The trick is that DISTINCT takes the first one in each group (IIRC it is
> granted, at least someone told me on one of these lists :) ) so if you order
> by the DISTINCT attributes and then by b.seq_no, you'll get the smallest of
> appropriate b.seq_no values for each DISTINCT values.
>
> The idea of DISTINCTing by both columns is to make sure the planner finds
> the index. (lately I had a similar problem: WHERE a=1 ORDER BY b LIMIT 1
> used an index on b, instead of an (a,b) index. Using ORDER BY a,b solved it)
>
> HTH,
>
> --
> G.
>
>
> On 2006.01.04. 5:12, Virag Saksena wrote:
> >
> > I have a table which stores cumulative values
> > I would like to display/chart the deltas between successive data
> > collections
> > 
> > If my primary key only increments by 1, I could write a simple query
> > 
> > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
> >   from jam_trace_sys a, jam_trace_sys b
> >  where a.trace_id = 22
> >    and b.trace_id = a.trace_id
> >    and b.seq_no = a.seq_no + 1
> >  order by a.seq_no;
> > 
> > However the difference in sequence number is variable.
> > So (in Oracle) I used to extract the next seq_no using a correlated
> > sub-query
> > 
> > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
> > from jam_trace_sys a, jam_trace_sys b
> > where a.trace_id = 22
> > and (b.trace_id, b.seq_no) =
> > (select a.trace_id, min(c.seq_no) from jam_trace_sys c
> > where c.trace_id = a.trace_id and c.seq_no > a.seq_no)
> >  order by a.seq_no;
> > 
> > For every row in A, The correlated sub-query from C will execute
> > With an appropriate index, it will just descend the index Btree
> > go one row to the right and return that row (min > :value)
> > and join to table B
> > 
> > SELECT STATEMENT
> >   SORT ORDER BY
> >    TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B
> >      NESTED LOOPS
> >        TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS  A
> >          INDEX RANGE SCAN JAM_TRACE_SYS_N1  A
> >        INDEX RANGE SCAN JAM_TRACE_SYS_N1 B
> >          SORT AGGREGATE
> >            INDEX RANGE SCAN JAM_TRACE_SYS_N1 C
> > 
> > In postgreSQL A and B are doing a cartesian product
> > then C gets executed for every row in this cartesian product
> > and most of the extra rows get thrown out.
> > Is there any way to force an execution plan like above where the
> > correlated subquery runs before going to B.
> > The table is small right now, but it will grow to have millions of rows
> > QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------------
> >  Sort  (cost=124911.81..124944.84 rows=13213 width=20) (actual
> > time=13096.754..13097.053 rows=149 loops=1)
> >    Sort Key: a.seq_no
> >    ->  Nested Loop  (cost=4.34..124007.40 rows=13213 width=20) (actual
> > time=1948.300..13096.329 rows=149 loops=1)
> >          Join Filter: (subplan)
> >          ->  Seq Scan on jam_trace_sys b  (cost=0.00..3.75 rows=175
> > width=16) (actual time=0.005..0.534 rows=175 loops=1)
> >          ->  Materialize  (cost=4.34..5.85 rows=151 width=16) (actual
> > time=0.002..0.324 rows=150 loops=175)
> >                ->  Seq Scan on jam_trace_sys a  (cost=0.00..4.19
> > rows=151 width=16) (actual time=0.022..0.687 rows=150 loops=1)
> >                      Filter: (trace_id = 22)
> >          SubPlan
> >            ->  Aggregate  (cost=4.67..4.67 rows=1 width=4) (actual
> > time=0.486..0.488 rows=1 loops=26250)
> >                  ->  Seq Scan on jam_trace_sys c  (cost=0.00..4.62
> > rows=15 width=4) (actual time=0.058..0.311 rows=74 loops=26250)
> >                        Filter: ((trace_id = $0) AND (seq_no > $1))
> >  Total runtime: 13097.557 ms
> > (13 rows)
> > 
> > pglnx01=> \d jam_trace_sys
> >      Table "public.jam_trace_sys"
> >      Column      |  Type   | Modifiers
> > -----------------+---------+-----------
> >  trace_id        | integer |
> >  seq_no          | integer |
> >  cpu_utilization | integer |
> >  gc_minor        | integer |
> >  gc_major        | integer |
> >  heap_used       | integer |
> > Indexes:
> >     "jam_trace_sys_n1" btree (trace_id, seq_no)
> > 
> > pglnx01=> select count(*) from jam_trace_Sys ;
> >  count
> > -------
> >    175
> > (1 row)
> > 
> > pglnx01=> select trace_id, count(*) from jam_trace_sys group by trace_id ;
> >  trace_id | count
> > ----------+-------
> >        15 |     2
> >        18 |    21
> >        22 |   150
> >        16 |     2
> > (4 rows)
>

pgsql-performance by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: Need pointers to "standard" pg database(s) for
Next
From: Fredrik Olsson
Date:
Subject: Re: Force another plan.