Home > mailing lists

Re: Avoiding cartesian product - Mailing list pgsql-performance

From	Virag Saksena
Subject	Re: Avoiding cartesian product
Date	February 20, 2006 02:42:42
Msg-id	002c01c635e8$cfc225d0$3100000a@demo1 Whole thread
In response to	Avoiding cartesian product ("Virag Saksena" <virag@auptyma.com>)
List	pgsql-performance

Tree view

Szűcs,

Thanks for your suggestion, I guess there is more than one way to attack the problem.

I ended up using a trick with limit to get the next row ...

select (b.gc_minor- a.gc_minor), (b.gc_major- a.gc_major)
from jam_trace_sys a join jam_trace_sys b on
(b.seq_no = (select c.seq_no from jam_trace_sys c
where c.trace_id = a.trace_id and c.seq_no > a.seq_no
order by c.trace_id, c.seq_no limit 1)
and b.trace_id = a.trace_id),
jam_tracesnap s1, jam_tracesnap s2
where s1.trace_id = a.trace_id
and s1.seq_no = a.seq_no
and s2.trace_id = b.trace_id
and s2.seq_no = b.seq_no
and a.trace_id = 22
order by a.seq_no;

This gave me a nice clean execution plan (there are some extra sources listed, but that is a bug in postgresql) ...

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=11.24..11.25 rows=1 width=20) (actual time=0.040..0.040 rows=0 loops=1)
Sort Key: a.seq_no
->Nested Loop (cost=0.00..11.23 rows=1 width=20) (actual time=0.028..0.028 rows=0 loops=1)
    Join Filter: ("inner".seq_no = "outer".seq_no)
    ->Nested Loop (cost=0.00..9.20 rows=1 width=32) (actual time=0.024..0.024 rows=0 loops=1)
       Join Filter: ("inner".seq_no = "outer".seq_no)
       ->Nested Loop (cost=0.00..7.17 rows=1 width=32) (actual time=0.020..0.020 rows=0 loops=1)
          Join Filter: ("inner".seq_no = (subplan))
          ->Index Scan using jam_trace_sys_n1 on jam_trace_sys a (cost=0.00..3.41 rows=1 width=16) (actual time=0.016..0.016 rows=0 loops=1)
             Index Cond: (trace_id = 22)
          ->Index Scan using jam_trace_sys_n1 on jam_trace_sys b (cost=0.00..3.41 rows=1 width=16) (never executed)
             Index Cond: (22 = trace_id)
          SubPlan
          ->Limit (cost=0.00..0.33 rows=1 width=8) (never executed)
            ->Index Scan using jam_trace_sys_n1 on jam_trace_sys c (cost=0.00..6.36 rows=19 width=8) (never executed)
               Index Cond: ((trace_id = $0) AND (seq_no > $1))
       ->Index Scan using jam_tracesnap_n1 on jam_tracesnap s1 (cost=0.00..2.01 rows=1 width=8) (never executed)
          Index Cond: (22 = trace_id)
    ->Index Scan using jam_tracesnap_n1 on jam_tracesnap s2 (cost=0.00..2.01 rows=1 width=8) (never executed)
       Index Cond: (22 = trace_id)

Regards,

Virag

----- Original Message -----

From: "Szűcs Gábor" <surrano@gmail.com>

To: <pgsql-performance@postgresql.org>

Sent: Monday, January 09, 2006 7:59 AM

Subject: Re: Avoiding cartesian product

> Dear Virag,
>
> AFAIK aggregates aren't indexed in postgres (at least not before 8.1, which
> indexes min and max, iirc).
>
> Also, I don't think you need to exactly determine the trace_id. Try this one
> (OTOH; might be wrong):
>
> select DISTINCT ON (a.trace_id, a.seq_no) -- See below
>    b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
> from jam_trace_sys a, jam_trace_sys b
> where a.trace_id = 22
>    and b.trace_id = a.trace_id
>    and b.seq_no > a.seq_no -- Simply ">" is enough
> order by a.trace_id, a.seq_no, b.seq_no; -- DISTINCT, see below
>
> The trick is that DISTINCT takes the first one in each group (IIRC it is
> granted, at least someone told me on one of these lists :) ) so if you order
> by the DISTINCT attributes and then by b.seq_no, you'll get the smallest of
> appropriate b.seq_no values for each DISTINCT values.
>
> The idea of DISTINCTing by both columns is to make sure the planner finds
> the index. (lately I had a similar problem: WHERE a=1 ORDER BY b LIMIT 1
> used an index on b, instead of an (a,b) index. Using ORDER BY a,b solved it)
>
> HTH,
>
> --
> G.
>
>
> On 2006.01.04. 5:12, Virag Saksena wrote:
> >
> > I have a table which stores cumulative values
> > I would like to display/chart the deltas between successive data
> > collections
> >
> > If my primary key only increments by 1, I could write a simple query
> >
> > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
> >   from jam_trace_sys a, jam_trace_sys b
> > where a.trace_id = 22
> >    and b.trace_id = a.trace_id
> >    and b.seq_no = a.seq_no + 1
> > order by a.seq_no;
> >
> > However the difference in sequence number is variable.
> > So (in Oracle) I used to extract the next seq_no using a correlated
> > sub-query
> >
> > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
> > from jam_trace_sys a, jam_trace_sys b
> > where a.trace_id = 22
> > and (b.trace_id, b.seq_no) =
> > (select a.trace_id, min(c.seq_no) from jam_trace_sys c
> > where c.trace_id = a.trace_id and c.seq_no > a.seq_no)
> > order by a.seq_no;
> >
> > For every row in A, The correlated sub-query from C will execute
> > With an appropriate index, it will just descend the index Btree
> > go one row to the right and return that row (min > :value)
> > and join to table B
> >
> > SELECT STATEMENT
> >   SORT ORDER BY
> >    TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B
> >      NESTED LOOPS
> >        TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS A
> >          INDEX RANGE SCAN JAM_TRACE_SYS_N1 A
> >        INDEX RANGE SCAN JAM_TRACE_SYS_N1 B
> >          SORT AGGREGATE
> >            INDEX RANGE SCAN JAM_TRACE_SYS_N1 C
> >
> > In postgreSQL A and B are doing a cartesian product
> > then C gets executed for every row in this cartesian product
> > and most of the extra rows get thrown out.
> > Is there any way to force an execution plan like above where the
> > correlated subquery runs before going to B.
> > The table is small right now, but it will grow to have millions of rows
> > QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------------
> > Sort (cost=124911.81..124944.84 rows=13213 width=20) (actual
> > time=13096.754..13097.053 rows=149 loops=1)
> >    Sort Key: a.seq_no
> >    -> Nested Loop (cost=4.34..124007.40 rows=13213 width=20) (actual
> > time=1948.300..13096.329 rows=149 loops=1)
> >          Join Filter: (subplan)
> >          -> Seq Scan on jam_trace_sys b (cost=0.00..3.75 rows=175
> > width=16) (actual time=0.005..0.534 rows=175 loops=1)
> >          -> Materialize (cost=4.34..5.85 rows=151 width=16) (actual
> > time=0.002..0.324 rows=150 loops=175)
> >                -> Seq Scan on jam_trace_sys a (cost=0.00..4.19
> > rows=151 width=16) (actual time=0.022..0.687 rows=150 loops=1)
> >                      Filter: (trace_id = 22)
> >          SubPlan
> >            -> Aggregate (cost=4.67..4.67 rows=1 width=4) (actual
> > time=0.486..0.488 rows=1 loops=26250)
> >                  -> Seq Scan on jam_trace_sys c (cost=0.00..4.62
> > rows=15 width=4) (actual time=0.058..0.311 rows=74 loops=26250)
> >                        Filter: ((trace_id = $0) AND (seq_no > $1))
> > Total runtime: 13097.557 ms
> > (13 rows)
> >
> > pglnx01=> \d jam_trace_sys
> >      Table "public.jam_trace_sys"
> >      Column      | Type   | Modifiers
> > -----------------+---------+-----------
> > trace_id        | integer |
> > seq_no          | integer |
> > cpu_utilization | integer |
> > gc_minor        | integer |
> > gc_major        | integer |
> > heap_used       | integer |
> > Indexes:
> >     "jam_trace_sys_n1" btree (trace_id, seq_no)
> >
> > pglnx01=> select count(*) from jam_trace_Sys ;
> > count
> > -------
> >    175
> > (1 row)
> >
> > pglnx01=> select trace_id, count(*) from jam_trace_sys group by trace_id ;
> > trace_id | count
> > ----------+-------
> >        15 |     2
> >        18 |    21
> >        22 |   150
> >        16 |     2
> > (4 rows)
>

pgsql-performance by date:

From: Christopher Kings-Lynne
Date: 20 February 2006, 00:47:45
Subject: Re: Need pointers to "standard" pg database(s) for

From: Fredrik Olsson
Date: 20 February 2006, 05:58:46
Subject: Re: Force another plan.

Re: Avoiding cartesian product - Mailing list pgsql-performance

Previous

Next