Avoiding cartesian product - Mailing list pgsql-performance
From | Virag Saksena |
---|---|
Subject | Avoiding cartesian product |
Date | |
Msg-id | 002b01c610e5$21cc7340$3100000a@demo1 Whole thread Raw |
Responses |
Re: Avoiding cartesian product
|
List | pgsql-performance |
I have a table which stores cumulative values
I would like to display/chart the deltas between successive data collections
I would like to display/chart the deltas between successive data collections
If my primary key only increments by 1, I could write a simple query
select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
from jam_trace_sys a, jam_trace_sys b
where a.trace_id = 22
and b.trace_id = a.trace_id
and b.seq_no = a.seq_no + 1
order by a.seq_no;
from jam_trace_sys a, jam_trace_sys b
where a.trace_id = 22
and b.trace_id = a.trace_id
and b.seq_no = a.seq_no + 1
order by a.seq_no;
However the difference in sequence number is variable.
So (in Oracle) I used to extract the next seq_no using a correlated sub-query
So (in Oracle) I used to extract the next seq_no using a correlated sub-query
select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
from jam_trace_sys a, jam_trace_sys b
where a.trace_id = 22
and (b.trace_id, b.seq_no) =
(select a.trace_id, min(c.seq_no) from jam_trace_sys c
where c.trace_id = a.trace_id and c.seq_no > a.seq_no)
order by a.seq_no;
from jam_trace_sys a, jam_trace_sys b
where a.trace_id = 22
and (b.trace_id, b.seq_no) =
(select a.trace_id, min(c.seq_no) from jam_trace_sys c
where c.trace_id = a.trace_id and c.seq_no > a.seq_no)
order by a.seq_no;
For every row in A, The correlated sub-query from C will execute
With an appropriate index, it will just descend the index Btree
go one row to the right and return that row (min > :value)
and join to table B
With an appropriate index, it will just descend the index Btree
go one row to the right and return that row (min > :value)
and join to table B
SELECT STATEMENT
SORT ORDER BY
TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS A
INDEX RANGE SCAN JAM_TRACE_SYS_N1 A
INDEX RANGE SCAN JAM_TRACE_SYS_N1 B
SORT AGGREGATE
INDEX RANGE SCAN JAM_TRACE_SYS_N1 C
SORT ORDER BY
TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS A
INDEX RANGE SCAN JAM_TRACE_SYS_N1 A
INDEX RANGE SCAN JAM_TRACE_SYS_N1 B
SORT AGGREGATE
INDEX RANGE SCAN JAM_TRACE_SYS_N1 C
In postgreSQL A and B are doing a cartesian product
then C gets executed for every row in this cartesian product
and most of the extra rows get thrown out.
Is there any way to force an execution plan like above where the correlated subquery runs before going to B.
The table is small right now, but it will grow to have millions of rows
then C gets executed for every row in this cartesian product
and most of the extra rows get thrown out.
Is there any way to force an execution plan like above where the correlated subquery runs before going to B.
The table is small right now, but it will grow to have millions of rows
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=124911.81..124944.84 rows=13213 width=20) (actual time=13096.754..13097.053 rows=149 loops=1)
Sort Key: a.seq_no
-> Nested Loop (cost=4.34..124007.40 rows=13213 width=20) (actual time=1948.300..13096.329 rows=149 loops=1)
Join Filter: (subplan)
-> Seq Scan on jam_trace_sys b (cost=0.00..3.75 rows=175 width=16) (actual time=0.005..0.534 rows=175 loops=1)
-> Materialize (cost=4.34..5.85 rows=151 width=16) (actual time=0.002..0.324 rows=150 loops=175)
-> Seq Scan on jam_trace_sys a (cost=0.00..4.19 rows=151 width=16) (actual time=0.022..0.687 rows=150 loops=1)
Filter: (trace_id = 22)
SubPlan
-> Aggregate (cost=4.67..4.67 rows=1 width=4) (actual time=0.486..0.488 rows=1 loops=26250)
-> Seq Scan on jam_trace_sys c (cost=0.00..4.62 rows=15 width=4) (actual time=0.058..0.311 rows=74 loops=26250)
Filter: ((trace_id = $0) AND (seq_no > $1))
Total runtime: 13097.557 ms
(13 rows)
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=124911.81..124944.84 rows=13213 width=20) (actual time=13096.754..13097.053 rows=149 loops=1)
Sort Key: a.seq_no
-> Nested Loop (cost=4.34..124007.40 rows=13213 width=20) (actual time=1948.300..13096.329 rows=149 loops=1)
Join Filter: (subplan)
-> Seq Scan on jam_trace_sys b (cost=0.00..3.75 rows=175 width=16) (actual time=0.005..0.534 rows=175 loops=1)
-> Materialize (cost=4.34..5.85 rows=151 width=16) (actual time=0.002..0.324 rows=150 loops=175)
-> Seq Scan on jam_trace_sys a (cost=0.00..4.19 rows=151 width=16) (actual time=0.022..0.687 rows=150 loops=1)
Filter: (trace_id = 22)
SubPlan
-> Aggregate (cost=4.67..4.67 rows=1 width=4) (actual time=0.486..0.488 rows=1 loops=26250)
-> Seq Scan on jam_trace_sys c (cost=0.00..4.62 rows=15 width=4) (actual time=0.058..0.311 rows=74 loops=26250)
Filter: ((trace_id = $0) AND (seq_no > $1))
Total runtime: 13097.557 ms
(13 rows)
pglnx01=> \d jam_trace_sys
Table "public.jam_trace_sys"
Column | Type | Modifiers
-----------------+---------+-----------
trace_id | integer |
seq_no | integer |
cpu_utilization | integer |
gc_minor | integer |
gc_major | integer |
heap_used | integer |
Indexes:
"jam_trace_sys_n1" btree (trace_id, seq_no)
Table "public.jam_trace_sys"
Column | Type | Modifiers
-----------------+---------+-----------
trace_id | integer |
seq_no | integer |
cpu_utilization | integer |
gc_minor | integer |
gc_major | integer |
heap_used | integer |
Indexes:
"jam_trace_sys_n1" btree (trace_id, seq_no)
pglnx01=> select count(*) from jam_trace_Sys ;
count
-------
175
(1 row)
count
-------
175
(1 row)
pglnx01=> select trace_id, count(*) from jam_trace_sys group by trace_id ;
trace_id | count
----------+-------
15 | 2
18 | 21
22 | 150
16 | 2
(4 rows)
trace_id | count
----------+-------
15 | 2
18 | 21
22 | 150
16 | 2
(4 rows)
pgsql-performance by date: