Thread: Query plan question

Query plan question

From
"Maksim Likharev"
Date:
Hi,
after looking on one of my query I found interesting thing,
I was trying to force PG do inner join 2 small tables calcs. necessary
aggregates per id
and only after that left outer join with the rest.
Idea was to calc. all aggregates on small set and do the rest

So I was doing something like
SELECT ....
     FROM ( SELECT ( SELECT agr1() .... ), ( SELECT agr2() ... ), .. )
AS pt
        LEFT OUTER JOIN .....
        LEFT OUTER JOIN ....
        .....

What I see, at least from explain output, PG ignores my desire and calc.
aggregate
for every row, so in my example 'small' tables # rows 10000 but what I
see
from plan PG calcs. aggregates for all 10720 rows.
What I'm doing wrong, and more interesting question how to force PG do
what I want to do,
I think I know my data better then PG does ;)

P.S
Sorry for large post...

EXAMPLE:

SELECT ........
    FROM (
        SELECT first, .....,
            ( SELECT CAST( COUNT(*) AS int4 ) FROM
prod.t_pas AS tpa WHERE tpa.kid = p.kid ),
            ( SELECT CAST( COUNT(*) AS int4 ) FROM
prod.t_pinv AS tpi WHERE tpi.kid = p.kid ),
            ( SELECT CAST( COUNT(*) AS int4 ) FROM
prod.t_cit AS ct1 WHERE ct1.kid = p.kid),
            ( SELECT CAST( COUNT(*) AS int4 ) FROM
prod.t_cit AS ct2 WHERE ct2.kid = p.kid ),
            ( SELECT prod.pgag_list(pncl1.fld) FROM
prod.t_pcls AS pncl1 WHERE pncl1.paid = p.kid  ),
            ( SELECT prod.pgag_list(pipc1.fld) FROM
prod.t_pics AS pipc1 WHERE pipc1.kid = p.kid )

            FROM prod.t_p AS p INNER JOIN t_temp AS t
                ON p.did = t.did
            LEFT OUTER JOIN prod.t_pinv AS pi
                ON p.kid = pi.kid AND pi.orderid = 'S'
        ) AS pt

            LEFT OUTER JOIN prod.t_dmp AS pdb
                ON pt.kid = pdb.kid
            LEFT OUTER JOIN prod.t_inv AS i
                ON pt.first = i.invid
            LEFT OUTER JOIN prod.t_pata AS pa
                ON pt.kid = pa.kid
            LEFT OUTER JOIN prod.t_as AS a
                ON pa.aid = a.aid;

Nested Loop  (cost=0.00..28286.07 rows=1000 width=183) (actual
time=111.80..140707.84 rows=10720 loops=1)
   ->  Nested Loop  (cost=0.00..25260.07 rows=1000 width=154) (actual
time=50.30..45994.51 rows=10720 loops=1)
         ->  Nested Loop  (cost=0.00..19926.91 rows=1000 width=141)
(actual time=49.81..42713.54 rows=10000 loops=1)
               ->  Nested Loop  (cost=0.00..16900.91 rows=1000
width=115) (actual time=17.41..13394.78 rows=10000 loops=1)
                     ->  Nested Loop  (cost=0.00..11249.73 rows=1000
width=97) (actual time=2.15..6734.64 rows=10000 loops=1)
                           Join Filter: ("inner".orderid = 'S'::bpchar)
                           ->  Nested Loop  (cost=0.00..5791.35
rows=1000 width=84) (actual time=1.58..2860.57 rows=10000 loops=1)
                                 ->  Seq Scan on t_temp t
(cost=0.00..20.00 rows=1000 width=20) (actual time=0.06..171.12
rows=10000 loops=1)
                                 ->  Index Scan using t_pdid on t_pt p
(cost=0.00..5.76 rows=1 width=64) (actual time=0.20..0.22 rows=1
loops=10000)
                                       Index Cond: (p.did = "outer".did)
                           ->  Index Scan using t_pinviid on t_pinv pi
(cost=0.00..5.44 rows=1 width=13) (actual time=0.21..0.31 rows=2
loops=10000)
                                 Index Cond: ("outer".kid = pi.kid)
                     ->  Index Scan using t_dmpid on t_dmp pdb
(cost=0.00..5.64 rows=1 width=18) (actual time=0.58..0.60 rows=1
loops=10000)
                           Index Cond: ("outer".kid = pdb.kid)
               ->  Index Scan using t_invid on t_inv i  (cost=0.00..3.01
rows=1 width=26) (actual time=2.83..2.85 rows=1 loops=10000)
                     Index Cond: ("outer".iid = i.iid)
         ->  Index Scan using t_pasid on t_pas pa  (cost=0.00..5.32
rows=1 width=13) (actual time=0.21..0.23 rows=1 loops=10000)
               Index Cond: ("outer".kid = pa.kid)
   ->  Index Scan using t_asid on t_as a  (cost=0.00..3.01 rows=1
width=29) (actual time=0.64..0.66 rows=1 loops=10720)
         Index Cond: ("outer".aid = a.aid)
   SubPlan
     ->  Aggregate  (cost=5.32..5.32 rows=1 width=0) (actual
time=0.17..0.17 rows=1 loops=10720)
           ->  Index Scan using t_pasid on t_pas tpa  (cost=0.00..5.32
rows=1 width=0) (actual time=0.11..0.13 rows=1 loops=10720)
                 Index Cond: (kid = $0)
     ->  Aggregate  (cost=5.45..5.45 rows=1 width=0) (actual
time=0.25..0.25 rows=1 loops=10720)
           ->  Index Scan using t_pinviid on t_pinv tpi
(cost=0.00..5.44 rows=1 width=0) (actual time=0.13..0.21 rows=2
loops=10720)
                 Index Cond: (kid = $0)
     ->  Aggregate  (cost=8.72..8.72 rows=1 width=0) (actual
time=0.33..0.33 rows=1 loops=10720)
           ->  Index Scan using t_citid on t_cit ct1  (cost=0.00..8.67
rows=19 width=0) (actual time=0.19..0.28 rows=4 loops=10720)
                 Index Cond: (kid = $0)
     ->  Aggregate  (cost=78.60..78.60 rows=1 width=0) (actual
time=0.33..0.33 rows=1 loops=10720)
           ->  Index Scan using t_cdid on t_cit ct2  (cost=0.00..78.55
rows=19 width=0) (actual time=0.20..0.30 rows=1 loops=10720)
                 Index Cond: (did = $1)
     ->  Aggregate  (cost=4.51..4.51 rows=1 width=11) (actual
time=0.22..0.22 rows=1 loops=10720)
           ->  Index Scan using t_pclsid on t_pcls pncl1
(cost=0.00..4.51 rows=1 width=11) (actual time=0.13..0.14 rows=0
loops=10720)
                 Index Cond: (kid = $0)
     ->  Aggregate  (cost=5.93..5.93 rows=1 width=12) (actual
time=0.44..0.44 rows=1 loops=10720)
           ->  Index Scan using t_picsid2 on t_pics pipc1
(cost=0.00..5.93 rows=1 width=12) (actual time=0.20..0.24 rows=2
loops=10720)
                 Index Cond: (kid = $0)



Re: Query plan question

From
Tom Lane
Date:
"Maksim Likharev" <mlikharev@aurigin.com> writes:
> I was trying to force PG do inner join 2 small tables calcs. necessary
> aggregates per id
> and only after that left outer join with the rest.

AFAICS you're contorting your query to force the evaluation order.
Why are you complaining that PG follows what you told it to do?

I'm having a hard time figuring out exactly what the query's intent
is.  What does it look like when you express it in the simplest way
possible, with minimum use of subselects?

            regards, tom lane

Re: Query plan question

From
"Maksim Likharev"
Date:
Hi Tom,
basically I complaining that PG does not do what I told to do or
was hoping to do.

What I was hopping to do:
join temp table and main table + evaluate all aggregates for small
subset,
in my case subset is not so small 10000 rows and after that join
to other tables that will effectively grow resultset size, due to one to
many
relations.
But what I see PG calculates all aggregates for the final resultset in
my case 10720,
but that could be up to x5 times more.

In order to be more clear let's consider following simplification

4 tables DOCS, REVIEWERS, REVISIONS AND OTHER
[docid] is primary key in DOCS,
DOCS one to many for REVIVERS, REVISIONS, OTHER

I want following output:
[docid], [reviewrs cnt], [revisions cnt], otherfields...

So what could be simpler than

SELECT ....
    FROM( SELECT docid, stuff, ....
        ( SELECT count(...) FROM REVIEWERS ),
        ( SELECT count(...) FROM REVISIONS )
        FROM DOCS
          ) AS t
       LEFT OUTER JOIN OTHER ....

I want result set [t] to be evaluated first.
I reality I have 4/6 aggregate to calc. and 3/4 tables to join.




-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 25, 2003 7:53 PM
To: Maksim Likharev
Cc: GENERAL
Subject: Re: [GENERAL] Query plan question


"Maksim Likharev" <mlikharev@aurigin.com> writes:
> I was trying to force PG do inner join 2 small tables calcs. necessary
> aggregates per id
> and only after that left outer join with the rest.

AFAICS you're contorting your query to force the evaluation order.
Why are you complaining that PG follows what you told it to do?

I'm having a hard time figuring out exactly what the query's intent
is.  What does it look like when you express it in the simplest way
possible, with minimum use of subselects?

            regards, tom lane

Re: Query plan question

From
Tom Lane
Date:
"Maksim Likharev" <mlikharev@aurigin.com> writes:
> basically I complaining that PG does not do what I told to do or
> was hoping to do.

Okay, now I get the point: you want to prevent the "pt" sub-select from
being flattened into the outer query.

7.3.1 through 7.3.3 will actually do what you want (they won't flatten a
sub-select that has any sub-selects in its output list) but we got a lot
of flak for that and 7.4 will go back to the prior behavior.  In most
scenarios it's a win for the planner to flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner
from flattening the sub-select.  For example,

        ...
            FROM prod.t_p AS p INNER JOIN t_temp AS t
                ON p.did = t.did
            LEFT OUTER JOIN prod.t_pinv AS pi
                ON p.kid = pi.kid AND pi.orderid = 'S'
            -- hack to keep this separate from outer plan:
            OFFSET 0
        ) AS pt

            LEFT OUTER JOIN prod.t_dmp AS pdb
                ON pt.kid = pdb.kid
        ...

I don't foresee any future planner changes that would be likely to
bypass a LIMIT/OFFSET clause.

            regards, tom lane

Re: Query plan question

From
"Maksim Likharev"
Date:
Thanks Tom,
works, have to test performance....


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, June 26, 2003 7:36 AM
To: Maksim Likharev
Cc: GENERAL
Subject: Re: [GENERAL] Query plan question


"Maksim Likharev" <mlikharev@aurigin.com> writes:
> basically I complaining that PG does not do what I told to do or
> was hoping to do.

Okay, now I get the point: you want to prevent the "pt" sub-select from
being flattened into the outer query.

7.3.1 through 7.3.3 will actually do what you want (they won't flatten a
sub-select that has any sub-selects in its output list) but we got a lot
of flak for that and 7.4 will go back to the prior behavior.  In most
scenarios it's a win for the planner to flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner
from flattening the sub-select.  For example,

        ...
            FROM prod.t_p AS p INNER JOIN t_temp AS t
                ON p.did = t.did
            LEFT OUTER JOIN prod.t_pinv AS pi
                ON p.kid = pi.kid AND pi.orderid = 'S'
            -- hack to keep this separate from outer plan:
            OFFSET 0
        ) AS pt

            LEFT OUTER JOIN prod.t_dmp AS pdb
                ON pt.kid = pdb.kid
        ...

I don't foresee any future planner changes that would be likely to
bypass a LIMIT/OFFSET clause.

            regards, tom lane