Thread: left join performance problem

left join performance problem

From

pginfo

Date:

11 June 2003, 13:43:08

Hi,
I am using pg 7.3.1 on a dual r.h. 7.3 box.

I have a big problem with pg left join performance.

My plan is:
=# explain analyze select D.IDS AS DIDS ,D.IDS_SKLAD, D.IDS_KO AS
DIDSKO,KL.MNAME AS KLNAME, D.NOMER AS DNOMER,D.DATE_OP,  S.MED AS
MEDNAME, NOM.MNAME AS NOMNAME,S.IDS_NUM, S.KOL,
S.CENA,S.VAL,S.TOT,S.DTO,S.PTO ,M.OTN AS MOTN  FROM A_KLIENTI KL ,
A_NOMEN NOM, A_DOC D,A_SKLA
D S  left outer join A_MESKLAD M ON(S.IDS=M.IDS)   WHERE  D.OP=4 AND
D.IDS=S.IDS_DOC AND D.IDS_KO=KL.IDS AND S.IDS_NUM=NOM.IDS AND  KL.IDS_G
RUPA  = 'SOF_112' ;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------

 Nested Loop  (cost=460869.55..470785.29 rows=20 width=1034) (actual
time=50139.27..57565.34 rows=12990 loops=1)
   ->  Hash Join  (cost=460869.55..470662.48 rows=20 width=862) (actual
time=50139.02..57246.35 rows=12990 loops=1)
         Hash Cond: ("outer".ids_doc = "inner".ids)
         ->  Merge Join  (cost=457324.89..463038.60 rows=815792
width=356) (actual time=48128.32..53430.02 rows=815926 loops=1)
               Merge Cond: ("outer".ids = "inner".ids)
               ->  Index Scan using a_mesklad_pkey on a_mesklad m
(cost=0.00..1395.47 rows=15952 width=72) (actual time=0.21..109.19
rows=15952 loops=1)
               ->  Sort  (cost=457324.89..459364.37 rows=815792
width=284) (actual time=48128.05..49380.06 rows=815926 loops=1)
                     Sort Key: s.ids
                     ->  Seq Scan on a_sklad s  (cost=0.00..74502.92
rows=815792 width=284) (actual time=4.32..16777.16 rows=815926 loops=1)
         ->  Hash  (cost=3544.65..3544.65 rows=3 width=506) (actual
time=1104.34..1104.34 rows=0 loops=1)
               ->  Hash Join  (cost=905.35..3544.65 rows=3 width=506)
(actual time=428.32..1098.52 rows=1966 loops=1)
                     Hash Cond: ("outer".ids_ko = "inner".ids)
                     ->  Index Scan using i_doc_op on a_doc d
(cost=0.00..2625.71 rows=677 width=244) (actual time=29.27..690.86
rows=1981 loops=1)
                           Index Cond: (op = 4)
                     ->  Hash  (cost=905.19..905.19 rows=65 width=262)
(actual time=398.97..398.97 rows=0 loops=1)
                           ->  Seq Scan on a_klienti kl
(cost=0.00..905.19 rows=65 width=262) (actual time=396.68..398.93 rows=7
loops=1)
                                 Filter: (ids_grupa = 'SOF_112'::name)
   ->  Index Scan using a_nomen_pkey on a_nomen nom  (cost=0.00..6.01
rows=1 width=172) (actual time=0.01..0.02 rows=1 loops=12990)
         Index Cond: ("outer".ids_num = nom.ids)
 Total runtime: 57749.24 msec
(20 rows)



If I remove the join ( I know it is not very correct and I receive 19
rows as answer) it is working very fast.
The plan is:

 explain analyze select D.IDS AS DIDS ,D.IDS_SKLAD, D.IDS_KO AS
DIDSKO,KL.MNAME AS KLNAME, D.NOMER AS DNOMER,D.DATE_OP,  S.MED AS
MEDNAME, NOM.MNAME AS NOMNAME,S.IDS_NUM, S.KOL,
S.CENA,S.VAL,S.TOT,S.DTO,S.PTO ,M.OTN AS MOTN  FROM A_KLIENTI KL ,
A_NOMEN NOM, A_DOC D,A_SKLAD S ,A_MESKLAD M  WHERE  S.IDS=M.IDS AND
D.OP=4 AND D.IDS=S.IDS_DOC AND D.IDS_KO=KL.IDS AND S.IDS_NUM=NOM.IDS
AND  D.NOMER  like '%0905'  AND  KL.IDS_GRUPA  = 'SOF_112'  ORDER BY
D.IDS,S.IDS_NUM,S.ORDER_NUM ;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------

 Sort  (cost=18897.33..18897.33 rows=1 width=1038) (actual
time=36.33..36.35 rows=48 loops=1)
   Sort Key: d.ids, s.ids_num, s.order_num
   ->  Nested Loop  (cost=0.00..18897.32 rows=1 width=1038) (actual
time=30.90..35.93 rows=48 loops=1)
         ->  Nested Loop  (cost=0.00..18891.29 rows=1 width=866) (actual
time=30.70..33.34 rows=48 loops=1)
               ->  Nested Loop  (cost=0.00..18885.28 rows=1 width=794)
(actual time=30.44..31.98 rows=48 loops=1)
                     ->  Nested Loop  (cost=0.00..2633.93 rows=1
width=506) (actual time=30.18..30.62 rows=1 loops=1)
                           ->  Index Scan using i_doc_op on a_doc d
(cost=0.00..2627.40 rows=1 width=244) (actual time=29.93..30.36 rows=1
loops=1)
                                 Index Cond: (op = 4)
                                 Filter: (nomer ~~ '%0905'::text)
                           ->  Index Scan using a_klienti_pkey on
a_klienti kl  (cost=0.00..6.01 rows=1 width=262) (actual time=0.23..0.23
rows=1 loops=1)
                                 Index Cond: ("outer".ids_ko = kl.ids)
                                 Filter: (ids_grupa = 'SOF_112'::name)
                     ->  Index Scan using i_sklad_ids_doc on a_sklad s
(cost=0.00..16200.36 rows=4079 width=288) (actual time=0.24..0.95
rows=48 loops=1)
                           Index Cond: ("outer".ids = s.ids_doc)
               ->  Index Scan using a_mesklad_pkey on a_mesklad m
(cost=0.00..6.01 rows=1 width=72) (actual time=0.02..0.02 rows=1
loops=48)
                     Index Cond: ("outer".ids = m.ids)
         ->  Index Scan using a_nomen_pkey on a_nomen nom
(cost=0.00..6.01 rows=1 width=172) (actual time=0.04..0.04 rows=1
loops=48)
               Index Cond: ("outer".ids_num = nom.ids)
 Total runtime: 36.98 msec
(19 rows)


Also S.IDS and M.IDS are name and primary key's.
I can not find my problem.
Any idea will help.
Of cours I can make the query with two selects and will work fast, but I
think it is not good solution.
regards,
ivan.

Re: left join performance problem

From

Tom Lane

Date:

11 June 2003, 23:58:04

pginfo <pginfo@t1.unisoftbg.com> writes:
> I have a big problem with pg left join performance.

I think the problem is that the LEFT JOIN clause is forcing the
planner to join A_SKLAD to A_MESKLAD before anything else, whereas
a good plan would do some of the other joins first to eliminate
as many rows as possible.  You will need to revise the query to
let the LEFT JOIN happen later.  For discussion see
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=explicit-joins.html

            regards, tom lane

Re: left join performance problem

From

pginfo

Date:

12 June 2003, 02:52:48

Many thanks Tom,
the doc do not contain solution for this case, but the idea to
change the join order was excelent and all is working fine at the moment.

regards,
ivan.

Tom Lane wrote:

pginfo <pginfo@t1.unisoftbg.com> writes:
> I have a big problem with pg left join performance.
I think the problem is that the LEFT JOIN clause is forcing the
planner to join A_SKLAD to A_MESKLAD before anything else, whereas
a good plan would do some of the other joins first to eliminate
as many rows as possible. You will need to revise the query to
let the LEFT JOIN happen later. For discussion see
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=explicit-joins.html
regards, tom lane

Re: left join performance problem

From

"Jim C. Nasby"

Date:

16 June 2003, 02:10:50

On Thu, Jun 12, 2003 at 06:48:27AM +0200, pginfo wrote:
> Many thanks Tom,
> the doc do not contain solution for this case, but the idea to
> change the join order was excelent and all is working fine at the moment.

Any chance of getting a TODO added that would provide the option of
having the optimizer pick join order when you're using the ANSI join
syntax?

IMHO I think it's bad that using the ANSI syntax forces join order; it
would be much better to come up with a custom syntax for this like
everyone else does. But I'm sure people won't want to change the
existing behavior, so special syntax to do the opposite is almost as
good.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: left join performance problem

From

Tom Lane

Date:

16 June 2003, 02:28:40

"Jim C. Nasby" <jim@nasby.net> writes:
> Any chance of getting a TODO added that would provide the option of
> having the optimizer pick join order when you're using the ANSI join
> syntax?

No ... because it's already DONE.

            regards, tom lane

Re: left join performance problem

From

"Jim C. Nasby"

Date:

16 June 2003, 02:36:40

On Mon, Jun 16, 2003 at 01:28:26AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > Any chance of getting a TODO added that would provide the option of
> > having the optimizer pick join order when you're using the ANSI join
> > syntax?
>
> No ... because it's already DONE.

DOH, I forgot about the subselect trick. Nevermind.

*wipes egg off face*
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: left join performance problem

From

Bruno Wolff III

Date:

16 June 2003, 05:05:11

On Mon, Jun 16, 2003 at 00:36:29 -0500,
  "Jim C. Nasby" <jim@nasby.net> wrote:
> On Mon, Jun 16, 2003 at 01:28:26AM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <jim@nasby.net> writes:
> > > Any chance of getting a TODO added that would provide the option of
> > > having the optimizer pick join order when you're using the ANSI join
> > > syntax?
> >
> > No ... because it's already DONE.
>
> DOH, I forgot about the subselect trick. Nevermind.

In 7.4 there is a GUC setting to control this. I believe the default
is to not constrain the join order any more than is necessary to
preserve semantics.

Re: left join performance problem

From

Tom Lane

Date:

16 June 2003, 10:26:31

"Jim C. Nasby" <jim@nasby.net> writes:
> On Mon, Jun 16, 2003 at 01:28:26AM -0400, Tom Lane wrote:
>> "Jim C. Nasby" <jim@nasby.net> writes:
> Any chance of getting a TODO added that would provide the option of
> having the optimizer pick join order when you're using the ANSI join
> syntax?
>>
>> No ... because it's already DONE.

> DOH, I forgot about the subselect trick. Nevermind.

No, I wasn't talking about that.  See
http://developer.postgresql.org/docs/postgres/explicit-joins.html
for the way it works in CVS tip.

            regards, tom lane

Re: left join performance problem

From

"Jim C. Nasby"

Date:

16 June 2003, 19:41:17

On Mon, Jun 16, 2003 at 09:26:13AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > On Mon, Jun 16, 2003 at 01:28:26AM -0400, Tom Lane wrote:
> >> "Jim C. Nasby" <jim@nasby.net> writes:
> > Any chance of getting a TODO added that would provide the option of
> > having the optimizer pick join order when you're using the ANSI join
> > syntax?
> >>
> >> No ... because it's already DONE.
>
> > DOH, I forgot about the subselect trick. Nevermind.
>
> No, I wasn't talking about that.  See
> http://developer.postgresql.org/docs/postgres/explicit-joins.html
> for the way it works in CVS tip.

Ahh, cool. BTW, I think it should be prominently mentioned in
http://developer.postgresql.org/docs/postgres/sql-select.html#SQL-FROM
that ANSI join syntax can force join order, since most DBA's unfamiliar
with pgsql probably won't be expecting that.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"