Thread: Similar querys, better execution time on worst execution plan

Similar querys, better execution time on worst execution plan

From
"Fernando Papa"
Date:
Hi all!

I have a strange behavior with this query:

SELECT c.id_contenido,p.fecha_publicacion,c.titulo_esp,c.activo,c.activo,s.label_esp as label_sbc,p.orden
,p.tapa_spc,p.tapa_cat,p.tapa_principal,p.id_publicacion,ca.label_esp as label_cat,sp.label_esp as label_spc
FROM cont_contenido c ,cont_publicacion p ,cont_sbc s ,cont_cat ca ,cont_spc sp
 WHERE c.id_instalacion = 2
AND s.id_instalacion = 2
AND p.id_instalacion = 2
AND c.id_contenido = p.id_contenido
AND c.id_sbc = s.id_sbc
--AND (c.activo = 'S' or c.activo = 's')
--AND (s.activo = 'S' or s.activo = 's')
AND upper(c.activo) = 'S'
AND upper(s.activo) = 'S'
AND ca.id_instalacion =  2
AND sp.id_instalacion =  2
AND ca.id_cat = s.id_cat
AND sp.id_spc = ca.id_spc
ORDER BY sp.label_esp ,ca.label_esp ,p.orden

This is the execution plan:
Sort  (cost=128.81..128.83 rows=5 width=189)
  Sort Key: sp.label_esp, ca.label_esp, p.orden
  ->  Nested Loop  (cost=0.00..128.76 rows=5 width=189)
        Join Filter: ("outer".id_contenido = "inner".id_contenido)
        ->  Nested Loop  (cost=0.00..24.70 rows=1 width=134)
              Join Filter: ("inner".id_spc = "outer".id_spc)
              ->  Nested Loop  (cost=0.00..22.46 rows=1 width=111)
                    ->  Nested Loop  (cost=0.00..6.89 rows=1 width=68)
                          Join Filter: ("inner".id_cat = "outer".id_cat)
                          ->  Seq Scan on cont_sbc s  (cost=0.00..4.44 rows=1 width=35)
                                Filter: ((id_instalacion = 2::numeric) AND (upper((activo)::text) = 'S'::text))
                          ->  Seq Scan on cont_cat ca  (cost=0.00..2.31 rows=11 width=33)
                                Filter: (id_instalacion = 2::numeric)
                    ->  Index Scan using cont_cont_cont_sbc_fk_i on cont_contenido c  (cost=0.00..15.56 rows=1
width=43)
                          Index Cond: ((c.id_instalacion = 2::numeric) AND (c.id_sbc = "outer".id_sbc))
                          Filter: (upper((activo)::text) = 'S'::text)
              ->  Seq Scan on cont_spc sp  (cost=0.00..2.16 rows=6 width=23)
                    Filter: (id_instalacion = 2::numeric)
        ->  Seq Scan on cont_publicacion p  (cost=0.00..98.54 rows=442 width=55)
              Filter: (id_instalacion = 2::numeric)

If I replace both "uppers" with "...= 'S' or ...= 's'":

SELECT c.id_contenido,p.fecha_publicacion,c.titulo_esp,c.activo,c.activo,s.label_esp as label_sbc,p.orden
,p.tapa_spc,p.tapa_cat,p.tapa_principal,p.id_publicacion,ca.label_esp as label_cat,sp.label_esp as label_spc
FROM cont_contenido c ,cont_publicacion p ,cont_sbc s ,cont_cat ca ,cont_spc sp
 WHERE c.id_instalacion = 2
AND s.id_instalacion = 2
AND p.id_instalacion = 2
AND c.id_contenido = p.id_contenido
AND c.id_sbc = s.id_sbc
AND (c.activo = 'S' or c.activo = 's')
AND (s.activo = 'S' or s.activo = 's')
AND ca.id_instalacion =  2
AND sp.id_instalacion =  2
AND ca.id_cat = s.id_cat
AND sp.id_spc = ca.id_spc
ORDER BY sp.label_esp ,ca.label_esp ,p.orden

This is the Execution plan:

Sort  (cost=193.98..194.62 rows=256 width=189)
  Sort Key: sp.label_esp, ca.label_esp, p.orden
  ->  Merge Join  (cost=178.07..183.75 rows=256 width=189)
        Merge Cond: ("outer".id_contenido = "inner".id_contenido)
        ->  Sort  (cost=60.11..60.25 rows=56 width=134)
              Sort Key: c.id_contenido
              ->  Merge Join  (cost=57.31..58.50 rows=56 width=134)
                    Merge Cond: ("outer".id_sbc = "inner".id_sbc)
                    ->  Sort  (cost=10.60..10.64 rows=15 width=91)
                          Sort Key: s.id_sbc
                          ->  Merge Join  (cost=10.00..10.32 rows=15 width=91)
                                Merge Cond: ("outer".id_cat = "inner".id_cat)
                                ->  Sort  (cost=5.10..5.12 rows=10 width=56)
                                      Sort Key: ca.id_cat
                                      ->  Merge Join  (cost=4.74..4.94 rows=10 width=56)
                                            Merge Cond: ("outer".id_spc = "inner".id_spc)
                                            ->  Sort  (cost=2.50..2.53 rows=11 width=33)
                                                  Sort Key: ca.id_spc
                                                  ->  Seq Scan on cont_cat ca  (cost=0.00..2.31 rows=11 width=33)
                                                        Filter: (id_instalacion = 2::numeric)
                                            ->  Sort  (cost=2.24..2.26 rows=6 width=23)
                                                  Sort Key: sp.id_spc
                                                  ->  Seq Scan on cont_spc sp  (cost=0.00..2.16 rows=6 width=23)
                                                        Filter: (id_instalacion = 2::numeric)
                                ->  Sort  (cost=4.90..4.96 rows=21 width=35)
                                      Sort Key: s.id_cat
                                      ->  Seq Scan on cont_sbc s  (cost=0.00..4.44 rows=21 width=35)
                                            Filter: ((id_instalacion = 2::numeric) AND ((activo = 'S'::character
varying)OR (activo = 's'::character varying))) 
                    ->  Sort  (cost=46.70..46.94 rows=93 width=43)
                          Sort Key: c.id_sbc
                          ->  Seq Scan on cont_contenido c  (cost=0.00..43.66 rows=93 width=43)
                                Filter: ((id_instalacion = 2::numeric) AND ((activo = 'S'::character varying) OR
(activo= 's'::character varying))) 
        ->  Sort  (cost=117.96..119.06 rows=442 width=55)
              Sort Key: p.id_contenido
              ->  Seq Scan on cont_publicacion p  (cost=0.00..98.54 rows=442 width=55)
                    Filter: (id_instalacion = 2::numeric)


The question is, why the query with the worst execution plan (most expensive, the second) runs faster the query with
thebetter execution plan? 
First Query: 10 runs, avg: 8 sec.
Second Query: 10 runs, avg: 1.8 sec.

I see a fail on the "best" exec plan, the rows I get are around 430, so the first EP expect only 5 rows and the second
EPexpect 256. 

I run 7.3.2 over Solaris.
I did "vacuum full analyze" before

Thanks in advance!


Fernando.-

Re: Similar querys, better execution time on worst execution plan

From
SZUCS Gábor
Date:
Fernando,

1. Try EXPLAIN ANALYZE. Cost alone isn't an absolute measure. I think it's
only to see which parts of the query are expected to be slowest. However,
EXP ANA will give you exact times in msec (which effectively means it
executes the query).

2. I think calling upper() for each row costs more than direct comparison,
but not sure

3. Notice that there are seq scans with filter conditions like
  "id_instalacion = 2::numeric"
  Do you have indices on id_instalacion, which seems to be a numeric field?
if so, try casting the constant expressions in the query to numeric so that
postgresql may find the index. If you don't have such indices, it may be
worth to create them. (I guess you only have it on the table aliased with c,
since it does an index scan there.

4. another guess may be indices on (id_instalacion, activo), or, if activo
has few possible values (for example, it may be only one of three letters,
say, 'S', 'A' or 'K'), partial indices like:

CREATE INDEX cont_sbc_id_ins_S ON cont_sbc (id_instalacion)
    WHERE activo in ('S', 's');
CREATE INDEX cont_sbc_id_ins_A ON cont_sbc (id_instalacion)
    WHERE activo in ('A', 'a');
CREATE INDEX cont_sbc_id_ins_K ON cont_sbc (id_instalacion)
    WHERE activo in ('K', 'k');

G.
------------------------------- cut here -------------------------------
 WHERE c.id_instalacion = 2
AND s.id_instalacion = 2
AND p.id_instalacion = 2
...

 ->  Seq Scan on cont_sbc s  (cost=0.00..4.44 rows=1 width=35)
     Filter: ((id_instalacion = 2::numeric)
          AND (upper((activo)::text) = 'S'::text))
 ->  Index Scan using cont_cont_cont_sbc_fk_i on cont_contenido c
     (cost=0.00..15.56 rows=1 width=43)
     Index Cond: ((c.id_instalacion = 2::numeric)
              AND (c.id_sbc = "outer".id_sbc))
     Filter: (upper((activo)::text) = 'S'::text)
 ->  Seq Scan on cont_publicacion p  (cost=0.00..98.54 rows=442 width=55)
     Filter: (id_instalacion = 2::numeric)



Re: Similar querys, better execution time on worst execution plan

From
"Fernando Papa"
Date:
> -----Mensaje original-----
> De: SZUCS Gábor [mailto:surrano@mailbox.hu]
> Enviado el: jueves, 26 de junio de 2003 7:31
> Para: pgsql-performance@postgresql.org
> Asunto: Re: [PERFORM] Similar querys, better execution time
> on worst execution plan
>
>
> Fernando,
>
> 1. Try EXPLAIN ANALYZE. Cost alone isn't an absolute measure.
> I think it's only to see which parts of the query are
> expected to be slowest. However, EXP ANA will give you exact
> times in msec (which effectively means it executes the query).

Ok, yes, I did only explay because I run several times the query and get avg. run time. but it's true, it's better to
doEXP ANA. 

> 2. I think calling upper() for each row costs more than
> direct comparison, but not sure

It's the only answer than I can found... maybe do a lot of uppers and then compare will be too much than compare with 2
conditions...

> 3. Notice that there are seq scans with filter conditions like
>   "id_instalacion = 2::numeric"
>   Do you have indices on id_instalacion, which seems to be a
> numeric field? if so, try casting the constant expressions in
> the query to numeric so that postgresql may find the index.
> If you don't have such indices, it may be worth to create
> them. (I guess you only have it on the table aliased with c,
> since it does an index scan there.

Yes, we have index on id_instalacion, but now we have only one instalation, so the content of these field, in the 99%
ofthe rows, it's 2. I think in this case it's ok to choose seq scan. 

> 4. another guess may be indices on (id_instalacion, activo),
> or, if activo has few possible values (for example, it may be
> only one of three letters, say, 'S', 'A' or 'K'), partial
> indices like:
>
> CREATE INDEX cont_sbc_id_ins_S ON cont_sbc (id_instalacion)
>     WHERE activo in ('S', 's');
> CREATE INDEX cont_sbc_id_ins_A ON cont_sbc (id_instalacion)
>     WHERE activo in ('A', 'a');
> CREATE INDEX cont_sbc_id_ins_K ON cont_sbc (id_instalacion)
>     WHERE activo in ('K', 'k');
>

I need to recheck about the "quality" of "active" field. Really I don't know if I found a lot of 'S', a lot of 'N',
maybewe will have 50%/50% of 'S' or 'N'. This will be important to define index. 

Thanks for your answer.

Re: Similar querys, better execution time on worst execution plan

From
SZUCS Gábor
Date:
*happy* :)))

G.
------------------------------- cut here -------------------------------
----- Original Message -----
From: "Fernando Papa" <fpapa@claxson.com>
Sent: Thursday, June 26, 2003 3:33 PM


I need to recheck about the "quality" of "active" field. Really I don't know
if I found a lot of 'S', a lot of 'N', maybe we will have 50%/50% of 'S' or
'N'. This will be important to define index.

Thanks for your answer.