Thread: postgres FROM clause problem

postgres FROM clause problem

From
Paolo Tavalazzi
Date:
I need help!


I suppose that  for query like :

SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;

the planner is free to join the given tables in any order to find the most efficent query plan.



I have two query that they are different only for order of the tables in FROM lclause , but give back different query
plan:  

1)   ###################

explain analyze SELECT
    teatri.code,
    spettacoli.code,
    sum(seat.price)
    sum(seat.prev)

  FROM
    seat,
    spettacoli,
    tran ,
    teatri

  WHERE
    tran.time >= timestamp '2004-03-01 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND
    tran.perf_code           = spettacoli.code AND
    tran.thea_code           = spettacoli.teatro AND
    tran.id                  = seat.bt_id AND
    tran.system              = seat.system AND
    spettacoli.system        = tran.system AND
    spettacoli.teatro        = teatri.code
  GROUP BY
    1,2

with EXPLAIN :



HashAggregate  (cost=8540.03..8540.20 rows=5 width=30) (actual  time=3210.659..3212.403 rows=1317 loops=1)
   ->  Nested Loop  (cost=7853.66..8539.98 rows=5 width=30) (actual time=1316.663..2822.181 rows=67218 loops=1)
         ->  Nested Loop  (cost=7853.66..8516.75 rows=5 width=43) (actual time=1316.625..1901.870 rows=68167 loops=1)
               ->  Merge Join  (cost=7853.66..8462.71 rows=9 width=47) (actual time=1316.555..1519.227 rows=24919
loops=1)
                     Merge Cond: (("outer"."?column4?" = "inner"."?column6?") AND ("outer".system = "inner".system) AND
("outer".teatro= "inner".thea_code)) 
                     ->  Sort  (cost=6158.07..6298.27 rows=56079 width=26) (actual time=650.575..658.869 rows=48297
loops=1)
                           Sort Key: (spettacoli.code)::text, spettacoli.system, spettacoli.teatro
                           ->  Seq Scan on spettacoli  (cost=0.00..1734.79 rows=56079 width=26) (actual
time=0.020..94.725rows=56079 loops=1) 
                     ->  Sort  (cost=1695.59..1727.03 rows=12576 width=32) (actual time=619.904..625.457 rows=24923
loops=1)
                           Sort Key: (tran.perf_code)::text, tran.system, tran.thea_code
                           ->  Index Scan using time_idx on tran  (cost=0.00..839.27 rows=12576 width=32) (actual
time=0.121..104.945rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
                                 Filter: ((kind <> 9) AND (kind <> 10) AND (kind <> 3) AND (kind <> 7) AND (status =
0))
               ->  Index Scan using id_system_idx on seat  (cost=0.00..5.99 rows=1 width=10) (actual time=0.007..0.010
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 3224.260 ms


2)

explain analyze SELECT
    teatri.code,
    spettacoli.code,
    sum(seat.price),
    sum(seat.prev)


  FROM
    seat,
    teatri,
    tran ,
    spettacoli

  WHERE
    tran.time >= timestamp '2004-03-01 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND
    tran.perf_code           = spettacoli.code AND
    tran.thea_code           = spettacoli.teatro AND
    tran.id                  = seat.bt_id AND
    tran.system              = seat.system AND
    spettacoli.system        = tran.system AND
    spettacoli.teatro        = teatri.code
  GROUP BY
    1,2


with EXPLAIN :


 HashAggregate  (cost=8355.46..8355.49 rows=1 width=30) (actual time=48126.117..48127.715 rows=1317 loops=1)
   ->  Nested Loop  (cost=1909.00..8355.45 rows=1 width=30) (actual time=504.927..47738.287 rows=67218 loops=1)
         ->  Merge Join  (cost=1909.00..8301.41 rows=9 width=34) (actual time=504.873..47385.221 rows=24788 loops=1)
               Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".code))
               Join Filter: (("inner".perf_code)::text = ("outer".code)::text)
               ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5856.59 rows=56079 width=26) (actual
time=0.019..67.663rows=55352 loops=1) 
               ->  Sort  (cost=1909.00..1940.44 rows=12576 width=45) (actual time=501.736..2592.348 rows=12816353
loops=1)
                     Sort Key: tran.system, teatri.code
                     ->  Hash Join  (cost=24.78..1052.68 rows=12576 width=45) (actual time=1.581..190.530 rows=24792
loops=1)
                           Hash Cond: ("outer".thea_code = "inner".code)
                           ->  Index Scan using time_idx on tran  (cost=0.00..839.27 rows=12576 width=32) (actual
time=0.074..67.643rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
                                 Filter: ((kind <> 9) AND (kind <> 10) AND (kind <> 3) AND (kind <> 7) AND (status =
0))
                           ->  Hash  (cost=22.62..22.62 rows=862 width=13) (actual time=1.348..1.348 rows=0 loops=1)
                                 ->  Seq Scan on teatri  (cost=0.00..22.62 rows=862 width=13) (actual time=0.011..0.792
rows=862loops=1) 
         ->  Index Scan using id_system_idx on seat  (cost=0.00..5.99 rows=1 width=10) (actual time=0.006..0.009 rows=3
loops=24788)
               Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))

 Total runtime: 48131.798 ms


Why this difference?
i must not have the same plan in both the cases?
I use postgres 7.4
thank you!


Re: postgres FROM clause problem

From
Tom Lane
Date:
Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> I have two query that they are different only for order of the tables
> in FROM lclause , but give back different query plan :

Hm, seems like the planner is making wacko estimates in the second case.
You didn't say what data types are involved in this query --- are any of
the join columns int8 or float8 or timestamp?  If so you might be
getting bitten by the 7.4 pg_statistic alignment bug.  Please follow the
repair procedures suggested in the 7.4.2 release notes:
http://developer.postgresql.org/docs/postgres/release.html#RELEASE-7-4-2
and see if that improves matters.

            regards, tom lane

Re: postgres FROM clause problem

From
Paolo Tavalazzi
Date:
Alle 16:54, mercoledì 10 marzo 2004, hai scritto:
> Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> > I have two query that they are different only for order of the tables
> > in FROM lclause , but give back different query plan :
>
> Hm, seems like the planner is making wacko estimates in the second case.
> You didn't say what data types are involved in this query --- are any of
> the join columns int8 or float8 or timestamp?  If so you might be
> getting bitten by the 7.4 pg_statistic alignment bug.  Please follow the
> repair procedures suggested in the 7.4.2 release notes:
> http://developer.postgresql.org/docs/postgres/release.html#RELEASE-7-4-2
> and see if that improves matters.
>
>             regards, tom lane


I  have applied the procedure for fixing pg_statistic as you had said, but  the result is the same!
Only tran.time in the query is a timestamp , no int8 or float8.
The OS is FEDORA 0.96 x86_64 and the flag --enable-integer-datetimes is false,it could be a problem??

I don't know what to make, help me please!

Another question, in 7.4 GROUP BY clause  not guarantee the ordering of result.
Can I desable the new group by algorithm to maintain the coherence whit the programs that I have in production with
pg_7.3

Thank you !!!

Re: postgres FROM clause problem

From
Tom Lane
Date:
Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> I  have applied the procedure for fixing pg_statistic as you had said, but  the result is the same!

Hm.  It could be a planner bug.  Can you reproduce the misbehavior if
you dump and load the tables into a fresh database?  If so, could you
send me the dump so I can look at the problem with a debugger?

(Also, you might try updating to 7.4.2 first and see if that changes
anything.  We did fix quite a number of bugs already in 7.4.2...)

> Another question, in 7.4 GROUP BY clause  not guarantee the ordering of result.
> Can I desable the new group by algorithm to maintain the coherence whit the programs that I have in production with
pg_7.3

As Bruno said, your programs are broken because they are assuming
something not guaranteed by the SQL spec.  But until you get around to
adding the ORDER BY clauses they should have, see enable_hashagg.

            regards, tom lane

Re: postgres FROM clause problem (GROUP BY subquestion)

From
Bruno Wolff III
Date:
On Wed, Mar 10, 2004 at 18:33:41 +0100,
  Paolo Tavalazzi <ptavalazzi@charta.it> wrote:
>
> I don't know what to make, help me please!
>
> Another question, in 7.4 GROUP BY clause  not guarantee the ordering of result.
> Can I desable the new group by algorithm to maintain the coherence whit the programs that I have in production with
pg_7.3

GROUP BY never guarenteed an order. That this happened in 7.3 was an
implementation detail. If you want a specific order you need to use an
ORDER BY clause.

Re: postgres FROM clause problem

From
Martijn van Oosterhout
Date:
On Wed, Mar 10, 2004 at 06:33:41PM +0100, Paolo Tavalazzi wrote:
> Another question, in 7.4 GROUP BY clause  not guarantee the ordering of result.
> Can I desable the new group by algorithm to maintain the coherence whit the programs that I have in production with
pg_7.3

Uf you're expecting ordered output, perhaps you should add an ORDER BY
clause?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> If the Catholic church can survive the printing press, science fiction
> will certainly weather the advent of bookwarez.
>    http://craphound.com/ebooksneitherenorbooks.txt - Cory Doctorow

Attachment

Re: postgres FROM clause problem (GROUP BY subquestion)

From
Paolo Tavalazzi
Date:
Alle 19:12, mercoledì 10 marzo 2004, hai scritto:
> On Wed, Mar 10, 2004 at 18:33:41 +0100,
>
>   Paolo Tavalazzi <ptavalazzi@charta.it> wrote:
> > I don't know what to make, help me please!
> >
> > Another question, in 7.4 GROUP BY clause  not guarantee the ordering of
> > result. Can I desable the new group by algorithm to maintain the
> > coherence whit the programs that I have in production with pg_7.3
>
> GROUP BY never guarenteed an order. That this happened in 7.3 was an
> implementation detail. If you want a specific order you need to use an
> ORDER BY clause.


Seeing that GROUP BY in 7.3 required sorting the input data to bring group members together,
this guaranteed to me also the order of the attributes in GROUP BY clause.

Re: postgres FROM clause problem

From
Paolo Tavalazzi
Date:
Alle 18:03, giovedì 11 marzo 2004, hai scritto:
> Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> > But the query plans are still various!!
>
> I think you made a copy-and-paste mistake, because the explain results
> you posted are exactly the same ...
>
>             regards, tom lane


 Excuse I have mistaken!!


#### QUERY 1



explain analyze SELECT
    teatri.code,
    spettacoli.code,
    sum(seat.price),
    sum(seat.prev)



  FROM
    seat,
    spettacoli,
    tran ,
    teatri


  WHERE
    tran.time >= timestamp '2004-02-20 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND
    tran.perf_code           = spettacoli.code AND
    tran.thea_code           = spettacoli.teatro AND
    tran.id                  = seat.bt_id AND
    tran.system              = seat.system AND
    spettacoli.system        = tran.system AND
    spettacoli.teatro        = teatri.code
  GROUP BY
    1,2


EXPLAIN :

 HashAggregate  (cost=8042.89..8042.92 rows=7 width=32) (actual time=2069.895..2071.505 rows=1317 loops=1)
   ->  Hash Join  (cost=8015.82..8042.82 rows=7 width=32) (actual time=1538.771..1779.257 rows=67218 loops=1)
         Hash Cond: ("outer".code = "inner".teatro)
         ->  Seq Scan on teatri  (cost=0.00..22.62 rows=862 width=13) (actual time=0.008..0.572 rows=862 loops=1)
         ->  Hash  (cost=8015.80..8015.80 rows=7 width=45) (actual time=1538.652..1538.652 rows=0 loops=1)
               ->  Nested Loop  (cost=1538.40..8015.80 rows=7 width=45) (actual time=652.105..1486.577 rows=68167
loops=1)
                     ->  Merge Join  (cost=1538.40..7920.98 rows=16 width=45) (actual time=652.045..1095.559 rows=24919
loops=1)
                           Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                           ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5836.07 rows=56079
width=26)(actual time=0.049..63.556 rows=55565 loops=1) 
                           ->  Sort  (cost=1538.40..1569.84 rows=12576 width=30) (actual time=651.509..656.391
rows=24923loops=1) 
                                 Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                                 ->  Index Scan using time_idx on tran  (cost=0.00..682.08 rows=12576 width=30) (actual
time=0.083..69.887rows=24923 loops=1) 
                                       Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
                     ->  Index Scan using id_system_idx on seat  (cost=0.00..5.91 rows=1 width=14) (actual
time=0.006..0.010rows=3 loops=24919) 
                           Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
 Total runtime: 2076.726 ms


#### QUERY 2 :

explain anlyze SELECT
    teatri.code,
    spettacoli.code,
    sum(seat.price),
    sum(seat.prev)


  FROM
    seat,
    teatri,
    tran ,
    spettacoli


  WHERE
    tran.time >= timestamp '2004-03-01 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND
    tran.perf_code           = spettacoli.code AND
    tran.thea_code           = spettacoli.teatro AND
    tran.id                  = seat.bt_id AND
    tran.system              = seat.system AND
    spettacoli.system        = tran.system AND
    spettacoli.teatro        = teatri.code
  GROUP BY
    1,2


EXPLAIN output :

 HashAggregate  (cost=8057.62..8057.63 rows=1 width=32) (actual time=2728.066..2729.738 rows=1317 loops=1)
   ->  Nested Loop  (cost=1538.40..8057.61 rows=1 width=32) (actual time=665.122..2438.275 rows=67218 loops=1)
         ->  Nested Loop  (cost=1538.40..8015.80 rows=9 width=45) (actual time=665.078..1509.890 rows=68167 loops=1)
               ->  Merge Join  (cost=1538.40..7920.98 rows=16 width=45) (actual time=665.018..1101.716 rows=24919
loops=1)
                     Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                     ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5836.07 rows=56079 width=26)
(actualtime=0.046..63.772 rows=55565 loops=1) 
                     ->  Sort  (cost=1538.40..1569.84 rows=12576 width=30) (actual time=664.481..669.947 rows=24923
loops=1)
                           Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                           ->  Index Scan using time_idx on tran  (cost=0.00..682.08 rows=12576 width=30) (actual
time=0.080..70.663rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
               ->  Index Scan using id_system_idx on seat  (cost=0.00..5.91 rows=1 width=14) (actual time=0.007..0.009
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 2733.910 ms





Re: postgres FROM clause problem

From
Tom Lane
Date:
Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> But the query plans are still various!!

I think you made a copy-and-paste mistake, because the explain results
you posted are exactly the same ...

            regards, tom lane

Re: postgres FROM clause problem

From
Paolo Tavalazzi
Date:
Alle 19:40, mercoledì 10 marzo 2004, hai scritto:
> Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> > I  have applied the procedure for fixing pg_statistic as you had said,
> > but  the result is the same!
>
> Hm.  It could be a planner bug.  Can you reproduce the misbehavior if
> you dump and load the tables into a fresh database?  If so, could you
> send me the dump so I can look at the problem with a debugger?
>
> (Also, you might try updating to 7.4.2 first and see if that changes
> anything.  We did fix quite a number of bugs already in 7.4.2...)
>
> > Another question, in 7.4 GROUP BY clause  not guarantee the ordering of
> > result. Can I desable the new group by algorithm to maintain the
> > coherence whit the programs that I have in production with pg_7.3
>
> As Bruno said, your programs are broken because they are assuming
> something not guaranteed by the SQL spec.  But until you get around to
> adding the ORDER BY clauses they should have, see enable_hashagg.
>
>             regards, tom lane



I update to 7.4.2 (dump - initdb -restore)  and I have had this result :

#### QUERY 1


explain analyze SELECT
    teatri.code,
    spettacoli.code,
    sum(seat.price),
    sum(seat.prev)


  FROM
    seat,
    spettacoli,
    tran ,
    teatri

  WHERE
    tran.time >= timestamp '2004-02-20 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND
    tran.perf_code           = spettacoli.code AND
    tran.thea_code           = spettacoli.teatro AND
    tran.id                  = seat.bt_id AND
    tran.system              = seat.system AND
    spettacoli.system        = tran.system AND
    spettacoli.teatro        = teatri.code
  GROUP BY
    1,2

EXPLAIN output :

HashAggregate  (cost=8057.62..8057.63 rows=1 width=32) (actual time=2711.656..2713.337 rows=1317 loops=1)
   ->  Nested Loop  (cost=1538.40..8057.61 rows=1 width=32) (actual time=648.011..2425.314 rows=67218 loops=1)
         ->  Nested Loop  (cost=1538.40..8015.80 rows=9 width=45) (actual time=647.968..1494.797 rows=68167 loops=1)
               ->  Merge Join  (cost=1538.40..7920.98 rows=16 width=45) (actual time=647.909..1087.032 rows=24919
loops=1)
                     Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                     ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5836.07 rows=56079 width=26)
(actualtime=0.049..64.118 rows=55565 loops=1) 
                     ->  Sort  (cost=1538.40..1569.84 rows=12576 width=30) (actual time=647.374..653.048 rows=24923
loops=1)
                           Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                           ->  Index Scan using time_idx on tran  (cost=0.00..682.08 rows=12576 width=30) (actual
time=0.082..73.057rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
               ->  Index Scan using id_system_idx on seat  (cost=0.00..5.91 rows=1 width=14) (actual time=0.007..0.010
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 2717.573 ms


#### QUERY 2

explain analyze SELECT
    teatri.code,
    spettacoli.code,
    sum(seat.price),
    sum(seat.prev)

  FROM
    seat,
    teatri,
    tran ,
    spettacoli

  WHERE
    tran.time >= timestamp '2004-03-01 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND
    tran.perf_code           = spettacoli.code AND
    tran.thea_code           = spettacoli.teatro AND
    tran.id                  = seat.bt_id AND
    tran.system              = seat.system AND
    spettacoli.system        = tran.system AND
    spettacoli.teatro        = teatri.code
  GROUP BY
    1,2

EXPLAIN output :

HashAggregate  (cost=8057.62..8057.63 rows=1 width=32) (actual time=2711.656..2713.337 rows=1317 loops=1)
   ->  Nested Loop  (cost=1538.40..8057.61 rows=1 width=32) (actual time=648.011..2425.314 rows=67218 loops=1)
         ->  Nested Loop  (cost=1538.40..8015.80 rows=9 width=45) (actual time=647.968..1494.797 rows=68167 loops=1)
               ->  Merge Join  (cost=1538.40..7920.98 rows=16 width=45) (actual time=647.909..1087.032 rows=24919
loops=1)
                     Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                     ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5836.07 rows=56079 width=26)
(actualtime=0.049..64.118 rows=55565 loops=1) 
                     ->  Sort  (cost=1538.40..1569.84 rows=12576 width=30) (actual time=647.374..653.048 rows=24923
loops=1)
                           Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                           ->  Index Scan using time_idx on tran  (cost=0.00..682.08 rows=12576 width=30) (actual
time=0.082..73.057rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
               ->  Index Scan using id_system_idx on seat  (cost=0.00..5.91 rows=1 width=14) (actual time=0.007..0.010
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 2717.573 ms


As you see the query they have similar result times, the second one is a lot of fast and optimized then before!
But the query plans are still various!!
For me the result is satisfactory , but i don't understand why thw two query give back different plan???
thank you



Re: postgres FROM clause problem

From
Tom Lane
Date:
Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> [ query plans after updating to 7.4.2 ]

Okay, they're certainly a lot closer than before, so I think I was right
that you were getting bitten somehow by the pg_statistic alignment
problem.  It seems like there may still be a bug lurking though.  The
rows-out estimates are 7 versus 9 for the middle nested-loop join.
That might just be roundoff error, or there may be something else going
on.  Also the estimates for the top join are 7 versus 1 rows, which
seems too large a ratio to be explainable as roundoff error.

Could I pester you to send me a pg_dump dump of this database off-list?
I'd like to trace through the planner with a debugger and see exactly
where the results are diverging.

            regards, tom lane

Re: postgres FROM clause problem (GROUP BY subquestion)

From
Bruno Wolff III
Date:
On Thu, Mar 11, 2004 at 09:43:57 +0100,
  Paolo Tavalazzi <ptavalazzi@charta.it> wrote:
> Alle 19:12, mercoledì 10 marzo 2004, hai scritto:
> > On Wed, Mar 10, 2004 at 18:33:41 +0100,
> >
> >   Paolo Tavalazzi <ptavalazzi@charta.it> wrote:
> > > I don't know what to make, help me please!
> > >
> > > Another question, in 7.4 GROUP BY clause  not guarantee the ordering of
> > > result. Can I desable the new group by algorithm to maintain the
> > > coherence whit the programs that I have in production with pg_7.3
> >
> > GROUP BY never guarenteed an order. That this happened in 7.3 was an
> > implementation detail. If you want a specific order you need to use an
> > ORDER BY clause.
>
>
> Seeing that GROUP BY in 7.3 required sorting the input data to bring group members together,
> this guaranteed to me also the order of the attributes in GROUP BY clause.

This might be a language difference. You weren't guarenteed (or promised)
that the output would be sorted. It just happened that way because of how
the database implemented the group by. By reading the source code for
that version it would be possible to determine that the output would
always be sorted. However, that could change at any upgrade or if you
switched databases.

Re: postgres FROM clause problem (GROUP BY subquestion)

From
Paolo Tavalazzi
Date:
Alle 20:14, giovedì 11 marzo 2004, hai scritto:
> On Thu, Mar 11, 2004 at 09:43:57 +0100,
>
>   Paolo Tavalazzi <ptavalazzi@charta.it> wrote:
> > Alle 19:12, mercoledì 10 marzo 2004, hai scritto:
> > > On Wed, Mar 10, 2004 at 18:33:41 +0100,
> > >
> > >   Paolo Tavalazzi <ptavalazzi@charta.it> wrote:
> > > > I don't know what to make, help me please!
> > > >
> > > > Another question, in 7.4 GROUP BY clause  not guarantee the ordering
> > > > of result. Can I desable the new group by algorithm to maintain the
> > > > coherence whit the programs that I have in production with pg_7.3
> > >
> > > GROUP BY never guarenteed an order. That this happened in 7.3 was an
> > > implementation detail. If you want a specific order you need to use an
> > > ORDER BY clause.
> >
> > Seeing that GROUP BY in 7.3 required sorting the input data to bring
> > group members together, this guaranteed to me also the order of the
> > attributes in GROUP BY clause.
>
> This might be a language difference. You weren't guarenteed (or promised)
> that the output would be sorted. It just happened that way because of how
> the database implemented the group by. By reading the source code for
> that version it would be possible to determine that the output would
> always be sorted. However, that could change at any upgrade or if you
> switched databases.

You have reason.
My deduction comes only from the reading of the code.
I thought to have read it also in the postgres documentation,but it is not
true.
I have seen that the algorithm of group by in 7.4 is various then 7.3,and it
seems more efficent.
Therefore, i have changed my program to manage the order by clause.
Thank you





Re: postgres FROM clause problem

From
Paolo Tavalazzi
Date:
Alle 18:53, giovedì 11 marzo 2004, hai scritto:
> Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> > [ query plans after updating to 7.4.2 ]
>
> Okay, they're certainly a lot closer than before, so I think I was right
> that you were getting bitten somehow by the pg_statistic alignment
> problem.  It seems like there may still be a bug lurking though.  The
> rows-out estimates are 7 versus 9 for the middle nested-loop join.
> That might just be roundoff error, or there may be something else going
> on.  Also the estimates for the top join are 7 versus 1 rows, which
> seems too large a ratio to be explainable as roundoff error.
>
> Could I pester you to send me a pg_dump dump of this database off-list?
> I'd like to trace through the planner with a debugger and see exactly
> where the results are diverging.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly



I have reconstructed the database from zero and i have tried the usual two query
in various conditions :

#################### NO ANALYZE :  similar execution time but different plan

1)

  FROM
    seat,spettacoli,tran,teatri
                                                                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=51.00..51.00 rows=1 width=37) (actual time=4163.191..4164.883 rows=1317 loops=1)
   ->  Nested Loop  (cost=0.00..50.99 rows=1 width=37) (actual time=0.253..3868.225 rows=67218 loops=1)
         ->  Nested Loop  (cost=0.00..46.15 rows=1 width=50) (actual time=0.215..2903.309 rows=68167 loops=1)
               ->  Nested Loop  (cost=0.00..41.31 rows=1 width=38) (actual time=0.124..508.361 rows=68174 loops=1)
                     ->  Index Scan using time_idx on tran  (cost=0.00..17.11 rows=5 width=32) (actual
time=0.068..42.747rows=24923 loops=1) 
                           Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time zone)::timestamp with
timezone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with time zone)) 
                     ->  Index Scan using id_system_idx on seat  (cost=0.00..4.83 rows=1 width=14) (actual
time=0.007..0.010rows=3 loops=24923) 
                           Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
               ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..4.83 rows=1 width=31) (actual
time=0.031..0.032rows=1 loops=68174) 
                     Index Cond: ((spettacoli.system = "outer".system) AND ("outer".thea_code = spettacoli.teatro) AND
(("outer".perf_code)::text= (spettacoli.code)::text)) 
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.82 rows=1 width=13) (actual time=0.011..0.012 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 4165.522 ms




2)  FROM
     seat,teatri,tran,spettacoli
                                                                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=51.00..51.00 rows=1 width=37) (actual time=4161.159..4162.811 rows=1317 loops=1)
   ->  Nested Loop  (cost=0.00..50.99 rows=1 width=37) (actual time=0.251..3865.205 rows=67218 loops=1)
         ->  Nested Loop  (cost=0.00..46.14 rows=1 width=51) (actual time=0.186..1512.814 rows=67225 loops=1)
               ->  Nested Loop  (cost=0.00..41.31 rows=1 width=38) (actual time=0.121..507.694 rows=68174 loops=1)
                     ->  Index Scan using time_idx on tran  (cost=0.00..17.11 rows=5 width=32) (actual
time=0.066..40.742rows=24923 loops=1) 
                           Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time zone)::timestamp with
timezone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with time zone)) 
                     ->  Index Scan using id_system_idx on seat  (cost=0.00..4.83 rows=1 width=14) (actual
time=0.008..0.011rows=3 loops=24923) 
                           Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
               ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.82 rows=1 width=13) (actual time=0.011..0.012
rows=1loops=68174) 
                     Index Cond: (teatri.code = "outer".thea_code)
         ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..4.83 rows=1 width=31) (actual
time=0.031..0.032rows=1 loops=67225) 
               Index Cond: ((spettacoli.system = "outer".system) AND (spettacoli.teatro = "outer".code) AND
(("outer".perf_code)::text= (spettacoli.code)::text)) 
 Total runtime: 4163.446 ms



###### AFTER ANALYZE  TARGET STATISTICS 100 : same time execution and same query plan

1) FROM
     seat,spettacoli,tran,teatri


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=53216.67..53216.67 rows=1 width=32) (actual time=2697.680..2699.333 rows=1317 loops=1)
   ->  Nested Loop  (cost=47299.78..53216.66 rows=1 width=32) (actual time=653.173..2412.029 rows=67218 loops=1)
         ->  Nested Loop  (cost=47299.78..53183.92 rows=9 width=45) (actual time=653.132..1489.709 rows=68167 loops=1)
               ->  Merge Join  (cost=47299.78..53103.62 rows=16 width=45) (actual time=653.081..1105.783 rows=24919
loops=1)
                     Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                     ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5257.29 rows=56079 width=26)
(actualtime=0.038..61.964 rows=55565 loops=1) 
                     ->  Sort  (cost=47299.78..47331.23 rows=12580 width=30) (actual time=652.556..658.198 rows=24923
loops=1)
                           Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                           ->  Index Scan using time_idx on tran  (cost=0.00..46443.15 rows=12580 width=30) (actual
time=0.081..73.187rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
               ->  Index Scan using id_system_idx on seat  (cost=0.00..5.00 rows=1 width=14) (actual time=0.006..0.009
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..3.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 2703.439 ms

2)  FROM
     seat,teatri,tran,spettacoli


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=53209.45..53209.49 rows=7 width=32) (actual time=2699.970..2701.667 rows=1317 loops=1)
   ->  Nested Loop  (cost=47299.78..53209.38 rows=7 width=32) (actual time=659.834..2414.241 rows=67218 loops=1)
         ->  Nested Loop  (cost=47299.78..53183.92 rows=7 width=45) (actual time=659.792..1494.365 rows=68167 loops=1)
               ->  Merge Join  (cost=47299.78..53103.62 rows=16 width=45) (actual time=659.738..1102.368 rows=24919
loops=1)
                     Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                     ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5257.29 rows=56079 width=26)
(actualtime=0.039..62.206 rows=55565 loops=1) 
                     ->  Sort  (cost=47299.78..47331.23 rows=12580 width=30) (actual time=659.212..664.639 rows=24923
loops=1)
                           Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                           ->  Index Scan using time_idx on tran  (cost=0.00..46443.15 rows=12580 width=30) (actual
time=0.082..73.680rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
               ->  Index Scan using id_system_idx on seat  (cost=0.00..5.00 rows=1 width=14) (actual time=0.006..0.009
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..3.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 2705.860 ms


### AFTER VACCUM ANALYZE : different execution time and query plan

1) FROM
    seat,spettacoli,tran,teatri
                                                                                                                  QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=53935.10..53935.13 rows=7 width=32) (actual time=2072.545..2074.096 rows=1317 loops=1)
   ->  Hash Join  (cost=53908.03..53935.03 rows=7 width=32) (actual time=1538.514..1789.602 rows=67218 loops=1)
         Hash Cond: ("outer".code = "inner".teatro)
         ->  Seq Scan on teatri  (cost=0.00..22.62 rows=862 width=13) (actual time=0.008..0.500 rows=862 loops=1)
         ->  Hash  (cost=53908.01..53908.01 rows=7 width=45) (actual time=1538.391..1538.391 rows=0 loops=1)
               ->  Nested Loop  (cost=47422.83..53908.01 rows=7 width=45) (actual time=652.643..1486.794 rows=68167
loops=1)
                     ->  Merge Join  (cost=47422.83..53811.72 rows=16 width=45) (actual time=652.582..1093.848
rows=24919loops=1) 
                           Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                           ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5842.37 rows=56079
width=26)(actual time=0.034..63.328 rows=55565 loops=1) 
                           ->  Sort  (cost=47422.83..47454.27 rows=12576 width=30) (actual time=652.055..656.872
rows=24923loops=1) 
                                 Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                                 ->  Index Scan using time_idx on tran  (cost=0.00..46566.51 rows=12576 width=30)
(actualtime=0.089..74.088 rows=24923 loops=1) 
                                       Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
                     ->  Index Scan using id_system_idx on seat  (cost=0.00..6.00 rows=1 width=14) (actual
time=0.006..0.009rows=3 loops=24919) 
                           Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
 Total runtime: 2079.186 ms




2) FROM
    seat,teatri,tran,spettacoli
                                                                                                               QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=53949.76..53949.77 rows=1 width=32) (actual time=2705.489..2707.201 rows=1317 loops=1)
   ->  Nested Loop  (cost=47422.83..53949.75 rows=1 width=32) (actual time=659.431..2419.617 rows=67218 loops=1)
         ->  Nested Loop  (cost=47422.83..53908.01 rows=9 width=45) (actual time=659.386..1499.370 rows=68167 loops=1)
               ->  Merge Join  (cost=47422.83..53811.72 rows=16 width=45) (actual time=659.325..1105.997 rows=24919
loops=1)
                     Merge Cond: (("outer".system = "inner".system) AND ("outer".teatro = "inner".thea_code) AND
(("outer".code)::text= "inner"."?column5?")) 
                     ->  Index Scan using spe_sys_tea_perf on spettacoli  (cost=0.00..5842.37 rows=56079 width=26)
(actualtime=0.034..63.630 rows=55565 loops=1) 
                     ->  Sort  (cost=47422.83..47454.27 rows=12576 width=30) (actual time=658.794..664.366 rows=24923
loops=1)
                           Sort Key: tran.system, tran.thea_code, (tran.perf_code)::text
                           ->  Index Scan using time_idx on tran  (cost=0.00..46566.51 rows=12576 width=30) (actual
time=0.089..74.386rows=24923 loops=1) 
                                 Index Cond: (("time" >= ('2004-03-01 00:00:00'::timestamp without time
zone)::timestampwith time zone) AND ("time" <= ('2004-03-08 23:59:59'::timestamp without time zone)::timestamp with
timezone)) 
               ->  Index Scan using id_system_idx on seat  (cost=0.00..6.00 rows=1 width=14) (actual time=0.007..0.009
rows=3loops=24919) 
                     Index Cond: (("outer".id = seat.bt_id) AND ("outer".system = seat.system))
         ->  Index Scan using teatri_pkey on teatri  (cost=0.00..4.63 rows=1 width=13) (actual time=0.010..0.011 rows=1
loops=68167)
               Index Cond: (teatri.code = "outer".thea_code)
 Total runtime: 2711.333 ms



For me it would not be a problem to send the dump to you but the db is 2,5G size
and the   pg_dump -Fc -Z 9 -R -O is 81M.
I have not tried if diminishing the dimension of the db the problem persist.
How we can make?
What do you use for debug planner ???

Re: postgres FROM clause problem

From
Tom Lane
Date:
Paolo Tavalazzi <ptavalazzi@charta.it> writes:
> I have reconstructed the database from zero and i have tried the usual two query
> in various conditions :

I dug into this and found two basic problems.  One is associated with
handling redundant join conditions, as I said here:
http://archives.postgresql.org/pgsql-hackers/2004-03/msg00592.php
The solution to that is not entirely clear, but hopefully we'll have
something for 7.5 --- and anyway it only seems to cause fractional
estimation errors, not really big mistakes.

The other problem is that you wrote conditions

    tran.time >= timestamp '2004-02-20 00:00:00' AND
    tran.time <= timestamp '2004-03-08 23:59:59' AND

where tran.time is actually declared as timestamp with time zone.
Had you left off the "timestamp" label for the constants you'd have
been fine, but as it's written you've forced a comparison between
timestamp with timezone vs. timestamp without timezone,  That requires
a runtime coercion which is not immutable (because it depends on the
TimeZone parameter).  This presently causes the planner to throw up
its hands and make a default estimate ... and in this case the default
estimate is horribly wrong.  I got an estimate of 715 rows selected
when the reality was 100000+.  That naturally causes all sorts of bad
planning decisions.

I have some thoughts about making this behavior more foolproof for
7.5, as per
http://archives.postgresql.org/pgsql-hackers/2004-03/msg00593.php
but in the meantime you could do a lot better by labeling the constants
timestamptz, or indeed not labeling them at all.

            regards, tom lane