Thread: Wrong plan or what ?

Wrong plan or what ?

From
"Mendola Gaetano"
Date:
Hi all,
I'm running Postgres7.3.3 and I'm performing this simple select:

select *
from user_logs ul,
     user_data ud,
     class_default cd
where
     ul.id_user = ud.id_user and
     ud.id_class = cd.id_class and
     cd.id_provider = 39;

these are the number of rows for each table:

user_logs:  1258955
class_default: 31       ( only one with id_provider = 39 )
user_data: 10274;


this is the explain analyze for that query:

QUERY PLAN
 Hash Join  (cost=265.64..32000.76 rows=40612 width=263) (actual
time=11074.21..11134.28 rows=10 loops=1)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Seq Scan on user_logs ul  (cost=0.00..24932.65 rows=1258965 width=48)
(actual time=0.02..8530.21 rows=1258966 loops=1)
   ->  Hash  (cost=264.81..264.81 rows=331 width=215) (actual
time=30.22..30.22 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..264.81 rows=331 width=215) (actual
time=29.95..30.20 rows=6 loops=1)
               ->  Seq Scan on class_default cd  (cost=0.00..1.39 rows=1
width=55) (actual time=0.08..0.10 rows=1 loops=1)
                     Filter: (id_provider = 39)
               ->  Index Scan using idx_user_data_class on user_data ud
(cost=0.00..258.49 rows=395 width=160) (actual time=29.82..29.96 rows=6
loops=1)
                     Index Cond: (ud.id_class = "outer".id_class)
 Total runtime: 11135.65 msec
(10 rows)


I'm able to  performe that select with these 3 steps:

SELECT id_class from class_default where id_provider = 39;
 id_class
----------
       48
(1 row)

SELECT id_user from user_data where id_class in ( 48 );
 id_user
---------
   10943
   10942
   10934
   10927
   10910
   10909
(6 rows)


SELECT * from user_logs where id_user in (
 10943,   10942,   10934,   10927,   10910,  10909
);
[SNIPPED]

and the time ammount is a couple of milliseconds.

Why the planner or the executor ( I don't know ) do not follow
the same strategy ?



Thank you
Gaetano Mendola









Re: [PERFORM] Wrong plan or what ?

From
"Mendola Gaetano"
Date:
"Josh Berkus" <josh@agliodbs.com>
> Gaetano,
>
> > SELECT * from user_logs where id_user in (
> >  10943,   10942,   10934,   10927,   10910,  10909
> > );
> > [SNIPPED]
>
> > Why the planner or the executor ( I don't know ) do not follow
> > the same strategy ?
>
> It is, actually, according to the query plan.
>
> Can you post the EXPLAIN ANALYZE for the above query?

Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs,
idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs
(cost=0.00..5454.21 rows=2498 width=48) (actual time=0.09..0.28 rows=10
loops=1)
   Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934)
OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909))
 Total runtime: 0.41 msec
(3 rows)


Thank you
Gaetano


PS: if I execute the query I obtain 10 rows instead of 3 that say the
explain analyze.




Re: [PERFORM] Wrong plan or what ?

From
"Mendola Gaetano"
Date:
Forget my PS to last message.


Re: [PERFORM] Wrong plan or what ?

From
"Mendola Gaetano"
Date:
"Josh Berkus" <josh@agliodbs.com>
> Gaetano,
>
> > QUERY PLAN
> >  Hash Join  (cost=265.64..32000.76 rows=40612 width=263) (actual
> > time=11074.21..11134.28 rows=10 loops=1)
> >    Hash Cond: ("outer".id_user = "inner".id_user)
> >    ->  Seq Scan on user_logs ul  (cost=0.00..24932.65 rows=1258965
width=48)
> > (actual time=0.02..8530.21 rows=1258966 loops=1)
>
> OK, here's your problem
>
> The planner thinks that you're going to get 40162 rows out of the final
join,
> not 10.   If the row estimate was correct, then the Seq Scan would be a
> reasonable plan.   But it's not.   Here's some steps you can take to clear
> things up for the planner:
>
> 1) Make sure you've VACUUM ANALYZED
> 2) Adjust the following postgresql.conf statistics:
> a) effective_cache_size: increase to 70% of available (not used by other
> processes) RAM.
> b) random_page_cost: decrease, maybe to 2.
> c) default_statistics_target: try increasing to 100
> (warning: this will significantly increase the time required to do
ANALYZE)
>
> Then test again!

No improvement at all,
I pushed default_statistics_target to 1000
but the rows expected are still 40612 :-(
Of course I restarted the postmaster and I vacuumed analyze the DB


Thank you
Gaetano








Re: [PERFORM] Wrong plan or what ?

From
Josh Berkus
Date:
Gaetano,

> SELECT * from user_logs where id_user in (
>  10943,   10942,   10934,   10927,   10910,  10909
> );
> [SNIPPED]

> Why the planner or the executor ( I don't know ) do not follow
> the same strategy ?

It is, actually, according to the query plan.

Can you post the EXPLAIN ANALYZE for the above query?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Wrong plan or what ?

From
"HansH"
Date:
In response to "Mendola Gaetano":
> I'm running Postgres7.3.3 and I'm performing this simple select:

Looking at your fast three step plan
> SELECT id_class from class_default where id_provider = 39;
> SELECT id_user from user_data where id_class in ( 48 );
> SELECT * from user_logs where id_user in (
>  10943,   10942,   10934,   10927,   10910,  10909 );
I'ld stem for reordering the from and where clauses alike:
    select *
    from
           class_default cd,
           user_data ud,
           user_logs ul
    where
           cd.id_provider = 39 and
           ud.id_class = cd.id_class and
           ul.id_user = ud.id_user;

Personally I dislike implied joins and rather go for _about_ this:
    select *
    from
           ( class_default cd
           LEFT JOIN user_data ud ON ud.id_class = cd.id_class )
           LEFT JOIN user_logs ul  ON ul.id_user = ud.id_user,
    where
           cd.id_provider = 39;

Good luck,

HansH





Re: Wrong plan or what ?

From
"Mendola Gaetano"
Date:
""HansH"" <hartenhans@op.het.net>
> In response to "Mendola Gaetano":
> > I'm running Postgres7.3.3 and I'm performing this simple select:
>
> Looking at your fast three step plan
> > SELECT id_class from class_default where id_provider = 39;
> > SELECT id_user from user_data where id_class in ( 48 );
> > SELECT * from user_logs where id_user in (
> >  10943,   10942,   10934,   10927,   10910,  10909 );
> I'ld stem for reordering the from and where clauses alike:
>     select *
>     from
>            class_default cd,
>            user_data ud,
>            user_logs ul
>     where
>            cd.id_provider = 39 and
>            ud.id_class = cd.id_class and
>            ul.id_user = ud.id_user;


still wrong:

Hash Join  (cost=267.10..32994.34 rows=41881 width=264) (actual
time=6620.17..6847.20 rows=94 loops=1)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Seq Scan on user_logs ul  (cost=0.00..25712.15 rows=1298315 width=48)
(actual time=0.01..5381.69 rows=1298351 loops=1)
   ->  Hash  (cost=266.25..266.25 rows=339 width=216) (actual
time=0.89..0.89 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..266.25 rows=339 width=216) (actual
time=0.16..0.83 rows=21 loops=1)
               ->  Seq Scan on class_default cd  (cost=0.00..1.39 rows=1
width=55) (actual time=0.08..0.09 rows=1 loops=1)
                     Filter: (id_provider = 39)
               ->  Index Scan using idx_user_data_class on user_data ud
(cost=0.00..260.00 rows=389 width=161) (actual time=0.06..0.40 rows=21
loops=1)
                     Index Cond: (ud.id_class = "outer".id_class)
 Total runtime: 6847.60 msec
(10 rows)


the returned are 94.


> Personally I dislike implied joins and rather go for _about_ this:
>     select *
>     from
>            ( class_default cd
>            LEFT JOIN user_data ud ON ud.id_class = cd.id_class )
>            LEFT JOIN user_logs ul  ON ul.id_user = ud.id_user,
>     where
>            cd.id_provider = 39;

worst:

 Merge Join  (cost=280.48..55717.14 rows=41881 width=264) (actual
time=18113.64..18182.94 rows=105 loops=1)
   Merge Cond: ("outer".id_user = "inner".id_user)
   ->  Index Scan using idx_user_user_logs on user_logs ul
(cost=0.00..51665.66 rows=1298315 width=48) (actual time=10.78..15459.37
rows=1298354 loops=1)
   ->  Sort  (cost=280.48..281.33 rows=339 width=216) (actual
time=1.11..1.20 rows=105 loops=1)
         Sort Key: ud.id_user
         ->  Nested Loop  (cost=0.00..266.25 rows=339 width=216) (actual
time=0.14..0.82 rows=21 loops=1)
               ->  Seq Scan on class_default cd  (cost=0.00..1.39 rows=1
width=55) (actual time=0.07..0.07 rows=1 loops=1)
                     Filter: (id_provider = 39)
               ->  Index Scan using idx_user_data_class on user_data ud
(cost=0.00..260.00 rows=389 width=161) (actual time=0.05..0.39 rows=21
loops=1)
                     Index Cond: (ud.id_class = "outer".id_class)
 Total runtime: 18185.61 msec

:-(



thank you anyway.

Gaetano