Thread: Wrong plan or what ?

Wrong plan or what ?

From
"Mendola Gaetano"
Date:
Hi all,
I'm running Postgres7.3.3 and I'm performing this simple select:

select *
from user_logs ul,
     user_data ud,
     class_default cd
where
     ul.id_user = ud.id_user and
     ud.id_class = cd.id_class and
     cd.id_provider = 39;

these are the number of rows for each table:

user_logs:  1258955
class_default: 31       ( only one with id_provider = 39 )
user_data: 10274;


this is the explain analyze for that query:

QUERY PLAN
 Hash Join  (cost=265.64..32000.76 rows=40612 width=263) (actual
time=11074.21..11134.28 rows=10 loops=1)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Seq Scan on user_logs ul  (cost=0.00..24932.65 rows=1258965 width=48)
(actual time=0.02..8530.21 rows=1258966 loops=1)
   ->  Hash  (cost=264.81..264.81 rows=331 width=215) (actual
time=30.22..30.22 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..264.81 rows=331 width=215) (actual
time=29.95..30.20 rows=6 loops=1)
               ->  Seq Scan on class_default cd  (cost=0.00..1.39 rows=1
width=55) (actual time=0.08..0.10 rows=1 loops=1)
                     Filter: (id_provider = 39)
               ->  Index Scan using idx_user_data_class on user_data ud
(cost=0.00..258.49 rows=395 width=160) (actual time=29.82..29.96 rows=6
loops=1)
                     Index Cond: (ud.id_class = "outer".id_class)
 Total runtime: 11135.65 msec
(10 rows)


I'm able to  performe that select with these 3 steps:

SELECT id_class from class_default where id_provider = 39;
 id_class
----------
       48
(1 row)

SELECT id_user from user_data where id_class in ( 48 );
 id_user
---------
   10943
   10942
   10934
   10927
   10910
   10909
(6 rows)


SELECT * from user_logs where id_user in (
 10943,   10942,   10934,   10927,   10910,  10909
);
[SNIPPED]

and the time ammount is a couple of milliseconds.

Why the planner or the executor ( I don't know ) do not follow
the same strategy ?



Thank you
Gaetano Mendola









Re: Wrong plan or what ?

From
Josh Berkus
Date:
Gaetano,

> SELECT * from user_logs where id_user in (
>  10943,   10942,   10934,   10927,   10910,  10909
> );
> [SNIPPED]

> Why the planner or the executor ( I don't know ) do not follow
> the same strategy ?

It is, actually, according to the query plan.

Can you post the EXPLAIN ANALYZE for the above query?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Wrong plan or what ?

From
"Mendola Gaetano"
Date:
"Josh Berkus" <josh@agliodbs.com>
> Gaetano,
>
> > SELECT * from user_logs where id_user in (
> >  10943,   10942,   10934,   10927,   10910,  10909
> > );
> > [SNIPPED]
>
> > Why the planner or the executor ( I don't know ) do not follow
> > the same strategy ?
>
> It is, actually, according to the query plan.
>
> Can you post the EXPLAIN ANALYZE for the above query?

Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs,
idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs
(cost=0.00..5454.21 rows=2498 width=48) (actual time=0.09..0.28 rows=10
loops=1)
   Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934)
OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909))
 Total runtime: 0.41 msec
(3 rows)


Thank you
Gaetano


PS: if I execute the query I obtain 10 rows instead of 3 that say the
explain analyze.




Re: Wrong plan or what ?

From
"Mendola Gaetano"
Date:
Forget my PS to last message.


Re: Wrong plan or what ?

From
Josh Berkus
Date:
Gaetano,

> QUERY PLAN
>  Hash Join  (cost=265.64..32000.76 rows=40612 width=263) (actual
> time=11074.21..11134.28 rows=10 loops=1)
>    Hash Cond: ("outer".id_user = "inner".id_user)
>    ->  Seq Scan on user_logs ul  (cost=0.00..24932.65 rows=1258965 width=48)
> (actual time=0.02..8530.21 rows=1258966 loops=1)

OK, here's your problem

The planner thinks that you're going to get 40162 rows out of the final join,
not 10.   If the row estimate was correct, then the Seq Scan would be a
reasonable plan.   But it's not.   Here's some steps you can take to clear
things up for the planner:

1) Make sure you've VACUUM ANALYZED
2) Adjust the following postgresql.conf statistics:
    a) effective_cache_size: increase to 70% of available (not used by other
processes) RAM.
    b) random_page_cost: decrease, maybe to 2.
    c) default_statistics_target: try increasing to 100
        (warning: this will significantly increase the time required to do ANALYZE)

Then test again!


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Wrong plan or what ?

From
"Mendola Gaetano"
Date:
"Josh Berkus" <josh@agliodbs.com>
> Gaetano,
>
> > QUERY PLAN
> >  Hash Join  (cost=265.64..32000.76 rows=40612 width=263) (actual
> > time=11074.21..11134.28 rows=10 loops=1)
> >    Hash Cond: ("outer".id_user = "inner".id_user)
> >    ->  Seq Scan on user_logs ul  (cost=0.00..24932.65 rows=1258965
width=48)
> > (actual time=0.02..8530.21 rows=1258966 loops=1)
>
> OK, here's your problem
>
> The planner thinks that you're going to get 40162 rows out of the final
join,
> not 10.   If the row estimate was correct, then the Seq Scan would be a
> reasonable plan.   But it's not.   Here's some steps you can take to clear
> things up for the planner:
>
> 1) Make sure you've VACUUM ANALYZED
> 2) Adjust the following postgresql.conf statistics:
> a) effective_cache_size: increase to 70% of available (not used by other
> processes) RAM.
> b) random_page_cost: decrease, maybe to 2.
> c) default_statistics_target: try increasing to 100
> (warning: this will significantly increase the time required to do
ANALYZE)
>
> Then test again!

No improvement at all,
I pushed default_statistics_target to 1000
but the rows expected are still 40612 :-(
Of course I restarted the postmaster and I vacuumed analyze the DB


Thank you
Gaetano