Thread: Wrong index choosen?
I hall I have a query in this form: empdb=# explain analyze select * from v_past_connections where id_user = 26195 and login_time > '2004-07-21'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- IndexScan using idx_user_logs_login_time on user_logs (cost=0.00..14.10 rows=1 width=28) (actual time=66.890..198.998 rows=5loops=1) Index Cond: (login_time > '2004-07-21 00:00:00+02'::timestamp with time zone) Filter: (id_user = 26195)Total runtime: 199.083 ms (4 rows) as you see the index on the time stamp column is used The table have indexes on both columns: empdb=# explain analyze select * from v_past_connections where login_time > '2004-07-21'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- IndexScan using idx_user_logs_login_time on user_logs (cost=0.00..12.90 rows=481 width=28) (actual time=7.338..661.300 rows=22477loops=1) Index Cond: (login_time > '2004-07-21 00:00:00+02'::timestamp with time zone) Total runtime: 676.472ms (3 rows) empdb=# explain analyze select * from v_past_connections where id_user = 26195; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- IndexScan using idx_user_user_logs on user_logs (cost=0.00..252.47 rows=320 width=28) (actual time=4.420..100.122 rows=221loops=1) Index Cond: (id_user = 26195) Total runtime: 100.348 ms (3 rows) The rows filtered out with both condictions are two order of magnitude differents, also the extimated rows are close to real numbers: empdb=# select count(*) from v_past_connections where id_user = 26195; count ------- 221 (1 row) empdb=# select count(*) from v_past_connections where login_time > '2004-07-21'; count ------- 22441 (1 row) why then the planner choose to do an index scan using the filter that retrieve a bigger ammount of rows ? A bug ? Regards Gaetano Mendola
On Fri, 23 Jul 2004, Gaetano Mendola wrote: > empdb=# explain analyze select * from v_past_connections where login_time > '2004-07-21'; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------- > Index Scan using idx_user_logs_login_time on user_logs (cost=0.00..12.90 rows=481 width=28) (actual time=7.338..661.300rows=22477 loops=1) > Index Cond: (login_time > '2004-07-21 00:00:00+02'::timestamp with time zone) > Total runtime: 676.472 ms > (3 rows) In this plan it estimates to get 481 but it got 22477. So the estimation was very wrong. You can increase the statistics tarhet on the login_time and it will probably be better (after the next analyze). > why then the planner choose to do an index scan using the filter that > retrieve a bigger ammount of rows ? A bug ? Because it has to decide on the plan before it knows exactly what the result will be. As seen above, the estimation was wrong and thus the plan was not as good as it could have been. In this case you probably also want to create a combined index on both columns: CREATE INDEX foo ON user_log (id_user, login_time); ps. This letter belonged to pgsql-performance and not pgsql-hackers. -- /Dennis Björklund
Dennis Bjorklund <db@zigo.dhs.org> writes: > In this plan it estimates to get 481 but it got 22477. So the estimation > was very wrong. You can increase the statistics tarhet on the login_time > and it will probably be better (after the next analyze). Given the nature of the data (login times), I'd imagine that the problem is simply that he hasn't analyzed recently enough. A bump in stats target may not be needed, but he's going to have to re-analyze that column often if he wants this sort of query to be estimated accurately, because the fraction of entries later than a given time T is *always* going to be changing. regards, tom lane