Re: Query not using index pgsql 8.2.3 - Mailing list pgsql-general

From Henrik Zagerholm
Subject Re: Query not using index pgsql 8.2.3
Date
Msg-id CF8A4C11-DB11-450E-AA64-0FC6537AFC7D@mac.se
In response to Re: Query not using index pgsql 8.2.3  (Michael Fuhr <mike@fuhr.org>)
Responses Re: Query not using index pgsql 8.2.3  (Henrik Zagerholm <henke@mac.se>)
List pgsql-general
On 23 Mar 2007, at 13:34, Michael Fuhr wrote:

> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>> On 23 Mar 2007, at 12:33, Michael Fuhr wrote:
>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>> of magnitude:
>>>
>>>> Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>
>>>> Index Cond: (fk_filetype_id = 83)
>>>
>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>> If so then you might try increasing the statistics target for
>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>> conditions.
>>
>> I did a vacuum full and reindex on all tables.
>
> VACUUM FULL is seldom (if ever) necessary if you're running plain
> VACUUM (without FULL) often enough, either manually or via autovacuum.
>
>> Now I also did a vacuum analyze on tbl_acl (the biggest table, with
>> about 4.5 million rows)
>>
>> Same result.
>
> I'd suggest analyzing all tables.  The bad estimate I mentioned
> appears to be for a column in tbl_file so if you didn't analyze
> that table then the query plan probably won't improve.
>
>> But do you mean if the row_count estimate is big it can't use any
>> index on any other table within the JOINs?
>
> High row count estimates make the planner think that scanning entire
> tables would be faster than using indexes.  The more of a table a
> query must fetch the less efficient an index scan becomes, to the
> point that a sequential scan is faster than an index scan.
>
>> Any specific parameters I should adjust?
>
> If analyzing the tables doesn't improve the row count estimates
> then try increasing some columns' statistics targets and re-analyze
> the table or just that column.  Example:
>
> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
> ANALYZE tbl_file (fk_filetype_id);
>
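Michael's per-column target can be verified after the re-analyze; a minimal sketch using the catalog view pg_stats (the table and column names are the ones from this thread, the 100 is just his example value):

```sql
-- Raise the statistics target for the badly estimated column
ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
ANALYZE tbl_file (fk_filetype_id);

-- Inspect the gathered statistics; n_distinct and the most-common-value
-- lists are what drive the planner's row count estimates
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'tbl_file' AND attname = 'fk_filetype_id';
```

If the value 83 now shows up in most_common_vals with a realistic frequency, the rows= estimate for that index condition should tighten accordingly.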
I analyzed all tables and saw no difference, but after raising the
statistics target for fk_filetype_id I got much better results.
There is still a big cost on tbl_file_structure, where the planner
uses a Bitmap Heap Scan.
Could I do something to improve this?

Thanks for all your help so far!

Limit  (cost=115168.22..115169.12 rows=20 width=173) (actual time=5138.401..5138.982 rows=20 loops=1)
   ->  GroupAggregate  (cost=115168.22..115341.60 rows=3853 width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
         ->  Sort  (cost=115168.22..115177.85 rows=3853 width=173) (actual time=5138.339..5138.430 rows=73 loops=1)
               Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
               ->  Nested Loop  (cost=8.48..114938.74 rows=3853 width=173) (actual time=185.588..5118.684 rows=730 loops=1)
                     ->  Seq Scan on tbl_filetype  (cost=0.00..1.25 rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
                           Filter: (83 = pk_filetype_id)
                     ->  Hash Join  (cost=8.48..114898.96 rows=3853 width=171) (actual time=185.538..5116.063 rows=730 loops=1)
                           Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
                           ->  Nested Loop  (cost=6.27..114843.77 rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
                                 ->  Nested Loop  (cost=6.27..32079.13 rows=1167 width=146) (actual time=130.594..3448.927 rows=330 loops=1)
                                       Join Filter: (tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
                                       ->  Nested Loop  (cost=4.89..31185.00 rows=1167 width=138) (actual time=119.861..3402.527 rows=330 loops=1)
                                             ->  Nested Loop  (cost=4.89..24413.49 rows=1167 width=56) (actual time=82.203..2097.142 rows=330 loops=1)
                                                   ->  Index Scan using tbl_file_idx6 on tbl_file  (cost=0.00..344.56 rows=114 width=40) (actual time=13.164..15.475 rows=106 loops=1)
                                                         Index Cond: (fk_filetype_id = 83)
                                                   ->  Bitmap Heap Scan on tbl_file_structure  (cost=4.89..210.27 rows=69 width=24) (actual time=8.736..19.606 rows=3 loops=106)
                                                         Recheck Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                                         ->  Bitmap Index Scan on tbl_file_structure_idx  (cost=0.00..4.88 rows=69 width=0) (actual time=2.574..2.574 rows=3 loops=106)
                                                               Index Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                             ->  Index Scan using tbl_structure_pkey on tbl_structure  (cost=0.00..5.79 rows=1 width=98) (actual time=3.942..3.945 rows=1 loops=330)
                                                   Index Cond: (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                                       ->  Materialize  (cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34 loops=330)
                                             ->  Seq Scan on tbl_archive  (cost=0.00..1.34 rows=34 width=16) (actual time=10.606..10.693 rows=34 loops=1)
                                                   Filter: archive_complete
                                 ->  Index Scan using tbl_acl_idx on tbl_acl  (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971 rows=2 loops=330)
                                       Index Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                           ->  Hash  (cost=2.16..2.16 rows=4 width=41) (actual time=17.890..17.890 rows=4 loops=1)
                                 ->  Hash Join  (cost=1.07..2.16 rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
                                       Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                       ->  Seq Scan on tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4 loops=1)
                                       ->  Hash  (cost=1.03..1.03 rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
                                             ->  Seq Scan on tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual time=17.697..17.706 rows=3 loops=1)
Total runtime: 5139.608 ms
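In the plan above the Bitmap Heap Scan on tbl_file_structure is again overestimated (rows=69 expected versus rows=3 actual per loop). One thing that might help, following the same approach Michael suggested earlier, is raising the statistics target on that join column too; a sketch, with the table and column names taken from the plan and 100 as an assumed target:

```sql
-- The estimate on tbl_file_structure.fk_file_id is off by a similar
-- factor; give the planner more statistics for that column as well
ALTER TABLE tbl_file_structure ALTER fk_file_id SET STATISTICS 100;
ANALYZE tbl_file_structure (fk_file_id);
```

Whether this shifts the plan depends on how skewed fk_file_id actually is, so re-running EXPLAIN ANALYZE afterwards is the only way to tell.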



> The default statistics target is 10; the maximum value is 1000.
>
> On some systems I've found that reducing random_page_cost from 4
> (the default) to 2 gives more realistic cost estimates for index
> scans.
>
> --
> Michael Fuhr
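Michael's random_page_cost suggestion can be tried without touching the server configuration, since the setting can be changed per session; a minimal sketch:

```sql
-- Experiment in the current session only; postgresql.conf is untouched
SET random_page_cost = 2;
-- ... re-run the EXPLAIN ANALYZE here and compare plans ...
RESET random_page_cost;
```

A lower value tells the planner that random (index) page fetches are cheaper relative to sequential reads, which makes index scans win more often.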

