Re: Query not using index pgsql 8.2.3 - Mailing list pgsql-general

From Henrik Zagerholm
Subject Re: Query not using index pgsql 8.2.3
Date
Msg-id 1E4CC216-8B17-4C4E-855E-57BB05784D31@mac.se
In response to Re: Query not using index pgsql 8.2.3  (Henrik Zagerholm <henke@mac.se>)
Responses Re: Query not using index pgsql 8.2.3
List pgsql-general
Never mind my last post regarding the Bitmap heap scan. That is all
in order.

Now it works perfectly. Went from 30+ seconds to 179ms. =)

I'm just amazed that setting the statistics target on one column
made all the difference. Are there any guidelines on which columns I
should change the statistics on?
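In case it helps anyone searching the archives later, this is roughly how I ended up checking which columns were badly estimated. Column and table names are from my own schema, and I believe the catalog queries below work this way on 8.2, but double-check against the docs. The usual suspects are columns used in WHERE or JOIN conditions whose estimated row counts in EXPLAIN ANALYZE are far from the actual counts:

```sql
-- Check the current per-column statistics target
-- (-1 means the default, which is 10 in 8.2):
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'tbl_file'::regclass AND attnum > 0;

-- See what the planner currently believes about a column's
-- distribution; a sparse most_common_vals list on a skewed
-- column suggests raising the target:
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'tbl_file' AND attname = 'fk_filetype_id';

-- Then raise the target and re-collect statistics,
-- as suggested earlier in the thread:
ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
ANALYZE tbl_file (fk_filetype_id);
```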


Thanks again!!!

On 23 Mar 2007, at 14:04, Henrik Zagerholm wrote:

>
> On 23 Mar 2007, at 13:34, Michael Fuhr wrote:
>
>> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>>> On 23 Mar 2007, at 12:33, Michael Fuhr wrote:
>>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>>> of magnitude:
>>>>
>>>>> Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
>>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>>
>>>>> Index Cond: (fk_filetype_id = 83)
>>>>
>>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>>> If so then you might try increasing the statistics target for
>>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>>> conditions.
>>>
>>> I did a vacuum full and reindex on all tables.
>>
>> VACUUM FULL is seldom (if ever) necessary if you're running plain
>> VACUUM (without FULL) often enough, either manually or via
>> autovacuum.
>>
>>> Now I also did a vacuum analyze on tbl_acl (the biggest table with
>>> about 4.5 million rows)
>>>
>>> Same result.
>>
>> I'd suggest analyzing all tables.  The bad estimate I mentioned
>> appears to be for a column in tbl_file so if you didn't analyze
>> that table then the query plan probably won't improve.
>>
>>> But do you mean if the row_count estimate is big it can't use any
>>> index on any other table within the JOINs?
>>
>> High row count estimates make the planner think that scanning entire
>> tables would be faster than using indexes.  The more of a table a
>> query must fetch the less efficient an index scan becomes, to the
>> point that a sequential scan is faster than an index scan.
>>
>>> Any specific parameters I should adjust?
>>
>> If analyzing the tables doesn't improve the row count estimates
>> then try increasing some columns' statistics targets and re-analyze
>> the table or just that column.  Example:
>>
>> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
>> ANALYZE tbl_file (fk_filetype_id);
>>
> I analyzed all tables and saw no difference, but after changing
> the statistics for fk_filetype_id I got much better results.
> There is still a big cost on tbl_file_structure, where the planner
> is doing a Bitmap Heap Scan.
> Could I do something to improve this?
>
> Thanks for all your help so far!
>
> Limit  (cost=115168.22..115169.12 rows=20 width=173) (actual time=5138.401..5138.982 rows=20 loops=1)
>   ->  GroupAggregate  (cost=115168.22..115341.60 rows=3853 width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
>         ->  Sort  (cost=115168.22..115177.85 rows=3853 width=173) (actual time=5138.339..5138.430 rows=73 loops=1)
>               Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
>               ->  Nested Loop  (cost=8.48..114938.74 rows=3853 width=173) (actual time=185.588..5118.684 rows=730 loops=1)
>                     ->  Seq Scan on tbl_filetype  (cost=0.00..1.25 rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
>                           Filter: (83 = pk_filetype_id)
>                     ->  Hash Join  (cost=8.48..114898.96 rows=3853 width=171) (actual time=185.538..5116.063 rows=730 loops=1)
>                           Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
>                           ->  Nested Loop  (cost=6.27..114843.77 rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
>                                 ->  Nested Loop  (cost=6.27..32079.13 rows=1167 width=146) (actual time=130.594..3448.927 rows=330 loops=1)
>                                       Join Filter: (tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
>                                       ->  Nested Loop  (cost=4.89..31185.00 rows=1167 width=138) (actual time=119.861..3402.527 rows=330 loops=1)
>                                             ->  Nested Loop  (cost=4.89..24413.49 rows=1167 width=56) (actual time=82.203..2097.142 rows=330 loops=1)
>                                                   ->  Index Scan using tbl_file_idx6 on tbl_file  (cost=0.00..344.56 rows=114 width=40) (actual time=13.164..15.475 rows=106 loops=1)
>                                                         Index Cond: (fk_filetype_id = 83)
>                                                   ->  Bitmap Heap Scan on tbl_file_structure  (cost=4.89..210.27 rows=69 width=24) (actual time=8.736..19.606 rows=3 loops=106)
>                                                         Recheck Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
>                                                         ->  Bitmap Index Scan on tbl_file_structure_idx  (cost=0.00..4.88 rows=69 width=0) (actual time=2.574..2.574 rows=3 loops=106)
>                                                               Index Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
>                                             ->  Index Scan using tbl_structure_pkey on tbl_structure  (cost=0.00..5.79 rows=1 width=98) (actual time=3.942..3.945 rows=1 loops=330)
>                                                   Index Cond: (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
>                                       ->  Materialize  (cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34 loops=330)
>                                             ->  Seq Scan on tbl_archive  (cost=0.00..1.34 rows=34 width=16) (actual time=10.606..10.693 rows=34 loops=1)
>                                                   Filter: archive_complete
>                                 ->  Index Scan using tbl_acl_idx on tbl_acl  (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971 rows=2 loops=330)
>                                       Index Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
>                           ->  Hash  (cost=2.16..2.16 rows=4 width=41) (actual time=17.890..17.890 rows=4 loops=1)
>                                 ->  Hash Join  (cost=1.07..2.16 rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
>                                       Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
>                                       ->  Seq Scan on tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4 loops=1)
>                                       ->  Hash  (cost=1.03..1.03 rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
>                                             ->  Seq Scan on tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual time=17.697..17.706 rows=3 loops=1)
> Total runtime: 5139.608 ms
>
>
>
>> The default statistics target is 10; the maximum value is 1000.
>>
>> On some systems I've found that reducing random_page_cost from 4
>> (the default) to 2 gives more realistic cost estimates for index
>> scans.
>>
>> --
>> Michael Fuhr
>
>

