Re: Query not using index pgsql 8.2.3 - Mailing list pgsql-general

From: Henrik Zagerholm
Subject: Re: Query not using index pgsql 8.2.3
Msg-id: 1E4CC216-8B17-4C4E-855E-57BB05784D31@mac.se
In response to: Re: Query not using index pgsql 8.2.3 (Henrik Zagerholm <henke@mac.se>)
Responses: Re: Query not using index pgsql 8.2.3
List: pgsql-general
Never mind my last post regarding the Bitmap Heap Scan. That is all in
order. Now it works perfectly. Went from 30+ seconds to 179 ms. =)

I'm just amazed that setting the statistics target on one column made
all the difference. Are there any guidelines on which columns I should
change the statistics on? (A sketch follows the quoted thread below.)

Thanks again!!!

On 23 Mar 2007, at 14:04, Henrik Zagerholm wrote:
>
> On 23 Mar 2007, at 13:34, Michael Fuhr wrote:
>
>> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>>> On 23 Mar 2007, at 12:33, Michael Fuhr wrote:
>>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>>> of magnitude:
>>>>
>>>>> Bitmap Index Scan on tbl_file_idx6 (cost=0.00..25.65 rows=1251
>>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>>   Index Cond: (fk_filetype_id = 83)
>>>>
>>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>>> If so then you might try increasing the statistics target for
>>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>>> conditions.
>>>
>>> I did a vacuum full and reindex on all tables.
>>
>> VACUUM FULL is seldom (if ever) necessary if you're running plain
>> VACUUM (without FULL) often enough, either manually or via
>> autovacuum.
>>
>>> Now I also did a vacuum analyze on tbl_acl (the biggest table, with
>>> about 4.5 million rows).
>>>
>>> Same result.
>>
>> I'd suggest analyzing all tables. The bad estimate I mentioned
>> appears to be for a column in tbl_file, so if you didn't analyze
>> that table then the query plan probably won't improve.
>>
>>> But do you mean if the row count estimate is big it can't use any
>>> index on any other table within the JOINs?
>>
>> High row count estimates make the planner think that scanning entire
>> tables would be faster than using indexes. The more of a table a
>> query must fetch, the less efficient an index scan becomes, to the
>> point that a sequential scan is faster than an index scan.
>>
>>> Any specific parameters I should adjust?
>>
>> If analyzing the tables doesn't improve the row count estimates,
>> then try increasing some columns' statistics targets and re-analyze
>> the table or just that column. Example:
>>
>> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
>> ANALYZE tbl_file (fk_filetype_id);
>>
> I analyzed all tables and found no difference, but after changing
> the statistics for fk_filetype_id I got way better results.
> There is still a big cost on table tbl_file_structure, where it is
> doing a Bitmap Heap Scan.
> Could I do something to improve this?
>
> Thanks for all your help so far!
>
> Limit (cost=115168.22..115169.12 rows=20 width=173) (actual time=5138.401..5138.982 rows=20 loops=1)
>   -> GroupAggregate (cost=115168.22..115341.60 rows=3853 width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
>     -> Sort (cost=115168.22..115177.85 rows=3853 width=173) (actual time=5138.339..5138.430 rows=73 loops=1)
>       Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
>       -> Nested Loop (cost=8.48..114938.74 rows=3853 width=173) (actual time=185.588..5118.684 rows=730 loops=1)
>         -> Seq Scan on tbl_filetype (cost=0.00..1.25 rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
>           Filter: (83 = pk_filetype_id)
>         -> Hash Join (cost=8.48..114898.96 rows=3853 width=171) (actual time=185.538..5116.063 rows=730 loops=1)
>           Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
>           -> Nested Loop (cost=6.27..114843.77 rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
>             -> Nested Loop (cost=6.27..32079.13 rows=1167 width=146) (actual time=130.594..3448.927 rows=330 loops=1)
>               Join Filter: (tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
>               -> Nested Loop (cost=4.89..31185.00 rows=1167 width=138) (actual time=119.861..3402.527 rows=330 loops=1)
>                 -> Nested Loop (cost=4.89..24413.49 rows=1167 width=56) (actual time=82.203..2097.142 rows=330 loops=1)
>                   -> Index Scan using tbl_file_idx6 on tbl_file (cost=0.00..344.56 rows=114 width=40) (actual time=13.164..15.475 rows=106 loops=1)
>                     Index Cond: (fk_filetype_id = 83)
>                   -> Bitmap Heap Scan on tbl_file_structure (cost=4.89..210.27 rows=69 width=24) (actual time=8.736..19.606 rows=3 loops=106)
>                     Recheck Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
>                     -> Bitmap Index Scan on tbl_file_structure_idx (cost=0.00..4.88 rows=69 width=0) (actual time=2.574..2.574 rows=3 loops=106)
>                       Index Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
>                 -> Index Scan using tbl_structure_pkey on tbl_structure (cost=0.00..5.79 rows=1 width=98) (actual time=3.942..3.945 rows=1 loops=330)
>                   Index Cond: (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
>               -> Materialize (cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34 loops=330)
>                 -> Seq Scan on tbl_archive (cost=0.00..1.34 rows=34 width=16) (actual time=10.606..10.693 rows=34 loops=1)
>                   Filter: archive_complete
>             -> Index Scan using tbl_acl_idx on tbl_acl (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971 rows=2 loops=330)
>               Index Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
>           -> Hash (cost=2.16..2.16 rows=4 width=41) (actual time=17.890..17.890 rows=4 loops=1)
>             -> Hash Join (cost=1.07..2.16 rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
>               Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
>               -> Seq Scan on tbl_share (cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4 loops=1)
>               -> Hash (cost=1.03..1.03 rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
>                 -> Seq Scan on tbl_computer (cost=0.00..1.03 rows=3 width=20) (actual time=17.697..17.706 rows=3 loops=1)
> Total runtime: 5139.608 ms
>
>> The default statistics target is 10; the maximum value is 1000.
>>
>> On some systems I've found that reducing random_page_cost from 4
>> (the default) to 2 gives more realistic cost estimates for index
>> scans.
>>
>> --
>> Michael Fuhr
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
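On the guidelines question, a rough rule of thumb (not spelled out in the
thread itself): the columns most worth a higher statistics target are the
ones used in WHERE clauses and join conditions whose value distributions
are skewed or have many distinct values, since those are where the default
target of 10 misestimates worst. A minimal sketch, reusing the
tbl_file/fk_filetype_id names from this thread; pg_stats is the standard
view for inspecting what ANALYZE has gathered:

  -- Check what the planner currently knows about the column:
  SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'tbl_file' AND attname = 'fk_filetype_id';

  -- Raise the per-column target and re-analyze just that column
  -- (100 is the value suggested earlier in the thread; default 10, max 1000):
  ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
  ANALYZE tbl_file (fk_filetype_id);

If the estimated rows in EXPLAIN ANALYZE then line up with the actual
rows, the target is high enough for that column.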
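Michael's random_page_cost suggestion can also be tried per session before
touching postgresql.conf; a sketch, where 2 is the value he mentions rather
than a tested recommendation:

  -- Lower the planner's estimate of random I/O cost for this session only:
  SET random_page_cost = 2;
  -- Re-run the query under EXPLAIN ANALYZE and compare plans; if the
  -- index scans win, set random_page_cost = 2 in postgresql.conf.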