Thread: Query not using index (PostgreSQL 8.2.3)
Hello list,

I'm having a hard time understanding why my query is not using my indices when I change a field in the WHERE clause. Could someone please shed some light on this? Below are both queries and the EXPLAIN ANALYZE output for each.

Regards,
Henrik

QUERY 1. Using WHERE tbl_file_structure.fk_file_id = 1307249.

SELECT max(pk_file_structure_id) AS pk_object_id,
       max(fk_archive_id) AS fk_archive_id,
       file_name AS object_name,
       structure_path AS object_path,
       computer_name,
       file_ctime AS object_ctime,
       pk_computer_id,
       filetype_icon AS object_img,
       'file' AS object_type,
       share_name,
       share_path,
       fk_file_id
FROM tbl_file_structure
JOIN tbl_file ON pk_file_id = fk_file_id
JOIN tbl_structure ON pk_structure_id = fk_structure_id
JOIN tbl_archive ON pk_archive_id = fk_archive_id
JOIN tbl_share ON pk_share_id = fk_share_id
JOIN tbl_computer ON pk_computer_id = fk_computer_id
JOIN tbl_filetype ON pk_filetype_id = fk_filetype_id
JOIN tbl_acl ON fk_file_structure_id = pk_file_structure_id
WHERE archive_complete = true
  AND fk_file_id = 1307249
GROUP BY file_name, file_ctime, structure_path, pk_computer_id,
         filetype_icon, computer_name, share_name, share_path, fk_file_id
ORDER BY object_name
LIMIT 20 OFFSET 0

EXPLAIN ANALYZE FOR QUERY 1. As you can see, it uses index scans on several tables.
Limit  (cost=8530.59..8531.49 rows=20 width=173) (actual time=1.938..1.942 rows=1 loops=1)
  ->  GroupAggregate  (cost=8530.59..8540.54 rows=221 width=173) (actual time=1.932..1.933 rows=1 loops=1)
        ->  Sort  (cost=8530.59..8531.14 rows=221 width=173) (actual time=1.762..1.787 rows=20 loops=1)
              Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
              ->  Hash Join  (cost=4.41..8521.99 rows=221 width=173) (actual time=0.655..1.460 rows=20 loops=1)
                    Hash Cond: (tbl_structure.fk_archive_id = tbl_archive.pk_archive_id)
                    ->  Nested Loop  (cost=0.00..8514.54 rows=221 width=140) (actual time=0.184..0.904 rows=20 loops=1)
                          ->  Nested Loop  (cost=0.00..820.46 rows=67 width=140) (actual time=0.154..0.558 rows=10 loops=1)
                                ->  Nested Loop  (cost=0.00..280.40 rows=67 width=58) (actual time=0.123..0.287 rows=10 loops=1)
                                      ->  Nested Loop  (cost=0.00..9.73 rows=1 width=42) (actual time=0.087..0.179 rows=1 loops=1)
                                            Join Filter: (tbl_filetype.pk_filetype_id = tbl_file.fk_filetype_id)
                                            ->  Index Scan using tbl_file_pkey on tbl_file  (cost=0.00..8.28 rows=1 width=40) (actual time=0.053..0.056 rows=1 loops=1)
                                                  Index Cond: (1307249 = pk_file_id)
                                            ->  Seq Scan on tbl_filetype  (cost=0.00..1.20 rows=20 width=18) (actual time=0.012..0.036 rows=20 loops=1)
                                      ->  Index Scan using tbl_file_structure_idx on tbl_file_structure  (cost=0.00..270.00 rows=67 width=24) (actual time=0.030..0.073 rows=10 loops=1)
                                            Index Cond: (fk_file_id = 1307249)
                                ->  Index Scan using tbl_structure_pkey on tbl_structure  (cost=0.00..8.05 rows=1 width=98) (actual time=0.018..0.020 rows=1 loops=10)
                                      Index Cond: (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                          ->  Index Scan using tbl_acl_idx on tbl_acl  (cost=0.00..114.29 rows=44 width=8) (actual time=0.021..0.025 rows=2 loops=10)
                                Index Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                    ->  Hash  (cost=4.00..4.00 rows=33 width=41) (actual time=0.444..0.444 rows=33 loops=1)
                          ->  Hash Join  (cost=2.21..4.00 rows=33 width=41) (actual time=0.134..0.343 rows=33 loops=1)
                                Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
                                ->  Seq Scan on tbl_archive  (cost=0.00..1.33 rows=33 width=16) (actual time=0.018..0.086 rows=33 loops=1)
                                      Filter: archive_complete
                                ->  Hash  (cost=2.16..2.16 rows=4 width=41) (actual time=0.097..0.097 rows=4 loops=1)
                                      ->  Hash Join  (cost=1.07..2.16 rows=4 width=41) (actual time=0.058..0.082 rows=4 loops=1)
                                            Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                            ->  Seq Scan on tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual time=0.007..0.013 rows=4 loops=1)
                                            ->  Hash  (cost=1.03..1.03 rows=3 width=20) (actual time=0.031..0.031 rows=3 loops=1)
                                                  ->  Seq Scan on tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual time=0.009..0.015 rows=3 loops=1)
Total runtime: 2.534 ms

========================================================================

QUERY 2. Using WHERE tbl_file.fk_filetype_id = 83.
SELECT max(pk_file_structure_id) AS pk_object_id,
       max(fk_archive_id) AS fk_archive_id,
       file_name AS object_name,
       structure_path AS object_path,
       computer_name,
       file_ctime AS object_ctime,
       pk_computer_id,
       filetype_icon AS object_img,
       'file' AS object_type,
       share_name,
       share_path,
       fk_file_id
FROM tbl_file_structure
JOIN tbl_file ON pk_file_id = fk_file_id
JOIN tbl_structure ON pk_structure_id = fk_structure_id
JOIN tbl_archive ON pk_archive_id = fk_archive_id
JOIN tbl_share ON pk_share_id = fk_share_id
JOIN tbl_computer ON pk_computer_id = fk_computer_id
JOIN tbl_filetype ON pk_filetype_id = fk_filetype_id
JOIN tbl_acl ON fk_file_structure_id = pk_file_structure_id
WHERE archive_complete = true
  AND fk_filetype_id = 83
GROUP BY file_name, file_ctime, structure_path, pk_computer_id,
         filetype_icon, computer_name, share_name, share_path, fk_file_id
ORDER BY object_name
LIMIT 20 OFFSET 0

EXPLAIN ANALYZE FOR QUERY 2. As you can see, it uses a Bitmap Heap Scan driven by tbl_file.fk_filetype_id but sequential scans on all the other tables. The query cost speaks for itself.
Limit  (cost=150979.11..150980.01 rows=20 width=173) (actual time=25329.586..25330.167 rows=20 loops=1)
  ->  GroupAggregate  (cost=150979.11..152886.93 rows=42396 width=173) (actual time=25329.580..25330.118 rows=20 loops=1)
        ->  Sort  (cost=150979.11..151085.10 rows=42396 width=173) (actual time=25329.520..25329.614 rows=73 loops=1)
              Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
              ->  Nested Loop  (cost=42435.52..144024.63 rows=42396 width=173) (actual time=8199.229..25311.927 rows=730 loops=1)
                    ->  Seq Scan on tbl_filetype  (cost=0.00..1.25 rows=1 width=18) (actual time=0.132..0.153 rows=1 loops=1)
                          Filter: (83 = pk_filetype_id)
                    ->  Hash Join  (cost=42435.52..143599.42 rows=42396 width=171) (actual time=8199.077..25309.420 rows=730 loops=1)
                          Hash Cond: (tbl_structure.fk_archive_id = tbl_archive.pk_archive_id)
                          ->  Hash Join  (cost=42431.11..143012.07 rows=42396 width=138) (actual time=8198.468..25305.206 rows=730 loops=1)
                                Hash Cond: (tbl_file_structure.fk_structure_id = tbl_structure.pk_structure_id)
                                ->  Hash Join  (cost=31898.88..128061.91 rows=42396 width=56) (actual time=5947.642..22602.311 rows=730 loops=1)
                                      Hash Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                                      ->  Seq Scan on tbl_acl  (cost=0.00..79136.22 rows=4427422 width=8) (actual time=0.092..8887.385 rows=4427422 loops=1)
                                      ->  Hash  (cost=31738.31..31738.31 rows=12846 width=56) (actual time=5801.353..5801.353 rows=330 loops=1)
                                            ->  Hash Join  (cost=2094.84..31738.31 rows=12846 width=56) (actual time=292.297..5798.486 rows=330 loops=1)
                                                  Hash Cond: (tbl_file_structure.fk_file_id = tbl_file.pk_file_id)
                                                  ->  Seq Scan on tbl_file_structure  (cost=0.00..24484.55 rows=1341455 width=24) (actual time=13.953..3090.936 rows=1341455 loops=1)
                                                  ->  Hash  (cost=2079.20..2079.20 rows=1251 width=40) (actual time=180.500..180.500 rows=112 loops=1)
                                                        ->  Bitmap Heap Scan on tbl_file  (cost=25.96..2079.20 rows=1251 width=40) (actual time=22.210..180.109 rows=112 loops=1)
                                                              Recheck Cond: (fk_filetype_id = 83)
                                                              ->  Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251 width=0) (actual time=21.958..21.958 rows=112 loops=1)
                                                                    Index Cond: (fk_filetype_id = 83)
                                ->  Hash  (cost=5576.88..5576.88 rows=176188 width=98) (actual time=1859.992..1859.992 rows=176188 loops=1)
                                      ->  Seq Scan on tbl_structure  (cost=0.00..5576.88 rows=176188 width=98) (actual time=7.321..1062.527 rows=176188 loops=1)
                          ->  Hash  (cost=4.00..4.00 rows=33 width=41) (actual time=0.574..0.574 rows=33 loops=1)
                                ->  Hash Join  (cost=2.21..4.00 rows=33 width=41) (actual time=0.264..0.474 rows=33 loops=1)
                                      Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
                                      ->  Seq Scan on tbl_archive  (cost=0.00..1.33 rows=33 width=16) (actual time=0.054..0.122 rows=33 loops=1)
                                            Filter: archive_complete
                                      ->  Hash  (cost=2.16..2.16 rows=4 width=41) (actual time=0.190..0.190 rows=4 loops=1)
                                            ->  Hash Join  (cost=1.07..2.16 rows=4 width=41) (actual time=0.148..0.170 rows=4 loops=1)
                                                  Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                                  ->  Seq Scan on tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual time=0.049..0.055 rows=4 loops=1)
                                                  ->  Hash  (cost=1.03..1.03 rows=3 width=20) (actual time=0.073..0.073 rows=3 loops=1)
                                                        ->  Seq Scan on tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual time=0.050..0.058 rows=3 loops=1)
Total runtime: 25330.924 ms
On Fri, Mar 23, 2007 at 11:43:54AM +0100, Henrik Zagerholm wrote:
> I'm having a hard time understanding why my query is not using my
> indices when I change a field in the WHERE clause.

The row count estimate for fk_filetype_id = 83 is high by an order of magnitude:

> Bitmap Index Scan on tbl_file_idx6 (cost=0.00..25.65 rows=1251
> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>
> Index Cond: (fk_filetype_id = 83)

Have you run ANALYZE or VACUUM ANALYZE on these tables recently? If so then you might try increasing the statistics target for tbl_file.fk_filetype_id and perhaps some of the columns in the join conditions.

--
Michael Fuhr
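[For readers following along: one way to see where that estimate comes from is to look at the statistics the planner has gathered. This is a sketch only; the table and column names are taken from the thread, and pg_stats is the standard per-column statistics view.]

```sql
-- Inspect the gathered statistics for the badly estimated column.
-- n_distinct and most_common_vals/most_common_freqs are what the
-- planner uses to estimate the row count for "fk_filetype_id = 83";
-- with a low statistics target, value 83 may not appear in the
-- most-common-values list at all, producing a generic (and wrong) estimate.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'tbl_file'
  AND attname = 'fk_filetype_id';
```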
23 mar 2007 kl. 12:33 skrev Michael Fuhr:
> On Fri, Mar 23, 2007 at 11:43:54AM +0100, Henrik Zagerholm wrote:
>> I'm having a hard time understanding why my query is not using my
>> indices when I change a field in the WHERE clause.
>
> The row count estimate for fk_filetype_id = 83 is high by an order
> of magnitude:
>
>> Bitmap Index Scan on tbl_file_idx6 (cost=0.00..25.65 rows=1251
>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>
>> Index Cond: (fk_filetype_id = 83)
>
> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
> If so then you might try increasing the statistics target for
> tbl_file.fk_filetype_id and perhaps some of the columns in the join
> conditions.

I did a vacuum full and reindex on all tables.

Now I also did a vacuum analyze on tbl_acl (the biggest table, with about 4.5 million rows).

Same result.

But do you mean that if the row count estimate is high, the planner can't use an index on any other table within the JOINs?

Any specific parameters I should adjust? The query only returns 3200 rows.

Cheers,
Henrik
On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
> 23 mar 2007 kl. 12:33 skrev Michael Fuhr:
> > The row count estimate for fk_filetype_id = 83 is high by an order
> > of magnitude:
> >
> > > Bitmap Index Scan on tbl_file_idx6 (cost=0.00..25.65 rows=1251
> > > width=0) (actual time=21.958..21.958 rows=112 loops=1)
> > >
> > > Index Cond: (fk_filetype_id = 83)
> >
> > Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
> > If so then you might try increasing the statistics target for
> > tbl_file.fk_filetype_id and perhaps some of the columns in the join
> > conditions.
>
> I did a vacuum full and reindex on all tables.

VACUUM FULL is seldom (if ever) necessary if you're running plain VACUUM (without FULL) often enough, either manually or via autovacuum.

> Now I also did a vacuum analyze on tbl_acl (the biggest table with
> about 4.5 million rows)
>
> Same result.

I'd suggest analyzing all tables. The bad estimate I mentioned appears to be for a column in tbl_file, so if you didn't analyze that table then the query plan probably won't improve.

> But do you mean if the row_count estimate is big it can't use any
> index on any other table within the JOINs?

High row count estimates make the planner think that scanning entire tables would be faster than using indexes. The more of a table a query must fetch, the less efficient an index scan becomes, to the point that a sequential scan is faster than an index scan.

> Any specific parameters I should adjust?

If analyzing the tables doesn't improve the row count estimates then try increasing some columns' statistics targets and re-analyze the table or just that column. Example:

ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
ANALYZE tbl_file (fk_filetype_id);

The default statistics target is 10; the maximum value is 1000.

On some systems I've found that reducing random_page_cost from 4 (the default) to 2 gives more realistic cost estimates for index scans.

--
Michael Fuhr
Henrik Zagerholm wrote:
>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>> If so then you might try increasing the statistics target for
>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>> conditions.
>
> I did a vacuum full and reindex on all tables.
>
> Now I also did a vacuum analyze on tbl_acl (the biggest table with about
> 4.5 million rows)
>
> Same result.

You only updated the statistics for one table. Run ANALYZE on all your tables and measure again.

> But do you mean if the row_count estimate is big it can't use any index
> on any other table within the JOINs?

You misunderstand. Michael is saying that the estimated row counts (based on the data gathered by VACUUM ANALYZE) are very different from the actual row counts (shown by EXPLAIN ANALYZE). That means your statistics are off, and those drive the query planner. If you have bad statistics, you can (and probably will) get bad query plans.

> Any specific parameters I should adjust?

If the statistics are still off this much after running ANALYZE, you can try increasing the statistics targets for the columns where the estimates are off the most.

There is also the possibility that the planner chooses an expensive plan because it doesn't have a choice. Low memory is a likely cause in such cases. Make sure you configure enough shared memory and that postgres is configured to use it.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
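[For readers following along: the memory-related settings Alban mentions live in postgresql.conf. The values below are an illustrative sketch for a machine with roughly 1 GB of RAM, not recommendations from the thread; tune them to your own hardware.]

```
# postgresql.conf (PostgreSQL 8.2) -- illustrative values only
shared_buffers = 128MB        # shared memory for caching data pages
work_mem = 16MB               # per-sort / per-hash memory; affects hash joins and sorts
effective_cache_size = 512MB  # planner's estimate of OS + PostgreSQL cache combined
random_page_cost = 2.0        # below the default 4.0 if the data is mostly cached
```

Changes to shared_buffers require a server restart; the others can be reloaded.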
23 mar 2007 kl. 13:34 skrev Michael Fuhr:
> [earlier discussion snipped]
>
> If analyzing the tables doesn't improve the row count estimates
> then try increasing some columns' statistics targets and re-analyze
> the table or just that column. Example:
>
> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
> ANALYZE tbl_file (fk_filetype_id);

I analyzed all tables and found no difference, but after changing the statistics for fk_filetype_id I got way better results.
Still there is a big cost on table tbl_file_structure, where it is doing a Bitmap Heap Scan. Could I do something to improve this?

Thanks for all your help so far!

Limit  (cost=115168.22..115169.12 rows=20 width=173) (actual time=5138.401..5138.982 rows=20 loops=1)
  ->  GroupAggregate  (cost=115168.22..115341.60 rows=3853 width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
        ->  Sort  (cost=115168.22..115177.85 rows=3853 width=173) (actual time=5138.339..5138.430 rows=73 loops=1)
              Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
              ->  Nested Loop  (cost=8.48..114938.74 rows=3853 width=173) (actual time=185.588..5118.684 rows=730 loops=1)
                    ->  Seq Scan on tbl_filetype  (cost=0.00..1.25 rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
                          Filter: (83 = pk_filetype_id)
                    ->  Hash Join  (cost=8.48..114898.96 rows=3853 width=171) (actual time=185.538..5116.063 rows=730 loops=1)
                          Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
                          ->  Nested Loop  (cost=6.27..114843.77 rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
                                ->  Nested Loop  (cost=6.27..32079.13 rows=1167 width=146) (actual time=130.594..3448.927 rows=330 loops=1)
                                      Join Filter: (tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
                                      ->  Nested Loop  (cost=4.89..31185.00 rows=1167 width=138) (actual time=119.861..3402.527 rows=330 loops=1)
                                            ->  Nested Loop  (cost=4.89..24413.49 rows=1167 width=56) (actual time=82.203..2097.142 rows=330 loops=1)
                                                  ->  Index Scan using tbl_file_idx6 on tbl_file  (cost=0.00..344.56 rows=114 width=40) (actual time=13.164..15.475 rows=106 loops=1)
                                                        Index Cond: (fk_filetype_id = 83)
                                                  ->  Bitmap Heap Scan on tbl_file_structure  (cost=4.89..210.27 rows=69 width=24) (actual time=8.736..19.606 rows=3 loops=106)
                                                        Recheck Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                                        ->  Bitmap Index Scan on tbl_file_structure_idx  (cost=0.00..4.88 rows=69 width=0) (actual time=2.574..2.574 rows=3 loops=106)
                                                              Index Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                            ->  Index Scan using tbl_structure_pkey on tbl_structure  (cost=0.00..5.79 rows=1 width=98) (actual time=3.942..3.945 rows=1 loops=330)
                                                  Index Cond: (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                                      ->  Materialize  (cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34 loops=330)
                                            ->  Seq Scan on tbl_archive  (cost=0.00..1.34 rows=34 width=16) (actual time=10.606..10.693 rows=34 loops=1)
                                                  Filter: archive_complete
                                ->  Index Scan using tbl_acl_idx on tbl_acl  (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971 rows=2 loops=330)
                                      Index Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                          ->  Hash  (cost=2.16..2.16 rows=4 width=41) (actual time=17.890..17.890 rows=4 loops=1)
                                ->  Hash Join  (cost=1.07..2.16 rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
                                      Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                      ->  Seq Scan on tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4 loops=1)
                                      ->  Hash  (cost=1.03..1.03 rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
                                            ->  Seq Scan on tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual time=17.697..17.706 rows=3 loops=1)
Total runtime: 5139.608 ms

> The default statistics target is 10; the maximum value is 1000.
>
> On some systems I've found that reducing random_page_cost from 4
> (the default) to 2 gives more realistic cost estimates for index
> scans.
>
> --
> Michael Fuhr
Never mind my last post regarding the Bitmap heap scan. That is all in order. Now it works perfectly: it went from 30+ seconds to 179 ms. =)

I'm just amazed that setting the statistics target on one column made all the difference. Are there any guidelines on which columns I should change the statistics for?

Thanks again!!!

23 mar 2007 kl. 14:04 skrev Henrik Zagerholm:
> 23 mar 2007 kl. 13:34 skrev Michael Fuhr:
>> If analyzing the tables doesn't improve the row count estimates
>> then try increasing some columns' statistics targets and re-analyze
>> the table or just that column. Example:
>>
>> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
>> ANALYZE tbl_file (fk_filetype_id);
>
> I analyzed all tables and found no difference, but after changing
> the statistics for fk_filetype_id I got way better results.
>
> [rest of message and EXPLAIN ANALYZE output snipped]
On Fri, Mar 23, 2007 at 02:13:57PM +0100, Henrik Zagerholm wrote:
> I'm just amazed that setting the statistics target on one column
> made all the difference. Are there any guidelines on which columns I
> should change the statistics for?

Start by looking for columns involved in simple comparisons with a constant -- especially if you know that the distribution of values is uneven -- and see how far off the row count estimate is for various values. You can use simple queries such as

EXPLAIN ANALYZE SELECT 1 FROM tbl_file WHERE fk_filetype_id = 83;

To learn more read "How the Planner Uses Statistics" in the documentation:

http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html

Among the configuration settings to consider changing are shared_buffers, effective_cache_size, work_mem, and random_page_cost. The following is a good starting point:

http://www.powerpostgresql.com/PerfList

If you have additional performance-related questions then consider posting to pgsql-performance.

--
Michael Fuhr
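[For readers following along: Michael's suggestion boils down to comparing the planner's estimate against reality, and checking which columns still use the default statistics target. A sketch, using the table and value from the thread; pg_attribute.attstattarget is the per-column target set by ALTER TABLE ... SET STATISTICS.]

```sql
-- The "rows=" inside the cost parentheses is the planner's estimate;
-- the "rows=" inside the actual-time parentheses is the true count.
-- A large mismatch suggests raising this column's statistics target.
EXPLAIN ANALYZE SELECT 1 FROM tbl_file WHERE fk_filetype_id = 83;

-- Show each column's statistics target
-- (-1 means: use the default, which is 10 in 8.2).
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'tbl_file'::regclass
  AND attnum > 0;
```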