Thread: Query not using index pgsql 8.2.3

Query not using index pgsql 8.2.3

From
Henrik Zagerholm
Date:
Hello list,

I'm having a hard time understanding why my query is not using my
indices when I change a field in the WHERE clause.
Please could someone shed some light on this.

Below is both queries and explain analyze for the different queries.

Regards,
Henrik

QUERY 1. Using WHERE tbl_file_structure_id.fk_file_id = 1307249.

SELECT max(pk_file_structure_id) as pk_object_id, max(fk_archive_id)
AS fk_archive_id, file_name AS

  object_name, structure_path AS object_path, computer_name,

  file_ctime AS object_ctime, pk_computer_id, filetype_icon AS
object_img, 'file' AS object_type, share_name, share_path, fk_file_id
FROM tbl_file_structure
                                                                 JOIN
tbl_file ON pk_file_id = fk_file_id
                                                                 JOIN
tbl_structure ON pk_structure_id = fk_structure_id
                                                                 JOIN
tbl_archive ON pk_archive_id =     fk_archive_id
                                                                 JOIN
tbl_share ON pk_share_id = fk_share_id
                                                                 JOIN
tbl_computer ON pk_computer_id = fk_computer_id
                                                                 JOIN
tbl_filetype ON pk_filetype_id = fk_filetype_id
                                                                 JOIN
tbl_acl ON fk_file_structure_id = pk_file_structure_id WHERE
archive_complete = true AND fk_file_id = 1307249 GROUP BY file_name,
file_ctime, structure_path,

  pk_computer_id, filetype_icon, computer_name, share_name,
share_path, fk_file_id ORDER BY object_name LIMIT 20 OFFSET 0


EXPLAIN FOR QUERY 1. As you can see uses INDEX SCAN on several tables.

Limit  (cost=8530.59..8531.49 rows=20 width=173) (actual
time=1.938..1.942 rows=1 loops=1)
   ->  GroupAggregate  (cost=8530.59..8540.54 rows=221 width=173)
(actual time=1.932..1.933 rows=1 loops=1)
         ->  Sort  (cost=8530.59..8531.14 rows=221 width=173) (actual
time=1.762..1.787 rows=20 loops=1)
               Sort Key: tbl_file.file_name, tbl_file.file_ctime,
tbl_structure.structure_path, tbl_computer.pk_computer_id,
tbl_filetype.filetype_icon, tbl_computer.computer_name,
tbl_share.share_name, tbl_share.share_path,
tbl_file_structure.fk_file_id
               ->  Hash Join  (cost=4.41..8521.99 rows=221 width=173)
(actual time=0.655..1.460 rows=20 loops=1)
                     Hash Cond: (tbl_structure.fk_archive_id =
tbl_archive.pk_archive_id)
                     ->  Nested Loop  (cost=0.00..8514.54 rows=221
width=140) (actual time=0.184..0.904 rows=20 loops=1)
                           ->  Nested Loop  (cost=0.00..820.46
rows=67 width=140) (actual time=0.154..0.558 rows=10 loops=1)
                                 ->  Nested Loop  (cost=0.00..280.40
rows=67 width=58) (actual time=0.123..0.287 rows=10 loops=1)
                                       ->  Nested Loop
(cost=0.00..9.73 rows=1 width=42) (actual time=0.087..0.179 rows=1
loops=1)
                                             Join Filter:
(tbl_filetype.pk_filetype_id = tbl_file.fk_filetype_id)
                                             ->  Index Scan using
tbl_file_pkey on tbl_file  (cost=0.00..8.28 rows=1 width=40) (actual
time=0.053..0.056 rows=1 loops=1)
                                                   Index Cond:
(1307249 = pk_file_id)
                                             ->  Seq Scan on
tbl_filetype  (cost=0.00..1.20 rows=20 width=18) (actual
time=0.012..0.036 rows=20 loops=1)
                                       ->  Index Scan using
tbl_file_structure_idx on tbl_file_structure  (cost=0.00..270.00
rows=67 width=24) (actual time=0.030..0.073 rows=10 loops=1)
                                             Index Cond: (fk_file_id
= 1307249)
                                 ->  Index Scan using
tbl_structure_pkey on tbl_structure  (cost=0.00..8.05 rows=1
width=98) (actual time=0.018..0.020 rows=1 loops=10)
                                       Index Cond:
(tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                           ->  Index Scan using tbl_acl_idx on
tbl_acl  (cost=0.00..114.29 rows=44 width=8) (actual
time=0.021..0.025 rows=2 loops=10)
                                 Index Cond:
(tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                     ->  Hash  (cost=4.00..4.00 rows=33 width=41)
(actual time=0.444..0.444 rows=33 loops=1)
                           ->  Hash Join  (cost=2.21..4.00 rows=33
width=41) (actual time=0.134..0.343 rows=33 loops=1)
                                 Hash Cond: (tbl_archive.fk_share_id
= tbl_share.pk_share_id)
                                 ->  Seq Scan on tbl_archive
(cost=0.00..1.33 rows=33 width=16) (actual time=0.018..0.086 rows=33
loops=1)
                                       Filter: archive_complete
                                 ->  Hash  (cost=2.16..2.16 rows=4
width=41) (actual time=0.097..0.097 rows=4 loops=1)
                                       ->  Hash Join
(cost=1.07..2.16 rows=4 width=41) (actual time=0.058..0.082 rows=4
loops=1)
                                             Hash Cond:
(tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                             ->  Seq Scan on
tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual
time=0.007..0.013 rows=4 loops=1)
                                             ->  Hash
(cost=1.03..1.03 rows=3 width=20) (actual time=0.031..0.031 rows=3
loops=1)
                                                   ->  Seq Scan on
tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual
time=0.009..0.015 rows=3 loops=1)
Total runtime: 2.534 ms

========================================================================
==============
========================================================================
==============


QUERY 2. Using WHERE tbl_file.fk_filetype_id = 83.

SELECT max(pk_file_structure_id) as pk_object_id, max(fk_archive_id)
AS fk_archive_id, file_name AS

  object_name, structure_path AS object_path, computer_name,

  file_ctime AS object_ctime, pk_computer_id, filetype_icon AS
object_img, 'file' AS object_type, share_name, share_path, fk_file_id
FROM tbl_file_structure
                                                                 JOIN
tbl_file ON pk_file_id = fk_file_id
                                                                 JOIN
tbl_structure ON pk_structure_id = fk_structure_id
                                                                 JOIN
tbl_archive ON pk_archive_id =     fk_archive_id
                                                                 JOIN
tbl_share ON pk_share_id = fk_share_id
                                                                 JOIN
tbl_computer ON pk_computer_id = fk_computer_id
                                                                 JOIN
tbl_filetype ON pk_filetype_id = fk_filetype_id
                                                                 JOIN
tbl_acl ON fk_file_structure_id = pk_file_structure_id WHERE
archive_complete = true AND fk_filetype_id = 83 GROUP BY file_name,
file_ctime, structure_path,

  pk_computer_id, filetype_icon, computer_name, share_name,
share_path, fk_file_id ORDER BY object_name LIMIT 20 OFFSET 0




EXPLAIN FOR QUERY 2. As you can see uses Bitmap Heap Scan on
tbl_file.fk_filetype_id BUT SEQ SCAN on all other tables.
The query cost speaks for itself.

Limit  (cost=150979.11..150980.01 rows=20 width=173) (actual
time=25329.586..25330.167 rows=20 loops=1)
   ->  GroupAggregate  (cost=150979.11..152886.93 rows=42396
width=173) (actual time=25329.580..25330.118 rows=20 loops=1)
         ->  Sort  (cost=150979.11..151085.10 rows=42396 width=173)
(actual time=25329.520..25329.614 rows=73 loops=1)
               Sort Key: tbl_file.file_name, tbl_file.file_ctime,
tbl_structure.structure_path, tbl_computer.pk_computer_id,
tbl_filetype.filetype_icon, tbl_computer.computer_name,
tbl_share.share_name, tbl_share.share_path,
tbl_file_structure.fk_file_id
               ->  Nested Loop  (cost=42435.52..144024.63 rows=42396
width=173) (actual time=8199.229..25311.927 rows=730 loops=1)
                     ->  Seq Scan on tbl_filetype  (cost=0.00..1.25
rows=1 width=18) (actual time=0.132..0.153 rows=1 loops=1)
                           Filter: (83 = pk_filetype_id)
                     ->  Hash Join  (cost=42435.52..143599.42
rows=42396 width=171) (actual time=8199.077..25309.420 rows=730 loops=1)
                           Hash Cond: (tbl_structure.fk_archive_id =
tbl_archive.pk_archive_id)
                           ->  Hash Join  (cost=42431.11..143012.07
rows=42396 width=138) (actual time=8198.468..25305.206 rows=730 loops=1)
                                 Hash Cond:
(tbl_file_structure.fk_structure_id = tbl_structure.pk_structure_id)
                                 ->  Hash Join
(cost=31898.88..128061.91 rows=42396 width=56) (actual
time=5947.642..22602.311 rows=730 loops=1)
                                       Hash Cond:
(tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                                       ->  Seq Scan on tbl_acl
(cost=0.00..79136.22 rows=4427422 width=8) (actual
time=0.092..8887.385 rows=4427422 loops=1)
                                       ->  Hash
(cost=31738.31..31738.31 rows=12846 width=56) (actual
time=5801.353..5801.353 rows=330 loops=1)
                                             ->  Hash Join
(cost=2094.84..31738.31 rows=12846 width=56) (actual
time=292.297..5798.486 rows=330 loops=1)
                                                   Hash Cond:
(tbl_file_structure.fk_file_id = tbl_file.pk_file_id)
                                                   ->  Seq Scan on
tbl_file_structure  (cost=0.00..24484.55 rows=1341455 width=24)
(actual time=13.953..3090.936 rows=1341455 loops=1)
                                                   ->  Hash
(cost=2079.20..2079.20 rows=1251 width=40) (actual
time=180.500..180.500 rows=112 loops=1)
                                                         ->  Bitmap
Heap Scan on tbl_file  (cost=25.96..2079.20 rows=1251 width=40)
(actual time=22.210..180.109 rows=112 loops=1)

Recheck Cond: (fk_filetype_id = 83)
                                                               ->
Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
width=0) (actual time=21.958..21.958 rows=112 loops=1)

Index Cond: (fk_filetype_id = 83)
                                 ->  Hash  (cost=5576.88..5576.88
rows=176188 width=98) (actual time=1859.992..1859.992 rows=176188
loops=1)
                                       ->  Seq Scan on tbl_structure
(cost=0.00..5576.88 rows=176188 width=98) (actual
time=7.321..1062.527 rows=176188 loops=1)
                           ->  Hash  (cost=4.00..4.00 rows=33
width=41) (actual time=0.574..0.574 rows=33 loops=1)
                                 ->  Hash Join  (cost=2.21..4.00
rows=33 width=41) (actual time=0.264..0.474 rows=33 loops=1)
                                       Hash Cond:
(tbl_archive.fk_share_id = tbl_share.pk_share_id)
                                       ->  Seq Scan on tbl_archive
(cost=0.00..1.33 rows=33 width=16) (actual time=0.054..0.122 rows=33
loops=1)
                                             Filter: archive_complete
                                       ->  Hash  (cost=2.16..2.16
rows=4 width=41) (actual time=0.190..0.190 rows=4 loops=1)
                                             ->  Hash Join
(cost=1.07..2.16 rows=4 width=41) (actual time=0.148..0.170 rows=4
loops=1)
                                                   Hash Cond:
(tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                                   ->  Seq Scan on
tbl_share  (cost=0.00..1.04 rows=4 width=29) (actual
time=0.049..0.055 rows=4 loops=1)
                                                   ->  Hash
(cost=1.03..1.03 rows=3 width=20) (actual time=0.073..0.073 rows=3
loops=1)
                                                         ->  Seq Scan
on tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual
time=0.050..0.058 rows=3 loops=1)
Total runtime: 25330.924 ms






Re: Query not using index pgsql 8.2.3

From
Michael Fuhr
Date:
On Fri, Mar 23, 2007 at 11:43:54AM +0100, Henrik Zagerholm wrote:
> I'm having a hard time understanding why my query is not using my
> indices when I change a field in the WHERE clause.

The row count estimate for fk_filetype_id = 83 is high by an order
of magnitude:

> Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>
> Index Cond: (fk_filetype_id = 83)

Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
If so then you might try increasing the statistics target for
tbl_file.fk_filetype_id and perhaps some of the columns in the join
conditions.

--
Michael Fuhr

Re: Query not using index pgsql 8.2.3

From
Henrik Zagerholm
Date:
23 mar 2007 kl. 12:33 skrev Michael Fuhr:

> On Fri, Mar 23, 2007 at 11:43:54AM +0100, Henrik Zagerholm wrote:
>> I'm having a hard time understanding why my query is not using my
>> indices when I change a field in the WHERE clause.
>
> The row count estimate for fk_filetype_id = 83 is high by an order
> of magnitude:
>
>> Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>
>> Index Cond: (fk_filetype_id = 83)
>
> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
> If so then you might try increasing the statistics target for
> tbl_file.fk_filetype_id and perhaps some of the columns in the join
> conditions.

I did a vacuum full and reindex on all tables.

Now I also did a vacuum analyze on tbl_acl (the biggest table with
about 4.5 millin rows)

Same result.

But do you mean if the row_count estimate is big it can't use any
index on any other table within the JOINs?

Any specific parameters I should adjust?

The query only returns 3200 rows.

Cheers,
henrik


>
> --
> Michael Fuhr
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that
> your
>        message can get through to the mailing list cleanly


Re: Query not using index pgsql 8.2.3

From
Michael Fuhr
Date:
On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
> 23 mar 2007 kl. 12:33 skrev Michael Fuhr:
> >The row count estimate for fk_filetype_id = 83 is high by an order
> >of magnitude:
> >
> >>Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
> >>width=0) (actual time=21.958..21.958 rows=112 loops=1)
> >>
> >>Index Cond: (fk_filetype_id = 83)
> >
> >Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
> >If so then you might try increasing the statistics target for
> >tbl_file.fk_filetype_id and perhaps some of the columns in the join
> >conditions.
>
> I did a vacuum full and reindex on all tables.

VACUUM FULL is seldom (if ever) necessary if you're running plain
VACUUM (without FULL) often enough, either manually or via autovacuum.

> Now I also did a vacuum analyze on tbl_acl (the biggest table with
> about 4.5 millin rows)
>
> Same result.

I'd suggest analyzing all tables.  The bad estimate I mentioned
appears to be for a column in tbl_file so if you didn't analyze
that table then the query plan probably won't improve.

> But do you mean if the row_count estimate is big it can't use any
> index on any other table within the JOINs?

High row count estimates make the planner think that scanning entire
tables would be faster than using indexes.  The more of a table a
query must fetch the less efficient an index scan becomes, to the
point that a sequential scan is faster than an index scan.

> Any specific parameters I should adjust?

If analyzing the tables doesn't improve the row count estimates
then try increasing some columns' statistics targets and re-analyze
the table or just that column.  Example:

ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
ANALYZE tbl_file (fk_filetype_id);

The default statistics target is 10; the maximum value is 1000.

On some systems I've found that reducing random_page_cost from 4
(the default) to 2 gives more realistic cost estimates for index
scans.

--
Michael Fuhr

Re: Query not using index pgsql 8.2.3

From
Alban Hertroys
Date:
Henrik Zagerholm wrote:
>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>> If so then you might try increasing the statistics target for
>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>> conditions.
>
> I did a vacuum full and reindex on all tables.
>
> Now I also did a vacuum analyze on tbl_acl (the biggest table with about
> 4.5 millin rows)
>
> Same result.

You only updated the statistics for 1 table. Run ANALYZE on all your
tables and measure again.

> But do you mean if the row_count estimate is big it can't use any index
> on any other table within the JOINs?

You misunderstand. Tom is saying that the estimated counts (based on the
data from VACUUM ANALYZE) are very different from the actual row counts
(based on the data from EXPLAIN ANALYZE).

That means your statistics are off, and those drive the query planner.
If you have bad statistics, you can (and probably will) get bad query plans.

> Any specific parameters I should adjust?

If the statistics are still off this much after running analyze, you can
try changing the statistics sizes for the columns where the statistics
are off the most.

There is also the possibility that the planner chooses an expensive plan
because it doesn't have a choice. Low memory is a likely cause in such
cases. Make sure you configure enough shared memory and that postgres is
configured to use it.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Query not using index pgsql 8.2.3

From
Henrik Zagerholm
Date:
23 mar 2007 kl. 13:34 skrev Michael Fuhr:

> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>> 23 mar 2007 kl. 12:33 skrev Michael Fuhr:
>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>> of magnitude:
>>>
>>>> Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>
>>>> Index Cond: (fk_filetype_id = 83)
>>>
>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>> If so then you might try increasing the statistics target for
>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>> conditions.
>>
>> I did a vacuum full and reindex on all tables.
>
> VACUUM FULL is seldom (if ever) necessary if you're running plain
> VACUUM (without FULL) often enough, either manually or via autovacuum.
>
>> Now I also did a vacuum analyze on tbl_acl (the biggest table with
>> about 4.5 millin rows)
>>
>> Same result.
>
> I'd suggest analyzing all tables.  The bad estimate I mentioned
> appears to be for a column in tbl_file so if you didn't analyze
> that table then the query plan probably won't improve.
>
>> But do you mean if the row_count estimate is big it can't use any
>> index on any other table within the JOINs?
>
> High row count estimates make the planner think that scanning entire
> tables would be faster than using indexes.  The more of a table a
> query must fetch the less efficient an index scan becomes, to the
> point that a sequential scan is faster than an index scan.
>
>> Any specific parameters I should adjust?
>
> If analyzing the tables doesn't improve the row count estimates
> then try increasing some columns' statistics targets and re-analyze
> the table or just that column.  Example:
>
> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
> ANALYZE tbl_file (fk_filetype_id);
>
I analyzed all tables and found no difference but after changing the
statistics for fk_filetype_id I got way better results.
Still there is a big cost on table tbl_file_structure_id it is making
a Bitmap heap scan.
Could I do something to improve this?

Thanks for all your help so far!

Limit  (cost=115168.22..115169.12 rows=20 width=173) (actual
time=5138.401..5138.982 rows=20 loops=1)
   ->  GroupAggregate  (cost=115168.22..115341.60 rows=3853
width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
         ->  Sort  (cost=115168.22..115177.85 rows=3853 width=173)
(actual time=5138.339..5138.430 rows=73 loops=1)
               Sort Key: tbl_file.file_name, tbl_file.file_ctime,
tbl_structure.structure_path, tbl_computer.pk_computer_id,
tbl_filetype.filetype_icon, tbl_computer.computer_name,
tbl_share.share_name, tbl_share.share_path,
tbl_file_structure.fk_file_id
               ->  Nested Loop  (cost=8.48..114938.74 rows=3853
width=173) (actual time=185.588..5118.684 rows=730 loops=1)
                     ->  Seq Scan on tbl_filetype  (cost=0.00..1.25
rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
                           Filter: (83 = pk_filetype_id)
                     ->  Hash Join  (cost=8.48..114898.96 rows=3853
width=171) (actual time=185.538..5116.063 rows=730 loops=1)
                           Hash Cond: (tbl_archive.fk_share_id =
tbl_share.pk_share_id)
                           ->  Nested Loop  (cost=6.27..114843.77
rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
                                 ->  Nested Loop
(cost=6.27..32079.13 rows=1167 width=146) (actual
time=130.594..3448.927 rows=330 loops=1)
                                       Join Filter:
(tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
                                       ->  Nested Loop
(cost=4.89..31185.00 rows=1167 width=138) (actual
time=119.861..3402.527 rows=330 loops=1)
                                             ->  Nested Loop
(cost=4.89..24413.49 rows=1167 width=56) (actual
time=82.203..2097.142 rows=330 loops=1)
                                                   ->  Index Scan
using tbl_file_idx6 on tbl_file  (cost=0.00..344.56 rows=114
width=40) (actual time=13.164..15.475 rows=106 loops=1)
                                                         Index Cond:
(fk_filetype_id = 83)
                                                   ->  Bitmap Heap
Scan on tbl_file_structure  (cost=4.89..210.27 rows=69 width=24)
(actual time=8.736..19.606 rows=3 loops=106)
                                                         Recheck
Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                                         ->  Bitmap
Index Scan on tbl_file_structure_idx  (cost=0.00..4.88 rows=69
width=0) (actual time=2.574..2.574 rows=3 loops=106)
                                                               Index
Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                                             ->  Index Scan using
tbl_structure_pkey on tbl_structure  (cost=0.00..5.79 rows=1
width=98) (actual time=3.942..3.945 rows=1 loops=330)
                                                   Index Cond:
(tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                                       ->  Materialize
(cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34
loops=330)
                                             ->  Seq Scan on
tbl_archive  (cost=0.00..1.34 rows=34 width=16) (actual
time=10.606..10.693 rows=34 loops=1)
                                                   Filter:
archive_complete
                                 ->  Index Scan using tbl_acl_idx on
tbl_acl  (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971
rows=2 loops=330)
                                       Index Cond:
(tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
                           ->  Hash  (cost=2.16..2.16 rows=4
width=41) (actual time=17.890..17.890 rows=4 loops=1)
                                 ->  Hash Join  (cost=1.07..2.16
rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
                                       Hash Cond:
(tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                                       ->  Seq Scan on tbl_share
(cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4
loops=1)
                                       ->  Hash  (cost=1.03..1.03
rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
                                             ->  Seq Scan on
tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual
time=17.697..17.706 rows=3 loops=1)
Total runtime: 5139.608 ms



> The default statistics target is 10; the maximum value is 1000.
>
> On some systems I've found that reducing random_page_cost from 4
> (the default) to 2 gives more realistic cost estimates for index
> scans.
>
> --
> Michael Fuhr


Re: Query not using index pgsql 8.2.3

From
Henrik Zagerholm
Date:
Never mind my last post regarding the Bitmap heap scan. That is all
in order.

Now it works perfectly. Went from 30+ seconds to 179ms. =)

I'm just amazed that setting the statistics threshold on one column
mad all the difference. IS there any guidelines on what columns I
should change the statistics on?


Thanks again!!!

23 mar 2007 kl. 14:04 skrev Henrik Zagerholm:

>
> 23 mar 2007 kl. 13:34 skrev Michael Fuhr:
>
>> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>>> 23 mar 2007 kl. 12:33 skrev Michael Fuhr:
>>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>>> of magnitude:
>>>>
>>>>> Bitmap Index Scan on tbl_file_idx6  (cost=0.00..25.65 rows=1251
>>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>>
>>>>> Index Cond: (fk_filetype_id = 83)
>>>>
>>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>>> If so then you might try increasing the statistics target for
>>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>>> conditions.
>>>
>>> I did a vacuum full and reindex on all tables.
>>
>> VACUUM FULL is seldom (if ever) necessary if you're running plain
>> VACUUM (without FULL) often enough, either manually or via
>> autovacuum.
>>
>>> Now I also did a vacuum analyze on tbl_acl (the biggest table with
>>> about 4.5 millin rows)
>>>
>>> Same result.
>>
>> I'd suggest analyzing all tables.  The bad estimate I mentioned
>> appears to be for a column in tbl_file so if you didn't analyze
>> that table then the query plan probably won't improve.
>>
>>> But do you mean if the row_count estimate is big it can't use any
>>> index on any other table within the JOINs?
>>
>> High row count estimates make the planner think that scanning entire
>> tables would be faster than using indexes.  The more of a table a
>> query must fetch the less efficient an index scan becomes, to the
>> point that a sequential scan is faster than an index scan.
>>
>>> Any specific parameters I should adjust?
>>
>> If analyzing the tables doesn't improve the row count estimates
>> then try increasing some columns' statistics targets and re-analyze
>> the table or just that column.  Example:
>>
>> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
>> ANALYZE tbl_file (fk_filetype_id);
>>
> I analyzed all tables and found no difference but after changing
> the statistics for fk_filetype_id I got way better results.
> Still there is a big cost on table tbl_file_structure_id it is
> making a Bitmap heap scan.
> Could I do something to improve this?
>
> Thanks for all your help so far!
>
> Limit  (cost=115168.22..115169.12 rows=20 width=173) (actual
> time=5138.401..5138.982 rows=20 loops=1)
>   ->  GroupAggregate  (cost=115168.22..115341.60 rows=3853
> width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
>         ->  Sort  (cost=115168.22..115177.85 rows=3853 width=173)
> (actual time=5138.339..5138.430 rows=73 loops=1)
>               Sort Key: tbl_file.file_name, tbl_file.file_ctime,
> tbl_structure.structure_path, tbl_computer.pk_computer_id,
> tbl_filetype.filetype_icon, tbl_computer.computer_name,
> tbl_share.share_name, tbl_share.share_path,
> tbl_file_structure.fk_file_id
>               ->  Nested Loop  (cost=8.48..114938.74 rows=3853
> width=173) (actual time=185.588..5118.684 rows=730 loops=1)
>                     ->  Seq Scan on tbl_filetype  (cost=0.00..1.25
> rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
>                           Filter: (83 = pk_filetype_id)
>                     ->  Hash Join  (cost=8.48..114898.96 rows=3853
> width=171) (actual time=185.538..5116.063 rows=730 loops=1)
>                           Hash Cond: (tbl_archive.fk_share_id =
> tbl_share.pk_share_id)
>                           ->  Nested Loop  (cost=6.27..114843.77
> rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
>                                 ->  Nested Loop
> (cost=6.27..32079.13 rows=1167 width=146) (actual
> time=130.594..3448.927 rows=330 loops=1)
>                                       Join Filter:
> (tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
>                                       ->  Nested Loop
> (cost=4.89..31185.00 rows=1167 width=138) (actual
> time=119.861..3402.527 rows=330 loops=1)
>                                             ->  Nested Loop
> (cost=4.89..24413.49 rows=1167 width=56) (actual
> time=82.203..2097.142 rows=330 loops=1)
>                                                   ->  Index Scan
> using tbl_file_idx6 on tbl_file  (cost=0.00..344.56 rows=114
> width=40) (actual time=13.164..15.475 rows=106 loops=1)
>                                                         Index Cond:
> (fk_filetype_id = 83)
>                                                   ->  Bitmap Heap
> Scan on tbl_file_structure  (cost=4.89..210.27 rows=69 width=24)
> (actual time=8.736..19.606 rows=3 loops=106)
>                                                         Recheck
> Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
>                                                         ->  Bitmap
> Index Scan on tbl_file_structure_idx  (cost=0.00..4.88 rows=69
> width=0) (actual time=2.574..2.574 rows=3 loops=106)
>                                                               Index
> Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
>                                             ->  Index Scan using
> tbl_structure_pkey on tbl_structure  (cost=0.00..5.79 rows=1
> width=98) (actual time=3.942..3.945 rows=1 loops=330)
>                                                   Index Cond:
> (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
>                                       ->  Materialize
> (cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077
> rows=34 loops=330)
>                                             ->  Seq Scan on
> tbl_archive  (cost=0.00..1.34 rows=34 width=16) (actual
> time=10.606..10.693 rows=34 loops=1)
>                                                   Filter:
> archive_complete
>                                 ->  Index Scan using tbl_acl_idx on
> tbl_acl  (cost=0.00..70.47 rows=36 width=8) (actual
> time=4.964..4.971 rows=2 loops=330)
>                                       Index Cond:
> (tbl_acl.fk_file_structure_id =
> tbl_file_structure.pk_file_structure_id)
>                           ->  Hash  (cost=2.16..2.16 rows=4
> width=41) (actual time=17.890..17.890 rows=4 loops=1)
>                                 ->  Hash Join  (cost=1.07..2.16
> rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
>                                       Hash Cond:
> (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
>                                       ->  Seq Scan on tbl_share
> (cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4
> loops=1)
>                                       ->  Hash  (cost=1.03..1.03
> rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
>                                             ->  Seq Scan on
> tbl_computer  (cost=0.00..1.03 rows=3 width=20) (actual
> time=17.697..17.706 rows=3 loops=1)
> Total runtime: 5139.608 ms
>
>
>
>> The default statistics target is 10; the maximum value is 1000.
>>
>> On some systems I've found that reducing random_page_cost from 4
>> (the default) to 2 gives more realistic cost estimates for index
>> scans.
>>
>> --
>> Michael Fuhr
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


Re: Query not using index pgsql 8.2.3

From
Michael Fuhr
Date:
On Fri, Mar 23, 2007 at 02:13:57PM +0100, Henrik Zagerholm wrote:
> I'm just amazed that setting the statistics threshold on one column
> mad all the difference. IS there any guidelines on what columns I
> should change the statistics on?

Start by looking for columns involved in simple comparisons with a
constant -- especially if you know that the distribution of values
is uneven -- and see how far off the row count estimate is for
various values.  You can use simple queries such as

EXPLAIN ANALYZE SELECT 1 FROM tbl_file WHERE fk_filetype_id = 83;

To learn more read "How the Planner Uses Statistics" in the
documentation:

http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html

Among the configuration settings to consider changing are shared_buffers,
effective_cache_size, work_mem, and random_page_cost.  The following
is a good starting point:

http://www.powerpostgresql.com/PerfList

If you have additional performance-related questions then consider
posting to pgsql-performance.

--
Michael Fuhr