Re: Index not being used in sorting of simple table - Mailing list pgsql-performance

From Paul Smith
Subject Re: Index not being used in sorting of simple table
Date
Msg-id VPOP32.5.0.20070504171050.732.408d.1.f0e0d0d3@lmail.pscs.co.uk
Whole thread Raw
In response to Re: Index not being used in sorting of simple table  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-performance
At 16:26 04/05/2007, you wrote:
>Paul Smith wrote:
>>Why doesn't it use the other index? If use 'set enable_seqscan=0'
>>then it does.
>
>Just a guess, but is the table clustered on column a? Maybe not
>explicitly, but was it loaded from data that was sorted by a?

I wouldn't have thought so - a is pretty 'random' as far as order of
insertion goes. On the other hand 'b' (the one whose index doesn't
get used) is probably pretty correlated - 'b' is the date when the
entry was added to the table, so they would be added in order of 'b'
(they also get deleted after a while, and I'm not sure how PGSQL
re-uses deleted rows that have been vacuumed)

>Analyzer calculates the correlation between physical order and each
>column. The planner will favor index scans instead of sorting when
>the correlation is strong, and it thinks the data doesn't fit in
>memory. Otherwise an explicitly sort will result in less I/O and be
>therefore more favorable.

Ah, I see.

>You can check the correlation stats with:
>SELECT tablename, attname, correlation FROM pg_stats where tablename='x';

There I get
  x   | a  |     0.977819
  x   | b  |     0.78292

This is a bit odd, because I'd have thought they'd be more correlated
on 'b' than 'a'..

>>I tried using EXPLAIN ANALYZE to see how long it actually took:
>>- seq scan - 75 secs
>>- index scan - 13 secs
>>- seq scan - 77 secs
>
>>(I tried the seq scan version after the index scan as well to see
>>if disk caching was a factor, but it doesn't look like it)
>
>That won't flush the heap pages from cache...

No, I know, but it would mean that if the pages were being loaded
into disk cache by the first scan which would make the second scan
quicker, it would probably make the third one quicker as well.

>How much memory do you have and how large is the table?

The table is about 300MB. I have 2GB RAM on my PC (but most of it is
in use - the disk cache size is currently 600MB).

>I suspect that the planner thinks it doesn't fit in memory, and
>therefore favors the seqscan+sort plan which would require less random I/O,
>but in reality it's in cache and the index scan is faster because it
>doesn't need to sort. Have you set your effective_cache_size properly?

I haven't set that at all - it's the default..

If I set this to 51200 (I think that means 400MB) then it does use
the index scan method, so thanks for this bit of info.


Paul                            VPOP3 - Internet Email Server/Gateway
support@pscs.co.uk                      http://www.pscs.co.uk/



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Index not being used in sorting of simple table
Next
From: Sebastian Hennebrueder
Date:
Subject: Re: Feature Request --- was: PostgreSQL Performance Tuning