
From: Scott Marlowe
Subject: Re: sub select performance due to seq scans
Msg-id: 1154537296.7882.59.camel@state.g2switchworks.com
In response to: Re: sub select performance due to seq scans (H Hale <hhale21@rogers.com>)
List: pgsql-performance
On Wed, 2006-08-02 at 07:17, H Hale wrote:
> Initial testing was with data that essentially looks like a single collection with many items.
> I then changed this to have 60 collections of 50 items each.
> The result: much better (but not optimal) use of indexes, though a seq
> scan was still used.
>
> With seq scan turned off, all indexes were used.
> The query was much faster (1.5 ms vs 300 ms).
>
> I have tried to increase stats collection...
>
> alter table capsa.flatommemberrelation alter column srcobj set statistics 1000;
> alter table capsa.flatommemberrelation alter column dstobj set statistics 1000;
> alter table capsa.flatommemberrelation alter column objectid set statistics 1000;
> alter table capsa.flatomfilesysentry alter column objectid set statistics 1000;
> vacuum full analyze;
> Experimented with many postgres memory parameters.
> No difference.
>
> Is seq scan off the solution here?

It is almost never the right answer.
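It can still be useful as a per-session diagnostic, though, so you never
have to touch postgresql.conf.  A sketch (the SELECT is a stand-in built
from the tables you quoted, since your actual query isn't shown here):

    SET enable_seqscan = off;            -- this session only
    EXPLAIN ANALYZE
      SELECT m.srcobj, m.dstobj
        FROM capsa.flatommemberrelation m
        JOIN capsa.flatomfilesysentry f ON f.objectid = m.dstobj;
    RESET enable_seqscan;                -- back to the default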

> My tests are with a relatively small number of records.
> My concern here is what happens with hundreds of thousands of records
> and seq scan off?

What you need to do is tune PostgreSQL to match your predicted usage
patterns.

Will most or all of your dataset always fit in RAM?  Then you can tune
random_page_cost down near 1.0; for large-memory / small-data-set
servers, 1.2 to 1.4 is usually about optimal.  There will still be times
when a seq scan is a win.  Build a test data set of about the size
you expect to run in the future, take a handful of the queries you'll
be running, run variations of them under explain analyze, and use the
results to get an idea of where random_page_cost should be.  Make sure
analyze has been run and that the statistics are fairly accurate.
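Per session, that kind of experiment looks like this (again, the query
is a stand-in; use your real ones):

    SET random_page_cost = 1.4;          -- try e.g. 1.2, 1.4, 2.0
    EXPLAIN ANALYZE
      SELECT count(*)
        FROM capsa.flatommemberrelation m
        JOIN capsa.flatomfilesysentry f ON f.objectid = m.dstobj;
    RESET random_page_cost;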

effective_cache_size should be set to some reasonable size based on the
steady state size of your machine's kernel cache + disk buffers,
preferably before you tune random_page_cost too much.
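As a sketch, if top / free shows the box sitting steady with about 1 GB
in cache + buffers, then in postgresql.conf (on 8.1 and earlier this
setting is given in 8 kB pages):

    effective_cache_size = 131072        # 131072 * 8 kB = 1 GB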

There are other numbers you can tune as well (the cpu cost ones in
particular).  If you find yourself needing random_page_cost values of
1.0 or below to get the planner to make the right choices, then you've
got issues.  Otherwise, if a number between 1.2 and 2.0 makes it work
right, you're likely set for a while.
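For reference, those are the following (defaults shown; they can be set
per session for experimenting, just like random_page_cost):

    SET cpu_tuple_cost = 0.01;           -- default
    SET cpu_index_tuple_cost = 0.005;    -- default
    SET cpu_operator_cost = 0.0025;      -- default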
