Re: Random Page Cost and Planner - Mailing list pgsql-performance

From Cédric Villemain
Subject Re: Random Page Cost and Planner
Date
Msg-id AANLkTimtdk11hrWd-OpvXGLrh-4pukGd9eHEUemEM3OR@mail.gmail.com
In response to Re: Random Page Cost and Planner  (David Jarvis <thangalin@gmail.com>)
Responses Re: Random Page Cost and Planner
List pgsql-performance
2010/5/27 David Jarvis <thangalin@gmail.com>:
> Hi, Bryan.
>
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
>
> Here's what I did:
>
> Created a new set of tables that matched the old set, with statistics of
> 1000 on the station and taken (date) columns.
> Inserted the data from the old hierarchy into the new set, ordered by
> station id then by date (same seven child tables as before: one per
> category).
>
> I wanted to ensure a strong correlation between primary key and station id.
>
> Added three indexes per table: (a) station id; (b) date taken; and (c)
> station-taken-category.
> Set the station-taken-category index as CLUSTER.
> Vacuumed the new tables.
> Dropped the old tables.
> Set the following configuration values:
>
> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> seq_page_cost = 1.0
> random_page_cost = 2.0
> cpu_index_tuple_cost = 0.001
> effective_cache_size = 512MB
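
For reference, the steps above boil down to SQL along these lines; the
table, column, and index names here are guesses (the real schema has
seven child tables, one per category), so treat it as a sketch only:

  ALTER TABLE measurement_new ALTER COLUMN station_id SET STATISTICS 1000;
  ALTER TABLE measurement_new ALTER COLUMN taken      SET STATISTICS 1000;

  -- load the rows pre-sorted so station id and date follow the physical order
  INSERT INTO measurement_new (station_id, taken, category_id, amount)
  SELECT station_id, taken, category_id, amount
  FROM   measurement_old
  ORDER  BY station_id, taken;

  -- indexes (a), (b), (c), with (c) marked as the CLUSTER index
  CREATE INDEX measurement_new_station_idx ON measurement_new (station_id);
  CREATE INDEX measurement_new_taken_idx   ON measurement_new (taken);
  CREATE INDEX measurement_new_stc_idx
      ON measurement_new (station_id, taken, category_id);
  ALTER TABLE measurement_new CLUSTER ON measurement_new_stc_idx;

  VACUUM ANALYZE measurement_new;  -- refresh statistics and the visibility map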
>
> I ran a few more reports (no reboots, but reading vastly different data
> sets):
>
> Vancouver: 4.2s
> Yellowknife: 1.7s
> Montreal: 6.5s
> Trois-Riviers: 2.8s
>
> No full table scans. I imagine some indexes are not strictly necessary and
> will test to see which can be removed (my guess: the station and taken
> indexes). The problem was that the station ids were scattered and so
> PostgreSQL presumed a full table scan would be faster.
>
> Physically ordering the data by station ids triggers index use every time.
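
One way to confirm the effect: after loading in station order, the
planner's correlation statistic for station_id should be close to 1.0
(the table name below is a guess):

  SELECT tablename, attname, correlation
  FROM   pg_stats
  WHERE  tablename = 'measurement_new'
  AND    attname IN ('station_id', 'taken');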
>
> Next week's hardware upgrade should halve those times -- unless anyone has
> further suggestions to squeeze more performance out of PG. ;-)

I wonder what the plan will be if you replace sc.taken_* in:
m.taken BETWEEN sc.taken_start AND sc.taken_end
with literal values. It might help the planner...
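
For example, something along these lines (the select list and the
extra filter are placeholders, not the real report query):

  -- literal dates instead of sc.taken_start / sc.taken_end
  EXPLAIN ANALYZE
  SELECT count(*)                    -- placeholder select list
  FROM   measurement m
  WHERE  m.taken BETWEEN DATE '1980-01-01' AND DATE '1989-12-31'
  AND    m.station_id = 1;           -- placeholder filter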

Also, I'd consider an explicit join order, but I admit I haven't read
the whole thread (in particular the table sizes).
Oh, and I'd set a higher statistics target for the category_id column
of the station_category table (seeing the same query and its EXPLAIN
ANALYZE output without the date condition would also help).
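
For the statistics change, something like this (the 1000 target and the
session setting are only examples to experiment with):

  ALTER TABLE station_category ALTER COLUMN category_id SET STATISTICS 1000;
  ANALYZE station_category;

  -- keep the join order exactly as written in the query, for testing only
  SET join_collapse_limit = 1;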


>
> Dave
>
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL: Expertise, Training and Support
