Home > mailing lists

Re: Random Page Cost and Planner - Mailing list pgsql-performance

From	David Jarvis
Subject	Re: Random Page Cost and Planner
Date	May 27, 2010 07:43:24
Msg-id	AANLkTikjkEvj2hG-D1CMu49PQXXbxHVzpXm9JbLWzMQo@mail.gmail.com Whole thread Raw
In response to	Random Page Cost and Planner (David Jarvis <thangalin@gmail.com>)
Responses	Re: Random Page Cost and Planner (Cédric Villemain <cedric.villemain.debian@gmail.com>) Re: Random Page Cost and Planner (Bryan Hinton <bryan@bryanhinton.com>)
List	pgsql-performance

Tree view

Hi, Bryan.

I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds!

Here's what I did:

Created a new set of tables that matched the old set, with statistics of 1000 on the station and taken (date) columns.
Inserted the data from the old hierarchy into the new set, ordered by station id then by date (same seven child tables as before: one per category).
- I wanted to ensure a strong correlation between primary key and station id.
Added three indexes per table: (a) station id; (b) date taken; and (c) station-taken-category.
Set the station-taken-category index as CLUSTER.
Vacuumed the new tables.
Dropped the old tables.
Set the following configuration values:
- shared_buffers = 1GB
- temp_buffers = 32MB
- work_mem = 32MB
- maintenance_work_mem = 64MB
- seq_page_cost = 1.0
- random_page_cost = 2.0
- cpu_index_tuple_cost = 0.001
- effective_cache_size = 512MB

I ran a few more reports (no reboots, but reading vastly different data sets):

Vancouver: 4.2s
Yellowknife: 1.7s
Montreal: 6.5s
Trois-Riviers: 2.8s

No full table scans. I imagine some indexes are not strictly necessary and will test to see which can be removed (my guess: the station and taken indexes). The problem was that the station ids were scattered and so PostgreSQL presumed a full table scan would be faster.

Physically ordering the data by station ids triggers index use every time.

Next week's hardware upgrade should halve those times -- unless anyone has further suggestions to squeeze more performance out of PG. ;-)

Dave

pgsql-performance by date:

From: Konrad Garus
Date: 27 May 2010, 07:24:33
Subject: Re: shared_buffers advice

From: Cédric Villemain
Date: 27 May 2010, 07:51:17
Subject: Re: shared_buffers advice

Re: Random Page Cost and Planner - Mailing list pgsql-performance

Previous

Next