Re: cpu_tuple_cost - Mailing list pgsql-performance

From Josh Berkus
Subject Re: cpu_tuple_cost
Date
Msg-id 200503161003.10916.josh@agliodbs.com
In response to Re: cpu_tuple_cost  (Greg Stark <gsstark@mit.edu>)
List pgsql-performance
Greg,

> So 800kB/s for random access reads. And 40Mb/s for sequential reads. That's
> a factor of 49. I don't think anyone wants random_page_cost to be set to 50
> though.
>
> For a high end 15k drive I see average seek times get as low as 3ms. And
> sustained transfer rates get as high as 100Mb/s. So about 2.7Mb/s for
> random access reads or about a random_page_cost of 37. Still pretty
> extreme.

Actually, what you're demonstrating here is that there's really no point in
having a random_page_cost GUC, since the seek/scan ratio is going to be high
regardless.

I should point out, though, that you left out the fact that the disk still
needs a seek to find the beginning of the seq scan area, and even then some
file fragmentation is likely.  Finally, I've never seen PostgreSQL manage more
than 70% of a drive's maximum read rate, and in most cases it's more like 30%.
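
Folding that caveat back into the same numbers (the 30%/70% figures are just
my observation above, not benchmarks of any particular system):

    # Scale the sequential rate by the fraction of the drive's maximum
    # read rate PostgreSQL actually achieves.
    page_size = 8 * 1024
    seek_time = 0.003
    seq_rate = 100 * 1024 * 1024
    random_rate = page_size / seek_time

    for efficiency in (0.70, 0.30):      # best case / typical case
        print(efficiency, (seq_rate * efficiency) / random_rate)   # ~27 and ~12

So even with realistic throughput the ratio stays well above 4, just nowhere
near 37.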

> So what's going on with the empirically derived value of 4?

It's not empirically derived; it's a value we plug into an
internal-to-PostgreSQL formula.   And "4" is a fairly conservative value that
works for a lot of systems.
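
To make "plug into a formula" concrete, here's a much-simplified sketch of the
kind of cost arithmetic the planner does; this is not the real planner code,
and the constants are just the stock defaults (random_page_cost = 4,
cpu_tuple_cost = 0.01):

    RANDOM_PAGE_COST = 4.0
    SEQ_PAGE_COST = 1.0          # a sequential page fetch is the cost unit
    CPU_TUPLE_COST = 0.01

    def seq_scan_cost(pages, tuples):
        # read every page in order, then pay per-tuple CPU cost
        return pages * SEQ_PAGE_COST + tuples * CPU_TUPLE_COST

    def index_scan_cost(heap_pages_fetched, tuples_fetched):
        # every heap page reached through the index counts as a random read
        return heap_pages_fetched * RANDOM_PAGE_COST + tuples_fetched * CPU_TUPLE_COST

    # 10,000-page, 1,000,000-row table; a query matching 1% of the rows,
    # scattered across ~9,000 pages:
    print(seq_scan_cost(10_000, 1_000_000))   # 20,000
    print(index_scan_cost(9_000, 10_000))     # 36,100 -> the seq scan "wins"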

Realistically, the values we should be deriving from are:
-- median file cache size for postgresql files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those 4 hardware "facts" into formulas that allow us to
calculate the actual cost of a query execution plan is somebody's PhD paper.
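
Just to illustrate the shape of the problem, a purely hypothetical sketch (the
function, its parameters, and the weighting are all invented for illustration,
not a proposal):

    def effective_random_page_cost(cache_fraction, seq_mb_per_s, seek_ms,
                                   io_contention, page_kb=8):
        # cache_fraction: share of reads served from the file cache (0..1)
        # seq_mb_per_s:   measured sequential read throughput
        # seek_ms:        average random-access seek time
        # io_contention:  crude multiplier for competing I/O (1.0 = idle box)
        random_mb_per_s = (page_kb / 1024) / (seek_ms / 1000)
        raw_ratio = seq_mb_per_s / random_mb_per_s
        # cached reads cost about the same whether "random" or "sequential",
        # so they pull the effective ratio toward 1; processor throughput
        # would feed the cpu_*_cost parameters instead and is ignored here.
        return (cache_fraction * 1.0 + (1 - cache_fraction) * raw_ratio) * io_contention

    # e.g. 80% cache hit rate, 30 MB/s effective throughput, 8 ms seeks, idle box
    print(effective_random_page_cost(0.8, 30, 8, 1.0))   # ~6.9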

--
Josh Berkus
Aglio Database Solutions
San Francisco
