Greg,
> So 800kB/s for random access reads. And 40Mb/s for sequential reads. That's
> a factor of 49. I don't think anyone wants random_page_cost to be set to 50
> though.
>
> For a high end 15k drive I see average seek times get as low as 3ms. And
> sustained transfer rates get as high as 100Mb/s. So about 2.7Mb/s for
> random access reads or about a random_page_cost of 37. Still pretty
> extreme.
Actually, what you're demonstrating here is that there's really no point in
having a random_page_cost GUC, since the seek/scan ratio is going to be high
regardless.
Although I can point out that you left out the fact that the disk needs to do
a seek to find the beginning of the seq scan area, and even then some file
fragmentation is possible. Finally, I've never seen PostgreSQL manage more
than 70% of the maximum read rate, and in most cases more like 30%.
> So what's going on with the empirically derived value of 4?
It's not empirically derived; it's a value we plug into an
internal-to-postgresql formula. And "4" is a fairly conservative value that
works for a lot of systems.
Realistically, the values we should be deriving from are:
-- median file cache size for postgresql files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention
However, working those 4 hardware "facts" into forumulas that allow us to
calculate the actual cost of a query execution plan is somebody's PhD paper.
--
Josh Berkus
Aglio Database Solutions
San Francisco