Re: cpu_tuple_cost - Mailing list pgsql-performance

From Greg Stark
Subject Re: cpu_tuple_cost
Date
Msg-id 87acp4kq9o.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: cpu_tuple_cost  (David Brown <time@bigpond.net.au>)
Responses Re: cpu_tuple_cost  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
David Brown <time@bigpond.net.au> writes:

> Gregory Stark wrote:
>
> >The "this day and age" argument isn't very convincing. Hard drive capacity
> >growth has far outstripped hard drive seek time and bandwidth improvements.
> >Random access has more penalty than ever.
>
> In point of fact, there haven't been noticeable seek time improvements for
> years. Transfer rates, on the other hand, have gone through the roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and
throughput.

Typical 7200RPM drives have average seek times are in the area of 10ms.
Typical sustained transfer rates are in the range of 40Mb/s. Postgres reads
8kB blocks at a time.

So 800kB/s for random access reads. And 40Mb/s for sequential reads. That's a
factor of 49. I don't think anyone wants random_page_cost to be set to 50
though.

For a high end 15k drive I see average seek times get as low as 3ms. And
sustained transfer rates get as high as 100Mb/s. So about 2.7Mb/s for random
access reads or about a random_page_cost of 37. Still pretty extreme.

So what's going on with the empirically derived value of 4? Perhaps this is
because even though Postgres is reading an entire table sequentially it's
unlikely to be the only I/O consumer? The sequential reads would be
interleaved occasionally by some other I/O forcing a seek to continue.

In which case the true random_page_cost seems like it would be extremely
sensitive to the amount of readahead the OS does. To reach a random_page_cost
of 4 given the numbers above for a 7200RPM drive requires that just under 25%
of the I/O of a sequential table scan be random seeks [*]. That translates to
32kB of sequential reading, which actually does sound like a typical value for
OS readahead.

I wonder if those same empirical tests would show even higher values of
random_page_cost if the readahead were turned up to 64kB or 128kB.




[*] A bit of an algebraic diversion:

    1s/10ms = 100 random buffers/s.
    random_page_cost = 4 so net sequential buffers/s = 400.

    solve:

    400 buffers = rnd+seq
    1000ms = .2*seq + 10*rnd


--
greg

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: One tuple per transaction
Next
From: Stef
Date:
Subject: Re: Slow loads when indexes added.