Home > mailing lists

Re: cpu_tuple_cost - Mailing list pgsql-performance

From	Greg Stark
Subject	Re: cpu_tuple_cost
Date	March 16, 2005 10:24:38
Msg-id	87acp4kq9o.fsf@stark.xeocode.com Whole thread Raw
In response to	Re: cpu_tuple_cost (David Brown <time@bigpond.net.au>)
Responses	Re: cpu_tuple_cost (Josh Berkus <josh@agliodbs.com>)
List	pgsql-performance

Tree view

David Brown <time@bigpond.net.au> writes:

> Gregory Stark wrote:
>
> >The "this day and age" argument isn't very convincing. Hard drive capacity
> >growth has far outstripped hard drive seek time and bandwidth improvements.
> >Random access has more penalty than ever.
>
> In point of fact, there haven't been noticeable seek time improvements for
> years. Transfer rates, on the other hand, have gone through the roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and
throughput.

Typical 7200RPM drives have average seek times are in the area of 10ms.
Typical sustained transfer rates are in the range of 40Mb/s. Postgres reads
8kB blocks at a time.

So 800kB/s for random access reads. And 40Mb/s for sequential reads. That's a
factor of 49. I don't think anyone wants random_page_cost to be set to 50
though.

For a high end 15k drive I see average seek times get as low as 3ms. And
sustained transfer rates get as high as 100Mb/s. So about 2.7Mb/s for random
access reads or about a random_page_cost of 37. Still pretty extreme.

So what's going on with the empirically derived value of 4? Perhaps this is
because even though Postgres is reading an entire table sequentially it's
unlikely to be the only I/O consumer? The sequential reads would be
interleaved occasionally by some other I/O forcing a seek to continue.

In which case the true random_page_cost seems like it would be extremely
sensitive to the amount of readahead the OS does. To reach a random_page_cost
of 4 given the numbers above for a 7200RPM drive requires that just under 25%
of the I/O of a sequential table scan be random seeks [*]. That translates to
32kB of sequential reading, which actually does sound like a typical value for
OS readahead.

I wonder if those same empirical tests would show even higher values of
random_page_cost if the readahead were turned up to 64kB or 128kB.

[*] A bit of an algebraic diversion:

    1s/10ms = 100 random buffers/s.
    random_page_cost = 4 so net sequential buffers/s = 400.

    solve:

    400 buffers = rnd+seq
    1000ms = .2*seq + 10*rnd

--
greg

pgsql-performance by date:

From: Tom Lane
Date: 16 March 2005, 07:45:03
Subject: Re: One tuple per transaction

From: Stef
Date: 16 March 2005, 11:00:24
Subject: Re: Slow loads when indexes added.

Re: cpu_tuple_cost - Mailing list pgsql-performance

Previous

Next