Re: BLCKSZ - Mailing list pgsql-performance
From | David Lang |
---|---|
Subject | Re: BLCKSZ |
Date | |
Msg-id | Pine.LNX.4.62.0512060318070.2807@qnivq.ynat.uz |
In response to | Re: BLCKSZ ("Steinar H. Gunderson" <sgunderson@bigfoot.com>) |
List | pgsql-performance |
On Tue, 6 Dec 2005, Steinar H. Gunderson wrote:

> On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote:
>> I can't undestand why "bigger is better". For instance in search by
>> index. Index point to page and I need load page to get one row. Thus I
>> load 8kb from disk for every raw. And keep it then in cache. You
>> recommend 64kb. With your recomendation I'll get 8 times more IO
>> throughput, 8 time more head seek on disk, 8 time more memory cache (OS
>> cache and postgresql) become busy.
>
> Hopefully, you won't have eight times the seeking; a single block ought to be
> in one chunk on disk. You're of course at your filesystem's mercy, though.

In fact, usually it would mean 1/8 as many seeks. Since the 64k chunk is created all at once, it's probably going to be one chunk on disk, as Steinar points out, and that means you do one seek per 64k instead of one seek per 8k.

With current disks it's getting to the point where it costs the same to read 8k as it does to read 64k (i.e. almost free; you could read substantially more than 64k and not notice it in I/O speed). It's the seeks that are expensive.

Yes, it will eat up more RAM, but assuming that you are likely to need other things nearby, it's likely to be a win.

As processor speed keeps climbing compared to memory and disk speed, true random access is really not the correct way to think about I/O anymore. It's frequently more appropriate to think of your memory and disks as if they were tape drives (seek, then read, repeat).

Even for memory access, what you really do is seek to the beginning of a block (expensive), then read that block into cache (cheap; you get the entire cacheline of 64-128 bytes whether you need it or not), and then you can access that block fairly quickly.
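The seek-versus-transfer argument above can be put into a rough cost model. This is only a sketch: the seek time and transfer rate below are illustrative assumptions rather than measurements of any particular drive, and the function name `read_time_ms` is made up for this example. It charges one seek per block plus the time to transfer the block's bytes.

```python
# Rough cost model: time = seeks * seek_time + bytes / transfer_rate.
# Both hardware numbers are illustrative assumptions, not measurements.
SEEK_MS = 8.0              # assumed average seek plus rotational latency
TRANSFER_MB_PER_S = 50.0   # assumed sequential transfer rate


def read_time_ms(total_kb, block_kb, seek_ms=SEEK_MS, mb_per_s=TRANSFER_MB_PER_S):
    """Estimated time to read total_kb when every block costs one seek."""
    blocks = -(-total_kb // block_kb)            # ceiling division
    transferred_mb = blocks * block_kb / 1024.0  # whole blocks are always read
    transfer_ms = transferred_mb / mb_per_s * 1000.0
    return blocks * seek_ms + transfer_ms


if __name__ == "__main__":
    # Fetching 64 KB of nearby data as eight scattered 8 KB blocks
    # versus one contiguous 64 KB block.
    for block_kb in (8, 64):
        print(f"{block_kb:>2} KB blocks: {read_time_ms(64, block_kb):.2f} ms")
    # Fetching a single row: one 8 KB block versus one 64 KB block.
    print(f"one  8 KB block: {read_time_ms(8, 8):.2f} ms")
    print(f"one 64 KB block: {read_time_ms(64, 64):.2f} ms")
```

Under these assumed numbers, reading 64 KB as eight separately sought 8 KB blocks takes roughly 65 ms, while one 64 KB block takes roughly 9 ms; a single 64 KB block costs only about a millisecond more than a single 8 KB block. The model ignores caching, readahead, and filesystem fragmentation.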
With memory on SMP machines, it's a constant cost to seek anywhere in memory. With NUMA machines (including multi-socket Opterons), the cost of the seek and fetch depends on where in memory you are seeking to and which CPU you are running on. It also becomes very expensive for multiple CPUs to write to memory addresses that are in the same block (cacheline) of memory.

For disks it's even more dramatic. The seek is incredibly expensive compared to the read/write, and the cost of the seek varies with how far you need to seek, but once you are on a track you can read the entire track for about the same cost as a single block (in fact, the drive usually does read the entire track before sending the one block on to you).

RAID complicates this because you have a block size per drive, and reading more than that block size involves multiple drives.

Most of the work of dealing with these issues and optimizing for them is the job of the OS. Some other databases work very hard to take this work over from the OS; Postgres instead tries to let the OS do it. We still need to keep it in mind when configuring things, because it's possible to make it much easier or much harder for the OS to optimize things.

David Lang
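The RAID remark above can be illustrated with a small calculation. This is a minimal sketch assuming plain striping with no parity (a RAID 0 style layout) and a fixed per-drive block size, called `stripe_unit_kb` here; the function name and parameters are hypothetical and do not come from any RAID tool.

```python
def drives_touched(offset_kb, length_kb, stripe_unit_kb, n_drives):
    """Count the distinct drives a contiguous read touches.

    Assumes simple striping: stripe unit i lives on drive i % n_drives.
    """
    if length_kb <= 0:
        return 0
    first_unit = offset_kb // stripe_unit_kb
    last_unit = (offset_kb + length_kb - 1) // stripe_unit_kb
    units = last_unit - first_unit + 1
    # Consecutive units land on consecutive drives, so the count is
    # capped by the number of drives in the array.
    return min(units, n_drives)


if __name__ == "__main__":
    # A 64 KB read against a 4-drive array, with different per-drive block sizes.
    for stripe_unit_kb in (16, 64):
        n = drives_touched(0, 64, stripe_unit_kb, 4)
        print(f"stripe unit {stripe_unit_kb} KB: 64 KB read touches {n} drive(s)")
    # A read that is smaller than the stripe unit but straddles a boundary.
    print(drives_touched(60, 8, 64, 4))
```

With a 64 KB per-drive block, an aligned 64 KB read stays on one drive; with a 16 KB per-drive block the same read involves all four drives, each of which may need its own seek. An unaligned read can also straddle two drives even when it is smaller than the per-drive block.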