Thread: BLCKSZ
src/include/pg_config_manual.h defines BLCKSZ as 8192 (8KB). Somewhere I read that BLCKSZ must equal the memory page size of the operating system, and that the default is 8KB because the first OS PostgreSQL was built on had an 8KB memory page. I tried to test this. Linux, 4KB memory page, 4KB disk block: I set BLCKSZ to 4KB and got some performance improvement, but not a big one, maybe because I have 4GB on the test server (amd64). Can anyone else test this too? Maybe it would be better to move BLCKSZ from pg_config_manual.h to pg_config.h? -- Olleg Samoylov
Olleg Samoylov <olleg_s@mail.ru> writes: > I tried to test this. Linux, 4KB memory page, 4KB disk block. I set BLCKSZ > to 4KB and got some performance improvement, but not a big one, maybe because I > have 4GB on the test server (amd64). It's highly unlikely that reducing BLCKSZ is a good idea. There are bad side-effects on the maximum index entry size, maximum number of tuple fields, etc. In any case, since you didn't say *what* you tested, it's impossible to judge the usefulness of the change. regards, tom lane
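Tom's point about the maximum index entry size can be made concrete: a btree page must be able to hold at least three entries, so the largest allowed entry shrinks roughly in proportion to BLCKSZ. A rough sketch of the arithmetic (the header and line-pointer overheads below are simplified, not the exact PostgreSQL figures):

```python
# Rough sketch of how BLCKSZ bounds the maximum btree index entry size.
# A btree page must fit at least 3 entries, so each entry is limited to
# roughly a third of the usable page space.  PAGE_HEADER and ITEM_POINTER
# are simplified overhead figures, not the exact on-disk values.

PAGE_HEADER = 24   # approximate page header size in bytes
ITEM_POINTER = 4   # line pointer per entry

def max_btree_entry(blcksz: int) -> int:
    usable = blcksz - PAGE_HEADER
    return usable // 3 - ITEM_POINTER

for blcksz in (4096, 8192, 32768):
    print(blcksz, max_btree_entry(blcksz))
```

Halving BLCKSZ to 4KB thus roughly halves the largest indexable value, which is one of the side-effects Tom mentions.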
Tom Lane wrote: > Olleg Samoylov <olleg_s@mail.ru> writes: > >>I tried to test this. Linux, 4KB memory page, 4KB disk block. I set BLCKSZ >>to 4KB and got some performance improvement, but not a big one, maybe because I >>have 4GB on the test server (amd64). > > It's highly unlikely that reducing BLCKSZ is a good idea. There are bad > side-effects on the maximum index entry size, maximum number of tuple > fields, etc. Yes, with BLCKSZ=512 the database doesn't work at all, and with BLCKSZ=1024 it is very slow. (This surprised me. I expected an 8x performance increase with BLCKSZ=1024. :) ) As I have already seen on this mailing list, increasing BLCKSZ reduces performance too. Maybe there is an optimum value? Theoretically, a BLCKSZ equal to the memory/disk page/block size may reduce the fragmentation overhead of memory and disk. > In any case, since you didn't say *what* you tested, it's > impossible to judge the usefulness of the change. > regards, tom lane I tested performance on a database test server. It is a copy of a working billing system used to test new features and experiments. The test task was one day's traffic log. The average time for one test run was 260 minutes. PostgreSQL 7.4.8. Server: dual Opteron 240, 4GB RAM. -- Olleg
Olleg wrote: > I tested performance on a database test server. It is a copy of a working > billing system used to test new features and experiments. The test task was one > day's traffic log. The average time for one test run was 260 minutes. PostgreSQL > 7.4.8. Server: dual Opteron 240, 4GB RAM. Did you execute the queries from the log one after another? That may not be a representative test -- try sending multiple queries in parallel, to see how the server would perform in the real world. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
At 04:32 PM 12/5/2005, Olleg wrote: >Tom Lane wrote: >>Olleg Samoylov <olleg_s@mail.ru> writes: >> >>>I tried to test this. Linux, 4KB memory page, 4KB disk block. I set >>>BLCKSZ to 4KB and got some performance improvement, but not a big one, >>>maybe because I have 4GB on the test server (amd64). >>It's highly unlikely that reducing BLCKSZ is a good idea. There >>are bad side-effects on the maximum index entry size, maximum >>number of tuple fields, etc. > >Yes, with BLCKSZ=512 the database doesn't work at all, and with >BLCKSZ=1024 it is very slow. (This surprised me. I expected an 8x >performance increase with BLCKSZ=1024. :) ) No wonder pg did not work or was very slow: BLCKSZ=512 or 1024 means 512 or 1024 *bytes* respectively. That's 1/16 and 1/8 of the default 8KB BLCKSZ. > As I have already seen on this mailing list, increasing BLCKSZ reduces > performance too. Where? A BLCKSZ as large as 64KB has been shown to improve performance. If running a RAID, a BLCKSZ of ~1/2 the RAID stripe size seems to be a good value. >Maybe there is an optimum value? Theoretically, a BLCKSZ equal to the memory/disk >page/block size may reduce the fragmentation overhead of memory and disk. Of course there's an optimal value... ...and of course it is dependent on your HW, OS, and DB application. In general, and in a very fuzzy sense, "bigger is better". pg files are laid down in 1GB segments, so there's probably one limitation. Given the HW you have mentioned, I'd try BLCKSZ=65536 (you will have to recompile PostgreSQL for this) and a RAID stripe of 128KB or 256KB as a first guess. >>In any case, since you didn't say *what* you tested, it's >>impossible to judge the usefulness of the change. >> regards, tom lane > >I tested performance on a database test server. It is a copy of a working >billing system used to test new features and experiments. The test task was >one day's traffic log. The average time for one test run was 260 minutes. How large is a record in your billing system? 
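Ron's remark about 1GB segment files can be quantified: since table files are split into fixed-size segments, the number of blocks per segment scales inversely with BLCKSZ. A quick back-of-the-envelope sketch:

```python
# Blocks per 1GB PostgreSQL segment file for various BLCKSZ settings.
# Larger blocks mean fewer block headers and fewer blocks to manage
# per segment, which is one (weak) argument for "bigger is better".

SEGMENT = 1 << 30  # 1GB segment file

for blcksz in (4096, 8192, 32768, 65536):
    print(f"BLCKSZ={blcksz:6d}: {SEGMENT // blcksz:7d} blocks per segment")
```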
You want the record size to be an integer divisor of BLCKSZ (so, for instance, odd sizes in bytes are BAD). Beyond that, your application domain matters. OLTP-like systems need low-latency access for frequent small transactions. Data-mining-like systems need to do IO in as big a chunk as the HW and OS will allow. It's probably a good idea for BLCKSZ to be _at least_ max(8KB, 2x record size). > PostgreSQL 7.4.8. Server: dual Opteron 240, 4GB RAM. _Especially_ with that HW, upgrade to at least 8.0.x ASAP. It's a good idea not to be running pg 7.x anymore anyway, but it's particularly so if you are running 64-bit SMP boxes. Ron
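Ron's rule of thumb can be sketched as a small helper: take max(8KB, 2x record size) and round up to a power of two so that fixed-size records divide the block evenly. `suggest_blcksz` is a hypothetical illustration of the heuristic, not anything that exists in PostgreSQL:

```python
# Hypothetical sketch of Ron's rule of thumb: BLCKSZ should be at least
# max(8KB, 2x record size), rounded up to a power of two so the record
# size divides the block evenly.  Illustrative only.

def suggest_blcksz(record_size: int) -> int:
    target = max(8192, 2 * record_size)
    blcksz = 1
    while blcksz < target:
        blcksz *= 2
    return blcksz

print(suggest_blcksz(32))    # → 8192: for small rows the default is plenty
print(suggest_blcksz(6000))  # → 16384: wide rows push the suggestion up
```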
Ron <rjpeace@earthlink.net> writes: > Where? BLCKSZ as large as 64KB has been shown to improve > performance. Not in the Postgres context, because you can't set BLCKSZ higher than 32K without doing extensive surgery on the page item pointer layout. If anyone's actually gone to that much trouble, they sure didn't publicize their results ... >> Postgresql 7.4.8. Server dual Opteron 240, 4Gb RAM. > _Especially_ with that HW, upgrade to at least 8.0.x ASAP. It's a > good idea to not be running pg 7.x anymore anyway, but it's > particularly so if you are running 64b SMP boxes. I agree with this bit --- 8.1 is a significant improvement on any prior version for SMP boxes. It's likely that 8.2 will be better yet, because this is an area we just recently started paying serious attention to. regards, tom lane
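The 32KB ceiling Tom mentions comes from the on-disk line pointer layout: each item pointer (ItemIdData) stores the item's byte offset within its page in a 15-bit field, so no offset can exceed 2^15 - 1. A quick check of the arithmetic:

```python
# PostgreSQL line pointers (ItemIdData) store the byte offset of each
# tuple within its page in a 15-bit field (lp_off).  A page offset can
# therefore be at most 2**15 - 1, which caps BLCKSZ at 32KB unless the
# page item pointer layout itself is changed, as Tom describes.

LP_OFF_BITS = 15
max_offset = (1 << LP_OFF_BITS) - 1  # largest addressable byte in a page
max_blcksz = 1 << LP_OFF_BITS        # 32768 bytes = 32KB

print(max_offset, max_blcksz)
```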
Ron wrote: > In general, and in a very fuzzy sense, "bigger is better". pg files are > laid down in 1GB chunks, so there's probably one limitation. Hm, I'll wait for test results from other platforms, but if we're having a theoretical dispute... I can't understand why "bigger is better". For instance, in an index search: the index points to a page, and I must load that page to get one row. So I load 8KB from disk for every row, and then keep it in cache. You recommend 64KB. With your recommendation I get 8 times the I/O volume, 8 times the head seeking on disk, and 8 times as much memory cache (OS and PostgreSQL) kept busy. I have small rows in a heavily loaded table, 32 bytes each. The table is not clustered and uses several indexes. And you recommend loading 64KB when I need only 32 bytes, don't you? -- Olleg
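Olleg's complaint can be stated as a read-amplification figure: bytes fetched from disk per byte of row actually needed, under the (pessimistic) assumption that every index lookup lands on a different, uncached page:

```python
# Read amplification for a single-row fetch from a cold page:
# bytes read from disk per byte of row actually needed.  Assumes the
# worst case where every lookup hits a different, uncached page.

def read_amplification(blcksz: int, row_size: int) -> float:
    return blcksz / row_size

for blcksz in (4096, 8192, 65536):
    amp = read_amplification(blcksz, 32)
    print(f"BLCKSZ={blcksz:5d}, 32-byte rows: {amp:.0f}x")
```

Whether that amplification actually costs anything is exactly what the following replies dispute: it only matters if the extra bytes displace useful cache contents or add transfer time, not seeks.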
On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote: > I can't understand why "bigger is better". For instance, in an index > search: the index points to a page, and I must load that page to get one > row. So I load 8KB from disk for every row, and then keep it in cache. You > recommend 64KB. With your recommendation I get 8 times the I/O volume, 8 > times the head seeking on disk, and 8 times as much memory cache (OS and > PostgreSQL) kept busy. Hopefully, you won't have eight times the seeking; a single block ought to be in one chunk on disk. You're of course at your filesystem's mercy, though. /* Steinar */ -- Homepage: http://www.sesse.net/
On Tue, 6 Dec 2005, Steinar H. Gunderson wrote: > On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote: >> I can't understand why "bigger is better". For instance, in an index >> search: the index points to a page, and I must load that page to get one >> row. So I load 8KB from disk for every row, and then keep it in cache. You >> recommend 64KB. With your recommendation I get 8 times the I/O volume, 8 >> times the head seeking on disk, and 8 times as much memory cache (OS and >> PostgreSQL) kept busy. > > Hopefully, you won't have eight times the seeking; a single block ought to be > in one chunk on disk. You're of course at your filesystem's mercy, though. In fact it would usually mean 1/8 as many seeks: since the 64k chunk would be created all at once, it's probably going to be one chunk on disk, as Steinar points out, and that means you do one seek per 64k instead of one seek per 8k. With current disks it's getting to the point where it costs the same to read 8k as to read 64k (i.e. almost free; you could read substantially more than 64k and not notice it in I/O speed). It's the seeks that are expensive. Yes, it will eat up more RAM, but assuming you are likely to need other things nearby, it's likely to be a win. As processor speed keeps climbing compared to memory and disk speed, true random access is really not the correct way to think about I/O anymore. It's frequently more appropriate to think of your memory and disks as if they were tape drives (seek, then read, repeat). Even for memory access, what you really do is seek to the beginning of a block (expensive), then read that block into cache (cheap: you get the entire cacheline of 64-128 bytes whether you need it all or not), and then you can access that block fairly quickly. 
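David's point that seeks dominate transfer time can be sketched with a toy cost model. The seek time and transfer rate below are illustrative assumptions for a 2005-era disk, not measurements:

```python
# Toy disk model: time to service one random read = seek + transfer.
# The 8ms average seek and 60MB/s sequential transfer rate are assumed,
# illustrative numbers for a disk of this era, not measurements.

SEEK_S = 0.008        # assumed average seek+rotational latency, seconds
TRANSFER_BPS = 60e6   # assumed sequential transfer rate, bytes/second

def read_time(nbytes: int) -> float:
    return SEEK_S + nbytes / TRANSFER_BPS

t8k = read_time(8192)
t64k = read_time(65536)
print(f"8KB read : {t8k * 1000:.2f} ms")
print(f"64KB read: {t64k * 1000:.2f} ms ({t64k / t8k:.2f}x the 8KB time)")
```

Under these assumptions, reading 8x the data costs only about 12% more wall-clock time per request, which is the sense in which the extra bytes are "almost free" once the head is positioned.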
With memory on SMP machines it's a constant cost to seek anywhere in memory; with NUMA machines (including multi-socket Opterons) the cost of the seek and fetch depends on where in memory you are seeking to and which CPU you are running on. It also becomes very expensive for multiple CPUs to write to memory addresses that are in the same block (cacheline) of memory. For disks it's even more dramatic: the seek is incredibly expensive compared to the read/write, and the cost of the seek varies based on how far you need to seek, but once you are on a track you can read the entire track in for about the same cost as a single block (in fact the drive usually does read the entire track before sending the one block on to you). RAID complicates this because you have a block size per drive, and reading larger than that block size involves multiple drives. Most of the work in dealing with these issues and optimizing for them is the job of the OS. Some other databases work very hard to take over this work from the OS; Postgres instead tries to let the OS do it, but we still need to keep it in mind when configuring things, because it's possible to make it much easier or much harder for the OS to optimize. David Lang