Thread: 8.3beta1 testing on Solaris
Update on my testing of 8.3beta1 on Solaris:

* CLOG reads
* Asynchronous commit benefit
* Hot CPU utilization

Regards,
Jignesh

Background:

We were testing PostgreSQL 8.3beta1 on our latest Sun SPARC Enterprise T5220 server running Solaris 10 8/07 and on a Sun Fire X4200 running Solaris 10 8/07. Generally, for performance on Solaris, we put the file systems on forcedirectio, bypassing the file system cache and going directly to the disks.

Problem:

We were observing lots of reads, about 4 MB/sec, on the file system holding $PGDATA and the database tables during an OLTP benchmark run. Initially we thought our buffer pools were not big enough, but thanks to 64-bit builds we could use bigger buffer pools. However, even with extraordinarily large buffer pools we still saw lots of reads going to the disks.

DTrace to the Rescue:

I modified iosnoop.d to snoop only on reads. The modified rsnoop.d is as follows:

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
      printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
}

Based on its output, I found that most PostgreSQL processes were doing lots of reads from the pg_clog directory. CLOG, or the commit log, keeps track of transactions in flight. CLOG writes come from recording transaction commits (or aborts) or from XLOG generation. I am not yet clear on the reads, but it seems every process constantly reads it to get some status. CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck as it has to constantly go to the filesystem to get the read data.

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  49222                      read:entry pid 8739 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                      read:entry pid 9607 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                      read:entry pid 9423 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                      read:entry pid 8731 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                      read:entry pid 8719 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                      read:entry pid 9019 reading /export/home0/igen/pgdata/pg_clog/000C
  1  49222                      read:entry pid 9255 reading /export/home0/igen/pgdata/pg_clog/000C
  1  49222                      read:entry pid 8867 reading /export/home0/igen/pgdata/pg_clog/000C

Later, during another run, I added ustack() after the printf in the above script to also capture the call stack:

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  49222                      read:entry pid 10956 reading /export/home0/igen/pgdata/pg_clog/0011

              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9

  0  49222                      read:entry pid 10414 reading /export/home0/igen/pgdata/pg_clog/0011

              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
^C
              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesMVCC+0x34f
              postgres`index_getnext+0x29e
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9
              postgres`BackendStartup+0x271
              postgres`ServerLoop+0x259

  0  49222                      read:entry pid 10186 reading /export/home0/igen/pgdata/pg_clog/0011

              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9

So multiple processes are reading the same file. Since the file system is told not to cache files, all of these read I/Os are sent to the disk to read the file again.

Workaround for the high CLOG reads on Solaris:

Start with the cluster's $PGDATA on regular UFS (which is buffered and has logging enabled), and always create a new tablespace for your database on a forcedirectio-mounted file system, which bypasses the file system cache. This allows all PostgreSQL CLOG files to be cached in UFS, greatly reducing stress on the underlying storage. For writes, to the best of my knowledge PostgreSQL will still fsync to force the CLOG writes onto the disks, so it remains consistent; but the reads are spared from going to the disks and are returned from the cache.

Result:

With a rightly sized buffer pool, all database data can now be in the PostgreSQL cache, so reads are spared from the tablespaces. As for the $PGDATA data, UFS does the caching of the CLOG files and so on, sparing those reads from going to the disks again. In the end, what we achieve is a right-sized buffer pool where no reads are required during a heavy OLTP run, and the disks are busy only with the writes from updates and inserts.

Asynchronous Commit:

Also, as requested by Josh, I tried out asynchronous commit in 8.3beta1. I compared the following scenarios on internal disks (the prime target):

1. Default options (commit_delay off, synchronous_commit=true)
2. With commit_delay on
3. With asynchronous commit and commit_delay on
4. With asynchronous commit but commit_delay off
5. With fsync off

In 8.2 I had found that, compared to (1), (2) gave me a huge boost (2X), but fsync=off was eventually even 2.8X faster than (2). In 8.3, therefore, I did not even test the default option; I took (2) as my baseline run and found that (3), (4) and (5) all gave me a similar boost of about 2.55X over that baseline, since eventually I was CPU bound on my box and the I/O held up well. (Though I found (5) was better in 8.2 than (5) in 8.3beta1, since 8.3beta1 got CPU-saturated slightly earlier.)

Additional CPU Consumption Findings:

In the lightweight OLTP testing that was performed with about 1000 users on 8.3, with the above CLOG workaround in place, I reached a scenario where the system was out of CPU resources. Doing a quick profile using the "hotuser" program from the DTraceToolkit, the top function is postgres`GetSnapshotData:

# ./hotuser -p 10559
....
postgres`hash_seq_term                        1   2.1%
postgres`SearchCatCache                       2   4.2%
postgres`hash_search_with_hash_value          4   8.3%
postgres`GetSnapshotData                      6  12.5%

Lock waits during the 1000-user run were as follows:

# ./83_lwlock_wait.d 9231

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              19

             Lock Id   Combined Time (ns)
       WALInsertLock               428507
       ProcArrayLock           1009652776

# ./83_lwlock_wait.d 9153

             Lock Id            Mode           Count
     CLogControlLock       Exclusive               1
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              15

             Lock Id   Combined Time (ns)
     CLogControlLock                25536
       WALInsertLock               397265
       ProcArrayLock            696894211

#

My guess is that the ProcArrayLock waits are coming from the GetSnapshotData function, or at least caused by it, but I will let the experts comment. I am of the opinion that if we tune GetSnapshotData, then we should be able to handle more users.

So overall I am excited about the 8.3beta1 performance, especially in terms of asynchronous_commit; however, getting the CPU consumption of GetSnapshotData() in line and fixing the CLOG reading problem occurring in SimpleLruReadPage() would greatly enhance the performance of 8.3 for OLTP benchmarks.
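For reference, a minimal sketch of the ustack() variant of rsnoop.d described above (reconstructed from the description; the exact script was not included in the message) would be:

#!/usr/sbin/dtrace -s
/* rsnoop.d variant: show which file each postgres backend reads, plus its user stack */
syscall::read:entry
/execname=="postgres"/
{
        printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
        /* user-level call stack of the reading backend */
        ustack();
}

Run as root while the benchmark is active; each read prints the file path followed by the backend's stack, as in the traces shown above.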
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > CLOG data is > not cached in any PostgreSQL shared memory segments The above statement is utterly false, so your trace seems to indicate something broken. Are you sure these were the only reads of pg_clog files? Can you extend the tracing to determine which page of the file got read? I am wondering if your (unspecified) test load was managing to touch more pages of the clog than there is room for in shared memory. regards, tom lane
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > CLOG data is not cached in any PostgreSQL shared memory segments and hence > becomes the bottleneck as it has to constantly go to the filesystem to get > the read data. This is the same bottleneck you discussed earlier. CLOG reads are cached in the Postgres shared memory segment but only NUM_CLOG_BUFFERS are which defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate you're running you needed a larger number of buffers. Using the filesystem buffer cache is also an entirely reasonable solution though. That's surely part of the logic behind not trying to keep more of the clog in shared memory. Do you have any measurements of how much time is being spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s seems like it's not insignificant but your machine is big enough that perhaps I'm thinking at the wrong scale. I'm really curious whether you see any benefit from the vxid read-only transactions. I'm not sure how to get an apples to apples comparison though. Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch going in. Perhaps calling some function which forces an xid to be allocated and seeing how much it slows down the benchmark would be a good substitute. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: >> CLOG data is >> not cached in any PostgreSQL shared memory segments > > The above statement is utterly false, so your trace seems to indicate > something broken. Are you sure these were the only reads of pg_clog > files? Can you extend the tracing to determine which page of the file > got read? I am wondering if your (unspecified) test load was managing > to touch more pages of the clog than there is room for in shared memory. Didn't we already go through this? He and Simon were pushing to bump up NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and some other clog.c would have to be reengineered to scale well to larger values. Also it seemed there were only modest improvements from raising the value and there would always be a ceiling to bump into so just raising the number of buffers isn't particularly interesting unless there's some magic numbers we're trying to hit. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Didn't we already go through this? He and Simon were pushing to bump up > NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and > some other clog.c would have to be reengineered to scale well to larger > values. AFAIR we never did get any clear explanation of what the test case is. I guess it must be write-mostly, else lazy XID assignment would have helped this by reducing the rate of XID consumption. It's still true that I'm leery of a large increase in the number of buffers without reengineering slru.c. That code was written on the assumption that there were few enough buffers that a linear search would be fine. I'd hold still for 16, or maybe even 32, but I dunno how much impact that will have for such a test case. regards, tom lane
Tom, > It's still true that I'm leery of a large increase in the number of > buffers without reengineering slru.c. That code was written on the > assumption that there were few enough buffers that a linear search > would be fine. I'd hold still for 16, or maybe even 32, but I dunno > how much impact that will have for such a test case. Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? The test case is a workload called "iGen" which is a "fixed" TPCC-like workload. I've been trying to talk Sun into open-sourcing it, but no dice so far. It is heavy on writes, and (like TPCC) consists mostly of one-line transactions. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Actually, 32 made a significant difference as I recall ... do you still have > the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane
"Josh Berkus" <josh@agliodbs.com> writes: > Actually, 32 made a significant difference as I recall ... do you still have > the figures for that, Jignesh? Well it made a difference but it didn't remove the bottleneck, it just moved it. IIRC under that benchmark Jignesh was able to run with x sessions efficiently with 8 clog buffers, x + 100 or so sessions with 16 clog buffers and x + 200 or so sessions with 32 clog buffers. It happened that x + 200 was > the number of sessions he wanted to run the benchmark at so it helped the benchmark results quite a bit. But that was just an artifact of how many sessions the benchmark needed. A user who needs 1200 sessions or who has a different transaction load might find he needs more clog buffers to alleviate the bottleneck. And of course most (all?) normal users use far fewer sessions and won't run into this bottleneck at all. Raising NUM_CLOG_BUFFERS just moves around the arbitrary bottleneck. This benchmark is useful in that it gives us an idea where the bottleneck lies for various values of NUM_CLOG_BUFFERS but it doesn't tell us what value realistic users are likely to bump into. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Hi Gregory,

I actually first saw the 4 MB/sec problem during an EAStress-type run with only 150 connections. I will try to do the additional testing that Tom has requested today.

Regards,
Jignesh


Gregory Stark wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> CLOG data is not cached in any PostgreSQL shared memory segments and hence
>> becomes the bottleneck as it has to constantly go to the filesystem to get
>> the read data.
>>
>
> This is the same bottleneck you discussed earlier. CLOG reads are cached in
> the Postgres shared memory segment but only NUM_CLOG_BUFFERS are which
> defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate
> you're running you needed a larger number of buffers.
>
> Using the filesystem buffer cache is also an entirely reasonable solution
> though. That's surely part of the logic behind not trying to keep more of the
> clog in shared memory. Do you have any measurements of how much time is being
> spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s
> seems like it's not insignificant but your machine is big enough that perhaps
> I'm thinking at the wrong scale.
>
> I'm really curious whether you see any benefit from the vxid read-only
> transactions. I'm not sure how to get an apples to apples comparison though.
> Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch
> going in. Perhaps calling some function which forces an xid to be allocated
> and seeing how much it slows down the benchmark would be a good substitute.
>
>
The problem I saw was first highlighted by EAStress runs with PostgreSQL on Solaris with 120-150 users; I then replicated it via the smaller internal benchmark that we use here. EAStress should be just fine for highlighting it. Just put pg_clog on O_DIRECT or something similar so that all of its I/Os go to disk, which makes the problem easier to observe. In the meantime I will try to get more information.

Regards,
Jignesh


Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>
>> Didn't we already go through this? He and Simon were pushing to bump up
>> NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and
>> some other clog.c would have to be reengineered to scale well to larger
>> values.
>>
>
> AFAIR we never did get any clear explanation of what the test case is.
> I guess it must be write-mostly, else lazy XID assignment would have
> helped this by reducing the rate of XID consumption.
>
> It's still true that I'm leery of a large increase in the number of
> buffers without reengineering slru.c.  That code was written on the
> assumption that there were few enough buffers that a linear search
> would be fine.  I'd hold still for 16, or maybe even 32, but I dunno
> how much impact that will have for such a test case.
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
I agree with Tom: somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a later value. I promise to look more into it before making any recommendation to increase NUM_CLOG_BUFFERS.

That is because, though "iGen" showed improvements in that area from increasing num_clog_buffers, EAStress had shown no improvement. Plus, I think this is not the problem in 8.3beta1, since the lock output clearly does not show CLogControlLock as the issue, which I had seen in the earlier case. So I don't think increasing NUM_CLOG_BUFFERS will change things here.

I don't understand the code very well yet, but I see three hotspots and am not sure whether they are related to each other (see the script sketch after this message):

 * ProcArrayLock waits  - causing waits, as reported by the 83_lwlock_wait.d script
 * SimpleLruReadPage    - causing read I/Os, as reported by iostat/rsnoop.d
 * GetSnapshotData      - causing CPU utilization, as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh


Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>
>> Actually, 32 made a significant difference as I recall ... do you still have
>> the figures for that, Jignesh?
>>
>
> I'd want to see a new set of test runs backing up any call for a change
> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
> benchmarks using code from a few months back shouldn't carry a lot of
> weight.
>
> 			regards, tom lane
>
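As a possible way to tie those three hotspots to a single backend, a pid-provider sketch along these lines could be attached to one of the busy processes (function names are taken from the stacks shown earlier; this script is illustrative and was not part of the runs reported here):

#!/usr/sbin/dtrace -s
/* usage: ./hotspots.d -p <backend pid> */

/* on-CPU time spent inside GetSnapshotData */
pid$target::GetSnapshotData:entry
{
        self->snap = vtimestamp;
}

pid$target::GetSnapshotData:return
/self->snap/
{
        @oncpu["GetSnapshotData on-CPU time (ns)"] = sum(vtimestamp - self->snap);
        self->snap = 0;
}

/* how often this backend has to go through the SLRU read path */
pid$target::SimpleLruReadPage:entry
{
        @calls["SimpleLruReadPage calls"] = count();
}

The ProcArrayLock waits themselves are already covered by 83_lwlock_wait.d, so this only fills in the CPU and read-path sides.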
Tom,

Here is what I did: I started aggregating all the read information.

First I also grouped by pid (arg0, arg1, pid), and the counts all came back as 1. Then I grouped by just filename and location (arg0, arg1 of the reads), and the counts came back as follows:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
      @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014
               -2753028293472                 1
  /export/home0/igen/pgdata/pg_clog/0014
               -2753028277088                 1
  /export/home0/igen/pgdata/pg_clog/0015
               -2753028244320                 2
  /export/home0/igen/pgdata/pg_clog/0015
               -2753028268896                14
  /export/home0/igen/pgdata/pg_clog/0015
               -2753028260704                25
  /export/home0/igen/pgdata/pg_clog/0015
               -2753028252512                27
  /export/home0/igen/pgdata/pg_clog/0015
               -2753028277088                28
  /export/home0/igen/pgdata/pg_clog/0015
               -2753028293472                37

FYI, I pressed ctrl-C within less than a second.

So it seems to me that multiple processes are reading the same page, from different pids. (This was with about 600 users active.)

Apparently we do have a problem in that we are reading the same buffer address again. (That is, it is not being cached anywhere, or not being found in any cache.)

I reran the lock wait script on a couple of processes and did not see CLogControlLock as a problem:

# ./83_lwlock_wait.d 14341

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              16

             Lock Id   Combined Time (ns)
       WALInsertLock               383109
       ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               2
       ProcArrayLock       Exclusive              15

             Lock Id   Combined Time (ns)
       WALInsertLock                55243
       ProcArrayLock             69700140

#

What will help you find out why it is reading the same page again?

-Jignesh


Jignesh K. Shah wrote:
> I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS is
> just avoiding the symptom to a later value.. I promise to look more
> into it before making any recommendations to increase NUM_CLOG_BUFFERs.
>
>
> Because though "iGen" showed improvements in that area by increasing
> num_clog_buffers , EAStress had shown no improvements.. Plus the
> reason I think this is not the problem in 8.3beta1 since the Lock
> Output clearly does not show CLOGControlFile as to be the issue which
> I had seen in earlier case. So I dont think that increasing
> NUM_CLOG_BUFFERS will change thing here.
>
> Now I dont understand the code pretty well yet I see three hotspots
> and not sure if they are related to each other
> * ProcArrayLock waits - causing Waits as reported by
> 83_lockwait.d script
> * SimpleLRUReadPage - causing read IOs as reported by
> iostat/rsnoop.d
> * GetSnapshotData - causing CPU utilization as reported by hotuser
>
> But I will shut up and do more testing.
>
> Regards,
> Jignesh
>
>
>
> Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>
>>> Actually, 32 made a significant difference as I recall ... do you
>>> still have the figures for that, Jignesh?
>>>
>>
>> I'd want to see a new set of test runs backing up any call for a change
>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
>> benchmarks using code from a few months back shouldn't carry a lot of
>> weight.
>>
>> regards, tom lane
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
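Note that arg1 in read.d above is the address of the user-space buffer being read into (which is why the same few values repeat across files), not a location within the file. A variant that aggregates on the file offset instead, assuming the Solaris fds[] array's fi_offset field reflects the descriptor's offset at the time of the call, would show directly which 8 kB CLOG page each read is fetching (the script name and this exact approach are a sketch, not something that was run as part of these tests):

# cat read_pages.d
#!/usr/sbin/dtrace -s
/* count reads per CLOG file and per 8 kB page (BLCKSZ = 8192) */
syscall::read:entry
/execname=="postgres" && dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
        @pages[fds[arg0].fi_pathname, fds[arg0].fi_offset / 8192] = count();
}

tick-5sec
{
        exit(0);
}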
Also, to give perspective on the equivalent writes to CLOG, I used the following script, which runs for 10 seconds and tracks all writes to the pg_clog directory. Here is what it came up with (this is with 500 users running):

# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" && dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
      @write[fds[arg0].fi_pathname,arg1] = count();
}
tick-10sec
{
      exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  3   1026                     :tick-10sec

  /export/home0/igen/pgdata/pg_clog/001E
               -2753028277088                 1
#

I also modified read.d to run for 5 seconds:

# ./read.d
dtrace: script './read.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0      1                          :BEGIN
  0   1027                      :tick-5sec

  /export/home0/igen/pgdata/pg_clog/001F
               -2753028268896                 1
  /export/home0/igen/pgdata/pg_clog/001F
               -2753028252512                 1
  /export/home0/igen/pgdata/pg_clog/001F
               -2753028285280                 2
  /export/home0/igen/pgdata/pg_clog/001F
               -2753028277088                 3
  /export/home0/igen/pgdata/pg_clog/001F
               -2753028236128                 3
  /export/home0/igen/pgdata/pg_clog/001E
               -2753028285280                 5
  /export/home0/igen/pgdata/pg_clog/001E
               -2753028236128                 9
  /export/home0/igen/pgdata/pg_clog/001E
               -2753028277088                13
  /export/home0/igen/pgdata/pg_clog/001E
               -2753028268896                15
  /export/home0/igen/pgdata/pg_clog/001E
               -2753028252512                27
#

So the ratio of reads to writes on the clog files is pretty huge.

-Jignesh


Jignesh K. Shah wrote:
> Tom,
>
> Here is what I did:
>
> I started aggregating all read information:
>
> First I also had added group by pid (arg0,arg1, pid) and the counts
> were all coming as 1
>
> Then I just grouped by filename and location (arg0,arg1 of reads) and
> the counts came back as
>
> # cat read.d
> #!/usr/sbin/dtrace -s
> syscall::read:entry
> /execname=="postgres"/
> {
> @read[fds[arg0].fi_pathname, arg1] = count();
> }
>
>
> # ./read.d
> dtrace: script './read.d' matched 1 probe
> ^C
>
> /export/home0/igen/pgdata/pg_clog/0014
>               -2753028293472                 1
> /export/home0/igen/pgdata/pg_clog/0014
>               -2753028277088                 1
> /export/home0/igen/pgdata/pg_clog/0015
>               -2753028244320                 2
> /export/home0/igen/pgdata/pg_clog/0015
>               -2753028268896                14
> /export/home0/igen/pgdata/pg_clog/0015
>               -2753028260704                25
> /export/home0/igen/pgdata/pg_clog/0015
>               -2753028252512                27
> /export/home0/igen/pgdata/pg_clog/0015
>               -2753028277088                28
> /export/home0/igen/pgdata/pg_clog/0015
>               -2753028293472                37
>
>
> FYI I pressed ctrl-c within like less than a second
>
> So to me this seems that multiple processes are reading the same page
> from different pids. (This was with about 600 suers active.
>
> Aparently we do have a problem that we are reading the same buffer
> address again. (Same as not being cached anywhere or not finding it
> in cache anywhere).
>
> I reran lock wait script on couple of processes and did not see
> CLogControlFileLock as a problem..
>
> # ./83_lwlock_wait.d 14341
>
>              Lock Id            Mode           Count
>        WALInsertLock       Exclusive               1
>        ProcArrayLock       Exclusive              16
>
>              Lock Id   Combined Time (ns)
>        WALInsertLock               383109
>        ProcArrayLock            198866236
>
> # ./83_lwlock_wait.d 14607
>
>              Lock Id            Mode           Count
>        WALInsertLock       Exclusive               2
>        ProcArrayLock       Exclusive              15
>
>              Lock Id   Combined Time (ns)
>        WALInsertLock                55243
>        ProcArrayLock             69700140
>
> #
>
> What will help you find out why it is reading the same page again?
>
>
> -Jignesh
>
>
>
> Jignesh K. Shah wrote:
>> I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS is
>> just avoiding the symptom to a later value.. I promise to look more
>> into it before making any recommendations to increase NUM_CLOG_BUFFERs.
>>
>>
>> Because though "iGen" showed improvements in that area by increasing
>> num_clog_buffers , EAStress had shown no improvements.. Plus the
>> reason I think this is not the problem in 8.3beta1 since the Lock
>> Output clearly does not show CLOGControlFile as to be the issue which
>> I had seen in earlier case. So I dont think that increasing
>> NUM_CLOG_BUFFERS will change thing here.
>>
>> Now I dont understand the code pretty well yet I see three hotspots
>> and not sure if they are related to each other
>> * ProcArrayLock waits - causing Waits as reported by
>> 83_lockwait.d script
>> * SimpleLRUReadPage - causing read IOs as reported by
>> iostat/rsnoop.d
>> * GetSnapshotData - causing CPU utilization as reported by hotuser
>>
>> But I will shut up and do more testing.
>>
>> Regards,
>> Jignesh
>>
>>
>>
>> Tom Lane wrote:
>>> Josh Berkus <josh@agliodbs.com> writes:
>>>
>>>> Actually, 32 made a significant difference as I recall ... do you
>>>> still have the figures for that, Jignesh?
>>>>
>>>
>>> I'd want to see a new set of test runs backing up any call for a change
>>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area
>>> that
>>> benchmarks using code from a few months back shouldn't carry a lot of
>>> weight.
>>>
>>> regards, tom lane
>>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > So the ratio of reads vs writes to clog files is pretty huge.. It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we "pin" the latest CLOG page into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for older pages, so what we've got here is thrashing for the available slots. Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test. regards, tom lane
I changed NUM_CLOG_BUFFERS to 16 and ran the test again:

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                      :tick-5sec

  /export/home0/igen/pgdata/pg_clog/0024
               -2753028219296                 1
  /export/home0/igen/pgdata/pg_clog/0025
               -2753028211104                 1
# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                      :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                      :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                      :tick-5sec

  /export/home0/igen/pgdata/pg_clog/0025
               -2753028194720                 1

So Tom seems to be correct that this is a case of CLOG buffer thrashing. And since I saw the same problem with two different workloads, I think the number of people likely to hit this problem is pretty high.

I am also a bit surprised that CLogControlLock did not show up as being hot. Maybe that is because not many writes are going on, or maybe because I did not trace all 500 users to see their hot lock status.

Dmitri has another workload to test; I might try that out later to see whether it causes a similar impact or not.

Of course I haven't seen my throughput go up yet, since I am already CPU bound... but this is good, since the number of IOPS to the disk (and hence system calls) is reduced.

If I take this as my baseline number, I can then proceed to hunt other bottlenecks.

What's the view of the community: hunt down CPU utilization or lock waits next? Your votes are crucial on where I put my focus.

Another thing Josh B. told me to check out was the wal_writer_delay setting. I have tried two settings with almost equal performance (with the CLOG-16 setting): one at 100ms and the other at the default of 200ms. Based on the runs, 100ms seemed slightly better than the default. (Plus the window for losing data is reduced from 600ms to 300ms.)

Thanks.

Regards,
Jignesh


Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> So the ratio of reads vs writes to clog files is pretty huge..
>>
>
> It looks to me that the issue is simply one of not having quite enough
> CLOG buffers.  Your first run shows 8 different pages being fetched and
> the second shows 10.  Bearing in mind that we "pin" the latest CLOG page
> into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for
> older pages, so what we've got here is thrashing for the available
> slots.
>
> Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test.
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
This has been saved for the 8.4 release:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Jignesh K. Shah wrote:
>
> I changed CLOG Buffers to 16
>
> Running the test again:
> # ./read.d
> dtrace: script './read.d' matched 2 probes
> CPU     ID                    FUNCTION:NAME
>   0   1027                      :tick-5sec
>
> /export/home0/igen/pgdata/pg_clog/0024
>               -2753028219296                 1
> /export/home0/igen/pgdata/pg_clog/0025
>               -2753028211104                 1
> # ./read.d
> dtrace: script './read.d' matched 2 probes
> CPU     ID                    FUNCTION:NAME
>   1   1027                      :tick-5sec
>
> # ./read.d
> dtrace: script './read.d' matched 2 probes
> CPU     ID                    FUNCTION:NAME
>   1   1027                      :tick-5sec
>
> # ./read.d
> dtrace: script './read.d' matched 2 probes
> CPU     ID                    FUNCTION:NAME
>   0   1027                      :tick-5sec
>
> /export/home0/igen/pgdata/pg_clog/0025
>               -2753028194720                 1
>
>
> So Tom seems to be correct that it is a case of CLOG Buffer thrashing.
> But since I saw the same problem with two different workloads, I think
> people hitting this problem is pretty high.
>
> Also I am bit surprised that CLogControlFile did not show up as being
> hot.. Maybe because not much writes are going on .. Or maybe since I did
> not trace all 500 users to see their hot lock status..
>
>
> Dmitri has another workload to test, I might try that out later on to
> see if it causes similar impact or not.
>
> Of course I havent seen my throughput go up yet since I am already CPU
> bound... But this is good since the number of IOPS to the disk are
> reduced (and hence system calls).
>
>
> If I take this as my baseline number.. I can then proceed to hunt other
> bottlenecks????
>
>
> Whats the view of the community?
>
> Hunt down CPU utilizations or Lock waits next?
>
> Your votes are crucial on where I put my focus.
>
> Another thing Josh B told me to check out was the wal_writer_delay setting:
>
> I have done two settings with almost equal performance (with the CLOG 16
> setting) .. One with 100ms and other default at 200ms.. Based on the
> runs it seemed that the 100ms was slightly better than the default ..
> (Plus the risk of loosing data is reduced from 600ms to 300ms)
>
> Thanks.
>
> Regards,
> Jignesh
>
>
>
>
> Tom Lane wrote:
> > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> >
> >> So the ratio of reads vs writes to clog files is pretty huge..
> >>
> >
> > It looks to me that the issue is simply one of not having quite enough
> > CLOG buffers.  Your first run shows 8 different pages being fetched and
> > the second shows 10.  Bearing in mind that we "pin" the latest CLOG page
> > into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for
> > older pages, so what we've got here is thrashing for the available
> > slots.
> >
> > Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test.
> >
> > 			regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: Have you checked our extensive FAQ?
> >
> >                http://www.postgresql.org/docs/faq
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +