Re: [HACKERS] 8.3beta1 testing on Solaris - Mailing list pgsql-performance
From: Jignesh K. Shah
Subject: Re: [HACKERS] 8.3beta1 testing on Solaris
Date:
Msg-id: 47220B96.4080009@sun.com
In response to: Re: [HACKERS] 8.3beta1 testing on Solaris ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
Responses: Re: [HACKERS] 8.3beta1 testing on Solaris
List: pgsql-performance
Tom,

Here is what I did: I started aggregating all read information.

First I had also grouped by pid (arg0, arg1, pid), and the counts all came back as 1.

Then I grouped just by filename and location (arg0 and arg1 of the reads), and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s

syscall::read:entry
/execname == "postgres"/
{
        @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37

FYI, I pressed Ctrl-C in less than a second.

So this looks to me as if multiple processes (different pids) are reading the same page. (This was with about 600 users active.) Apparently we do have a problem in that we are reading the same buffer address again, as if it is not cached anywhere, or not being found in any cache.

I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem:

# ./83_lwlock_wait.d 14341

             Lock Id            Mode    Count
       WALInsertLock       Exclusive        1
       ProcArrayLock       Exclusive       16

             Lock Id   Combined Time (ns)
       WALInsertLock               383109
       ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

             Lock Id            Mode    Count
       WALInsertLock       Exclusive        2
       ProcArrayLock       Exclusive       15

             Lock Id   Combined Time (ns)
       WALInsertLock                55243
       ProcArrayLock             69700140
#

What information would help you find out why it is reading the same page again?

-Jignesh


Jignesh K. Shah wrote:
> I agree with Tom: somehow I think increasing NUM_CLOG_BUFFERS just
> pushes the symptom out to a later point. I promise to look more into
> it before making any recommendation to increase NUM_CLOG_BUFFERS.
>
> Though "iGen" showed improvements in that area from increasing
> NUM_CLOG_BUFFERS, EAStress showed no improvement. Also, I don't think
> this is the problem in 8.3beta1, since the lock output clearly does
> not show CLOGControlFile as the issue, which I had seen in an earlier
> case. So I don't think increasing NUM_CLOG_BUFFERS will change things
> here.
>
> Now, I don't understand the code very well yet, but I see three
> hotspots, and I am not sure whether they are related to each other:
>     * ProcArrayLock waits - causing waits, as reported by the
>       83_lwlock_wait.d script
>     * SimpleLruReadPage - causing read I/Os, as reported by
>       iostat/rsnoop.d
>     * GetSnapshotData - causing CPU utilization, as reported by
>       hotuser
>
> But I will shut up and do more testing.
>
> Regards,
> Jignesh
>
>
> Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>> Actually, 32 made a significant difference as I recall ... do you
>>> still have the figures for that, Jignesh?
>>
>> I'd want to see a new set of test runs backing up any call for a change
>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
>> benchmarks using code from a few months back shouldn't carry a lot of
>> weight.
>>
>> regards, tom lane
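[Editor's note: the per-pid grouping mentioned above, which made every count come back as 1, is not shown in the post. A minimal sketch of that variant, assuming the same read.d structure with DTrace's built-in pid variable added to the aggregation key, would be:]

#!/usr/sbin/dtrace -s

/* Aggregate reads by file pathname, user buffer address (arg1 of
 * read(2)), and process id. With pid in the key, the same page read
 * by different backends lands in separate buckets, so each count
 * collapses to 1, which is what Jignesh reports seeing. */
syscall::read:entry
/execname == "postgres"/
{
        @read[fds[arg0].fi_pathname, arg1, pid] = count();
}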
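[Editor's note: the 83_lwlock_wait.d script itself is also not included in the thread. A rough reconstruction that would produce output of this shape could be built on the lwlock-startwait/lwlock-endwait static probes PostgreSQL exposes when compiled with --enable-dtrace; the argument mapping (arg0 = lock id, arg1 = mode) is an assumption here, and Jignesh's version evidently also translates the numeric ids and modes into names such as ProcArrayLock and Exclusive, which this sketch does not:]

#!/usr/sbin/dtrace -qs

/* Usage: ./lwlock_wait.d <postgres-backend-pid>
 * Counts lwlock waits and sums wait time per lock id for one backend. */

postgresql$1:::lwlock-startwait
{
        /* Remember when this thread started waiting on an lwlock. */
        self->ts = timestamp;
}

postgresql$1:::lwlock-endwait
/self->ts/
{
        /* arg0 = lock id, arg1 = mode (assumed probe arguments). */
        @counts[arg0, arg1] = count();
        @wait[arg0] = sum(timestamp - self->ts);
        self->ts = 0;
}

dtrace:::END
{
        printf("%20s %12s %8s\n", "Lock Id", "Mode", "Count");
        printa("%20d %12d %@8d\n", @counts);
        printf("\n%20s %20s\n", "Lock Id", "Combined Time (ns)");
        printa("%20d %@20d\n", @wait);
}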