Re: [HACKERS] 8.3beta1 testing on Solaris - Mailing list pgsql-performance

From Jignesh K. Shah
Subject Re: [HACKERS] 8.3beta1 testing on Solaris
Msg-id 47220B96.4080009@sun.com
In response to Re: [HACKERS] 8.3beta1 testing on Solaris  ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
Responses Re: [HACKERS] 8.3beta1 testing on Solaris
List pgsql-performance
Tom,

Here is what I did:

I started aggregating all read information:

At first I also grouped by pid (arg0, arg1, pid), and the counts all
came back as 1.

Then I grouped by just filename and location (arg0, arg1 of the reads), and
the counts came back as follows:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
    @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014
-2753028293472                1
  /export/home0/igen/pgdata/pg_clog/0014
-2753028277088                1
  /export/home0/igen/pgdata/pg_clog/0015
-2753028244320                2
  /export/home0/igen/pgdata/pg_clog/0015
-2753028268896               14
  /export/home0/igen/pgdata/pg_clog/0015
-2753028260704               25
  /export/home0/igen/pgdata/pg_clog/0015
-2753028252512               27
  /export/home0/igen/pgdata/pg_clog/0015
-2753028277088               28
  /export/home0/igen/pgdata/pg_clog/0015
-2753028293472               37


FYI, I pressed Ctrl-C after less than a second.

So to me it seems that multiple processes (different pids) are reading
the same page. (This was with about 600 users active.)

Apparently we do have a problem: we are reading the same buffer
address again. (The same as if the page is not cached anywhere, or is not
found in any cache.)
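
To put numbers on the read.d output above, here is a minimal Python sketch (not part of the original test setup) that tallies the reads per file and checks how the buffer addresses are spaced. The `samples` list is transcribed by hand from the output, the helper names are made up for illustration, and the 8192-byte page size is an assumption (the default PostgreSQL block size).

```python
from collections import defaultdict

# (file, buffer address passed to read(), count), transcribed from the read.d output.
samples = [
    ("pg_clog/0014", -2753028293472, 1),
    ("pg_clog/0014", -2753028277088, 1),
    ("pg_clog/0015", -2753028244320, 2),
    ("pg_clog/0015", -2753028268896, 14),
    ("pg_clog/0015", -2753028260704, 25),
    ("pg_clog/0015", -2753028252512, 27),
    ("pg_clog/0015", -2753028277088, 28),
    ("pg_clog/0015", -2753028293472, 37),
]

PAGE_SIZE = 8192  # assumption: default BLCKSZ


def reads_per_file(rows):
    """Sum the read counts for each file."""
    totals = defaultdict(int)
    for path, _addr, count in rows:
        totals[path] += count
    return dict(totals)


def distinct_buffers(rows):
    """Return the sorted set of buffer addresses seen in the sample."""
    return sorted({addr for _path, addr, _count in rows})


def page_aligned_spacing(addrs, page_size=PAGE_SIZE):
    """True if every gap between consecutive addresses is a whole number of pages."""
    return all((b - a) % page_size == 0 for a, b in zip(addrs, addrs[1:]))


if __name__ == "__main__":
    print(reads_per_file(samples))
    print(len(distinct_buffers(samples)))
    print(page_aligned_spacing(distinct_buffers(samples)))
```

For this sample that works out to 133 reads of pg_clog/0015 and 2 of pg_clog/0014, landing in six distinct buffers whose addresses differ by whole 8 KB pages. That would be consistent with the reads going into a small, fixed set of page-sized buffers, though the sketch itself cannot confirm which buffers those are.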

I reran the lock wait script on a couple of processes and did not see
CLogControlFileLock as a problem.

# ./83_lwlock_wait.d 14341

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              16

             Lock Id   Combined Time (ns)
       WALInsertLock               383109
       ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               2
       ProcArrayLock       Exclusive              15

             Lock Id   Combined Time (ns)
       WALInsertLock                55243
       ProcArrayLock             69700140

#
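
As a rough way to compare the two locks, the following Python sketch divides the combined wait time by the wait count for each row. It assumes the Count column is the number of waits and the Combined Time column is their total duration for the same process; the `waits` list is transcribed by hand from the two runs above, and `mean_wait_ms` is a name made up for illustration.

```python
# (pid, lock, wait count, combined wait time in ns), transcribed from the two runs above.
waits = [
    (14341, "WALInsertLock", 1, 383109),
    (14341, "ProcArrayLock", 16, 198866236),
    (14607, "WALInsertLock", 2, 55243),
    (14607, "ProcArrayLock", 15, 69700140),
]


def mean_wait_ms(count, total_ns):
    """Average wait per occurrence, converted from nanoseconds to milliseconds."""
    return total_ns / count / 1e6


if __name__ == "__main__":
    for pid, lock, count, total_ns in waits:
        print(f"{pid} {lock:>14} {mean_wait_ms(count, total_ns):8.3f} ms per wait")
```

Under that assumption, the ProcArrayLock waits average roughly 12.4 ms and 4.6 ms for the two processes, while the WALInsertLock waits stay well under 1 ms.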

What will help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
> I agree with Tom. Somehow I think increasing NUM_CLOG_BUFFERS is
> just deferring the symptom to a later point. I promise to look more
> into it before making any recommendations to increase NUM_CLOG_BUFFERS.
>
>
> Although "iGen" showed improvements in that area from increasing
> NUM_CLOG_BUFFERS, EAStress showed no improvement. Also, the
> reason I think this is not the problem in 8.3beta1 is that the lock
> output clearly does not show CLOGControlFile as the issue, which
> I had seen in the earlier case. So I don't think that increasing
> NUM_CLOG_BUFFERS will change things here.
>
> I don't understand the code very well yet, but I see three hotspots
> and am not sure whether they are related to each other:
> * ProcArrayLock waits - causing waits, as reported by the
>   83_lockwait.d script
> * SimpleLRUReadPage - causing read I/Os, as reported by
>   iostat/rsnoop.d
> * GetSnapshotData - causing CPU utilization, as reported by hotuser
>
> But I will shut up and do more testing.
>
> Regards,
> Jignesh
>
>
>
> Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>
>>> Actually, 32 made a significant difference as I recall ... do you
>>> still have the figures for that, Jignesh?
>>>
>>
>> I'd want to see a new set of test runs backing up any call for a change
>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
>> benchmarks using code from a few months back shouldn't carry a lot of
>> weight.
>>
>>             regards, tom lane
>>
>
