Re: Proposal of tunable fix for scalability of 8.4

From: Scott Carey
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: ,
Msg-id: BDFBB77C9E07BE4A984DAAE981D19F961AEE7AFBB5@EXVMBX018-1.exch018.msoutlookonline.net
(view: Whole thread, Raw)
In response to: Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas)
Responses: Re: Proposal of tunable fix for scalability of 8.4  (Alvaro Herrera)
List: pgsql-performance

Tree view

Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
 Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
  Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
   Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
  Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
   Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
    Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
     Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
     Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
      Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
       Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
        Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
        Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
         Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
          Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
  Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
   Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
    Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
     Re: Proposal of tunable fix for scalability of 8.4  (decibel, )
      Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
       Re: Proposal of tunable fix for scalability of 8.4  (Josh Berkus, )
    Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
     Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
      Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
      Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
       Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
        Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
         Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
          Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
         Re: Proposal of tunable fix for scalability of 8.4  (decibel, )
        Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
        Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
         Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
          Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
           Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
         Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
         Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
          Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
           Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
            Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
          Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
           Re: Proposal of tunable fix for scalability of 8.4  (decibel, )
            Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
             Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
              Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
      Re: Proposal of tunable fix for scalability of 8.4  (Greg Smith, )
   Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
    Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
    Re: Proposal of tunable fix for scalability of 8.4  (Greg Smith, )
     Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
      Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas, )
       Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
        Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
        Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
         Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
         Re: Proposal of tunable fix for scalability of 8.4  (, )
          Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
           Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
            Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
        Re: Proposal of tunable fix for scalability of 8.4  (Heikki Linnakangas, )
         Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
          Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
          Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
         Re: Proposal of tunable fix for scalability of 8.4  (Matthew Wakeling, )
          Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
           Re: Proposal of tunable fix for scalability of 8.4  (Matthew Wakeling, )
            Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
             Re: Proposal of tunable fix for scalability of 8.4  (Matthew Wakeling, )
             Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
              Re: Proposal of tunable fix for scalability of 8.4  (Matthew Wakeling, )
               Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
           Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
            Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
           Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
            Re: Proposal of tunable fix for scalability of 8.4  (Matthew Wakeling, )
             Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
             Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
             Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
          Re: Proposal of tunable fix for scalability of 8.4  (Heikki Linnakangas, )
           Re: Proposal of tunable fix for scalability of 8.4  (Matthew Wakeling, )
       Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
        Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas, )
         Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
          Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas, )
           Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
            Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas, )
             Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
              Re: Proposal of tunable fix for scalability of 8.4  (Alvaro Herrera, )
               Re: Proposal of tunable fix for scalability of 8.4  (Alvaro Herrera, )
                Re: Proposal of tunable fix for scalability of 8.4  (Alvaro Herrera, )
                 Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
                  Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas, )
                   Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
           Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
         Re: Proposal of tunable fix for scalability of 8.4  (Bruce Momjian, )
          Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
           Re: Proposal of tunable fix for scalability of 8.4  (Bruce Momjian, )
         Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
      Re: Proposal of tunable fix for scalability of 8.4  (Greg Smith, )
     8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
      Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
      Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
       Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Alan Stange, )
        Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
         Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
          Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
           Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Robert Haas, )
            Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
            Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
           Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Bruce Momjian, )
            Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Bruce Momjian, )
      Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4  (Greg Smith, )
   Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
    Re: Proposal of tunable fix for scalability of 8.4  (Grzegorz Jaśkiewicz, )
     Re: Proposal of tunable fix for scalability of 8.4  ("Kevin Grittner", )
      Re: Proposal of tunable fix for scalability of 8.4  (Ron, )
     Re: Proposal of tunable fix for scalability of 8.4  (Gregory Stark, )
      Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
    Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
     Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
      Re: Proposal of tunable fix for scalability of 8.4  (Alvaro Herrera, )
       Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
        Re: Proposal of tunable fix for scalability of 8.4  (decibel, )
      Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
    Re: Proposal of tunable fix for scalability of 8.4  (Tom Lane, )
     Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey, )
    Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )
 Re: Proposal of tunable fix for scalability of 8.4  (Simon Riggs, )
  Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah", )

From: Robert Haas []
Sent: Thursday, March 19, 2009 8:45 PM
To: Scott Carey
 Cc: Jignesh K. Shah; Greg Smith; Kevin Grittner; 
 Subject: Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
>
> >On Thu, Mar 19, 2009 at 5:43 PM, Scott Carey <> wrote:
> >> Well, unless I'm misunderstanding something, waking all waiters every
> >> time could lead to arbitrarily long delays for writers on mostly
> >> read-only workloads... and by arbitrarily along, we mean to say
> >> "potentially just about forever".  That doesn't sound safe for
> >> production to me.
> >
> > The other discussion going on indicates that that condition already can
> > happen, shared can always currently cut in line while other shared locks
> > have the lock, though I don't understand all the details.
>
> No.  If the first process waiting for an LWLock wants an exclusive
> lock, we wake up that process, and only that process.   If the first
> process waiting for an LWLock wants a shared lock, we wake up that
> process, and the processes which it follow it in the queue that also
> want shared locks.  But if we come to a process which holds an
> exclusive lock, we stop.  So if the wait queue looks like this
> SSSXSSSXSSS, then the first three processes will be woken up, but the
> remainder will not.  The new wait queue will look like this: XSSSXSSS
> - and the exclusive waiter at the head of the queue is guaranteed to
> get the next turn.

Your description (much of which I cut out) is exactly how I understood it until Simon Riggs' post which changed my view
andunderstanding.  Under that situation, waking all shared will leave all XXXXX at the front and hence alternate
shared/exclusive/shared/exclusiveas long as both types are contending.  Simon's post changed my view.  Below is some
cut/pastefrom it: 
NOTE: things without a > in front here represent Simon until the ENDQUOTE:

QUOTE -----------
On Wed, 2009-03-18 at 11:45 +0000, Matthew Wakeling wrote:
> On Wed, 18 Mar 2009, Simon Riggs wrote:
> > I agree with that, apart from the "granting no more" bit.
> >
> > The most useful behaviour is just to have two modes:
> > * exclusive-lock held - all other x locks welcome, s locks queue
> > * shared-lock held - all other s locks welcome, x locks queue
>
> The problem with making all other locks welcome is that there is a
> possibility of starvation. Imagine a case where there is a constant stream
> of shared locks - the exclusive locks may never actually get hold of the
> lock under the "all other shared locks welcome" strategy.

That's exactly what happens now.

----------
 > [Scott Carey] (Further down in Simon's post, a quote from months ago: )
----------
"Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes.  We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request."

ENDQUOTE  ( Simon Riggs, cut/paste by me.  post from his post Wednesday 3/18 5:10 AM pacific time).
------------------

I read that to mean that what is happening now is that in ADDITION to your explanation of how the queue works, while a
batchof shared locks are executing, NEW shared locks execute immediately and don't even queue.  That is, there is
sharedrequest queue jumping.  The queue operates as your description but not everythig queues.  
It seems pretty conclusive if that is truthful -- that there is starvation possible in the current system.  At this
stage,it would seem that neither of us are experts on the current behavior, or that Simon is wrong, or that I
completelymisunderstood his comments above. 

> Now, of course, EVENTUALLY one of the X guys will probably beat out
> all the S-lock waiters and he'll get to do his thing.  But there's no
> upper bound on how long this can take, and if the rate at which S-lock
> waiters are joining the queue is much higher than the rate at which
> X-lock waiters are joining the queue, it may be quite a long time.

And the average expected time and distribution of those events can be statistically calculated and empirically
measured. The fact that there is a chance at all is not as important as the magitude of the chance and the distribution
ofthose probabilities.   

> Even if the overall system throughput is better with this change, the
> fact that the guys who need the X-lock get seriously shafted is a
> really serious problem.

If 'serious shafting' is so, yes!  We only disagree on the current possibility of this and the magnitude/likelihood of
it.  
By Simon's comments above the starvation possiblility is already the case.  I am merely using that discussion as
evidence. It may be wrong, so in reality we agree overall but both don't have enough knowledge to go much beyond that.
Ithink we can both agree that IF the current system is unfair, then the 'wake all' system is roughly as unfair, and
perhapseven more fair and that testing evidence (averages and standar deviations too!) should guide us.  If the current
systemis truly fair and cannot have starvation, then the 'wake all' setup would be a step backwards on that front.
Thatis why my early comments on this were to wake only the shared or alternate. 

(I think an unfair simple 'wake all' lock is still useful for experimentation and testing and perhaps configuration
--wemay differ on that). 

> If I start a million transactions on my
> system and they all complete in average of 1 second each, that sounds
> pretty good - unless it's because 999,999 of them completed almost
> instantaneously and the last one took a million seconds.

Measuring standard deviation / variance is always important.  Averages alone are surely not good enough.  Whether this
isaverage time to commit a transaction (low level) or the average cost of a query plan (higher level),  consistency is
highlyvaluable.  Better to have slightly longer average times and very high consistency than the opposite.  

> > Also, the tests on the 'wake all' version clearly aren't starving anything
> > in a load test with thousands of threads and very heavy lock contention,
> > mostly for shared locks.
> > Instead throughput increases and all wait times decrease.

> On the average, yes...

I agree we would need more than the average to be confident.  Although I am not opposed to letting a user decide
betweenthe two -- gaining performance and sacrificing some consistency.  Its a common real-world tradeoff. 

> > There are several other proposals to make starvation less possible (wake
> > only shared and other proposals that alternate between shared and exclusive;
> > waking only X sized chunks, etc -- its all just investigation into fixing
> > what can be improved on -- solutions that are easily testable should not
> > just be thrown out: the first ones were just the easiest to try).
>
> Alternating between shared and exclusive is safe.  But a lot more
> testing in a lot more situations would be needed to determine whether
> it is better, I think.  Waking chunks of a certain size I believe will
> produce a more complicated version of the problem described above.
>
> ...Robert

The alternating proposal is the most elegant and based on my experience should also perform well.  The two list
solutionfor this is simpler and can probably be done without locking on the list adding with atomics (compare and
set/swap). Appending to a linked list can be done lock-free safely as can atomically swapping out lists.  Predominantly
lock-freeis the way to go for heavily contended situations like this.  The proposal that compacts the list by freeing
allshared, and compacts the exclusive remainders probably requires more locking and contention due to more complex list
manipulation. I agree that the chunk version is probably more complicated than needed. 

Our disagreement here revolves around two things I believe:  What the current functionality actually is, and how useful
thebrute force simple lock is as a tool and as a config option. 


pgsql-performance by date:

From: "ml@bortal.de"
Date:
Subject: current transaction in productive database
From: Glyn Astill
Date:
Subject: Re: Prepared statement does not exist