Re: testing HS/SR - 1 vs 2 performance - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: testing HS/SR - 1 vs 2 performance
Date
Msg-id 1272199800.4161.1662.camel@ebony
Whole thread Raw
In response to Re: testing HS/SR - 1 vs 2 performance  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: testing HS/SR - 1 vs 2 performance
List pgsql-hackers
On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> Both Heikki and I objected to that patch.
> >
> > Please explain your objection, based upon the patch and my explanations.
> 
> Well, we objected to the locking.  Having reread the patch a few times
> though, I think I'm starting to wrap my head around it so, I don't
> know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
> exclusive mode instead of having a separate spinlock, to see how that
> performs?

They both perform well. Heikki and I agree that spinlocks perform well.

The point of taking the ProcArrayLock in Shared rather than Exclusive
mode is the reduction in contention that gives. Adding a
KnownAssignedXid is exactly analogous to starting a transaction. We go
to some trouble to avoid taking a lock when we start a transaction and I
want to do a similar thing on the standby. I think it's even more
important we reduce contention on the standby than it is on the primary
because if the Startup process is delayed then it slows down recovery.

> >> And apparently it doesn't
> >> fix the problem, either.  So, -1 from me.
> >
> > There is an issue observed in Erik's later tests, but my interpretation
> > of the results so far is that the sorted array patch successfully
> > removes the initially reported loss of performance.
> 
> Is it possible the remaining spikes are due to fights over the spinlock?

That one specific spinlock? I considered it. I think its doubtful.

The spikes all involved a jump from <5ms to much more than that, often
as long as a second. That looks like a pure I/O problem or an LWlock
held across a slow I/O or platform specific issues of some kind.

Erik's tests were with 1 backend, so that would mean that at most 2
processes might want the spinlock you mention. The spinlock is only
taken at snapshot time, so once per transaction in Erik's tests. So the
if the spinlock were the problem then one or other of the processes
would need to grab and hold the spinlock for a very long time and then
release it sometime later. Since the spinlock code has been with us for
some time, I doubt there is anything wrong there.

I don't have any evidence as to whether the problem is solely on Erik's
system, nor whether it is an issue already in the codebase or introduced
by this patch. It seems unlikely to me that this patch introduces the
problem, and even more unlikely that the issue is caused by two
processes fighting over that particular spinlock. 

The patch is very effective at reducing overall cost of taking snapshots
and so I think it needs to be applied to allow more performance data to
be collected. I don't suppose this is the last performance tuning we
will need to do.

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: global temporary tables
Next
From: Simon Riggs
Date:
Subject: Re: global temporary tables