Re: cost-based vacuum - Mailing list pgsql-performance

From Simon Riggs
Subject Re: cost-based vacuum
Date
Msg-id 1121287188.3970.296.camel@localhost.localdomain
Whole thread Raw
In response to Re: cost-based vacuum  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: cost-based vacuum  (Ian Westmacott <ianw@intellivid.com>)
List pgsql-performance
On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
> Ian Westmacott <ianw@intellivid.com> writes:
> > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
> >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> >>> It appears not to matter whether it is one of the tables
> >>> being written to that is ANALYZEd.  I can ANALYZE an old,
> >>> quiescent table, or a system table and see this effect.
> >>
> >> Can you confirm that this effect is still seen even when the ANALYZE
> >> doesn't touch *any* of the tables being accessed?
>
> > Yes.
>
> This really isn't making any sense at all.

Agreed. I think all of this indicates that some wierdness (technical
term) is happening at a different level in the computing stack. I think
all of this points fairly strongly to it *not* being a PostgreSQL
algorithm problem, i.e. if the code was executed by an idealised Knuth-
like CPU then we would not get this problem. Plus, I have faith that if
it was a problem in that "plane" then you or another would have
uncovered it by now.

> However, these certainly do not explain Ian's problem, because (a) these
> only apply to VACUUM, not ANALYZE; (b) they would only lock the table
> being VACUUMed, not other ones; (c) if these locks were to block the
> reader or writer thread, it'd manifest as blocking on a semaphore, not
> as a surge in LWLock thrashing.

I've seen enough circumstantial evidence to connect the time spent
inside LWLockAcquire/Release as being connected to the Semaphore ops
within them, not the other aspects of the code.

Months ago we discussed the problem of false sharing on closely packed
arrays of shared variables because of the large cache line size of the
Xeon MP. When last we touched on that thought, I focused on the thought
that the LWLock array was too tightly packed for the predefined locks.
What we didn't discuss (because I was too focused on the other array)
was the PGPROC shared array is equally tightly packed, which could give
problems on the semaphores in LWLock.

Intel says fairly clearly that this would be an issue.

> >> Is that Xeon MP then?
>
> > Yes.
>
> The LWLock activity is certainly suggestive of prior reports of
> excessive buffer manager lock contention, but it makes *no* sense that
> that would be higher with vacuum cost delay than without.  I'd have
> expected the other way around.
>
> I'd really like to see a test case for this...

My feeling is that a "micro-architecture" test would be more likely to
reveal some interesting information.

Best Regards, Simon Riggs


pgsql-performance by date:

Previous
From: Dan Harris
Date:
Subject: Re: Quad Opteron stuck in the mud
Next
From: Vivek Khera
Date:
Subject: Re: Quad Opteron stuck in the mud