Re: cost-based vacuum - Mailing list pgsql-performance

From Ian Westmacott
Subject Re: cost-based vacuum
Date
Msg-id 1121291170.13208.25.camel@spectre.intellivid.com
Whole thread Raw
In response to Re: cost-based vacuum  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-performance
I can at least report that the problem does not seem to
occur with Postgres 8.0.1 running on a dual Opteron.

    --Ian


On Wed, 2005-07-13 at 16:39, Simon Riggs wrote:
> On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
> > Ian Westmacott <ianw@intellivid.com> writes:
> > > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
> > >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> > >>> It appears not to matter whether it is one of the tables
> > >>> being written to that is ANALYZEd.  I can ANALYZE an old,
> > >>> quiescent table, or a system table and see this effect.
> > >>
> > >> Can you confirm that this effect is still seen even when the ANALYZE
> > >> doesn't touch *any* of the tables being accessed?
> >
> > > Yes.
> >
> > This really isn't making any sense at all.
>
> Agreed. I think all of this indicates that some wierdness (technical
> term) is happening at a different level in the computing stack. I think
> all of this points fairly strongly to it *not* being a PostgreSQL
> algorithm problem, i.e. if the code was executed by an idealised Knuth-
> like CPU then we would not get this problem. Plus, I have faith that if
> it was a problem in that "plane" then you or another would have
> uncovered it by now.
>
> > However, these certainly do not explain Ian's problem, because (a) these
> > only apply to VACUUM, not ANALYZE; (b) they would only lock the table
> > being VACUUMed, not other ones; (c) if these locks were to block the
> > reader or writer thread, it'd manifest as blocking on a semaphore, not
> > as a surge in LWLock thrashing.
>
> I've seen enough circumstantial evidence to connect the time spent
> inside LWLockAcquire/Release as being connected to the Semaphore ops
> within them, not the other aspects of the code.
>
> Months ago we discussed the problem of false sharing on closely packed
> arrays of shared variables because of the large cache line size of the
> Xeon MP. When last we touched on that thought, I focused on the thought
> that the LWLock array was too tightly packed for the predefined locks.
> What we didn't discuss (because I was too focused on the other array)
> was the PGPROC shared array is equally tightly packed, which could give
> problems on the semaphores in LWLock.
>
> Intel says fairly clearly that this would be an issue.
>
> > >> Is that Xeon MP then?
> >
> > > Yes.
> >
> > The LWLock activity is certainly suggestive of prior reports of
> > excessive buffer manager lock contention, but it makes *no* sense that
> > that would be higher with vacuum cost delay than without.  I'd have
> > expected the other way around.
> >
> > I'd really like to see a test case for this...
>
> My feeling is that a "micro-architecture" test would be more likely to
> reveal some interesting information.
>
> Best Regards, Simon Riggs
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend


pgsql-performance by date:

Previous
From: Vivek Khera
Date:
Subject: Re: Quad Opteron stuck in the mud
Next
From: Simon Riggs
Date:
Subject: Re: Quad Opteron stuck in the mud