Thread: SMP-PPC spinlocks in 7.2.4?

SMP-PPC spinlocks in 7.2.4?

From: eric soroos
Hello.

Previously, I had some problems that appear to have been caused by the smp-ppc spinlock issue that was in the early 7.2
series. 

The machine is a dual-800 G4 running Mac OS X 10.1.5, using a libpq-based client through Unix domain sockets.

The last few days, I've been dealing with a client who has drastically upped their usage of the database and, in doing
so, is causing deadlocks. I was running 7.2 or 7.2.1; I upgraded to a locally compiled 7.2.4. I've run a vacuum full on
the databases.

Sometimes the clients have a ps ax status of async_notify; sometimes there's just a stack of selects and updates that
get hung (I'd estimate 6 deadlocks since Saturday). It seems to coincide with times of extra activity, such as when
the databases are being backed up with pg_dump.

I've also noticed the following in the cron logs from nightly vacuums:

NOTICE: Rel pg_attribute: Uninitialized page 59 - fixing
NOTICE: Rel pg_attribute: Uninitialized page 60 - fixing
....

Is there anything I can do to debug this? I'm willing to give it a shot, but I'm also rapidly preparing a single proc
linux/intel machine to take over db duties.

eric




Re: SMP-PPC spinlocks in 7.2.4?

From: Tom Lane
eric soroos <eric-psql@soroos.net> writes:
> The last few days, I've been dealing with a client who has drastically upped their usage of the database and in doing
> so is causing deadlocks. I was running 7.2 or 7.2.1, I upgraded to a locally compiled 7.2.4.  I've run a vacuum full on
> the databases.

> Sometimes the clients have a ps ax status of async_notify, sometimes there's just a stack of selects and updates that
> get hung. (I'd estimate 6 deadlocks since Saturday).  It seems to coincide with times of extra activity, such as when
> the databases are being backed up with pg_dump.

Hm.  Do they use query-cancels at all?  The reference to async_notify
makes me wonder if this is related to the recently-discovered
async_notify bug that could prevent fast-mode shutdowns.  I'm not
certain how that might lead to an apparent deadlock, but a query cancel
arriving during async_notify would surely improve the odds of trouble.

If you don't mind running a slightly customized version, you might try
back-patching this fix:
http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
into 7.2.4 and see if that improves matters.

If it doesn't, I'd be interested to look into the matter, but I'd
probably need access to the machine to see what is going on.

> I've also noticed the following in cron logs from nightly vacuums

> NOTICE: Rel pg_attribute: Uninitialized page 59 - fixing
> NOTICE: Rel pg_attribute: Uninitialized page 60 - fixing

These are harmless.

> Is there anything I can do to debug this?  I'm willing to give it a
> shot, but I'm also rapidly preparing a single proc linux/intel machine
> to take over db duties.

I think you're mistaken to be blaming the hardware...

            regards, tom lane

Re: SMP-PPC spinlocks in 7.2.4?

From: eric soroos
Tom,

> Hm.  Do they use query-cancels at all?  The reference to async_notify
> makes me wonder if this is related to the recently-discovered
> async_notify bug that could prevent fast-mode shutdowns.  I'm not
> certain how that might lead to an apparent deadlock, but a query cancel
> arriving during async_notify would surely improve the odds of trouble.

Not that I know of, unless it's for cleanup of queries when quitting the app or other such abort type states.

> If you don't mind running a slightly customized version, you might try
> back-patching this fix:
> http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
> into 7.2.4 and see if that improves matters.

I'll give that a shot.

> If it doesn't, I'd be interested to look into the matter, but I'd
> probably need access to the machine to see what is going on.

That's probably possible, but there are some client confidentiality issues.

> > Is there anything I can do to debug this?  I'm willing to give it a
> > shot, but I'm also rapidly preparing a single proc linux/intel machine
> > to take over db duties.
>
> I think you're mistaken to be blaming the hardware...

The Linux box is a migration that's being accelerated by this issue. It has more disk, more memory, no app servers,
and control of the kernel shared memory parameters.

eric




Re: SMP-PPC spinlocks in 7.2.4?

From: eric soroos
Tom,

> > If you don't mind running a slightly customized version, you might try
> > back-patching this fix:
> > http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
> > into 7.2.4 and see if that improves matters.
>
> I'll give that a shot.

It patched cleanly except for the version header. I've been running it for about 36 hours now with no problems. I'd say
that I'm about 85% convinced that it made the difference, as I've also done some optimizations since then that reduce
the database load by caching.

I'd say that this patch is a candidate for 7.2.5 if there's ever another 7.2 release.

thanks for your help.

eric

Re: SMP-PPC spinlocks in 7.2.4?

From: Tom Lane
eric soroos <eric-psql@soroos.net> writes:
> I'd say that this patch is a candidate for 7.2.5 if there's ever
> another 7.2 release.

Yeah.  I'm not sure that there will be another 7.2 release, but I'll pop
the patch into the 7.2 CVS branch while I'm thinking about it...

            regards, tom lane