Backends waiting, spinlocks, shared mem patches - Mailing list pgsql-hackers

From Wayne Piekarski
Subject Backends waiting, spinlocks, shared mem patches
Msg-id 199905301406.XAA20023@helpdesk.senet.com.au
Responses Re: [HACKERS] Backends waiting, spinlocks, shared mem patches  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi everyone,

Sorry this has taken me so long to get back to you. Just to refresh
everyone's memory, I was the one who was having problems with postgres'
backends just hanging around in waiting, not doing anything. Tom Lane sent
me a patch to fix this for 6.4.2.

We didn't just install the patch on our live system and run it, as we were
worried about breaking something. Instead we spent a lot of time thrashing
it around, trying to reproduce the problem to check whether it had been
fixed (this is why it's taken me a while). We captured a few hundred
sessions that our CGIs have with the database, including begin..commit
pairs and everything, in order to accurately simulate a heavy load on the
DBMS. We ran this program, keeping about 40-50 connections going the whole
time, and we could not get the waiting problem to occur even with the
unpatched 6.4.2, so it was not possible to test whether the patch had fixed
our particular problem.

So that was disappointing: we had figured that because we were hammering it
so hard, it would fail quickly and we could use this as a good test program.
The patched 6.4.2 ran fine as well, so that was good. It seems that the
problem was caused by very rare circumstances which we just couldn't
reproduce during testing.

One thing we did notice is that when we tried to open more than, say, 50
backends, we would get the following:

InitPostgres
IpcSemaphoreCreate: semget failed (No space left on device) key=5432017,
num=16, permission=600
proc_exit(3) [#0]         

Shortly after, we would get:

FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting.


Our FreeBSD machine was not set up for a huge number of semaphores, so the
semget was failing. That was fair enough, but then the postmaster would die
afterwards with the spinlock error. I saw a post by Hiroshi Inoue with the
following:

>Hi all,
>
>ProcReleaseSpins() does nothing unless MyProc is set.
>So both elog(ERROR/FATAL) and proc_exit(0) before 
>InitProcess() don't release spinlocks.
>
>Comments ?
>
>Hiroshi Inoue
>Inoue@tpf.co.jp

I would have to agree with him here. I'm not familiar with postgres
internals, but it looks like when semget fails, the backend doesn't clean
up the resources it already owns. I'm not sure whether this has been fixed,
as I can't always read the hackers list, but I thought I'd mention it in
case someone finds it interesting.
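Hiroshi's point can be illustrated with a small C sketch. It assumes a simplified model: a test-and-set spinlock built on C11 `atomic_flag`, a per-process list of held locks, and a hypothetical `my_proc` pointer standing in for MyProc. None of these names are PostgreSQL's; in the real server the lock lives in shared memory and the waiter is a different backend, whereas here one process plays both roles. A cleanup routine that does nothing when `my_proc` is unset leaves the lock held, so the next acquirer spins until it gives up, which is the situation a "stuck spinlock" abort reports. Tracking held locks independently of the process entry lets exit-time cleanup release them either way.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test-and-set spinlock; not PostgreSQL's slock_t. */
typedef struct {
    atomic_flag flag;
} demo_spinlock;

#define MAX_HELD 8

/* Per-process record of held spinlocks, kept independently of the
 * per-backend entry so cleanup never depends on init having finished. */
static demo_spinlock *held_locks[MAX_HELD];
static int n_held = 0;

/* Hypothetical stand-in for MyProc: stays NULL until process init completes. */
static void *my_proc = NULL;

/* Spin up to max_spins times; false means the caller would report a
 * stuck spinlock and abort. */
static bool spin_acquire(demo_spinlock *lock, long max_spins)
{
    if (n_held >= MAX_HELD)
        return false;               /* no room to record another held lock */
    for (long i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire)) {
            held_locks[n_held++] = lock;
            return true;
        }
    }
    return false;
}

static void spin_release(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
    for (int i = 0; i < n_held; i++) {
        if (held_locks[i] == lock) {
            held_locks[i] = held_locks[--n_held];   /* swap-remove */
            break;
        }
    }
}

/* Releases every spinlock this process holds, initialized or not. */
static void release_all_spins(void)
{
    while (n_held > 0)
        spin_release(held_locks[n_held - 1]);
}

/* Cleanup in the style the quoted post describes: a no-op when the
 * per-process entry was never set up, so held locks leak. */
static void release_spins_if_initialized(void)
{
    if (my_proc != NULL)
        release_all_spins();
}
```

Under this model, a backend that fails early (for example, when semget returns ENOSPC) would call something like `release_all_spins()` on its exit path instead of a cleanup that first checks whether its process entry exists.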


We tried the same test with a massive number of connections on 6.5, and it
refuses to accept new connections after a while, which is good. I'm reading
through the archives about MaxBackendId now, so I'm going to play with that.
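The 6.5 behaviour of refusing connections once a limit is reached can be pictured as a simple admission check. This is a minimal sketch under assumed names (`MAX_BACKENDS`, `admit_connection` and `backend_exited` are hypothetical), not how the postmaster actually implements it. The point it shows is ordering: the check runs before a new backend allocates semaphores or takes any spinlocks, so a refused connection leaves nothing behind to clean up.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cap, standing in for a configured maximum number of backends. */
#define MAX_BACKENDS 32

static int active_backends = 0;

/* Called before a backend is started. Refusing here means no semaphores
 * or spinlocks have been acquired yet, so nothing can leak. */
static bool admit_connection(void)
{
    if (active_backends >= MAX_BACKENDS) {
        fprintf(stderr, "connection refused: %d backends already running\n",
                active_backends);
        return false;
    }
    active_backends++;
    return true;
}

/* Called when a backend exits, freeing its slot. */
static void backend_exited(void)
{
    if (active_backends > 0)
        active_backends--;
}
```

Choosing the limit so that the resulting semaphore demand fits within what the kernel allows would, in this model, avoid the semget failure shown earlier entirely.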



So anyway, we installed the 6.4.2 patch a few days ago, and it seems to
be running OK. I haven't seen any cases where we get processes waiting for
nothing (yet, anyway; I'll have to wait and see for a few days). However,
now we are getting the stuck spinlock errors due to too many backends
being open, which I'm trying to prevent now, so hopefully both of these
problems will go away.


Now that I've learned more about the stuck spinlock problem, I realise
that when I emailed the first time, it was not just one problem but two
or three at once, which made it harder to nail down what was going on.
We will watch it over the week.

We have also been doing some testing with the latest 6.5 from the other
day, to check that certain problems we've bumped into have been fixed. We
can't run it live, but we'll try to run our testing programs on it as a
best approximation to help flush out any bugs that might be left.


Thanks for your help everyone, I hope that this has been helpful for
everyone else as well. I'm really looking forward to 6.5  :)


bye,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au

