Thread: Stuck Spinlock Error Message

From
Ludwig Isaac Lim
Date:
 Hi:

    I noticed the following error message in my
PostgreSQL log file:

FATAL : s_lock (0x401db020) at lwlock.c Stuck spinlock. Aborting

 My PostgreSQL version:

PostgreSQL 7.2.3 on i686-pc-linux-gnu compiled by GCC 2.96

Operating system: Red Hat 7.1

     What can cause a stuck spinlock?

Thanks in advance,
ludwig lim




Re: Stuck Spinlock Error Message

From
Tom Lane
Date:
Ludwig Isaac Lim <ludz_lim@yahoo.com> writes:
>      What can cause a stuck spinlock?

In theory, that shouldn't ever happen.  Can you reproduce it?

            regards, tom lane

Re: Stuck Spinlock Error Message

From
Ludwig Isaac Lim
Date:
Hi:

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >      What can cause a stuck spinlock?
>
> In theory, that shouldn't ever happen.  Can you
> reproduce it?
>
>             regards, tom lane

   I could not reproduce it, but I'll describe how the
error happened. I have a program that reads a large
file containing 20,000 records and spawns processes that
each execute a PL/pgSQL stored function based on the
content of the file.

    The following is a table of the SQL statements
generated:
    process 1         SELECT f1(120,  123.3);
    process 2         SELECT f1(120,   53.3);
    process 3         SELECT f1(120,   31.3);
    ..
    ..
    process n         SELECT f1(120,    2.3);

  The function f1 is basically defined as:

  CREATE OR REPLACE FUNCTION f1(integer, float8)
  RETURNS INTEGER
  AS '
  DECLARE
     -- some variable declarations
  BEGIN
     -- Lock the row selected by the first parameter
     -- of the stored function (record-level lock)
     SELECT *
     FROM   t1
     WHERE  field1 = $1
     FOR UPDATE;
     -- a batch of SQL statements here --
  END;'
  LANGUAGE 'plpgsql';

  As you noticed, the first parameter of the called
function is the same in every call (due to a bug in our
program). Since the function takes a record-level lock on
that record, the processes queue up (i.e. each one runs
only after another process releases its lock). I'm
guessing that there were just too many postmaster
processes concurrently trying to access the same locked
record. When I ran the "top" command in Linux, there
were a lot of postmaster processes in the process list.
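
  The queueing described above can be sketched outside the
database. This is only an illustration in Python, not the
poster's program: run_calls is a hypothetical helper, a
threading.Lock per key stands in for the row lock taken by
SELECT ... FOR UPDATE, and a short sleep stands in for the
batch of SQL statements. Under those assumptions, calls that
share the same first parameter run strictly one at a time,
however many are started.

```python
import threading
import time


def run_calls(calls, work_s=0.001):
    """Run one thread per (key, amount) call and return the peak number
    of threads that were inside the locked section at the same time."""
    # Assumption: one lock per distinct key models the row lock on the
    # row with field1 = key. Locks are created before any thread starts.
    row_locks = {key: threading.Lock() for key, _ in calls}
    state = {"inside": 0, "peak": 0}
    counter_guard = threading.Lock()

    def f1(key, amount):
        with row_locks[key]:          # waits until the current holder is done
            with counter_guard:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(work_s)        # stands in for the batch of SQL statements
            with counter_guard:
                state["inside"] -= 1

    threads = [threading.Thread(target=f1, args=call) for call in calls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


# Every call uses key 120, as in the table of generated statements.
same_key_calls = [(120, 123.3), (120, 53.3), (120, 31.3), (120, 2.3)]
peak = run_calls(same_key_calls)
```

  With every call using key 120, peak comes out as 1: all
other workers are alive but waiting, which matches the long
list of backend processes seen in "top".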

   Is the spinlock error possible given that scenario?
Is it related to the following error message?
   fatal 2: cannot write block 3 of 16556/148333 blind: too many open files in system.

   Note: I was able to get rid of the above error
message by increasing the file-max parameter in
"sysctl.conf".
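
   On Linux, one way to see how close the system is to the
file-max limit is to read /proc/sys/fs/file-nr, which holds
three numbers: allocated file handles, allocated but unused
handles, and the maximum. The sketch below is a hypothetical
helper, not something used in this thread, and the sample
values are made up for illustration.

```python
def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr into a dict."""
    allocated, unused, maximum = (int(field) for field in text.split())
    in_use = allocated - unused
    return {
        "allocated": allocated,
        "unused": unused,
        "max": maximum,
        "headroom": maximum - in_use,  # handles left before the limit is hit
    }


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())


# Made-up sample content, in the same three-field layout.
sample = parse_file_nr("7800\t120\t8192\n")
```

   A small headroom value while many backends are running
would suggest that the limit is about to be reached again.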

   I'm guessing that the spinlock error occurs after
there are around hundreds (or thousands) of queued
postmaster processes.

best regards,
ludwig



Re: Stuck Spinlock Error Message

From
Tom Lane
Date:
Ludwig Isaac Lim <ludz_lim@yahoo.com> writes:
>    I'm guessing that the spinlock error occurs after
> there are around hundreds (or thousands) of queued
> postmaster processes.

Thousands?  How large is your max_connections parameter, anyway
(and do you really have big enough iron to support it)?

The stuck spinlock error implies that some work that should have
taken a fraction of a microsecond (namely the time to check and update
the internal state of an LWLock structure) took upwards of a minute.

Since the process holding the spinlock could lose the CPU, it's
certainly physically possible for the actual duration of holding the
spinlock to be much more than a microsecond.  But the odds of losing
the CPU while holding the spinlock are not large, since it's held for
just a small number of instructions.  And to get an actual "stuck
spinlock" failure would imply that the holding process didn't get
scheduled again for more than a minute (while some other process that
wanted the spinlock *did* get scheduled again --- repeatedly).  I
suppose this is possible if your machine is sufficiently badly
overloaded.

            regards, tom lane
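
The failure mode explained above can be sketched as a toy
spinlock that gives up after a timeout. This is not
PostgreSQL's s_lock code: SpinLock and StuckSpinlockError are
hypothetical names, a non-blocking threading.Lock acquire
stands in for the hardware test-and-set instruction, and the
60-second default only mirrors the "upwards of a minute"
figure mentioned in the thread.

```python
import threading
import time


class StuckSpinlockError(RuntimeError):
    """Raised when a waiter gives up on the lock (hypothetical name)."""


class SpinLock:
    """Toy test-and-set spinlock with a give-up timeout."""

    def __init__(self, timeout_s=60.0, spin_delay_s=0.001):
        # Assumption: threading.Lock models the test-and-set flag.
        self._flag = threading.Lock()
        self.timeout_s = timeout_s
        self.spin_delay_s = spin_delay_s

    def acquire(self):
        deadline = time.monotonic() + self.timeout_s
        # Each non-blocking acquire is one test-and-set attempt.
        while not self._flag.acquire(blocking=False):
            if time.monotonic() >= deadline:
                raise StuckSpinlockError("stuck spinlock; aborting")
            time.sleep(self.spin_delay_s)  # back off between attempts

    def release(self):
        self._flag.release()


def demo(timeout_s=0.1):
    """Holder keeps the lock while a waiter spins until it times out."""
    lock = SpinLock(timeout_s=timeout_s)
    lock.acquire()                 # the holder takes the lock
    results = []

    def waiter():
        try:
            lock.acquire()
            lock.release()
            results.append("acquired")
        except StuckSpinlockError:
            results.append("stuck")

    t = threading.Thread(target=waiter)
    t.start()
    t.join()                       # holder does nothing until the waiter gives up
    lock.release()
    return results


outcome = demo()
```

Here the holder never releases the lock while the waiter is
spinning, so the waiter keeps getting scheduled, keeps failing
the test-and-set, and eventually reports the lock as stuck. In
normal operation the holder releases after a few instructions
and the waiter succeeds on an early attempt.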