Thread: BUG #2033: Assertion Failure: File: "procarray.c", Line: 492

BUG #2033: Assertion Failure: File: "procarray.c", Line: 492

From
"Joel Stevenson"
Date:
The following bug has been logged online:

Bug reference:      2033
Logged by:          Joel Stevenson
Email address:      joelstevenson@mac.com
PostgreSQL version: 8.1.0
Operating system:   RHEL 3 update 6
Description:        Assertion Failure: File: "procarray.c", Line: 492
Details:

Hi,

I'm running into an Assertion failure this morning w/8.1.0.  I believe it is
related to using the NOWAIT flag.  Here is the log message:

TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) != ((TransactionId)
0))
 : ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
492)


Postgres was configured using both --enable-debug and --enable-cassert.
Full config options were:

./configure CFLAGS=-O2 -pipe --with-perl --with-openssl
--enable-thread-safety --enable-debug --enable-cassert
--with-includes=/usr/kerberos/include

Some non-default postgresql.conf params:
max_connections = 150
ssl = on
shared_buffers = 4000
work_mem = 102400
maintenance_work_mem = 131072
max_stack_depth = 4096
commit_delay = 100
checkpoint_segments = 5
effective_cache_size = 173015
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = on
autovacuum_analyze_scale_factor = 0.1

I've removed the 'NOWAIT' for the time being, but thought I should mention
the issue.

Thanks,
Joel

Re: BUG #2033: Assertion Failure: File: "procarray.c", Line: 492

From
Tom Lane
Date:
"Joel Stevenson" <joelstevenson@mac.com> writes:
> I'm running into an Assertion failure this morning w/8.1.0.  I believe it is
> related to using the NOWAIT flag.  Here is the log message:

> TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) != ((TransactionId)
> 0))
>  : ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
> 492)

With no stack trace, and no information about how to reproduce the
error, I'm afraid this report is just about useless :-(

            regards, tom lane

Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
Joel Stevenson
Date:
Sorry, I realize this bug is a little lean on details.

PG is set up to serve multiple concurrent processes in a work queue
fashion, therefore these processes are in contention for the same
resources as the queue is basically a FIFO and I'm using SELECT ...
FOR UPDATE NOWAIT in each client to grab the next work segment in
line.

I've been running the setup on 8.1b3 and 8.1RC1 without this
particular Assertion failure, but had not been using the NOWAIT flag
until today.  During a period of perhaps 20 processes operating on
the queue it looks like postgres failed this assertion 17 times, the
first coming very shortly after the processes began and the rest
following in quick succession.

I do have the core files if further info is needed from them.

Thanks,
Joel


At 3:19 PM +0000 11/9/05, Joel Stevenson wrote:
>The following bug has been logged online:
>
>Bug reference:      2033
>Logged by:          Joel Stevenson
>Email address:      joelstevenson@mac.com
>PostgreSQL version: 8.1.0
>Operating system:   RHEL 3 update 6
>Description:        Assertion Failure: File: "procarray.c", Line: 492
>Details:
>
>Hi,
>
>I'm running into an Assertion failure this morning w/8.1.0.  I believe it is
>related to using the NOWAIT flag.  Here is the log message:
>
>TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) != ((TransactionId)
>0))
>  : ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
>492)
>
>
>Postgres was configured using both --enable-debug and --enable-cassert.
>Full config options were:
>
>./configure CFLAGS=-O2 -pipe --with-perl --with-openssl
>--enable-thread-safety --enable-debug --enable-cassert
>--with-includes=/usr/kerberos/include
>
>Some non-default postgresql.conf params:
>max_connections = 150
>ssl = on
>shared_buffers = 4000
>work_mem = 102400
>maintenance_work_mem = 131072
>max_stack_depth = 4096
>commit_delay = 100
>checkpoint_segments = 5
>effective_cache_size = 173015
>stats_start_collector = on
>stats_command_string = on
>stats_block_level = on
>stats_row_level = on
>stats_reset_on_server_start = on
>autovacuum = on
>autovacuum_analyze_scale_factor = 0.1
>
>I've removed the 'NOWAIT' for the time being, but thought I should mention
>the issue.
>
>Thanks,
>Joel
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
"Jim C. Nasby"
Date:
A backtrace from one of the cores would be a good start.

On Wed, Nov 09, 2005 at 07:44:01AM -0800, Joel Stevenson wrote:
> Sorry, I realize this bug is a little lean on details.
>
> PG is set up to serve multiple concurrent processes in a work queue
> fashion, therefore these processes are in contention for the same
> resources as the queue is basically a FIFO and I'm using SELECT ...
> FOR UPDATE NOWAIT in each client to grab the next work segment in
> line.
>
> I've been running the setup on 8.1b3 and 8.1RC1 without this
> particular Assertion failure, but had not been using the NOWAIT flag
> until today.  During a period of perhaps 20 processes operating on
> the queue it looks like postgres failed this assertion 17 times, the
> first coming very shortly after the processes began and the rest
> following in quick succession.
>
> I do have the core files if further info is needed from them.
>
> Thanks,
> Joel
>
>
> At 3:19 PM +0000 11/9/05, Joel Stevenson wrote:
> >The following bug has been logged online:
> >
> >Bug reference:      2033
> >Logged by:          Joel Stevenson
> >Email address:      joelstevenson@mac.com
> >PostgreSQL version: 8.1.0
> >Operating system:   RHEL 3 update 6
> >Description:        Assertion Failure: File: "procarray.c", Line: 492
> >Details:
> >
> >Hi,
> >
> >I'm running into an Assertion failure this morning w/8.1.0.  I believe it
> >is
> >related to using the NOWAIT flag.  Here is the log message:
> >
> >TRAP: FailedAssertion("!(serializable ? !((MyProc->xmin) !=
> >((TransactionId)
> >0))
> > : ((MyProc->xmin) != ((TransactionId) 0)))", File: "procarray.c", Line:
> >492)
> >
> >
> >Postgres was configured using both --enable-debug and --enable-cassert.
> >Full config options were:
> >
> >./configure CFLAGS=-O2 -pipe --with-perl --with-openssl
> >--enable-thread-safety --enable-debug --enable-cassert
> >--with-includes=/usr/kerberos/include
> >
> >Some non-default postgresql.conf params:
> >max_connections = 150
> >ssl = on
> >shared_buffers = 4000
> >work_mem = 102400
> >maintenance_work_mem = 131072
> >max_stack_depth = 4096
> >commit_delay = 100
> >checkpoint_segments = 5
> >effective_cache_size = 173015
> >stats_start_collector = on
> >stats_command_string = on
> >stats_block_level = on
> >stats_row_level = on
> >stats_reset_on_server_start = on
> >autovacuum = on
> >autovacuum_analyze_scale_factor = 0.1
> >
> >I've removed the 'NOWAIT' for the time being, but thought I should mention
> >the issue.
> >
> >Thanks,
> >Joel
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 4: Have you searched our list archives?
> >
> >               http://archives.postgresql.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
Joel Stevenson
Date:
At 11:42 AM -0500 11/9/05, Tom Lane wrote:
>Joel Stevenson <joelstevenson@mac.com> writes:
>>  gdb's 'bt' on one of the core files produces:
>
>>  #0  0x00138eff in ?? ()
>>  #1  0x0017ec8d in ?? ()
>>  #2  0x00246cd8 in ?? ()
>>  #3  0x00000000 in ?? ()
>
>This looks like you have a "stripped" executable, which is a bit odd
>considering you said that you built with --enable-debug.  Please
>check that the installed executable is what you think it is ...

Hmmm, I've moved all previous (beta, RC) PG base directories and
re-installed 8.1.0 again with --enable-debug and --enable-cassert.
During the make process I saw the cc lines floating past with the -g
option.

The commands were:

$ ./configure --with-perl --with-openssl --enable-thread-safety
--enable-debug --enable-cassert --with-includes=/usr/kerberos/include
$ make -j4
$ sudo make install
$ cp /usr/local/pgsql.old/data /usr/local/pgsql
$ sudo su postgres
$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data start
$ pkill -ABRT postmaster

This produced a bunch of core files that show the following backtrace:

#0  0x001ea038 in ?? ()
#1  0xbfffa4d8 in ?? ()
#2  0xbfffa5e0 in ?? ()
#3  0xbfffa560 in ?? ()
#4  0x08180844 in ?? ()
#5  0x00000005 in ?? ()
#6  0xbfffa4e0 in ?? ()
#7  0x00000000 in ?? ()

I'm at a loss as to what to do about it, really; is there a hidden
configure flag or something that could be in my environment that's
causing the executable to be stripped?  I see various strip related
things within the Makefile, but I definitely didn't make the
install-strip target.

>Is there any chance of putting together a self-contained test case
>that other people could try?

The simple test case so far is not causing the assertion failure,
perhaps because it is just a simple approximation of the case.  At
this point I'm more concerned about getting a non-stripped PG
installed so that should this error or another occur the core file
will have perhaps helpful information.

Thanks in advance for any pointers,

Joel

Re: BUG #2033: Assertion Failure: File: "procarray.c", Line: 492

From
Tom Lane
Date:
Joel Stevenson <joelstevenson@mac.com> writes:
> This produced a bunch of core files that show the following backtrace:

> #0  0x001ea038 in ?? ()
> #1  0xbfffa4d8 in ?? ()
> #2  0xbfffa5e0 in ?? ()
> #3  0xbfffa560 in ?? ()
> #4  0x08180844 in ?? ()
> #5  0x00000005 in ?? ()
> #6  0xbfffa4e0 in ?? ()
> #7  0x00000000 in ?? ()

> I'm at a loss as to what to do about it, really; is there a hidden
> configure flag or something that could be in my environment that's
> causing the executable to be stripped?

Nondefault LDFLAGS maybe?  Look at the commands that link all the
SUBSYS.o files to produce the postgres executable.

Also, on most Unixen "file /path/to/postgres" will tell the difference
between a stripped and nonstripped executable.

If "file" says that the executable isn't stripped, then the problem is
either with gdb itself or with the way you're invoking it.  Could we see
a cut-and-paste of the whole terminal session with gdb, rather than just
the result of the bt part?

            regards, tom lane

Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
Joel Stevenson
Date:
Thanks Tom, and apologies for the confusion; not having worked w/gdb
before I was invoking it incorrectly.  Here's what the core file
produced from the assertion failure has for a backtrace:

$ gdb ./bin/postgres data/core.20633
(gdb) bt
#0  0x00138eff in raise () from /lib/tls/libc.so.6
#1  0x0013a705 in abort () from /lib/tls/libc.so.6
#2  0x0820dc66 in ExceptionalCondition (
     conditionName=0x6 <Address 0x6 out of bounds>,
     errorType=0x5099 <Address 0x5099 out of bounds>, fileName=0x0,
lineNumber=492)
     at assert.c:51
#3  0x081987bd in GetSnapshotData (snapshot=0x8303f04, serializable=0 '\0')
     at procarray.c:514
#4  0x082293df in GetTransactionSnapshot () at tqual.c:1265
#5  0x081a8eb8 in PortalStart (portal=0x9b09a24, params=0x9b1c5dc,
snapshot=0x0)
     at pquery.c:375
#6  0x081a6286 in exec_bind_message (input_message=0xbfffa2e0) at
postgres.c:1643
#7  0x081a7fb2 in PostgresMain (argc=4, argv=0x9aaa50c,
username=0x9aaa4e4 "joels")
     at postgres.c:3205
#8  0x08182a45 in BackendRun (port=0x9ac8d58) at postmaster.c:2854
#9  0x08182529 in BackendStartup (port=0x9ac8d58) at postmaster.c:2498
#10 0x08180a57 in ServerLoop () at postmaster.c:1231
#11 0x0817fee5 in PostmasterMain (argc=3, argv=0x9aa8478) at postmaster.c:943
#12 0x08147a37 in main (argc=3, argv=0x9aa8478) at main.c:256


-Joel

Re: BUG #2033: Assertion Failure: File: "procarray.c", Line: 492

From
Tom Lane
Date:
Joel Stevenson <joelstevenson@mac.com> writes:
> Thanks Tom, and apologies for the confusion; not having worked w/gdb
> before I was invoking it incorrectly.  Here's what the core file
> produced from the assertion failure has for a backtrace:

OK, now that that's straightened out, there are some more things to
look at in the core file.  Please try these gdb commands:

    p *MyProc
    p *SerializableSnapshot
    p *CurrentTransactionState

            regards, tom lane

Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
Joel Stevenson
Date:
At 3:24 PM -0500 11/9/05, Tom Lane wrote:
>Joel Stevenson <joelstevenson@mac.com> writes:
>>  Thanks Tom, and apologies for the confusion; not having worked w/gdb
>>  before I was invoking it incorrectly.  Here's what the core file
>>  produced from the assertion failure has for a backtrace:
>
>OK, now that that's straightened out, there are some more things to
>look at in the core file.  Please try these gdb commands:
>
>    p *MyProc
>    p *SerializableSnapshot
>    p *CurrentTransactionState
>
>            regards, tom lane

Ok, here are the results:

(gdb) p *MyProc
$1 = {links = {prev = 4294967295, next = 4294967295}, sem = {semId = 8880136,
     semNum = 10}, waitStatus = 0, xid = 0, xmin = 0, pid = 20633,
   databaseId = 16385, roleId = 16384, lwWaiting = 0 '\0', lwExclusive
= 1 '\001',
   lwWaitLink = 0x0, waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0,
   heldLocks = 0, procLocks = {prev = 34618656, next = 34618656}, subxids = {
     overflowed = 0 '\0', nxids = 0, xids = {0 <repeats 64 times>}}}

(gdb) p *SerializableSnapshot
$2 = {xmin = 1068152, xmax = 1068155, xcnt = 2, xip = 0x9b142b8, curcid = 0}

(gdb) p *CurrentTransactionState
$3 = {transactionId = 1068154, subTransactionId = 1, name = 0x0,
savepointLevel = 0,
   state = TRANS_ABORT, blockState = TBLOCK_ABORT, nestingLevel = 1,
   curTransactionContext = 0x9b06a9c, curTransactionOwner = 0x9ab0294,
   childXids = 0x0, currentUser = 0, prevXactReadOnly = 0 '\0', parent = 0x0}

-Joel

Re: BUG #2033: Assertion Failure: File: "procarray.c", Line: 492

From
Tom Lane
Date:
Joel Stevenson <joelstevenson@mac.com> writes:
> (gdb) p *CurrentTransactionState
> $3 = {transactionId = 1068154, subTransactionId = 1, name = 0x0, 
> savepointLevel = 0,
>    state = TRANS_ABORT, blockState = TBLOCK_ABORT, nestingLevel = 1,
>    curTransactionContext = 0x9b06a9c, curTransactionOwner = 0x9ab0294,
>    childXids = 0x0, currentUser = 0, prevXactReadOnly = 0 '\0', parent = 0x0}

Ah-hah, that tells the tale: you're in an already-failed transaction,
and so there are large parts of the backend that Just Won't Work.

What is missing here, I think, is a test in exec_bind_message()
similar to what's in exec_execute_message(): bail out early if the
transaction is aborted and the command in the portal is anything but
ROLLBACK/COMMIT/PREPARE.  In this case you are trying to bind to a
portal containing a SELECT statement (the line number in pquery.c proves
this), and since you couldn't Execute that, there seems little need to
let you Bind it.

Kinda surprising this hasn't been reported before; the bug or something
close to it doubtless exists in 7.4 as well.
        regards, tom lane


Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
Tom Lane
Date:
Joel Stevenson <joelstevenson@mac.com> writes:
> I've been running the setup on 8.1b3 and 8.1RC1 without this
> particular Assertion failure, but had not been using the NOWAIT flag
> until today.  During a period of perhaps 20 processes operating on
> the queue it looks like postgres failed this assertion 17 times, the
> first coming very shortly after the processes began and the rest
> following in quick succession.

BTW, I imagine that the apparent correlation to NOWAIT is occurring
because your client-side code was not checking to see if the NOWAIT
query had failed before trying to launch another query in the same
transaction.

            regards, tom lane

Re: BUG #2033: Assertion Failure: File: "procarray.c",

From
Joel Stevenson
Date:
At 4:09 PM -0500 11/9/05, Tom Lane wrote:
>Joel Stevenson <joelstevenson@mac.com> writes:
>>  I've been running the setup on 8.1b3 and 8.1RC1 without this
>>  particular Assertion failure, but had not been using the NOWAIT flag
>>  until today.  During a period of perhaps 20 processes operating on
>>  the queue it looks like postgres failed this assertion 17 times, the
>>  first coming very shortly after the processes began and the rest
>>  following in quick succession.
>
>BTW, I imagine that the apparent correlation to NOWAIT is occurring
>because your client-side code was not checking to see if the NOWAIT
>query had failed before trying to launch another query in the same
>transaction.

That is indeed what the code was doing.  I'd added NOWAIT to the
SELECT statement and was catching the error that was being thrown
from the NOWAIT behavior, but didn't think that the error would
invalidate the current transaction (which was not a prepared
transaction.)  The fact that it does kill the transaction makes
perfect sense.  I'd been thinking of NOWAIT all wrong: as a "return
on block", but it's an "error on block".

Thanks again, Tom.

-Joel