Re: BUG #6222: Segmentation fault on unlogged table - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #6222: Segmentation fault on unlogged table
Msg-id CA+TgmoY5t_y66hhudje=nuNF0FFaVvevSB5qpUhYpfTQaOP=hA@mail.gmail.com
In response to Re: BUG #6222: Segmentation fault on unlogged table  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #6222: Segmentation fault on unlogged table  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On Mon, Sep 26, 2011 at 11:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The whole thing is a bit mysterious because ExecQual() doesn't really
> do much that seems like it could generate an invalid memory reference.
>
> I'll poke at this some more...

I added some debugging code which sets a global variable to various
values as it executes this code.  I was still able to reproduce the
crash (but only with unlogged tables, as in your original report) and
the crash appears to be happening here:

+               where_did_we_crash = 104;
                expr_value = ExecEvalExpr(clause, econtext, &isNull, NULL);
+               where_did_we_crash = 105;

I end up with where_did_we_crash = 104 in the core dump.
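
The breadcrumb technique can be sketched in isolation as follows. This is
only a toy, not the actual debugging patch: instrumented_call(), eval_fn and
add_one() are hypothetical names, and the only thing borrowed from the real
code is the idea of a global that is set just before and just after the
suspect call, so that a core dump shows how far execution got.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical breadcrumb variable.  It is volatile so the compiler
 * neither drops the stores nor moves them across the call.
 */
volatile int where_did_we_crash = 0;

typedef int (*eval_fn) (int arg);

/* Toy stand-in for the instrumented call site. */
int
instrumented_call(eval_fn fn, int arg)
{
    int         result;

    assert(fn != NULL);

    where_did_we_crash = 104;   /* about to make the suspect call */
    result = fn(arg);           /* a crash in here leaves 104 behind */
    where_did_we_crash = 105;   /* reached only if the call returned */

    return result;
}

/* Toy callee that returns normally. */
int
add_one(int arg)
{
    return arg + 1;
}
```

If the callee returns, the variable ends up at 105; if the process dies
inside the call, the last value stored is 104, which is what the core dump
showed here.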

ExecEvalExpr is a macro which does this:

#define ExecEvalExpr(expr, econtext, isNull, isDone) \
        ((*(expr)->evalfunc) (expr, econtext, isNull, isDone))

I can't print out the value of "clause" directly, because the
invocation of ExecQual() doesn't even show up in the stack trace.  But
I can see from the backtrace that it's getting called by ExecScan()
with an argument of qual.  That qual is a one-element list, and its
only element is also a List.  That List contains a FuncExprState node
with an evalfunc of ExecEvalOper.  But unless I'm missing something,
that's no good, because ExecQual is only walking the outer list, not
the inner one.  And certainly if it tries to use a List object as an
ExprState, that's not going to work.
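
A toy model of the problem, for illustration only: everything named Toy* or
toy_* below is made up and much simpler than the real executor structures.
It assumes only what is described above, namely that each node begins with
a type tag, that an expression state carries an evalfunc pointer, and that
the walker visits only the outer list.  Instead of calling through a bogus
function pointer, the toy walker checks the tag first and returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node tags; the real NodeTag enum is far larger. */
typedef enum { T_Invalid = 0, T_List, T_FuncExprState } ToyTag;

typedef struct ToyExprState ToyExprState;
typedef int (*ToyEvalFunc) (ToyExprState *expr);

struct ToyExprState
{
    ToyTag      type;           /* tag is always the first member */
    ToyEvalFunc evalfunc;
};

typedef struct ToyList
{
    ToyTag      type;           /* tag is always the first member */
    int         length;
    void      **elems;
} ToyList;

/* Read the leading tag of any toy node. */
#define ToyIsA(ptr, tag) (*(const ToyTag *) (ptr) == (tag))

static int
toy_eval_true(ToyExprState *expr)
{
    (void) expr;
    return 1;
}

/*
 * Walk only the outer list.  Returns 1 if all clauses pass, 0 if one
 * fails, and -1 if an element is itself a List and so has no evalfunc.
 */
int
toy_exec_qual(const ToyList *qual)
{
    for (int i = 0; i < qual->length; i++)
    {
        ToyExprState *clause = qual->elems[i];

        if (ToyIsA(clause, T_List))
            return -1;          /* would be a wild call otherwise */
        if (!clause->evalfunc(clause))
            return 0;
    }
    return 1;
}

/* qual = (clause): the shape the walker expects. */
int
toy_demo_flat(void)
{
    ToyExprState clause = {T_FuncExprState, toy_eval_true};
    void       *outer_elems[] = {&clause};
    ToyList     qual = {T_List, 1, outer_elems};

    return toy_exec_qual(&qual);
}

/* qual = ((clause)): the nested shape seen in the backtrace. */
int
toy_demo_nested(void)
{
    ToyExprState clause = {T_FuncExprState, toy_eval_true};
    void       *inner_elems[] = {&clause};
    ToyList     inner = {T_List, 1, inner_elems};
    void       *outer_elems[] = {&inner};
    ToyList     qual = {T_List, 1, outer_elems};

    return toy_exec_qual(&qual);
}
```

In this model the flat qual evaluates normally, while the nested one is
caught by the tag check.  Without that check, the walker would interpret
the inner list's fields as an expression state and jump through whatever
happens to sit where evalfunc should be.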

To check my work, I did this:

--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -5003,6 +5003,7 @@ ExecQual(List *qual, ExprContext *econtext, bool resultForNull)
                Datum           expr_value;
                bool            isNull;

+               Assert(!IsA(clause, List));
                expr_value = ExecEvalExpr(clause, econtext, &isNull, NULL);

                if (isNull)

And in fact the test case (when run against the unlogged tables) fails
that assertion:

TRAP: FailedAssertion("!(!((((Node*)(clause))->type) == T_List))", File: "execQual.c", Line: 5006)

Now I'm not too sure why that is happening yet, but I'm leaning toward
the idea that the bug here is timing-related and that unlogged tables
aren't the cause, but rather just make it easier to hit whatever the
race condition is by removing some overhead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
