On Thu, 2005-02-10 at 02:37 +0000, Brian B. wrote:
> I am loading some spam/ham data/tokens, to be used for the dspam anti-spam
> software, into PostgreSQL. After a few hours of inserting and updating the
> existing data, the backend crashes with a signal 10 (bus error). I am also
> running an ANALYZE command to help keep the indexes up to date every 10
> minutes.
As far as I can tell, ANALYZE is not at issue here.
> I have rebuilt PostgreSQL with debugging enabled and have a
> resulting core dump. I am not sure who to send this to, however. The
> backtrace using the core file is as follows:
>
> #0 0x80c4b52 in transformExpr (pstate=0x86fa630, expr=0x85ec148)
> at parse_expr.c:81
> #1 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec1d0)
> at parse_expr.c:270
> #2 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec258)
> at parse_expr.c:270
> #3 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec2e0)
> at parse_expr.c:270
>
> [... repeat above several thousand times ...]
It seems what's happening here is that dspam is submitting a query with
many thousands of elements in the IN clause. In the parser, we transform
"foo IN (a, b, c)" into "foo = a OR foo = b OR foo = c", and then
recurse for each element of the OR expression and eventually run out of
stack space. (Note that this will actually be worse in HEAD, since a
refactoring I applied will mean we consume two stack frames for each
expression.)
A workaround would be to increase PostgreSQL's stack size.
Perhaps it would be worth considering representing IN lists as a
distinct expression type, at least in the parser. Then the
transformExpr() code would look like:
foreach (element of IN list)
transformExpr(element);
... do whatever else ...
so we wouldn't need to recurse. We could then transform the new
expression type into a list of OR clauses at this point.
-Neil