Hello,
we noticed that when writing a large query of form
(((A OR B) OR C) OR ...)
with many terms (e.g. 13000) this causes one of two errors:
1. memory exhausted at or near "("
2. stack depth limit exceeded
This seems bugged/unfortunate, as writing
A OR B OR C OR ...
does not cause the errors.
This means the query fails depending whether or not the user provided parentheses in the query.
Note the query doesn't have to involve `OR`, it can be anyting involving many parentheses.
I tracked down the cause of the two errors:
* In the Bison parser:
#define YYMAXDEPTH 10000
http://git.savannah.gnu.org/cgit/bison.git/tree/src/parse-gram.c?h=v3.8.2#n1376
This causes error:
ERROR: memory exhausted at or near \"(\"
* In the expression simplifier:
simplify_function() at clauses.c:3921
calls expression_tree_mutator() at nodeFuncs.c:3117
calls eval_const_expressions_mutator() at clauses.c:2449
calls simplify_function()
This causes error:
stack depth limit exceeded
from
https://github.com/postgres/postgres/blob/REL_14_12/src/backend/tcop/postgres.c#L3481-L3498
A minimal repro for this can be found in:
https://github.com/questdb/questdb/issues/3841#issuecomment-2536599109
(I'm not affiilated with QuestDB, it seems to be a database also using Postgres's parser.)
Repeating the repro here:
#!/usr/bin/env python3
import sys
N = int(sys.argv[1])
exp = ""
for i in range(N):
exp += "lower("
exp += "'ABCDEFGHI'"
for i in range(N):
exp += ")"
assert exp.count("(") == exp.count(")")
print(exp)
Example invocation:
psql -h localhost -p 5432 postgres -c "select $(python3 expression.py 10000); select 1"
For playing with the limits in Postgres, the `YYMAXDEPTH 10000` constant can be changed by manually changing it
`gram.c`(generated from `gram.y`).
Both above problems stem from using a recursive descent parser in a language with limited stack size (C, or Bison's
manualstack implemented with `goto`).
The Bison parser runs before the expression simplifier.
So if the expression is too large, we get `memory exhausted`.
If the expression is small enough to make it to the simplifer, we get `stack depth limit exceeded` if it's too large
forthe simplier.
The simplifier stack limit can be adjusted with
max_stack_depth = 2MB (the default)
max_stack_depth = 7680kB (possible maximum)
and this maximum is of course still too low if we have tens of terms.
We consider this a bug because it means one cannot easily generate large queries that query a couple thousand entries
ina `SELECT ... WHERE` condition.
It would be great if postgres can alleviate these limits, or at least succeed to parse everything that's written with
parenthesesequally well when written with parentheses.
Thank you!
Niklas & Ruben