"memory exhausted" in query parser/simplifier for many nested parentheses - Mailing list pgsql-bugs

From Niklas Hambüchen
Subject "memory exhausted" in query parser/simplifier for many nested parentheses
Date
Msg-id 161892c0-65fc-42a6-8af9-b098879decd4@nh2.me
Whole thread Raw
Responses Re: "memory exhausted" in query parser/simplifier for many nested parentheses
Re: "memory exhausted" in query parser/simplifier for many nested parentheses
List pgsql-bugs
Hello,

we noticed that when writing a large query of form

    (((A OR B) OR C) OR ...)

with many terms (e.g. 13000) this causes one of two errors:

1. memory exhausted at or near "("
2. stack depth limit exceeded

This seems bugged/unfortunate, as writing

    A OR B OR C OR ...

does not cause the errors.
This means the query fails depending whether or not the user provided parentheses in the query.

Note the query doesn't have to involve `OR`, it can be anyting involving many parentheses.

I tracked down the cause of the two errors:

* In the Bison parser:

      #define YYMAXDEPTH 10000

  http://git.savannah.gnu.org/cgit/bison.git/tree/src/parse-gram.c?h=v3.8.2#n1376
  This causes error:
      ERROR:  memory exhausted at or near \"(\"

* In the expression simplifier:

            simplify_function()                at clauses.c:3921
      calls expression_tree_mutator()          at nodeFuncs.c:3117
      calls eval_const_expressions_mutator()   at clauses.c:2449
      calls simplify_function()

  This causes error:
      stack depth limit exceeded
  from
      https://github.com/postgres/postgres/blob/REL_14_12/src/backend/tcop/postgres.c#L3481-L3498

A minimal repro for this can be found in:

    https://github.com/questdb/questdb/issues/3841#issuecomment-2536599109

(I'm not affiilated with QuestDB, it seems to be a database also using Postgres's parser.)

Repeating the repro here:

    #!/usr/bin/env python3

    import sys

    N = int(sys.argv[1])

    exp = ""
    for i in range(N):
        exp += "lower("

    exp += "'ABCDEFGHI'"

    for i in range(N):
        exp += ")"

    assert exp.count("(") == exp.count(")")
    print(exp)

Example invocation:

    psql -h localhost -p 5432 postgres -c "select $(python3 expression.py 10000); select 1"

For playing with the limits in Postgres, the `YYMAXDEPTH 10000` constant can be changed by manually changing it
`gram.c`(generated from `gram.y`).
 

Both above problems stem from using a recursive descent parser in a language with limited stack size (C, or Bison's
manualstack implemented with `goto`).
 

The Bison parser runs before the expression simplifier.
So if the expression is too large, we get `memory exhausted`.
If the expression is small enough to make it to the simplifer, we get `stack depth limit exceeded` if it's too large
forthe simplier.
 
The simplifier stack limit can be adjusted with

    max_stack_depth = 2MB     (the default)
    max_stack_depth = 7680kB  (possible maximum)

and this maximum is of course still too low if we have tens of terms.

We consider this a bug because it means one cannot easily generate large queries that query a couple thousand entries
ina `SELECT ... WHERE` condition.
 

It would be great if postgres can alleviate these limits, or at least succeed to parse everything that's written with
parenthesesequally well when written with parentheses.
 

Thank you!
Niklas & Ruben



pgsql-bugs by date:

Previous
From: Artur Zakirov
Date:
Subject: Re: pg_dump crash on identity sequence with not loaded attributes
Next
From: Thomas Munro
Date:
Subject: Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails