Thread: BUG #18030: Large memory consumption caused by in-clause subqueries

BUG #18030: Large memory consumption caused by in-clause subqueries

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18030
Logged by:          Zuming Jiang
Email address:      zuming.jiang@inf.ethz.ch
PostgreSQL version: 16beta2
Operating system:   Ubuntu 20.04
Description:

My fuzzer found a potential bug in Postgres: a query that consumes a large
amount of memory and eventually gets the Postgres server killed on my machine.

--- Test case ---
create table t0 (vkey int4, c0 int4, c5 text);
create table t4 (c28 text);

insert into t0 values (16, -32, 'hCV');
insert into t0 values (18, 59, '#;y=+');
insert into t0 values (33, 16, 'xyOG;');
insert into t0 values (40, -38, 'I|f');
insert into t0 values (44, -58, 'F');
insert into t0 values (47, 88, 'I');
insert into t0 values (52, -0, 'BGrK');
insert into t0 values (53, -7, ' T*k');

insert into t4 values ('E');
insert into t4 values ('l&^');
insert into t4 values ('');
insert into t4 values ('Z@~');
insert into t4 values (null::text);
insert into t4 values ('ngL@');
insert into t4 values (null::text);
insert into t4 values ('M+');

select * from t0
where
lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
      select
        ref_0.c28 as c_0
      from
        t4 as ref_0
      where t0.vkey >= 0)
---

--- Expected behavior ---
The test case should execute successfully.

--- Actual behavior ---
The Postgres server consumes a large amount of memory and is eventually killed.

--- Postgres version ---
Github commit: 8fab4b34801331f1c59352cb0a248be436b60aef
Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by clang
version 10.0.0-4ubuntu1 , 64-bit

--- Platform information ---
Platform: Ubuntu 20.04
Kernel: Linux 5.4.0-147-generic

--- Note ---
After I replace "where t0.vkey >= 0" with "where true" (which does not
change the semantics, because t0.vkey is always greater than 0), the SELECT
query runs much faster and consumes much less memory.
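
(Editorial sketch, not part of the original report: the difference the reporter
observed can be inspected with EXPLAIN. With the outer reference t0.vkey inside
the subquery, the planner has to use a correlated SubPlan that is re-executed
for every outer row; with "where true" the subquery is uncorrelated and can
typically be executed once and hashed.)

```sql
-- Correlated form: the outer reference t0.vkey forces a per-row SubPlan.
explain
select * from t0
where lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
    select ref_0.c28 from t4 as ref_0 where t0.vkey >= 0);

-- Uncorrelated form: the subquery can be evaluated once and hashed.
explain
select * from t0
where lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
    select ref_0.c28 from t4 as ref_0 where true);
```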


Re: BUG #18030: Large memory consumption caused by in-clause subqueries

From
Laurenz Albe
Date:
On Wed, 2023-07-19 at 08:49 +0000, PG Bug reporting form wrote:
> PostgreSQL version: 16beta2
>
> My fuzzer found a potential bug in Postgres: a query that consumes a large
> amount of memory and eventually gets the Postgres server killed on my machine.
>
> --- Test case ---
> create table t0 (vkey int4, c0 int4, c5 text);
> create table t4 (c28 text);
>
> insert into t0 values (16, -32, 'hCV');
> insert into t0 values (18, 59, '#;y=+');
> insert into t0 values (33, 16, 'xyOG;');
> insert into t0 values (40, -38, 'I|f');
> insert into t0 values (44, -58, 'F');
> insert into t0 values (47, 88, 'I');
> insert into t0 values (52, -0, 'BGrK');
> insert into t0 values (53, -7, ' T*k');
>
> insert into t4 values ('E');
> insert into t4 values ('l&^');
> insert into t4 values ('');
> insert into t4 values ('Z@~');
> insert into t4 values (null::text);
> insert into t4 values ('ngL@');
> insert into t4 values (null::text);
> insert into t4 values ('M+');
>
> select * from t0
> where
> lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
>       select
>         ref_0.c28 as c_0
>       from
>         t4 as ref_0
>       where t0.vkey >= 0)
>
> --- Actual behavior ---
> The Postgres server consumes a large amount of memory and is eventually killed.
>
> --- Postgres version ---
> Github commit: 8fab4b34801331f1c59352cb0a248be436b60aef
> Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by clang
> version 10.0.0-4ubuntu1 , 64-bit
>
> --- Note ---
> After I replace "where t0.vkey >= 0" with "where true" (which does not
> change the semantics, because t0.vkey is always greater than 0), the SELECT
> query runs much faster and consumes much less memory.

Strange.  When I run your test case on v16beta2 or Git HEAD, I get

 ERROR:  requested length too large

Commit 6217053f4e added some extra checks in that case, but you should
have that commit.  The symptoms you describe would match an attempt to
allocate a lot of memory, with the OOM killer killing PostgreSQL.

Yours,
Laurenz Albe



Re: BUG #18030: Large memory consumption caused by in-clause subqueries

From
David Rowley
Date:
On Wed, 19 Jul 2023 at 20:53, PG Bug reporting form
<noreply@postgresql.org> wrote:
> select * from t0
> where
> lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
>       select
>         ref_0.c28 as c_0
>       from
>         t4 as ref_0
>       where t0.vkey >= 0)
> ---
>
> --- Expected behavior ---
> The test case should execute successfully.
>
> --- Actual behavior ---
> The Postgres server consumes a large amount of memory and is eventually killed.

When I looked at this at first, I just thought it was a problem
because of the huge values you're passing to lpad, but on looking
again, I don't quite see why we need to keep the result of the lpad
around for the entire execution of the subquery. I wonder if we're
missing a ResetExprContext(econtext) inside ExecScanSubPlan().

The following example is a bit easier to follow:

explain analyze select * from (values(1),(2)) v(v)
where lpad(v::text, 1024*1024) in
  (select 'test' from generate_series(1,1000) where v.v > 0);

Currently, this will consume around 1GB of memory due to executing the
lpad once for each row of the subquery. But isn't it ok to just reset
the tuple context after fetching each row from the subquery? That
would mean we'd only allocate around 1MB at a time instead of 1GB.

David

Re: BUG #18030: Large memory consumption caused by in-clause subqueries

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> When I looked at this at first, I just thought it was a problem
> because of the huge values you're passing to lpad, but on looking
> again, I don't quite see why we need to keep the result of the lpad
> around for the entire execution of the subquery. I wonder if we're
> missing a ResetExprContext(econtext) inside ExecScanSubPlan().

> The following example is a bit easier to follow:

> explain analyze select * from (values(1),(2)) v(v) where lpad(v::text,
> 1024*1024) in (select 'test' from generate_series(1,1000) where v.v >
> 0);

> Currently, this will consume around 1GB of memory due to executing the
> lpad once for each row of the subquery. But isn't it ok to just reset
> the tuple context after fetching each row from the subquery? That
> would mean we'd only allocate around 1MB at a time instead of 1GB.

I doubt that that's okay, because if we are passing any pass-by-ref
params to the subquery (cf. loop loading ecxt_param_exec_vals[] at
nodeSubplan.c:257ff), this change would zap them.  But perhaps it is
worth setting up a second, shorter-lived context to evaluate the
per-row comparison expression in?

I also wonder if we could avoid evaluating the lpad() for each row
in the first place.  This'd require breaking down the comparison
expression more finely than we do now.  But certainly the average
user would be surprised that we evaluate lpad() per-subquery-row.

            regards, tom lane