Thread: BUG #18030: Large memory consumption caused by in-clause subqueries
From: PG Bug reporting form <noreply@postgresql.org>
Date: Wed, 2023-07-19 08:49 +0000
The following bug has been logged on the website:

Bug reference:      18030
Logged by:          Zuming Jiang
Email address:      zuming.jiang@inf.ethz.ch
PostgreSQL version: 16beta2
Operating system:   Ubuntu 20.04

Description:

My fuzzer found a potential bug in Postgres that consumes a lot of memory
and finally gets the Postgres server killed on my machine.

--- Test case ---
create table t0 (vkey int4, c0 int4, c5 text);
create table t4 (c28 text);

insert into t0 values (16, -32, 'hCV');
insert into t0 values (18, 59, '#;y=+');
insert into t0 values (33, 16, 'xyOG;');
insert into t0 values (40, -38, 'I|f');
insert into t0 values (44, -58, 'F');
insert into t0 values (47, 88, 'I');
insert into t0 values (52, -0, 'BGrK');
insert into t0 values (53, -7, ' T*k');

insert into t4 values ('E');
insert into t4 values ('l&^');
insert into t4 values ('');
insert into t4 values ('Z@~');
insert into t4 values (null::text);
insert into t4 values ('ngL@');
insert into t4 values (null::text);
insert into t4 values ('M+');

select * from t0
where
  lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
    select
      ref_0.c28 as c_0
    from
      t4 as ref_0
    where t0.vkey >= 0);
---

--- Expected behavior ---
The test case should execute successfully.

--- Actual behavior ---
The Postgres server consumes a lot of memory and is finally killed.

--- Postgres version ---
Github commit: 8fab4b34801331f1c59352cb0a248be436b60aef
Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by clang
version 10.0.0-4ubuntu1, 64-bit

--- Platform information ---
Platform: Ubuntu 20.04
Kernel: Linux 5.4.0-147-generic

--- Note ---
After I replace "where t0.vkey >= 0" with "where true" (which does not
change the semantics, because t0.vkey is always greater than 0), the
SELECT query is much faster and consumes much less memory.
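A hedged aside on where the huge pad lengths come from, not stated in the thread itself: PostgreSQL's int4 << operator hands the shift count straight to C's <<, for which oversized or negative counts are undefined behavior. Assuming the common x86 behavior of masking the count to its low 5 bits, the lengths can be reconstructed directly (runnable against the tables above):

-- The masking assumption applies to the "shifted" column; int4mi is
-- just int4 subtraction, so a large negative shift result turns into
-- a huge positive pad length.
select vkey, c0, c0 << c0 as shifted,
       int4mi(vkey, c0 << c0) as pad_len
from t0;
-- For the row (18, 59): 59 << 59 behaves like 59 << 27, which wraps
-- around to -671088640, so pad_len = 18 - (-671088640) = 671088658,
-- i.e. lpad() builds a string of roughly 671 million characters for
-- that one row.

A single such evaluation is already most of a gigabyte, and as the discussion below shows, it is repeated for every row the correlated subquery returns.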
On Wed, 2023-07-19 at 08:49 +0000, PG Bug reporting form wrote:
> PostgreSQL version: 16beta2
>
> My fuzzer finds a potential bug in Postgres, which consumes lots of memory
> and finally kills the Postgres server on my machine.
>
> [...]
>
> --- Note ---
> After I replace "where t0.vkey >= 0" with "where true" (which does not
> change the semantics, because t0.vkey is always greater than 0), the
> SELECT query is much faster and consumes much less memory.

Strange.  When I run your test case on v16beta2 or Git HEAD, I get

  ERROR:  requested length too large

Commit 6217053f4e added some extra checks in that case, but you should
have that commit.

The symptoms you describe would match an attempt to allocate a lot of
memory, with the OOM killer killing PostgreSQL.

Yours,
Laurenz Albe
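The check Laurenz mentions is easy to hit directly; a minimal sketch, assuming a build that includes commit 6217053f4e (the error text is the one quoted above):

select lpad('x', 2147483647);
-- ERROR:  requested length too large

Presumably that check only fires for results beyond the allocation limit; lengths just below it, like the ~671 MB row worked out above, still allocate successfully on every evaluation, which is the path that can end in the OOM kill the reporter saw and that David digs into next.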
On Wed, 19 Jul 2023 at 20:53, PG Bug reporting form
<noreply@postgresql.org> wrote:
> select * from t0
> where
>   lpad(t0.c5, int4mi(t0.vkey, t0.c0 << t0.c0)) in (
>     select
>       ref_0.c28 as c_0
>     from
>       t4 as ref_0
>     where t0.vkey >= 0)
> ---
>
> --- Expected behavior ---
> The test case should be executed successfully.
>
> --- Actual behavior ---
> Postgres server consumes a lot of memory and finally gets killed

When I looked at this at first, I just thought it was a problem because
of the huge values you're passing to lpad, but on looking again, I don't
quite see why we need to keep the result of the lpad around for the
entire execution of the subquery.  I wonder if we're missing a
ResetExprContext(econtext) inside ExecScanSubPlan().

The following example is a bit easier to follow:

explain analyze select * from (values(1),(2)) v(v)
where lpad(v::text, 1024*1024) in
      (select 'test' from generate_series(1,1000) where v.v > 0);

Currently, this will consume around 1GB of memory due to executing the
lpad once for each row of the subquery.  But isn't it ok to just reset
the tuple context after fetching each row from the subquery?  That
would mean we'd only allocate around 1MB at a time instead of 1GB.

David
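For contrast, a hedged sketch of why the reporter's "where true" variant behaves so much better, assuming the usual planner behavior: with the correlation removed, the sublink can be pulled up into a join (or turned into a hashed SubPlan), so lpad() runs only once per outer row rather than once per subquery row:

explain (analyze, costs off)
select * from (values (1),(2)) v(v)
where lpad(v::text, 1024*1024) in
      (select 'test' from generate_series(1, 1000));
-- Expect a hash semi join or hashed SubPlan here instead of the plain
-- SubPlan of the correlated version: roughly 2 x 1MB of lpad results
-- instead of ~1000 x 1MB held at once.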
David Rowley <dgrowleyml@gmail.com> writes:
> When I looked at this at first, I just thought it was a problem
> because of the huge values you're passing to lpad, but on looking
> again, I don't quite see why we need to keep the result of the lpad
> around for the entire execution of the subquery.  I wonder if we're
> missing a ResetExprContext(econtext) inside ExecScanSubPlan().

> The following example is a bit easier to follow:

> explain analyze select * from (values(1),(2)) v(v)
> where lpad(v::text, 1024*1024) in
>       (select 'test' from generate_series(1,1000) where v.v > 0);

> Currently, this will consume around 1GB of memory due to executing the
> lpad once for each row of the subquery.  But isn't it ok to just reset
> the tuple context after fetching each row from the subquery?  That
> would mean we'd only allocate around 1MB at a time instead of 1GB.

I doubt that that's okay, because if we are passing any pass-by-ref
params to the subquery (cf. loop loading ecxt_param_exec_vals[] at
nodeSubplan.c:257ff), this change would zap them.  But perhaps it is
worth setting up a second, shorter-lived context to evaluate the
per-row comparison expression in?

I also wonder if we could avoid evaluating the lpad() for each row in
the first place.  This'd require breaking down the comparison
expression more finely than we do now.  But certainly the average user
would be surprised that we evaluate lpad() per-subquery-row.

			regards, tom lane
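Tom's last point, that the left-hand expression is re-evaluated once per subquery row, can be observed from plain SQL; a minimal sketch, where noisy_lpad is a hypothetical wrapper introduced here only to make the evaluation count visible:

create function noisy_lpad(s text, len int) returns text
language plpgsql as $$
begin
  -- emit one notice per evaluation so the call count is visible
  raise notice 'lpad evaluated for %', s;
  return lpad(s, len);
end;
$$;

select * from (values (1),(2)) v(v)
where noisy_lpad(v::text, 8) in
      (select 'test' from generate_series(1, 3) where v.v > 0);
-- If the comparison expression really is re-evaluated per subquery
-- row, this emits 2 (outer rows) x 3 (subquery rows) = 6 notices
-- rather than the 2 a user might expect.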