Re: mem context is not reset between extended stats - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: mem context is not reset between extended stats
Date
Msg-id cd3af9ee-fc08-c9b0-e7c9-d3eb6fb74df6@enterprisedb.com
Whole thread Raw
In response to mem context is not reset between extended stats  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: mem context is not reset between extended stats
List pgsql-hackers
On 9/15/21 10:09 PM, Justin Pryzby wrote:
> Memory allocation appeared be O(1) WRT the number of statistics objects, which
> was not expected to me.  This is true in v13 (and probably back to v10).
> 
> It seems to work fine to reset the memory context within the loop, so long as
> the statslist is allocated in the parent context.
> 

Yeah, and I agree this fix seems reasonable. Thanks for looking!

In principle we don't expect too many extended statistics on a single 
table, but building a single statistics may use quite a bit of memory, 
so it makes sense to release it early ...

But while playing with this a bit more, I discovered a much worse issue. 
Consider this:

   create table t (a text, b text, c text, d text,
                   e text, f text, g text, h text);

   insert into t select x, x, x, x, x, x, x, x from (
     select md5(mod(i,100)::text) as x
       from generate_series(1,30000) s(i)) foo;


   create statistics s (dependencies) on a, b, c, d, e, f, g, h from t;

   analyze t;

This ends up eating insane amounts of memory - on my laptop it eats 
~2.5GB and then crashes with OOM. This happens because each call to 
dependency_degree does build_sorted_items, which detoasts the values. 
And resetting the context can't fix that, because this happens while 
building a single statistics object.

IMHO the right fix is to run dependency_degree in a separate context, 
and reset it after each dependency. This releases the detoasted values, 
which are otherwise hard to deal with.

This does not mean we should not do what your patch does too. That does 
address various other "leaks" (for example MCV calls build_sorted_items 
too, but only once so it does not have this same issue).

These issues exist pretty much since PG10, which is where extended stats 
were introduced, so we'll have to backpatch it. But there's no rush and 
I don't want to interfere with rc1 at the moment.

Attached are two patches - 0001 is your patch (seems fine, but I looked 
only very briefly) and 0002 is the context reset I proposed.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade
Next
From: A Z
Date:
Subject: PostgreSQL High Precision Support Extension.