Thread: how to correctly react on exception in pfree function?

how to correctly react on exception in pfree function?

From
Pavel Stehule
Date:
Hi

I had a talk with Julien about the correct handling of an exception raised by pfree function.

Currently, this exception (elog(ERROR, "could not find block containing chunk %p", chunk);) is not specially handled ever. Because the check of pointer sanity is executed first (before any memory modification), then it is safe to repeatedly call pfree (but if I read code correctly, this behavior is not asserted or tested).

The question is - What is the correct action on this error. In the end, this exception means detection of memory corruption. One, and probably safe way is raising FATAL error.  But it looks like too hard a solution and not too friendly. Moreover, this way is not used in the current code base. 

The traditional solution is just raising the exception and doing nothing more. I didn't find code, where the exception from pfree is exactly handled. Similar issues with the possible exception from pfree can be in plan cache, plpgsql code cache, partially in implementation of update of plpgsql variable. Everywhere the implementation is not too strict - just the exception is raised, but the session continues (although in this moment we know so some memory is corrupted).

Is it a common strategy in Postgres?

Regards

Pavel

Re: how to correctly react on exception in pfree function?

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I had a talk with Julien about the correct handling of an exception raised
> by pfree function.

> Currently, this exception (elog(ERROR, "could not find block containing
> chunk %p", chunk);) is not specially handled ever.

There are hundreds, if not thousands, of "shouldn't ever happen" elogs
in Postgres.  We don't make any attempt to trap any of them.  Why do
you think this one should be different?

            regards, tom lane



Re: how to correctly react on exception in pfree function?

From
Julien Rouhaud
Date:
On Wed, Oct 12, 2022 at 07:24:53PM -0400, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > I had a talk with Julien about the correct handling of an exception raised
> > by pfree function.
> 
> > Currently, this exception (elog(ERROR, "could not find block containing
> > chunk %p", chunk);) is not specially handled ever.
> 
> There are hundreds, if not thousands, of "shouldn't ever happen" elogs
> in Postgres.  We don't make any attempt to trap any of them.  Why do
> you think this one should be different?

Because session variables are allocated in a persistent memory context, so
there's a code doing something like this to implement LET variable:

[...]
oldctxt = MemoryContextSwitchTo(SomePersistentContext);
newval = palloc(...);
MemoryContextSwitchTo(oldctxt);
/* No error should happen after that point or we leak memory */
pfree(var->val);
var->val = newval;
return;

Any error thrown in pfree would mean leaking memory forever in that backend.

Is it ok to leak memory in such should-not-happen case or should there be some
safeguard?



Re: how to correctly react on exception in pfree function?

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Oct 12, 2022 at 07:24:53PM -0400, Tom Lane wrote:
>> There are hundreds, if not thousands, of "shouldn't ever happen" elogs
>> in Postgres.  We don't make any attempt to trap any of them.  Why do
>> you think this one should be different?

> Because session variables are allocated in a persistent memory context, so
> there's a code doing something like this to implement LET variable:

> [...]
> oldctxt = MemoryContextSwitchTo(SomePersistentContext);
> newval = palloc(...);
> MemoryContextSwitchTo(oldctxt);
> /* No error should happen after that point or we leak memory */
> pfree(var->val);
> var->val = newval;
> return;

> Any error thrown in pfree would mean leaking memory forever in that backend.

I've got little sympathy for that complaint, because an error
in pfree likely means you have worse problems than a leak.
Moreover, what it most likely means is that you messed up your
memory allocations sometime earlier; making your code more
complicated makes that risk higher not lower.

Having said that, the above is really not very good code.
Better would look like

    newval = MemoryContextAlloc(SomePersistentContext, ...);
    ... fill newval ...
    oldval = var->val;
    var->val = newval;
    pfree(oldval);

which leaves your data structure in a consistent state whether
pfree fails or not (and regardless of whether it fails before or
after recycling your chunk).  That property is way more important
than the possibility of leaking some memory.

            regards, tom lane



Re: how to correctly react on exception in pfree function?

From
Julien Rouhaud
Date:
On Wed, Oct 12, 2022 at 10:34:29PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Wed, Oct 12, 2022 at 07:24:53PM -0400, Tom Lane wrote:
> >> There are hundreds, if not thousands, of "shouldn't ever happen" elogs
> >> in Postgres.  We don't make any attempt to trap any of them.  Why do
> >> you think this one should be different?
> 
> > Because session variables are allocated in a persistent memory context, so
> > there's a code doing something like this to implement LET variable:
> 
> > [...]
> > oldctxt = MemoryContextSwitchTo(SomePersistentContext);
> > newval = palloc(...);
> > MemoryContextSwitchTo(oldctxt);
> > /* No error should happen after that point or we leak memory */
> > pfree(var->val);
> > var->val = newval;
> > return;
> 
> > Any error thrown in pfree would mean leaking memory forever in that backend.
> 
> I've got little sympathy for that complaint, because an error
> in pfree likely means you have worse problems than a leak.

I agree, thus the question of what should be done in such case.

> Moreover, what it most likely means is that you messed up your
> memory allocations sometime earlier; making your code more
> complicated makes that risk higher not lower.
> 
> Having said that, the above is really not very good code.
> Better would look like
> 
>     newval = MemoryContextAlloc(SomePersistentContext, ...);
>     ... fill newval ...
>     oldval = var->val;
>     var->val = newval;
>     pfree(oldval);
> 
> which leaves your data structure in a consistent state whether
> pfree fails or not (and regardless of whether it fails before or
> after recycling your chunk).  That property is way more important
> than the possibility of leaking some memory.

Well, it was an over simplification to outline the reason for the question.

The real code has a dedicated function to cleanup the variable (used for other
cases, like ON TRANSACTION END RESET), and one of the callers will use that
function to implement LET so that approach isn't possible as-is.  Note that for
the other callers the code is written like that so the structure is consistent
whether pfree errors out or not.

We can change the API to accept an optional new value (and the few other needed
information) when cleaning the old one, but that's adding some complication
just to deal with a possible error in pfree.  So it still unclear to me what to
do here.



Re: how to correctly react on exception in pfree function?

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> We can change the API to accept an optional new value (and the few other needed
> information) when cleaning the old one, but that's adding some complication
> just to deal with a possible error in pfree.  So it still unclear to me what to
> do here.

I think it's worth investing some effort in ensuring consistency
of persistent data structures in the face of errors.  I doubt it's
worth investing effort in avoiding leaks in the face of errors.
In any case, thinking of it in terms of "trapping" errors is the
wrong approach.  We don't have a cheap or complication-free way
to do that, mainly because you can't trap just one error cause.

It may be worth looking at the GUC code, which has been dealing
with the same sorts of issues pretty successfully for many years.

            regards, tom lane



Re: how to correctly react on exception in pfree function?

From
Julien Rouhaud
Date:
On Wed, Oct 12, 2022 at 11:08:25PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > We can change the API to accept an optional new value (and the few other needed
> > information) when cleaning the old one, but that's adding some complication
> > just to deal with a possible error in pfree.  So it still unclear to me what to
> > do here.
> 
> I think it's worth investing some effort in ensuring consistency
> of persistent data structures in the face of errors.  I doubt it's
> worth investing effort in avoiding leaks in the face of errors.

So if e.g.

LET myvar = somebigstring;

errors out because of hypothetical pfree() error, it would be ok to leak that
memory as long as everything is consistent, meaning here that myvar is in a
normal "reset" state?

> In any case, thinking of it in terms of "trapping" errors is the
> wrong approach.  We don't have a cheap or complication-free way
> to do that, mainly because you can't trap just one error cause.
>
> It may be worth looking at the GUC code, which has been dealing
> with the same sorts of issues pretty successfully for many years.

The GUC code relies on malloc/free, so any hypothetical error during free
should abort and force an emergency shutdown.  I don't think it's ok for
session variables to bypass memory contexts, and forcing a panic wouldn't be
possible without some PG_TRY block.



Re: how to correctly react on exception in pfree function?

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Oct 12, 2022 at 11:08:25PM -0400, Tom Lane wrote:
>> It may be worth looking at the GUC code, which has been dealing
>> with the same sorts of issues pretty successfully for many years.

> The GUC code relies on malloc/free,

Not for much longer [1].  And no, I don't believe that that patch
makes any noticeable difference in the code's robustness.

In the end, a bug affecting these considerations is a bug to be fixed
once it's found.  Building potentially-themselves-buggy defenses against
hypothetical bugs is an exercise with rapidly diminishing returns.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/2982579.1662416866@sss.pgh.pa.us



Re: how to correctly react on exception in pfree function?

From
Julien Rouhaud
Date:
On Wed, Oct 12, 2022 at 11:34:32PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Wed, Oct 12, 2022 at 11:08:25PM -0400, Tom Lane wrote:
> >> It may be worth looking at the GUC code, which has been dealing
> >> with the same sorts of issues pretty successfully for many years.
> 
> > The GUC code relies on malloc/free,
> 
> Not for much longer [1].  And no, I don't believe that that patch
> makes any noticeable difference in the code's robustness.

Ok, so the new code still assumes that guc_free can't/shouldn't fail:

static void
set_string_field(struct config_string *conf, char **field, char *newval)
{
    char       *oldval = *field;

    /* Do the assignment */
    *field = newval;

    /* Free old value if it's not NULL and isn't referenced anymore */
    if (oldval && !string_field_used(conf, oldval))
        guc_free(oldval);
}


[...]

                        set_string_field(conf, &conf->reset_val, newval);
                        set_extra_field(&conf->gen, &conf->reset_extra,
                                        newextra);
                        conf->gen.reset_source = source;
                        conf->gen.reset_scontext = context;
                        conf->gen.reset_srole = srole;

Any error in guc_free will leave the struct in some inconsistent state and
possibly leak some data.  We can use the same approach for session variables.