Thread: Use of ActiveSnapshot

Use of ActiveSnapshot

From
Jan Wieck
Date:
The use of ActiveSnapshot throughout the code appears rather dangerous 
to me. It is a global pointer, assumed not to be set yet in some places, 
assumed to be saved and restored by the caller in others. The actual 
(context) memory it points to is sometimes explicitly freed, sometimes 
just left in the context and thrown away by MemoryContextDelete() 
without resetting ActiveSnapshot to NULL.

The comment for the call of pg_plan_queries in util/cache/plancache.c 
line 469 for example is fatally wrong. Not only should the snapshot be 
set by all callers at this point, but if the call actually does replan 
the queries, the existing ActiveSnapshot is replaced with one allocated 
on the current memory context. If this happens to be inside of a nested 
SPI call sequence, the innermost SPI stack frame will free the snapshot 
data without restoring ActiveSnapshot to the one from the caller.

Either calling pg_plan_queries() with needSnapshot=false or saving and 
restoring ActiveSnapshot will prevent the backend from dumping core in 
the mentioned example, but I am not entirely sure as to which one is the 
right solution.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Use of ActiveSnapshot

From
Jan Wieck
Date:
On 5/12/2007 4:53 PM, Jan Wieck wrote:
> Either calling pg_plan_queries() with needSnapshot=false or saving and
> restoring ActiveSnapshot will prevent the backend from dumping core in
> the mentioned example, but I am not entirely sure as to which one is the
> right solution.

Attached is a self contained example that crashes the current backend.
It took me a moment to figure out exactly how to reproduce it. The
problem occurs when the query that needs replanning is actually a

   FOR record IN SELECT ...

that is inside of a nested function call. In that case, the revalidation
of the saved plan actually happens inside of SPI_cursor_open(), which
does not save and restore the ActiveSnapshot. Shouldn't it?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
create table t1 (a integer, b text, primary key (a));

create function f1 (integer) returns text as '
declare
  key        alias for $1;
  row        record;
begin
  for row in select a, b from t1 loop
    if row.a = key then
      return row.b;
    end if;
  end loop;
  return null;
end;
' language plpgsql;

create function f2 (integer) returns text as '
declare
  key        alias for $1;
  result    record;
  tmp        record;
begin
  select 5 as a, f1 as b into result from f1(key);
  return result.b;
end;
' language plpgsql;

insert into t1 values (1, 'one');
insert into t1 values (2, 'two');

select f2(1);
alter table t1 add column c text;
select f2(2);

Re: Use of ActiveSnapshot

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> The comment for the call of pg_plan_queries in util/cache/plancache.c 
> line 469 for example is fatally wrong. Not only should the snapshot be 
> set by all callers at this point, but if the call actually does replan 
> the queries, the existing ActiveSnapshot is replaced with one allocated 
> on the current memory context. If this happens to be inside of a nested 
> SPI call sequence, the innermost SPI stack frame will free the snapshot 
> data without restoring ActiveSnapshot to the one from the caller.

Yeah, I'd been meaning to go back and recheck that point after the code
settled down, but forgot :-(.

It is possible for RevalidateCachedPlan to be called with no snapshot
yet set --- at least the protocol Describe messages can do that.  I
don't want Describe to force a snapshot because that would be bad for
cases like LOCK TABLE at the start of a serializable transaction, so
RevalidateCachedPlan had better be able to cope with this case.

Since the "typical" case in which no replan is necessary won't touch
the snapshot, I think we'd better adopt the rule that
RevalidateCachedPlan never causes any caller-visible change in
ActiveSnapshot, else we'll be risking very-hard-to-reproduce bugs.
So my proposal is that RevalidateCachedPlan should set a snapshot for
itself if it needs to replan and ActiveSnapshot is NULL (else it might
as well just use the existing snap); and that it should save and restore
ActiveSnapshot when it does this.
        regards, tom lane


Re: Use of ActiveSnapshot

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> The only problem with that is that there are code paths that set 
> ActiveSnapshot to palloc()'d memory that is released due to a 
> MemoryContextDelete() without resetting ActiveSnapshot to NULL.

Only at the very end of a transaction (where ActiveSnapshot *is* reset
to null, in FreeXactSnapshot); otherwise we'd have bugs unrelated to
RevalidateCachedPlan.  Eventually I would like to have reference-counted
snapshots managed by a centralized module, as was discussed a month or
two back; but right at the moment I don't think it's broken and I don't
want to spend time on intermediate solutions.

> I think it would be cleaner if RevalidateCachedPlan()'s API would have a 
> Snapshot argument.

How does that improve anything?  AFAICS the only thing that would ever
get passed is ActiveSnapshot, so this is just more notation to do
exactly the same thing.
        regards, tom lane


Re: Use of ActiveSnapshot

From
Jan Wieck
Date:
On 5/14/2007 1:29 PM, Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> The comment for the call of pg_plan_queries in util/cache/plancache.c 
>> line 469 for example is fatally wrong. Not only should the snapshot be 
>> set by all callers at this point, but if the call actually does replan 
>> the queries, the existing ActiveSnapshot is replaced with one allocated 
>> on the current memory context. If this happens to be inside of a nested 
>> SPI call sequence, the innermost SPI stack frame will free the snapshot 
>> data without restoring ActiveSnapshot to the one from the caller.
> 
> Yeah, I'd been meaning to go back and recheck that point after the code
> settled down, but forgot :-(.
> 
> It is possible for RevalidateCachedPlan to be called with no snapshot
> yet set --- at least the protocol Describe messages can do that.  I
> don't want Describe to force a snapshot because that would be bad for
> cases like LOCK TABLE at the start of a serializable transaction, so
> RevalidateCachedPlan had better be able to cope with this case.
> 
> Since the "typical" case in which no replan is necessary won't touch
> the snapshot, I think we'd better adopt the rule that
> RevalidateCachedPlan never causes any caller-visible change in
> ActiveSnapshot, else we'll be risking very-hard-to-reproduce bugs.
> So my proposal is that RevalidateCachedPlan should set a snapshot for
> itself if it needs to replan and ActiveSnapshot is NULL (else it might
> as well just use the existing snap); and that it should save and restore
> ActiveSnapshot when it does this.

The only problem with that is that there are code paths that set 
ActiveSnapshot to palloc()'d memory that is released due to a 
MemoryContextDelete() without resetting ActiveSnapshot to NULL. So it 
might be possible for RevalidateCachedPlan to go ahead with an 
ActiveSnapshot pointing to garbage.

I think it would be cleaner if RevalidateCachedPlan()'s API would have a 
Snapshot argument. If it needs a snapshot and the argument is NULL, it 
can create (and free) one itself, otherwise it'd use the one given.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Use of ActiveSnapshot

From
Jan Wieck
Date:
On 5/14/2007 3:35 PM, Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> The only problem with that is that there are code paths that set 
>> ActiveSnapshot to palloc()'d memory that is released due to a 
>> MemoryContextDelete() without resetting ActiveSnapshot to NULL.
> 
> Only at the very end of a transaction (where ActiveSnapshot *is* reset
> to null, in FreeXactSnapshot); otherwise we'd have bugs unrelated to
> RevalidateCachedPlan.  Eventually I would like to have reference-counted
> snapshots managed by a centralized module, as was discussed a month or
> two back; but right at the moment I don't think it's broken and I don't
> want to spend time on intermediate solutions.

Which means that the 8.3 fix for the reproducible backend crash, I 
posted earlier, is to have SPI_cursor_open() save and restore 
ActiveSnapshot while calling RevalidateCachedPlan(). I'll cross check 
that this fixes this symptom and commit later today.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Use of ActiveSnapshot

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> Which means that the 8.3 fix for the reproducible backend crash, I 
> posted earlier, is to have SPI_cursor_open() save and restore 
> ActiveSnapshot while calling RevalidateCachedPlan(). I'll cross check 
> that this fixes this symptom and commit later today.

No, the correct fix is to do that inside RevalidateCachedPlan ... and I
already did it.
        regards, tom lane


Re: Use of ActiveSnapshot

From
Jan Wieck
Date:
On 5/14/2007 4:26 PM, Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> Which means that the 8.3 fix for the reproducible backend crash, I 
>> posted earlier, is to have SPI_cursor_open() save and restore 
>> ActiveSnapshot while calling RevalidateCachedPlan(). I'll cross check 
>> that this fixes this symptom and commit later today.
> 
> No, the correct fix is to do that inside RevalidateCachedPlan ... and I
> already did it.

Works for me. It fixed the Slony test that actually tripped over the 
bug. Thanks.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #