On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Sometimes FreeManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one. If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation. Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.
I spent a long time thinking about this and starting at code this
afternoon, but I didn't really come up with much of anything useful.
It seems like a strange failure mode, because
FreePageManagerPutInternal() normally just returns its third argument
unmodified. The only cases where anything else happens are the ones
where we're able to consolidate the returned span with a preceding or
following span, and I'm scratching my head as to how that logic could
be wrong, especially since it also has some Assert() statements that
seem like they would detect the kinds of inconsistencies that would
lead to trouble. For example, if we somehow ended up with two spans
that (improperly) overlapped, we'd trip an Assert(). And if that
didn't happen -- because we're not in an Assert-enabled build -- the
code is written so that it only relies on the npages value of the last
of the consolidated scans, so an error in the npages value of one of
the earlier spans would just get fixed up.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company