Re: dsa_allocate() faliure - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: dsa_allocate() faliure
Date
Msg-id CAEepm=1C3t0B9yXDFtNgPDS0c--RZjDQuaCpFCaCaFUbPb6AFQ@mail.gmail.com
In response to Re: dsa_allocate() faliure  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: dsa_allocate() faliure  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: dsa_allocate() faliure  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Sun, Feb 10, 2019 at 5:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment  /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
I couldn't reproduce the problem after 90 minutes with a test case
that otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change.

It also removes an obsolete comment, which is in fact related.  The
comment refers to an output parameter internal_pages_used, which must
have been used to report this exact phenomenon in an earlier
development version.  But there is no such parameter in the committed
version, and instead there is the soft flag to prevent internal
allocation.  I have no view on which approach is better, but yeah, if
we're using a soft flag, it has to work reliably.
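
To make the soft-flag contract concrete, here is a toy model of the
singleton case discussed above.  It is only a sketch, not the code in
freepage.c: ToyFpm, toy_put() and the internal_allocs counter are
made-up names for illustration, the btree itself is left out, and the
return convention (0 meaning "nothing done") is an assumption of the
sketch.  The point is the early return: when soft is true, the
non-contiguous path must bail out before it allocates anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a free page manager. */
typedef struct ToyFpm
{
    unsigned btree_depth;      /* 0 means there is no btree yet */
    size_t   singleton_first;  /* first page of the single free range */
    size_t   singleton_npages; /* length of that range, 0 if empty */
    unsigned internal_allocs;  /* times we had to allocate btree pages */
} ToyFpm;

/*
 * Record that npages pages starting at first_page are free.
 *
 * Returns the size of the free span that now contains the pages, or 0
 * if soft is true and the request could only be satisfied by
 * allocating internal (btree) pages.
 */
static size_t
toy_put(ToyFpm *fpm, size_t first_page, size_t npages, bool soft)
{
    assert(npages > 0);

    if (fpm->btree_depth == 0)
    {
        /* Nothing tracked yet: store the range as the singleton. */
        if (fpm->singleton_npages == 0)
        {
            fpm->singleton_first = first_page;
            fpm->singleton_npages = npages;
            return npages;
        }

        /* New range directly follows the singleton: extend it. */
        if (fpm->singleton_first + fpm->singleton_npages == first_page)
        {
            fpm->singleton_npages += npages;
            return fpm->singleton_npages;
        }

        /* New range directly precedes the singleton: extend it. */
        if (first_page + npages == fpm->singleton_first)
        {
            fpm->singleton_first = first_page;
            fpm->singleton_npages += npages;
            return fpm->singleton_npages;
        }

        /*
         * Not contiguous; we would need to initialize the btree.  A
         * soft caller must not trigger that, so give up here.  Leaving
         * this check out is the kind of bug described above.
         */
        if (soft)
            return 0;

        fpm->internal_allocs++;     /* stands in for allocating a page */
        fpm->btree_depth = 1;
        return npages;
    }

    /* The btree case is omitted from this toy model. */
    return npages;
}
```

With a singleton covering pages 10..13, a soft put of pages 14..15
just extends the range, while a soft put of page 100 returns 0 and
leaves btree_depth and internal_allocs untouched.  Only a non-soft
put of page 100 is allowed to build the btree.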

This brings us to a difficult choice: we're about to cut a new
release, and this could in theory be included.  Even though the fix is
quite convincing, it doesn't seem wise to change such complicated code
at the last minute, and I know from an off-list chat that that is also
Robert's view.  So I'll wait until after the release, and we'll have
to live with the bug for another 3 months.

Note that this patch addresses the error "dsa_allocate could not find
%zu free pages".  (The error "dsa_area could not attach to segment" is
something else and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah.  And I love reproducible bugs.

-- 
Thomas Munro
http://www.enterprisedb.com