Thread: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Matthias van de Meent
Date:
Hi,

In Neon, we've had to modify the GIN fast insertion path as attached,
due to an unexpected XLOG insertion and buffer locking ordering issue.

The xlog readme [0] mentions that the common order of operations is 1)
pin and lock any buffers you need for the log record, then 2) start a
critical section, then 3) call XLogBeginInsert.
In Neon, we rely on this documented order of operations to expect to
be able to WAL-log hint pages (freespace map, visibility map) when
they are written to disk (e.g. cache eviction, checkpointer). In
general, this works fine, except that in ginHeapTupleFastInsert we
call XLogBeginInsert() before the last of the buffers for the eventual
record was read, thus creating a path where eviction is possible in a
`begininsert_called = true` context. That breaks our current code by
being unable to evict (WAL-log) the dirtied hint pages.

PFA a patch that rectifies this issue, by moving the XLogBeginInsert()
down to where 1.) we have all relevant buffers pinned and locked, and
2.) we're in a critical section, making that part of the code
consistent with the general scheme for XLog insertion.

Kind regards,

Matthias van de Meent

[0] access/transam/README, section "Write-Ahead Log Coding", line 436-470

Attachment

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Zhang Mingli
Date:
HI,

On Sep 8, 2022, 19:08 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:
 In general, this works fine, except that in ginHeapTupleFastInsert we
call XLogBeginInsert() before the last of the buffers for the eventual
record was read, thus creating a path where eviction is possible in a
`begininsert_called = true` context. That breaks our current code by
being unable to evict (WAL-log) the dirtied hint pages.
Does it break Postgres or Neon? 
I look around the codes and as far as I can see, dirty pages could be flushed whether begininsert_called is true or false.

PFA a patch that rectifies this issue, by moving the XLogBeginInsert()
down to where 1.) we have all relevant buffers pinned and locked, and
2.) we're in a critical section, making that part of the code
consistent with the general scheme for XLog insertion.
+1, Make sense.

Regards,
Zhang Mingli

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Zhang Mingli
Date:
Hi,

On Sep 8, 2022, 19:08 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:

PFA a patch that rectifies this issue, by moving the XLogBeginInsert()
down to where 1.) we have all relevant buffers pinned and locked, and
2.) we're in a critical section, making that part of the code
consistent with the general scheme for XLog insertion.

In the same function, there is disorder of XLogBeginInsert and START_CRIT_SECTION.

```
collectordata = ptr = (char *) palloc(collector->sumsize);

 data.ntuples = collector->ntuples;

 if (needWal)
   XLogBeginInsert();

 START_CRIT_SECTION();
```

Shall we adjust that too?

Regards,
Zhang Mingli

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Michael Paquier
Date:
On Fri, Sep 30, 2022 at 06:22:02PM +0800, Zhang Mingli wrote:
> In the same function, there is disorder of XLogBeginInsert and START_CRIT_SECTION.
>
> ```
> collectordata = ptr = (char *) palloc(collector->sumsize);
>
>  data.ntuples = collector->ntuples;
>
>  if (needWal)
>    XLogBeginInsert();
>
>  START_CRIT_SECTION();
> ```
>
> Shall we adjust that too?

Nice catches, both of you.  Let's adjust everything spotted in one
shot.  Matthias' patch makes the ordering right, but the
initialization path a bit harder to follow when using a separate list.
The code is short so it does not strike me as a big deal, and it comes
from the fact that we need to lock an existing buffer when merging two
lists.  For the branch where we insert into a tail page, the pages are
already locked but it looks to be enough to move XLogBeginInsert()
before the first XLogRegisterBuffer() call.

Would any of you like to write a patch?
--
Michael

Attachment

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Zhang Mingli
Date:
HI,

On Oct 12, 2022, 10:09 +0800, Michael Paquier <michael@paquier.xyz>, wrote:

Nice catches, both of you. Let's adjust everything spotted in one
shot. Matthias' patch makes the ordering right, but the
initialization path a bit harder to follow when using a separate list.
The code is short so it does not strike me as a big deal, and it comes
from the fact that we need to lock an existing buffer when merging two
lists. For the branch where we insert into a tail page, the pages are
already locked but it looks to be enough to move XLogBeginInsert()
before the first XLogRegisterBuffer() call.

Would any of you like to write a patch?
--
Michael
Patch added.

Regards,
Zhang Mingli
Attachment

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Michael Paquier
Date:
On Wed, Oct 12, 2022 at 09:39:11PM +0800, Zhang Mingli wrote:
> Patch added.

Thanks.  This looks fine on a second look, so applied.
--
Michael

Attachment

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Thanks.  This looks fine on a second look, so applied.

Don't we need to back-patch these fixes?

            regards, tom lane



Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Michael Paquier
Date:
On Wed, Oct 12, 2022 at 08:54:34PM -0400, Tom Lane wrote:
> Don't we need to back-patch these fixes?

I guess I should, though I have finished by not doing due to the
unlikeliness of the problem, where we would need the combination of a
page eviction with an error in the critical section to force a PANIC,
or a crash before the WAL gets inserted.  Other opinions?
--
Michael

Attachment

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Alvaro Herrera
Date:
On 2022-Oct-13, Michael Paquier wrote:

> On Wed, Oct 12, 2022 at 08:54:34PM -0400, Tom Lane wrote:
> > Don't we need to back-patch these fixes?
> 
> I guess I should, though I have finished by not doing due to the
> unlikeliness of the problem, where we would need the combination of a
> page eviction with an error in the critical section to force a PANIC,
> or a crash before the WAL gets inserted.  Other opinions?

I suppose it's a matter of whether any bugs are possible outside of
Neon.  If yes, then definitely this should be backpatched.  Offhand, I
don't see any.  On the other hand, even if no bugs are known, then it's
still valuable to have all code paths do WAL insertion in the same way,
rather than having a single place that is alone in doing it in a
different way.  But if we don't know of any bugs, then backpatching
might be more risk than not doing so.

I confess I don't understand why is it important that XLogBeginInsert is
called inside the critical section.  It seems to me that that part is
only a side-effect of having to acquire the buffer locks in the critical
section.  Right?

I noticed that line 427 logs the GIN metapage with flag REGBUF_STANDARD;
is the GIN metapage really honoring pd_upper?  I see only pg_lower being
set.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)



Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Michael Paquier
Date:
On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote:
> I suppose it's a matter of whether any bugs are possible outside of
> Neon.  If yes, then definitely this should be backpatched.  Offhand, I
> don't see any.  On the other hand, even if no bugs are known, then it's
> still valuable to have all code paths do WAL insertion in the same way,
> rather than having a single place that is alone in doing it in a
> different way.  But if we don't know of any bugs, then backpatching
> might be more risk than not doing so.

I have been putting my mind on that once again, and I don't see how
this would cause an issue in vanilla PG code.  XLogBeginInsert() does
its checks, meaning that we'd get a PANIC instead of an ERROR now that
these calls are within a critical section but that should not matter
because we know that recovery has ended or we would not be able to do
GIN insertions like that.  Then, it only switches to
begininsert_called to true, that we use for sanity checks in the
various WAL insert paths.  As Matthias has outlined, Neon relies on
begininsert_called more than we do currently.  FWIW, I think that
we're still fine not backpatching that, even considering the
consistency of the code with stable branches.  Now, if there is a
strong trend in favor of a backpatch, I'd be fine with this conclusion
as well.

> I confess I don't understand why is it important that XLogBeginInsert is
> called inside the critical section.  It seems to me that that part is
> only a side-effect of having to acquire the buffer locks in the critical
> section.  Right?

Yeah, you are right that it would not matter for XLogBeginInsert(),
though I'd like to think that this is a good practice on consistency
grounds with anywhere else, and we respect what's documented in the
README.

> I noticed that line 427 logs the GIN metapage with flag REGBUF_STANDARD;
> is the GIN metapage really honoring pd_upper?  I see only pg_lower being
> set.

Hmm.  Not sure.
--
Michael

Attachment

Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote:
>> I confess I don't understand why is it important that XLogBeginInsert is
>> called inside the critical section.  It seems to me that that part is
>> only a side-effect of having to acquire the buffer locks in the critical
>> section.  Right?

> Yeah, you are right that it would not matter for XLogBeginInsert(),
> though I'd like to think that this is a good practice on consistency
> grounds with anywhere else, and we respect what's documented in the
> README.

Yeah --- it's documented that way, and there doesn't seem to be
a good reason not to honor that here.

            regards, tom lane



Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Alvaro Herrera
Date:
On 2022-Oct-25, Tom Lane wrote:

> Michael Paquier <michael@paquier.xyz> writes:
> > On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote:
> >> I confess I don't understand why is it important that XLogBeginInsert is
> >> called inside the critical section.  It seems to me that that part is
> >> only a side-effect of having to acquire the buffer locks in the critical
> >> section.  Right?
> 
> > Yeah, you are right that it would not matter for XLogBeginInsert(),
> > though I'd like to think that this is a good practice on consistency
> > grounds with anywhere else, and we respect what's documented in the
> > README.
> 
> Yeah --- it's documented that way, and there doesn't seem to be
> a good reason not to honor that here.

Okay, so if we follow this argument, then the logical conclusion is that
this *should* be backpatched, after all.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)



Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

From
Michael Paquier
Date:
On Tue, Oct 25, 2022 at 09:37:08AM +0200, Alvaro Herrera wrote:
> Okay, so if we follow this argument, then the logical conclusion is that
> this *should* be backpatched, after all.

After sleeping on it and looking at all the stable branches involved,
backpatched down to v10.
--
Michael

Attachment