Thread: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering
Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering
From
Matthias van de Meent
Date:
Hi, In Neon, we've had to modify the GIN fast insertion path as attached, due to an unexpected XLOG insertion and buffer locking ordering issue. The xlog readme [0] mentions that the common order of operations is 1) pin and lock any buffers you need for the log record, then 2) start a critical section, then 3) call XLogBeginInsert. In Neon, we rely on this documented order of operations to expect to be able to WAL-log hint pages (freespace map, visibility map) when they are written to disk (e.g. cache eviction, checkpointer). In general, this works fine, except that in ginHeapTupleFastInsert we call XLogBeginInsert() before the last of the buffers for the eventual record was read, thus creating a path where eviction is possible in a `begininsert_called = true` context. That breaks our current code by being unable to evict (WAL-log) the dirtied hint pages. PFA a patch that rectifies this issue, by moving the XLogBeginInsert() down to where 1.) we have all relevant buffers pinned and locked, and 2.) we're in a critical section, making that part of the code consistent with the general scheme for XLog insertion. Kind regards, Matthias van de Meent [0] access/transam/README, section "Write-Ahead Log Coding", line 436-470
Attachment
HI,
On Sep 8, 2022, 19:08 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:
On Sep 8, 2022, 19:08 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:
In general, this works fine, except that in ginHeapTupleFastInsert we
call XLogBeginInsert() before the last of the buffers for the eventual
record was read, thus creating a path where eviction is possible in a
`begininsert_called = true` context. That breaks our current code by
being unable to evict (WAL-log) the dirtied hint pages.
Does it break Postgres or Neon?
I look around the codes and as far as I can see, dirty pages could be flushed whether begininsert_called is true or false.
I look around the codes and as far as I can see, dirty pages could be flushed whether begininsert_called is true or false.
PFA a patch that rectifies this issue, by moving the XLogBeginInsert()
down to where 1.) we have all relevant buffers pinned and locked, and
2.) we're in a critical section, making that part of the code
consistent with the general scheme for XLog insertion.
+1, Make sense.
Regards,
Zhang Mingli
Hi,
On Sep 8, 2022, 19:08 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:
On Sep 8, 2022, 19:08 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:
PFA a patch that rectifies this issue, by moving the XLogBeginInsert()
down to where 1.) we have all relevant buffers pinned and locked, and
2.) we're in a critical section, making that part of the code
consistent with the general scheme for XLog insertion.
In the same function, there is disorder of XLogBeginInsert and START_CRIT_SECTION.
```
collectordata = ptr = (char *) palloc(collector->sumsize);
data.ntuples = collector->ntuples;
if (needWal)
XLogBeginInsert();
START_CRIT_SECTION();
```
Shall we adjust that too?
```
collectordata = ptr = (char *) palloc(collector->sumsize);
data.ntuples = collector->ntuples;
if (needWal)
XLogBeginInsert();
START_CRIT_SECTION();
```
Shall we adjust that too?
Regards,
Zhang Mingli
On Fri, Sep 30, 2022 at 06:22:02PM +0800, Zhang Mingli wrote: > In the same function, there is disorder of XLogBeginInsert and START_CRIT_SECTION. > > ``` > collectordata = ptr = (char *) palloc(collector->sumsize); > > data.ntuples = collector->ntuples; > > if (needWal) > XLogBeginInsert(); > > START_CRIT_SECTION(); > ``` > > Shall we adjust that too? Nice catches, both of you. Let's adjust everything spotted in one shot. Matthias' patch makes the ordering right, but the initialization path a bit harder to follow when using a separate list. The code is short so it does not strike me as a big deal, and it comes from the fact that we need to lock an existing buffer when merging two lists. For the branch where we insert into a tail page, the pages are already locked but it looks to be enough to move XLogBeginInsert() before the first XLogRegisterBuffer() call. Would any of you like to write a patch? -- Michael
Attachment
HI,
On Oct 12, 2022, 10:09 +0800, Michael Paquier <michael@paquier.xyz>, wrote:
On Oct 12, 2022, 10:09 +0800, Michael Paquier <michael@paquier.xyz>, wrote:
Nice catches, both of you. Let's adjust everything spotted in one
shot. Matthias' patch makes the ordering right, but the
initialization path a bit harder to follow when using a separate list.
The code is short so it does not strike me as a big deal, and it comes
from the fact that we need to lock an existing buffer when merging two
lists. For the branch where we insert into a tail page, the pages are
already locked but it looks to be enough to move XLogBeginInsert()
before the first XLogRegisterBuffer() call.
Would any of you like to write a patch?
--
Michael
Patch added.
Regards,
Zhang Mingli
Attachment
On Wed, Oct 12, 2022 at 09:39:11PM +0800, Zhang Mingli wrote: > Patch added. Thanks. This looks fine on a second look, so applied. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Thanks. This looks fine on a second look, so applied. Don't we need to back-patch these fixes? regards, tom lane
On Wed, Oct 12, 2022 at 08:54:34PM -0400, Tom Lane wrote: > Don't we need to back-patch these fixes? I guess I should, though I have finished by not doing due to the unlikeliness of the problem, where we would need the combination of a page eviction with an error in the critical section to force a PANIC, or a crash before the WAL gets inserted. Other opinions? -- Michael
Attachment
On 2022-Oct-13, Michael Paquier wrote: > On Wed, Oct 12, 2022 at 08:54:34PM -0400, Tom Lane wrote: > > Don't we need to back-patch these fixes? > > I guess I should, though I have finished by not doing due to the > unlikeliness of the problem, where we would need the combination of a > page eviction with an error in the critical section to force a PANIC, > or a crash before the WAL gets inserted. Other opinions? I suppose it's a matter of whether any bugs are possible outside of Neon. If yes, then definitely this should be backpatched. Offhand, I don't see any. On the other hand, even if no bugs are known, then it's still valuable to have all code paths do WAL insertion in the same way, rather than having a single place that is alone in doing it in a different way. But if we don't know of any bugs, then backpatching might be more risk than not doing so. I confess I don't understand why is it important that XLogBeginInsert is called inside the critical section. It seems to me that that part is only a side-effect of having to acquire the buffer locks in the critical section. Right? I noticed that line 427 logs the GIN metapage with flag REGBUF_STANDARD; is the GIN metapage really honoring pd_upper? I see only pg_lower being set. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La grandeza es una experiencia transitoria. Nunca es consistente. Depende en gran parte de la imaginación humana creadora de mitos" (Irulan)
On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote: > I suppose it's a matter of whether any bugs are possible outside of > Neon. If yes, then definitely this should be backpatched. Offhand, I > don't see any. On the other hand, even if no bugs are known, then it's > still valuable to have all code paths do WAL insertion in the same way, > rather than having a single place that is alone in doing it in a > different way. But if we don't know of any bugs, then backpatching > might be more risk than not doing so. I have been putting my mind on that once again, and I don't see how this would cause an issue in vanilla PG code. XLogBeginInsert() does its checks, meaning that we'd get a PANIC instead of an ERROR now that these calls are within a critical section but that should not matter because we know that recovery has ended or we would not be able to do GIN insertions like that. Then, it only switches to begininsert_called to true, that we use for sanity checks in the various WAL insert paths. As Matthias has outlined, Neon relies on begininsert_called more than we do currently. FWIW, I think that we're still fine not backpatching that, even considering the consistency of the code with stable branches. Now, if there is a strong trend in favor of a backpatch, I'd be fine with this conclusion as well. > I confess I don't understand why is it important that XLogBeginInsert is > called inside the critical section. It seems to me that that part is > only a side-effect of having to acquire the buffer locks in the critical > section. Right? Yeah, you are right that it would not matter for XLogBeginInsert(), though I'd like to think that this is a good practice on consistency grounds with anywhere else, and we respect what's documented in the README. > I noticed that line 427 logs the GIN metapage with flag REGBUF_STANDARD; > is the GIN metapage really honoring pd_upper? I see only pg_lower being > set. Hmm. Not sure. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote: >> I confess I don't understand why is it important that XLogBeginInsert is >> called inside the critical section. It seems to me that that part is >> only a side-effect of having to acquire the buffer locks in the critical >> section. Right? > Yeah, you are right that it would not matter for XLogBeginInsert(), > though I'd like to think that this is a good practice on consistency > grounds with anywhere else, and we respect what's documented in the > README. Yeah --- it's documented that way, and there doesn't seem to be a good reason not to honor that here. regards, tom lane
On 2022-Oct-25, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote: > >> I confess I don't understand why is it important that XLogBeginInsert is > >> called inside the critical section. It seems to me that that part is > >> only a side-effect of having to acquire the buffer locks in the critical > >> section. Right? > > > Yeah, you are right that it would not matter for XLogBeginInsert(), > > though I'd like to think that this is a good practice on consistency > > grounds with anywhere else, and we respect what's documented in the > > README. > > Yeah --- it's documented that way, and there doesn't seem to be > a good reason not to honor that here. Okay, so if we follow this argument, then the logical conclusion is that this *should* be backpatched, after all. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh)
On Tue, Oct 25, 2022 at 09:37:08AM +0200, Alvaro Herrera wrote: > Okay, so if we follow this argument, then the logical conclusion is that > this *should* be backpatched, after all. After sleeping on it and looking at all the stable branches involved, backpatched down to v10. -- Michael