Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date | |
Msg-id | CA+TgmoYKovODW2Y7rQmmRFaKu445p9uAahjpgfbY8eyeL07BXA@mail.gmail.com Whole thread Raw |
In response to | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
|
List | pgsql-hackers |
On Mon, Mar 21, 2022 at 2:23 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Mar 21, 2022 at 11:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I tried to debug the case but I realized that somehow > > CHECK_FOR_INTERRUPTS() is not calling the > > AcceptInvalidationMessages() and I could not find the same while > > looking into the code as well. While debugging I noticed that > > AcceptInvalidationMessages() is called multiple times but that is only > > through LockRelationId() but while locking the relation we had already > > closed the previous smgr because at a time we keep only one smgr open. > > And that's the reason it is not hitting the issue which we think it > > could. Is there any condition under which it will call > > AcceptInvalidationMessages() through CHECK_FOR_INTERRUPTS() ? because > > I could not see while debugging as well as in code. > > Yeah, I think the reason you can't find it is that it's not there. I > was confused in what I wrote earlier. I think we only process sinval > catchups when we're idle, not at every CHECK_FOR_INTERRUPTS(). And I > think the reason for that is precisely that it would be hard to write > correct code otherwise, since invalidations might then get processed > in a lot more places. So ... I guess all we really need to do here is > avoid assuming that the results of smgropen() are valid across any > code that might acquire relation locks. Which I think the code already > does. So I talked to Andres and Thomas about this and they told me that I was right to worry about this problem. Over on the thread about "wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:" there is a plan to make use ProcSignalBarrier to make smgr objects disappear, and ProcSignalBarrier can be processed at any CHECK_FOR_INTERRUPTS(), so then we'd have a problem here. Commit f10f0ae420ee62400876ab34dca2c09c20dcd030 established a policy that you should always re-fetch the smgr object instead of reusing one you've already got, and even before that it was known to be unsafe to keep them around for any period of time, because anything that opened a relation, including a syscache lookup, could potentially accept invalidations. So most of our code is already hardened against the possibility of smgr objects disappearing. I have a feeling there may be some that isn't, but it would be good if this patch didn't introduce more such code at the same time that patch is trying to introduce more ways to get rid of smgr objects. It was suggested to me that what this patch ought to be doing is calling CreateFakeRelcacheEntry() and then using RelationGetSmgr(fakerel) every time we need the SmgrRelation, without ever keeping it around for any amount of code. That way, if the smgr relation gets closed out from under us at a CHECK_FOR_INTERRUPTS(), we'll just recreate it at the next RelationGetSmgr() call. Andres also noted that he thinks the patch performs redundant cleanup, because of the fact that it uses RelationCreateStorage. That will arrange to remove files on abort, but createdb() also has its own mechanism for that. It doesn't seem like a thing to do twice in two different ways. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: