Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN() - Mailing list pgsql-hackers
From | Jacob Champion |
---|---|
Subject | Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN() |
Date | |
Msg-id | CABAq_6GvDL=V52kK9h2JujY_xmb3iKDHq7CUN7WrwbU4mYWD9A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN() (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN()
|
List | pgsql-hackers |
Hi Michael, thanks for the review! On Sun, Sep 3, 2017 at 11:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote: >> While working on checksum support for GPDB, we noticed that several >> callers of PageGetLSN() didn't follow the correct locking procedure. > > Well, PageGetLSN can be used in some hot code paths, xloginsert.c > being one, so it does not seem wise to me to switch it to something > more complicated than a macro, If raw perf is an issue -- now that I've refactored the assertion into its own function, I could probably push the implementation back into a macro. Something like #define PageGetLSN(page) (AssertPageIsLockedForLSN((Page) page), PageXLogRecPtrGet(((PageHeader) page)->pd_lsn)) Callers would need to watch out for the double-evaluation, but that's already documented with this group of macros. > and also it is not bounded to any > locking contracts now. For example, pageinspect works on a copy of the > buffer, so that's one case where we just don't care. So we should it > be tightened in Postgres? The proposed patch doesn't tighten the contract in this case -- if the buffer that's passed to PageGetLSN() isn't a shared buffer (i.e. if it's not part of the BufferBlocks array), the assertion passes unconditionally. > That's the portion of the proposal I don't > get. You could introduce a custom API for this purpose, isn't the > origin of your problems with GPDB for checksums that data is > replicated across many nodes so things need to be locked uniquely? GPDB introduces additional problems, but I think this issue is independent of anything GPDB-specific. One way to look at it: are the changes in the proposed patch, from PageGetLSN to BufferGetLSNAtomic, correct? If not, well, that's a good discussion to have. But if they are correct, then why were they missed? and is there a way to help future contributors out in the future? That's the goal of this patch; I don't think it's something that a custom API solves. >> - This change introduces a subtle source incompatibility: whereas >> previously both Pages and PageHeaders could be passed to PageGetLSN(), >> now callers must explicitly cast to Page if they don't already have >> one. > > That would produce warnings for extensions, so people would likely > catch that when compiling with a newer version, if they use -Werror... > >> - Is my use of FRONTEND here correct? The documentation made it seem >> like some compilers could try to link against the >> AssertPageIsLockedForLSN function, even if PageGetLSN was never >> called. > > +#if defined(USE_ASSERT_CHECKING) && !defined(FRONTEND) > +void > +AssertPageIsLockedForLSN(Page page) > +{ > From a design point of view, bufmgr.c is an API interface for buffer > management on the backend-size, so just using FRONTEND there looks > wrong to me (bufmgr.h is already bad enough IMO now). The comments suggested that bufmgr.h could be incidentally pulled into frontend code through other headers. Or do you mean that I don't need to check for FRONTEND in the C source file (since, I assume, it's only ever being compiled for the backend)? I'm not sure what you mean by "bufmgr.h is already bad enough". >> - Use of PageGetLSN() in heapam.c seems to be incorrect, but I don't >> really know what the best way is to fix it. It explicitly unlocks the >> buffer, with the comment that visibilitymap_set() needs to handle >> locking itself, but then calls PageGetLSN() to determine whether >> visibilitymap_set() needs to be called. > > This bit in src/backend/access/transam/README may interest you: > Note that we must only use PageSetLSN/PageGetLSN() when we know the action > is serialised. Only Startup process may modify data blocks during recovery, > so Startup process may execute PageGetLSN() without fear of serialisation > problems. All other processes must only call PageSet/GetLSN when holding > either an exclusive buffer lock or a shared lock plus buffer header lock, > or be writing the data block directly rather than through shared buffers > while holding AccessExclusiveLock on the relation. > > So I see no problem with the exiting caller. Is heap_xlog_visible only called during startup? If so, is there a general rule for which locks are required during startup and which aren't? Currently there is no exception in the assertion made for startup processes, so that's something that would need to be changed. Is there a way to determine whether the current process considers itself to be a startup process? Thanks again! --Jacob
pgsql-hackers by date: