[HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN() - Mailing list pgsql-hackers

From Jacob Champion
Subject [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Date
Msg-id CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com
Whole thread Raw
Responses Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN()
Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN()
List pgsql-hackers
Hello all,

While working on checksum support for GPDB, we noticed that several
callers of PageGetLSN() didn't follow the correct locking procedure.
To try to help ferret out present and future mistakes, we added an
assertion to PageGetLSN() that checks whether those locks were being
held correctly. This patch is a first-draft attempt to port that
assertion back up to postgres master, based on work by Asim Praveen,
Ashwin Agrawal, and myself.

The patch is really two pieces: add the assertion, and fix the callers
that would trip it. The latter part is still in progress, because I'm
running into some places where I'm not sure what the correct way
forward is.

(I'm a newbie to this list and this code base, so please don't
hesitate to correct me on anything, whether that's code- or
culture-related!)

= Notes/Observations =

- This change introduces a subtle source incompatibility: whereas
previously both Pages and PageHeaders could be passed to PageGetLSN(),
now callers must explicitly cast to Page if they don't already have
one.

- I originally tried to put (and the GPDB patch succeeds in putting)
the entire assertion implementation into PageGetLSN in bufpage.h. That
seems unworkable here -- buf_internals.h is no longer publicized, and
the assertion needs those internals to perform its checks. So I moved
the assertion into its own helper function in bufmgr.c. If assertions
are disabled, that helper declaration gets turned into an empty macro,
so there shouldn't be a performance hit.

= Open Questions =

- Is my use of FRONTEND here correct? The documentation made it seem
like some compilers could try to link against the
AssertPageIsLockedForLSN function, even if PageGetLSN was never
called.

- Use of PageGetLSN() in heapam.c seems to be incorrect, but I don't
really know what the best way is to fix it. It explicitly unlocks the
buffer, with the comment that visibilitymap_set() needs to handle
locking itself, but then calls PageGetLSN() to determine whether
visibilitymap_set() needs to be called.

- TestForOldSnapshot is also problematic. The function is called in
places where only a shared lock is held, so it needs to hold the
header spinlock at least some of the time, but we only pass the Page
as an argument. I could do the same "find buffer descriptor from page
pointer" logic that we utilize in the assertion implementation, but is
there a better way?

- Should I try to add this assertion to PageSetLSN() as well? We were
focused mostly on the GetLSN side of things, since that was the more
risky piece of our backport. PageSetLSN is called from many more
places, and the review there would be much larger.

= Known Issues =

- This approach doesn't seem to work when checksums are disabled. In
that case, BufferGetLSNAtomic isn't actually atomic, so it seems to
always trip the assertion. I'm not sure of the best way to proceed --
is it really correct not to follow the locking contract if checksums
are disabled?

What do you think? Is this something worth pursuing? Any and all
comments welcome.

Thanks!
--Jacob

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Assorted leaks and weirdness in parallel execution
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Assorted leaks and weirdness in parallel execution