Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Buffer locking is special (hints, checksums, AIO writes)
Date
Msg-id 4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf
Whole thread Raw
In response to Re: Buffer locking is special (hints, checksums, AIO writes)  (Andres Freund <andres@anarazel.de>)
Responses Re: Buffer locking is special (hints, checksums, AIO writes)
List pgsql-hackers
Hi,

I pushed what was 0001, 0002 in v8. Attached is an updated set of patches for
the rest.

Main changes:

- the explanation in "heapam: Use exclusive lock on old page in CLUSTER" as to
  why it's problematic to not set hint bits wasn't quite right. I've updated
  it:

    heapam: Use exclusive lock on old page in CLUSTER

    To be able to guarantee that we can set the hint bit, acquire an exclusive
    lock on the old buffer. This is required as a future commit will only allow
    hint bits to be set with a new lock level, which is acquired as-needed in a
    non-blocking fashion.

    We need the hint bits, set in heapam_relation_copy_for_cluster() ->
    HeapTupleSatisfiesVacuum(), to be set, as otherwise reform_and_rewrite_tuple()
    -> rewrite_heap_tuple() will get confused. Specifically, rewrite_heap_tuple()
    checks for HEAP_XMAX_INVALID in the old tuple to determine whether to check
    the old-to-new mapping hash table.

- I added a patch that inverts the meaning of LW_FLAG_RELEASE_OK, to make the
  equivalent code for content locks easier. For buffer content locks we reset
  the flags when invalidating, and otherwise we'd either need to not have the
  equivalent of LW_FLAG_RELEASE_OK in the flag mask or explicitly add it after
  making the buffer valid.

  I think it's also nicer this way round, because we e.g. can assert that
  there are no pending wakeups when invalidating a buffer.

- I added a patch to reorganize some of the flags stuff in buf_internals.h, to
  make the later patches cleaner. In particular flags are now defined with a
  macro so that changing at which offset flag bits are doesn't require
  touching every single flag value.

- For the main commit, the reorganized flag stuff removed one of the remaining
  FIXMEs.

- I removed the performance instrumentation stuff from the batch visibility
  commit.


I think 0001, 0002, 0003 can be committed. 0004, 0005 are new and probably
could use a sanity check.  0006 hasn't changed much and is imo pretty much
ready, but should be pushed together with 0007.  0007 is getting close, I
think.  0008-0010 need a bit more work, but I think that can wait until 0007
has been pushed.

For 0008, it'd be nice if somebody could look at the way buf_internals.h now
looks.

The only remaining FIXME in 0008 is about the the reuse of
PGPROC->{lwWaiting,lwWaitMode,lwWaitLink}. I think reusing them for content
locks isn't pretty, but it's probably not worth duplicating them. Thoughts?

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Add pg_current_vxact_id() function to expose virtual transaction IDs
Next
From: Fujii Masao
Date:
Subject: Re: Use IsA() macro instead of nodeTag comparison