Block level concurrency during recovery - Mailing list pgsql-hackers

From Simon Riggs
Subject Block level concurrency during recovery
Date
Msg-id 1224507258.3808.679.camel@ebony.2ndQuadrant
Responses Re: Block level concurrency during recovery
List pgsql-hackers
I'm looking at how to make queries safe during recovery, in the presence
of concurrent changes to blocks. In particular, concurrent removal of
rows that might be read by queries.

My thinking is:
* we ignore LockRelationForExtension(). Normal queries never request it.
All new blocks were created with that lock held and we are replaying
changes serially, so we do not need to re-create that lock. We already
do this, so no change.
* re-create the Cleanup lock on blocks, when the original operation was
performed while a Cleanup lock was held.

So the plan is to introduce a new XLogLockBufferForCleanup() function
and then use it in all places where a cleanup lock was held during the
write of the WAL record.
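
For readers unfamiliar with the term: a Cleanup lock is an exclusive lock
on a buffer that is granted only when the holder has the sole pin on that
buffer. The following is a toy model of that rule, not the proposed
XLogLockBufferForCleanup() and not PostgreSQL's buffer manager. ToyBuffer
and every toy_* function are hypothetical names made up for illustration,
and the sketch returns false where a real implementation would wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer header; NOT PostgreSQL's BufferDesc. */
typedef struct ToyBuffer
{
    int  pin_count;         /* number of processes holding a pin */
    bool exclusive_locked;  /* true while the content lock is held exclusively */
} ToyBuffer;

/* Pinning keeps the page in memory and signals "someone may be reading it". */
static void toy_pin(ToyBuffer *buf)   { buf->pin_count++; }
static void toy_unpin(ToyBuffer *buf) { buf->pin_count--; }

/*
 * Try to take a cleanup lock. The caller must already hold its own pin.
 * Succeeds only if nobody holds the exclusive lock and the caller's pin
 * is the only one, i.e. no concurrent scan can still be looking at rows
 * on this page. Returns false instead of waiting (assumption of this toy).
 */
static bool toy_try_cleanup_lock(ToyBuffer *buf)
{
    assert(buf->pin_count >= 1);    /* caller's own pin */

    if (buf->exclusive_locked)
        return false;
    if (buf->pin_count != 1)
        return false;               /* another process still has the page pinned */

    buf->exclusive_locked = true;
    return true;
}

/* Release the exclusive lock; the pin is dropped separately. */
static void toy_unlock(ToyBuffer *buf) { buf->exclusive_locked = false; }
```

In this model, a query that keeps a page pinned during recovery blocks
replay of a record that removes rows from that page, which is the
protection the plan above is after.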

This means we will need to hold cleanup lock:

* while RM_HEAP2_ID records are applied (Freeze, Clean, CleanMove)

* while applying an XLOG_BTREE_DELETE record that was generated by
VACUUM. This record type is not always generated by VACUUM, so split it
into two types, XLOG_BTREE_DELETE and XLOG_BTREE_VACUUM, so we can hold
Cleanup lock while applying XLOG_BTREE_VACUUM. (This may not be required,
but I'd rather do the full locking now and relax it later).

* whenever we apply a backup block that performs the same function as
any of the above WAL records. So we would hold Cleanup lock when
applying backup blocks for WAL records of all RM_HEAP2_ID types and of
type XLOG_BTREE_VACUUM.
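
The list above amounts to a decision made per WAL record at replay time.
A minimal sketch of that decision follows, assuming the split into
XLOG_BTREE_DELETE and XLOG_BTREE_VACUUM has been made. The enum names and
values are hypothetical stand-ins, not PostgreSQL's real resource manager
ids or info bits, and toy_redo_needs_cleanup_lock() is not a function in
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for illustration; values are NOT PostgreSQL's. */
typedef enum ToyRmgr
{
    TOY_RM_HEAP,
    TOY_RM_HEAP2,
    TOY_RM_BTREE
} ToyRmgr;

typedef enum ToyInfo
{
    TOY_HEAP_INSERT,
    TOY_HEAP2_FREEZE,
    TOY_HEAP2_CLEAN,
    TOY_HEAP2_CLEAN_MOVE,
    TOY_BTREE_DELETE,   /* delete not issued by VACUUM */
    TOY_BTREE_VACUUM    /* proposed new type: delete issued by VACUUM */
} ToyInfo;

/*
 * Decide whether replay of a record (or of its backup block) should take
 * a cleanup lock rather than an ordinary exclusive lock.
 */
static bool toy_redo_needs_cleanup_lock(ToyRmgr rmid, ToyInfo info)
{
    assert(info <= TOY_BTREE_VACUUM);   /* known record type */

    switch (rmid)
    {
        case TOY_RM_HEAP2:
            return true;                        /* Freeze, Clean, CleanMove */
        case TOY_RM_BTREE:
            return info == TOY_BTREE_VACUUM;    /* only the VACUUM variant */
        default:
            return false;
    }
}
```

Keeping the rule in one place like this would make it easy to relax the
btree case later if the full locking turns out not to be required.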

I'm most of the way through implementing the above and will post the
patch as a separate item to make it easier to inspect.

Other aspects:

* For GIN indexes, we appear to not hold a Cleanup lock during
vacuuming, except on root page. That stops new scans from starting, but
it doesn't prevent progress of concurrent scans. Doesn't look correct to
me... so not sure what strength lock to acquire in each case. Probably
need to differentiate between WAL record types so we can tell which to
acquire CleanupLock for and which not.

* GIST doesn't use Cleanup locks at all. So I'm very unclear here.

Teodor has mentioned that it should be OK for GIST/GIN. Can I double
check that based upon my inspection of the code?

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support