Re: Hot Standby 0.2.1 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot Standby 0.2.1
Date
Msg-id 1253604161.4449.138.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: Hot Standby 0.2.1  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Hot Standby 0.2.1
List pgsql-hackers
On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
> On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > OK, here is the latest version of the Hot Standby patchset. This is
> > about version 30+ by now, but we should regard this as 0.2.1
> > Patch against CVS HEAD (now): clean apply, compile, no known bugs.
> >
> > OVERVIEW
> >
> > You can download PDF versions of the fine manual is here
> > http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf
> 
> 
> >From this doc:
> 
> "In recovery, transactions will not be permitted to take any lock
> higher other than
> AccessShareLock or AccessExclusiveLock. In addition, transactions may never
> assign a TransactionId and may never write WAL. The LOCK TABLE command by
> default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
> the standby and requests a specific lock type other than AccessShareLock will be
> rejected."
> 
> The first sentence seems to say that clients on the stand-by can take
> ACCESS EXCLUSIVE, while the last sentence seems to say that they
> cannot do so.

You are right to pick up that discrepancy, as Heikki did also.

The root cause of that discrepancy is a change in patch behaviour
between January and now that I will use your post to highlight and
discuss, if you don't mind. (and yes, the docs need to be corrected)

Initially, it seemed that it was certain that a read-only backend could
not take an AccessExclusiveLock. On further thought, there is no
particular reason to block AccessExclusiveLocks themselves, just that
most things you might do while holding one are banned. But the lock
itself is fine. (Any challenge on that?)

AccessExclusiveLocks can be used to serialize the actions of other
backends. That is a very common use case, so my concern was that LOCK
TABLE would be a no-op unless we allowed AccessExclusiveLock, so the
patch does allow it.

> I did a little experiment on a hot standby instance.  I expected that
> either I would be denied the lock altogether, or the lock would cause
> WAL replay to be paused until either I committed or was forcibly
> canceled.  But neither happened, I was granted the lock but WAL replay
> continued anyway.
> 
> jjanes=# begin;
> BEGIN
> jjanes=# lock table pgbench_history in access exclusive mode;
> LOCK TABLE
> jjanes=# select count(*) from pgbench_history;
>  count
> --------
>  519104
> (1 row)
> 
> jjanes=# select count(*) from pgbench_history;
>  count
> --------
>  527814
> (1 row)
> 
> Is this the expected behavior?

By me, yes. WAL replay does not require a table lock to progress. Any
changes are protected with block-level locks. It does acquire a table
lock and cancel conflicting queries when it is about to replay something
that would cause a query to explode, such as dropping a table, as
explained in docs.

So this is not a bug.

The explanation of how the above sequence of events occurs is that the
backend acquires AccessExclusiveLock - please check on other session in
pg_locks. WAL replay continues by the Startup process, inserting further
rows into the pgbench_history table as a series of transactions. The
second select takes a later snapshot than the first and so sees more
data than the first select, hence a larger count. (And I am pleased to
see that recovery is progressing quickly even while your queries run).

So not a bug, but just one of many possible behaviours we could enforce.
1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
2. Allow AccessExclusiveLocks but have them pause WAL replay
3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
no-op because it will not be able to serialize anything)

So the patch originally implemented (3) but now implements (1).

I would say that (2) is very undesirable because it puts WAL replay in
the control of non-superusers. That could mean LOCK TABLE implicitly
alters the high availability of the standby, and might even be used
maliciously to do that.

I'm open to views on whether we should use (1) or (3). Comments?

Implementing either is no problem and we have a straight choice. We may
even wish to review that again later from additional feedback.

(Jeff, you have also helped me understand that there is a bug in the way
serializable transactions are cancelled, which is easily corrected.
Thanks for that unexpected windfall, but it is otherwise unrelated to
your comments.)

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Hot Standby 0.2.1
Next
From: Heikki Linnakangas
Date:
Subject: Re: Hot Standby 0.2.1