Re: Hot Standby (v9d) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot Standby (v9d)
Date
Msg-id 1233170847.2327.2510.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: Hot Standby (v9d)  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Hot Standby (v9d)  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: Hot Standby (v9d)  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:

> I skimmed through the Hot Standby patch for a preliminary review. I noted the
> following things, some minor tweaks, some just questions. None of the things I
> noted are big issues unless some of the questions uncover issues.

Thanks for your attention.

> 1) This code is obviously a cut-pasto:
> 
> +       else if (strcmp(tok1, "max_standby_delay") == 0)
> +       {
> +           errno = 0;
> +           maxStandbyDelay = (TransactionId) strtoul(tok2, NULL, 0);
> +           if (errno == EINVAL || errno == ERANGE)
> +               ereport(FATAL,
> +                (errmsg("max_standby_delay is not a valid number: \"%s\"",
> +                        tok2)));
> 
> Have you ever tested the behaviour with max_standby_delay = -1 ?

> Also, the default max_standby_delay is currently 0 -- ie, kill queries
> immediately upon detecting any conflicts at all -- which I don't really think
> anyone would be happy with. 

Agreed. As explained when I published that patch it is deliberately
severe to allow testing of conflict resolution and feedback on it.

> I still *strongly* feel the default has to be the
> non-destructive conservative -1.

I don't. Primarily, we must support high availability. It is much better
if we get people saying "I get my queries cancelled" and we say RTFM and
change parameter X, than if people say "my failover was 12 hours behind
when I needed it to be 10 seconds behind and I lost a $1 million because
of downtime of Postgres" and we say RTFM and change parameter X.

> 2) This hard coded constant of 500ms seems arbitrary to me. If your use case
> is a heavily-used reporting engine you might get this message all the time. I
> think this either has to be configurable (warn_standby_delay?) or based on
> max_standby_delay or some other parameter somehow.
> 
> +       /*
> +        * log that we have been waiting for a while now...
> +        */
> +       if (!logged && standbyWait_ms > 500)

Greg, that's a DEBUG5 message.

> 3) These two blocks of code seem unsatisfactory:
> 
> !           /*
> !            * Keep track of the block number of the lastBlockVacuumed, so
> !            * we can scan those blocks as well during WAL replay. This then
> !            * provides concurrency protection and allows btrees to be used
> !            * while in recovery.
> !            */
> !           if (lastBlockVacuumed > vstate->lastBlockVacuumed)
> !               vstate->lastBlockVacuumed = lastBlockVacuumed;
> ! 
> 
> 
> +           /*
> +            * XXXHS we don't actually need to read the block, we
> +            * just need to confirm it is unpinned. If we had a special call
> +            * into the buffer manager we could optimise this so that
> +            * if the block is not in shared_buffers we confirm it as unpinned.
> +            */
> +           buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno, RBM_NORMAL);
> +           if (BufferIsValid(buffer))
> +           {
> +               LockBufferForCleanup(buffer);           
> +               UnlockReleaseBuffer(buffer);
> +           }
> 
> Aside from the XXX comment (I thought we actually had such a call now, but if
> not shouldn't we just add one instead of carping?)

We should add many things. The equivalent optimisation of VACUUM in
normal running has not been done either. Considering we have both HOT
and visibility map enhanced VACUUM, its not a priority.

>  I'm not convinced this
> handles all the cases that can arise. Notable, what happens if a previous
> vacuum died in the middle of the scan?

Nothing, because it won't then have removed any heap rows.

> I think we have a vacuum id which we use already with btrees to the same
> issue. It seems to me this be more robust if you stamped the xlog record with
> that id number and ensured it matched the id of the lastBloockVacuumed? Then
> you could start from 0 if the id changes.

> 4) Why is this necessary?
> 
> +   if (IsRecoveryProcessingMode() && 
> +       locktag->locktag_type == LOCKTAG_OBJECT &&
> +       lockmode > AccessShareLock)
> +       ereport(ERROR,
> +               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                errmsg("cannot acquire lockmode %s on database objects while recovery is in progress", 
> +                                   lockMethodTable->lockModeNames[lockmode]),
> +                errhint("Only AccessShareLock can be acquired on database objects during recovery.")));
> + 
> 
> Obviously we can't lock records (SELECT FOR UPDATE) since that requires write
> access. But shouldn't we be able to do manual LOCK TABLE calls?

Read the rest of the comments on the locking code section.

> I hope this isn't the only interlock we have against trying to do write i/o or
> DDL against tables...?

No it's not.

> 5) The code for killing off conflicting transactions looks odd to me but I
> haven't really traced through it to see what it's doing. It seems to kill off
> just one process? 

No, why do you think that?

> What if there are several holding the lock in question?

Covered.

> Also, it doesn't seem to take into account how long the various transactions
> have held the lock?

True. Why would that matter?

> Is there a risk of, for example, if you have a long report running against a
> table and lots of OLTP requests against the same table 

> it seems the conflict resolution code would fire randomly into the
> crowd 

OK, that's where I stop bothering to respond.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Output filter for psql
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Hot Standby (v9d)