Re: [HACKERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server - Mailing list pgsql-committers

From Simon Riggs
Subject Re: [HACKERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server
Date
Msg-id CANP8+jJK4mXXEEFmw2P4wPT6-5fTaJY4vnY2Ea=yEGrSophfjA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server  (Andres Freund <andres@anarazel.de>)
List pgsql-committers
On 6 April 2016 at 13:27, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-06 13:11:40 +0100, Simon Riggs wrote:
> On 6 April 2016 at 10:09, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-06 10:04:42 +0100, Simon Riggs wrote:
> > The issue there is that we continue to issue checkpoints if the only
> > activity since the last checkpoint was emitting a standby
> > snapshot. That's because:
> >
>
> I agree this is the current situation in 9.4 and 9.5, hence the bug report.
>
> This no longer occurs with the patches I have proposed. The snapshot is
> skipped, so a further checkpoint never triggers.

Not if there's a longrunning/idle transaction.

Note that skipping the snapshot is actually a *problem* in some
cases. As I've brought up upthread, to which you never replied. A
xl_running_xacts->xcnt == 0/!overflowed snapshot can be very important
for hot standby, because it allows ProcArrayApplyRecoveryInfo() to
switch to INITIALIZED state:

I replied by posting a patch to address your concern, how is that non-reply?
 
> > > What issue is that? Previously you said it must not skip it at all for
> > > logical.
> >
> > It's fine to skip the records iff nothing important has happened since
> > the last time a snapshot has been logged. Again, the proposed approach
> > allowed to detect that.

> Agreed, both proposals do that.

No, the currently committed patch, even including your proposed followup
patch, doesn't do that. As you just commented about as quoted above. The
code currently reads like:

        if (wal_level < WAL_LEVEL_LOGICAL)
        {
                LWLockRelease(ProcArrayLock);

                /*
                 * Don't bother to log anything if nothing is happening, if we are
                 * using archive_timeout > 0 and we didn't overflow snapshot last time.
                 *
                 * This ensures that we don't issue an empty WAL record, which can
                 * be annoying when used in conjunction with archive timeout.
                 */
                if (running->xcnt == 0 &&
                        nlocks == 0 &&
                        XLogArchiveTimeout > 0 &&
                        !last_snapshot_overflowed)
                {
                        LWLockRelease(XidGenLock);
                        return InvalidXLogRecPtr;
                }

                last_snapshot_overflowed = running->subxid_overflow;
        }

This obviously doesn't apply to WAL_LEVEL_LOGICAL as is (the if). And it
also obviously repeats to log the same snapshot in a system where the
state hasn't changed, but where running->xcnt != 0 or nlocks != 0.

My understanding from your previous comments was that it would be incorrect to do that.
 
> > > > We now log more WAL with
> > > > XLogArchiveTimeout > 0 than without.
> >
> > > And the problem with that is what?
> >
> > That an idle system unnecessarily produces WAL? Waking up disks and
> > everything?
> >
>
> The OP discussed a problem with archive_timeout > 0. Are you saying we
> should put in a fix that applies whatever the setting of archive_timeout?

Yes. We should strive to fix the full scope of an issue; not just the
part complained about.

I believe that's what I did. You didn't mention that point earlier, nor do I now think it important.
 
You seem to ignore valid points made against your approach, apparently
because you dismiss them as "emotive language". Stop.

Not true. I have listened to everything you've said and been patient with the high number of mistakes in your replies.

Calling something a "bandaid" is not a "valid point" against it.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-committers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server
Next
From: Michael Paquier
Date:
Subject: Re: pgsql: Generic Messages for Logical Decoding