Re: replication slot restart_lsn initialization - Mailing list pgsql-hackers

From Andres Freund
Subject Re: replication slot restart_lsn initialization
Date
Msg-id 20150610150738.GD10551@awork2.anarazel.de
Whole thread Raw
In response to Re: replication slot restart_lsn initialization  (Gurjeet Singh <gurjeet@singh.im>)
Responses Re: replication slot restart_lsn initialization  (Gurjeet Singh <gurjeet@singh.im>)
List pgsql-hackers
On 2015-06-10 08:00:28 -0700, Gurjeet Singh wrote:
> Attached is the patch that takes the former approach (initialize
> restart_lsn when the slot is created).

If it's an option that's imo a sane approach.

> pg_create_logical_replication_slot() prevents LSN from being
> recycled that by looping (worst case 2 times) until there's no
> conflict with the checkpointer recycling the segment. So I have used
> the same approach.

There's no need to change anything for logical slots? Or do you just
mean that you moved the existing code?
>  
>  /*
> + * Grab and save an LSN value to prevent WAL recycling past that point.
> + */
> +void
> +ReplicationSlotRegisterRestartLSN()
> +{
> +    ReplicationSlot *slot = MyReplicationSlot;
> +
> +    Assert(slot != NULL);
> +    Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
> +
> +    /*
> +     * The replication slot mechanism is used to prevent removal of required
> +     * WAL. As there is no interlock between this and checkpoints required, WAL
> +     * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
> +     * been called to prevent that. In the very unlikely case that this happens
> +     * we'll just retry.
> +     */
> +    while (true)
> +    {
> +        XLogSegNo    segno;
> +
> +        /*
> +         * Let's start with enough information if we can, so log a standby
> +         * snapshot and start decoding at exactly that position.
> +         */
> +        if (!RecoveryInProgress())
> +        {
> +            XLogRecPtr    flushptr;
> +
> +            /* start at current insert position */
> +            slot->data.restart_lsn = GetXLogInsertRecPtr();
> +
> +            /* make sure we have enough information to start */
> +            flushptr = LogStandbySnapshot();
> +
> +            /* and make sure it's fsynced to disk */
> +            XLogFlush(flushptr);
> +        }
> +        else
> +            slot->data.restart_lsn = GetRedoRecPtr();
> +
> +        /* prevent WAL removal as fast as possible */
> +        ReplicationSlotsComputeRequiredLSN();
> +
> +        /*
> +         * If all required WAL is still there, great, otherwise retry. The
> +         * slot should prevent further removal of WAL, unless there's a
> +         * concurrent ReplicationSlotsComputeRequiredLSN() after we've written
> +         * the new restart_lsn above, so normally we should never need to loop
> +         * more than twice.
> +         */
> +        XLByteToSeg(slot->data.restart_lsn, segno);
> +        if (XLogGetLastRemovedSegno() < segno)
> +            break;
> +    }
> +}

That doesn't look right to me. Why is this code logging a standby
snapshot for physical slots?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Nils Goroll
Date:
Subject: Re: s_lock() seems too aggressive for machines with many sockets
Next
From: Jan Wieck
Date:
Subject: Re: s_lock() seems too aggressive for machines with many sockets