Re: [HACKERS] Logical decoding on standby - Mailing list pgsql-hackers

From: Craig Ringer
Subject: Re: [HACKERS] Logical decoding on standby
Msg-id: CAMsr+YHggfrs4CDKBNLmmMgtw81cfnzbQ5Y_m1fvaUUxF6Ttyg@mail.gmail.com
In response to: Re: [HACKERS] Logical decoding on standby (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List: pgsql-hackers
On 20 December 2016 at 15:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

>> The biggest change in this patch, and the main intrusive part, is that
>> procArray->replication_slot_catalog_xmin is no longer directly used by
>> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
>> added, with a corresponding CheckPoint field.
>> [snip]
>
> If this mechanism would not be needed most of the time, wouldn't it be
> better to not have it and just have a way to ask physical slot about
> what's the current reserved catalog_xmin (in most cases the standby
> should actually know what it is since it's sending the hs_feedback, but
> first slot created on such standby may need to check).

Yes, and that was actually my originally preferred approach. The
approach above does offer the advantage that if something goes wrong
we can detect it and fail gracefully, though possibly that isn't worth
the complexity.
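
To illustrate the failure-detection benefit: with the tracked value
available, logical decoding startup on the standby could do something
like the following. This is a rough sketch only, not the patch's code;
apart from the oldestCatalogXmin field described above, the names and
placement are assumptions (and locking is elided).

/*
 * Sketch: when a logical slot begins decoding on the standby, check that
 * vacuum has not already honoured a newer catalog_xmin and removed
 * catalog rows this slot still needs.  Failing here is much friendlier
 * than hitting relcache/syscache lookup errors (or a crash) mid-decode.
 */
if (TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin) &&
    TransactionIdPrecedes(MyReplicationSlot->data.catalog_xmin,
                          ShmemVariableCache->oldestCatalogXmin))
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("catalog rows required by replication slot \"%s\" have already been removed",
                    NameStr(MyReplicationSlot->data.name))));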

Your approach requires us to make very sure that hot_standby_feedback
does not get turned off by the user, or become ineffective, once we're
replicating, since we won't have any way to detect when needed
tuples are removed. We'd probably just bail out with relcache/syscache
lookup errors, but I can't guarantee we wouldn't crash if we tried
logical decoding on WAL for which the needed catalogs have been removed.

I initially ran into trouble doing that, but now think I have a way forward.

> WRT preventing
> hs_feedback going off, we can refuse to start with hs_feedback off when
> there are logical slots detected.

Yes. There are some ordering issues there though. We load slots quite
late in startup and they don't get tracked in checkpoints. So we might
launch the walreceiver before we load slots and notice their needed
xmin/catalog_xmin. So we need to prevent sending of
hot_standby_feedback until slots are loaded, or load slots earlier in
startup. The former sounds less intrusive and safer - probably just
add an "initialized" flag to ReplicationSlotCtlData and suppress
hs_feedback until it becomes true.
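
In rough outline, the walreceiver-side gate could look like this. A
sketch only: "slots_initialized" and the function name are placeholders
I'm using for illustration, not existing fields.

/*
 * Sketch: gate for the walreceiver's hot_standby_feedback messages.
 * "slots_initialized" is a hypothetical flag that the startup process
 * would set once slot state has been restored from pg_replslot; until
 * it is set, feedback is suppressed so we never report an
 * xmin/catalog_xmin that ignores slots we simply haven't loaded yet.
 */
static bool
HotStandbyFeedbackAllowed(void)
{
    bool        initialized;

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    initialized = ReplicationSlotCtl->slots_initialized;
    LWLockRelease(ReplicationSlotControlLock);

    return initialized;
}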

We'd also have to suppress the validation callback action on the
hot_standby_feedback GUC until we know replication slot state is
initialised, and perform the check during slot startup instead. The
hot_standby_feedback GUC validation check might get called before
shmem is even set up, so we have to guard against attempts to access a
shmem segment that may not even exist yet.
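
For the GUC side I'm thinking of something along these lines. Sketch
only: the hook and the StandbyHasLogicalSlots() helper are hypothetical
names, not code that exists today.

/*
 * Sketch of a check hook for hot_standby_feedback.  It can be called very
 * early, e.g. while postgresql.conf is first parsed, before shared memory
 * exists, so it must not touch slot state in that case; the real
 * enforcement would be repeated once slots have been restored at startup.
 */
static bool
check_hot_standby_feedback(bool *newval, void **extra, GucSource source)
{
    /* Shared memory, and therefore slot state, may not exist yet. */
    if (!IsUnderPostmaster || ReplicationSlotCtl == NULL)
        return true;

    if (!*newval && StandbyHasLogicalSlots())   /* hypothetical helper */
    {
        GUC_check_errmsg("hot_standby_feedback cannot be disabled while logical replication slots exist");
        return false;
    }

    return true;
}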

The general idea is workable, though. Refuse to start if logical slots
exist and hot_standby_feedback is off or the walreceiver isn't using a
physical slot. Refuse to allow hot_standby_feedback to be turned off
while logical slots exist.
> We can also refuse to connect to the
> master without physical slot if there are logical slots detected. I
> don't see problem with either of those.

Agreed. We must also be able to reliably enforce that the walreceiver
is using a replication slot to connect to the master and refuse to
start if it is not. The user could change recovery.conf and restart
the walreceiver while we're running, after we perform that check, so
walreceiver must also refuse to start if logical replication slots
exist but it has no primary slot name configured.
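
i.e. roughly this inside the walreceiver, once the slot name has been
copied out of shared memory (sketch only; StandbyHasLogicalSlots() is
the same hypothetical helper as above, and the exact placement is an
assumption):

/*
 * Sketch: if this standby has logical slots but the walreceiver was
 * started without a primary slot (primary_slot_name unset), bail out
 * rather than stream WAL with nothing on the master holding back the
 * catalog_xmin our logical slots depend on.
 */
if (slotname[0] == '\0' && StandbyHasLogicalSlots())
    ereport(FATAL,
            (errmsg("logical replication slots exist on this standby, but no primary_slot_name is configured")));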

> You may ask what if user drops the slot and recreates or somehow
> otherwise messes up catalog_xmin on master, well, considering that under
> my proposal we'd first (on connect) check the slot for catalog_xmin we'd
> know about it so we'd either mark the local slots broken/drop them or
> plainly refuse to connect to the master same way as if it didn't have
> required WAL anymore (not sure which behavior is better). Note that user
> could mess up catalog_xmin even in your design the same way, so it's not
> really a regression.

Agreed. Checking the catalog_xmin of the slot when we connect is
sufficient to guard against that, assuming we can trust that the
catalog_xmin is actually in effect on the master. That isn't guaranteed
in cascading setups, where we set our catalog_xmin but it might not be
"locked in" until the middle cascaded server relays it to the master.

I have a proposed solution to that which I'll outline in a separate
patch+post; it ties in to some work on addressing the race between hot
standby feedback taking effect and queries starting on the hot
standby.  It boils down to "add a hot_standby_feedback reply protocol
message".

> Plus
> it might even be okay to only allow creating logical slots on standbys
> connected directly to master in v1.

True. I didn't consider that.

We haven't had much luck in the past with such limitations, but
personally I'd consider it a perfectly reasonable one.

> But in 0003 I don't understand following code:
>> +     if (endpos != InvalidXLogRecPtr && !do_start_slot)
>> +     {
>> +             fprintf(stderr,
>> +                             _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
>> +                             progname);
>> +             fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> +                             progname);
>> +             exit(1);
>> +     }
>
> Why is --create-slot and --endpos not allowed together?

What would --create-slot with --endpos do?

Ah. I had not realised that it is legal to do

pg_recvlogical -S test --create-slot --start -f - -d 'test'

i.e. "create a slot then in the same invocation begin decoding from it".

I misread and thought that --create-slot and --start were mutually exclusive.

I will address that.
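
i.e. once that's fixed, an invocation like this should be accepted
(using the --endpos option the 0003 patch adds; the LSN is just an
example value):

pg_recvlogical -S test --create-slot --start --endpos 0/3000000 -f - -d 'test'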

> 0005 is timeline following which is IMHO ready for committer, as is 0006
> and 0008 and I still maintain the opinion that these should go in soon.

I wonder if I should re-order 0005 and 0006 so we can commit the
hot_standby test improvements before logical decoding timeline
following.

> I think parts of this could be committed separately and are ready for
> committer IMHO, but there is no way in CF application to mark only part
> of patch-set for committer to attract the attention.

Yeah. I raised that before and nobody was really sure what to do about
it. It's confusing to post patches on the same thread on separate CF
entries. It's also confusing to post patches on a nest of
inter-related threads to allow each thread to be tracked by a separate
CF entry.

At the moment I'm aiming to progressively get the underlying
infrastructure/test stuff in so we can focus on the core feature.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


