Thread: Question about StartLogicalReplication() error path

Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

11 June 2021, 02:08:10

StartLogicalReplication() calls CreateDecodingContext(), which says:

  else if (start_lsn < slot->data.confirmed_flush)
  {
    /*
     * It might seem like we should error out in this case, but it's
     * pretty common for a client to acknowledge a LSN it doesn't
have to
     * do anything for, and thus didn't store persistently, because the
     * xlog records didn't result in anything relevant for logical
     * decoding. Clients have to be able to do that to support
synchronous
     * replication.
     */
     ...
     start_lsn = slot->data.confirmed_flush;
  }

But what about LSNs that are way in the past? Physical replication will
throw an error in that case (e.g. "requested WAL segment %s has already
been removed"), but StartLogicalReplication() ends up just starting
from confirmed_flush, which doesn't seem right.

I'm not sure I understand the comment overall. Why would the client
request something that it has already acknowledged, and why would the
server override that and just advance to the confirmed_lsn?

Regards,
    Jeff Davis

Re: Question about StartLogicalReplication() error path

From

Amit Kapila

Date:

11 June 2021, 04:43:07

On Fri, Jun 11, 2021 at 7:38 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> StartLogicalReplication() calls CreateDecodingContext(), which says:
>
>   else if (start_lsn < slot->data.confirmed_flush)
>   {
>     /*
>      * It might seem like we should error out in this case, but it's
>      * pretty common for a client to acknowledge a LSN it doesn't
> have to
>      * do anything for, and thus didn't store persistently, because the
>      * xlog records didn't result in anything relevant for logical
>      * decoding. Clients have to be able to do that to support
> synchronous
>      * replication.
>      */
>      ...
>      start_lsn = slot->data.confirmed_flush;
>   }
>
..
..
>
> I'm not sure I understand the comment overall. Why would the client
> request something that it has already acknowledged,
>

Because sometimes clients don't have to do anything for xlog records.
One example is WAL for DDL where logical decoding didn't produce
anything for the client but later with keepalive we send the LSN of
WAL where DDL has finished and the client just responds with the
position sent by the server as it doesn't have any other pending
transactions.

> and why would the
> server override that and just advance to the confirmed_lsn?
>

I think because there is no need to process the WAL that has been
confirmed by the client. Do you see any problems with this scheme?

-- 
With Regards,
Amit Kapila.

Re: Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

11 June 2021, 06:22:53

On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> Because sometimes clients don't have to do anything for xlog records.
> One example is WAL for DDL where logical decoding didn't produce
> anything for the client but later with keepalive we send the LSN of
> WAL where DDL has finished and the client just responds with the
> position sent by the server as it doesn't have any other pending
> transactions.

If I understand correctly, in this situation it avoids the cost of a
write on the client just to update its stored LSN progress value when
there's no real data to be written. In that case the client would need
to rely on the server's confirmed_flush_lsn instead of its own stored
LSN progress value.

That's a reasonable thing for the *client* to do explicitly, e.g. by
just reading the slot's confirmed_flush_lsn and comparing to its own
stored lsn. But I don't think it's reasonable for the server to just
skip over data requested by the client because it thinks it knows best.

> I think because there is no need to process the WAL that has been
> confirmed by the client. Do you see any problems with this scheme?

Several:

* Replication setups are complex, and it can be easy to misconfigure
something or have a bug in some control code. An error is valuable to
detect the problem closer to the source.

* There are plausible configurations where things could go badly wrong.
For instance, if you are storing the decoded data in another postgres
server with syncrhonous_commit=off, and acknowledging LSNs before they
are durable. A crash of the destination system would be consistent, but
it would be missing some data earlier than the confirmed_flush_lsn. The
client would then request the data starting at its stored lsn progress
value, but the server would skip ahead to the confirmed_flush_lsn;
silently missing data.

* It's contradicted by the docs: "Instructs server to start streaming
WAL for logical replication, starting at WAL location XXX/XXX."

* The comment acknowledges that a user might expect an error in that
case; but doesn't really address why the user would expect an error,
and why it's OK to violate that expectation.

Regards,
    Jeff Davis

Re: Question about StartLogicalReplication() error path

From

Robert Haas

Date:

11 June 2021, 17:15:11

On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis <pgsql@j-davis.com> wrote:
> * The comment acknowledges that a user might expect an error in that
> case; but doesn't really address why the user would expect an error,
> and why it's OK to violate that expectation.

This code was written by Andres, so he'd be the best person to comment
on it, but it seems to me that the comment does explain this, and that
it's basically the same explanation as what Amit said. If the client
doesn't have to do anything for a certain range of WAL and just
acknowledges it, it would under your proposal have to also durably
record that it had chosen to do nothing, which might cause extra
fsyncs, potentially lots of extra fsyncs if this happens frequently
e.g. because most tables are filtered out and the replicated ones are
only modified occasionally. I'm not sure that it would be a good
trade-off to have a tighter sanity check at the expense of adding that
overhead.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Question about StartLogicalReplication() error path

From

Andres Freund

Date:

11 June 2021, 17:40:32

Hi,

On 2021-06-11 13:15:11 -0400, Robert Haas wrote:
> On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > * The comment acknowledges that a user might expect an error in that
> > case; but doesn't really address why the user would expect an error,
> > and why it's OK to violate that expectation.
> 
> This code was written by Andres, so he'd be the best person to comment
> on it, but it seems to me that the comment does explain this, and that
> it's basically the same explanation as what Amit said. If the client
> doesn't have to do anything for a certain range of WAL and just
> acknowledges it, it would under your proposal have to also durably
> record that it had chosen to do nothing, which might cause extra
> fsyncs, potentially lots of extra fsyncs if this happens frequently
> e.g. because most tables are filtered out and the replicated ones are
> only modified occasionally.

Yes, that's the motivation.


> I'm not sure that it would be a good trade-off to have a tighter
> sanity check at the expense of adding that overhead.

Especially because it very well might break existing working setups...

Greetings,

Andres Freund

Re: Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

11 June 2021, 18:49:19

On Fri, 2021-06-11 at 13:15 -0400, Robert Haas wrote:
> on it, but it seems to me that the comment does explain this, and
> that
> it's basically the same explanation as what Amit said.

It only addresses the "pro" side of the behavior, not the "con". It's a
bit like saying "Given that we are in the U.S., it might seem like we
should be driving on the right side of the road, but that side has
traffic and we are in a hurry."

Why might it seem that we should error out? If we don't error out, what
bad things might happen? How do these "con"s weigh against the "pro"s?

> I'm not sure that it would be a good
> trade-off to have a tighter sanity check at the expense of adding
> that
> overhead.

It doesn't add any overhead.

All the client would have to do is "SELECT confirmed_flush_lsn FROM
pg_replication_slots WHERE slot_name='myslot'", and compare it to the
stored value. If the stored value is earlier than the
confirmed_flush_lsn, the *client* can decide to start replication at
the confirmed_flush_lsn. That makes sense because the client knows more
about its behavior and configuration, and whether that's a safe choice
or not.

The only difference is whether the server is safe-by-default with
intuitive semantics that match the documentation, or unsafe-by-default
with unexpected semantics that don't match the documentation.

Regards,
    Jeff Davis

Re: Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

11 June 2021, 18:56:06

On Fri, 2021-06-11 at 10:40 -0700, Andres Freund wrote:
> Especially because it very well might break existing working
> setups...

Please address my concerns[1].

Regards,
    Jeff Davis

[1] 
https://www.postgresql.org/message-id/e22a4606333ce1032e29fe2fb1aa9036e6f0ca98.camel%40j-davis.com

Re: Question about StartLogicalReplication() error path

From

Andres Freund

Date:

11 June 2021, 18:56:48

On 2021-06-11 11:49:19 -0700, Jeff Davis wrote:
> All the client would have to do is "SELECT confirmed_flush_lsn FROM
> pg_replication_slots WHERE slot_name='myslot'", and compare it to the
> stored value.

That doesn't work as easily in older versions because there was no SQL
support in replication connections until PG 10...

Re: Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

11 June 2021, 19:07:24

On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote:
> That doesn't work as easily in older versions because there was no
> SQL
> support in replication connections until PG 10...

9.6 will be EOL this year. I don't really see why such old versions are
relevant to this discussion.

Regards,
    Jeff Davis

Re: Question about StartLogicalReplication() error path

From

Robert Haas

Date:

11 June 2021, 20:05:10

On Fri, Jun 11, 2021 at 2:49 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It doesn't add any overhead.
>
> All the client would have to do is "SELECT confirmed_flush_lsn FROM
> pg_replication_slots WHERE slot_name='myslot'", and compare it to the
> stored value. If the stored value is earlier than the
> confirmed_flush_lsn, the *client* can decide to start replication at
> the confirmed_flush_lsn. That makes sense because the client knows more
> about its behavior and configuration, and whether that's a safe choice
> or not.

True, but it doesn't seem very nice to forcethe client depend on SQL
when that wouldn't otherwise be needed. The SQL is a lot more likely
to fail than a replication command, for example due to some
permissions issue. So I think if we want to make this an optional
behavior, it would be better to add a flag to the START_REPLICATION
flag to say whether it's OK for the server to fast-forward like this.

You seem to see this as some kind of major problem and I guess I don't
agree. I think it's pretty clear what the motivation was for the
current behavior, because I believe it's well-explained by the comment
and the three people who have tried to answer your question. I also
think it's pretty clear why somebody might find it surprising: someone
might think that fast-forwarding is harmful and risky rather than a
useful convenience. As evidence for the fact that someone might think
that, I offer the fact that you seem to think exactly that thing. I
also think that there's pretty good evidence that the behavior as it
exists is not really so bad. As far as I know, and I certainly might
have missed something, you're the first one to complain about behavior
that we've had for quite a long time now, and you seem to be saying
that it might cause problems for somebody, not that you know it
actually did. So, I don't know, I'm not opposed to talking about
potential improvements here, but to the extent that you're suggesting
this is unreasonable behavior, I think that's too harsh.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Question about StartLogicalReplication() error path

From

Andres Freund

Date:

11 June 2021, 20:42:17

Hi,

On 2021-06-11 16:05:10 -0400, Robert Haas wrote:
> You seem to see this as some kind of major problem and I guess I don't
> agree. I think it's pretty clear what the motivation was for the
> current behavior, because I believe it's well-explained by the comment
> and the three people who have tried to answer your question. I also
> think it's pretty clear why somebody might find it surprising: someone
> might think that fast-forwarding is harmful and risky rather than a
> useful convenience. As evidence for the fact that someone might think
> that, I offer the fact that you seem to think exactly that thing. I
> also think that there's pretty good evidence that the behavior as it
> exists is not really so bad. As far as I know, and I certainly might
> have missed something, you're the first one to complain about behavior
> that we've had for quite a long time now, and you seem to be saying
> that it might cause problems for somebody, not that you know it
> actually did. So, I don't know, I'm not opposed to talking about
> potential improvements here, but to the extent that you're suggesting
> this is unreasonable behavior, I think that's too harsh.

Yea. I think it'd be a different matter if streaming logical decoding
had been added this cycle and we'd started out with supporting queries
over replication connection - but it's been long enough that it's likely
that people rely on the current behaviour, and I don't see the gain in
reliability outweigh the compat issues.

Your argument that one can just check kinda goes both ways - you can do
that with the current behaviour too...

Greetings,

Andres Freund

Re: Question about StartLogicalReplication() error path

From

Andres Freund

Date:

11 June 2021, 20:43:51

On 2021-06-11 12:07:24 -0700, Jeff Davis wrote:
> On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote:
> > That doesn't work as easily in older versions because there was no
> > SQL
> > support in replication connections until PG 10...
> 
> 9.6 will be EOL this year. I don't really see why such old versions are
> relevant to this discussion.

It's relevant to understand how we ended up here.

Re: Question about StartLogicalReplication() error path

From

Amit Kapila

Date:

12 June 2021, 10:47:54

On Fri, Jun 11, 2021 at 11:52 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> > I think because there is no need to process the WAL that has been
> > confirmed by the client. Do you see any problems with this scheme?
>
> Several:
>
> * Replication setups are complex, and it can be easy to misconfigure
> something or have a bug in some control code. An error is valuable to
> detect the problem closer to the source.
>
> * There are plausible configurations where things could go badly wrong.
> For instance, if you are storing the decoded data in another postgres
> server with syncrhonous_commit=off, and acknowledging LSNs before they
> are durable. A crash of the destination system would be consistent, but
> it would be missing some data earlier than the confirmed_flush_lsn. The
> client would then request the data starting at its stored lsn progress
> value, but the server would skip ahead to the confirmed_flush_lsn;
> silently missing data.
>

AFAIU, currently, in such a case, the subscriber (client) won't
advance the flush location (confirmed_flush_lsn). So, we won't lose
any data.

-- 
With Regards,
Amit Kapila.

Re: Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

14 June 2021, 16:50:32

On Sat, 2021-06-12 at 16:17 +0530, Amit Kapila wrote:
> AFAIU, currently, in such a case, the subscriber (client) won't
> advance the flush location (confirmed_flush_lsn). So, we won't lose
> any data.

I think you are talking about the official Logical Replication
specifically, rather than an arbitrary client that's using the logical
replication protocol based on the protocol docs.

It seems that there's not much agreement in a behavior change here. I
suggest one or more of the following:

 1. change the logical rep protocol docs to match the current behavior
    a. also briefly explain in the docs why it's different from
physical replication (which does always start at the provided LSN as
far as I can tell)

 2. Change the comment to add something like "Starting at a different
LSN than requested might not catch certain kinds of client errors.
Clients should be careful to check confirmed_flush_lsn if starting at
the requested LSN is required."

 3. upgrade DEBUG1 message to a WARNING

Can I get agreement on any of the above suggestions?

Regards,
    Jeff Davis

Re: Question about StartLogicalReplication() error path

From

Robert Haas

Date:

14 June 2021, 17:13:54

On Mon, Jun 14, 2021 at 12:50 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It seems that there's not much agreement in a behavior change here. I
> suggest one or more of the following:
>
>  1. change the logical rep protocol docs to match the current behavior
>     a. also briefly explain in the docs why it's different from
> physical replication (which does always start at the provided LSN as
> far as I can tell)
>
>  2. Change the comment to add something like "Starting at a different
> LSN than requested might not catch certain kinds of client errors.
> Clients should be careful to check confirmed_flush_lsn if starting at
> the requested LSN is required."
>
>  3. upgrade DEBUG1 message to a WARNING
>
> Can I get agreement on any of the above suggestions?

I'm happy to hear other opinions, but I think I would be inclined to
vote in favor of #1 and/or #2 but against #3.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Question about StartLogicalReplication() error path

From

Jeff Davis

Date:

14 June 2021, 21:51:35

On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:
> I'm happy to hear other opinions, but I think I would be inclined to
> vote in favor of #1 and/or #2 but against #3.

What about upgrading it to, say, LOG? It seems like it would happen
pretty infrequently, and in the event something strange happens, might
rule out some possibilities.

Regards,
    Jeff Davis

Re: Question about StartLogicalReplication() error path

From

Kyotaro Horiguchi

Date:

15 June 2021, 06:19:54

At Mon, 14 Jun 2021 14:51:35 -0700, Jeff Davis <pgsql@j-davis.com> wrote in 
> On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:
> > I'm happy to hear other opinions, but I think I would be inclined to
> > vote in favor of #1 and/or #2 but against #3.
> 
> What about upgrading it to, say, LOG? It seems like it would happen
> pretty infrequently, and in the event something strange happens, might
> rule out some possibilities.

I don't think the message is neded, but I don't oppose it as far as
the level is LOG and the messages were changed as something like this:


-        elog(DEBUG1, "cannot stream from %X/%X, minimum is %X/%X, forwarding",
+        elog(LOG, "%X/%X has been already streamed, forwarding to %X/%X",

FWIW, I most prefer #1. I see #2 as optional. and see #3 as the above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Question about StartLogicalReplication() error path

From

Amit Kapila

Date:

16 June 2021, 06:01:57

On Tue, Jun 15, 2021 at 3:21 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:
> > I'm happy to hear other opinions, but I think I would be inclined to
> > vote in favor of #1 and/or #2 but against #3.
>
> What about upgrading it to, say, LOG? It seems like it would happen
> pretty infrequently, and in the event something strange happens, might
> rule out some possibilities.
>

I don't see any problem with changing it to LOG if that helps
especially because it won't happen too often.

-- 
With Regards,
Amit Kapila.

Re: Question about StartLogicalReplication() error path

From

Amit Kapila

Date:

19 June 2021, 10:00:08

On Thu, Jun 17, 2021 at 4:25 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2021-06-15 at 15:19 +0900, Kyotaro Horiguchi wrote:
> > I don't think the message is neded, but I don't oppose it as far as
> > the level is LOG and the messages were changed as something like
> > this:
> >
> >
> > -               elog(DEBUG1, "cannot stream from %X/%X, minimum is
> > %X/%X, forwarding",
> > +               elog(LOG, "%X/%X has been already streamed,
> > forwarding to %X/%X",
> >
> > FWIW, I most prefer #1. I see #2 as optional. and see #3 as the
> > above.
>
> Attached.
>

LGTM.

-- 
With Regards,
Amit Kapila.