Re: Force streaming every change in logical decoding - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Force streaming every change in logical decoding
Msg-id CAD21AoCL53znFEtFeb-wZ6X9F72aGBTZh5jmKoAAQdZwcG-0UQ@mail.gmail.com
In response to RE: Force streaming every change in logical decoding  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Responses Re: Force streaming every change in logical decoding  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Dec 21, 2022 at 10:14 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Wed, Dec 21, 2022 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 21, 2022 at 1:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> > > > > <kuroda.hayato@fujitsu.com> wrote:
> > > > > >
> > > > > > Dear hackers,
> > > > > >
> > > > > > > We have discussed three different ways to provide GUC for these
> > > > > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > > > > > force_server_serialize_mode, force_client_serialize_mode (we can use
> > > > > > > > different names for these) for each of these; (2) Have two sets of
> > > > > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > > > > values as 'stream' and 'serialize' for the server and then
> > > > > > > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > > > > > > like logical_replication_mode with values as 'server_stream',
> > > > > > > 'server_serialize', 'client_serialize'.
> > > > > >
> > > > > > > I also agree with adding new GUC parameters (I have already done this
> > > > > > > partially in parallel apply[1]), and option 2 basically makes sense to me.
> > > > > > > But is it OK that we can choose "serialize" mode even if subscribers
> > > > > > > require streaming?
> > > > > >
> > > > > > > Currently, reorder buffer transactions are serialized on the publisher only
> > > > > > > when there are no streamable transactions. So what happens if
> > > > > > > logical_decoding_mode = "serialize" but the streaming option is on? If we
> > > > > > > break the first rule and serialize changes on the publisher anyway, it may
> > > > > > > not be suitable for testing normal operation.
> > > > > >
> > > > >
> > > > > I think the change will be streamed as soon as the next change is
> > > > > processed even if we serialize based on this option. See
> > > > > > ReorderBufferProcessPartialChange. However, I see your point that when
> > > > > > the streaming option is given, the value 'serialize' for this GUC may
> > > > > > not make much sense.
> > > > >
> > > > > > > Therefore, I came up with the variant of (2): logical_decoding_mode can be
> > > > > > > "normal" or "immediate".
> > > > > >
> > > > > > > "normal" is the default value, which is the same as current HEAD. Changes are
> > > > > > > streamed or serialized when the buffered size exceeds logical_decoding_work_mem.
> > > > > > >
> > > > > > > When users set it to "immediate", the walsenders start to stream or serialize
> > > > > > > all changes. The choice depends on the subscription option.
> > > > > >
> > > > >
> > > > > The other possibility to achieve what you are saying is that we allow
> > > > > > a minimum value of logical_decoding_work_mem as 0 which would mean
> > > > > > stream or serialize each change depending on whether the streaming
> > > > > option is enabled. I think we normally don't allow a minimum value
> > > > > below a certain threshold for other *_work_mem parameters (like
> > > > > > maintenance_work_mem, work_mem), so we have followed the same here.
> > > > > And, I think it makes sense from the user's perspective because below
> > > > > a certain threshold it will just add overhead by either writing small
> > > > > changes to the disk or by sending those over the network. However, it
> > > > > can be quite useful for testing/debugging. So, not sure, if we should
> > > > > restrict setting logical_decoding_work_mem below a certain threshold.
> > > > > What do you think?
> > > >
> > > > I agree with (2), having separate GUCs for publisher side and
> > > > subscriber side. Also, on the publisher side, Amit's idea, controlling
> > > > the logical decoding behavior by changing logical_decoding_work_mem,
> > > > seems like a good idea.
> > > >
> > > > But I'm not sure it's a good idea if we lower the minimum value of
> > > > logical_decoding_work_mem to 0. I agree it's helpful for testing and
> > > > debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> > > > users at all, rather brings risks.
> > > >
> > > > I prefer the idea Kuroda-san previously proposed; setting
> > > > logical_decoding_mode = 'immediate' means setting
> > > > logical_decoding_work_mem = 0. We might not need to have it as an enum
> > > > parameter since it has only two values, though.
> > >
> > > Did you mean one GUC (logical_decoding_mode) will cause a side-effect
> > > implicit value change on another GUC value
> > > (logical_decoding_work_mem)?
> > >
> >
> > I don't think that is required. The only value that can be allowed for
> > logical_decoding_mode will be "immediate", something like we do for
> > recovery_target. The default will be "". The "immediate" value will
> > mean streaming each change if the "streaming" option is enabled
> > ("on" or "parallel"), or serializing each change if "streaming" is
> > not enabled.
> >
>
> I agreed and updated the patch as Amit suggested.
> Please see the attached patch.
>

The patch looks good to me. Some minor comments are:

- * limit, but we might also adapt a more elaborate eviction strategy - for example
- * evicting enough transactions to free certain fraction (e.g. 50%) of the memory
- * limit.
+ * limit, but we might also adapt a more elaborate eviction strategy - for
+ * example evicting enough transactions to free certain fraction (e.g. 50%) of
+ * the memory limit.

This change is unrelated to this feature.

---
+        if (logical_decoding_mode == LOGICAL_DECODING_MODE_DEFAULT
+                && rb->size < logical_decoding_work_mem * 1024L)

Since we put '&&' at the end of the line, before the line break, in all
other places in reorderbuffer.c, I think it's better to be consistent
here. The same applies to the while-loop change in the patch.
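
As a rough illustration of the suggested layout, here is a minimal standalone sketch. It is not the patch itself: choose_evict_action, EvictAction, and LOGICAL_DECODING_MODE_IMMEDIATE are hypothetical names, and the stream-versus-serialize choice only mirrors the behaviour described earlier in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's definitions; only
 * LOGICAL_DECODING_MODE_DEFAULT appears in the quoted hunk. */
typedef enum
{
	LOGICAL_DECODING_MODE_DEFAULT,
	LOGICAL_DECODING_MODE_IMMEDIATE	/* assumed name for the "immediate" value */
} LogicalDecodingMode;

typedef enum
{
	EVICT_NONE,			/* keep buffering in memory */
	EVICT_STREAM,		/* send the change to the subscriber */
	EVICT_SERIALIZE		/* spill the change to disk */
} EvictAction;

/*
 * Decide what to do after a change has been queued.  work_mem_kb plays the
 * role of logical_decoding_work_mem (in kilobytes) and buffered_bytes the
 * role of rb->size.
 */
static EvictAction
choose_evict_action(LogicalDecodingMode mode, size_t buffered_bytes,
					int work_mem_kb, bool streaming_enabled)
{
	/* '&&' stays at the end of the first line, not at the start of the next */
	if (mode == LOGICAL_DECODING_MODE_DEFAULT &&
		buffered_bytes < (size_t) work_mem_kb * 1024)
		return EVICT_NONE;

	/* Over the limit, or immediate mode: stream if possible, else serialize */
	return streaming_enabled ? EVICT_STREAM : EVICT_SERIALIZE;
}
```

With these assumptions, immediate mode returns EVICT_STREAM or EVICT_SERIALIZE for every change regardless of the buffered size, while the default mode does so only once the memory limit is reached.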

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


