Thread: RE: Make default subscription streaming option as Parallel

RE: Make default subscription streaming option as Parallel

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Vignesh,

> One key point to consider is that the lock on transaction objects will
> be held for a longer duration when using streaming in parallel. This
> occurs because the parallel apply worker initiates the transaction as
> soon as streaming begins, maintaining the lock until the transaction
> is fully completed. As a result, for long-running transactions, this
> extended lock can hinder concurrent access that requires a lock.

So, the current default is the most conservative and can run anywhere; your
proposal is a bit optimistic, right? Since long transactions should be avoided
in any case, and everyone knows it, I agree with your point.

One comment for your patch;
Shouldn't we add a streaming=off case for pg_dump test? You added lines but no one
passes it.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: Make default subscription streaming option as Parallel

From
shveta malik
Date:
On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
>

With parallel streaming as default, do you think there is a need to
increase the default for 'max_logical_replication_workers' as IIUC
parallel workers are taken from the same pool.

thanks
Shveta



Re: Make default subscription streaming option as Parallel

From
Amit Kapila
Date:
On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
> >
>
> With parallel streaming as default, do you think there is a need to
> increase the default for 'max_logical_replication_workers' as IIUC
> parallel workers are taken from the same pool.
>

Good question. But then one may say that we should proportionately
increase max_worker_processes as well. I don't know what should be
reasonable new defaults. I think we should make parallel streaming as
default and then wait for some user feedback before changing other
defaults.

--
With Regards,
Amit Kapila.



Re: Make default subscription streaming option as Parallel

From
shveta malik
Date:
On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> >
> > With parallel streaming as default, do you think there is a need to
> > increase the default for 'max_logical_replication_workers' as IIUC
> > parallel workers are taken from the same pool.
> >
>
> Good question. But then one may say that we should proportionately
> increase max_worker_processes as well. I don't know what should be
> reasonable new defaults.

Agreed on the same query on the next level of max_worker_processes .

> I think we should make parallel streaming as
> default and then wait for some user feedback before changing other
> defaults.
>

Okay, sounds good to me. It is not a blocking factor anyway, user can
always change it to a higher value in case his requirement is like
that.

thanks
Shveta



Re: Make default subscription streaming option as Parallel

From
Dilip Kumar
Date:
On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
> >
>
> With parallel streaming as default, do you think there is a need to
> increase the default for 'max_logical_replication_workers' as IIUC
> parallel workers are taken from the same pool.
>

Good question. But then one may say that we should proportionately
increase max_worker_processes as well. I don't know what should be
reasonable new defaults. I think we should make parallel streaming as
default and then wait for some user feedback before changing other
defaults.


I agree, actually streaming of in progress transactions is a useful feature for performance in case of large transactions, so it makes sense to make it "on" by default.  So +1 from my side.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Make default subscription streaming option as Parallel

From
Amit Kapila
Date:
On Fri, Oct 18, 2024 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>> >
>> > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
>> > >
>> >
>> > With parallel streaming as default, do you think there is a need to
>> > increase the default for 'max_logical_replication_workers' as IIUC
>> > parallel workers are taken from the same pool.
>> >
>>
>> Good question. But then one may say that we should proportionately
>> increase max_worker_processes as well. I don't know what should be
>> reasonable new defaults. I think we should make parallel streaming as
>> default and then wait for some user feedback before changing other
>> defaults.
>>
>
> I agree, actually streaming of in progress transactions is a useful feature for performance in case of large
transactions,so it makes sense to make it "on" by default.  So +1 from my side. 
>

Your response is confusing. AFAIU, this proposal is to change the
default value of the streaming option to 'parallel' but you are
suggesting to make 'on' as default which is different from the
proposed default but OTOH you are saying +1 as well. So, both can't be
true.

--
With Regards,
Amit Kapila.



Re: Make default subscription streaming option as Parallel

From
Dilip Kumar
Date:
On Fri, 18 Oct 2024 at 5:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 18, 2024 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>> >
>> > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
>> > >
>> >
>> > With parallel streaming as default, do you think there is a need to
>> > increase the default for 'max_logical_replication_workers' as IIUC
>> > parallel workers are taken from the same pool.
>> >
>>
>> Good question. But then one may say that we should proportionately
>> increase max_worker_processes as well. I don't know what should be
>> reasonable new defaults. I think we should make parallel streaming as
>> default and then wait for some user feedback before changing other
>> defaults.
>>
>
> I agree, actually streaming of in progress transactions is a useful feature for performance in case of large transactions, so it makes sense to make it "on" by default.  So +1 from my side.
>

Your response is confusing. AFAIU, this proposal is to change the
default value of the streaming option to 'parallel' but you are
suggesting to make 'on' as default which is different from the
proposed default but OTOH you are saying +1 as well. So, both can't be
true.

Sorry for confusion I meant to say change default as ‘parallel’

Dilip

Re: Make default subscription streaming option as Parallel

From
Amit Kapila
Date:
On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
>
> > One comment for your patch;
> > Shouldn't we add a streaming=off case for pg_dump test? You added lines but no one
> > passes it.
> >
>
> Update existing pg_dump tests to cover the streaming options, the
> attached patch has the changes for the same.
>

@@ -5235,6 +5235,8 @@ dumpSubscription(Archive *fout, const
SubscriptionInfo *subinfo)
  appendPQExpBufferStr(query, ", streaming = on");
  else if (strcmp(subinfo->substream, "p") == 0)
  appendPQExpBufferStr(query, ", streaming = parallel");
+ else
+ appendPQExpBufferStr(query, ", streaming = off");

For newer versions (>=18), we shouldn't need to specify "streaming =
parallel" while creating a subscription as that will be the default.
However, with the above code pg_dump statements will still have that.
There is nothing wrong with that but ideally, it won't be required.
Now, OTOH, doing that would require some version-handling code, which
is nothing new for pg_dump but not sure it makes sense for this
particular case. Another related point is that normally we don't
recommend people to use dump generated with pg_dump to use with lower
server versions than pg_dump itself, but the current proposed patch
will allow that. However, if we change it as I am saying that won't be
possible. So, I am okay with the current code but want to see if
anyone else thinks otherwise or if I am missing something.

--
With Regards,
Amit Kapila.



Re: Make default subscription streaming option as Parallel

From
Peter Smith
Date:
On Mon, Oct 21, 2024 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> >
> > > One comment for your patch;
> > > Shouldn't we add a streaming=off case for pg_dump test? You added lines but no one
> > > passes it.
> > >
> >
> > Update existing pg_dump tests to cover the streaming options, the
> > attached patch has the changes for the same.
> >
>
> @@ -5235,6 +5235,8 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
>   appendPQExpBufferStr(query, ", streaming = on");
>   else if (strcmp(subinfo->substream, "p") == 0)
>   appendPQExpBufferStr(query, ", streaming = parallel");
> + else
> + appendPQExpBufferStr(query, ", streaming = off");
>
> For newer versions (>=18), we shouldn't need to specify "streaming =
> parallel" while creating a subscription as that will be the default.
> However, with the above code pg_dump statements will still have that.
> There is nothing wrong with that but ideally, it won't be required.
> Now, OTOH, doing that would require some version-handling code, which
> is nothing new for pg_dump but not sure it makes sense for this
> particular case. Another related point is that normally we don't
> recommend people to use dump generated with pg_dump to use with lower
> server versions than pg_dump itself, but the current proposed patch
> will allow that. However, if we change it as I am saying that won't be
> possible.

Leaving the patch as-is seems better to me.

PROS
- The simple code explicitly setting all parameter values is easy to
understand as-is.
- AFAICT it works for all that the pg_dump docs [1] guarantees.
- No version handling code will be needed...
- Therefore, no risk of accidentally introducing any versioning bugs.
- Yields a more portable dump file (even though not guaranteed by pg_dump docs)

CONS
- A few more chars in the dump file?
- What else?

======
[1] https://www.postgresql.org/docs/devel/app-pgdump.html

Kind Regards,
Peter Smith.
Fujitsu Australia