Thread: RE: Make default subscription streaming option as Parallel
Dear Vignesh, > One key point to consider is that the lock on transaction objects will > be held for a longer duration when using streaming in parallel. This > occurs because the parallel apply worker initiates the transaction as > soon as streaming begins, maintaining the lock until the transaction > is fully completed. As a result, for long-running transactions, this > extended lock can hinder concurrent access that requires a lock. So, the current default is the most conservative and can run anywhere; your proposal is a bit optimistic, right? Since long transactions should be avoided in any case, and everyone knows it, I agree with your point. One comment for your patch; Shouldn't we add a streaming=off case for pg_dump test? You added lines but no one passes it. Best regards, Hayato Kuroda FUJITSU LIMITED
On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote: > With parallel streaming as default, do you think there is a need to increase the default for 'max_logical_replication_workers' as IIUC parallel workers are taken from the same pool. thanks Shveta
On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote: > > > > With parallel streaming as default, do you think there is a need to > increase the default for 'max_logical_replication_workers' as IIUC > parallel workers are taken from the same pool. > Good question. But then one may say that we should proportionately increase max_worker_processes as well. I don't know what should be reasonable new defaults. I think we should make parallel streaming as default and then wait for some user feedback before changing other defaults. -- With Regards, Amit Kapila.
On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > With parallel streaming as default, do you think there is a need to > > increase the default for 'max_logical_replication_workers' as IIUC > > parallel workers are taken from the same pool. > > > > Good question. But then one may say that we should proportionately > increase max_worker_processes as well. I don't know what should be > reasonable new defaults. Agreed on the same query on the next level of max_worker_processes . > I think we should make parallel streaming as > default and then wait for some user feedback before changing other > defaults. > Okay, sounds good to me. It is not a blocking factor anyway, user can always change it to a higher value in case his requirement is like that. thanks Shveta
On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
> >
>
> With parallel streaming as default, do you think there is a need to
> increase the default for 'max_logical_replication_workers' as IIUC
> parallel workers are taken from the same pool.
>
Good question. But then one may say that we should proportionately
increase max_worker_processes as well. I don't know what should be
reasonable new defaults. I think we should make parallel streaming as
default and then wait for some user feedback before changing other
defaults.
I agree, actually streaming of in progress transactions is a useful feature for performance in case of large transactions, so it makes sense to make it "on" by default. So +1 from my side.
On Fri, Oct 18, 2024 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote: >> > >> > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote: >> > > >> > >> > With parallel streaming as default, do you think there is a need to >> > increase the default for 'max_logical_replication_workers' as IIUC >> > parallel workers are taken from the same pool. >> > >> >> Good question. But then one may say that we should proportionately >> increase max_worker_processes as well. I don't know what should be >> reasonable new defaults. I think we should make parallel streaming as >> default and then wait for some user feedback before changing other >> defaults. >> > > I agree, actually streaming of in progress transactions is a useful feature for performance in case of large transactions,so it makes sense to make it "on" by default. So +1 from my side. > Your response is confusing. AFAIU, this proposal is to change the default value of the streaming option to 'parallel' but you are suggesting to make 'on' as default which is different from the proposed default but OTOH you are saying +1 as well. So, both can't be true. -- With Regards, Amit Kapila.
On Fri, 18 Oct 2024 at 5:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 18, 2024 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Oct 8, 2024 at 2:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>> >
>> > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote:
>> > >
>> >
>> > With parallel streaming as default, do you think there is a need to
>> > increase the default for 'max_logical_replication_workers' as IIUC
>> > parallel workers are taken from the same pool.
>> >
>>
>> Good question. But then one may say that we should proportionately
>> increase max_worker_processes as well. I don't know what should be
>> reasonable new defaults. I think we should make parallel streaming as
>> default and then wait for some user feedback before changing other
>> defaults.
>>
>
> I agree, actually streaming of in progress transactions is a useful feature for performance in case of large transactions, so it makes sense to make it "on" by default. So +1 from my side.
>
Your response is confusing. AFAIU, this proposal is to change the
default value of the streaming option to 'parallel' but you are
suggesting to make 'on' as default which is different from the
proposed default but OTOH you are saying +1 as well. So, both can't be
true.
Sorry for confusion I meant to say change default as ‘parallel’
—
Dilip
On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote: > > On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > One comment for your patch; > > Shouldn't we add a streaming=off case for pg_dump test? You added lines but no one > > passes it. > > > > Update existing pg_dump tests to cover the streaming options, the > attached patch has the changes for the same. > @@ -5235,6 +5235,8 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo) appendPQExpBufferStr(query, ", streaming = on"); else if (strcmp(subinfo->substream, "p") == 0) appendPQExpBufferStr(query, ", streaming = parallel"); + else + appendPQExpBufferStr(query, ", streaming = off"); For newer versions (>=18), we shouldn't need to specify "streaming = parallel" while creating a subscription as that will be the default. However, with the above code pg_dump statements will still have that. There is nothing wrong with that but ideally, it won't be required. Now, OTOH, doing that would require some version-handling code, which is nothing new for pg_dump but not sure it makes sense for this particular case. Another related point is that normally we don't recommend people to use dump generated with pg_dump to use with lower server versions than pg_dump itself, but the current proposed patch will allow that. However, if we change it as I am saying that won't be possible. So, I am okay with the current code but want to see if anyone else thinks otherwise or if I am missing something. -- With Regards, Amit Kapila.
On Mon, Oct 21, 2024 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 7, 2024 at 4:03 PM vignesh C <vignesh21@gmail.com> wrote: > > > > On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > > > > > One comment for your patch; > > > Shouldn't we add a streaming=off case for pg_dump test? You added lines but no one > > > passes it. > > > > > > > Update existing pg_dump tests to cover the streaming options, the > > attached patch has the changes for the same. > > > > @@ -5235,6 +5235,8 @@ dumpSubscription(Archive *fout, const > SubscriptionInfo *subinfo) > appendPQExpBufferStr(query, ", streaming = on"); > else if (strcmp(subinfo->substream, "p") == 0) > appendPQExpBufferStr(query, ", streaming = parallel"); > + else > + appendPQExpBufferStr(query, ", streaming = off"); > > For newer versions (>=18), we shouldn't need to specify "streaming = > parallel" while creating a subscription as that will be the default. > However, with the above code pg_dump statements will still have that. > There is nothing wrong with that but ideally, it won't be required. > Now, OTOH, doing that would require some version-handling code, which > is nothing new for pg_dump but not sure it makes sense for this > particular case. Another related point is that normally we don't > recommend people to use dump generated with pg_dump to use with lower > server versions than pg_dump itself, but the current proposed patch > will allow that. However, if we change it as I am saying that won't be > possible. Leaving the patch as-is seems better to me. PROS - The simple code explicitly setting all parameter values is easy to understand as-is. - AFAICT it works for all that the pg_dump docs [1] guarantees. - No version handling code will be needed... - Therefore, no risk of accidentally introducing any versioning bugs. - Yields a more portable dump file (even though not guaranteed by pg_dump docs) CONS - A few more chars in the dump file? - What else? ====== [1] https://www.postgresql.org/docs/devel/app-pgdump.html Kind Regards, Peter Smith. Fujitsu Australia