Thread: Don't try fetching future segment of a TLI.
Hello, I added (moved to) -hackers.

At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky <psuderevsky@gmail.com> wrote in
> But for me it still seems that PostgreSQL has enough information to check
> that no WALs exist for the new timeline to omit searching all the
> possibly-existing WALs.
>
> It can just look through the first received new-timeline's WAL and ensure
> the timeline switch occurred in this WAL. Finally, it can check the archive
> for the only possibly-existing previous WAL.

Right. The timeline history file tells where a timeline ends.

> Regarding influence: the issue is not about the large amount of WAL to
> apply but in searching for the non-existing WALs on the remote storage;
> each such search can take 5-10 seconds while obtaining an existing WAL
> takes milliseconds.

Wow. I didn't know of a file system that takes that many seconds when trying
non-existent files. Although I still think this is not a bug, avoiding that
actually leads to a big win on such systems.

After a thought, I think it's safe and effectively doable to let
XLogFileReadAnyTLI() refrain from trying WAL segments of too-high TLIs.
Some garbage archive files out of the range of a timeline might be seen,
for example, after reusing an archive directory without clearing files.
However, fetching such garbage just to fail doesn't contribute to
durability or reliability at all, I think.

The attached does that. Any thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From 1b9211740175b7f9cb6810c822a67d4065ca9cf0 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 29 Jan 2020 11:17:56 +0900
Subject: [PATCH v1] Don't try fetching out-of-timeline segments.

XLogFileReadAnyTLI scans known TLIs down from the largest one in
descending order while searching for the target segment. Even if we know
that the segment belongs to a lower TLI, it tries opening the segment on
the larger TLIs just to fail.

Under certain circumstances such access to non-existent files takes a
long time and makes recovery significantly longer. Although a segment
beyond the end of a TLI suggests that the XLOG/archive files may be
broken, we can safely ignore such files as far as recovery is concerned.
---
 src/backend/access/transam/xlog.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6e09ded597..415288f50d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3738,11 +3738,27 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)

 	foreach(cell, tles)
 	{
-		TimeLineID	tli = ((TimeLineHistoryEntry *) lfirst(cell))->tli;
+		TimeLineHistoryEntry *hent = (TimeLineHistoryEntry *) lfirst(cell);
+		TimeLineID	tli = hent->tli;

 		if (tli < curFileTLI)
 			break;				/* don't bother looking at too-old TLIs */

+		/* Skip segments not belonging to the TLI */
+		if (hent->begin != InvalidXLogRecPtr)
+		{
+			XLogSegNo	beginseg = 0;
+
+			XLByteToSeg(hent->begin, beginseg, wal_segment_size);
+
+			/*
+			 * We are scanning TLIs in descending order. It is sufficient to
+			 * check only the upper boundary.
+			 */
+			if (segno < beginseg)
+				continue;		/* don't bother looking at future TLIs */
+		}
+
 		if (source == XLOG_FROM_ANY || source == XLOG_FROM_ARCHIVE)
 		{
 			fd = XLogFileRead(segno, emode, tli,
--
2.18.2
On 1/28/20 8:02 PM, Kyotaro Horiguchi wrote:
> At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky
>> Regarding influence: the issue is not about the large amount of WAL to
>> apply but in searching for the non-existing WALs on the remote storage;
>> each such search can take 5-10 seconds while obtaining an existing WAL
>> takes milliseconds.
>
> Wow. I didn't know of a file system that takes that many seconds when
> trying non-existent files. Although I still think this is not a bug,
> avoiding that actually leads to a big win on such systems.

I have not tested this case but I can imagine it would be slow in practice.
It's axiomatic that it is hard to prove a negative. With multi-region
replication it might well take some time to be sure that the file *really*
doesn't exist and hasn't just been lost in a single region.

> After a thought, I think it's safe and effectively doable to let
> XLogFileReadAnyTLI() refrain from trying WAL segments of too-high TLIs.
> Some garbage archive files out of the range of a timeline might be seen,
> for example, after reusing an archive directory without clearing files.
> However, fetching such garbage just to fail doesn't contribute to
> durability or reliability at all, I think.

The patch seems sane; the trick will be testing it.

Pavel, do you have an environment where you can ensure this is a
performance benefit?

Regards,
--
-David
david@pgmasters.net
Hi,
I've tested the patch provided by Kyotaro and confirm it fixes the issue.
Any chance it will be merged into one of the next minor releases?
Thank you very much!
On Sat, 1 Feb 2020 at 08:31, David Steele <david@pgmasters.net> wrote:
> On 1/28/20 8:02 PM, Kyotaro Horiguchi wrote:
>> At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky
>>> Regarding influence: the issue is not about the large amount of WAL to
>>> apply but in searching for the non-existing WALs on the remote storage;
>>> each such search can take 5-10 seconds while obtaining an existing WAL
>>> takes milliseconds.
>>
>> Wow. I didn't know of a file system that takes that many seconds when
>> trying non-existent files. Although I still think this is not a bug,
>> avoiding that actually leads to a big win on such systems.
>
> I have not tested this case but I can imagine it would be slow in
> practice. It's axiomatic that it is hard to prove a negative. With
> multi-region replication it might well take some time to be sure that
> the file *really* doesn't exist and hasn't just been lost in a single
> region.
>
>> After a thought, I think it's safe and effectively doable to let
>> XLogFileReadAnyTLI() refrain from trying WAL segments of too-high
>> TLIs. Some garbage archive files out of the range of a timeline might
>> be seen, for example, after reusing an archive directory without
>> clearing files. However, fetching such garbage just to fail doesn't
>> contribute to durability or reliability at all, I think.
>
> The patch seems sane; the trick will be testing it.
>
> Pavel, do you have an environment where you can ensure this is a
> performance benefit?
>
> Regards,
> --
> -David
> david@pgmasters.net
On 2020/03/19 22:22, Pavel Suderevsky wrote:
> Hi,
>
> I've tested the patch provided by Kyotaro and confirm it fixes the issue.

The patch looks good to me. Attached is the updated version of the patch.
I updated only the comments.

Barring any objection, I will commit this patch.

> Any chance it will be merged into one of the next minor releases?

This doesn't seem to be a bug, so I'm thinking to merge this into the next
*major* version release, i.e., v13.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 4/6/20 1:43 PM, Fujii Masao wrote:
> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>> Hi,
>>
>> I've tested the patch provided by Kyotaro and confirm it fixes the issue.
>
> The patch looks good to me. Attached is the updated version of the patch.
> I updated only the comments.
>
> Barring any objection, I will commit this patch.

The patch looks good to me.

>> Any chance it will be merged into one of the next minor releases?
>
> This doesn't seem to be a bug, so I'm thinking to merge this into the
> next *major* version release, i.e., v13.

Not a bug, perhaps, but I think we do consider back-patching performance
problems. The rise in S3 usage has just exposed how poorly this code
performs in high-latency environments.

Regards,
--
-David
david@pgmasters.net
Thank you for picking this up.

At Tue, 7 Apr 2020 02:43:02 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>> Hi,
>> I've tested the patch provided by Kyotaro and confirm it fixes the issue.
>
> The patch looks good to me. Attached is the updated version of the patch.
> I updated only the comments.

+ * The logfile segment that doesn't belong to the timeline is
+ * older or newer than the segment that the timeline started or
+ * eneded at, respectively. It's sufficient to check only the

s/eneded/ended/ ?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/04/07 4:04, David Steele wrote:
> On 4/6/20 1:43 PM, Fujii Masao wrote:
>> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>>> Hi,
>>>
>>> I've tested the patch provided by Kyotaro and confirm it fixes the issue.
>>
>> The patch looks good to me. Attached is the updated version of the patch.
>> I updated only the comments.
>>
>> Barring any objection, I will commit this patch.
>
> The patch looks good to me.
>
>>> Any chance it will be merged into one of the next minor releases?
>>
>> This doesn't seem to be a bug, so I'm thinking to merge this into the
>> next *major* version release, i.e., v13.
>
> Not a bug, perhaps, but I think we do consider back-patching performance
> problems. The rise in S3 usage has just exposed how poorly this code
> performs in high-latency environments.

I understood the situation and am fine with back-patching that. But I'm not
sure if it's fair to do that. Maybe we need to hear more opinions about
this? OTOH, feature freeze for v13 is today, so what about committing the
patch to v13 at first, and then doing the back-patch after hearing opinions
and receiving many +1s?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/04/07 10:29, Kyotaro Horiguchi wrote:
> Thank you for picking this up.
>
> At Tue, 7 Apr 2020 02:43:02 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>>> Hi,
>>> I've tested the patch provided by Kyotaro and confirm it fixes the issue.
>>
>> The patch looks good to me. Attached is the updated version of the patch.
>> I updated only the comments.
>
> + * The logfile segment that doesn't belong to the timeline is
> + * older or newer than the segment that the timeline started or
> + * eneded at, respectively. It's sufficient to check only the
>
> s/eneded/ended/ ?

Yes! Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> On 2020/04/07 4:04, David Steele wrote:
>> On 4/6/20 1:43 PM, Fujii Masao wrote:
>>> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>>>> Hi,
>>>>
>>>> I've tested the patch provided by Kyotaro and confirm it fixes the issue.
>>>
>>> The patch looks good to me. Attached is the updated version of the patch.
>>> I updated only the comments.
>>>
>>> Barring any objection, I will commit this patch.
>>
>> The patch looks good to me.
>>
>>>> Any chance it will be merged into one of the next minor releases?
>>>
>>> This doesn't seem to be a bug, so I'm thinking to merge this into the
>>> next *major* version release, i.e., v13.
>>
>> Not a bug, perhaps, but I think we do consider back-patching performance
>> problems. The rise in S3 usage has just exposed how poorly this code
>> performs in high-latency environments.
>
> I understood the situation and am fine with back-patching that. But I'm
> not sure if it's fair to do that. Maybe we need to hear more opinions
> about this? OTOH, feature freeze for v13 is today, so what about
> committing the patch to v13 at first, and then doing the back-patch
> after hearing opinions and receiving many +1s?

+1 for committing to v13 only today, then back-patching if people want
and/or accept it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Apr 07, 2020 at 12:15:00PM +0900, Fujii Masao wrote:
> I understood the situation and am fine with back-patching that. But I'm
> not sure if it's fair to do that. Maybe we need to hear more opinions
> about this? OTOH, feature freeze for v13 is today, so what about
> committing the patch to v13 at first, and then doing the back-patch
> after hearing opinions and receiving many +1s?

I have not looked at the patch so I cannot say much about it, but it is
annoying to fetch segments you are not going to need anyway if you target
recovery with a timeline older than the segments fetched, and this has a
cost when you pay for the bandwidth of your environment with only one
archive location. So a back-patch sounds like a good thing to do even if
recovery is not broken per se, only slower.

Designing a TAP test for that is tricky, but you could look at the logs of
the backend to make sure that only the wanted segments are fetched with a
central archive solution and multiple timelines involved. And costly it is.
--
Michael
On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>> This doesn't seem to be a bug, so I'm thinking to merge this into the
>>>> next *major* version release, i.e., v13.
>>> Not a bug, perhaps, but I think we do consider back-patching
>>> performance problems. The rise in S3 usage has just exposed how poorly
>>> this code performs in high-latency environments.
>>
>> I understood the situation and am fine with back-patching that. But I'm
>> not sure if it's fair to do that. Maybe we need to hear more opinions
>> about this? OTOH, feature freeze for v13 is today, so what about
>> committing the patch to v13 at first, and then doing the back-patch
>> after hearing opinions and receiving many +1s?
>
> +1 for committing to v13 only today, then back-patching if people want
> and/or accept it.

Definitely +1 for a commit today to v13. I certainly was not trying to
hold that up.

--
-David
david@pgmasters.net
On 2020/04/07 20:21, David Steele wrote:
> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>>> This doesn't seem to be a bug, so I'm thinking to merge this into the
>>>>> next *major* version release, i.e., v13.
>>>> Not a bug, perhaps, but I think we do consider back-patching
>>>> performance problems. The rise in S3 usage has just exposed how poorly
>>>> this code performs in high-latency environments.
>>>
>>> I understood the situation and am fine with back-patching that. But I'm
>>> not sure if it's fair to do that. Maybe we need to hear more opinions
>>> about this? OTOH, feature freeze for v13 is today, so what about
>>> committing the patch to v13 at first, and then doing the back-patch
>>> after hearing opinions and receiving many +1s?
>>
>> +1 for committing to v13 only today, then back-patching if people want
>> and/or accept it.
>
> Definitely +1 for a commit today to v13. I certainly was not trying to
> hold that up.

Pushed the patch to v13, at first!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/04/08 1:49, Fujii Masao wrote:
> On 2020/04/07 20:21, David Steele wrote:
>> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
>>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>>>> This doesn't seem to be a bug, so I'm thinking to merge this into
>>>>>> the next *major* version release, i.e., v13.
>>>>> Not a bug, perhaps, but I think we do consider back-patching
>>>>> performance problems. The rise in S3 usage has just exposed how
>>>>> poorly this code performs in high-latency environments.
>>>>
>>>> I understood the situation and am fine with back-patching that. But
>>>> I'm not sure if it's fair to do that. Maybe we need to hear more
>>>> opinions about this? OTOH, feature freeze for v13 is today, so what
>>>> about committing the patch to v13 at first, and then doing the
>>>> back-patch after hearing opinions and receiving many +1s?
>>>
>>> +1 for committing to v13 only today, then back-patching if people want
>>> and/or accept it.

Please let me revisit this. Currently Grigory Smolkin, David Steele,
Michael Paquier and Pavel Suderevsky agree to the back-patch and there has
been no objection to that. So should we do the back-patch? Or does anyone
object to that?

I don't think that this is a bug because archive recovery works fine from
a functional perspective without this commit. OTOH, I understand that,
without the commit, there is a complaint that archive recovery may be
unnecessarily slow when archival storage is located remotely, e.g., on
Amazon S3, and it takes a long time to attempt to fetch a non-existent
archived WAL file. So I'm OK with the back-patch unless there is a strong
objection to that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Apr 30, 2020 at 7:46 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Please let me revisit this. Currently Grigory Smolkin, David Steele,
> Michael Paquier and Pavel Suderevsky agree to the back-patch and there
> has been no objection to that. So should we do the back-patch? Or does
> anyone object to that?
>
> I don't think that this is a bug because archive recovery works fine
> from a functional perspective without this commit. OTOH, I understand
> that, without the commit, there is a complaint that archive recovery
> may be unnecessarily slow when archival storage is located remotely,
> e.g., on Amazon S3, and it takes a long time to attempt to fetch a
> non-existent archived WAL file. So I'm OK with the back-patch unless
> there is a strong objection to that.

I don't see any obvious problem with the changed code, but we normally
don't backpatch performance improvements. I can see that the code change
here appears to be straightforward, so it might be fine to backpatch this.
Have we seen similar reports earlier as well? AFAIK, this functionality
has been around for a long time, and if people were hitting this on a
regular basis then we would have seen such reports multiple times. I mean
to say that if the chances of hitting this are low then we can even choose
not to backpatch this.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020/05/02 20:40, Amit Kapila wrote:
> On Thu, Apr 30, 2020 at 7:46 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> On 2020/04/08 1:49, Fujii Masao wrote:
>>>
>>>
>>> On 2020/04/07 20:21, David Steele wrote:
>>>>
>>>> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
>>>>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>>>>>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
>>>>>>>> version release, i.e., v13.
>>>>>>> Not a bug, perhaps, but I think we do consider back-patching
>>>>>>> performance problems. The rise in S3 usage has just exposed how poorly
>>>>>>> this code performs in high-latency environments.
>>>>>>
>>>>>> I understood the situation and am fine to back-patch that. But I'm not sure
>>>>>> if it's fair to do that. Maybe we need to hear more opinions about this?
>>>>>> OTOH, feature freeze for v13 is today, so what about committing the patch
>>>>>> in v13 at first, and then doing the back-patch after hearing opinions and
>>>>>> receiving many +1?
>>>>>
>>>>> +1 for committing only to v13 today, then back-patching if people want
>>>>> and/or accept it.
>>
>> Please let me revisit this. Currently Grigory Smolkin, David Steele,
>> Michael Paquier and Pavel Suderevsky agree to the back-patch, and there
>> has been no objection to it. So should we do the back-patch? Or does
>> anyone object?
>>
>> I don't think that this is a functional bug, because archive recovery
>> works fine from a functional perspective without this commit. OTOH, I
>> understand that, without the commit, there are complaints that archive
>> recovery may be unnecessarily slow when the archival storage is remote,
>> e.g., Amazon S3, and it takes a long time to probe for non-existent
>> archive WAL files. So I'm OK with the back-patch unless there is a
>> strong objection to it.
>>
>
> I don't see any obvious problem with the changed code, but we normally
> don't backpatch performance improvements. I can see that the code change
> here appears to be straightforward, so it might be fine to backpatch
> this. Have we seen similar reports earlier as well? AFAIK, this
> functionality has been around for a long time, and if people were facing
> this on a regular basis then we would have seen such reports multiple
> times. I mean to say that if the chances of hitting this are low, then
> we could even choose not to backpatch this.

I found the following two reports. ISTM there are not so many reports...
https://www.postgresql.org/message-id/16159-f5a34a3a04dc67e0@postgresql.org
https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, May 7, 2020 at 12:13 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/05/02 20:40, Amit Kapila wrote:
> >
> > I don't see any obvious problem with the changed code, but we normally
> > don't backpatch performance improvements. I can see that the code change
> > here appears to be straightforward, so it might be fine to backpatch
> > this. Have we seen similar reports earlier as well? AFAIK, this
> > functionality has been around for a long time, and if people were facing
> > this on a regular basis then we would have seen such reports multiple
> > times. I mean to say that if the chances of hitting this are low, then
> > we could even choose not to backpatch this.
>
> I found the following two reports. ISTM there are not so many reports...
> https://www.postgresql.org/message-id/16159-f5a34a3a04dc67e0@postgresql.org
> https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>

The first seems to be the same report for which this bug has been fixed;
it was moved to -hackers in email [1]. Am I missing something?
Considering it has been encountered by two different people, I think it
would not be a bad idea to back-patch this.

[1] - https://www.postgresql.org/message-id/20200129.120222.1476610231001551715.horikyota.ntt%40gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020/05/07 17:57, Amit Kapila wrote:
> On Thu, May 7, 2020 at 12:13 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> On 2020/05/02 20:40, Amit Kapila wrote:
>>>
>>> I don't see any obvious problem with the changed code, but we normally
>>> don't backpatch performance improvements. I can see that the code change
>>> here appears to be straightforward, so it might be fine to backpatch
>>> this. Have we seen similar reports earlier as well? AFAIK, this
>>> functionality has been around for a long time, and if people were facing
>>> this on a regular basis then we would have seen such reports multiple
>>> times. I mean to say that if the chances of hitting this are low, then
>>> we could even choose not to backpatch this.
>>
>> I found the following two reports. ISTM there are not so many reports...
>> https://www.postgresql.org/message-id/16159-f5a34a3a04dc67e0@postgresql.org
>> https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>>
>
> The first seems to be the same report for which this bug has been fixed;
> it was moved to -hackers in email [1].

Yes, that's the original report that led to the commit.

> Am I missing something?
> Considering it has been encountered by two different people, I think it
> would not be a bad idea to back-patch this.

+1. So I will do the back-patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/05/08 14:23, Fujii Masao wrote:
>
>
> On 2020/05/07 17:57, Amit Kapila wrote:
>> On Thu, May 7, 2020 at 12:13 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>
>>> On 2020/05/02 20:40, Amit Kapila wrote:
>>>>
>>>> I don't see any obvious problem with the changed code, but we normally
>>>> don't backpatch performance improvements. I can see that the code change
>>>> here appears to be straightforward, so it might be fine to backpatch
>>>> this. Have we seen similar reports earlier as well? AFAIK, this
>>>> functionality has been around for a long time, and if people were facing
>>>> this on a regular basis then we would have seen such reports multiple
>>>> times. I mean to say that if the chances of hitting this are low, then
>>>> we could even choose not to backpatch this.
>>>
>>> I found the following two reports. ISTM there are not so many reports...
>>> https://www.postgresql.org/message-id/16159-f5a34a3a04dc67e0@postgresql.org
>>> https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>>>
>>
>> The first seems to be the same report for which this bug has been fixed;
>> it was moved to -hackers in email [1].
>
> Yes, that's the original report that led to the commit.
>
>> Am I missing something?
>> Considering it has been encountered by two different people, I think it
>> would not be a bad idea to back-patch this.
>
> +1. So I will do the back-patch.

Done. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION