Thread: pg_receivexlog and feedback message
Right now, pg_receivexlog sets:

replymsg->write = InvalidXLogRecPtr;
replymsg->flush = InvalidXLogRecPtr;
replymsg->apply = InvalidXLogRecPtr;

when it sends its status updates.

I'm thinking it should set replymsg->write = blockpos instead.

Why? That way you can see in pg_stat_replication what has actually been received by pg_receivexlog - not just what we last sent. This can be useful in combination with an archive_command that can block WAL recycling until it has been saved to the standby. And it would be useful as a general monitoring thing as well.

I think the original reason was that it shouldn't interfere with synchronous replication - but it does take away a fairly useful use case...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
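The proposal above can be sketched as follows. This is a minimal illustration, not the pg_receivexlog source: XLogRecPtr and StandbyReply are simplified stand-ins for the 9.2-era structures, and build_status_reply with its report_write flag is a hypothetical helper introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two-part WAL position used in this thread. */
typedef struct
{
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within the log file */
} XLogRecPtr;

/* Stand-in for the status reply; only the three positions are modelled. */
typedef struct
{
    XLogRecPtr write;
    XLogRecPtr flush;
    XLogRecPtr apply;
} StandbyReply;

static const XLogRecPtr InvalidXLogRecPtr = {0, 0};

/*
 * Hypothetical helper: build the reply for a receiver positioned at blockpos.
 * With report_write = false all three fields stay invalid (the behaviour
 * described as "right now"); with report_write = true only the write
 * position is reported, as proposed.
 */
static StandbyReply
build_status_reply(XLogRecPtr blockpos, bool report_write)
{
    StandbyReply reply;

    reply.write = InvalidXLogRecPtr;
    reply.flush = InvalidXLogRecPtr;
    reply.apply = InvalidXLogRecPtr;
    if (report_write)
        reply.write = blockpos;
    return reply;
}
```

Passing the choice in as a flag is just one way to let different callers of a shared receive routine opt in or out; the thread itself leaves the exact mechanism open.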
On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I'm thinking it should set replymsg->write = blockpos instead.
>
> I think the original reason was that it shouldn't interfere with
> synchronous replication - but it does take away a fairly useful
> use case...

I think that not only replymsg->write but also ->flush should be set to blockpos in pg_receivexlog. That allows pg_receivexlog to behave as a synchronous standby, so we can write WAL to both local and remote synchronously. I believe there are some use cases for synchronous pg_receivexlog.

OTOH, in pg_basebackup both replymsg->write and ->flush should stay set to InvalidXLogRecPtr, to prevent pg_basebackup from behaving as a synchronous standby.

Regards,

--
Fujii Masao
On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I think that not only replymsg->write but also ->flush should be set to
> blockpos in pg_receivexlog. That allows pg_receivexlog to behave
> as a synchronous standby, so we can write WAL to both local and remote
> synchronously. I believe there are some use cases for synchronous
> pg_receivexlog.

pg_receivexlog doesn't currently fsync() after every write. It only fsync()s complete files. So we'd need to set ->flush only at the end of a segment, right?

> OTOH, in pg_basebackup both replymsg->write and ->flush should stay
> set to InvalidXLogRecPtr, to prevent pg_basebackup from behaving as
> a synchronous standby.

Oh, good point. So yeah, we'd need to make it a parameter to the function.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander <magnus@hagander.net> wrote:
> pg_receivexlog doesn't currently fsync() after every write. It only
> fsync()s complete files. So we'd need to set ->flush only at the end
> of a segment, right?

Yes.

Currently the status update is sent once per status interval. In sync replication, a transaction has to wait for a while even after pg_receivexlog has written or flushed the WAL data.

So should we add a new option which specifies whether pg_receivexlog sends the status packet back as soon as it writes or flushes the WAL data, like the walreceiver does?

Regards,

--
Fujii Masao
On Tue, Jun 5, 2012 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
> pg_receivexlog doesn't currently fsync() after every write. It only
> fsync()s complete files. So we'd need to set ->flush only at the end
> of a segment, right?

If you want to be able to use it as a synchronous standby, that's not going to work very well. You could end up with pg_receivexlog waiting for the end of the segment before it flushes; meanwhile, all the clients are sitting there waiting for the flush to happen before they do anything that could generate more WAL to fill the segment.

Unless you have a solution to that problem, I'd recommend setting write (which should work with the new remote_write mode for sync rep) but not setting flush.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
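The stall described above can be illustrated with a deliberately simplified model of how a waiting commit is released. This is a sketch under stated assumptions, not the server's sync rep code: SyncWaitMode, ptr_reached and commit_released are hypothetical names, and the only behaviour modelled is that a remote_write-style wait looks at the reported write position while a flush-style wait looks at the reported flush position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two-part WAL position used in this thread. */
typedef struct
{
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within the log file */
} XLogRecPtr;

/* Hypothetical names for the two wait modes discussed above. */
typedef enum
{
    WAIT_REMOTE_WRITE,  /* commit waits for the reported write position */
    WAIT_FLUSH          /* commit waits for the reported flush position */
} SyncWaitMode;

/* True when the reported position is at or past the target position. */
static bool
ptr_reached(XLogRecPtr reported, XLogRecPtr target)
{
    if (reported.xlogid != target.xlogid)
        return reported.xlogid > target.xlogid;
    return reported.xrecoff >= target.xrecoff;
}

/*
 * Simplified model: a commit at commit_lsn is released once the position
 * relevant to the wait mode has reached it.
 */
static bool
commit_released(SyncWaitMode mode, XLogRecPtr commit_lsn,
                XLogRecPtr reported_write, XLogRecPtr reported_flush)
{
    XLogRecPtr reported =
        (mode == WAIT_REMOTE_WRITE) ? reported_write : reported_flush;

    return ptr_reached(reported, commit_lsn);
}
```

If the receiver only advances the flush position at segment boundaries, a commit in the middle of a segment stays blocked in the flush-style mode until later WAL fills the segment, and that later WAL may itself depend on the blocked clients. In the write-style mode the same commit is released as soon as the data has been received.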
On Wed, Jun 6, 2012 at 8:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> So should we add a new option which specifies whether pg_receivexlog
> sends the status packet back as soon as it writes or flushes the WAL
> data, like the walreceiver does?

That might be useful, but I think that's 9.3 material at this point.

But I think we can get the "set the write location" in as a bugfix.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Jun 7, 2012 at 5:05 AM, Magnus Hagander <magnus@hagander.net> wrote:
> That might be useful, but I think that's 9.3 material at this point.

Fair enough. That's a new feature rather than a bugfix.

> But I think we can get the "set the write location" in as a bugfix.

Also "set the flush location"? Sending the flush location back seems helpful when using pg_receivexlog for WAL archiving purposes. By seeing the flush location we can ensure that the WAL file has been archived durably (IOW, the WAL file has been flushed in the remote archive area).

Regards,

--
Fujii Masao
On Thursday, June 7, 2012, Fujii Masao wrote:
> On Thu, Jun 7, 2012 at 5:05 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> But I think we can get the "set the write location" in as a bugfix.
>
> Also "set the flush location"? Sending the flush location back seems
> helpful when using pg_receivexlog for WAL archiving purposes. By
> seeing the flush location we can ensure that the WAL file has been archived
> durably (IOW, the WAL file has been flushed in the remote archive area).

You can do that with the write location as well, as long as you round it off to complete segments, can't you?

In fact that's exactly the use case that got me to realize we were missing this :-)
//Magnus
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Jun 7, 2012 at 6:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
> You can do that with the write location as well, as long as you round it
> off to complete segments, can't you?

You mean to prevent pg_receivexlog from sending back the end of the WAL file as the write location *before* it completes the WAL file? If so, yes. But why do you want to keep the flush location invalid?

Regards,

--
Fujii Masao
On Thursday, June 7, 2012, Fujii Masao wrote:
> On Thu, Jun 7, 2012 at 6:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> You can do that with the write location as well, as long as you round it
>> off to complete segments, can't you?
>
> You mean to prevent pg_receivexlog from sending back the end of the WAL file
> as the write location *before* it completes the WAL file? If so, yes. But
> why do you want to keep the flush location invalid?

No. pg_receivexlog sends back the correct write location. Whoever does the check (through pg_stat_replication) rounds down, so it only counts a segment once pg_receivexlog has acknowledged receiving the whole file.

I'm not against doing the flush location as well, I'm just worried about feature creep :-) But let's see how big a change that would turn out to be...
//Magnus
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
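The "round down on the checking side" idea from the message above might look like the following sketch. It assumes the two-part WAL position discussed in this thread and the default 16 MB segment size; round_down_to_segment and segment_received are hypothetical helpers for whoever inspects the reported write location, not functions from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default WAL segment size of 16 MB. */
#define XLOG_SEG_SIZE ((uint32_t) 16 * 1024 * 1024)

/* Simplified stand-in for the two-part WAL position used in this thread. */
typedef struct
{
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within the log file */
} XLogRecPtr;

/* Drop the partial segment: keep only what is covered by complete segments. */
static XLogRecPtr
round_down_to_segment(XLogRecPtr pos)
{
    pos.xrecoff -= pos.xrecoff % XLOG_SEG_SIZE;
    return pos;
}

/*
 * True when the segment ending at segment_end lies entirely before the
 * rounded-down write position, i.e. the receiver has acknowledged all of it.
 */
static bool
segment_received(XLogRecPtr reported_write, XLogRecPtr segment_end)
{
    XLogRecPtr complete = round_down_to_segment(reported_write);

    if (complete.xlogid != segment_end.xlogid)
        return complete.xlogid > segment_end.xlogid;
    return complete.xrecoff >= segment_end.xrecoff;
}
```

With this split, the receiver keeps reporting its exact write position and only the consumer of that value decides that partial segments do not count.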
On Thu, Jun 7, 2012 at 12:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I'm not against doing the flush location as well, I'm just worried about
> feature creep :-) But let's see how big a change that would turn out to
> be...

How about this?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachment
On Sun, Jun 10, 2012 at 7:55 PM, Magnus Hagander <magnus@hagander.net> wrote:
> How about this?

+ /*
+  * Set flushed position to the last byte in the previous
+  * file. Per above we know that xrecoff%XLOG_SEG_SIZE=0
+  */
+ flushedpos = blockpos;
+ if (flushedpos.xrecoff == 0)
+ {
+     flushedpos.xlogid--;
+     flushedpos.xrecoff = XLogFileSize-1;
+ }
+ else
+     flushedpos.xrecoff--;

flushedpos.xrecoff doesn't need to be decremented by one. If xrecoff % XLOG_SEG_SIZE = 0, blockpos already marks the end of the previous (i.e., flushed) WAL file.

Regards,

--
Fujii Masao
On Sun, Jun 10, 2012 at 4:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> flushedpos.xrecoff doesn't need to be decremented by one.
> If xrecoff % XLOG_SEG_SIZE = 0, blockpos already marks the end
> of the previous (i.e., flushed) WAL file.

Hmm. I think I confused myself with "last byte written" vs "current position". And we're dealing with current position here...

So it should just be flushedpos = blockpos and be done with it, right?

Though before I commit anything with this, we need to decide what to do wrt syncrep on that, per the other thread.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Mon, Jun 11, 2012 at 10:04 PM, Magnus Hagander <magnus@hagander.net> wrote:
> So it should just be flushedpos = blockpos and be done with it, right?

Yep.

> Though before I commit anything with this, we need to decide what to
> do wrt syncrep on that, per the other thread.

Yep.

Regards,

--
Fujii Masao
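The conclusion reached above (use blockpos as-is when it sits on a segment boundary) can be sketched like this. The code is an illustration only: update_flushed_position is a hypothetical helper, the 16 MB segment size is the assumed default, and the sketch relies on the thread's statement that pg_receivexlog fsyncs only complete files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default WAL segment size of 16 MB. */
#define XLOG_SEG_SIZE ((uint32_t) 16 * 1024 * 1024)

/* Simplified stand-in for the two-part WAL position used in this thread. */
typedef struct
{
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within the log file */
} XLogRecPtr;

/*
 * Hypothetical helper: blockpos is the current position, i.e. the next byte
 * to be written, not the last byte written. When it sits exactly on a
 * segment boundary, the previous file is complete and has been fsynced, so
 * blockpos itself is the flushed position and no decrement is needed.
 * Returns true when *flushedpos was advanced.
 */
static bool
update_flushed_position(XLogRecPtr blockpos, XLogRecPtr *flushedpos)
{
    assert(flushedpos != NULL);

    if (blockpos.xrecoff % XLOG_SEG_SIZE != 0)
        return false;   /* mid-segment: nothing newly flushed */

    *flushedpos = blockpos;
    return true;
}
```

Treating the position as "one past the last byte" also avoids the special case for xrecoff == 0 that the decrementing version needed.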
On Mon, Jun 11, 2012 at 5:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Though before I commit anything with this, we need to decide what to
>> do wrt syncrep on that, per the other thread.
>
> Yep.

Per the other thread, we decided to postpone this until 9.3, and also to figure out a better set of switches for pg_receivexlog to control it with.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/