Thread: pg_receivexlog and feedback message

pg_receivexlog and feedback message

From
Magnus Hagander
Date:
Right now, pg_receivexlog sets:

        replymsg->write = InvalidXLogRecPtr;
        replymsg->flush = InvalidXLogRecPtr;
        replymsg->apply = InvalidXLogRecPtr;

when it sends its status updates.

I'm thinking it should set replymsg->write = blockpos instead.
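A minimal sketch of the proposed change, using simplified stand-in types. The Sketch* names are hypothetical and not the real PostgreSQL definitions, although the split xlogid/xrecoff layout mirrors the 9.2-era XLogRecPtr used in the patch discussed later in this thread. Only the write position changes; flush and apply stay invalid.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 9.2-era XLogRecPtr: log id + byte offset. */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} SketchXLogRecPtr;

/* Hypothetical stand-in for InvalidXLogRecPtr (all zeroes). */
static const SketchXLogRecPtr SketchInvalidXLogRecPtr = {0, 0};

/* Hypothetical stand-in for the standby status reply message. */
typedef struct
{
    SketchXLogRecPtr write;
    SketchXLogRecPtr flush;
    SketchXLogRecPtr apply;
} SketchReplyMessage;

/*
 * Proposed behaviour: report the position received so far as "write",
 * so it shows up in pg_stat_replication; flush and apply stay invalid.
 */
static void
sketch_fill_reply(SketchReplyMessage *replymsg, SketchXLogRecPtr blockpos)
{
    replymsg->write = blockpos;                 /* was InvalidXLogRecPtr */
    replymsg->flush = SketchInvalidXLogRecPtr;
    replymsg->apply = SketchInvalidXLogRecPtr;
}
```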

Why? That way you can see in pg_stat_replication what has actually
been received by pg_receivexlog - not just what we last sent. This can
be useful in combination with an archive_command that can block WAL
recycling until it has been saved to the standby. And it would be
useful as a general monitoring thing as well.

I think the original reason was that it shouldn't interfere with
synchronous replication - but it does take away a fairly useful
use case...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_receivexlog and feedback message

From
Fujii Masao
Date:
On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I'm thinking it should set replymsg->write = blockpos instead.

I think that not only replymsg->write but also ->flush should be set to
blockpos in pg_receivexlog. That would allow pg_receivexlog to behave
as a synchronous standby, so we could write WAL to both the local and the
remote side synchronously. I believe there are some use cases for a
synchronous pg_receivexlog.

OTOH, pg_basebackup should keep setting both replymsg->write and ->flush
to InvalidXLogRecPtr, to prevent it from behaving as a synchronous standby.

Regards,

--
Fujii Masao


Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I think that not only replymsg->write but also ->flush should be set to
> blockpos in pg_receivexlog. Which allows pg_receivexlog to behave
> as synchronous standby, so we can write WAL to both local and remote
> synchronously. I believe there are some use cases for synchronous
> pg_receivexlog.

pg_receivexlog doesn't currently fsync() after every write. It only
fsync():s complete files. So we'd need to set ->flush only at the end
of a segment, right?
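A sketch of what "set ->flush only at the end of a segment" could mean in code, assuming 16 MB segments; SKETCH_SEG_SIZE, SketchXLogRecPtr and the function name are hypothetical stand-ins. The flush position only moves forward when the write position lands exactly on a segment boundary, which is when a complete file has just been written out:

```c
#include <stdint.h>

#define SKETCH_SEG_SIZE (16u * 1024u * 1024u)   /* assumed 16 MB WAL segments */

typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} SketchXLogRecPtr;

/*
 * Called after each write. blockpos is the position just past the last
 * byte written. Only when it sits exactly on a segment boundary has a
 * whole segment file been completed (and, per the thread, fsync()ed), so
 * only then may the reported flush position move forward.
 */
static void
sketch_update_flushed(SketchXLogRecPtr blockpos, SketchXLogRecPtr *flushedpos)
{
    if (blockpos.xrecoff % SKETCH_SEG_SIZE == 0)
        *flushedpos = blockpos;
}
```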


> OTOH, neither replymsg->write nor ->flush should be set to
> InvalidXLogRecPtr, to prevent pg_basebackup from behaving as
> synchronous standby.

Oh, good point. So yeah, we'd need to make it a parameter to the function.
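One way the parameter could look, as a rough sketch: the report_position flag and all Sketch* names are hypothetical. pg_receivexlog would pass true, while pg_basebackup would pass false and keep sending InvalidXLogRecPtr, so it is never counted as a standby making progress:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} SketchXLogRecPtr;

typedef struct
{
    SketchXLogRecPtr write;
    SketchXLogRecPtr flush;
    SketchXLogRecPtr apply;
} SketchReplyMessage;

static const SketchXLogRecPtr SketchInvalidXLogRecPtr = {0, 0};

/*
 * report_position (hypothetical): true for pg_receivexlog, false for
 * pg_basebackup. apply is always invalid, since neither tool replays WAL.
 */
static void
sketch_fill_reply(SketchReplyMessage *replymsg, bool report_position,
                  SketchXLogRecPtr blockpos, SketchXLogRecPtr flushedpos)
{
    replymsg->write = report_position ? blockpos : SketchInvalidXLogRecPtr;
    replymsg->flush = report_position ? flushedpos : SketchInvalidXLogRecPtr;
    replymsg->apply = SketchInvalidXLogRecPtr;
}
```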



Re: pg_receivexlog and feedback message

From
Fujii Masao
Date:
On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander <magnus@hagander.net> wrote:
> pg_receivexlog doesn't currently fsync() after every write. It only
> fsync():s complete files. So we'd need to set ->flush only at the end
> of a segment, right?

Yes.

Currently the status update is sent only once per status interval, so in
synchronous replication a transaction has to wait for a while even after
pg_receivexlog has written or flushed the WAL data.

So should we add a new option that specifies whether pg_receivexlog
sends the status packet back as soon as it writes or flushes the WAL
data, like walreceiver does?
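A rough sketch of the timing decision being asked about. The reply_on_progress option is hypothetical (not an existing pg_receivexlog switch), times are plain milliseconds, and the Sketch* names are stand-ins. Besides the periodic update, a reply is sent as soon as the write or flush position has advanced past what was last reported, similar in spirit to walreceiver:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} SketchXLogRecPtr;

/* True if position a is strictly ahead of position b. */
static bool
sketch_ptr_gt(SketchXLogRecPtr a, SketchXLogRecPtr b)
{
    return a.xlogid > b.xlogid ||
        (a.xlogid == b.xlogid && a.xrecoff > b.xrecoff);
}

/*
 * Decide whether to send a status update now.
 *   now_ms, last_reply_ms, interval_ms: times in milliseconds
 *   reply_on_progress: hypothetical option discussed in this thread
 */
static bool
sketch_should_reply(int64_t now_ms, int64_t last_reply_ms, int64_t interval_ms,
                    bool reply_on_progress,
                    SketchXLogRecPtr writepos, SketchXLogRecPtr last_sent_write,
                    SketchXLogRecPtr flushpos, SketchXLogRecPtr last_sent_flush)
{
    /* Current behaviour: one update per status interval. */
    if (interval_ms > 0 && now_ms - last_reply_ms >= interval_ms)
        return true;

    /* Proposed: also reply at once when new data was written or flushed. */
    if (reply_on_progress &&
        (sketch_ptr_gt(writepos, last_sent_write) ||
         sketch_ptr_gt(flushpos, last_sent_flush)))
        return true;

    return false;
}
```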

Regards,



Re: pg_receivexlog and feedback message

From
Robert Haas
Date:
On Tue, Jun 5, 2012 at 10:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
> pg_receivexlog doesn't currently fsync() after every write. It only
> fsync():s complete files. So we'd need to set ->flush only at the end
> of a segment, right?

If you want to be able to use it as a synchronous standby, that's not
going to work very well.  You could end up with pg_receivexlog waiting
for the end of the segment before it flushes; meanwhile, all the
clients are sitting there waiting for the flush to happen before they
do anything that could generate more WAL to fill the segment.

Unless you have a solution to that problem, I'd recommend setting
write (which should work with the new remote_write mode for sync rep)
but not setting flush.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Wed, Jun 6, 2012 at 8:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Currently the status update is sent for each status interval. In sync
> replication, transaction has to wait for a while even after pg_receivexlog
> has written or flushed the WAL data.
>
> So we should add new option which specifies whether pg_receivexlog
> sends the status packet back as soon as it writes or flushes the WAL
> data, like the walreceiver does?

That might be useful, but I think that's 9.3 material at this point.

But I think we can get the "set the write location" in as a bugfix.



Re: pg_receivexlog and feedback message

From
Fujii Masao
Date:
On Thu, Jun 7, 2012 at 5:05 AM, Magnus Hagander <magnus@hagander.net> wrote:
> That might be useful, but I think that's 9.3 material at this point.

Fair enough. That's a new feature rather than a bugfix.

> But I think we can get the "set the write location" in as a bugfix.

Also "set the flush location"? Sending the flush location back seems
helpful when using pg_receivexlog for WAL archiving purposes. By
looking at the flush location we can make sure that a WAL file has been
archived durably (IOW, that it has been flushed to the remote archive area).

Regards,



Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Thursday, June 7, 2012, Fujii Masao wrote:
> Also "set the flush location"? Sending the flush location back seems
> helpful when using pg_receivexlog for WAL archiving purpose. By
> seeing the flush location we can ensure that WAL file has been archived
> durably (IOW, WAL file has been flushed in remote archive area).


You can do that with the write location as well, as long as you round it off to complete segments, can't you?

In fact that's exactly the use case that got me to realize we were missing this :-)

//Magnus
 



Re: pg_receivexlog and feedback message

From
Fujii Masao
Date:
On Thu, Jun 7, 2012 at 6:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
> You can do that with the write location as well, as long as you round it
> off to complete segments, can't you?

You mean to prevent pg_receivexlog from sending back the end of the WAL file
as the write location *before* it has completed the WAL file? If so, yes. But
why do you want to keep the flush location invalid?

Regards,



Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Thursday, June 7, 2012, Fujii Masao wrote:
> On Thu, Jun 7, 2012 at 6:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> You can do that with the write location as well, as long as you round it
>> off to complete segments, can't you?
>
> You mean to prevent pg_receivexlog from sending back the end of WAL file
> as the write location *before* it completes the WAL file? If so, yes. But
> why do you want to keep the flush location invalid?

No. pg_receivexlog sends back the correct write location. Whoever does the check (through pg_stat_replication) rounds it down to a segment boundary, so a segment only counts once pg_receivexlog has acknowledged receiving the whole file.
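A sketch of that check on the monitoring side. It assumes 16 MB segments, the 9.2-era "X/Y" hex format for locations shown in pg_stat_replication, and segment numbers counted per log id (0 to 254 with 16 MB segments); the function names are hypothetical. A partially received segment never counts:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_SEG_SIZE 0x1000000u   /* assumed 16 MB WAL segments */

/*
 * Parse a location as displayed in pg_stat_replication ("X/Y", hex) and
 * round it down to the start of the segment it falls in. Returns false if
 * the string cannot be parsed. The result is packed as (X << 32) | Y,
 * which preserves the ordering between locations.
 */
static bool
sketch_rounded_write(const char *location, uint64_t *rounded)
{
    unsigned int hi, lo;

    if (sscanf(location, "%X/%X", &hi, &lo) != 2)
        return false;
    lo -= lo % SKETCH_SEG_SIZE;              /* round down to a boundary */
    *rounded = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Has segment segno of log id xlogid been fully acknowledged? Only if its
 * end lies at or before the rounded-down write location. segno is assumed
 * to be below 255, so the end offset still fits in 32 bits.
 */
static bool
sketch_segment_received(const char *write_location,
                        uint32_t xlogid, uint32_t segno)
{
    uint64_t rounded;
    uint64_t seg_end;

    if (!sketch_rounded_write(write_location, &rounded))
        return false;
    seg_end = ((uint64_t) xlogid << 32) |
        ((uint64_t) (segno + 1) * SKETCH_SEG_SIZE);
    return seg_end <= rounded;
}
```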

I'm not against doing the flush location as well, I'm just worried about feature-creep :-) But let's see how big a change that would turn out to be...

//Magnus




Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Thu, Jun 7, 2012 at 12:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
> No. pg_receivexlog sends back the correct write location. Whoever does the
> check (through pg_stat_replication) rounds down, so it only counts it once
> pg_receivexlog has acknowledged receiving the whole file.
>
> I'm not against doing the flush location as well, I'm just worried about
> feature-creep :-) But let's see how big a change that would turn out to
> be...

How about this?


Attachment

Re: pg_receivexlog and feedback message

From
Fujii Masao
Date:
On Sun, Jun 10, 2012 at 7:55 PM, Magnus Hagander <magnus@hagander.net> wrote:
> How about this?

+                /*
+                 * Set flushed position to the last byte in the previous
+                 * file. Per above we know that xrecoff%XLOG_SEG_SIZE=0
+                 */
+                flushedpos = blockpos;
+                if (flushedpos.xrecoff == 0)
+                {
+                    flushedpos.xlogid--;
+                    flushedpos.xrecoff = XLogFileSize-1;
+                }
+                else
+                    flushedpos.xrecoff--;

flushedpos.xrecoff doesn't need to be decremented by one.
If xrecoff % XLOG_SEG_SIZE = 0, the position already marks the end
of the previous (i.e., flushed) WAL file.
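To illustrate the off-by-one, a small sketch that flattens positions into 64-bit byte offsets for simplicity (an assumption; the 9.2 XLogRecPtr in the patch is split into xlogid/xrecoff), with hypothetical names. It also assumes the primary releases a waiting commit once the reported position is at or past the end of its commit record:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)   /* assumed 16 MB */

/*
 * blockpos is a "current position": the offset of the next byte to be
 * written, not of the last byte written. When it sits on a segment
 * boundary, the whole previous segment is complete, so it can be
 * reported as the flush position unchanged.
 */
static uint64_t
sketch_flushed_position(uint64_t blockpos)
{
    return blockpos;    /* "blockpos - 1" would be "last byte written" */
}

/*
 * Assumed comparison rule on the primary: a backend waiting for WAL up
 * to wait_pos (the end of its commit record) is released once the
 * reported position has reached that point.
 */
static bool
sketch_waiter_released(uint64_t reported, uint64_t wait_pos)
{
    return reported >= wait_pos;
}
```

Under that assumption, reporting blockpos - 1 would leave a commit whose record ends exactly on the segment boundary waiting, even though its WAL has been flushed.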

Regards,



Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Sun, Jun 10, 2012 at 4:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> +                               /*
> +                                * Set flushed position to the last byte in the previous
> +                                * file. Per above we know that xrecoff%XLOG_SEG_SIZE=0
> +                                */
> +                               flushedpos = blockpos;
> +                               if (flushedpos.xrecoff == 0)
> +                               {
> +                                       flushedpos.xlogid--;
> +                                       flushedpos.xrecoff = XLogFileSize-1;
> +                               }
> +                               else
> +                                       flushedpos.xrecoff--;
>
> flushedpos.xrecoff doesn't need to be decremented by one.
> If xrecoff % XLOG_SEG_SIZE = 0, the position should be the last
> byte of previous (i.e., flushed) WAL file.

Hmm. I think I confused myself with "last byte written" vs "current
position". And we're dealing with current position here...

So it should just be flushedpos = blockpos and be done with it, right?

Though before I commit anything with this, we need to decide what to do
wrt syncrep on that, per the other thread.



Re: pg_receivexlog and feedback message

From
Fujii Masao
Date:
On Mon, Jun 11, 2012 at 10:04 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Hmm. I think I confused myself with "last byte written" vs "current
> position". And we're dealing with current position here...
>
> So it should just be flushedpos = blockpos and be done with it, right?

Yep.

> Though before I commit anything with this, we need to decide what to do
> wrt syncrep on that, per the other thread.

Yep.

Regards,



Re: pg_receivexlog and feedback message

From
Magnus Hagander
Date:
On Mon, Jun 11, 2012 at 5:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jun 11, 2012 at 10:04 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> Though before I commit anything with this, we need to decide what to do
>> wrt syncrep on that, per the other thread.
>
> Yep.

Per the other thread, we decided to postpone this until 9.3, and also to
figure out a better set of switches for pg_receivexlog to control it
with.
