Thread: making write location work (was: Efficient transaction-controlled synchronous replication)

making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

23 March 2011, 15:35:56

On Fri, Mar 18, 2011 at 8:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I agree to get rid of write_location.
>>
>> No, don't remove it.
>>
>> We seem to be just looking for things to tweak without any purpose.
>> Removing this adds nothing for us.
>>
>> We will have the column in the future, it is there now, so leave it.
>
> Well then can we revert the part of your patch that causes it to not
> actually work any more?

Specifically, if we're not going to remove write location, then I
think we need to apply something like the attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

write-location-fix.patch

Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Simon Riggs

Date:

23 March 2011, 16:10:27

On Wed, Mar 23, 2011 at 3:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 8:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> I agree to get rid of write_location.
>>>
>>> No, don't remove it.
>>>
>>> We seem to be just looking for things to tweak without any purpose.
>>> Removing this adds nothing for us.
>>>
>>> We will have the column in the future, it is there now, so leave it.
>>
>> Well then can we revert the part of your patch that causes it to not
>> actually work any more?
>
> Specifically, if we're not going to remove write location, then I
> think we need to apply something like the attached.

The protocol supports different write/fsync values, so the view should
display them.
We don't know what the standby end will be doing with the data in all cases.

For the main server, making the additional change will just decrease
performance, for no benefit.

In the future we would have a parameter that says how often we send
replies, but there's no point having a parameter if there is only one
meaningful value for standby servers currently.

Please leave this as it is now.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Tom Lane

Date:

23 March 2011, 16:46:09

Robert Haas <robertmhaas@gmail.com> writes:
> Specifically, if we're not going to remove write location, then I
> think we need to apply something like the attached.

>              while (walrcv_receive(0, &type, &buf, &len))
>                  XLogWalRcvProcessMsg(type, buf, len);
> +            /* Let the master know that we received some data. */
> +            XLogWalRcvSendReply();

What if we didn't actually receive any new data?
        regards, tom lane

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

23 March 2011, 18:22:42

On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Specifically, if we're not going to remove write location, then I
>> think we need to apply something like the attached.
>
> The protocol supports different write/fsync values, so the view should
> display them.

That's exactly the point.  Currently, we have a protocol that supports
different write and fsync values, but the code as written does not
actually ever send a reply at any time when the two values can ever be
different.  So there is no point in sending both of them.  The write
location is completely redundant with the fsync location and therefore
completely useless.  We shouldn't bother sending the value twice, or
displaying it twice, if it's absolutely 100% guaranteed to be
identical in every case.

The point of the patch that I posted is that it restores the previous
behavior, where we send an update before flushing WAL and again after
flushing WAL.  If we do that, then the write location can be ahead of
the flush location when we've written but not flushed.  If we don't do
that, and only send replies after flushing everything, then the two
fields are perforce always the same on the master.  I don't see that
as being a useful behavior, and in fact I think it could be quite
confusing.  Someone might assume that if we bother to expose both a
write_location and a flush_location, they are somehow different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

23 March 2011, 18:26:14

On Wed, Mar 23, 2011 at 12:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Specifically, if we're not going to remove write location, then I
>> think we need to apply something like the attached.
>
>>                       while (walrcv_receive(0, &type, &buf, &len))
>>                               XLogWalRcvProcessMsg(type, buf, len);
>
>> +                     /* Let the master know that we received some data. */
>> +                     XLogWalRcvSendReply();
>
> What if we didn't actually receive any new data?

The portion of the code immediately preceding what's included in the
diff guards against that, and there is a second guard in
XLogWalRcvSendReply().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

23 March 2011, 19:29:46

On Wed, Mar 23, 2011 at 2:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Specifically, if we're not going to remove write location, then I
>>>> think we need to apply something like the attached.
>>>
>>> The protocol supports different write/fsync values, so the view should
>>> display them.
>>
>> That's exactly the point.
>
> No its not.
>
>> Currently, we have a protocol that supports
>> different write and fsync values, but the code as written does not
>> actually ever send a reply at any time when the two values can ever be
>> different.  So there is no point in sending both of them.  The write
>> location is completely redundant with the fsync location and therefore
>> completely useless.  We shouldn't bother sending the value twice, or
>> displaying it twice, if it's absolutely 100% guaranteed to be
>> identical in every case.
>
> As of 9.1, we now support other tools that use the protocol, so you
> cannot assume you know what is being sent, just because one sender has
> certain characteristics.

Oh, really?  Is this strictly hypothetical or is such a beast
planned/already in existence?

I'm just afraid this is going to be confusing to users who will expect
it to do something that it doesn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Simon Riggs

Date:

23 March 2011, 19:33:27

On Wed, Mar 23, 2011 at 7:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 2:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>> Specifically, if we're not going to remove write location, then I
>>>>> think we need to apply something like the attached.
>>>>
>>>> The protocol supports different write/fsync values, so the view should
>>>> display them.
>>>
>>> That's exactly the point.
>>
>> No its not.
>>
>>> Currently, we have a protocol that supports
>>> different write and fsync values, but the code as written does not
>>> actually ever send a reply at any time when the two values can ever be
>>> different.  So there is no point in sending both of them.  The write
>>> location is completely redundant with the fsync location and therefore
>>> completely useless.  We shouldn't bother sending the value twice, or
>>> displaying it twice, if it's absolutely 100% guaranteed to be
>>> identical in every case.
>>
>> As of 9.1, we now support other tools that use the protocol, so you
>> cannot assume you know what is being sent, just because one sender has
>> certain characteristics.
>
> Oh, really?  Is this strictly hypothetical or is such a beast
> planned/already in existence?

Ask Magnus.

In any case, that's not the only argument for keeping it. We introduce
the view in this release and I would like it to stay the same from
now, since we know we will need that info later.

No more minor tweaks, please.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

23 March 2011, 20:21:43

On Wed, Mar 23, 2011 at 3:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> In any case, that's not the only argument for keeping it. We introduce
> the view in this release and I would like it to stay the same from
> now, since we know we will need that info later.

At least as I understand it, it's not our project policy to carry
around code that doesn't accomplish anything useful.  I have no
objection to keeping the field; I simply think that if we're going to
have it, we should make it work, as in fact it did before you changed
it without discussion.  You haven't offered any evidence at all that
it introduces any kind of a performance regression AT ALL, much less
that such any such regression can't be trivially patched around by
making SyncRepReleaseWaiters exit quickly if the flush LSN hasn't
advanced.  The onus is as much on you to justify the change as it is
on me to justify changing it back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Simon Riggs

Date:

23 March 2011, 20:33:27

On Wed, Mar 23, 2011 at 8:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 3:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> In any case, that's not the only argument for keeping it. We introduce
>> the view in this release and I would like it to stay the same from
>> now, since we know we will need that info later.
>
> At least as I understand it, it's not our project policy to carry
> around code that doesn't accomplish anything useful.  I have no
> objection to keeping the field; I simply think that if we're going to
> have it, we should make it work, as in fact it did before you changed
> it without discussion.  You haven't offered any evidence at all that
> it introduces any kind of a performance regression AT ALL, much less
> that such any such regression can't be trivially patched around by
> making SyncRepReleaseWaiters exit quickly if the flush LSN hasn't
> advanced.  The onus is as much on you to justify the change as it is
> on me to justify changing it back.

What a stupid conversation.

There's no onus on me to have to keep justifying to you why the code
is the way it is, but I do.

If you want to make a change that I already know reduces performance,
you have to have a good reason. So far, you don't.

Stop fussing and wrap the release.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

24 March 2011, 21:12:05

On Thu, Mar 24, 2011 at 2:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The protocol supports sending two values, so two are displayed.
>
> If you wish to remove one from the display then that only makes sense
> if you also prevent the protocol from supporting two values.
>
> There is no benefit in doing that, so why do it? We are going to put
> that back in 9.2 if you remove it now. Why not leave it, so we don't
> need to rewrite all the monitoring tools that will use this view?

If we're going to put it back in 9.2, then we shouldn't remove it now.We should just make it work.  It's a three line
patch. If 9.2 is

going to meaningfully distinguish between write location and flush
location, then we may as well do the same thing in 9.1.  Then we'll be
ahead of the game: not only will the view have the same columns in
both releases, but they'll actually have the same semantics in both
releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Fujii Masao

Date:

25 March 2011, 09:03:55

On Fri, Mar 25, 2011 at 6:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 24, 2011 at 2:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The protocol supports sending two values, so two are displayed.
>>
>> If you wish to remove one from the display then that only makes sense
>> if you also prevent the protocol from supporting two values.
>>
>> There is no benefit in doing that, so why do it? We are going to put
>> that back in 9.2 if you remove it now. Why not leave it, so we don't
>> need to rewrite all the monitoring tools that will use this view?

What are you planning to use write_location for? BTW, I'm thinking to
add recv_location (not write_location) in 9.2 to support another sync rep
mode which makes transactions wait until the standby has received
(not fsync'd) the WAL. You're planning the same?

> If we're going to put it back in 9.2, then we shouldn't remove it now.
>  We should just make it work.  It's a three line patch.  If 9.2 is
> going to meaningfully distinguish between write location and flush
> location, then we may as well do the same thing in 9.1.  Then we'll be
> ahead of the game: not only will the view have the same columns in
> both releases, but they'll actually have the same semantics in both
> releases.

+1

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Robert Haas

Date:

25 March 2011, 15:32:43

On Fri, Mar 25, 2011 at 5:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Mar 25, 2011 at 6:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Mar 24, 2011 at 2:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> The protocol supports sending two values, so two are displayed.
>>>
>>> If you wish to remove one from the display then that only makes sense
>>> if you also prevent the protocol from supporting two values.
>>>
>>> There is no benefit in doing that, so why do it? We are going to put
>>> that back in 9.2 if you remove it now. Why not leave it, so we don't
>>> need to rewrite all the monitoring tools that will use this view?
>
> What are you planning to use write_location for? BTW, I'm thinking to
> add recv_location (not write_location) in 9.2 to support another sync rep
> mode which makes transactions wait until the standby has received
> (not fsync'd) the WAL. You're planning the same?
>
>> If we're going to put it back in 9.2, then we shouldn't remove it now.
>>  We should just make it work.  It's a three line patch.  If 9.2 is
>> going to meaningfully distinguish between write location and flush
>> location, then we may as well do the same thing in 9.1.  Then we'll be
>> ahead of the game: not only will the view have the same columns in
>> both releases, but they'll actually have the same semantics in both
>> releases.
>
> +1

I think we have adequate consensus on this topic, so committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Simon Riggs

Date:

26 March 2011, 00:23:48

On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Specifically, if we're not going to remove write location, then I
>>> think we need to apply something like the attached.
>>
>> The protocol supports different write/fsync values, so the view should
>> display them.
>
> That's exactly the point.

No its not.

> Currently, we have a protocol that supports
> different write and fsync values, but the code as written does not
> actually ever send a reply at any time when the two values can ever be
> different.  So there is no point in sending both of them.  The write
> location is completely redundant with the fsync location and therefore
> completely useless.  We shouldn't bother sending the value twice, or
> displaying it twice, if it's absolutely 100% guaranteed to be
> identical in every case.

As of 9.1, we now support other tools that use the protocol, so you
cannot assume you know what is being sent, just because one sender has
certain characteristics.

> The point of the patch that I posted is that it restores the previous
> behavior, where we send an update before flushing WAL and again after
> flushing WAL.  If we do that, then the write location can be ahead of
> the flush location when we've written but not flushed.  If we don't do
> that, and only send replies after flushing everything, then the two
> fields are perforce always the same on the master.  I don't see that
> as being a useful behavior, and in fact I think it could be quite
> confusing.  Someone might assume that if we bother to expose both a
> write_location and a flush_location, they are somehow different.

They can be in 9.1 and almost certainly will be in 9.2

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Fujii Masao

Date:

26 March 2011, 00:23:50

On Thu, Mar 24, 2011 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Specifically, if we're not going to remove write location, then I
>>> think we need to apply something like the attached.
>>
>> The protocol supports different write/fsync values, so the view should
>> display them.
>
> That's exactly the point.  Currently, we have a protocol that supports
> different write and fsync values, but the code as written does not
> actually ever send a reply at any time when the two values can ever be
> different.  So there is no point in sending both of them.  The write
> location is completely redundant with the fsync location and therefore
> completely useless.  We shouldn't bother sending the value twice, or
> displaying it twice, if it's absolutely 100% guaranteed to be
> identical in every case.
>
> The point of the patch that I posted is that it restores the previous
> behavior, where we send an update before flushing WAL and again after
> flushing WAL.  If we do that, then the write location can be ahead of
> the flush location when we've written but not flushed.  If we don't do
> that, and only send replies after flushing everything, then the two
> fields are perforce always the same on the master.  I don't see that
> as being a useful behavior, and in fact I think it could be quite
> confusing.  Someone might assume that if we bother to expose both a
> write_location and a flush_location, they are somehow different.

I agree with Robert. It's useless and confusing to send the same
location as flush_location as write_location redundantly. We should
either remove write_location from the pg_stat_replication view and
the protocol at all, or apply the proposed patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Re: making write location work (was: Efficient transaction-controlled synchronous replication)

From

Simon Riggs

Date:

26 March 2011, 00:25:35

On Thu, Mar 24, 2011 at 11:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Mar 24, 2011 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Specifically, if we're not going to remove write location, then I
>>>> think we need to apply something like the attached.
>>>
>>> The protocol supports different write/fsync values, so the view should
>>> display them.
>>
>> That's exactly the point.  Currently, we have a protocol that supports
>> different write and fsync values, but the code as written does not
>> actually ever send a reply at any time when the two values can ever be
>> different.  So there is no point in sending both of them.  The write
>> location is completely redundant with the fsync location and therefore
>> completely useless.  We shouldn't bother sending the value twice, or
>> displaying it twice, if it's absolutely 100% guaranteed to be
>> identical in every case.
>>
>> The point of the patch that I posted is that it restores the previous
>> behavior, where we send an update before flushing WAL and again after
>> flushing WAL.  If we do that, then the write location can be ahead of
>> the flush location when we've written but not flushed.  If we don't do
>> that, and only send replies after flushing everything, then the two
>> fields are perforce always the same on the master.  I don't see that
>> as being a useful behavior, and in fact I think it could be quite
>> confusing.  Someone might assume that if we bother to expose both a
>> write_location and a flush_location, they are somehow different.
>
> I agree with Robert. It's useless and confusing to send the same
> location as flush_location as write_location redundantly. We should
> either remove write_location from the pg_stat_replication view and
> the protocol at all, or apply the proposed patch.

You may be confused, but that doesn't mean its useless.

The protocol supports sending two values, so two are displayed.

If you wish to remove one from the display then that only makes sense
if you also prevent the protocol from supporting two values.

There is no benefit in doing that, so why do it? We are going to put
that back in 9.2 if you remove it now. Why not leave it, so we don't
need to rewrite all the monitoring tools that will use this view?

Just say this in the docs. "Currently, standbys return the same value
for write and flush locations and so in most cases these values will
be the same. It is possible to write a user program that sends replies
at different times and in that case the values would differ. In later
release these values will differ for standbys. when more options are
available."

It's almost certain that Magnus will fix the bug in pg_xlogstream (or
whatever its called) and a tool will be available, so lets leave this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services