Thread: Review of Refactoring code for sync node detection

Review of Refactoring code for sync node detection

From
Jim Nasby
Date:
Original message (sorry, don't have a copy to reply to :( ):
http://www.postgresql.org/message-id/CAB7nPqThX=WvuGA1J-_CQ293dK3FmUivuYkNvHR0W5xjEc=oFQ@mail.gmail.com
Commitfest URL: https://commitfest.postgresql.org/action/patch_view?id=1579

Summary:
I'd like someone to chime in on the potential performance impact. Other than that, looks good modulo a few grammar
fixes.

Patch applies cleanly and passes make check.

Searching for both SyncRepLock and sync_standby_priority I find no other code this patch needs to modify.

SyncRepGetSynchronousNode():
IMHO, the top comment should mention that if there are multiple standbys of the same priority it returns the first
one.

This switches from using a single if() with multiple conditions &&'d together to a bunch of if() continue's. I don't
knowif those will perform the same, and AFAIK this is pretty performance critical.
 

<grammarZealot>In the comments, "Process" should be "Proceed".</grammarZealot>

The original search logic in SyncRepReleaseWaiters is correctly copied and negated.

pg_stat_get_wal_senders():
The original version only loops through walsnds[] one time; the new code loops twice, both times while holding
SyncRepLock(shared).This will block transaction commit if syncrep is enabled. That's not great, but presumably this
functionisn't going to be called terribly often, so maybe it's OK. (On a side note, perhaps we should add a warning to
thedocumentation for pg_stat_replication warning not to hammer it...)
 

<grammar>
+    /* Get first the priorities on each standby as long as we hold a lock */
should be
+    /* First get the priority of each standby as long as we hold a lock */
or even just
+    /* First get the priority of each standby */
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
Thanks for your review! (No worries for creating a new thread, I don't
mind as this is not a huge patch. I think you could have downloaded
the mbox from postgresql.org btw).

On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby <jim@nasby.net> wrote:
> SyncRepGetSynchronousNode():
> IMHO, the top comment should mention that if there are multiple standbys of
> the same priority it returns the first one.
OK. Corrected.

> This switches from using a single if() with multiple conditions &&'d
> together to a bunch of if() continue's. I don't know if those will perform
> the same, and AFAIK this is pretty performance critical.
Well, we could still use the old notation with a single if(). That's
not much complicated to change.

> <grammarZealot>In the comments, "Process" should be
> "Proceed".</grammarZealot>
Noted. Thanks for correcting my Frenglish.

> pg_stat_get_wal_senders():
> The original version only loops through walsnds[] one time; the new code
> loops twice, both times while holding SyncRepLock(shared). This will block
> transaction commit if syncrep is enabled. That's not great, but presumably
> this function isn't going to be called terribly often, so maybe it's OK. (On
> a side note, perhaps we should add a warning to the documentation for
> pg_stat_replication warning not to hammer it...)
Hm. For code readability I did not really want to do so, but let's do
this: let's add a new argument in SyncRepGetSynchronousNode to
retrieve the priority of each WAL sender and do a single pass through
the array such as there is no performance difference.

Updated patch attached.
Regards,
--
Michael

Attachment

Re: Review of Refactoring code for sync node detection

From
Jim Nasby
Date:
On 10/30/14, 8:05 AM, Michael Paquier wrote:
>> This switches from using a single if() with multiple conditions &&'d
>> together to a bunch of if() continue's. I don't know if those will perform
>> the same, and AFAIK this is pretty performance critical.
> Well, we could still use the old notation with a single if(). That's
> not much complicated to change.

I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what the compiler will do with it though.

If we stick with this version I'd argue it makes more sense to just stick the sync_node = and priority = statements
intothe if block and ditch the continue. </nitpick>
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
If we stick with this version I'd argue it makes more sense to just stick the sync_node = and priority = statements into the if block and ditch the continue. </nitpick>
Let's go with the cleaner version then, I'd prefer code that can be read easily for this code path. Switching back is not much complicated either.
--
Michael
Attachment

Re: Review of Refactoring code for sync node detection

From
Simon Riggs
Date:
On 31 October 2014 00:27, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>
>> If we stick with this version I'd argue it makes more sense to just stick
>> the sync_node = and priority = statements into the if block and ditch the
>> continue. </nitpick>
>
> Let's go with the cleaner version then, I'd prefer code that can be read
> easily for this code path. Switching back is not much complicated either.

I like where you are headed with this feature set. I will do my best
to comment and review so we get something in 9.5.

Some review comments

* The new function calls them a Node rather than a Standby. That is a
good change, but it needs to be applied consistently, so we no longer
use the word Standby anywhere near the sync rep code. I'd prefer if we
continue to use the abbreviation sync for synchronous, since its less
typing and avoids spelling mistakes in code and comments.

* Rationale...
Robert Haas wrote
> Interestingly, the syncrep code has in some of its code paths the idea
> that a synchronous node is unique, while other code paths assume that
> there can be multiple synchronous nodes. If that's fine I think that
> it would be better to just make the code multiple-sync node aware, by
> having a single function call in walsender.c and syncrep.c that
> returns an integer array of WAL sender positions (WalSndCtl). as that
> seems more extensible long-term.

I thought this was the rationale for the patch, but it doesn't do
this. If you disagree with Robert, it would be best to say so now,
since I would guess he's expecting the patch to be doing as he asked.
Or perhaps I misunderstand.

* Multiple sync standbys were originally avoided because of the code
cost and complexity at ReleaseWaiters(). I'm very keen to keep the
code at that point very robust, so simplicity is essential. It would
also be useful to have some kind of micro-benchmark that allows us to
understand the overhead of various configs. Jim mentions he isn't sure
of the performance there. I don't see too much a problem with this
patch, but the later patches will require this, so we may as well do
this soon.

* We probably don't need a comment like /* Cleanup */ inserted above a
pfree. ;-)

I see this should get committed in next few days, even outside the CF.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
Thanks for the comments!

On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 31 October 2014 00:27, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>>
>>> If we stick with this version I'd argue it makes more sense to just stick
>>> the sync_node = and priority = statements into the if block and ditch the
>>> continue. </nitpick>
>>
>> Let's go with the cleaner version then, I'd prefer code that can be read
>> easily for this code path. Switching back is not much complicated either.
>
> I like where you are headed with this feature set. I will do my best
> to comment and review so we get something in 9.5.
>
> Some review comments
>
> * The new function calls them a Node rather than a Standby. That is a
> good change, but it needs to be applied consistently, so we no longer
> use the word Standby anywhere near the sync rep code. I'd prefer if we
> continue to use the abbreviation sync for synchronous, since its less
> typing and avoids spelling mistakes in code and comments.
Yes I guess this makes sense.

> * Rationale...
> Robert Haas wrote
>> Interestingly, the syncrep code has in some of its code paths the idea
>> that a synchronous node is unique, while other code paths assume that
>> there can be multiple synchronous nodes. If that's fine I think that
>> it would be better to just make the code multiple-sync node aware, by
>> having a single function call in walsender.c and syncrep.c that
>> returns an integer array of WAL sender positions (WalSndCtl). as that
>> seems more extensible long-term.
>
> I thought this was the rationale for the patch, but it doesn't do
> this. If you disagree with Robert, it would be best to say so now,
> since I would guess he's expecting the patch to be doing as he asked.
> Or perhaps I misunderstand.

This comment is originally from me :)
http://www.postgresql.org/message-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6VRQo5N_TB+A@mail.gmail.com
Changing with my older opinion, I wrote the patch on this thread with
in mind the idea to keep the code as simple as possible, so for now it
is better to still think that we have a single sync node. Let's work
the multiple node thing once we have a better spec of how to do it,
visibly using a dedicated micro-language within s_s_names.

> * Multiple sync standbys were originally avoided because of the code
> cost and complexity at ReleaseWaiters(). I'm very keen to keep the
> code at that point very robust, so simplicity is essential. It would
> also be useful to have some kind of micro-benchmark that allows us to
> understand the overhead of various configs. Jim mentions he isn't sure
> of the performance there. I don't see too much a problem with this
> patch, but the later patches will require this, so we may as well do
> this soon.
Yes I have been playing with that with the patch series of the
previous commit fest, with something not that much kludgy.

> * We probably don't need a comment like /* Cleanup */ inserted above a
> pfree. ;-)
Sure. I'll send an updated patch tomorrow.
Regards,
-- 
Michael



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I'll send an updated patch tomorrow.

Here are updated versions. I divided the patch into two for clarity,
the first one is the actual refactoring patch, renaming
SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
like updating synchronous to sync in the comments as you mentioned)
such as the namings have no conflicts.

The second one updates the syncrep code, including WAL sender and WAL
receiver, and its surroundings to use the term "node" instead of
"standby". This brings in the code the idea that a node using
replication APIs is not necessarily a standby, making it more generic.
This can be applied on top of the refactoring patch. If any other
folks (Heikki or Fujii-san?) have comments about this idea feel free.
Regards,
--
Michael

Attachment

Re: Review of Refactoring code for sync node detection

From
Simon Riggs
Date:
On 16 November 2014 12:07, Michael Paquier <michael.paquier@gmail.com> wrote:

> Let's work
> the multiple node thing once we have a better spec of how to do it,
> visibly using a dedicated micro-language within s_s_names.

Hmm, please make sure that is a new post. That is easily something I
could disagree with, even though I support the need for more
functionality.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
On Mon, Nov 17, 2014 at 10:00 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 16 November 2014 12:07, Michael Paquier <michael.paquier@gmail.com> wrote:
>
>> Let's work
>> the multiple node thing once we have a better spec of how to do it,
>> visibly using a dedicated micro-language within s_s_names.
>
> Hmm, please make sure that is a new post. That is easily something I
> could disagree with, even though I support the need for more
> functionality.
Sure. I am not really on that yet though :)
-- 
Michael



Re: Review of Refactoring code for sync node detection

From
Fujii Masao
Date:
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I'll send an updated patch tomorrow.
>
> Here are updated versions. I divided the patch into two for clarity,
> the first one is the actual refactoring patch, renaming
> SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
> like updating synchronous to sync in the comments as you mentioned)
> such as the namings have no conflicts.
>
> The second one updates the syncrep code, including WAL sender and WAL
> receiver, and its surroundings to use the term "node" instead of
> "standby". This brings in the code the idea that a node using
> replication APIs is not necessarily a standby, making it more generic.
> This can be applied on top of the refactoring patch. If any other
> folks (Heikki or Fujii-san?) have comments about this idea feel free.

I've not read the patches yet. But while I was reading the code around
sync node detection, I thought that it might be good idea to treat the
node with priority as special. That is, if we find the active node with
the priority 1, we can immediately go out of the sync-node-detection-loop.

In many cases we can expect that the sync standby has the priority 1.
If yes, treating the priority 1 as special is good way to do, I think. Thought?

Regards,

-- 
Fujii Masao



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
On Tue, Nov 18, 2014 at 3:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> I'll send an updated patch tomorrow.
>>
>> Here are updated versions. I divided the patch into two for clarity,
>> the first one is the actual refactoring patch, renaming
>> SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
>> like updating synchronous to sync in the comments as you mentioned)
>> such as the namings have no conflicts.
>>
>> The second one updates the syncrep code, including WAL sender and WAL
>> receiver, and its surroundings to use the term "node" instead of
>> "standby". This brings in the code the idea that a node using
>> replication APIs is not necessarily a standby, making it more generic.
>> This can be applied on top of the refactoring patch. If any other
>> folks (Heikki or Fujii-san?) have comments about this idea feel free.
>
> I've not read the patches yet. But while I was reading the code around
> sync node detection, I thought that it might be good idea to treat the
> node with priority as special. That is, if we find the active node with
> the priority 1, we can immediately go out of the sync-node-detection-loop.
>
> In many cases we can expect that the sync standby has the priority 1.
> If yes, treating the priority 1 as special is good way to do, I think. Thought?

That would really save some cycles. Now we still need to fetch the
priority numbers of all nodes for pg_stat_get_wal_senders, so in this
case scanning all the entries in the WAL sender array is necessary.
This optimization is orthogonal to the refactoring patch btw, no?
-- 
Michael



Re: Review of Refactoring code for sync node detection

From
Simon Riggs
Date:
On 18 November 2014 00:02, Michael Paquier <michael.paquier@gmail.com> wrote:

>> I've not read the patches yet. But while I was reading the code around
>> sync node detection, I thought that it might be good idea to treat the
>> node with priority as special. That is, if we find the active node with
>> the priority 1, we can immediately go out of the sync-node-detection-loop.
>>
>> In many cases we can expect that the sync standby has the priority 1.
>> If yes, treating the priority 1 as special is good way to do, I think. Thought?

Agreed

> That would really save some cycles. Now we still need to fetch the
> priority numbers of all nodes for pg_stat_get_wal_senders, so in this
> case scanning all the entries in the WAL sender array is necessary.
> This optimization is orthogonal to the refactoring patch btw, no?

To the refactoring patch, possibly. But not to the changes you're
planning to make.

ISTM that we should remember the priority list of nodes and check them
in that order by direct lookups to array elements. That will work for
1 or more nodes and it also works better with large numbers of
walsenders.

Can we just wait on this patch until we have the whole feature?

We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:


On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can we just wait on this patch until we have the whole feature?
Well, this may take some time to even define, and even if goals are clearly defined this may take even more time to have a prototype to discuss about.
 
We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.
Hm. I think that we are losing ourselves in this thread. The primary goal is to remove a code duplication between syncrep.c and walsender,c that exists since 9.1. Would it be possible to focus only on that for now? That's still the topic of this CF.
--
Michael

Re: Review of Refactoring code for sync node detection

From
Heikki Linnakangas
Date:
On 11/18/2014 11:23 PM, Michael Paquier wrote:
> On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Can we just wait on this patch until we have the whole feature?
>>
> Well, this may take some time to even define, and even if goals are clearly
> defined this may take even more time to have a prototype to discuss about.
>
>
>> We quite often break larger patches down into smaller ones, but if
>> they arrive in lots of small pieces its more difficult to understand
>> and correct issues that are happening on the larger scale. Churning
>> the same piece of code multiple times is rather mind numbing.
>>
> Hm. I think that we are losing ourselves in this thread. The primary goal
> is to remove a code duplication between syncrep.c and walsender,c that
> exists since 9.1. Would it be possible to focus only on that for now?
> That's still the topic of this CF.

Some comments on this patch:

* I don't like filling the priorities-array in
SyncRepGetSynchronousStandby(). It might be convenient for the one
caller that needs it, but it seems pretty ad hoc.

In fact, I don't really see the point of having the array at all. You
could just print the priority in the main loop in
pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock
while reading the priority, but it seems over-zealous to be so strict
about that in pg_stat_wal_senders(), since it's just an informational
view, and priority changes so rarely that it's highly unlikely to hit a
race condition there. Also note that when you change the list of
synchronous standbys in the config file, and SIGHUP, the walsender
processes will update their priorities in random order. So the idea that
you get a consistent snapshot of the priorities is a bit bogus anyway.
Also note that between filling the priorities array and the main loop in
pg_stat_get_wal_senders(), a new WAL sender might connect and set a
slot's pid. With the current coding, you'll print an uninitialized value
from the array if that happens. So the only thing that holding the
SyncRepLock really protects from is a "torn" read of the value, if
reading an int is not atomic. We could use the spinlock to protect from
that if we care.

* Would be nicer to return a pointer, than index into the wal senders
array. That's what the caller really wants.

I propose the attached (I admit I haven't tested it).

- Heikki


Attachment

Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:


On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
* I don't like filling the priorities-array in SyncRepGetSynchronousStandby(). It might be convenient for the one caller that needs it, but it seems pretty ad hoc.

In fact, I don't really see the point of having the array at all. You could just print the priority in the main loop in pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock while reading the priority, but it seems over-zealous to be so strict about that in pg_stat_wal_senders(), since it's just an informational view, and priority changes so rarely that it's highly unlikely to hit a race condition there. Also note that when you change the list of synchronous standbys in the config file, and SIGHUP, the walsender processes will update their priorities in random order. So the idea that you get a consistent snapshot of the priorities is a bit bogus anyway. Also note that between filling the priorities array and the main loop in pg_stat_get_wal_senders(), a new WAL sender might connect and set a slot's pid. With the current coding, you'll print an uninitialized value from the array if that happens. So the only thing that holding the SyncRepLock really protects from is a "torn" read of the value, if reading an int is not atomic. We could use the spinlock to protect from that if we care.

That's fair, and it simplifies the whole process as well as the API able to fetch the synchronous WAL sender.
 
* Would be nicer to return a pointer, than index into the wal senders array. That's what the caller really wants.

I propose the attached (I admit I haven't tested it).

Actually if you do it this way I think that it would be worth adding the small optimization Fujii-san mentioned upthread: if priority is equal to 1, we leave the loop earlier and return immediately the pointer. All those things gathered give the patch attached, that I actually tested FWIW with multiple standbys and multiple entries in s_s_names.

Regards,
--
Michael
Attachment

Re: Review of Refactoring code for sync node detection

From
Heikki Linnakangas
Date:
On 12/12/2014 04:29 AM, Michael Paquier wrote:
> On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
> hlinnakangas@vmware.com> wrote:
>
>> I propose the attached (I admit I haven't tested it).
>>
> Actually if you do it this way I think that it would be worth adding the
> small optimization Fujii-san mentioned upthread: if priority is equal to 1,
> we leave the loop earlier and return immediately the pointer. All those
> things gathered give the patch attached, that I actually tested FWIW with
> multiple standbys and multiple entries in s_s_names.

Ok, committed.

- Heikki



Re: Review of Refactoring code for sync node detection

From
Michael Paquier
Date:
On Fri, Dec 12, 2014 at 9:38 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 12/12/2014 04:29 AM, Michael Paquier wrote:
>>
>> On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
>> hlinnakangas@vmware.com> wrote:
>>
>>> I propose the attached (I admit I haven't tested it).
>>>
>> Actually if you do it this way I think that it would be worth adding the
>> small optimization Fujii-san mentioned upthread: if priority is equal to
>> 1,
>> we leave the loop earlier and return immediately the pointer. All those
>> things gathered give the patch attached, that I actually tested FWIW with
>> multiple standbys and multiple entries in s_s_names.
>
>
> Ok, committed.
Thanks!
-- 
Michael