Re: improve pg_receivewal code - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: improve pg_receivewal code |
Date | |
Msg-id | CALj2ACWw5qz0R2RxLUXbv1jyVyqghdqpJdxzKOEftAsRFE1Q7g@mail.gmail.com Whole thread Raw |
In response to | Re: improve pg_receivewal code (Ronan Dunklau <ronan.dunklau@aiven.io>) |
Responses |
Re: improve pg_receivewal code
|
List | pgsql-hackers |
On Thu, Sep 2, 2021 at 9:05 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > > 1) Fetch the server system identifier in the first RunIdentifySystem > > call and use it to identify(via pg_receivewal's ReceiveXlogStream) any > > unexpected changes that may happen in the server while pg_receivewal > > is connected to it. This can be helpful in scenarios when > > pg_receivewal tries to reconnect to the server (see the code around > > pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has > > happnend in the server that changed the its system identifier. Once > > the pg_receivewal establishes the conenction to server again, then the > > ReceiveXlogStream has a code chunk to compare the system identifier > > that we received in the initial connection. > > I'm not sure what kind of problem could be happening here: if you were > somewhat routed to another server ? Or if we "switched" the cluster listening > on that port ? Yeah. Also, pg_control file on the server can get corrupted for whatever reasons it may be. This sys identifier check is useful in case if the pg_receivewal connects to the server again and again. These are things that sound over cautious to me, however it's nothing wrong to use what ReceiveXlogStream provides. pg_basebackup does make use of this already. > > 2) Move the RunIdentifySystem to identify timeline id and start LSN > > from the server only if the pg_receivewal failed to get them from > > FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is > > avoided. > > That makes sense, even if we add another IDENTIFY_SYSTEM to check against the > one set in the first place. > > > 3) Place the "replication connetion shouldn't have any database name > > associated" error check right after RunIdentifySystem so that we can > > avoid fetching wal segment size with RetrieveWalSegSize if at all we > > were to fail with that error. This change is similar to what > > pg_recvlogical.c does. > > Makes sense. > > > 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters > > main loop to get the wal from the server. This avoids an unnecessary > > query for pg_receivewal with "--create-slot" or "--drop-slot". > > 5) Have an assertion after the pg_receivewal done a good amount of > > work to find start timeline and LSN might be helpful: > > Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr); > > > > Attaching a patch that does take care of above improvements. Thoughts? > > Overall I think it is good. Thanks for your review. > However, you have some formatting issues, here it mixes tabs and spaces: > > + /* > + * No valid data can be found in the existing output > directory. > + * Get start LSN position and current timeline ID from > the server. > + */ My bad. I forgot to run "git diff --check" on the v1 patch. > And here it is not formatted properly: > > +static char *server_sysid = NULL; Done. Here's the v2 with above modifications. Regards, Bharath Rupireddy.
Attachment
pgsql-hackers by date: