Re: Coding TODO for 8.4: Synch Rep - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Coding TODO for 8.4: Synch Rep
Msg-id 3f0b79eb0812171904x220b5f2cw3b639320c7473b43@mail.gmail.com
In response to Re: Coding TODO for 8.4: Synch Rep  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Coding TODO for 8.4: Synch Rep
List pgsql-hackers
Hi,

On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Tue, 2008-12-16 at 14:27 +0900, Fujii Masao wrote:
>
>> I'd like to clarify the coding TODO of Synch Rep for 8.4. If an
>> indispensable TODO item is not listed, please feel free to let me know.
>
>> Since there are many TODO items, I'm worried about the deadline.
>> When is the deadline of this commit fest? December 31st? first half
>> of January? ...etc?
>
> I think we're in a difficult position. The changes I've requested are
> major architecture changes, not that difficult to implement. I would
> have to say *not* doing them leaves us in a situation with a fairly
> awful architecture and it really doesn't make sense to sacrifice long
> term design for a few weeks.
>
> I don't think the review or scale of change is any different to other
> major patches in recent times. If people want to spend time discussing
> the points again, we can. Changes always seem like heavy lifting, but
> there's nothing I've asked for that is difficult, it's all
> straightforward stuff.

You are right. But I'm afraid that I don't code as fast as some great
hackers, including you ;-) Yeah, I'm ready for a happy Coding Xmas!

>
> In all honesty, I didn't think you were going to make the deadline. But
> you did, though with significantly reduced discussion on the key issues.
> That's definitely not a problem with me, sure we're a few weeks behind
> where we wanted to be, but that's nothing when you look at what we're
> dealing with and what we will gain.
>
>> 1. replication_timeout_action (GUC)
>>
>> This is a new GUC to specify the reaction to replication_timeout. In the
>> latest patch, the user cannot configure the reaction, and the primary
>> always continues processing after the timeout occurs. In the next
>> version, the user can choose the reaction:
>>
>> - standalone
>>   When the backend waits for replication much longer than the
>>   specified time (replication_timeout), that is, the timeout occurs, the
>>   backend sends replication_timeout interrupt to walsender. Then,
>>   walsender closes the connection to the standby, wakes all waiting
>>   backends, and exits. All processing then continues on the standalone
>>   primary.
>>
>> - down
>>   When the timeout occurs, walsender signals SIGQUIT to
>>   postmaster instead of waking all backends, then the primary shuts
>>   down immediately.
>
> I'd put this as a much lower priority than other changes. It might still
> be required, but let's get it out there as soon as possible and see. If
> that means we have to punt on it entirely, so be it.

Okay.
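
For reference, the two proposed reactions can be modelled roughly as below.
This is only a sketch in Python, not the patch's C code: the function name
and the step strings are made up for illustration, and only the setting
values "standalone" and "down" come from the description above.

```python
def timeout_reaction(action):
    """Return the ordered steps taken once replication_timeout has expired.

    Hypothetical model: the step strings are illustrative descriptions,
    not identifiers from the patch.
    """
    if action == "standalone":
        return [
            "backend sends replication_timeout interrupt to walsender",
            "walsender closes the connection to the standby",
            "walsender wakes all waiting backends",
            "walsender exits",
            "primary continues standalone",
        ]
    if action == "down":
        return [
            "walsender sends SIGQUIT to postmaster",
            "primary shuts down immediately",
        ]
    raise ValueError(f"unknown replication_timeout_action: {action!r}")
```

The point of the split is that "standalone" favours availability of the
primary, while "down" never lets the primary run without the standby.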

>
>> 2. log_min_duration_replication (GUC)
>>
>> If the backend waits longer than log_min_duration_replication,
>> a warning log message is produced, as with log_min_duration_xxx.
>> The unit is msec rather than a percentage of the timeout, because
>> "msec" is more convenient.
>
> Yes, but low priority.

Okay.
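
A minimal sketch of the intended check, in Python for brevity. It assumes
the new GUC would follow the conventions of log_min_duration_statement
(-1 disables logging, 0 logs every wait); that is an assumption by analogy,
not something decided in this thread, and the function name is hypothetical.

```python
def should_log_replication_wait(wait_ms, log_min_duration_replication_ms):
    """Decide whether a backend's replication wait deserves a log message.

    Assumed semantics, borrowed from log_min_duration_statement:
    a negative setting disables logging, zero logs every wait.
    """
    if log_min_duration_replication_ms < 0:
        return False
    return wait_ms >= log_min_duration_replication_ms
```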

>
>> 3. recovery.conf
>>
>> I need to change the recovery.conf patch to work with EXEC_BACKEND.
>> Someone locally advised me to move the replication options to
>> postgresql.conf for convenience. That is, postgresql.conf would be the
>> only configuration file the user has to care about in order to start
>> replication. Which do you think is best?
>>
>> The options which I'm going to use for replication are the following.
>>
>> - host of the primary (new)
>> - port of the primary (new)
>> - username to connect to the primary (new)
>> - restore_command
>
> Why not just have walreceiver explicitly read recovery.conf? That's what
> Startup process does. (It's only those two processes, right?)
>
> Reworking everything in the way described above would take ages and
> introduce lots of bugs.

Yes, I will make startup and walreceiver read recovery.conf separately.
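
To illustrate "read recovery.conf separately": each process would parse the
file on its own and pick out the options it needs. The sketch below is in
Python and deliberately simplified (name = 'value' lines only, no quote
escaping). The names primary_host, primary_port and primary_user are
placeholders for the new options, which have not been named yet; only
restore_command is an existing option.

```python
import re

# One "name = 'value'" setting per line, with an optional trailing comment.
_SETTING = re.compile(r"^([A-Za-z_]\w*)\s*=\s*'([^']*)'\s*(#.*)?$")

def read_recovery_conf(text):
    """Parse recovery.conf-style text into a dict (simplified sketch)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _SETTING.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        options[match.group(1)] = match.group(2)
    return options

# Hypothetical option names for the three new settings.
SAMPLE = """
# connection to the primary (placeholder names)
primary_host = '192.168.0.10'
primary_port = '5432'
primary_user = 'replication'
restore_command = 'cp /mnt/archive/%f %p'
"""

# walreceiver would use the connection options, startup the restore_command.
WALRECEIVER_KEYS = ("primary_host", "primary_port", "primary_user")
STARTUP_KEYS = ("restore_command",)
```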

>
>> 4. sleeping
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
>>
>> I'm looking for a better idea. How should we resolve that problem?
>> Only reduce the timeout of pq_wait to 100ms? Get rid of
>> SA_RESTART only during pq_wait as follows?
>>
>>    remove SA_RESTART
>>    pq_wait()
>>    add SA_RESTART
>
> Not sure, will consider. Ask others as well.
>
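
For comparison, the first option (a short timeout, with a check between
slices) can be sketched as follows. This is Python, not the backend code:
wait_readable and should_stop are hypothetical names, and the 100ms slice is
the value suggested above. It does not cover the SA_RESTART alternative.

```python
import select
import socket
import time

def wait_readable(sock, total_timeout_s, should_stop, slice_s=0.1):
    """Wait for sock to become readable, re-checking should_stop() each slice.

    Returns "readable", "interrupted" or "timeout". Because each wait lasts
    at most slice_s seconds, a pending request is noticed within one slice.
    """
    deadline = time.monotonic() + total_timeout_s
    while True:
        if should_stop():
            return "interrupted"
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout"
        readable, _, _ = select.select([sock], [], [], min(slice_s, remaining))
        if readable:
            return "readable"
```

The cost is up to one slice of extra latency plus periodic wakeups while
idle, which is the trade-off against toggling SA_RESTART around the wait.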
>> 5. Extend archive_mode
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00718.php
>
> Yes, definitely.

Okay.

>
>> 6. Continuous recovery without pg_standby
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00296.php
>
> Yes, definitely.

Okay.

>
>> 7. Switch modes on the standby
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00503.php
>
> This is a consequence of 5 and 6, not an additional feature. It's part
> of the same thing. So yes, definitely.

Yes.

>
>> 8. Define new trigger to promote the standby
>>
>> In the latest patch, since the standby always recovers with pg_standby,
>> the standby is promoted only by the trigger file of pg_standby. But the
>> architecture should be changed as indicated in #6 and #7. We need to
>> define a new trigger to promote the standby to the primary. I have two ideas:
>>
>> - Trigger based on file
>>   Like pg_standby, the startup process also checks periodically whether
>>   the trigger file exists. The path of the trigger file is specified in
>>   recovery.conf.
>>   The advantage of this idea is that one trigger file can promote the
>>   standby easily whether it's in FLS or SLS mode.
>>
>> - Trigger based on signal
>>   If postmaster receives SIGTERM during recovery, the standby stops
>>   walreceiver, completes recovery and becomes the primary. In current
>>   HEAD, SIGTERM (Smart Shutdown) during recovery is not used yet.
>>
>> Which idea is better? Or, do you have any other better idea?
>>
>> In my design, a trigger is always required to promote the standby. I mean,
>> the standby is not permitted to complete recovery and become the
>> primary without a trigger. Even if the standby finds a corrupted WAL
>> record, it waits for the trigger before ending recovery. This is because
>> postgres cannot make a correct decision whether to end recovery,
>> and a wrong decision might cause split brain and an undesirable increment
>> of the timeline. Is this design OK?
>
> We don't need this change now because of (7). We aren't using pg_standby
> except for the initial stage so it's much less important to do this for
> failover. So low priority, if at all.

I think this feature is required. Otherwise, the startup process might
wait forever for the next WAL record. And since this is a question about
the interface, I wanted to hear from users before coding it.

>
>> 9. New synchronous option on the standby
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01160.php
>>
>>
>> Pending now. Are these features indispensable for 8.4?
>
> Given comments, yes.
>
> I don't see that as hard. Is there a problem in implementation? This
> seems the easiest thing to implement, just sneak in an fsync().

Oops! Sorry for my confusing wording.
"Pending now" refers to the item that follows it, that is, (10). Of course,
I will add the new synchronous option (fsync mode).

>
>> 10. Hang all connections until everything is set up for "sync rep"
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00868.php
>
> IMHO don't really think we can do this sensibly until we can support
> multiple standby nodes. If we did this it would imply that if the
> standby was down then we should stop processing transactions, which is
> just a recipe for low availability, not high availability.
>
> ISTM we should offer a simple boolean function which says whether
> streaming replication is connected or not. If people want to defer
> connection until replication is connected then they can create a more
> complex startup script, just as they do to ensure correct sequence of
> all the required services already.

OK, I will add that function.

Name: pg_is_in_replication
Args: None
Returns: boolean
Description: whether replication is in progress

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

